Blog Posts for the ‘Rapid Software Testing’ Category

Exploratory Testing on an API? (Part 2)

Tuesday, July 17th, 2018

Summary:  Loops of exploration, experimentation, studying, modeling, and learning are the essence of testing, not an add-on to it. The intersection of activity and models (such as the Heuristic Test Strategy Model) helps us to perform testing while continuously developing, refining, and reviewing it. Testing is much more than writing a bunch of automated checks to confirm that the product can do something; it’s an ongoing investigation in which we continuously develop our understanding of the product.

Last time out, I began the process of providing a deep answer to this question:

Do you perform any exploratory testing on APIs? How do you do it?

That started with reframing the first question

Do you perform any exploratory testing on APIs?

into a different question

Given a product with an API, do you do testing?

The answer was, of course, Yes. This time I’ll turn to addressing the question “How do you do it?” I’ll outline my thought process and the activities that I would perform, and how they feed back on each other.

Note that in Rapid Software Testing, a test is an action performed by a human; neither a specific check nor a scripted test procedure. A test is a burst of exploration and experiments that you perform. As part of that activity, a test might include thousands of automated checks within it, or just one, or none at all. Part of the test may be written down, encoded as a specific procedure. Testing might be aided by tools, by documents or other artifacts, or by process models. But the most important part of testing is what testers think and what testers do.

(Note that when I say “testers” here, I mean any person who is either permanently or temporarily in a testing role. “Tester” applies to a dedicated tester; a solo programmer switching from the building mindset to the testing mindset; or a programmer or DevOps person examining the product in a group without dedicated testers.)

It doesn’t much matter where I start, because neither learning nor testing happens in straight lines. They happen in loops, cycles, epicycles; some long and some short; nested inside each other; like a fractal. Testing and learning entail alternation between focusing and defocusing; some quick flashes of insight, some longer periods of reflection; smooth progress at some times, and frequent stumbling blocks at others. Testing, by nature, is an exploratory process involving conversation, study, experimentation, discovery, and investigation that leads to more learning and more testing.

As for anything else I might test, when I’m testing a product through an API, I must develop a strategy. In the Rapid Software Testing namespace, your strategy is the set of ideas that guide the design, development, and selection of your tests.

Having the Heuristic Test Strategy Model in my head and periodically revisiting it helps me to develop useful ideas about how to cover the product with testing. So as I continue to describe my process, I’ll annotate what I’m describing below with some of the guideword heuristics from the HTSM. The references will look like this.

A word of caution, though:  the HTSM isn’t a template or a script.  As I’m encountering the project and the product, test ideas are coming to me largely because I’ve internalized them through practice, introspection, review, and feedback.  I might use the HTSM generatively, to help ideas grow if I’m having a momentary drought; I might use it retrospectively as a checklist against which I review and evaluate my strategy and coverage ideas; or I might use it as a means of describing testing work and sharing ideas with other people, as I’m doing here.

Testing the RST way starts with evaluating my context. That starts with taking stock of my mission, and that starts with the person giving me my mission. Who is my client—that is, to whom am I directly answerable? What does my client want me to investigate?

I’m helping someone—my client, developers, or other stakeholders—to evaluate the quality of the product. Often when we think about value, we think about value to paying customers and to end users, but there are plenty of people who might get value from the product, or have that value threatened. Quality is value to some person who matters, so whose values do we know might matter? Who might have been overlooked? Project Environment/Mission

Before I do anything else, I’ll need to figure out—at least roughly—how much time I’ll have to accomplish the mission. While I’m at it, I’ll ask other time-related questions about the project: are there any deadlines approaching? How often do builds arrive? How much time should I dedicate to preparing reports or other artifacts? Project Environment/Schedule

Has anyone else tested this product? Who are they? Where are they? Can I talk to them? If not, did they produce results or artifacts that will help me? Am I on a team? What skills do we have? What skills do we need? Project Environment/Test Team

What does my client want me to provide? A test report, almost certainly, and bug reports, probably—but in what form? Oral conversations or informally written summaries? I’m biased towards keeping things light, so that I can offer rapid feedback to clients and developers. Would the client prefer more formal approaches, using particular reporting or management tools? As much as the client might like that, I’ll also note whenever I see costs of formalization.

What else might the client, developers, and other stakeholders want to see, now or later on? Input that I’ve generated for testing? Code for automated checks? Statistical test results? Visualizations of those results? Tools that I’ve crafted and documentation for them? A description of my perception of the product? Formal reports for regulators and auditors? Project Environment/Deliverables I’ll continue to review my mission and the desired deliverables throughout the project.

So what is this thing I’m about to test? Project Environment/Test Item Having checked on my mission, I proceed to simple stuff so that I can start the process of learning about the product. I can start with any one of these things, or with two or more of them in parallel.

I talk to the developers, if they’re available. Even better, I participate in design and planning sessions for the product, if I can. My job at such meetings is to learn, to advocate for testability, to bring ideas and ask questions about problems and risks. I ask about testing that the developers have done, and the checking that they’ve set up. Project Environment/Developer Relations

If I’ve been invited to the party late or not at all, I’ll make a note of it. I want to be as helpful as possible, but I also want to keep track of anything that makes my testing harder or slower, so that everyone can learn from that. Maybe I can point out that my testing will be better-informed the earlier and the more easily I can engage with the product, the project, and the team.

I examine the documentation for the API and for the rest of the product. Project Environment/Information I want to develop an understanding of the product: the services it offers, the means of controlling it, and its role in the systems that surround it. I annotate the documentation or take separate notes, so that I can remember and discuss my findings later on. As I do so, I pay special attention to things that seem inconsistent or confusing.

If I’m confused, I don’t worry about being confused. I know that some of my confusion will dissipate as I learn about the product. Some of my confusion might suggest that there are things that I need to learn. Some of my confusion might point to the risk that the users of the product will be confused too. Confusion can be a resource, a motivator, as long as I don’t mind being confused.

As I’m reading the documentation, I ask myself “What simple, ordinary, normal things can I do with the product?” If I have the product available, I’ll do sympathetic testing by trying a few basic requests, using a tool that provides direct interaction with the product through its API. Perhaps it’s a tool developed in-house; perhaps it’s a tool crafted for API testing like Postman or SOAPUI; or maybe I’ll use an interpreter like Ruby’s IRB along with some helpful libraries like HTTParty. Project Environment/Equipment and Tools

I might develop a handful of very simple scripts, or I might retain logs that the tool or the interpreter provides. I’m just as likely to throw this stuff away as I am to keep it. At this stage, my focus is on learning more than on developing formal, reusable checks. I’ll know better how to test and check the product after I’ve tried to test it.
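To make this concrete, here is a minimal sketch of what such a throwaway exploration script might look like, using Python and the requests library as a stand-in for the interpreter-and-library approach mentioned above. The base URL, endpoints, and fields are hypothetical, invented purely for illustration; the point is the activity of trying simple, ordinary requests and capturing observations, not this particular code.

```python
# A throwaway script for sympathetic, exploratory requests against a
# hypothetical API. The base URL, endpoints, and fields are invented for
# illustration; replace them with whatever the product under test offers.
import json
import requests

BASE = "https://api.example.com/v1"  # hypothetical service under test

def show(label, response):
    """Log enough of each exchange to support notes and later conversation."""
    print(f"--- {label} ---")
    print("status :", response.status_code)
    print("elapsed:", response.elapsed.total_seconds(), "seconds")
    try:
        body = json.dumps(response.json(), indent=2)
    except ValueError:
        body = response.text  # not JSON? that might be worth a note, too
    print("body   :", body[:500])

# Simple, ordinary, normal things a user of the API might do:
show("list items", requests.get(f"{BASE}/books", params={"limit": 5}, timeout=10))
show("one item", requests.get(f"{BASE}/books/1", timeout=10))
show("create item",
     requests.post(f"{BASE}/books",
                   json={"title": "Lessons Learned", "author": "Example"},
                   timeout=10))
```

Whether I keep a script like this or discard it, the observations and questions it prompts go into my notes.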

If I find a bug—any kind of inconsistency or misbehaviour that threatens the value of the product—I’ll report it right away, but that’s not all I’ll report. If I have any problems with trying to do sympathetic testing, I’ll report them immediately. They may be usability problems, testability problems, or both at once. At this stage of the project, I’ll bias my choices towards the fastest, least expensive, and least formal reporting I can do.

My primary goal at this point, though, is not to find bugs, but to figure out how people might use the API to get access to the product, how they might get value from it, and how that value might be threatened. I’m developing my models of the product; how it’s intended to work, how to use it, and how to test it. Learning about the product in a comprehensive way prepares me to find better bugs—deeper, subtler, less frequent, more damaging.

To help the learning stick, I aspire to be a good researcher: taking notes; creating diagrams; building lists of features, functions, and risks; making mind maps; annotating existing documentation. Periodically I’ll review these artifacts with programmers, managers, or other colleagues, in order to test my learning.

Irrespective of where I’ve started, I’ll iterate and go deeper, testing the product and refining my models and strategies as I go. We’ll look at that in the next installment.

(At Least) Four Things for Testers To Do in Planning Meetings

Wednesday, October 18th, 2017

There’s much talk these days of DevOps, and Agile development, and “shift left”. Apparently, in these process models, it’s a revelation that testers can do more than test a built product, and that testers can and should be involved at every step of development.

In Rapid Software Testing, that’s not exactly news. From the beginning, we’ve rejected the idea that the product has to be complete, or has to pass some kind of “quality gate” or meet “acceptance criteria” before we start testing. We welcome the opportunity to test anything that anyone is willing to give us. We’ll happily do testing at any time from the moment someone has an idea for a product until long after the product has been released.

When testers are invited to planning meetings, there’s clearly no product to test. So what are we there for?

We’re there to learn. Testing is evaluating a product by learning about it through exploration and experimentation. At the meeting, there is a product to test. Running code is not the only kind of product we can test—not by a long shot. Ideas, designs, documents, drawings, and prototypes are products too. We can explore them, and perform thought experiments on them—and we can learn about them and evaluate them.

At the meeting, we’re there to learn about the product; to learn about the technology; to learn about the contexts in which the product will be used; to learn about plans for building the product. Our role is to become aware of all of the sources of information that might aid in our testing, and in development of the product generally. We’re there to find out about risks that threaten the value of the product in the short and long term, and about problems that might threaten the on-time, successful completion of the product.

We’re there to advocate for testability. Testability might happen by accident, without our help. It’s the role of a responsible tester to make sure that testability happens intentionally, by design. Note that testability is not just about stuff that’s intrinsic to the product. There are factors in the project, in our notions of value, and in our understanding of the risk gap that influence testability. Testability is also subjective with respect to us, our knowledge and skills, and our relationship to the team. So part of our jobs during preparation for development is to ask for the help we’ll need to make ourselves more powerful testers.

We’re there to challenge. Other people are in roles oriented towards building the product. They are focused on synthesis, and envisioning success. If they’re designers, they might be focused on helping the user to accomplish a task, on efficiency, or effectiveness, or on esthetics. If they’re business people, they might be focused on accomplishing some business goal, or meeting a deadline. Developers are often focused more on the details than on the big picture. All of those people may be anxious to declare and meet a definition of “done”.

The testing role is to think critically about the product and the project; to ask how we might be fooling ourselves. We’re tilted towards asking good questions instead of getting “the right answer”; towards analysis more than synthesis; towards skepticism and suspicion more than optimism; towards anticipating problems more than seeking solutions. We can do those other things, but when we do, we pop for that moment out of a testing role and into a building role.

As testers, we’re trying to notice problems in what people are talking about in the meetings. We’re trying to identify obstacles that might hinder the user’s task; ways in which the product might be ineffective, inefficient, or unappealing. We’re trying to recognize how the business goal might not be met, or how the deadline could be blown. We’re alternating between small details and the big picture. We’re trying to figure out how our definition of done might be inadequate; how we might be fooling ourselves into believing we’re done when we’re not. We’re here to challenge the idea that something is okay when it might not be okay.

We’re there to establish our roles as testers. A role is a heuristic that helps in managing time, focus, and responsibility. The testing role is a commitment to perform valuable and necessary services: to focus on discovering problems, ideally early when they’re small, so that they can be prevented from turning into bigger problems later; to build a product and a project that affords rapid, inexpensive discovery and learning; and to challenge the ideas and artifacts that represent what we think we know about the product and its design. These tasks are socially, psychologically, emotionally, and politically difficult. Unless we handle them gracefully, our questioning, problem-focused role will be seen as merely disruptive, rather than an essential part of the process of building something excellent.

In Rapid Software Testing, we don’t claim that someone must be in the testing role, or must have the job title “tester”. We do believe that having someone responsible for the testing role helps to put focus on the task of providing helpful feedback. This should be a service to the project, not an obstacle. It requires us to maintain close social distance while maintaining a good deal of critical distance.

Of course, the four things that I’ve mentioned here can be done in any development model. They can be done not only in planning meetings, but at any time when we are engaging with others, at any time in the product’s development, at any level of granularity or formality. DevOps and Agile and “shift left” are context. Testing is always testing.

Some related posts:

What Exploratory Testing Is Not (Part 2): After-Everything-Else Testing

Exploratory Testing and Review

Exploratory Testing is All Around You

Testers Don’t Prevent Problems

What Is A Tester?

Testing is…

The Honest Manual Writer Heuristic

Monday, May 30th, 2016

Want a quick idea for a burst of activity that will reveal both bugs and opportunities for further exploration? Play “Honest Manual Writer”.

Here’s how it works: imagine you’re the world’s most organized, most thorough, and—above all—most honest documentation writer. Your client has assigned you to write a user manual, including both reference and tutorial material, that describes the product or a particular feature of it. The catch is that, unlike other documentation writers, you won’t base your manual on what the product should do, but on what it does do.

You’re also highly skeptical. If other people have helpfully provided you with requirements documents, specifications, process diagrams or the like, you’re grateful for them, but you treat them as rumours to be mistrusted and challenged. Maybe someone has told you some things about the product. You treat those as rumours too. You know that even with the best of intentions, there’s a risk that even the most skillful people will make mistakes from time to time, so the product may not perform exactly as they have intended or declared. If you’ve got use cases in hand, you recognize that they were written by optimists. You know that in real life, there’s a risk that people will inadvertently blunder or actively misuse the product in ways that its designers and builders never imagined. You’ll definitely keep that possibility in mind as you do the research for the manual.

You’re skeptical about your own understanding of the product, too. You realize that when the product appears to be doing something appropriately, it might be fooling you, or it might be doing something inappropriate at the same time. To reduce the risk of being fooled, you model the product and look at it from lots of perspectives (for example, consider its structure, functions, data, interfaces, platform, operations, and its relationship to time; consider business risk and technical risk, too). You’re also humble enough to realize that you can be fooled in another way: even when you think you see a problem, the product might be working just fine.

Your diligence and your ethics require you to envision multiple kinds of users and to consider their needs and desires for the product (capability, reliability, usability, charisma, security, scalability, performance, installability, supportability…). Your tutorial will be based on plausible stories about how people would use the product in ways that bring value to them.

You aspire to provide a full accounting of how the product works, how it doesn’t work, and how it might not work—warts and all. To do that well, you’ll have to study the product carefully, exploring it and experimenting with it so that your description of it is as complete and as accurate as it can be.

There’s a risk that problems could happen, and if they do, you certainly don’t want either your client or the reader of your manual to be surprised. So you’ll develop a diversified set of ways to recognize problems that might cause loss, harm, annoyance, or diminished value. Armed with those, you’ll try out the product’s functions, using a wide variety of data. You’ll try to stress out the product, doing one thing after another, just like people do in real life. You’ll involve other people and apply lots of tools to assist you as you go.

For the next 90 minutes, your job is to prepare to write this manual (not to write it, but to do the research you would need to write it well) by interacting with the product or feature. To reduce the risk that you’ll lose track of something important, you’ll probably find it a good idea to map out the product, take notes, make sketches, and so forth. At the end of 90 minutes, check in with your client. Present your findings so far and discuss them. If you have reason to believe that there’s still work to be done, identify what it is, and describe it to your client. If you didn’t do as thorough a job as you could have done, report that forthrightly (remember, you’re super-honest). If anything got in the way of your research or made it more difficult, highlight that; tell your client what you need or recommend. Then have a discussion with your client to agree on what you’ll do next.

Did you notice that I’ve just described testing without using the word “testing”?

On Scripting

Saturday, July 4th, 2015

A script, in the general sense, is something that constrains our actions in some way.

In common talk about testing, there’s one fairly specific and narrow sense of the word “script”—a formal sequence of steps that are intended to specify behaviour on the part of some agent—the tester, a program, or a tool. Let’s call that “formal scripting”. In Rapid Software Testing, we also talk about scripts as something more general, in the same kind of way that some psychologists might talk about “behavioural scripts”: things that direct, constrain, or program our behaviour in some way. Scripts of that nature might be formal or informal, explicit or tacit, and we might follow them consciously or unconsciously. Scripts shape the ways in which people behave, influencing what we might expect people to do in a scenario as the action plays out.

As James Bach says in the comments to our blog post Exploratory Testing 3.0, “By ‘script’ we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.” (my emphasis, there)

When I’m driving to a party out in the country, the list of directions that I got from the host scripts me. Many other things script me too. The starting time of the party—combined with cultural norms that establish whether I should be very prompt or fashionably late—prompts me to leave home at a certain time. The traffic laws and the local driving culture condition my behaviour and my interactions with other people on the road. The marked detour along the route scripts me, as do the weather and the driving conditions. My temperament and my current emotional state script me too. In this more general sense of “scripting”, any activity can become heavily scripted, even if it isn’t written down in a formal way.

Scripts are not universally bad things, of course. They often provide compelling advantages. Scripts can save cognitive effort; the more my behaviour is scripted, the less I have to think, do research, make choices, or get confused. In my driving example, a certain degree of scripting helps me to get where I’m going, to get along with other drivers, and to avoid certain kinds of trouble. Still, if I want to get to the party without harm to myself or other people, I must bring my own agency to the task and stay vigilant, present, and attentive, making conscious and intentional choices. Scripts might influence my choices, and may even help me make better choices, but they should not control me; I must remain in control. Following a script means giving up engagement and responsibility for that part of the action.

From time to time, testing might include formal testing—testing that must be done in a specific way, or to check specific facts. On those occasions, formal scripting—especially the kind of formal script followed by a machine—might be a reasonable approach enabling certain kinds of tasks and managing them successfully. A highly scripted approach could be helpful for rote activities like operating the product following explicitly declared steps and then checking for specific outputs. A highly scripted approach might also enable or extend certain kinds of variation—randomizing data, for example. But there are many other activities in testing: learning about the product, designing a test strategy, interviewing a domain expert, recognizing a new risk, investigating a bug—and dealing with problems in formally scripted activities. In those cases, variability and adaptation are essential, and an overly formal approach is likely to be damaging, time-consuming, or outright impossible. Here’s something else that is almost never formally scripted: the behaviour of normal people using software.
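For contrast with those unscripted activities, here is a minimal sketch of a formally scripted, machine-followed check of the kind described above, with a little randomized variation in its data. The shipping-cost function and the business rule it encodes are hypothetical; in real life, the function under check would come from the product rather than being defined next to the check.

```python
# A formally scripted check: explicitly declared steps, a specific decision
# rule, and machine-generated variation in the input data.
# Both shipping_cost and the rule it is checked against are hypothetical.
import random

def shipping_cost(weight_kg: float) -> float:
    """Stand-in for the product code under check (hypothetical)."""
    return 5.00 + max(0.0, weight_kg - 2.0) * 1.50

def check_shipping_cost(weight_kg: float) -> bool:
    """Apply the declared rule to a specific observation; return pass or fail."""
    expected = 5.00 if weight_kg <= 2.0 else 5.00 + (weight_kg - 2.0) * 1.50
    observed = shipping_cost(weight_kg)
    return abs(observed - expected) < 0.005

random.seed(2015)  # repeatable randomized variation
weights = [round(random.uniform(0.1, 30.0), 2) for _ in range(100)]
failures = [w for w in weights if not check_shipping_cost(w)]
print(f"checks run: {len(weights)}; failures: {len(failures)}", failures[:5])
```

A machine can follow that script tirelessly, but it cannot notice anything the decision rule doesn’t encode—which is the point of the distinction being drawn here.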

Notice on the one hand that formal testing is, by its nature, highly scripted; most of the time, scripting constrains or even prevents exploration by constraining variation. On the other hand, if you want to make really good decisions about what to test formally, how to test formally, why to test formally, it helps enormously to learn about the product in unscripted and informal ways: conversation, experimentation, investigation… So excellent scripted testing and excellent checking are rooted in exploratory work. They begin with exploratory work and depend on exploratory work. To use language as Harry Collins might, scripted testing is parasitic on exploration.

We say that any testing worthy of the name is fundamentally exploratory. We say that to test a product means to evaluate it by learning about it through experimentation and exploration. To explore a product means to investigate it, to examine it, to create and travel over maps and models of it. Testing includes studying the product, modeling it, questioning it, making inferences about it, operating it, observing it. Testing includes reporting, which itself includes choosing what to report and how to contextualize it. We believe these activities cannot be encoded in explicit procedural scripting in the narrow sense that I mentioned earlier, even though they are all scripted to some degree in the more general sense. Excellent testing—excellent learning—requires us to think and to make choices, which includes thinking about what might be scripting us, and deciding whether to control those scripts or to be controlled by them. We must remain aware of the factors that are scripting us so that we can manage them, taking advantage of them when they help and resisting them when they interfere with our mission.

Exploratory Testing 3.0

Tuesday, March 17th, 2015

This blog post was co-authored by James Bach and me. In the unlikely event that you don’t already read James’ blog, I recommend you go there now.

The summary is that we are beginning the process of deprecating the term “exploratory testing”, and replacing it with, simply, “testing”. We’re happy to receive replies either here or on James’ site.

Oracles Are About Problems, Not Correctness

Thursday, March 12th, 2015

As James Bach and I have been refining our ideas of testing, we’ve been refining our ideas about oracles. In a recent post, I referred to this passage:

Program testing involves the execution of a program over sample test data followed by analysis of the output. Different kinds of test output can be generated. It may consist of final values of program output variables or of intermediate traces of selected variables. It may also consist of timing information, as in real time systems.

The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle. Test oracles can take on different forms. They can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions.

—William E. Howden, A Survey of Dynamic Analysis Methods, in Software Validation and Testing Techniques, IEEE Computer Society, 1981

While we have a great deal of respect for the work of testing pioneers like Prof. Howden, there are some problems with this description of testing and its focus on correctness.

  • Correct output from a computer program is not an absolute; an outcome is only correct or incorrect relative to some model, theory, or principle. Trivial example: Even the mathematical rule “one divided by two equals one-half” is a heuristic for dividing things. In most domains, it’s true, but as in George Carlin’s joke, when you cut a crumb in two, you don’t have two half-crumbs; you have two crumbs.
  • A product can produce a result that is functionally correct, and yet still be deeply unsatisfactory to its user. Trivial example: a calculator returns the value “4” from the function “2 + 2”—and displays the result in white on a white background.
  • Conversely, a product can produce an incorrect result and still be quite acceptable. Trivial example: a computer desktop clock’s internal state and second hand drift a few tenths of a second each second, but the program resets itself to be consistent with an atomic clock at the top of every minute. The desktop clock almost never shows the right time precisely, but the human observer doesn’t notice and doesn’t really care. Another trivial example: a product might return a calculation inconsistent with its oracle in the tenth decimal place, when only the first two or three decimal places really matter.
  • The correct outcome of a program or function is not always known in advance. Some development and testing work, like some science, is done in an attempt to discover something new; to establish what a correct answer might look like; to explore a mathematical model; to learn about the limitations of a novel system. In such cases, our ideas of correctness or acceptability are not clear from the outset, and must be developed. (See Collins and Pinch’s The Golem books, which discuss the messiness and confusion of controversial science.) Trivial example: in benchmarking, correctness is not at issue. Comparison between one system and another (or versions of the same system at different times) is the mission of testing here.
  • As we’re developing and testing a product, we may observe things that are unexpected, under-described or completely undescribed. In order to program a machine to make an observation, we must anticipate that observation and encode it. The machine doesn’t imagine, invent, or learn, and a machine cannot produce an unanticipated oracle in response to an observation. By contrast, human observers continually learn and refine their ideas on what to observe. Sometimes we observe a problem without having anticipated it. Sometimes we become aware that we’re making a new observation—one that may or may not represent a problem. Distinct from checking, testing continually affords new things to observe. Testing prompts us to decide when new observations represent problems, and testing informs decisions about what to do about them.
  • An oracle may be in error, or irrelevant. Trivial examples: a program that checks the output of another program may have its own bugs. A reference document may be outdated. A subject matter expert who is usually a reliable source of information may have forgotten something.
  • Oracles might be inconsistent with each other. Even though we have some powerful models for it, temperature measurement in climatology is inherently uncertain. What is the “correct” temperature outdoors? In the sunlight? In the shade? When the thermometer is near a building or farther away? Over grass, or over pavement? Some of the issues are described in this remarkable article (read the comments, too).
  • Although we can demonstrate incorrectness in a program, we cannot prove a program to be correct. As Dijkstra put it, testing can only show the presence of errors, not their absence; and to go even deeper, Popper pointed out that theories can only be falsified, and not proven. Trivial example: No matter how many tests we run on that calculator, we can never know that it will always return 4 given the inputs 2 + 2; we can only infer that it will do so through induction, and induction can be deeply problematic. In Nassim Taleb’s example (cribbed from Bertrand Russell and David Hume), every day the turkey uses induction to reinforce his belief in the farmer’s devotion to the desires and interests of turkeys—until a few days before Thanksgiving, when the turkey receives a very sudden, unpleasant, and (alas for the turkey) momentary flash of insight.
  • Sometimes we don’t need to know the correct result to know that the observed result is wrong. Trivial example: the range of the cosine function runs from -1 to 1. I don’t need to know the correct value for cos(72) to know that an output of 4.2 is wrong. (Elaine Weyuker discusses this in “On Testing Nontestable Programs” (Department of Computer Science, Courant Institute of Mathematical Sciences, New York University): “Frequently the tester is able to state with assurance that a result is incorrect without actually knowing the correct answer.”)

Checking for correctness—especially when the test output is observed and evaluated mechanically or indirectly—is a risky business. All oracles are fallible. A “passing” test, based on comparison with a fallible oracle, cannot prove correctness, and no number of “passing” tests can do that. In this, a test is like a scientific experiment: an experiment’s outcome can falsify one theory while supporting another, but an experiment cannot prove a theory to be true. A million observations of white swans say nothing about the possibility that there might be black swans; a million passing tests, a million observations of correct behaviour, cannot eliminate the possibility that there might be swarms of bugs. At best, a passing test is essentially the observation of one more white swan. We urge those who rely on passing acceptance tests to remember this.

A check can suggest the presence of a problem, or can at best provide support for the idea that the program can work. But no matter what oracle we might use, a test cannot prove that a program is working correctly, or that the program will work. So what can oracles actually do for us?

If we invert the focus on correctness, we can produce a more robust heuristic. We can’t logically use an oracle to prove that a system is behaving correctly or that it will behave correctly, but we can use an oracle to help falsify the theory that it is behaving correctly. This is why, in Rapid Software Testing, we say that an oracle is a means by which we recognize a problem when it happens during testing.
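Here is a small sketch of that inversion, using the cosine example from the list above. product_cos stands in for a hypothetical function under test; both oracles applied to it are fallible, and a quiet run demonstrates nothing about correctness, but either oracle can help us recognize a problem when it happens.

```python
# Oracles as means to recognize problems, not to prove correctness.
# product_cos is a hypothetical stand-in for a cosine routine under test.
import math

def product_cos(x):
    """Stand-in for the function under test (hypothetical)."""
    return math.cos(x)

def suspected_problems(x):
    """Apply two fallible oracles; report anything inconsistent with them."""
    problems = []
    observed = product_cos(x)

    # Oracle 1: the range of cosine runs from -1 to 1, so we can flag an
    # output of 4.2 as wrong without knowing the correct value of cos(72).
    if not -1.0 <= observed <= 1.0:
        problems.append(f"cos({x}) = {observed}: outside [-1, 1]")

    # Oracle 2: a comparable algorithm, within a tolerance that matters to us.
    # The reference could itself be wrong; disagreement means "investigate",
    # and agreement means only "no problem noticed here".
    if abs(observed - math.cos(x)) > 1e-3:
        problems.append(f"cos({x}) = {observed}: disagrees with math.cos")

    return problems

for x in (0.0, 72.0, math.pi, 1e6):
    print(x, suspected_problems(x) or "no problem noticed (not proof of correctness)")
```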

Give Us Back Our Testing

Saturday, February 14th, 2015

“Program testing involves the execution of a program over sample test data followed by analysis of the output. Different kinds of test output can be generated. It may consist of final values of program output variables or of intermediate traces of selected variables. It may also consist of timing information, as in real time systems.

“The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle. Test oracles can take on different forms. They can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions.”

—William E. Howden, A Survey of Dynamic Analysis Methods, in Software Validation and Testing Techniques, IEEE Computer Society, 1981

Once upon a time, computers were used solely for computation. Humans did most of the work that preceded or followed the computation, so the scope of a computer program was limited. In the earliest days, testing a program mostly involved checking to see if the computations were being performed correctly, and that the hardware was working properly before and after the computation.

Over time, designers and programmers became more ambitious and computers became more powerful, enabling more complex and less purely numerical tasks to be encoded and delegated to the machinery. Enormous memory and blinding speed largely replaced the physical work associated with storing, retrieving, revising, and transmitting records. Computers got smaller and became more powerful and protean, used not only by mathematicians but also by scientists, business people, specialists, consumers, and kids.

Software is now used for everything from productivity to communications, control systems, games, audio playback, video displays, thermostats… Yet many of the software development community’s ideas about testing haven’t kept up. In fact, in many ways, they’ve gone backwards.

Ask people in the software business to describe what testing means to them, and many will begin to talk about test cases, and about comparing a program’s output to some predicted or expected result. Yet outside of software development, “testing” has retained its many more expansive meanings.

A teenager tests his parents’ patience. When confronted with a mysterious ailment, doctors perform diagnostic tests (often using very sophisticated tools) with open expectations and results that must be interpreted. Writers in Cook’s Illustrated magazine test techniques for roasting a turkey, and report on the different outcomes that they obtain by varying factors—flavours, colours, moisture, textures, cooking methods, cooking times… The Mythbusters, says Wikipedia, “use elements of the scientific method to test the validity of rumors, myths, movie scenes, adages, Internet videos, and news stories.”

Notice that all of these things called “testing” are focused on exploration, investigation, discovery, and learning. Yet over the last several decades, Howden’s notions of testing as checking for correctness, and of an oracle as a mechanism (or an artifact), became accepted by many people in the development and testing communities at large. Whether or not people were explicitly aware of those notions, they certainly seem tacitly to have subscribed to the idea that testing should be focused on analysis of the output, displacing those broader and deeper meanings of testing.

That idea might have been more reasonable when computers did nothing but compute. Today, computers and their software are richly intertwined with daily social life and things that we value. Yet for many in software development, “testing” has this narrow, impoverished meaning, limited to what James Bach and I call checking. Checking is a tactic of testing; the part of testing that can be encoded as algorithms and that therefore can be performed entirely by machinery. It is analogous to compiling, the part of programming that can be performed algorithmically.

Oddly, since we started distinguishing between testing and checking, some people have claimed that we’re “redefining” testing. We disagree. We believe that we are recovering testing’s meaning, restoring it to its original, rich, investigative sense. Testing’s meaning was stolen; we’re stealing it back.

The Rapid Software Testing Namespace

Monday, February 2nd, 2015

Just as no one has the right to tell you what language to speak at home, nobody outside of your project has the authority to tell you how to speak inside your project. Every project develops its own namespace, so to speak, and its own formal or informal criteria for naming things inside it.

Rapid Software Testing is, among other things, a project in that sense. For years, James Bach and I have been developing labels for ideas and activities that we talk about in our work and in our classes. While we’re happy to adopt useful ideas and terms from other places, we have the sole authority (for now) to set the vocabulary formally within Rapid Software Testing (RST).

We don’t have the right to impose our vocabulary on anyone else. So what do we do when other people use a word to mean something different from what we mean by the same word?

We invoke “the RST namespace” when we talk about testing and checking, for example, so that we can speak clearly and efficiently about ideas that we bring up in our classes and in the practice of Rapid Software Testing. From time to time, we also try to make it clear why we use words in a specific way.

For example, we make a big deal about testing and checking. We define checking as “the process of making evaluations by applying algorithmic decision rules to specific observations of a product” (and a check is an instance of checking). We define testing as “the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.” (and a test is an instance of testing).

This is in contrast with the ISTQB, which in its Glossary defines “test” as “a set of test cases”—along with “test case” as “a set of input values, execution preconditions, expected results and execution postconditions, developed for a particular objective or test condition, such as to exercise a particular program path or to verify compliance with a specific requirement.”

Interesting, isn’t it: the ISTQB’s definition of test looks a lot like our definition of check. In Rapid Software Testing, we prefer to put learning and experimentation (rather than satisfying requirements and demonstrating fitness for purpose) at the centre of testing. We prefer to think of a test as something that people do as an act of investigation; as a performance, not as an artifact.

Because words convey meaning, we converse (and occasionally argue, sometimes passionately) about the value we see in the words we choose and the ways we think of them. Our goal is to describe things that people haven’t noticed, or to make certain distinctions clear, so as to reduce the risk that someone will misunderstand—or miss—something important.

Nonetheless, we freely acknowledge that we have no authority outside of Rapid Software Testing. There’s nothing to stop people from using the words we use in a different way; there are no language police in software development. So we’re also willing to agree to use other people’s labels for things when we’ve had the conversation about what those labels mean, and have come to agreement.

People who tout a “common language” often mean “my common language”, or “my namespace”. They also have the option to certify you as being able to pass a vocabulary test, if anyone thinks that’s important. We don’t.

We think that it’s important for people to notice when words are being used in different ways. We think it’s important for people to become polyglots—and that often means working out which namespace we might be using from one moment to the next.

In our future writing, conversation, classes, and other work, you might wonder what we’re talking about when we refer to “the RST namespace”. This post provides your answer.

Testing is…

Tuesday, October 28th, 2014

Every now and again, someone makes some statement about testing that I find highly questionable or indefensible, whereupon I might ask them what testing means to them. All too often, they’re at a loss to reply because they haven’t really thought deeply about the matter; or because they haven’t internalized what they’ve thought about; or because they’re unwilling to commit to any statement about testing. And then they say something vague or non-committal like “it depends” or “different things to different people” or “that’s a matter of context”, without suggesting relevant dependencies, people, or context factors.

So, for those people, I offer a set of answers from which they can choose one; or they can adopt the entire list wholesale; or they can use one or more items as a point of departure for something of their own invention. You don’t have to agree with any of these things; in that case, invent your own ideas about testing from whole cloth. But please: if you claim to be a tester, or if you are making some claim about testing, please prepare yourself and have some answer ready when someone asks you “what is testing?”. Please.

Here are some possible replies; I believe everything is Tweetable, or pretty close.

  • Testing is—among other things—reviewing the product and ideas and descriptions of it, looking for significant and relevant inconsistencies.
  • Testing is—among other things—experimenting with the product to find out how it may be having problems—which is not “breaking the product”, by the way.
  • Testing is—among other things—something that informs quality assurance, but is not in and of itself quality assurance.
  • Testing is—among other things—helping our clients to make empirically informed decisions about the product, project, or business.
  • Testing is—among other things—a process by which we systematically examine any aspect of the product with the goal of preventing surprises.
  • Testing is—among other things—a process of interacting with the product and its systems in many ways that challenge unwarranted optimism.
  • Testing is—among other things—observing and evaluating the product, to see where all those defect prevention ideas might have failed.
  • Testing is—among other things—a special part of the development process focused on discovering what could go badly (or what is going badly).
  • Testing is—among other things—exploring, discovering, investigating, learning, and reporting about the product to reveal new information.
  • Testing is—among other things—gathering information about the product, its users, and conditions of its use, to help defend value.
  • Testing is—among other things—raising questions to help teams to develop products that more quickly and easily reveal their own problems.
  • Testing is—among other things—helping programmers and the team to learn about unanticipated aspects of the product we’re developing.
  • Testing is—among other things—helping our clients to understand the product they’ve got so they can decide if it’s the product they want.
  • Testing is—among other things—using both tools and direct interaction with the product to question and evaluate its behaviours and states.
  • Testing is—among other things—exploring products deeply, imaginatively, and suspiciously, to help find problems that threaten value.
  • Testing is—among other things—performing actual and thought experiments on products and ideas to identify problems and risks.
  • Testing is—among other things—thinking critically and skeptically about products and ideas around them, with the goal of not being fooled.
  • Testing is—among other things—evaluating a product by learning about it through exploration, experimentation, observation and inference.

You’re welcome.

How Models Change

Saturday, July 19th, 2014

Like software products, models change as we test them, gain experience with them, find bugs in them, realize that features are missing. We see opportunities for improving them, and revise them.

A product coverage outline, in Rapid Testing parlance, is an artifact (a map, or list, or table…) that identifies the dimensions or elements of a product. It’s a kind of inventory of aspects of the product that could be tested. Many years ago, my colleague and co-author James Bach wrote an article on product elements, identifying Structure, Function, Data, Platform, and Operations (SFDPO; think “San Francisco DePOt”, he suggested) as a set of heuristic guidewords for creating or structuring or reviewing the highest levels of a coverage outline.

A few years later, I was working as a tester. While I was on that assignment, I missed a few test ideas and almost missed a few bugs that I might have noticed earlier had I thought of “Time” as another guideword for modeling the product. After some discussion, I persuaded James that Time was a worthy addition to the Product Elements list. I wrote my own article on that (Time for New Test Ideas).

Over the years, it seemed that people were excited by the idea of using SFDPOT as the starting point for a general coverage outline. Many people reported getting a lot of value out of it, so in my classes, I’ve placed more and more emphasis on using and practicing the application of that part of the Heuristic Test Strategy Model. One of the exercises involves creating a mind map for a real software product. I typically offer that one way to get started on creating a coverage outline is to walk through the user interface and enumerate each element of the UI in the mind map.

(Sometimes people ask, “Why bother? Don’t the specifications or the documentation or the Help file provide maps of the UI? What’s the point of making another one?” One answer is that the journey, rather than the map, is the point. We learn one set of things by reading about a product; we learn different things—and we typically learn more deeply—by touring the product, interacting with it, gaining experience with it, and organizing descriptions of what we’ve found. Moreover, at each moment, we may notice, infer, or wonder about things that the documentation doesn’t address. When we recognize something new, we can add it to our coverage model, our risk list, or our test ideas—plus we might recognize and note some bugs or issues along the way. Another answer is that we should treat anything that any documentation says about a product as a rumour until we’ve engaged with the product.)

One issue kept coming up in class: on the product coverage outline, where should the map of the user interface go? Under Functions (what the product does)? Or Operations (how people use the product)? Or Structure (the bits and pieces of the product)? My answer was that it doesn’t matter much where you put things on your coverage outline, as long as it fits for you and the people with whom you might be sharing the map. The idea is to identify things that could be tested, and not to miss important stuff.

After one class, I was on the phone with James, and I happened to mention that day’s discussion. “I prefer to put the UI under Structure,” I noted.

“What? That’s crazy talk! The UI goes under Functions!”

“What?” I replied. “That’s crazy talk. The UI isn’t Functions. Sure, it triggers functions. But it doesn’t perform those functions.”

“So what?” asked James. “If it’s how the user gets at functions, it fits under Functions just fine. What makes you think the UI goes under Structure?”

“Well, the UI has a structure. It’s… structural.”

“Everything has a structure,” said James. “The UI goes under Functions.”

And so we argued on. Then one of us—and I honestly don’t remember who—suggested that maybe the UI was important enough to be its own top-level product element. I do remember James pointing out that when we think of interfaces, plural, there might be several of them—not just the graphical user interface, but maybe a command-line interface. An application programming interface.

“Hmmm…,” I said. This reminded me of the four-user model mentioned in How to Break Software (human user, API user, operating system user, file system user). “Interfaces,” I said. “Operating system interface, file system interface, network interface, printer interface, debugging interface, other devices…”

“Right,” said James. “Plus there are those other interface-y things—importing and exporting stuff, for instance.”

“Aren’t those covered under ‘Functions’?”

“Sure. Or they might be, depending on how you think about it. But the point of this kind of model isn’t to be a template, or a form you fill out. It’s to help us reduce the chances that we might miss something important. Our models are leaky abstractions; overlaps are okay,” said James. Which, of course, was exactly the same argument I had used on him several years earlier when we had added Time to the model. Then he paused. “Ah! But we don’t want to break the mnemonic, do we? San Francisco DePOT.”

“We can deal with that. Just misspell ‘depot’ San Francisco DIPOT. SFDIPOT.”

And so we updated the model.
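For the curious, here is a rough sketch of how the top level of a product coverage outline might look after that update, rendered as plain data rather than as a mind map. The sub-entries under Interfaces echo the ones from the conversation above; the other branches are placeholders, to be filled in while touring a real product.

```python
# Top level of a product coverage outline, per SFDIPOT, sketched as data.
# The Interfaces entries echo those mentioned above; everything else is a
# placeholder to be elaborated for a real product under test.
product_coverage_outline = {
    "Structure":  ["(the bits and pieces of the product)"],
    "Function":   ["(what the product does)"],
    "Data":       ["(to be filled in for the product under test)"],
    "Interfaces": ["graphical user interface", "command-line interface",
                   "application programming interface", "operating system interface",
                   "file system interface", "network interface", "printer interface",
                   "debugging interface", "importing and exporting"],
    "Platform":   ["(to be filled in for the product under test)"],
    "Operations": ["(how people use the product)"],
    "Time":       ["(the product's relationship to time)"],
}
```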

I wonder what it will look like five years from now.