Blog Posts for the ‘Rapid Software Testing’ Category

Exploratory Testing on an API? (Part 4)

Wednesday, November 21st, 2018

As promised (at last!), here are some follow-up notes on previous installments in the series that starts here.

Let’s revisit the original question:

Do you perform any exploratory testing on APIs? How do you do it?

To review: there’s a problem with the question. Asking about “exploratory testing” is a little like asking about “vegetarian cauliflower”, “carbon-based human beings”, or “metallic copper”. Testing is fundamentally exploratory. Testing is an attempt by a self-guided agent to discover something unknown; to learn something about the product, with a special focus on finding problems and revealing risk.

One way to learn about the product is to develop and run a set of automated checks that access the product via the API. That is, a person writes code for the machine to operate and observe the product, apply decision rules, and report the outputs. This produces a script, but it’s not a scripted process. Learning about the product, designing the checks, and developing the code to implement those checks are all exploratory processes.
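
To make that concrete, here is a minimal sketch of what one such check might look like, in Python with the requests library. The endpoint, the field name, and the decision rules are invented for illustration; they aren’t taken from any real product.

    import requests  # third-party HTTP client: pip install requests

    def check_item_lookup():
        # Operate and observe the product through its API...
        response = requests.get("https://example.com/api/items/42", timeout=10)
        # ...apply decision rules to specific observations...
        problems = []
        if response.status_code != 200:
            problems.append("expected HTTP 200, got %s" % response.status_code)
        elif response.json().get("id") != 42:
            problems.append("response body does not echo the requested id")
        # ...and report the output.
        return ("failed", problems) if problems else ("passed", [])

    print(check_item_lookup())

Designing the decision rules, noticing what they miss, and investigating whatever they report is where the learning happens—and that part stays with the humans.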

When we use a machine to perform automated checks, there’s no discovery or learning involved in the performance of the check itself. Checks are like the system on your car that monitors certain aspects of your engine and illuminates the “check engine” light. Checks are, in essence, tools by which people become aware of specific conditions in the product. The machine learns no more than the dashboard light learns. It’s the humans, aided by tools that might include checking, who learn.

People learn as they interpret outcomes of the checks and investigate problems that the checks seem to indicate. The machinery has no way of knowing whether a reported “failed” output represents a problem in the product or a problem in the check. Checks don’t know the difference between “changed” and “broken”, between “different” and “fixed”, between “good” and “bad”. The machinery also has no way of knowing whether a reported “passed” output really means that the product is trouble-free; a problem in the check could be masking a problem in the product.

Testing via an API is exploratory testing via an API, because exploratory testing is, simply, testing. (Exploratory) testing is not simply “acting like a user”, “testing without tools”, or “a manual testing technique”.

Throughout this whole series, what I’ve been doing is not “manual testing”, and it’s not “automated testing” either. I use tools to get access to the product via the API, but that’s not automated testing. There is no automated testing. Testing is neither manual nor automated. No one talks about “automated” or “manual” experiments. No one talks about “manual” or “automated” research. Testing is done neither by the hands, nor by the machinery, but by minds.

Tools do play an important role in testing. We use tools to extend, enhance, enable, accelerate, and intensify the testing we do. Tools play an important role in programming too, but no one refers to “manual programming”. No one calls compiling “automated programming”. Compiling is something that a machine can do; it is a translation between one set of strings (source code) and another set of strings (machine code). This is not to dismiss the role of the compiler, or of compiler writers; indeed, writing a sophisticated compiler is a job for an advanced programmer.

Programming starts in the complex, imprecise, social world of humans. Designers and programmers refine messy human communication into a form that’s so orderly that a brainless machine can follow the necessary instructions and perform the intended tasks. Throughout the development process, testers explore and experiment with the product to find problems, to help the development team and the business to decide whether the product they’ve got is the product they want. Tools can help, but none of these processes can be automated.

In the testing work that I’ve described in the previous posts, I haven’t been “testing like a user”. Who uses APIs? It might be tempting to answer “application programs” (it’s an application programming interface, after all), or “machines”. But the real users of an API are human beings. These include the direct users—the various developers who write code to take advantage of what a product offers—and the indirect users—the people who use the products that programmers develop. For sure, some of my testing has been informed by ideas about actions of users of an API. That’s part of testing like a tester.

In several important ways, there’s a lot of opportunity for testability through APIs. Very generally, components and services with APIs tend to be of a smaller scale than entire applications, so studying and understanding them can be much more tractable. An API is deterministic and machine-specific. That means that certain kinds of risks due to human variability are of lower concern than they might be through a GUI, where all kinds of things can happen at any time.

The API is by definition a programming interface, so it’s natural to use that interface for automated checking. You can use validator checks to detect problems with the syntax of the output, or parallel algorithms to check the semantics of transactions through an API.
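
As a sketch of the first idea: assuming the API returns JSON, a validator check might apply a JSON Schema to each response body. The schema and field names below are invented, so treat this as an illustration of the shape of such a check, not as anyone’s real contract.

    from jsonschema import ValidationError, validate  # pip install jsonschema

    # An invented schema describing the syntax we expect from one endpoint.
    ORDER_SCHEMA = {
        "type": "object",
        "required": ["id", "status", "total"],
        "properties": {
            "id": {"type": "integer"},
            "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
            "total": {"type": "number", "minimum": 0},
        },
    }

    def syntax_problems(payload):
        """Return a list of syntax problems found in one response body."""
        try:
            validate(instance=payload, schema=ORDER_SCHEMA)
            return []
        except ValidationError as error:
            return [error.message]

    print(syntax_problems({"id": 17, "status": "open", "total": -5}))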

Once they’re written, it’s easy to repeat such checks, especially to detect regressions, but be careful. In Rapid Software Testing, regression testing isn’t simply repetition of checks. To us, regression testing means testing focused on risk related to change—to the product “going backwards” (which is what “regress” means; the opposite of “progress”).

A good regression testing strategy is focused on what has changed, how it has changed, and what might be affected. That would involve understanding what has changed; testing the change itself; exploring around that; and a smattering of testing of stuff that should be unaffected by the change (to reveal hidden or misunderstood risk). This applies whether you are testing via the API or not; whether you have a set of automated checks or not; whether you run checks continuously or not.

If you are using automated checks, remember that they can help to detect unanticipated variations from specified results, but they don’t show that everything works, and they don’t show that nothing has broken. Instead, checks verify that outputs from given functions are consistent from one build to the next. Do not simply confirm that everything is OK; actively search for problems. Explore around. Are all the checks passing? Ask “What else could go wrong?”

Automated checks can take on special relevance when they’re in the form of contract checks (often called “contract tests”). The idea here is to solicit checks from actual consumers of an API that represent specified, desired results, and to check the contract from both the supplier and consumer ends. Nonetheless, remember that such checks are heavily focused on confirmation, and not on discovery of problems and risks that aren’t covered by the contracts.
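
Here’s a hand-rolled sketch of that idea: the consumer declares the fields and types its code relies on, and a check on the supplier side compares a live response against that declaration. Real projects often use dedicated tools (Pact is a well-known example); the endpoint and the contract below are invented.

    import requests

    # An invented consumer contract: field name -> type the consumer depends on.
    CONSUMER_CONTRACT = {
        "id": int,
        "name": str,
        "price": float,
    }

    def contract_problems(url):
        body = requests.get(url, timeout=10).json()
        problems = []
        for field, expected_type in CONSUMER_CONTRACT.items():
            if field not in body:
                problems.append("missing field '%s'" % field)
            elif not isinstance(body[field], expected_type):
                problems.append("field '%s' has type %s, not %s"
                                % (field, type(body[field]).__name__,
                                   expected_type.__name__))
        return problems

    print(contract_problems("https://example.com/api/products/1"))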

On the other hand, now that you’ve gone to the trouble of writing code to check for specific outputs, why stop there? I’ve used checks in an exploratory way by:

  • varying the input systematically to look for problems related to missing or malformed data, extreme values, messed-up character handling, and other foreseeable data-related bugs;
  • varying the input more randomly (“fuzzing” is one instance of this technique), to help discover surprising hidden boundaries or potential security vulnerabilities (see the sketch after this list);
  • varying the order and sequences of input, to look for subtle state-related bugs;
  • writing routines to stress the product, pumping lots of transactions and lots of data through it, to find performance-related bugs;
  • capturing data (like particular values) or metadata (like transaction times) associated with the checks, visualizing it, and analyzing it, to see problems and understand risks in new ways.
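
Here’s the sketch promised above, combining the first two items: a loop that varies the input to a single endpoint both systematically and randomly, and flags responses that invite investigation. The endpoint, the payload shape, and the choice of “interesting” values are all invented for illustration.

    import random
    import string

    import requests

    def interesting_names():
        # Systematic variation: foreseeable data-related hazards.
        yield ""                           # missing data
        yield " " * 1000                   # extreme length, all whitespace
        yield "O'Brien"                    # quoting
        yield "Łukasz 渡辺"                 # character handling
        yield "<script>alert(1)</script>"  # markup injection
        # Random variation ("fuzzing"): surprises and hidden boundaries.
        for _ in range(20):
            length = random.randint(1, 200)
            yield "".join(random.choice(string.printable) for _ in range(length))

    for name in interesting_names():
        response = requests.post("https://example.com/api/customers",
                                 json={"name": name}, timeout=10)
        # Server errors, hangs, or garbled bodies are invitations to investigate.
        if response.status_code >= 500:
            print("server error %s for name=%r" % (response.status_code, name))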

A while back, Peter Houghton told me an elegant example of using checking in exploration. Given an API to a component, he produces a simple script that calls the same function thousands of times from a loop and benchmarks the time that the process took. Periodically he re-runs the script and compares the timing to the first run. If he sees a significant change in the timing, he investigates. About half the time, he says, he finds a bug.
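
Peter’s work was against a component’s API rather than a web service, but the idea translates directly. Here’s a sketch under invented assumptions—the endpoint, the call count, the baseline, and the 20% threshold are all mine, for illustration:

    import time

    import requests

    def benchmark(url, calls=5000):
        """Time a tight loop of identical calls through the API."""
        start = time.perf_counter()
        for _ in range(calls):
            requests.get(url, timeout=10)
        return time.perf_counter() - start

    BASELINE_SECONDS = 42.0  # elapsed time recorded on the first run (assumed)

    elapsed = benchmark("https://example.com/api/ping")
    change = (elapsed - BASELINE_SECONDS) / BASELINE_SECONDS
    print("elapsed %.1fs (%+.0f%% vs. baseline)" % (elapsed, change * 100))
    if abs(change) > 0.20:
        print("Timing has shifted noticeably; time to investigate.")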

So, to sum up: all testing is exploratory. Exploration is aided by tools, and automated checking is an approach to using tools. Investigation of the unknown and discovery of new knowledge are of the essence of exploration. We must explore to find bugs. All testing on APIs is exploratory.

Exploratory Testing on an API? (Part 2)

Tuesday, July 17th, 2018

Summary: Loops of exploration, experimentation, studying, modeling, and learning are the essence of testing, not an add-on to it. The intersection of activity and models (such as the Heuristic Test Strategy Model) helps us to perform testing while continuously developing, refining, and reviewing it. Testing is much more than writing a bunch of automated checks to confirm that the product can do something; it’s an ongoing investigation in which we continuously develop our understanding of the product.

Last time out, I began the process of providing a deep answer to this question:

Do you perform any exploratory testing on APIs? How do you do it?

That started with reframing the first question

Do you perform any exploratory testing on APIs?

into a different question

Given a product with an API, do you do testing?

The answer was, of course, Yes. This time I’ll turn to addressing the question “How do you do it?” I’ll outline my thought process and the activities that I would perform, and how they feed back on each other.

Note that in Rapid Software Testing, a test is an action performed by a human; neither a specific check nor a scripted test procedure. A test is a burst of exploration and experiments that you perform. As part of that activity, a test might include thousands of automated checks within it, or just one, or none at all. Part of the test may be written down, encoded as a specific procedure. Testing might be aided by tools, by documents or other artifacts, or by process models. But the most important part of testing is what testers think and what testers do.

(Note here that when I say “testers” here, I mean any person who is either permanently or temporarily in a testing role. “Tester” applies to a dedicated tester; a solo programmer switching from the building mindset to the tester mindset; or a programmer or DevOps person examining the product in a group without dedicated testers.)

It doesn’t much matter where I start, because neither learning nor testing happen in straight lines. They happen in loops, cycles, epicycles; some long and some short; nested inside each other; like a fractal. Testing and learning entail alternation between focusing and defocusing; some quick flashes of insight, some longer periods of reflection; smooth progress at some times, and frequent stumbling blocks at others. Testing, by nature, is an exploratory process involving conversation, study, experimentation, discovery, and investigation that leads to more learning and more testing.

As for anything else I might test, when I’m testing a product through an API, I must develop a strategy. In the Rapid Software Testing namespace, your strategy is the set of ideas that guide the design, development, and selection of your tests.

Having the Heuristic Test Strategy Model in my head and periodically revisiting it helps me to develop useful ideas about how to cover the product with testing. So as I continue to describe my process, I’ll annotate what I’m describing below with some of the guideword heuristics from the HTSM. The references will look like this.

A word of caution, though: the HTSM isn’t a template or a script. As I’m encountering the project and the product, test ideas are coming to me largely because I’ve internalized them through practice, introspection, review, and feedback. I might use the HTSM generatively, to help ideas grow if I’m having a momentary drought; I might use it retrospectively as a checklist against which I review and evaluate my strategy and coverage ideas; or I might use it as a means of describing testing work and sharing ideas with other people, as I’m doing here.

Testing the RST way starts with evaluating my context. That starts with taking stock of my mission, and that starts with the person giving me my mission. Who is my client—that is, to whom am I directly answerable? What does my client want me to investigate?

I’m helping someone—my client, developers, or other stakeholders—to evaluate the quality of the product. Often when we think about value, we think about value to paying customers and to end users, but there are plenty of people who might get value from the product, or have that value threatened. Quality is value to some person who matters, so whose values do we know might matter? Who might have been overlooked? Project Environment/Mission

Before I do anything else, I’ll need to figure out—at least roughly—how much time I’ll have to accomplish the mission. While I’m at it, I’ll ask other time-related questions about the project: are there any deadlines approaching? How often do builds arrive? How much time should I dedicate to preparing reports or other artifacts? Project Environment/Schedule

Has anyone else tested this product? Who are they? Where are they? Can I talk to them? If not, did they produce results or artifacts that will help me? Am I on a team? What skills do we have? What skills do we need? Project Environment/Test Team

What does my client want me to provide? A test report, almost certainly, and bug reports, probably—but in what form? Oral conversations or informally written summaries? I’m biased towards keeping things light, so that I can offer rapid feedback to clients and developers. Would the client prefer more formal approaches, using particular reporting or management tools? As much as the client might like that, I’ll also take note whenever I see costs of formalization.

What else might the client, developers, and other stakeholders want to see, now or later on? Input that I’ve generated for testing? Code for automated checks? Statistical test results? Visualizations of those results? Tools that I’ve crafted and documentation for them? A description of my perception of the product? Formal reports for regulators and auditors? Project Environment/Deliverables I’ll continue to review my mission and the desired deliverables throughout the project.

So what is this thing I’m about to test? Project Environment/Test Item Having checked on my mission, I proceed to simple stuff so that I can start the process of learning about the product. I can start with any one of these things, or with two or more of them in parallel.

I talk to the developers, if they’re available. Even better, I participate in design and planning sessions for the product, if I can. My job at such meetings is to learn, to advocate for testability, to bring ideas and ask questions about problems and risks. I ask about testing that the developers have done, and the checking that they’ve set up. Project Environment/Developer Relations

If I’ve been invited to the party late or not at all, I’ll make a note of it. I want to be as helpful as possible, but I also want to keep track of anything that makes my testing harder or slower, so that everyone can learn from that. Maybe I can point out that my testing will be better-informed the earlier and the more easily I can engage with the product, the project, and the team.

I examine the documentation for the API and for the rest of the product. Project Environment/Information I want to develop an understanding of the product: the services it offers, the means of controlling it, and its role in the systems that surround it. I annotate the documentation or take separate notes, so that I can remember and discuss my findings later on. As I do so, I pay special attention to things that seem inconsistent or confusing.

If I’m confused, I don’t worry about being confused. I know that some of my confusion will dissipate as I learn about the product. Some of my confusion might suggest that there are things that I need to learn. Some of my confusion might point to the risk that the users of the product will be confused too. Confusion can be a resource, a motivator, as long as I don’t mind being confused.

As I’m reading the documentation, I ask myself “What simple, ordinary, normal things can I do with the product?” If I have the product available, I’ll do sympathetic testing by trying a few basic requests, using a tool that provides direct interaction with the product through its API. Perhaps it’s a tool developed in-house; perhaps it’s a tool crafted for API testing like Postman or SOAPUI; or maybe I’ll use an interpreter like Ruby’s IRB along with some helpful libraries like HTTParty. Project Environment/Equipment and Tools
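
That first poke might be no more elaborate than this, typed line by line into the interpreter. (The tools named above would do just as well; this sketch happens to use Python’s interactive interpreter with the requests library, and the endpoint is invented.)

    import requests

    # A simple, ordinary, normal request: can I do the most basic thing at all?
    r = requests.get("https://example.com/api/orders",
                     params={"limit": 3}, timeout=10)
    print(r.status_code)                   # did the normal thing succeed?
    print(r.headers.get("Content-Type"))   # is the output what the docs suggest?
    print(r.json())                        # what does a typical record look like?
    # Anything surprising here becomes a question, a note, or a bug report.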

I might develop a handful of very simple scripts, or I might retain logs that the tool or the interpreter provides. I’m just as likely to throw this stuff away as I am to keep it. At this stage, my focus is on learning more than on developing formal, reusable checks. I’ll know better how to test and check the product after I’ve tried to test it.

If I find a bug—any kind of inconsistency or misbehaviour that threatens the value of the product—I’ll report it right away, but that’s not all I’ll report. If I have any problems with trying to do sympathetic testing, I’ll report them immediately. They may be usability problems, testability problems, or both at once. At this stage of the project, I’ll bias my choices towards the fastest, least expensive, and least formal reporting I can do.

My primary goal at this point, though, is not to find bugs, but to figure out how people might use the API to get access to the product, how they might get value from it, and how that value might be threatened. I’m developing my models of the product; how it’s intended to work, how to use it, and how to test it. Learning about the product in a comprehensive way prepares me to find better bugs—deeper, subtler, less frequent, more damaging.

To help the learning stick, I aspire to be a good researcher: taking notes; creating diagrams; building lists of features, functions, and risks; making mind maps; annotating existing documentation. Periodically I’ll review these artifacts with programmers, managers, or other colleagues, in order to test my learning.

Irrespective of where I’ve started, I’ll iterate and go deeper, testing the product and refining my models and strategies as I go. We’ll look at that in the next installment.

(At Least) Four Things for Testers To Do in Planning Meetings

Wednesday, October 18th, 2017

There’s much talk these days of DevOps, and Agile development, and “shift left”. Apparently, in these process models, it’s a revelation that testers can do more than test a built product, and that testers can and should be involved at every step of development.

In Rapid Software Testing, that’s not exactly news. From the beginning, we’ve rejected the idea that the product has to be complete, or has to pass some kind of “quality gate” or meet “acceptance criteria” before we start testing. We welcome the opportunity to test anything that anyone is willing to give us. We’ll happily do testing at any time from the moment someone has an idea for a product until long after the product has been released.

When testers are invited to planning meetings, there’s clearly no product to test. So what are we there for?

We’re there to learn. Testing is evaluating a product by learning about it through exploration and experimentation. At the meeting, there is a product to test. Running code is not the only kind of product we can test—not by a long shot. Ideas, designs, documents, drawings, and prototypes are products too. We can explore them, and perform thought experiments on them—and we can learn about them and evaluate them.

At the meeting, we’re there to learn about the product; to learn about the technology; to learn about the contexts in which the product will be used; to learn about plans for building the product. Our role is to become aware of all of the sources of information that might aid in our testing, and in development of the product generally. We’re there to find out about risks that threaten the value of the product in the short and long term, and about problems that might threaten the on-time, successful completion of the product.

We’re there to advocate for testability. Testability might happen by accident, without our help. It’s the role of a responsible tester to make sure that testability happens intentionally, by design. Note that testability is not just about stuff that’s intrinsic to the product. There are factors in the project, in our notions of value, and in our understanding of the risk gap that influence testability. Testability is also subjective with respect to us, our knowledge and skills, and our relationship to the team. So part of our jobs during preparation for development is to ask for the help we’ll need to make ourselves more powerful testers.

We’re there to challenge. Other people are in roles oriented towards building the product. They are focused on synthesis, and envisioning success. If they’re designers, they might be focused on helping the user to accomplish a task, on efficiency, or effectiveness, or on esthetics. If they’re business people, they might be focused on accomplishing some business goal, or meeting a deadline. Developers are often focused more on the details than on the big picture. All of those people may be anxious to declare and meet a definition of “done”.

The testing role is to think critically about the product and the project; to ask how we might be fooling ourselves. We’re tilted towards asking good questions instead of getting “the right answer”; towards analysis more than synthesis; towards skepticism and suspicion more than optimism; towards anticipating problems more than seeking solutions. We can do those other things, but when we do, we pop for that moment out of a testing role and into a building role.

As testers, we’re trying to notice problems in what people are talking about in the meetings. We’re trying to identify obstacles that might hinder the user’s task; ways in which the product might be ineffective, inefficient, or unappealing. We’re trying to recognize how the business goal might not be met, or how the deadline could be blown. We’re alternating between small details and the big picture. We’re trying to figure out how our definition of done might be inadequate; how we might be fooling ourselves into believing we’re done when we’re not. We’re here to challenge the idea that something is okay when it might not be okay.

We’re there to establish our roles as testers. A role is a heuristic that helps in managing time, focus, and responsibility. The testing role is a commitment to perform valuable and necessary services: to focus on discovering problems, ideally early when they’re small, so that they can be prevented from turning into bigger problems later; to build a product and a project that affords rapid, inexpensive discovery and learning; and to challenge the ideas and artifacts that represent what we think we know about the product and its design. These tasks are socially, psychologically, emotionally, and politically difficult. Unless we handle them gracefully, our questioning, problem-focused role will be seen as merely disruptive, rather than an essential part of the process of building something excellent.

In Rapid Software Testing, we don’t claim that someone must be in the testing role, or must have the job title “tester”. We do believe that having someone responsible for the testing role helps to put focus on the task of providing helpful feedback. This should be a service to the project, not an obstacle. It requires us to maintain close social distance while maintaining a good deal of critical distance.

Of course, the four things that I’ve mentioned here can be done in any development model. They can be done not only in planning meetings, but at any time when we are engaging with others, at any time in the product’s development, at any level of granularity or formality. DevOps and Agile and “shift left” are context. Testing is always testing.

Some related posts:

What Exploratory Testing Is Not (Part 2): After-Everything-Else Testing

Exploratory Testing and Review

Exploratory Testing is All Around You

Testers Don’t Prevent Problems

What Is A Tester?

Testing is…

The Honest Manual Writer Heuristic

Monday, May 30th, 2016

Want a quick idea for a burst of activity that will reveal both bugs and opportunities for further exploration? Play “Honest Manual Writer”.

Here’s how it works: imagine you’re the world’s most organized, most thorough, and—above all—most honest documentation writer. Your client has assigned you to write a user manual, including both reference and tutorial material, that describes the product or a particular feature of it. The catch is that, unlike other documentation writers, you won’t base your manual on what the product should do, but on what it does do.

You’re also highly skeptical. If other people have helpfully provided you with requirements documents, specifications, process diagrams or the like, you’re grateful for them, but you treat them as rumours to be mistrusted and challenged. Maybe someone has told you some things about the product. You treat those as rumours too. You know that even with the best of intentions, there’s a risk that even the most skillful people will make mistakes from time to time, so the product may not perform exactly as they have intended or declared. If you’ve got use cases in hand, you recognize that they were written by optimists. You know that in real life, there’s a risk that people will inadvertently blunder or actively misuse the product in ways that its designers and builders never imagined. You’ll definitely keep that possibility in mind as you do the research for the manual.

You’re skeptical about your own understanding of the product, too. You realize that when the product appears to be doing something appropriately, it might be fooling you, or it might be doing something inappropriate at the same time. To reduce the risk of being fooled, you model the product and look at it from lots of perspectives (for example, consider its structure, functions, data, interfaces, platform, operations, and its relationship to time; and business risk, and technical risk). You’re also humble enough to realize that you can be fooled in another way: even when you think you see a problem, the product might be working just fine.

Your diligence and your ethics require you to envision multiple kinds of users and to consider their needs and desires for the product (capability, reliability, usability, charisma, security, scalability, performance, installability, supportability…). Your tutorial will be based on plausible stories about how people would use the product in ways that bring value to them.

You aspire to provide a full accounting of how the product works, how it doesn’t work, and how it might not work—warts and all. To do that well, you’ll have to study the product carefully, exploring it and experimenting with it so that your description of it is as complete and as accurate as it can be.

There’s a risk that problems could happen, and if they do, you certainly don’t want either your client or the reader of your manual to be surprised. So you’ll develop a diversified set of ways to recognize problems that might cause loss, harm, annoyance, or diminished value. Armed with those, you’ll try out the product’s functions, using a wide variety of data. You’ll try to stress out the product, doing one thing after another, just like people do in real life. You’ll involve other people and apply lots of tools to assist you as you go.

For the next 90 minutes, your job is to prepare to write this manual (not to write it, but to do the research you would need to write it well) by interacting with the product or feature. To reduce the risk that you’ll lose track of something important, you’ll probably find it a good idea to map out the product, take notes, make sketches, and so forth. At the end of 90 minutes, check in with your client. Present your findings so far and discuss them. If you have reason to believe that there’s still work to be done, identify what it is, and describe it to your client. If you didn’t do as thorough a job as you could have done, report that forthrightly (remember, you’re super-honest). If anything got in the way of your research or made it more difficult, highlight that; tell your client what you need or recommend. Then have a discussion with your client to agree on what you’ll do next.

Did you notice that I’ve just described testing without using the word “testing”?

On Scripting

Saturday, July 4th, 2015

A script, in the general sense, is something that constrains our actions in some way.

In common talk about testing, there’s one fairly specific and narrow sense of the word “script”—a formal sequence of steps that are intended to specify behaviour on the part of some agent—the tester, a program, or a tool. Let’s call that “formal scripting”. In Rapid Software Testing, we also talk about scripts as something more general, in the same kind of way that some psychologists might talk about “behavioural scripts”: things that direct, constrain, or program our behaviour in some way. Scripts of that nature might be formal or informal, explicit or tacit, and we might follow them consciously or unconsciously. Scripts shape the ways in which people behave, influencing what we might expect people to do in a scenario as the action plays out.

As James Bach says in the comments to our blog post Exploratory Testing 3.0, “By ‘script’ we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.” (my emphasis, there)

When I’m driving to a party out in the country, the list of directions that I got from the host scripts me. Many other things script me too. The starting time of the party—combined with cultural norms that establish whether I should be very prompt or fashionably late—prompts me to leave home at a certain time. The traffic laws and the local driving culture condition my behaviour and my interactions with other people on the road. The marked detour along the route scripts me, as do the weather and the driving conditions. My temperament and my current emotional state script me too. In this more general sense of “scripting”, any activity can become heavily scripted, even if it isn’t written down in a formal way.

Scripts are not universally bad things, of course. They often provide compelling advantages. Scripts can save cognitive effort; the more my behaviour is scripted, the less I have to think, do research, make choices, or get confused. In my driving example, a certain degree of scripting helps me to get where I’m going, to get along with other drivers, and to avoid certain kinds of trouble. Still, if I want to get to the party without harm to myself or other people, I must bring my own agency to the task and stay vigilant, present, and attentive, making conscious and intentional choices. Scripts might influence my choices, and may even help me make better choices, but they should not control me; I must remain in control. Following a script means giving up engagement and responsibility for that part of the action.

From time to time, testing might include formal testing—testing that must be done in a specific way, or to check specific facts. On those occasions, formal scripting—especially the kind of formal script followed by a machine—might be a reasonable approach enabling certain kinds of tasks and managing them successfully. A highly scripted approach could be helpful for rote activities like operating the product following explicitly declared steps and then checking for specific outputs. A highly scripted approach might also enable or extend certain kinds of variation—randomizing data, for example. But there are many other activities in testing: learning about the product, designing a test strategy, interviewing a domain expert, recognizing a new risk, investigating a bug—and dealing with problems in formally scripted activities. In those cases, variability and adaptation are essential, and an overly formal approach is likely to be damaging, time-consuming, or outright impossible. Here’s something else that is almost never formally scripted: the behaviour of normal people using software.

Notice on the one hand that formal testing is, by its nature, highly scripted; most of the time, scripting restricts or even prevents exploration by constraining variation. On the other hand, if you want to make really good decisions about what to test formally, how to test formally, and why to test formally, it helps enormously to learn about the product in unscripted and informal ways: conversation, experimentation, investigation… So excellent scripted testing and excellent checking are rooted in exploratory work. They begin with exploratory work and depend on exploratory work. To use language as Harry Collins might, scripted testing is parasitic on exploration.

We say that any testing worthy of the name is fundamentally exploratory. We say that to test a product means to evaluate it by learning about it through experimentation and exploration. To explore a product means to investigate it, to examine it, to create and travel over maps and models of it. Testing includes studying the product, modeling it, questioning it, making inferences about it, operating it, observing it. Testing includes reporting, which itself includes choosing what to report and how to contextualize it. We believe these activities cannot be encoded in explicit procedural scripting in the narrow sense that I mentioned earlier, even though they are all scripted to some degree in the more general sense. Excellent testing—excellent learning—requires us to think and to make choices, which includes thinking about what might be scripting us, and deciding whether to control those scripts or to be controlled by them. We must remain aware of the factors that are scripting us so that we can manage them, taking advantage of them when they help and resisting them when they interfere with our mission.

Exploratory Testing 3.0

Tuesday, March 17th, 2015

This blog post was co-authored by James Bach and me. In the unlikely event that you don’t already read James’ blog, I recommend you go there now.

The summary is that we are beginning the process of deprecating the term “exploratory testing”, and replacing it with, simply, “testing”. We’re happy to receive replies either here or on James’ site.

Oracles Are About Problems, Not Correctness

Thursday, March 12th, 2015

As James Bach and I have been refining our ideas of testing, we’ve been refining our ideas about oracles. In a recent post, I referred to this passage:

Program testing involves the execution of a program over sample test data followed by analysis of the output. Different kinds of test output can be generated. It may consist of final values of program output variables or of intermediate traces of selected variables. It may also consist of timing information, as in real time systems.

The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle. Test oracles can take on different forms. They can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions.

—William E. Howden, A Survey of Dynamic Analysis Methods, in Software Validation and Testing Techniques, IEEE Computer Society, 1981

While we have a great deal of respect for the work of testing pioneers like Prof. Howden, there are some problems with this description of testing and its focus on correctness.

  • Correct output from a computer program is not an absolute; an outcome is only correct or incorrect relative to some model, theory, or principle. Trivial example: Even the mathematical rule “one divided by two equals one-half” is a heuristic for dividing things. In most domains, it’s true, but as in George Carlin’s joke, when you cut a crumb in two, you don’t have two half-crumbs; you have two crumbs.
  • A product can produce a result that is functionally correct, and yet still be deeply unsatisfactory to its user. Trivial example: a calculator returns the value “4” from the function “2 + 2”—and displays the result in white on a white background.
  • Conversely, a product can produce an incorrect result and still be quite acceptable. Trivial example: a computer desktop clock’s internal state and second hand drift a few tenths of a second each second, but the program resets itself to be consistent with an atomic clock at the top of every minute. The desktop clock almost never shows the right time precisely, but the human observer doesn’t notice and doesn’t really care. Another trivial example: a product might return a calculation inconsistent with its oracle in the tenth decimal place, when only the first two or three decimal places really matter.
  • The correct outcome of a program or function is not always known in advance. Some development and testing work, like some science, is done in an attempt to discover something new; to establish what a correct answer might look like; to explore a mathematical model; to learn about the limitations of a novel system. In such cases, our ideas of correctness or acceptability are not clear from the outset, and must be developed. (See Collins and Pinch’s The Golem books, which discuss the messiness and confusion of controversial science.) Trivial example: in benchmarking, correctness is not at issue. Comparison between one system and another (or versions of the same system at different times) is the mission of testing here.
  • As we’re developing and testing a product, we may observe things that are unexpected, under-described or completely undescribed. In order to program a machine to make an observation, we must anticipate that observation and encode it. The machine doesn’t imagine, invent, or learn, and a machine cannot produce an unanticipated oracle in response to an observation. By contrast, human observers continually learn and refine their ideas on what to observe. Sometimes we observe a problem without having anticipated it. Sometimes we become aware that we’re making a new observation—one that may or may not represent a problem. Distinct from checking, testing continually affords new things to observe. Testing prompts us to decide when new observations represent problems, and testing informs decisions about what to do about them.
  • An oracle may be in error, or irrelevant. Trivial examples: a program that checks the output of another program may have its own bugs. A reference document may be outdated. A subject matter expert who is usually a reliable source of information may have forgotten something.
  • Oracles might be inconsistent with each other. Even though we have some powerful models for it, temperature measurement in climatology is inherently uncertain. What is the “correct” temperature outdoors? In the sunlight? In the shade? When the thermometer is near a building or farther away? Over grass, or over pavement? Some of the issues are described in this remarkable article (read the comments, too).
  • Although we can demonstrate incorrectness in a program, we cannot prove a program to be correct. As Dijkstra put it, testing can only show the presence of errors, not their absence; and to go even deeper, Popper pointed out that theories can only be falsified, and not proven. Trivial example: No matter how many tests we run on that calculator, we can never know that it will always return 4 given the inputs 2 + 2; we can only infer that it will do so through induction, and induction can be deeply problematic. In Nassim Taleb’s example (cribbed from Bertrand Russell and David Hume), every day the turkey uses induction to reinforce his belief in the farmer’s devotion to the desires and interests of turkeys—until a few days before Thanksgiving, when the turkey receives a very sudden, unpleasant, and (alas for the turkey) momentary flash of insight.
  • Sometimes we don’t need to know the correct result to know that the observed result is wrong (a tiny sketch of this idea follows the list). Trivial example: the range of the cosine function runs from -1 to 1. I don’t need to know the correct value for cos(72) to know that an output of 4.2 is wrong. (Elaine Weyuker discusses this in “On Testing Nontestable Programs” (Weyuker, Elaine, Department of Computer Science, Courant Institute of Mathematical Sciences, New York University): “Frequently the tester is able to state with assurance that a result is incorrect without actually knowing the correct answer.”)
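
Here’s the tiny sketch promised above, with an invented observation standing in for the product’s output; we need no reference value to flag a result that no correct answer could produce.

    import math

    def plainly_wrong(observed):
        """Flag a reported cosine value that no correct answer could have."""
        # No oracle of correctness needed: cosine output must lie within [-1, 1].
        return math.isnan(observed) or not -1.0 <= observed <= 1.0

    print(plainly_wrong(4.2))   # True: 4.2 cannot be the cosine of anything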

Checking for correctness—especially when the test output is observed and evaluated mechanically or indirectly—is a risky business. All oracles are fallible. A “passing” test, based on comparison with a fallible oracle, cannot prove correctness, and no number of “passing” tests can do that. In this, a test is like a scientific experiment: an experiment’s outcome can falsify one theory while supporting another, but an experiment cannot prove a theory to be true. A million observations of white swans says nothing about the possibility that there might be black swans; a million passing tests, a million observations of correct behaviour, cannot eliminate the possibility that there might be swarms of bugs. At best, a passing test is essentially the observation of one more white swan. We urge those who rely on passing acceptance tests to remember this.

A check can suggest the presence of a problem, or can at best provide support for the idea that the program can work. But no matter what oracle we might use, a test cannot prove that a program is working correctly, or that the program will work. So what can oracles actually do for us?

If we invert the focus on correctness, we can produce a more robust heuristic. We can’t logically use an oracle to prove that a system is behaving correctly or that it will behave correctly, but we can use an oracle to help falsify the theory that it is behaving correctly. This is why, in Rapid Software Testing, we say that an oracle is a means by which we recognize a problem when it happens during testing.

Give Us Back Our Testing

Saturday, February 14th, 2015

“Program testing involves the execution of a program over sample test data followed by analysis of the output. Different kinds of test output can be generated. It may consist of final values of program output variables or of intermediate traces of selected variables. It may also consist of timing information, as in real time systems.

“The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle. Test oracles can take on different forms. They can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions.”

—William E. Howden, A Survey of Dynamic Analysis Methods, in Software Validation and Testing Techniques, IEEE Computer Society, 1981

Once upon a time, computers were used solely for computation. Humans did most of the work that preceded or followed the computation, so the scope of a computer program was limited. In the earliest days, testing a program mostly involved checking to see if the computations were being performed correctly, and that the hardware was working properly before and after the computation.

Over time, designers and programmers became more ambitious and computers became more powerful, enabling more complex and less purely numerical tasks to be encoded and delegated to the machinery. Enormous memory and blinding speed largely replaced the physical work associated with storing, retrieving, revising, and transmitting records. Computers got smaller and became more powerful and protean, used not only by mathematicians but also by scientists, business people, specialists, consumers, and kids.

Software is now used for everything from productivity to communications, control systems, games, audio playback, video displays, thermostats… Yet many of the software development community’s ideas about testing haven’t kept up. In fact, in many ways, they’ve gone backwards.

Ask people in the software business to describe what testing means to them, and many will begin to talk about test cases, and about comparing a program’s output to some predicted or expected result. Yet outside of software development, “testing” has retained its many more expansive meanings.

A teenager tests his parents’ patience. When confronted with a mysterious ailment, doctors perform diagnostic tests (often using very sophisticated tools) with open expectations and results that must be interpreted. Writers in Cook’s Illustrated magazine test techniques for roasting a turkey, and report on the different outcomes that they obtain by varying factors—flavours, colours, moisture, textures, cooking methods, cooking times… The Mythbusters, says Wikipedia, “use elements of the scientific method to test the validity of rumors, myths, movie scenes, adages, Internet videos, and news stories.”

Notice that all of these things called “testing” are focused on exploration, investigation, discovery, and learning. Yet over the last several decades, Howden’s notions of testing as checking for correctness, and of an oracle as a mechanism (or an artifact), became accepted by many people in the development and testing communities at large. Whether or not people were explicitly aware of those notions, they certainly seem tacitly to have subscribed to the idea that testing should be focused on analysis of the output, displacing those broader and deeper meanings of testing.

That idea might have been more reasonable when computers did nothing but compute. Today, computers and their software are richly intertwined with daily social life and things that we value. Yet for many in software development, “testing” has this narrow, impoverished meaning, limited to what James Bach and I call checking. Checking is a tactic of testing; the part of testing that can be encoded as algorithms and that therefore can be performed entirely by machinery. It is analogous to compiling, the part of programming that can be performed algorithmically.

Oddly, since we started distinguishing between testing and checking, some people have claimed that we’re “redefining” testing. We disagree. We believe that we are recovering testing’s meaning, restoring it to its original, rich, investigative sense. Testing’s meaning was stolen; we’re stealing it back.

The Rapid Software Testing Namespace

Monday, February 2nd, 2015

Just as no one has the right to tell you what language to speak at home, nobody outside of your project has the authority to tell you how to speak inside your project. Every project develops its own namespace, so to speak, and its own formal or informal criteria for naming things inside it.

Rapid Software Testing is, among other things, a project in that sense. For years, James Bach and I have been developing labels for ideas and activities that we talk about in our work and in our classes. While we’re happy to adopt useful ideas and terms from other places, we have the sole authority (for now) to set the vocabulary formally within Rapid Software Testing (RST).

We don’t have the right to impose our vocabulary on anyone else. So what do we do when other people use a word to mean something different from what we mean by the same word?

We invoke “the RST namespace” when we talk about testing and checking, for example, so that we can speak clearly and efficiently about ideas that we bring up in our classes and in the practice of Rapid Software Testing. From time to time, we also try to make it clear why we use words in a specific way.

For example, we make a big deal about testing and checking. We define checking as “the process of making evaluations by applying algorithmic decision rules to specific observations of a product” (and a check is an instance of checking). We define testing as “the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.” (and a test is an instance of testing).

This is in contrast with the ISTQB, which in its Glossary defines “test” as “a set of test cases”—along with “test case” as “a set of input values, execution preconditions, expected results and execution postconditions, developed for a particular objective or test condition, such as to exercise a particular program path or to verify compliance with a specific requirement.”

Interesting, isn’t it: the ISTQB’s definition of test looks a lot like our definition of check. In Rapid Software Testing, we prefer to put learning and experimentation (rather than satisfying requirements and demonstrating fitness for purpose) at the centre of testing. We prefer to think of a test as something that people do as an act of investigation; as a performance, not as an artifact.

Because words convey meaning, we converse (and occasionally argue, sometimes passionately) about the value we see in the words we choose and the ways we think of them. Our aim is to describe things that people haven’t noticed, or to make certain distinctions clear, with the goal of reducing the risk that someone will misunderstand—or miss—something important.

Nonetheless, we freely acknowledge that we have no authority outside of Rapid Software Testing. There’s nothing to stop people from using the words we use in a different way; there are no language police in software development. So we’re also willing to agree to use other people’s labels for things when we’ve had the conversation about what those labels mean, and have come to agreement.

People who tout a “common language” often mean “my common language”, or “my namespace”. They also have the option to certify you as being able to pass a vocabulary test, if anyone thinks that’s important. We don’t.

We think that it’s important for people to notice when words are being used in different ways. We think it’s important for people to become polyglots—and that often means working out which namespace we might be using from one moment to the next.

In our future writing, conversation, classes, and other work, you might wonder what we’re talking about when we refer to “the RST namespace”. This post provides your answer.

Testing is…

Tuesday, October 28th, 2014

Every now and again, someone makes some statement about testing that I find highly questionable or indefensible, whereupon I might ask them what testing means to them. All too often, they’re at a loss to reply because they haven’t really thought deeply about the matter; or because they haven’t internalized what they’ve thought about; or because they’re unwilling to commit to any statement about testing. And then they say something vague or non-committal like “it depends” or “different things to different people” or “that’s a matter of context”, without suggesting relevant dependencies, people, or context factors.

So, for those people, I offer a set of answers from which they can choose one; or they can adopt the entire list wholesale; or they can use one or more items as a point of departure for something of their own invention. You don’t have to agree with any of these things; in that case, invent your own ideas about testing from whole cloth. But please: if you claim to be a tester, or if you are making some claim about testing, please prepare yourself and have some answer ready when someone asks you “what is testing?”. Please.

Here are some possible replies; I believe everything is Tweetable, or pretty close.

  • Testing is—among other things—reviewing the product and ideas and descriptions of it, looking for significant and relevant inconsistencies.
  • Testing is—among other things—experimenting with the product to find out how it may be having problems—which is not “breaking the product”, by the way.
  • Testing is—among other things—something that informs quality assurance, but is not in and of itself quality assurance.
  • Testing is—among other things—helping our clients to make empirically informed decisions about the product, project, or business.
  • Testing is—among other things—a process by which we systematically examine any aspect of the product with the goal of preventing surprises.
  • Testing is—among other things—a process of interacting with the product and its systems in many ways that challenge unwarranted optimism.
  • Testing is—among other things—observing and evaluating the product, to see where all those defect prevention ideas might have failed.
  • Testing is—among other things—a special part of the development process focused on discovering what could go badly (or what is going badly).
  • Testing is—among other things—exploring, discovering, investigating, learning, and reporting about the product to reveal new information.
  • Testing is—among other things—gathering information about the product, its users, and conditions of its use, to help defend value.
  • Testing is—among other things—raising questions to help teams to develop products that more quickly and easily reveal their own problems.
  • Testing is—among other things—helping programmers and the team to learn about unanticipated aspects of the product we’re developing.
  • Testing is—among other things—helping our clients to understand the product they’ve got so they can decide if it’s the product they want.
  • Testing is—among other things—using both tools and direct interaction with the product to question and evaluate its behaviours and states.
  • Testing is—among other things—exploring products deeply, imaginatively, and suspiciously, to help find problems that threaten value.
  • Testing is—among other things—performing actual and thought experiments on products and ideas to identify problems and risks.
  • Testing is—among other things—thinking critically and skeptically about products and ideas around them, with the goal of not being fooled.
  • Testing is—among other things—evaluating a product by learning about it through exploration, experimentation, observation and inference.

You’re welcome.