Blog Posts for the ‘Coverage’ Category

Exploratory Testing on an API? (Part 4)

Wednesday, November 21st, 2018

As promised (at last!), here are some follow-up notes on previous installments in the series that starts here.

Let’s revisit the original question:

Do you perform any exploratory testing on APIs? How do you do it?

To review: there’s a problem with the question. Asking about “exploratory testing” is a little like asking about “vegetarian cauliflower”, “carbon-based human beings”, or “metallic copper”. Testing is fundamentally exploratory. Testing is an attempt by a self-guided agent to discover something unknown; to learn something about the product, with a special focus on finding problems and revealing risk.

One way to learn about the product is to develop and run a set of automated checks that access the product via the API. That is, a person writes code for the machine to operate and observe the product, apply decision rules, and report the outputs. This produces a script, but it’s not a scripted process. Learning about the product, designing the checks, and developing the code to implement those checks are all exploratory processes.
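
To make that concrete, here’s a minimal sketch, in Python, of the kind of check a person might write. The endpoint, the response fields, and the decision rules are all assumptions for illustration, not a recipe.

    # A minimal sketch of an automated check applied via an API. The endpoint,
    # the expected fields, and the decision rules are hypothetical; choosing
    # them is the human, exploratory part of the work.
    import requests

    def check_item_lookup():
        # Operate the product: ask the API for a known record.
        response = requests.get("https://example.test/api/items/42", timeout=10)

        # Observe the product and apply decision rules chosen by a person.
        problems = []
        if response.status_code != 200:
            problems.append(f"expected HTTP 200, got {response.status_code}")
        else:
            body = response.json()
            if body.get("id") != 42:
                problems.append(f"expected id 42, got {body.get('id')!r}")

        # Report an output; a human still has to interpret it.
        status = "failed" if problems else "passed"
        return status, problems

    if __name__ == "__main__":
        print(check_item_lookup())

The machine can run this check over and over; the learning happens when a person decides what to check, and then interprets and investigates whatever the check reports.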

When we use a machine to perform automated checks, there’s no discovery or learning involved in the performance of the check itself. Checks are like the system on your car that monitors certain aspects of your engine and illuminates the “check engine” light. Checks are, in essence, tools by which people become aware of specific conditions in the product. The machine learns no more than the dashboard light learns. It’s the humans, aided by tools that might include checking, who learn.

People learn as they interpret outcomes of the checks and investigate problems that the checks seem to indicate. The machinery has no way of knowing whether a reported “failed” output represents a problem in the product or a problem in the check. Checks don’t know the difference between “changed” and “broken”, between “different” and “fixed”, between “good” and “bad”. The machinery also has no way of knowing whether a reported “passed” output really means that the product is trouble-free; a problem in the check could be masking a problem in the product.

Testing via an API is exploratory testing via an API, because exploratory testing is, simply, testing. (Exploratory) testing is not simply “acting like a user”, “testing without tools”, or “a manual testing technique”.

Throughout this whole series, what I’ve been doing is not “manual testing”, and it’s not “automated testing” either. I use tools to get access to the product via the API, but that’s not automated testing. There is no automated testing. Testing is neither manual nor automated. No one talks about “automated” or “manual” experiments. No one talks about “manual” or “automated” research. Testing is done neither by the hands, nor by the machinery, but by minds.

Tools do play an important role in testing. We use tools to extend, enhance, enable, accelerate, and intensify the testing we do. Tools play an important role in programming too, but no one refers to “manual programming”. No one calls compiling “automated programming”. Compiling is something that a machine can do; it is a translation between one set of strings (source code) and another set of strings (machine code). This is not to dismiss the role of the compiler, or of compiler writers; indeed, writing a sophisticated compiler is a job for an advanced programmer.

Programming starts in the complex, imprecise, social world of humans. Designers and programmers refine messy human communication into a form that’s so orderly that a brainless machine can follow the necessary instructions and perform the intended tasks. Throughout the development process, testers explore and experiment with the product to find problems, to help the development team and the business to decide whether the product they’ve got is the product they want. Tools can help, but none of these processes can be automated.

In the testing work that I’ve described in the previous posts, I haven’t been “testing like a user”. Who uses APIs? It might be tempting to answer “application programs” (it’s an application programming interface, after all), or “machines”. But the real users of an API are human beings. These include the direct users—the various developers who write code to take advantage of what a product offers—and the indirect users—the people who use the products that programmers develop. For sure, some of my testing has been informed by ideas about actions of users of an API. That’s part of testing like a tester.

In several important ways, APIs afford a lot of opportunity for testability. Very generally, components and services with APIs tend to be of a smaller scale than entire applications, so studying and understanding them can be much more tractable. An API is deterministic and machine-specific. That means that certain kinds of risks due to human variability are of lower concern than they might be through a GUI, where all kinds of things can happen at any time.

The API is by definition a programming interface, so it’s natural to use that interface for automated checking. You can use validator checks to detect problems with the syntax of the output, or parallel algorithms to check the semantics of transactions through an API.
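
Here’s a sketch of both ideas, assuming a hypothetical JSON response for an invoice: a validator check (a JSON Schema) for the syntax of the output, and a parallel, independently coded calculation for the semantics of the transaction.

    # A sketch of two kinds of automated checks through an API; the response
    # shape ("lines" and "total") is a hypothetical example.
    import jsonschema

    INVOICE_SCHEMA = {
        "type": "object",
        "required": ["lines", "total"],
        "properties": {
            "lines": {
                "type": "array",
                "items": {"type": "object",
                          "required": ["quantity", "unit_price"]},
            },
            "total": {"type": "number"},
        },
    }

    def check_invoice(response_body):
        problems = []
        try:
            # Validator check: is the syntax of the output well-formed?
            jsonschema.validate(instance=response_body, schema=INVOICE_SCHEMA)
        except jsonschema.ValidationError as error:
            return [f"malformed response: {error.message}"]

        # Parallel algorithm: compute the expected total independently,
        # and compare it with what the product reported.
        expected_total = sum(line["quantity"] * line["unit_price"]
                             for line in response_body["lines"])
        if abs(expected_total - response_body["total"]) > 0.005:
            problems.append(f"total {response_body['total']} differs "
                            f"from expected {expected_total}")
        return problems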

Once they’re written, it’s easy to repeat such checks, especially to detect regressions, but be careful. In Rapid Software Testing, regression testing isn’t simply repetition of checks. To us, regression testing means testing focused on risk related to change; “going backwards” (which is what “regress” means; the opposite of “progress”).

A good regression testing strategy is focused on what has changed, how it has changed, and what might be affected. That would involve understanding what has changed; testing the change itself; exploring around that; and a smattering of testing of stuff that should be unaffected by the change (to reveal hidden or misunderstood risk). This applies whether you are testing via the API or not; whether you have a set of automated checks or not; whether you run checks continuously or not.

If you are using automated checks, remember that they can help to detect unanticipated variations from specified results, but they don’t show that everything works, and they don’t show that nothing has broken. Instead, checks verify that the outputs from given functions are consistent from one build to the next. Do not simply confirm that everything is OK; actively search for problems. Explore around. Are all the checks passing? Ask “What else could go wrong?”

Automated checks can take on special relevance when they’re in the form of contract tests (or rather, contract checks). The idea here is to solicit checks from actual consumers of an API that represent specified, desired results, and to check the contract from both the supplier and consumer ends. Nonetheless, remember that such checks are heavily focused on confirmation, and not on discovery of problems and risks that aren’t covered by the contracts.
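
For illustration, a minimal, hand-rolled sketch of such a contract check might look like this; real contract-testing tools (Pact, for instance) do considerably more, and the endpoint and fields here are invented.

    # A sketch of a consumer-driven contract check. The "contract" is an
    # expectation contributed by a consuming team; the provider side replays
    # it against the real service. Endpoint and fields are hypothetical.
    import requests

    CONTRACT = {
        "request": {"method": "GET", "path": "/api/items/42"},
        "response": {"status": 200, "required_fields": ["id", "name", "price"]},
    }

    def verify_provider(base_url, contract):
        problems = []
        expected = contract["response"]
        response = requests.request(contract["request"]["method"],
                                    base_url + contract["request"]["path"],
                                    timeout=10)
        if response.status_code != expected["status"]:
            return [f"status {response.status_code}, expected {expected['status']}"]
        body = response.json()
        missing = [field for field in expected["required_fields"] if field not in body]
        if missing:
            problems.append(f"fields the consumer depends on are missing: {missing}")
        return problems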

On the other hand, now that you’ve gone to the trouble of writing code to check for specific outputs, why stop there? I’ve used checks in an exploratory way (a sketch of the first two ideas appears after the list) by:

  • varying the input systematically to look for problems related to missing or malformed data, extreme values, messed-up character handling, and other foreseeable data-related bugs;
  • varying the input more randomly (“fuzzing” is one instance of this technique), to help discover surprising hidden boundaries or potential security vulnerabilities;
  • varying the order and sequences of input, to look for subtle state-related bugs;
  • writing routines to stress the product, pumping lots of transactions and lots of data through it, to find performance-related bugs;
  • capturing data (like particular values) or metadata (like transaction times) associated with the checks, visualizing it, and analyzing it, to see problems and understand risks in new ways.
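
As promised above, here’s a sketch of the first two ideas: reusing a check while varying the input, first systematically over foreseeable troublesome values, and then randomly. The endpoint, the payload, and the notion of a “surprise” are assumptions for illustration.

    # A sketch of varying input through an API check, systematically and then
    # randomly; the endpoint, payload, and "surprise" heuristic are hypothetical.
    import random
    import string
    import requests

    SYSTEMATIC_VALUES = [
        "",                                  # missing data
        " ",                                 # whitespace only
        "a" * 10_000,                        # extreme length
        "Ω≈ç√∫ 漢字",                        # character-handling trouble
        "<script>alert(1)</script>",         # markup injection
        "Robert'); DROP TABLE items;--",     # SQL-ish injection
        None,                                # malformed payload
    ]

    def random_value(max_length=64):
        alphabet = string.printable + "äéîøü€漢字"
        return "".join(random.choice(alphabet)
                       for _ in range(random.randint(0, max_length)))

    def probe(value):
        # Send one request; flag anything surprising for a human to investigate.
        response = requests.post("https://example.test/api/items",
                                 json={"name": value}, timeout=10)
        if response.status_code >= 500:
            print(f"server error {response.status_code} for input {value!r}")

    for value in SYSTEMATIC_VALUES:
        probe(value)
    for _ in range(1000):        # a simple fuzzing pass
        probe(random_value())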

A while back, Peter Houghton told me about an elegant example of using checking in exploration. Given an API to a component, he produces a simple script that calls the same function thousands of times in a loop and benchmarks the time that the process takes. Periodically he re-runs the script and compares the timing to the first run. If he sees a significant change in the timing, he investigates. About half the time, he says, he finds a bug.
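
I don’t know what Peter’s script actually looks like, but here’s a sketch of the idea in Python, with a hypothetical component function; the call count, the baseline, and the threshold for “significant” are arbitrary choices.

    # A sketch of the benchmarking idea: call the same function many times,
    # time it, and compare with an earlier baseline. The component, function,
    # baseline, and threshold are all hypothetical.
    import time
    from my_component import lookup_price   # hypothetical API under test

    BASELINE_SECONDS = 4.2                   # recorded on the first run

    def benchmark(calls=10_000):
        start = time.perf_counter()
        for i in range(calls):
            lookup_price(item_id=i % 100)
        return time.perf_counter() - start

    elapsed = benchmark()
    if abs(elapsed - BASELINE_SECONDS) / BASELINE_SECONDS > 0.25:
        print(f"timing changed: {elapsed:.1f}s now vs {BASELINE_SECONDS:.1f}s "
              f"at baseline; worth investigating")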

So, to sum up: all testing is exploratory. Exploration is aided by tools, and automated checking is an approach to using tools. Investigation of the unknown and discovery of new knowledge are of the essence of exploration. We must explore to find bugs. All testing on APIs is exploratory.

Very Short Blog Posts (35): Make Things Visible

Tuesday, April 24th, 2018

I hear a lot from testers who discover problems late in development, and who get grief for bringing them up.

On one level, the complaints are baseless, like holding an investigative journalist responsible for a corrupt government. On another level, there’s a way for testers to anticipate bad news and reduce the surprises. Try producing a product coverage outline and a risk list.

A product coverage outline is an artifact (a mind map, or list, or table) that identifies factors, dimensions, or elements of a product that might be relevant to testing it. Those factors might include the product’s structure, the functions it performs, the data it processes, the interfaces it provides, the platforms upon which it depends, the operations that people might perform with it, and the way the product is affected by time. (Look here for more detail.) Sketches or diagrams can help too.

As you learn more through deeper testing, add branches to the map, or create more detailed maps of particular areas. Highlight areas that have been tested so far. Use colour to indicate the testing effort that has been applied to each area—and where coverage is shallower.

A risk list is a list of bad things that might happen: Some person(s) will experience a problem with respect to something desirable that can be detected in some set of conditions because of a vulnerability in the system. Generate ideas on that, rank them, and list them.

At the beginning of the project, or as early as possible, post your coverage outline and risk list in places where people will see and read them. Update them daily. Invite questions and conversations. This can help you change “why didn’t you find that bug?” to “why didn’t we find that bug?”

Finding the Happy Path

Wednesday, February 28th, 2018

In response to yesterday’s post on The Happy Path, colleague and friend Albert Gareev raises an important issue:

Until we sufficiently learned about the users, the product, and the environment, we have no idea what usage pattern is a “happy path” and what would be the “edge cases”.

I agree with Albert. (See more of what he has to say here.) This points to a kind of paradox in testing and development: some say that we can’t test the product unless we know what the requirements are—yet we don’t know what many of the requirements are until we’ve tested! Testing helps to reduce ambiguity, uncertainty, and confusion about the product and about its requirements—and yet we don’t know how to test until we’ve tried to test!

Here’s how I might address Albert’s point:

To test a whole product or system means more than demonstrating that it can work, based on the most common or optimistic patterns of its use. We might start testing the whole system there, but if we wanted to develop a comprehensive understanding of it, we wouldn’t stop at that.

On the other hand, the whole system consists of lots of sub-systems, elements, components, and interactions with other things. Each of those can be seen as a system in itself, and studying those systems contributes to our understanding of the larger system.

We build systems, and we build ideas on how to test them. At each step, considering only the most capable, attentive, well-trained users; preparing only the most common or favourable environments; imagining only the most optimistic scenarios; performing only happy-path testing on each part of the product as we build it; all of these present the risk of misunderstanding not only the product but also the happy paths and edge cases for the greater system. If we want to do excellent testing, all of these things—and our understanding of them—must not only be demonstrated, but must be tested as well. This means we must do more than creating a bunch of high-level, automated, confirmatory checks at the beginning of the sprint, and then declaring victory when they all “pass”.

Quality above depends on quality below; excellent testing above depends on excellent testing below. It’s testing all the way down—and all the way up, too.

Drop the Crutches

Thursday, January 5th, 2017

This post is adapted from a recent blast of tweets. You may find answers to some of your questions in the links; as usual, questions and comments are welcome.

Update, 2017-01-07: In response to a couple of people asking, here’s how I’m thinking of “test case” for the purposes of this post: Test cases are formally structured, specific, proceduralized, explicit, documented, and largely confirmatory test ideas. And, often, excessively so. My concern here is directly proportional to the degree to which a given test case or a given test strategy emphasizes these things.


I had a fun chat with a client/colleague yesterday. He proposed—and I agreed—that test cases are like crutches. I added that the crutches are regularly foisted on people who weren’t limping to start with. It’s as though before the soccer game begins, we hand all the players a crutch. The crutches then hobble them.

We also agreed that test cases often lead to goal displacement. Instead of a thorough investigation of the product, the goal morphs into “finish the test cases!” Managers are inclined to ask “How’s the testing going?” But they usually don’t mean that. Instead, they almost certainly mean “How’s the product doing?” But, it seems to me, testers often interpret “How’s the testing going?” as “Are you done those test cases?”, which ramps up the goal displacement.

Of course, “How’s the testing going?” is an important part of the three-part testing story, especially if problems in the product or project are preventing us from learning more deeply about the product. But most of the time, that’s probably not the part of story we want to lead with. In my experience, both as a program manager and as a tester, managers want to know one thing above all:

Are there problems that threaten the on-time, successful completion of the project?

The most successful and respected testers—in my experience—are the ones that answer that question by actively investigating the product and telling the story of what they’ve found. The testers that overfocus on test cases distract themselves AND their teams and managers from that investigation, and from the problems investigation would reveal.

For a tester, there’s nothing wrong with checking quickly to see that the product can do something—but there’s not much right—or interesting—about it either. Checking seems to me to be a reasonably good thing to work into your programming practice; checks can be excellent alerts to unwanted low-level changes. But demonstration—showing that the product can work—is different from testing—investigating and experimenting to find out how it does (or doesn’t) work in a variety of circumstances and conditions.

Sometimes people object, saying that they have to confirm that the product works and that they don’t have time to investigate. To me, that’s getting things backwards. If you actively, vigorously look for problems and don’t find them, you’ll get that confirmation you crave, as a happy side effect.

No matter what, you must prepare yourself to realize this:

Nobody can be relied upon to anticipate all of the problems that can beset a non-trivial product.

We call it “development” for a reason. The product and everything around it, including the requirements and the test strategy, do not arrive fully-formed. We continuously refine what we know about the product, and how to test it, and what the requirements really are, and all of those things feed back into each other. Things are revealed to us as we go, not as a cascade of boxes on a process diagram, but more like a fractal.

The idea that we could know entirely what the requirements are before we’ve discussed and decided we’re done seems like total hubris to me. We humans have a poor track record in understanding and expressing exactly what we want. We’re no better at predicting the future. Deciding today what will make us happy ten months—or even days—from now combines both of those weaknesses and multiplies them.

For that reason, it seems to me that any hard or overly specific “Definition of Done” is antithetical to real agility. Let’s embrace unpredictability, learning, and change, and treat “Definition of Done” as a very unreliable heuristic. Better yet, consider a Definition of Not Done Yet: “we’re probably not done until at least These Things are done”. The “at least” part of DoNDY affords the possibility that we may recognize or discover important requirements along the way. And who knows?—we may at any time decide that we’re okay with dropping something from our DoNDY too. Maybe the only thing we can really depend upon is The Unsettling Rule.

Test cases—almost always prepared in advance of an actual test—are highly vulnerable to a constantly shifting landscape. They get old. And they pile up. There usually isn’t a lot of time to revisit them. But there’s typically little need to revisit many of them either. Many test cases lose relevance as the product changes or as it stabilizes.

Many people seem prone to say “We have to run a bunch of old test cases because we don’t know how changes to the code are affecting our product!” If you have lost your capacity to comprehend the product, why believe that you still comprehend those test cases? Why believe that they’re still relevant?

Therefore: just as you (appropriately) remain skeptical about the product, remain skeptical of your test ideas—especially test cases. Since requirements, products, and test ideas are subject to both gradual and explosive change, don’t overformalize or otherwise constrain your testing to stuff that you’ve already anticipated. You WILL learn as you go.

Instead of overfocusing on test cases and worrying about completing them, focus on risk. Ask “How might some person suffer loss, harm, annoyance, or diminished value?” Then learn about the product, the technologies, and the people around it. Map those things out. Don’t feel obliged to be overly or prematurely specific; recognize that your map won’t perfectly match the territory, and that that’s okay—and it might even be a Good Thing. Seek coverage of risks and interesting conditions. Design your test ideas and prepare to test in ways that allow you to embrace change and adapt to it. Explain what you’ve learned.

Do all that, and you’ll find yourself throwing away the crutches that you never needed anyway. You’ll provide a more valuable service to your client and to your team. You and your testing will remain relevant.

Happy New Year.

Further reading:

Testing By Percentages
Very Short Blog Posts (11): Passing Test Cases
A Test is a Performance
“Test Cases Are Not Testing: Toward a Culture of Test Performance” by James Bach & Aaron Hodder (in http://www.testingcircus.com/documents/TestingTrapeze-2014-February.pdf#page=31)

Is There a Simple Coverage Metric?

Tuesday, April 26th, 2016

In response to my recent blog post, 100% Coverage is Possible, reader Hema Khurana asked:

“Also some measure is required otherwise we wouldn’t know about the depth of coverage. Any straight measures available?”

I replied, “I don’t know what you mean by a ‘straight’ measure. Can you explain what you mean by that?”

Hema responded: “I meant a metric some X/Y.”

In all honesty, it’s sometimes hard to remain patient when this question seems to come up at every conference, in every class, week upon week, year upon year. Asking me about this is a little like asking Chris Hadfield—since he’s a well-known astronaut and a pretty smart guy—if he could provide a way of measuring the area of the flat, rectangular earth. But Hema hasn’t asked me before, and we’ve never met, so I don’t want to be immediately dismissive.

My answer, my fast answer, is No. One key problem here is related to what Y could possibly represent. What counts? Maybe we could talk about Y in terms of a number of test cases, and X as how many of those test cases we’ve executed so far. If Y is 600 and X is 540, we could say that testing is 90% done. But that ignores at least two fundamental problems.

The first problem is that, irrespective of the number of test cases we have, we could choose to add more at any time as (via testing) we discover different conditions that we would like to evaluate. Or maybe we could choose to drop test cases when we realize that they’re out of date or irrelevant or erroneous. That is, unless we decide to ignore what we’ve learned, Y will, quite appropriately, change over time.

The second problem is that—at least in my view, and in the view of my colleagues—test cases are a ludicrous way to think about testing.

Another almost-as-quick answer would be to encourage people to re-read that 100% Coverage is Possible post (and the Further Reading links), and to keep re-reading until they get it.

But that’s probably not very encouraging to someone who is asking a naive question, and I’d like to be more helpful than that.

Here’s one thing we could do, if someone were desperate for numbers that summarize coverage: we could make a qualitative evaluation of coverage, and put numbers (or letters, or symbols) on a scale that is nominal and very weakly ordinal.

Our qualitative evaluation would be rooted in analysis of many dimensions of coverage. The Product Elements and Quality Criteria sections of the Heuristic Test Strategy Model provide a framework for generating coverage ideas or for reviewing our coverage retrospectively. We would review and discuss how much testing we’ve done of specific features, or particular functional areas, or perceived risks, and summarize our evaluation using a simple scale that would go something like this:

Level 0 (or X, or an empty circle, or…): We know nothing at all about this area of the product.

Level 1 (or C, or a glassy-eyed emoticon, or…): We have done a very cursory evaluation of this area. Smoke- or sanity-level; we’ve visited this feature and had a brief look at it, but we don’t really know very much about it; we haven’t probed it in any real depth.

Level 2 (or B, or a normal-looking emoticon, or…): We’ve had a reasonable look at this area, although we haven’t gone all the way deep. We’ve examined the common, the core, the critical, the happy paths, the handling of everyday errors or exceptions. We’re pretty familiar with this area. We’ve done the kind of testing that would expose some significant bugs, if they were there.

Level 3 (or A, or a determined-looking angel emoticon, or…): We’ve really kicked this area harshly and hard. We’ve looked at unusual and complex conditions or states. We’ve probed deeply for subtle or hidden bugs. We’ve exposed the product to the extreme, the exceptional, the rare, the improbable. We’ve looked for bugs that are deep in the corners or hidden in the dark. If there were a serious bug, we’re pretty sure we would have found it by now.

Strictly speaking, these numbers are placed on an ordinal scale, in the sense that Level 3 coverage is deeper than Level 2, which is deeper than Level 1. (If you don’t know about scales of measurement, you should learn about them before providing or asking for metrics. And there are some other things to look at.) The numbers are certainly not an interval scale, or a ratio scale. They may not be commensurate from one feature area to the next; that is, they may represent different notions of coverage, different amounts of effort, different modes of evaluation. By design, these numbers should not be treated as valid measurements, and we should make sure that everyone on the project knows it. They are little labels that summarize evaluations and product elements and effort, factors that must be discussed to be understood. But those discussions can lead to understanding and consensus between ourselves, our colleagues, and our clients.
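
To make that concrete, here’s a sketch of how such a summary might be recorded and shared. The feature areas, levels, and notes are invented, and the numbers remain labels for conversation, not measurements.

    # A sketch of a qualitative coverage summary using the weakly ordinal
    # levels described above; areas, levels, and notes are invented examples.
    coverage_summary = {
        "login and authentication": {"level": 3, "notes": "probed lockout, expiry, concurrent sessions"},
        "report generation":        {"level": 2, "notes": "common formats; no very large data sets yet"},
        "import from legacy CSV":   {"level": 1, "notes": "one happy-path file only"},
        "audit log":                {"level": 0, "notes": "not examined at all"},
    }

    for area, record in coverage_summary.items():
        print(f"{area:26} Level {record['level']}: {record['notes']}")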

100% Coverage is Possible

Saturday, April 16th, 2016

In testing, what does “100% coverage” mean? 100% of what, specifically?

Some people might say that “100% coverage” could refer to lines of code, or branches within the code, or the conditions associated with the branches. That’s fine, but saying “100% of the lines (or branches, or conditions) in the program were executed” doesn’t tell us anything about whether those lines were good or bad, useful or useless.

“100% code coverage” doesn’t tell us anything about what the programmers intended, what the user desired, or what the tester observed. It says nothing about the tester’s engagement with the testing; whether the tester was asleep or awake. It ignores the oracles that the tester applied; how the tester recognized—or failed to recognize—bugs and other problems that were encountered during the testing. It suggests that some machinery processed something; nothing more.

Here’s a potentially helpful way to think about this:

“X coverage is how thoroughly we have examined the product with respect to some model of X”.

So: risk coverage is how thoroughly we have examined the product with respect to some model of risk; requirements coverage is how thoroughly we have examined the product with respect to some model of requirements; code coverage is how thoroughly we have examined the product with respect to some model of code.

To claim 100% coverage is essentially the same as saying “We’ve looked for bugs everywhere!” For a skilled tester, any “100%” claim about coverage should prompt critical thinking: “How much” compared to what? 100% of what, specifically? Some model of X—which one? Whose model? How well does the “model of X” model reality? What does the model of X leave out of the universe of possible ways of thinking about X? And what non-X things should we also be considering when we’re testing?

Here’s just one example: code coverage is usually described in terms of the code that we’ve written, or that we have available to evaluate. Yet every program we write interacts with some platform that might include third-party libraries, browsers, plug-ins, operating systems, file systems, firmware. Our code might interact with our own libraries that we haven’t instrumented this time. So “code coverage” refers to some code in the system, but not all the code in the system.

Once I did a test (or was it 10,000 tests?) wherein I used an automated check to run through all 10,000 possible settings of a particular variable. That was 100% coverage of that variable being used in a particular moment in the execution of the system, on that day.
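
For illustration, the shape of that check might have been something like the following sketch; the product interface and the setting name are hypothetical stand-ins.

    # A sketch of exhaustively exercising every possible setting of a single
    # variable, once each. The interface, the setting name, and the result
    # object (with .ok and .message) are hypothetical.
    from product_api import apply_setting    # hypothetical interface to the product

    surprises = []
    for value in range(10_000):
        result = apply_setting("render_quality", value)
        if not result.ok:
            surprises.append((value, result.message))

    print(f"exercised all 10,000 settings once; {len(surprises)} surprises to investigate")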

But it was not 100% of all the possible sequences of those settings, nor 100% of the possible subsequent paths through the product. It wasn’t 100% of the possible variations in pacing, or system load, or times of day when the system could be used. That test wasn’t representative of all of the possible stakeholders who might be using that variable, nor how they might use it.

What would “100% requirements coverage” mean? Would it mean that every statement in the requirements document was covered by a test? If you think so, it might be worthwhile to consider all the models that are in play.

The requirements document is a model of the product’s requirements. It refers to ideas that have been explicitly expressed by some people, but not by all of the people who might have requirements for the product. The requirements document models what those people thought they wanted at a certain point, but not necessarily what they want now.
The requirements document doesn’t account for all of the ideas that people had that may have been tacit, or implicit, or latent.

You can subject “statement”, “covered”, and “test” to the same kind of treatment. A statement is a model of what someone is thinking at a given point in time; our notion of what “covered” means is governed by our models of coverage; our notion of “a test” is conditioned by our models of testing. It’s models all the way down.

Things in testing keep reminding me of a passage from Computer Programming Fundamentals by Herbert Leeds and Jerry Weinberg:

“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.”

Testing is an open investigation. 100% coverage of a particular factor may be possible—but that requires a model so constrained that we leave out practically everything else that might be important.

Test coverage, like quality, is not something that yields very well to quantitative measurements, except when we’re talking of very narrow and specific conditions. But we can discuss coverage, and ask questions about whether it’s what we want, whether we’re happy with it, or whether we want more.

Further reading:

Got You Covered http://developsense.com/articles/2008-09-GotYouCovered.pdf
Cover or Discover http://developsense.com/articles/2008-10-CoverOrDiscover.pdf
A Map by Any Other Name http://developsense.com/articles/2008-11-AMapByAnyOtherName.pdf
What Counts http://www.developsense.com/articles/2007-11-WhatCounts.pdf

Very Short Blog Posts (26): You Don’t Need Acceptance Criteria to Test

Tuesday, February 24th, 2015

You do not need acceptance criteria to test.

Reporters do not need acceptance criteria to investigate and report stories; scientists do not need acceptance criteria to study and learn about things; and you do not need acceptance criteria to explore something, to experiment with it, to learn about it, or to provide a description of it.

You could use explicit acceptance criteria as a focusing heuristic, to help direct your attention toward specific things that matter to your clients; that’s fine. You might choose to use explicit acceptance criteria as claims, oracles that help you to recognize a problem that happens as you test; that’s fine too. But there are many other ways to identify problems; quality criteria may be tacit, not explicit; and you may discover many problems that explicit acceptance criteria don’t cover.

You don’t need acceptance criteria to decide whether something is acceptable or unacceptable. As a tester you don’t have decision-making authority over acceptability anyway. You might use acceptance criteria to inform your testing, and to identify threats to the value of the product. But you don’t need acceptance criteria to test.

How Models Change

Saturday, July 19th, 2014

Like software products, models change as we test them, gain experience with them, find bugs in them, realize that features are missing. We see opportunities for improving them, and revise them.

A product coverage outline, in Rapid Testing parlance, is an artifact (a map, or list, or table…) that identifies the dimensions or elements of a product. It’s a kind of inventory of aspects of the product that could be tested. Many years ago, my colleague and co-author James Bach wrote an article on product elements, identifying Structure, Function, Data, Platform, and Operations (SFDPO; think “San Francisco DePOt”, he suggested) as a set of heuristic guidewords for creating or structuring or reviewing the highest levels of a coverage outline.

A few years later, I was working as a tester. While I was on that assignment, I missed a few test ideas and almost missed a few bugs that I might have noticed earlier had I thought of “Time” as another guideword for modeling the product. After some discussion, I persuaded James that Time was a worthy addition to the Product Elements list. (I wrote my own article on that: Time for New Test Ideas.)

Over the years, it seemed that people were excited by the idea of using SFDPOT as the starting point for a general coverage outline. Many people reported getting a lot of value out of it, so in my classes, I’ve placed more and more emphasis on using and practicing the application of that part of the Heuristic Test Strategy Model. One of the exercises involves creating a mind map for a real software product. I typically offer that one way to get started on creating a coverage outline is to walk through the user interface and enumerate each element of the UI in the mind map.

(Sometimes people ask, “Why bother? Don’t the specifications or the documentation or the Help file provide maps of the UI? What’s the point of making another one?” One answer is that the journey, rather than the map, is the point. We learn one set of things by reading about a product; we learn different things—and we typically learn more deeply—by touring the product, interacting with it, gaining experience with it, and organizing descriptions of what we’ve found. Moreover, at each moment, we may notice, infer, or wonder about things that the documentation doesn’t address. When we recognize something new, we can add it to our coverage model, our risk list, or our test ideas—plus we might recognize and note some bugs or issues along the way. Another answer is that we should treat anything that any documentation says about a product as a rumour until we’ve engaged with the product.)

One issue kept coming up in class: on the product coverage outline, where should the map of the user interface go? Under Functions (what the product does)? Or Operations (how people use the product)? Or Structure (the bits and pieces of the product)? My answer was that it doesn’t matter much where you put things on your coverage outline, as long as it fits for you and the people with whom you might be sharing the map. The idea is to identify things that could be tested, and not to miss important stuff.

After one class, I was on the phone with James, and I happened to mention that day’s discussion. “I prefer to put the UI under Structure,” I noted.

“What? That’s crazy talk! The UI goes under Functions!”

“What?” I replied. “That’s crazy talk. The UI isn’t Functions. Sure, it triggers functions. But it doesn’t perform those functions.”

“So what?” asked James. “If it’s how the user gets at functions, it fits under Functions just fine. What makes you think the UI goes under Structure?”

“Well, the UI has a structure. It’s… structural.”

“Everything has a structure,” said James. “The UI goes under Functions.”

And so we argued on. Then one of us—and I honestly don’t remember who—suggested that maybe the UI was important enough to be its own top-level product element. I do remember James pointing out that when we think of interfaces, plural, there might be several of them—not just the graphical user interface, but maybe a command-line interface. An application programming interface.

“Hmmm…,” I said. This reminded me of the four-user model mentioned in How to Break Software (human user, API user, operating system user, file system user). “Interfaces,” I said. “Operating system interface, file system interface, network interface, printer interface, debugging interface, other devices…”

“Right,” said James. “Plus there are those other interface-y things—importing and exporting stuff, for instance.”

“Aren’t those covered under ‘Functions’?”

“Sure. Or they might be, depending on how you think about it. But the point of this kind of model isn’t to be a template, or a form you fill out. It’s to help us reduce the chances that we might miss something important. Our models are leaky abstractions; overlaps are okay,” said James. Which, of course, was exactly the same argument I had used on him several years earlier when we had added Time to the model. Then he paused. “Ah! But we don’t want to break the mnemonic, do we? San Francisco DePOT.”

“We can deal with that. Just misspell ‘depot’ San Francisco DIPOT. SFDIPOT.”

And so we updated the model.
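
For what it’s worth, a coverage outline built on the updated mnemonic might start from a skeleton something like this; the sub-items are purely illustrative, not a template to fill out.

    # A sketch of a top-level product coverage outline using the SFDIPOT
    # product elements; the sub-items are invented examples, not a template.
    COVERAGE_OUTLINE = {
        "Structure":  ["executables", "libraries", "configuration files"],
        "Function":   ["calculation", "reporting", "error handling"],
        "Data":       ["input domains", "output formats", "lifecycles of records"],
        "Interfaces": ["GUI", "command line", "API", "import/export",
                       "operating system", "file system", "network", "printer"],
        "Platform":   ["operating systems", "browsers", "third-party components"],
        "Operations": ["common workflows", "unusual environments", "disfavoured users"],
        "Time":       ["timeouts", "concurrency", "dates and time zones"],
    }

    for element, examples in COVERAGE_OUTLINE.items():
        print(f"{element}: {', '.join(examples)}")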

I wonder what it will look like five years from now.

Counting the Wagons

Monday, December 30th, 2013

A member of Linked In asks if “a test case can have multiple scenarios”. The question and the comments (now unreachable via the original link) reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by—waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were there more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.


Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. Like railway wagons, the contents—the cargo, and not the containers—are the really interesting and important parts. And like railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or imply multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases, piecework, units of production, simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • “What Counts”, an article that I wrote for Better Software magazine, on problems with counting things.
  • “Braiding the Stories” and “Delivering the News”, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

Time, Coverage, and Maps

Monday, October 15th, 2012

Over the last few years, people have become increasingly enthusiastic about the idea of mind mapping to help them describe or illustrate or otherwise consider test coverage. For me, Darren McMillan was the one who really got the ball rolling here, here, and here. More recently there have been other examples of using mind maps to present coverage ideas. Colleague Adam Goucher has weighed in here. But there’s another thing you can do, something that James Bach and I have been talking about in the Rapid Software Testing class for a couple of years now: You can use a mind map to help you decide how you might allocate your time when you’re dealing with an uncertain situation. You can do this with a functional or structural diagram, too. Let’s try this with an example.

  • There is a given number of hours in a typical week; let’s say 40.
  • There are some testers on the team; let’s say four.
  • Each tester can accomplish a certain amount of uninterrupted testing time in the course of a day. For this exercise, let’s say that it’s three 90-minute sessions per tester per day. That means that each tester could accomplish 15 sessions per week, so our team of four could pull off 60 sessions per week.

Now, most sessions are not entirely productive in terms of test coverage. That is, sessions are not typically dedicated entirely to on-charter test design and execution (that’s called testing time, or T-time, in session-based test management). T-time is regularly interrupted by other activities. Apart from test design and execution, whereby we obtain test coverage, there’s usually some setup time (S-time), and there’s almost always some bug investigation and reporting time (B-time). We can’t predict how well any given session is going to go, but over time we can learn to develop a sort of first-order, back-of-the-envelope, finger-in-the-air, heuristic, probably-wrong-but-right-enough kind of guess. We are talking about predicting the future, here. Let’s say that our general experience with this development group is that, between them, B-time and S-time tend to cost us about a third of our time as we’re testing.

So, in order to figure out how we’re going to spend our time this week, we can’t say that we’re going to get 60 sessions’ worth of test coverage. Our effective testing time is more like 40 idealized sessions. Let’s represent those sessions with sticky notes—one idealized session per note. For the team we’re imagining here, we’d have 40 sticky notes to work with.
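
Here’s the same back-of-the-envelope arithmetic as a tiny script, in case you’d like to play with the assumptions; every number in it is a guess, to be revisited as we learn.

    # A back-of-the-envelope sketch of the session arithmetic above; all of
    # these numbers are assumptions, to be revised as we learn.
    testers = 4
    sessions_per_tester_per_day = 3      # 90-minute sessions
    working_days_per_week = 5

    raw_sessions = testers * sessions_per_tester_per_day * working_days_per_week   # 60
    overhead_fraction = 1 / 3            # setup time plus bug investigation and reporting
    idealized_sessions = round(raw_sessions * (1 - overhead_fraction))              # 40

    print(f"{raw_sessions} raw sessions -> about {idealized_sessions} sticky notes this week")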

Different sessions usually have different themes—tasks, activities, or approaches. As we engage with a brand-new feature, we might perform an “intake” or “survey” or “reconnaissance” session, with the goal of identifying what’s there to be tested. “Analysis” sessions might help us to decide on where certain risks are, what we want to cover, or how we want to cover it. As we get deeper into the testing of a particular feature (“deep coverage” sessions), we might want a given session to be focused on a particular kind of test coverage—straightforward capability testing, data- or domain-focused testing, or testing on a specific platform. Maybe we want to cover a feature of the product while focusing on a particular parafunctional quality criterion, like performance or usability. Perhaps we want to allocate some sessions to design or coding of test oracles. Maybe we could dedicate a session or two to exploring the product based on problem reports from the help desk. If we’d like to highlight specific dimensions of activities or coverage, we can decorate our sticky notes with icons, one or two key words, or a dot—or we can use different colours for the notes, or some combination of these things. In this example, we’ll use little icons to represent classes of activities.

Now get the team together in front of a whiteboard or flip chart to look at your structural diagram, flowchart, or mind map. Place a sticky note (perhaps with a few words of explanatory text) on each node (functional area) or line (interface) on the map you’d like to cover with an idealized session. Keep putting sticky notes on the diagram until you’ve used them up.

By the time you’re out of sticky notes, you will have begun to develop some ideas about what you might or might not be able to accomplish given a week to do it. Are some areas not covered at all? Pick up a sticky note from somewhere else, and move it around. Should certain risky or complex areas receive more attention than others? If so, they might be worthy of more nodes and more than one sticky note. Not enough sticky notes—that is, not enough time, given the people you have available—to cover the whole diagram as well as you’d like? In that case, something has to change, but if all the assumptions above still hold, the catch is that you only have 40 sticky notes to work with and to redeploy.

There’s another catch, too. Diagrams are models. Models are simplifications of reality, and so they leave stuff out. In this kind of exercise, things that don’t appear on your diagrams can be easy to forget. Some essential aspects of test coverage might not fit very well on the diagram, or indeed on any diagram. As you notice missing items or missing ideas, put them on the diagram or on a list in one corner of the space. Keep asking what might be missing from the diagram or the list. Each element on the list is a potential candidate for its own sticky note—or maybe you can cover two or three list items within a single session.

Once the diagram has been covered with general ideas, we can choose to write a more specific and refined charter for each sticky note.

Maybe our sessions won’t be as productive as we thought. If, in the course of testing, we determine that our assumptions aren’t meshing well with reality, we can revisit the diagram and the sticky notes to re-evaluate as soon as we have any information that might threaten the schedule or the anticipated test coverage. We’d typically look at a new diagram, or look at an old diagram in a very different way, every week or two in any case.

This approach could be adapted to mesh very well with the ideas that Paul Holland outlines in this article.

Making things visible provides a point of departure for conversations about strategy, logistics, and timing. It’s important for us to have the skill of telling a story about what there is to test, how we could test it, what we could cover, and what our constraints might be. Some simple visual aids can help us to illustrate that story.