Blog Posts from August, 2010

Hire Ben Simo!

Tuesday, August 31st, 2010

I have four or five blog posts in the hopper, each almost ready to go. I’m working on a whole book and a chapter of another one, and I’m on a deadline that I’m about to blow. The kids are still out of school, and I really should be cooking dinner right now. And yet…

As I write, one of the best testers that I know is looking for work. His name is Ben Simo. He lives in Colorado Springs, Colorado (my understanding is that he’s willing to relocate). He’s well-versed in LoadRunner and Performance Center, like many other testers. Unlike (alas!) so many other testers with those bullet points on the résumé, he’s not inclined merely to go through the motions and use tools for checking. He is an astute, passionate critical thinker, entirely focused on investigating and defending the value of the products for which he’s responsible by identifying problems that threaten that value. And yet’s he’s not of the Quality Assurance school; he entirely understands that assuring quality is the responsibility of those who produce and manage the work—programmers, writers, designers, artists, and managers. His job, as he sees it, is to make quality assurance possible. He collaborates with the project community, investigates the product ,and provides the most important, most timely information he can to the people who are producing and managing the work. With that, they can make the decisions they must make, informed by the very best technical information that he can provide.

He’s past President of the Association for Software Testing; he was Conference Chair for the 2009 Conference for the Association for Software Testing; he maintains one blog called Questioning Software and another called Is There A Problem Here?. He was recently given a three-part interview by UTest; you can read that here, and here, and here.

And, as of this writing, he’s available. For someone looking for a tester, he’s like the dream date that you spy across the dance floor whom a friend tells you is single, smart, modest, and well-off. The only issue is that maybe he’s a little too modest. He won’t be on the dance floor for long, because some organization will come along and sweep him off his feet. And that organization will be exceptionally lucky.

And that organization could be yours. His email address is ben at qualityfrog (period) com.

Postscript: Ben’s been hired by a company near Phoenix, AZ. You had your chance!

Statistician or Journalist?

Friday, August 27th, 2010

Eric Jacobson has a problem, which he thoughtfully relates on his thoughtful blog in a post called “How Can I Tell Users What Testers Did?”. In this post, I’ll try to answer his question, so you might want to read his original post for context.

I see something interesting here: Eric tells a clear story to relate to his readers some problem that he’s having with explaining his work to others who, by his account, don’t seem to understand it well. In that story, he mentions some numbers in passing. Yet the numbers that he presents are incidental to the story, not central to it. On the contrary, in fact: when he uses numbers, he’s using them as examples of how poorly numbers tell the kind of story he wants to tell. Yet he tells a fine story, don’t you think?

In the Rapid Software Testing course, we present this idea (Note to Eric: we’ve added this since you took the class): To test is to compose, edit, narrate, and justify two parallel stories. You must tell a story about the product: how it works, how it fails, and how it might not work in ways that matter to your client (and in the context of a retrospective, you might like to talk about how the product was failing and is now working). But in order to give that story its warrant, you must tell another story: you must tell a story about your testing. In a case like Eric’s, that story would take the form of a summary report focused on two things: what you want to convey to your clients, and what they want to know from you (and, ideally, those two things should be in sync with each other).

To do that, you might like to consider various structures to frame your story. Let’s start with the elements of what we (somewhat whimsically) call The Universal Test Procedure (you can find it in the course notes for the class). From a retrospective view, that would include

  • your model of the test space (that is, what was inside and outside the scope of your testing, and in particular the risks that you were trying to address)
  • the oracles that you used
  • the coverage that you obtained
  • the test techniques you applied
  • the ways in which you configured the product
  • the ways in which you operated the product
  • the ways in which you observed the product
  • the ways in which you evaluated the product; and
  • the heuristics by which you decided to stop testing
  • what you discovered and reported, and how you reported

You might also consider the structures of exploratory testing. Even if your testing isn’t highly exploratory, a lot of the structures have parallels in scripted testing.

Jon Bach says (and I agree) that testing is journalism, so look at the way journalists structure a story: they often start with the classic pyramid lead. They might also start with a compelling anecdote as recounted in What’s Your Story, by Craig Wortmann, or Made to Stick, by Chip and Dan Heath. If you’re in the room with your clients, you can use a whiteboard talk with diagrams, as in Dan Roam’s The Back of the Napkin. At the centre of your story, you could talk about risks that you addressed with your testing; problems that you found and that got addressed; problems that you found and that didn’t get addressed; things that slowed you down as you were testing; effort that you spent in each area; coverage that you obtained. You could provide testimonials from the programmers about the most important problems you found; the assistance that you provided to them to help prevent problems; your contributions to design meetings or bug triage sessions; obstacles that you surmounted; a set of charters that you performed, and the feature areas that they covered. Again, focus on what you want to convey to your clients, and what they want to know from you.

Incidentally, the more often and the more coherently you tell your story, the less explaining you’ll have to do about the general stuff. That means keeping as close to your clients as you can, so that they can observe the story unfolding as it happens. But when you ask “What metric or easily understood information can my test team provide users, to show our contribution to the software we release?”, ask yourself this: “Am I a statistician or a journalist?”

Other resources for telling testing stories:

Thread-Based Test Management: Introducing Thread-Based Test Management, by James Bach; and A New Thread, by Jon Bach (as of this writing, this is brand new stuff)

a video.

Constructing the Quality Story (from Better Software, November 2009): Knowledge doesn’t just exist; we build it. Sometimes we disagree on what we’ve got, and sometimes we disagree on how to get it. Hard as it may be to imagine, the experimental approach itself was once controversial. What can we learn from the disputes of the past? How do we manage skepticism and trust and tell the testing story?

On Metrics:

Three Kinds of Measurement (And Two Ways to Use Them) (from Better Software, July 2009): How do we know what’s going on? We measure. Are software development and testing sciences, subject to the same kind of quantitative measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement to get maximum value with a minimum of fuss? One thing is for sure: we waste time and effort when we try to obtain six-decimal-place answers to whole-number questions. Unquantifiable doesn’t mean unmeasurable. We measure constantly WITHOUT resorting to numbers. Goldilocks did it.

Issues About Metrics About Bugs (Better Software, May 2009): Managers often use metrics to help make decisions about the state of the product or the quality of the work done by the test group. Yet measurements derived from bug counts can be highly misleading because a “bug” isn’t a tangible, countable thing; it’s a label for some aspect of some relationship between some person and some product, and it’s influenced by when and how we count… and by who is doing the counting.

On Coverage:

Got You Covered (from Better Software, October 2008): Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

Cover or Discover (from Better Software, November 2008): Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

A Map By Any Other Name (from Better Software, December 2008): A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

All Testing is (not) Confirmatory

Tuesday, August 24th, 2010

In a recent blog post, Rahul Verma suggests that all testing is confirmatory.

First, I applaud his writing of an exploratory essay. I also welcome and appreciate critique of the testing vs. checking idea. I don’t agree with his conclusions, but maybe in the long run we can work something out.

In mythology, there was a fellow called Procrustes, an ironmonger. He had a iron bed which he claimed fit anyone perfectly. He accomplished a perfect fit by violently lengthening or shortening the guest. I think that, to some degree, Rahul is putting the idea of confirmation into Procrustes’ bed.

He cites the cites the Oxford Online Dictionary definition of confirm: (verb) establish the truth or correctness of (something previously believed or suspected to be the case). (Rahul doesn’t cite the usage notes, which show some different senses of the word.)

When I describe a certain approach to testing as “confirmatory” in my discussion of testing vs. checking, I’m not trying to introduce another term. Instead, I’m using an ordinary English adjective to identify an approach or a mindset to testing. My emphasis is twofold: 1) not on the role of confirmation in test results, but rather on the role of confirmation in test design; and 2) on a key word in the definition Rahul cites, “previously“.

A confirmatory mindset would steer the tester towards designing a test based on a particular and  specific hypothesis. A tester working in a confirmatory way would be oriented towards saying, “Someone or something has told me that the product should do be able to do X. My test will demonstrate that it can do X.” Upon the execution of the (passing) test, the tester would say “See? The product can do X.” Such tests are aimed in the direction of showing that the product can work.

Someone working from an exploratory or investigative mindset would have a different, broader, more open-ended mission. “Someone or something has told me that the product does X. What are the extents and limitations of what we think of as X? What are the implications of doing X? What essential component of X might we have missed in our thinking about previous tests? What else happens when I ask the product to do X? Can I make the product do P, by asking it to do X in a slightly different way? What haven’t I noticed? What could I learn from the test that I’ve just executed?

Upon performing the test, the tester would report on whatever interesting information she might have discovered, which might include a pass or fail component, but might not. Exploratory tests are aimed at learning something about the product, how it can work, how it might work, and how it might not work; or if you like, on “if it will work”, rather than “that it can work”.

To those who would reasonably object: yes, yes, no test ever shows that a product will work in all circumstances. But the focus here is on learning something novel, often focusing on robustness and adaptability. In this mindset, we’re typically seeking to find out how the program deals with whatever we throw at it, rather than on demonstrating that it can hit a pitch in the centre of the strike zone.

I believe that, in his post, Rahul is focused on the evaluation of the test, rather than on test design. That’s different from what I’m on about. He puts confirmation squarely into result interpretation, defining the confirmation step as “a decision (on) whether the test passed or failed or needs further investigation, based on observations made on the system as a result of the interaction. The observations are compared against the assumption(s).”

I don’t think of that as confirmation (“establishing the truth or correctness of something previously believed or suspected to be the case”). I think of that as application of an oracle; as a comparison of the observed behaviour with a principle or mechanism that would allow us to recognize a problem. In the absence of any countervailing reason for it to be otherwise, we expect a product to be consistent with its history; with an image that someone wants to project; with comparable products; with specific claims; with reasonable user expectations; with the explicit or implicit purpose of the product; with itself in any set of observable aspects; and with relevant standards, statutes, regulations, or laws.

(These heuristics, with an example of how they can be applied in an exploratory way, are listed as the HICCUPP heuristics here. It’s now “HICCUPPS”; we recognized the “Standards and Statutes” oracle after the article was written.)(Update, 2015-11-17: For a while now, it’s been FEW HICCUPPS.)

At best, your starting hypothesis determines whether applying an oracle suggests confirmation. If your hypothesis is that the product works—that is, that the product behaves in a manner consistent with the oracle heuristics—then your approach might be described as confirmatory.

Yet the confirmatory mindset has been identified in both general psychological literature and testing literature as highly problematic. Klayman and Ha point out in their 1987 paper Confirmation, Disconfirmation, and Information in Hypothesis Testing that “In rule discovery, the positive test strategy leads to the predominant use of positive hypothesis tests, in other words, a tendency to test cases you think will have the target property.” For software testing, this tendency (a form of confirmation bias) is dangerous because of the influence it has on your selection of tests.

If you want to find problems, it’s important to take a disconfirmatory strategy—one that includes tests of conditions outside the space of the hypothesis that program works. “For example, when dealing with a major communicable disease (or software bugs —MB), it is more serious to allow a true case to go undiagnosed and untreated than it is to mistakenly treat someone.” Here, Klayman and Ha point out, if we want to prevent disease, the emphasis should be on tests that are outside of those that would exemplify a desired attribute (like good health). In the medical case, they say that would involve “examining people who test negative for the disease, to find any missed cases, because they reveal potential false negatives.”

In software testing, the object would be to run tests that challenge the idea that the test should pass. This is consistent with Myers’ analysis in The Art of Software Testing (which, interestingly, as it was written in 1979, predates Klayman and Ha’s paper).

As I see it, if we’re testing the product (rather than, say, demonstrating it), we’re not looking for confirmation of the idea that it works; we’re seeking to disconfirm the idea that it works. Or, as James Bach might put it, we’re in the illusion demolition business.

One other point: Rahul suggests “Testing should be considered complete for a given interaction only when the result of confirmation in terms of pass or fail is available.” To me, that’s checking. A test should reveal information, but it does not have to pass or fail. For example, I might test a competitive product to discover the features that it offers; such tests don’t have a pass or fail component to them.

A tester might be asked to compare a current product with a past version to look for differences between the two. A tester might be asked to use a product and describe her experience with it, such that there’s an evaluation with explicit, atomic pass or fail criteria. “Pass and fail” are highly limiting in terms of our view of the product: I’m sure that the arrival of yet another damned security message on Windows Vista was deemed as a pass in the suite of automated checks that got run on the system every night. But in terms of my happiness with the product, it’s a grinding and repeated failure.

I think Rahul’s notion that a test must pass or fail is confused with the idea that a test should involve the application of a stopping heuristic.  For a check, “pass or fail” is essential, since a check relies on the non-sapient application of a decision rule.  For a test, pass-vs.-fail might an example of the “mission accomplished” stopping heuristic, but there are plenty of other conditions that we might use to trigger the end of a test.

Since Rahul appears to be a performance tester, perhaps he’ll relate to this example (the framing of which I owe to the work of Cem Kaner). Imagine a system that has an explicit requirement to handle 100,000 transactions per minute. We have two performance testing questions that we’d probably like to address.

One is the load testing question: “Can this system in fact handle 100,000 transactions per minute?” To me, that kind of question often gets addressed with a confirmatory mindset. The tester forms a hypothesis that the system does handle 100,000 transactions per minute; he sets up some automation to pump 100,000 transactions per minute through the system; and if the system stays up and exhibits no other problems, he asserts that the test passes.

The other performance question is a stress testing question: “In what circumstances will the system be unable to handle a given load, and fail?” For that we design a different kind of experiment. We have a hypothesis that the system will fail eventually as we ramp up the number of transactions. But we don’t know how many transactions will trigger the failure, nor do we know the part of the system in which the failure will occur, nor do we know way in which the failure will manifest itself.  We want to know those things, so have a different information objective here than for the load test, and we have a mission that can’t be handled by a check.

In the latter test, there is a confirmatory dimension if you’re willing to look hard enough for it. We “confirm” our hypothesis that, given heavy enough stress, the system will exhibit some problem. When we apply an oracle that exposes a failure like a crash, maybe one could say that we “confirm” that the the crash is a problem, or that behaviour we consider to be bad is bad. Even in the former test, we could flip the hypothesis, and suggest that we’re seeking to confirm the hypothesis that the program doesn’t support a load of 100,000 transactions per minute . If Rahul wants to do that, he’s welcome to do so. To me, though, labelling all that stuff as “confirmatory” testing reminds me of Procrustes.

Questions from Listeners (2a): Handling Regression Testing

Saturday, August 7th, 2010

This is a followup to an earlier post, Questions from Listeners (2): Is Unit Testing Automated? The original question was

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

Now I’ll deal with the second part of the question.

Part One: What Do We Really Mean By “Automation”?

Some people believe that “automation” means “getting the computer to do the testing”. Yet computers don’t do testing any more than compilers do programming, than cruise control does driving, than blenders do cooking. In Rapid Software Testing, James Bach and I teach that test automation is any use of tools to support testing.

When we’re perform tests on a running program, there’s always a computer involved, so automation is always around to some degree. We can use tools to help us configure the program, to help us observe some aspect of the program as it’s running, to generate data, to supply input to the program, to monitor outputs, to parse log files, to provide an oracle against which outputs can be compared, to aggregate and visualize results, to reconfigure the program or the system,… In that sense, all tests can be automated.

Some people believe that tests can be automated. I disagree. Checks can be automated. Checks are a part of an overall program of testing, and can aid it, but testing itself can’t be automated. Testing requires human judgement to determine what will be observed and how it will be observed; testing requires requires human judgement to ascribe meaning to the test result. Human judgement is needed to ascribe significance to the meaning(s) that we ascribe, and human judgement is required to formulate a response to the information we’ve revealed with the test. Is there a problem with the product under test? The test itself? The logical relationship between the test and the product? Is the test relevant or not? Machines can’t answer those questions. In that sense, no test can be automated.

Automation is a medium. That is, it’s an extension of some human capability, not a replacement for it. If we test well, automation can extend that. If we’re testing badly, then automation can help us to test badly at an accelerated rate.

Part Two: Why Re-run Every Test?

My car is 25 years old. Aside from some soon-to-be addressed rust and threadbare upholstery, it’s in very good shape. Why? One big reason is that my mechanic and I are constantly testing it and fixing important problems.

When I’m about to set out on a long journey in my car, I take the it in to Geoffrey, my mechanic. He performs a bunch of tests on the car. Some of those tests are forms of review: he checks his memory and looks over the service log to see which tests he should run. He addresses anything that I’ve identified as being problemmatic or suspicious. Some of Geoffrey’s tests are system-level tests, performed by direct observation: he listens to the engine in the shop and takes the car out for a spin on city streets and on the highway. Some of his tests are functional tests: he applies the brakes to check to see if they lose pressure. Some of his tests are unit tests, assisted by automation: he uses a machine to balance the tires and a gauge to pressure-test the cooling system. Some of his smoke tests are refined by tools: a look at the tires is refined by a pressure gauge; when he sees wear on the tires, he uses a gauge to measure the depth of the tread. Some of his tests are heavily assisted by automation: he has a computer that hooks up to a port on the car, and the computer runs checks that give him gobs of data that would be difficult or impossible for him to obtain otherwise.

When I set out on a medium-length trip, I don’t take the car in, but I still test for certain things. I walk around the car, checking the brake lights and turn signals. I look underneath for evidence of fluid leaks. I fill the car with gas, and while I’m at the gas station, I lift the hood and check the oil and the windshield wiper fluid. For still shorter trips, I do less. I get in, turn on the ignition, and look at the fuel gauge and the rest of the dashboard. I listen to the sound of the engine. I sniff the air for weird smells–gasoline, coolant, burning rubber.

As I’m driving, I’m making observations all the time. Some of those observations happen below the level of my consciousness, only coming to my attention when I’m surprised by something out of the ordinary, like a bad smell or the strange sound. On the road, I’m looking out the window, glancing at the dashboard, listening to the engine, feeling the feedback from the pedals and the steering wheel. If I identify something as a problem, I might ignore it until my next scheduled visit to the mechanic, I might leave it for a little while but still take it in earlier than usual, or I might take the car in right away.

When Geoffrey has done some work, he tells me what he has done, so to some degree I know what he’s tested. I also know that he might have forgotten something in the fix, and that he might not have tested completely, so after the car has been in the shop, I need to be more alert to potential problems, especially those closely related to the fix.

Notice two things: 1) Both Geoffrey and I are testing all the time. 2) Neither Geoffrey nor I repeat all of the tests that we’ve done on every trip, nor on every visit.

When I’m driving, I know that the problems I’m going to encounter as I drive are not restricted to problems with my car. Some problems might have to do with others—pedestrians or animals stepping out in front of me, or other drivers making turns in my path, encroaching on my lane, tailgating. So I must remain attentive, aware of what other people are doing around me. Some problems might have to do with me. I might behave impatiently or incompetently. So it’s important for me to keep track of my mental state, managing my attention and my intention. Some problems have to do with context. I might have to deal with bad weather or driving conditions. On a bright sunny day, I’ll be more concerned about the dangers of reflected glare than about wet roads. If I’ve just filled the tank, I don’t have to think about fuel for another couple hundred miles at least. Because conditions around me change all the time, I might repeat certain patterns of observation and control actions, but I’m not going to repeat every test I’ve ever performed.

Yes, I recognize that software is different. If software were a car, programmers would constantly be adding new parts to the vehicle and refining the parts that are there. On a car, we don’t add new parts very often. More typically, old parts wear out and get replaced. As such, change is happening. After change, we concentrate our observation and testing on things that are most likely to be affected by the change, and on things that are most important. In software, we do exactly the same thing. But in software, we can take an extra step to reduce risk: low-level, automated unit tests that provide change detection and rapid feedback, and which are the first level of defense against accidental breakage. I wrote about that here.

Part Three: Think About Cost, Value, Risk, and Coverage

Testing involves interplay between cost, value, and risk. The risk is generally associated with the unknown—problems that you’re not aware of, and the unknown consequences of those problems. The value is in the information you obtain from performing the test, and in the capacity to make better-informed decisions. There are lots of costs associated with tests. Automation reduces many of those costs (like execution time) and increases others (like development and maintenance time). Every testing activity, irrespective of the level of automation, introduces opportunity costs against potentially more valuable activities. A heavy focus on running tests that we’ve run before—and which have not been finding problems—represents opportunity cost against tests that we’ve never run and that won’t be found by our repeated tests. A focus on the care and feeding of repeated tests diminishes our responsiveness to new risk. A focus on repetition limits our test coverage.

Some people object to the idea of relaxing attention on regression tests, because their regression tests find so many problems. Oddly, these people are often the same people who trot out the old bromide that bugs that are found earlier are less expensive to fix. To those people, I would say this: If your regression tests consistently find problems, you’ll probably want to fix most of them. But there’s another, far more important problem that you’ll want to fix: someone has created an environment that’s favourable to backsliding.

Acceptance Tests: Let’s Change the Title, Too

Wednesday, August 4th, 2010

Gojko Adzic recently wrote a blog post called Let’s Change the Tune on some of our approaches in agile development. In changing the tune, some of the current words won’t fit so well, so he proposes (for example), “Specifying Collaboratively instead of test first or writing acceptance tests”. I have one more: I think we should change the label “acceptance tests”.

Acceptance tests are given a central role in agile development. They are typically used to express a requirement in terms of an atomic example, and they’re typically automated. That is, they’re expressed in the form of program code for a binary computer, code that helps us to determine whether some aspect of the product is functionally correct. When those tests pass, say certain proponents of the lore, we know we’re done.

Yet acceptability of a product is multi-dimensional. In the end, the product is always being used by people to solve some problem. The code may perform certain functions exquisitely as part of product that is an incomplete solution to the problem, that is hard to use, or that we hate.

The expression of requirements and the determination of acceptability in terms of simplistic, binary decisions delegated to a computer seems to me like a bias towards processes and tools, rather than individuals and interactions. That is: it’s inconsistent with the very first point in the Manifesto for Agile Software Development.

Typically, acceptance tests are set very close to the beginning of a cycle of development. Yet during that development cycle, we tend to learn a significant amount about the scope of the problem to be solved, about technology, about risk, about trade-offs.

If the acceptance tests remain static, the learning isn’t reflected in the acceptance tests. That seems to me like a bias towards following a plan, rather than responding to change—another inconsistency with the Agile Manifesto.

Acceptance tests are examples of how the product should behave. Those tests are typically performed in very constrained, staged, artificial environments that are shadows of the envionments in which the product will be used.

Acceptance tests—examples to be checked—are not really tests, in the sense of testing the mettle of the product, subjecting it to the challenges and stresses of real-world use. Yet acceptance tests are often treated more as authoritative, definitive, specifications for the product, instead of representative examples. That sounds to me like a bias towards comprehensive documentation, rather than working software—yet again inconsistent with the Agile Manifesto.

Acceptance tests are often discussed as though they determined the completion of development. While the acceptance tests aren’t passing, we know we’re not done; when the acceptance tests pass, we’re done and, implicitly, the customer is obliged to accept the product as it is. That sounds to me like a bias towards negotiated contracts, rather the customer collaboration. That’s inconsistent with the fourth point in the four-point Agile Manifesto.

The idea that we’re done when the acceptance tests pass is a myth. As a tester, I can assure you that a suite of passing acceptance tests doesn’t mean that the product is acceptable to the customer, nor does it mean that the customer should accept it. It means that the product is ready for serious exploration, discovery, investigation, and learning—that is, for testing—so that we can find problems that we didn’t anticipate with those tests but that would nonetheless destroy value in the product.

When the acceptance tests pass, the product might be acceptable. When the acceptance tests fail, we know for sure that the product isn’t acceptable. Thus I’d argue that instead of talking about acceptance tests, we should be talking about them as rejection tests.

Post-script: Yes, I’d call them rejection checks. But you don’t have to.

When Should the Product Owner Release the Product?

Tuesday, August 3rd, 2010

In response to my previous blog post “Another Silly Quantitative Model”, Greg writes: In my current project, the product owner has assumed the risk of any financial losses stemming from bugs in our software. He wants to release the product to customers, but he is of course nervous. How do you propose he should best go about deciding when to release? How should he reason about the risks, short of using a quantitative model?

The simple answer is “when he’s not so nervous that he doesn’t want to ship”. What might cause him to decide to stop shipment? He should stop shipment when there are known problems in the product that aren’t balanced by countervailing benefits. Such problems are called showstoppers. A colleague once described “showstopper” as “any problem in the product that would make more sense to fix than to ship.”

When I was a product owner and I reasoned with the project team about showstoppers, we deemed as a showstopper

  • Any single problem in the product that would definitely cause loss or harm (or sufficient annoyance or frustration) to users, such that the product’s value in its essential operation would be nil. Beware of quantifying “users” here. In the age of the Internet, you don’t need very many people with terrible problems to make noise disproportionate to the population, nor do you need those problems to be terrible problems when they affect enough people. The recent kerfuffle over the iPhone 4 is a case in point; the Pentium Bug is another. Customer reactions are often emotional more than rational, but to the customer, emotional reactions are every bit as real as rational ones.
  • Any set of problems in the product that, when taken individually, would not threaten its value, but when viewed collectively would. That could include a bunch of minor irritants that confuse, annoy, disturb, or slow down people using the product; embarrassing cosmetic defects; non-devastating functional problems; parafunctional issues like poor performance or compatibility, and the like.

Now, in truth, your product owner might need to resort to a quantitative model here: he has to be able to count to one. One showstopper, by definition, is enough to stop shipment.

How might you evaluate potential showstoppers qualitatively? My colleague Fiona Charles has two nice suggestions: “Could a problem that we know about in this product trigger a front-page story in the Globe and Mail‘s Report on Business, or in the Wall Street Journal?” “Could a problem that we know about in this product lead to a question being raised in Parliament?” Now: the fact is that we don’t, and can’t, know the answer to whether the problem will have that result, but that’s not really the point of the questions. The points are to explore and test the ways that we might feel about the product, the problems, and their consequences.

What else might cause nervousness for your client? Perhaps he’s worried that, other than the known problems, there are unanswered questions about the product. Those include

  • Open questions whose answer would produce one or more instances of a showstopper.
  • Unasked questions that, when asked, would turn into open questions instead of “I don’t care”. Where would you get ideas for such questions? Try the Heuristic Test Strategy Model at for an example of the kinds of questions that you might ask.
  • Unanswered questions about the product are one indicator that you might not be finished testing. There are other indicators; you can read about them here:

    Questions about how much we have (or haven’t) tested are questions about test coverage. I wrote three columns about that a while back. Here are some links and synopses:

    Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

    Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

    A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

    Whether you’ve established a clear feeling or are mired in uncertainty, you might want to test your first-order qualitative evaluation with a first-order quantitative model. For example, many years ago at Quarterdeck, we had a problem that threatened shipment: on bootup, our product would lock system that had a particular kind of hard disk controller. There was a workaround, which would take a trained technical support person about 15 minutes to walk through. No one felt good about releasing the product in that state, but we were under quarterly schedule pressure.

    We didn’t have good data to work with, but we did have a very short list of beta testers and data about their systems. Out of 60 beta testers, three had machines with this particular controller. There was no indication that our beta testers were representative of our overall user population, but 5% of our beta testers had this controller. We then performed the thought experiment of destroying the productivity of 5% of our user base, or tech support having to spend 15 minutes with 5% of our user base (in those days, in the millions).

    How big was our margin of error? What if we were off by a factor of two, and ten per cent of our user base had that controller? What if we were off by a factor of five, and only one per cent of our user base had that controller? Suppose that only one per cent of the user base had their machines crash on startup; suppose that only a fraction of those users called in. Eeew. The story and the feeling, rather than the numbers, told us this: still too many.

    Is it irrational to base a decision based on such vague, unqualified, first-order numbers? If so, who cares? We made an “irrational” but conscious decision: fix the product, rather than ship it. That is, we didn’t decide based on the numbers, but rather on how we felt about the numbers. Was that the right decision? We’ll never know, unless we figure out how to get access to parallel universes. In this one, though, we know that the problem got fixed startlingly quickly when another programmer viewed it with fresh eyes; that the product shipped without the problem; that users with that controller never faced that problem; and that the tech support department continued in its regular overloaded state, instead of a super-overloaded one.

    The decision to release is always a business decision, and not merely a technical one. The decision to release is not based on numbers or quantities; even for those who claim to make decisions “based on the numbers”, the decisions are really based on feelings about the numbers. The decision to release is always driven by balancing cost, value, knowledge, trust, and risk. Some product owners will have a bigger appetite for risk and reward; others will be more cautious. Being a product owner is challenging because, in the end, product owners own the shipping decisions. By definition, a product owner assumes responsibility for the risk of financial losses stemming from bugs in the software. That’s why they get the big bucks.