Blog: Delivering the News (Test Reporting Part 3)

In the last post in this series, I noted some potentially useful structural similarities between bug reports (whether oral or written) and newspaper reports. This time, I’ll delve into that a little more.

To our clients, investigative problem reports are usually the most important part of the product story. The most respected newspapers don’t earn their reputations by reprinting press releases; they earn their reputations through investigative journalism. As testers (or, heaven help us, quality assurance people), we tend to be chartered to look for problems, and to investigate them in ways that are most helpful to programmers, managers, and our other clients. A failing test on its own tells us little, and a failing check even less; as I pointed out here, a failing test is only an allegation of a problem. Investigation and study of a failing test is likely to inform us of something more useful: whether someone will perceive a problem that threatens the value of the product. I’ll talk more about the nature of problems in a later post, but for now, think of a product problem in terms of a perceived absence of or threat to some dimension of quality. (See the Heuristic Test Strategy Model for one list of quality criteria; see Software Quality Characteristics, by Rikard Edgren, Henrik Emilsson and Martin Jansson for another.) Since the manager’s goal is generally to release a product at her desired level of quality, problems that could threaten that goal are likely to be interesting and important. Or, as they say in the newspaper business, “if it bleeds, it leads”.

Potential showstoppers are usually the most important stories. In the 1990s, I was a technical support person, a tester, a program manager, and a programmer for a mass-market, commercial shrink-wrap software company. Since we had millions of customers, even minor problems could have a big impact on technical support and on the reputation of our products in the market. The market was enormous, hardware and software were even less standardized than they are now, and we worked under a great deal of time pressure. Classifying and prioritizing problems was contentious. One of the important classification questions was “What should we consider a showstopper?” One of the senior programmers came up with an answer that I’ve used ever since:

Showstopper (n.): Something that makes more sense to fix than to ship.

(I talked about showstoppers here.) In a development project, a showstopper—any threat to the timely release of the project—is a page-one, above-the-fold story, a story that you can see and begin to read without opening the newspaper or picking it up.

There’s always one story that leads. The most important threat to a timely, successful release may be a single problem, or it may be a collection of problems—what Ian Mitroff calls a mess. Do we have a problem, or a couple of problems, or a mess? No matter what the answer, there’s only so much space on the front page above the fold. Will you have one headline, or two, or three? What will that headline say? What will the lead paragraph of each story look like? Does the lead paragraph cover the five Ws—who, what, where, when, and why? If not, are those questions answered shortly? Might there be a reasonable reason not to answer them?

There’s only one front page, and there’s almost always more than one story on it. Our clients need to be able to absorb the lead story and the other front-page stories quickly, so we need to be able to provide headlines, lead paragraphs, and details in appropriate proportions. See an example front page here, with details that follow.

Very infrequently, serious newspapers give their entire front page to a story. In those cases, it’s usually an overwhelmingly important story, or one that threatens the newspaper or journalism itself.

The most compelling stories are those that have an impact on people. Although product problems are often technical in nature, the “making sense” part of the showstopper decision is focused on the business. Testers must be able to connect technical problems with business risk. Problems related to technical correctness are often easy to describe, but they might not be important. The skill of bug advocacy (making sure that the customer is aware of the best possible motivations for fixing the bug) depends on your ability to report the bug in terms of its most significant effect on the business. Ben Simo has a lovely way to sum this up. Early in his career, when Ben was trying to advocate a bug fix, his project manager said, “Revenue is king. Liability is queen. Tell me how this bug impacts them.”

The number of stories usually isn’t as important as the significance of the stories. This is another way in which test reports can be like newspapers. We don’t usually evaluate the quality of a newspaper by the number of stories in it. Instead, we look at the significance, relevance, and credibility of the stories.

It may take time to distinguish between a breaking story and a major story. Sometimes the news cycle doesn’t afford time for investigation, even though the story might be important. Information gets passed around the project at various moments during the test and development cycle. Sometimes a discovery happens just before a meeting. Smart reporters know to balance urgency and restraint when there’s a breaking story. When I worked in commercial mass-market software in the 1990s, we sometimes discovered a terrible-looking problem a couple of hours before release. Such discoveries would trigger arousal (no, not sexual friskiness, but arousal in the psychological sense of being suddenly snapped awake and alert to danger). All of a sudden, we’d be noticing all kinds of things that we hadn’t noticed before, and most of them were non-problems of one kind or another. We were biased by fear. We called it the “snakes on everything” moment. When reporting, testers need to take stock of the emotional factors surrounding them, and report cautiously and accurately. An hour from now, an allegation or a rumour might be an important story, or it might be nothing.

Non-problems aren’t news. There’s a pattern of stories in the first section of the newspaper: they’re mostly stories about problems, and there’s a reason for that: problems compel attention. Our emotional systems evolved to help keep us out of trouble. Problems or threats trigger arousal. Things that are going well are nice to hear about, but they don’t engage emotions in the same way as problems do. In a software development project, non-problems have relatively little significance for project managers. Routine daily successes don’t threaten the project, and therefore need less attention.

Numbers, like pictures, are illustrations, not the whole story. A qualitative report is not quantity-free; after all, identifying the presence or absence of something involves counting to one, and the degree of some attribute of interest can be illustrated by number. But just as a pictorial illustration isn’t the item it depicts, a numerical illustration isn’t the story it might help to describe. A picture looks at a part of a scene through a particular lens; a number focuses on one attribute using a particular metric. Each one may emphasize some observations at the expense of other observations. Each one may crop out detail. Each one may magnify or distort.

Since the product and testing stories are multi-dimensional, be prepared to show the dimensions. Newspaper reports always have a bias, but reporters and editors often attempt to manage the bias by providing alternative sources of information, and alternative interpretations. A story of any length often includes multiple stories, or multiple threads of the main story. When tables of data are appropriate, newspapers print tables (think stock quotes in the business section, or box scores or line scores in sports). Products, coverage, quality, and problems are all multi-dimensional, multi-variate, and qualitative. Where there’s a mass of data, consider using tables such as dashboards or coverage tables. Pin numbers to reliable measurements (see the slip charts, the detailed impact case methods, and the subjective impact methods in Weinberg’s Quality Software Management, Volume 2: First Order Measurement), and pay attention to validity (see Kirk and Miller’s Reliability and Validity in Qualitative Research and Shadish, Cook, and Campbell’s Experimental and Quasi-Experimental Designs for Generalized Causal Inference).

Describe your coverage. Boris Beizer described coverage as “any metric of test completeness with respect to a test selection criterion”. That suggests that it is possible to quantify coverage if you have a quantifiable test selection criterion. For example, if a single-digit field accepts any digit from 0 to 9, one could select 10 tests and claim complete coverage based on that criterion. Mind, that data coverage doesn’t account for flow or sequence coverage; suppose that a bug was triggered only when a 7 replaced a 3 in that field. Since the overall number of possible tests is infinite, test selection criteria are based on models. In practical terms, this means that overall test coverage is some finite number over an infinite number. If you report that accurately, you’re stuck with a number that remains asymptotically close to zero. Instead, focus on the qualitative, and describe your coverage on an ordinal scale. Level 0 means “We know nothing about this area of the product.” Use Level 1 to say “We have done smoke or sanity testing; at this point, we’ve determined whether the product is even stable enough for serious testing.” Level 2 means “we’ve tested the common, the core, the critical, the happy path; our testing has been focused on can it work.” Level 3 means “We’ve tested the harsh, the complex, the challenging, the extreme, the exceptional; if there were a serious problem in this area, we’d probably know about it by now.” In this system, the numbers are barely more than labels for a qualitative evaluation, so don’t be tempted to do serious math with them.
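To make the ordinal scale concrete, here is a minimal sketch of a coverage table in Python. The product areas, the names `Coverage` and `coverage_report`, and the short level descriptions are all assumptions invented for illustration; they are not part of any tool mentioned in this post. The levels are a plain `Enum` rather than integers, so they can be ordered for display but not added or averaged.

```python
from enum import Enum


class Coverage(Enum):
    """Ordinal coverage levels; the numbers are labels, not quantities."""
    UNKNOWN = 0  # we know nothing about this area of the product
    SMOKE = 1    # smoke or sanity testing only
    CORE = 2     # common, core, critical, happy path: "can it work?"
    DEEP = 3     # harsh, complex, challenging, extreme, exceptional


# Short paraphrases of the level descriptions, for display only.
DESCRIPTIONS = {
    Coverage.UNKNOWN: "we know nothing about this area",
    Coverage.SMOKE: "smoke or sanity testing only",
    Coverage.CORE: "common, core, critical, happy path",
    Coverage.DEEP: "harsh, complex, extreme, exceptional",
}


def coverage_report(areas: dict[str, Coverage]) -> list[str]:
    """Return one line per product area, least-covered areas first."""
    ordered = sorted(areas.items(), key=lambda item: (item[1].value, item[0]))
    return [
        f"{name}: Level {level.value} ({DESCRIPTIONS[level]})"
        for name, level in ordered
    ]


# Hypothetical product areas, purely for illustration.
areas = {
    "installer": Coverage.CORE,
    "import/export": Coverage.UNKNOWN,
    "printing": Coverage.DEEP,
    "search": Coverage.SMOKE,
}

report = coverage_report(areas)
for line in report:
    print(line)
```

Listing the least-covered areas first puts what we don't know at the top of the page. Because `Coverage` is a plain `Enum`, an expression like `Coverage.CORE + Coverage.DEEP` raises a `TypeError`, which is a small guard against doing serious math with the labels.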

7 responses to “Delivering the News (Test Reporting Part 3)”

  1. I get the point now. Excellent!

    Like a good news article, a test report is not just an objective entity, but should be a nuanced and enlightening read which makes the reader wiser and better able to make good decisions.

    We can learn a lot about reporting from journalists and sociologists!

    /Anders

  2. Joe Harter says:

    Hi Michael,

    There is one thing I’d like to add to the concept of coverage. In addition to describing your coverage level ordinally (we use color codes), I recommend providing a date or build number for the last time you achieved that coverage. We deploy multiple times a day even when it’s close to a release date, so our certainty about high coverage decreases over time.

    So, you can show the last time you achieved level 3 coverage and you can also show that you have not covered that area at all in the current build.

    – Joe

  3. […] Blog: Delivering the News (Test Reporting Part 3) Written by: Michael Bolton […]

  4. Oren Reshef says:

    Great article,
    We report using test levels but the definitions are very vague, so I’m going to use your definition for test levels.
    A couple of questions though-

    In your example, how can you conclude that “These are fixable issue”? Isn’t your job only about reporting?

    Michael replies: There’s another similarity between newspaper reports and test reports: in a test report, we can quote sources who know more about certain parts of the story than we do.

    How will you report (like I have to) to a reporting environment built like a tree? I test a small(ish) module as part of a product built from individually testable modules, sub modules and sub-sub modules. I report to someone who is responsible for reporting one level up, and so on 3-4 levels up the tree creating a unified report. Experience shows that even show stoppers at my level might be considered shippable when observed from above.

    Is the report structured like a mind map (or could it be)? If so, perhaps you could use font size or colour to highlight salient information. Same with a more traditional layout.

    I’ve seen people use sticky notes to flag crucial points on printed reports, but it seems to me that’s compensating for an unhelpful reporting approach. If the reporting environment or its output is structured in such a way that it has the capacity to bury a page-one story, I’d report that as an issue, or mention it directly to someone who can make a difference.

    BTW The link to “coverage tables” returns ERROR 404 – NOT FOUND

    Thank you. Fixed.

  5. […] reports should be stories about the product and the testing we did see Michael Bolton article on test reporting and the telling of the story. We should use the numbers to sport or backup our story. I often see it the wrong way around: lots […]

  6. Chris says:

    Great article, just read the series and it’s helped me fill in some blanks in my own reporting.

    One question – What does it mean to “cover the five Ws”? As a facetious example “I tested the product at my desk yesterday because that’s what you pay me for” answers all of those questions but offers next to no value. Is it a thinking tool to help come up with ideas that need to be communicated from the testing story or to communicate ideas I already know need reporting from the testing story?

    Michael replies: Yes. Both. With respect to your example, I take “non-trivial” as a default stance for descriptions and reports, and I know you do too.

  7. […] Reporting Part 3: Delivering the news by Michael […]
