In response to my recent blog post, 100% Coverage is Possible, reader Hema Khurana asked:
“Also some measure is required otherwise we wouldn’t know about the depth of coverage. Any straight measures available?”
I replied, “I don’t know what you mean by a ‘straight’ measure. Can you explain what you mean by that?”
Hema responded: “I meant a metric some X/Y.”
In all honesty, it’s sometimes hard to remain patient when this question seems to come up at every conference, in every class, week upon week, year upon year. Asking me about this is a little like asking Chris Hadfield—since he’s a well-known astronaut and a pretty smart guy—if he could provide a way of measuring the area of the flat, rectangular earth. But Hema hasn’t asked me before, and we’ve never met, so I don’t want to be immediately dismissive.
My answer, my fast answer, is No. One key problem here is related to what Y could possibly represent. What counts? Maybe we could talk about Y in terms of a number of test cases, and X as how many of those test cases we’ve executed so far. If Y is 600 and X is 540, we could say that testing is 90% done. But that ignores at least two fundamental problems.
The first problem is that, irrespective of the number of test cases we have, we could choose to add more at any time as (via testing) we discover different conditions that we would like to evaluate. Or maybe we could choose to drop test cases when we realize that they’re out of date or irrelevant or erroneous. That is, unless we decide to ignore what we’ve learned, Y will, quite appropriately, change over time.
The second problem is that—at least in my view, and in the view of my colleagues—test cases are a ludicrous way to think about testing.
Another almost-as-quick answer would be to encourage people to re-read that 100% Coverage is Possible post (and the Further Reading links), and to keep re-reading until they get it.
But that’s probably not very encouraging to someone who is asking a naive question, and I’d like to more be helpful than that.
Here’s one thing we could do, if someone were desperate for numbers that summarize coverage: we could make a qualitative evaluation of coverage, and put numbers (or letters, or symbols) on a scale that is nominal and very weakly ordinal.
Our qualitative evaluation would be rooted in analysis of many dimensions of coverage. The Product Elements and Quality Criteria sections of the Heuristic Test Strategy Model provides a framework for generating coverage ideas or for reviewing our coverage retrospectively. We would review and discuss how much testing we’ve done of specific features, or particular functional areas, or perceived risks, and summarize our evaluation using a simple scale that would go something like this:
Level 0 (or X, or an empty circle, or…): We know nothing at all about this area of the product.
Level 1 (or C, or a glassy-eyed emoticon, or…): We have done a very cursory evaluation of this area. Smoke- or sanity-level; we’ve visited this feature and had a brief look at it, but we don’t really know very much about it; we haven’t probed it in any real depth.
Level 2 (or B, or a normal-looking emoticon, or…): We’ve had a reasonable look at this area, although we haven’t gone all the way deep. We’ve examined the common, the core, the critical, the happy paths, the handling of everyday errors or exceptions. We’ve pretty familiar with this area. We’ve done the kind of testing that would expose some significant bugs, if they were there.
Level 3 (or A, or a determined-looking angel emoticon, or…): We’ve really kicked this area harshly and hard. We’ve looked at unusual and complex conditions or states. We’ve probed deeply for subtle or hidden bugs. We’ve exposed the product to the extreme, the exceptional, the rare, the improbable. We’ve looked for bugs that are deep in the corners or hidden in the dark. If there were a serious bug, we’re pretty sure we would have found it by now.
Strictly speaking, these numbers are placed on an ordinal scale, in the sense that Level 3 coverage is deeper than Level 2, which is deeper than Level 1. (If you don’t know about scales of measurement, you should learn about them before providing or asking for metrics. And there are some other things to look at.) The numbers are certainly not an interval scale, or a ratio scale. They may not be commensurate from one feature area to the next; that is, they may represent different notions of coverage, different amounts of effort, different modes of evaluation. By design, these numbers should not be treated as valid measurements, and we should make sure that everyone on the project knows it. They are little labels that summarize evaluations and product elements and effort, factors that must be discussed to be understood. But those discussions can lead to understanding and consensus between ourselves, our colleagues, and our clients.