
Is There a Simple Coverage Metric?

In response to my recent blog post, 100% Coverage is Possible, reader Hema Khurana asked:

“Also some measure is required otherwise we wouldn’t know about the depth of coverage. Any straight measures available?”

I replied, “I don’t know what you mean by a ‘straight’ measure. Can you explain what you mean by that?”

Hema responded: “I meant a metric some X/Y.”

In all honesty, it’s sometimes hard to remain patient when this question seems to come up at every conference, in every class, week upon week, year upon year. Asking me about this is a little like asking Chris Hadfield—since he’s a well-known astronaut and a pretty smart guy—if he could provide a way of measuring the area of the flat, rectangular earth. But Hema hasn’t asked me before, and we’ve never met, so I don’t want to be immediately dismissive.

My answer, my fast answer, is No. One key problem here is related to what Y could possibly represent. What counts? Maybe we could talk about Y in terms of a number of test cases, and X as how many of those test cases we’ve executed so far. If Y is 600 and X is 540, we could say that testing is 90% done. But that ignores at least two fundamental problems.

The first problem is that, irrespective of the number of test cases we have, we could choose to add more at any time as (via testing) we discover different conditions that we would like to evaluate. Or maybe we could choose to drop test cases when we realize that they’re out of date or irrelevant or erroneous. That is, unless we decide to ignore what we’ve learned, Y will, quite appropriately, change over time.
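To make that first problem concrete, here is a minimal sketch of the arithmetic. The 540 and 600 come from the example above; the 150 added test cases are a hypothetical figure chosen only for illustration, and `percent_done` is a made-up helper name.

```python
def percent_done(executed: int, planned: int) -> float:
    """Return X/Y as a percentage, where X is executed and Y is planned."""
    return 100.0 * executed / planned

# The example from the text: X = 540, Y = 600.
before = percent_done(540, 600)

# Hypothetical: testing reveals 150 more conditions worth evaluating,
# so Y grows to 750 while X stays where it was.
after = percent_done(540, 750)

print(f"before: {before:.0f}% done, after: {after:.0f}% done")
```

No testing was undone, yet the reported figure drops from 90% to 72%. The number tracks the size of the list at least as much as it tracks the testing.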

The second problem is that—at least in my view, and in the view of my colleagues—test cases are a ludicrous way to think about testing.

Another almost-as-quick answer would be to encourage people to re-read that 100% Coverage is Possible post (and the Further Reading links), and to keep re-reading until they get it.

But that’s probably not very encouraging to someone who is asking a naive question, and I’d like to be more helpful than that.

Here’s one thing we could do, if someone were desperate for numbers that summarize coverage: we could make a qualitative evaluation of coverage, and put numbers (or letters, or symbols) on a scale that is nominal and very weakly ordinal.

Our qualitative evaluation would be rooted in analysis of many dimensions of coverage. The Product Elements and Quality Criteria sections of the Heuristic Test Strategy Model provide a framework for generating coverage ideas or for reviewing our coverage retrospectively. We would review and discuss how much testing we’ve done of specific features, or particular functional areas, or perceived risks, and summarize our evaluation using a simple scale that would go something like this:

Level 0 (or X, or an empty circle, or…): We know nothing at all about this area of the product.

Level 1 (or C, or a glassy-eyed emoticon, or…): We have done a very cursory evaluation of this area. Smoke- or sanity-level; we’ve visited this feature and had a brief look at it, but we don’t really know very much about it; we haven’t probed it in any real depth.

Level 2 (or B, or a normal-looking emoticon, or…): We’ve had a reasonable look at this area, although we haven’t gone all the way deep. We’ve examined the common, the core, the critical, the happy paths, the handling of everyday errors or exceptions. We’re pretty familiar with this area. We’ve done the kind of testing that would expose some significant bugs, if they were there.

Level 3 (or A, or a determined-looking angel emoticon, or…): We’ve really kicked this area harshly and hard. We’ve looked at unusual and complex conditions or states. We’ve probed deeply for subtle or hidden bugs. We’ve exposed the product to the extreme, the exceptional, the rare, the improbable. We’ve looked for bugs that are deep in the corners or hidden in the dark. If there were a serious bug, we’re pretty sure we would have found it by now.

Strictly speaking, these numbers are placed on an ordinal scale, in the sense that Level 3 coverage is deeper than Level 2, which is deeper than Level 1. (If you don’t know about scales of measurement, you should learn about them before providing or asking for metrics. And there are some other things to look at.) The numbers are certainly not an interval scale, or a ratio scale. They may not be commensurate from one feature area to the next; that is, they may represent different notions of coverage, different amounts of effort, different modes of evaluation. By design, these numbers should not be treated as valid measurements, and we should make sure that everyone on the project knows it. They are little labels that summarize evaluations and product elements and effort, factors that must be discussed to be understood. But those discussions can lead to understanding and consensus between ourselves, our colleagues, and our clients.
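For anyone who keeps such a summary in a script or spreadsheet export, the following is one possible sketch of labels that can be ordered but not added or averaged. The member names (`UNKNOWN`, `CURSORY`, `REASONABLE`, `DEEP`), the area names, and the `areas_below` helper are all assumptions made up for illustration; they are not part of the Heuristic Test Strategy Model.

```python
from enum import Enum
from functools import total_ordering

@total_ordering
class Coverage(Enum):
    """Ordinal labels only: comparable, but deliberately not arithmetic."""
    UNKNOWN = 0      # Level 0: we know nothing about this area
    CURSORY = 1      # Level 1: smoke- or sanity-level look
    REASONABLE = 2   # Level 2: common, core, critical paths examined
    DEEP = 3         # Level 3: complex, extreme, and rare conditions probed

    def __lt__(self, other):
        if not isinstance(other, Coverage):
            return NotImplemented
        return self.value < other.value

# Hypothetical product areas and the team's agreed-upon labels.
report = {
    "login": Coverage.DEEP,
    "export": Coverage.CURSORY,
    "billing": Coverage.UNKNOWN,
}

def areas_below(report, level):
    """List the areas whose label is shallower than the given level."""
    return sorted(area for area, label in report.items() if label < level)

shallow = areas_below(report, Coverage.REASONABLE)
print("Worth a conversation:", shallow)
```

Using a plain `Enum` rather than integers is a deliberate choice here: `Coverage.DEEP > Coverage.CURSORY` works, but `Coverage.DEEP + Coverage.CURSORY` raises a `TypeError`, so nobody can compute an "average coverage of 1.33" by accident. The output is a list of areas to talk about, not a measurement.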

5 replies to “Is There a Simple Coverage Metric?”

  1. Very interesting take. Would you use those for goal setting as well? As in, “we reached level 1 of 3.” I am a bit torn between that being helpful and being dangerous. It could be helpful in terms of moving away from coverage discussions, but might be dangerous if combined with a temporal aspect, as in “you have two days to get to level 2.”

    Michael replies: As usual, figuring out where we are is not too hard, relatively speaking. Figuring out where we’re going, and how long it might take to get there, is much harder. (You can predict pretty much everything except the future, as the saying goes.) Mandating that we be somewhere at a certain time seems unreasonable to me.

    I’ve written about this here. (http://www.developsense.com/blog/2009/08/test-estimation-is-really-negotiation/). And here. (http://www.developsense.com/blog/2007/01/test-project-estimation-rapid-way/)

  2. Your “measuring a flat, rectangular earth” metaphor got me thinking about maps as ways of representing 3 dimensional data. Here are some thoughts.

    The kind of world map many of us might envision when first thinking of a world map is one using the Mercator type of projection, first developed in 1569. This could be compared to the common “metrics” that some people try to attach to software testing. Both simplify the data they’re trying to represent, are missing important dimensions, and may lead people to make potentially dangerous assumptions.

    In the case of the Mercator map, it makes North America appear to be a larger continent than Africa, when in fact the opposite is the case. In the case of commonly used metrics (e.g., coverage percentages), it may give a false sense of “completion” with regard to certain aspects of the product under test.

    These ideas are off the top of my head, so I expect that there are probably holes in the reasoning and/or my explanation.

  3. I really like the idea of a heuristic evaluation.

    Michael replies: I would hope we’d like that, since that’s what testing is.

    Anything that forces the tester to think is a good idea.

    Alas, I’ve found that nothing successfully forces people to think. But maybe encouraging or prompting or reminding people to think could help.

    I was once asked to evaluate the likelihood of defects occurring in production and my response was “wear two sets of underpants”. The boss was so impressed that he asked me to create a scale along the same lines. My evaluation was “rooted in analysis of many dimensions of coverage” (you need to be an Aussie to appreciate the use of that quote) and proved to be valid.

    Alas I’m not an Aussie, so I remain flummoxed as to what it means. Oh well. If it works in your culture…

    At other times, where test coverage was a meaningless metric, I simply made sure that every requirement was referenced by at least one test case. The first law of consulting is CYA (cover your arse), which of course is part of how to survive.

    Anyway, thanks for the blog. I will re-evaluate “those who can, do; those who can’t, teach” and those who have no idea write books.

    I haven’t written a book. However, I’ll do my best to figure out a way to interpret that as a compliment.

