Calculating a ratio of passing tests to failing tests is a relatively easy task. If it is used as a means of estimating the state of a development project, though, the ratio is invalid, irrelevant, and misleading. At best, if everyone ignores it entirely, it’s simply playing with numbers. Otherwise, producing a pass/fail ratio is irresponsible, unethical, and unprofessional.
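Just how easy? Here’s a minimal sketch in Python; the counts are invented for illustration:

```python
# A pass/fail ratio takes one line of arithmetic to compute.
# The counts here are made up; the point is how little work the
# number represents, not what it means.
passed, failed = 970, 30

pass_rate = passed / (passed + failed)   # 0.97
pass_fail_ratio = passed / failed        # roughly 32.3

print(f"pass rate: {pass_rate:.1%}")              # pass rate: 97.0%
print(f"pass/fail ratio: {pass_fail_ratio:.1f}")  # pass/fail ratio: 32.3
```

The ease of the calculation is part of its seduction; the trouble begins when anyone treats the output as information about the product.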
A passing test is no guarantee that the product is working correctly or reliably. Instead, a passing test is an observation that the program appeared to work correctly, under some set of conditions that we were conscious of (and many that we weren’t), using a selection of specific inputs (and not using the rest of an essentially infinite set), at some time (to which we will never return), on some machine (that was in a particular state at that time; we observed and understood only a fraction of that state), based on a handful of things that we were looking at (and a boatload of things that we weren’t looking at, not that we’d have any idea where or how to look for everything). At best, a passing test is a rumour of success. Take any of the parameters above, change one bit, and we could have had a failing test instead.
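To make that concrete, here’s a hypothetical sketch (both the function and the test are invented) of a test whose verdict silently depends on the clock of the machine that runs it:

```python
import datetime

def greeting(now=None):
    # Hypothetical product code: choose a greeting by the local hour.
    now = now or datetime.datetime.now()
    return "Good morning" if now.hour < 12 else "Good afternoon"

def test_greeting():
    # This test "passed" when someone ran it: in the morning, on a
    # machine whose clock and timezone nobody thought to record.
    # Run the identical test after noon and it fails, though neither
    # the product nor the test has changed.
    assert greeting() == "Good morning"
```

Nothing about the product changes between the pass and the fail; only one of the unrecorded conditions does.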
Meanwhile, a failing test is no guarantee of a failure in the product we’re testing. Someone may have misunderstood a requirement, and turned that misunderstanding into an inappropriate test procedure. Someone may have understood the requirement comprehensively, and erred in establishing the test procedure; someone else may have erred in following it. The platform on which we’re testing may be misconfigured, or there may be something wrong elsewhere on the system, such that our failing test points to that problem rather than to a problem in our product. If the test was assisted by automation, perhaps there was a bug in the automation. Our test tools may be misconfigured such that they’re not doing what we think they’re doing. When generating data, we may have misclassified invalid data as valid, or vice versa, without noticing. We may have inadvertently entered the wrong data. The timing of the test may be off, such that the system was not ready for the input we provided. There may be an as-yet-not-understood reason why the product is providing a result which seems incorrect to us, but which is in fact correct. A failing test is an allegation of failure.
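Here’s one of those cases in miniature; a hypothetical sketch in which the product honours the (imagined) requirement and the test does not:

```python
def is_eligible(age):
    # Hypothetical product code, faithful to the imagined spec:
    # customers aged 18 or older are eligible.
    return age >= 18

def test_eligibility_boundary():
    # A failing test that alleges a product bug. The defect is in the
    # test: its author read "18 or older" as "over 18", and encoded
    # that misunderstanding as the expected result.
    assert is_eligible(18) is False   # fails; the product is correct
```

The red mark in the test report looks exactly the same whether the problem lives in the product, the test, the tooling, or our understanding.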
When we do the math based on these observations, the unit of measurement in which pass/fail rates are expressed is rumours over allegations. Is this a credible unit of measurement?
Neither rumours nor allegations are things. Uncertainties are not units with a valid natural scale against which they can be measured. One entity that we call a “test case”, whether passing or failing, may consist of a single operation, observation, and decision rule. Another entity called “test case” may consist of hundreds or thousands or millions of operations, all invisible, with thousands of opportunities for a tester to observe problems based not only on explicit knowledge, but also on tacit knowledge. Measuring while failing to account for such clear differences between entities demolishes the construct validity of the measurement. Treating test cases, whether passing or failing, as though they were countable objects is a classic case of the reification fallacy. Aggregating scale-free, reified (non-)entities loses information about each instance, and about any relationships between them. Some number of rumours doesn’t tell us anything about the meaning, significance, or value of any given passing test; nor does the aggregate tell us anything about the coverage that the passing tests provide; nor does the number tell us about missing coverage. Some number of allegations of which we’re aware doesn’t tell us anything about the seriousness of those allegations, nor about undiscovered allegations. Dividing one invalid number by another invalid number doesn’t mean the invalidity cancels and produces a valid ratio.
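The scale problem is easy to demonstrate. In the hypothetical sketch below, each function would be tallied as exactly one “test case”, yet one performs a single comparison while the other performs a million element-level operations, each an opportunity to notice, or to miss, a problem:

```python
import random

def test_tiny():
    # One operation, one observation, one decision rule.
    assert sorted([2, 1]) == [1, 2]

def test_enormous():
    # A thousand randomized lists of a thousand elements each: a
    # million element-level operations, and still just "1" in a tally.
    rng = random.Random(42)
    for _ in range(1_000):
        data = [rng.randint(-1000, 1000) for _ in range(1_000)]
        result = sorted(data)
        # Two of the many checkable properties; plenty of others
        # (stability, permutation, performance...) go unchecked.
        assert all(a <= b for a, b in zip(result, result[1:]))
        assert len(result) == len(data)
```

Add the two outcomes together and you get “2 passed”, a figure that preserves nothing about the difference between them.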
When a student gets an answer wrong, and is thereby shown to be misinformed, there’s a problem. What does the number of questions that the teacher asked have to do with it? When a manager interviews a candidate for a job, and halfway through the interview the candidate suddenly starts shouting obscenities at her, will the number of questions she asked have anything to do with her hiring decision? If the battery on the Tesla Roadster is ever completely drained, the car turns into a brick with a $40,000 bill attached to it. Does anyone, anywhere, care about the number of passing tests that were done on the car?
If we are asked to produce pass/fail ratios, I would argue that it’s our professional responsibility to politely refuse to do it, and to explain why: we should not be offering our clients the service of self-deception and illusion, nor should our clients accept those services. The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to a number means reducing its relationship with people to a number. By extension, that means reducing people to numbers too. So to irresponsible, unethical, and unprofessional, we can add unscientific and inhumane.
So what’s the alternative? We’ll get to that tomorrow.