And now for the immodest part of the EuroSTAR 2009 Test Lab report: I won the Best Bug award, although it’s not clear to me which bug got the nod, since I reported several fairly major problems.
I tested OpenEMR. For me, one candidate for the most serious problem would have been a consistent pattern of inconsistency in input handling and error checking. I observed over a dozen instances of some kind of sloppiness.
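To make the pattern concrete, here is a minimal sketch of the kind of per-field constraint checking that seemed to be missing: limit length, restrict the character set, and confirm that dates parse and fall within a plausible range. This is my illustration, in Python for brevity (OpenEMR itself is written in PHP), and the field names and limits are invented for the example, not taken from the product.

```python
import re
from datetime import datetime

# Hypothetical constraints for illustration; not OpenEMR's actual rules.
MAX_NAME_LENGTH = 64
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z '\-]*$")


def validate_patient_name(name: str) -> bool:
    """Reject overlong input and characters outside the expected set."""
    return 0 < len(name) <= MAX_NAME_LENGTH and bool(NAME_PATTERN.match(name))


def validate_birth_date(text: str) -> bool:
    """Accept only parseable ISO dates within a plausible range."""
    try:
        date = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return datetime(1880, 1, 1) <= date <= datetime.now()
```

Checks like these are cheap to write once, which is part of what made their consistent absence across so many fields noteworthy.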
This reminded me of a problem that we testers often see in project work, the problem of measuring by counting things—counting bugs, counting bug reports, counting requirements. When the requirement is to defend the application against overflowing text fields and vulnerability to input constraint attacks by hackers, how should we count? How many mentions of that should there be? One, in a statement of general principles at the beginning of a requirements document? Hundreds, in a statement of specific purpose for each input field in a functional specification? How many requirements are there to make sure that fields don’t overflow? How many requirements that they support only the characters, numbers, or date ranges that they’re supposed to? What about traceability? If this is a genuine problem, and the requirements documents don’t mention a particular requirement explicitly, should we refrain from reporting on a problem with that implicit requirement?
When I report an issue—for example, that practically all of the input fields in OpenEMR have some kind of problem with them—should that count as one bug report? Since it applies to hundreds of fields, should it count as hundreds of bug reports? When such a pervasive overall problem exists, should the tester make a report for each and every field in which he observes a problem? And if you want to answer Yes to that question: is it worth the opportunity cost to do that when there are plenty of other problems in the product?
So again, there were so many instances of unconstrained and unchecked input that I stopped recording specifics and instead reported a general pattern in the bug tracking system. My decision to do this was an instance of the Dead Horse stopping heuristic; reporting yet another instance of the same class of problem would be like flogging a dead horse. I could have wasted a lot of time and energy reporting each instance of each problem I observed, along with specific symptoms and possible ramifications of each one. Yet I’m very skeptical that this would serve the project well. In my experience as a program manager for a product whose code was being developed outside our company, I found that there was steadily diminishing return in value for many reports of the same ilk. When, in testing, we identified a general pattern of failure, we stopped looking for more instances. We sent the product back to the development shop, and required the programmers and their testers to review the product through-and-through for that kind of problem.
If I were to be evaluated on the number of bugs that I found, I’d find it hard to resist the easy pickings of yet another input constraint attack bug report. Yet when I’m testing, every moment of bug investigation and reporting is, by some reckoning, another moment that I can’t spend on obtaining more test coverage (more about that here). By focusing on investigating and reporting on input problems (and thereby increasing my bug count), am I missing opportunities to design and perform tests on scheduling conflict-resolution algorithms, workflows, database integrity,…?
There were two other fairly serious problems that I observed. One was that the Chinese version of the product showed a remarkable number of English words, presumably untranslated, interspersed among the ideograms; I expected to see no English at all. I treated that problem in the same way as the input constraint problem: with a single report of a general problem.
The second serious problem was that searches of various kinds would place a link in the address bar. The link represented a command to a CGI script of some kind, which evidently constructed and forwarded a query to an underlying SQL database. Backspacing over the last digit in the address bar and replacing it with a slash caused a lovely SQL error message to appear on the screen, unhandled by any of OpenEMR’s code. The message could have been used, said our local product owner, to expose the structure of the database to snoops or hackers. I found that problem by a defocusing heuristic—looking at the browser, rather than the browser window.
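The bug suggests two defences were absent, and a sketch may help show what they look like: pass user input to the database as a bound parameter rather than splicing it into the SQL text, and catch database errors on the server so the raw message never reaches the browser. Again this is my illustration in Python with SQLite (OpenEMR itself uses PHP and MySQL), and the table and function names are invented for the example.

```python
import sqlite3


def search_patients(conn: sqlite3.Connection, last_name: str) -> list:
    """Search by last name without exposing SQL errors to the client."""
    try:
        cur = conn.execute(
            "SELECT id, last_name FROM patients WHERE last_name = ?",
            (last_name,),  # bound parameter: the value cannot alter the query
        )
        return cur.fetchall()
    except sqlite3.Error:
        # Log details server-side; return only a generic, empty result to
        # the client, so the database structure isn't exposed to snoops.
        return []
```

With the parameter bound, editing the value in the address bar can make the search fail, but it cannot rewrite the query, and the error handler keeps the database's own complaints off the screen.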
I don’t know which of these problems took Best Bug honours. I’m not sure that the presenters specified which bug they were crediting with Best Bug. That makes a certain kind of sense, since I can’t tell which of these problems is the most serious either. After all, a problem isn’t its own thing; it’s a relationship between a person and a product or a situation. There are plenty of ways to address a problem. You could fix the product or the situation. You could change the perspective or the perception of the person observing the problem, say by keeping the problem as it is but providing a workaround. You could choose to ignore the problem yourself, which underscores the fact that a problem for some person might not be a problem for you. That’s why it’s not helpful to count problems.
Managers: do you see how evaluating testers based on test cases or bug counts, rather than the value of reporting, will lead to distortion at best, and more likely to dysfunction? Do you see how providing overstructured test scripts or test cases could reduce the diversity—and therefore the quality—of testing? Do you see how the notion of “one test per requirement” or “one positive and one negative test per requirement” is misleading?
Testers: do you see how being evaluated on bug counts could lead to inattentional blindness with respect to problems more serious than the low-hanging fruit affords? Do you see how focusing on bugs, rather than on test coverage, could reduce the value of your testing?
Instead of counting things, let’s consider evaluating testing work in a different way. Let’s consider the overall testing story and its many dimensions. Let’s think about the story around each bug, and each bug report—not just the number of reports, but the meaning and significance of each one. Let’s look at the value of the information to stakeholders, primarily to programmers and to product owners. Let’s think about the extent to which the tester makes things easier for others on the team, including other testers. Let’s look at the diversity of problems discovered, the diversity of approaches used, and the diversity of tools and techniques applied. And rather than using this information to reward or punish testers, let’s use it to guide coaching, mentoring, and training such that the focus is on developing skill for everyone.
The dimensions above are qualitative, rather than quantitative. Yet if our mission is to provide information to inform decisions about quality, we of all people should recognize that expressing value in terms of numbers often removes important information rather than adding it.
Further reading:

Measuring and Managing Performance in Organizations (Robert D. Austin)
Software Engineering Metrics: What Do They Measure and How Do We Know? (Kaner and Bond)
Quality Software Management, Vol. 2: First Order Measurement (Weinberg)
Perfect Software (and Other Illusions About Testing) (Weinberg)