The distinction between testing and checking got a big boost recently from James Bach at the Øredev conference in Malmö, Sweden. But a recent tweet by Brian Marick and a recent conversation with a colleague have highlighted an issue that I should probably address.
My colleague suggested that somehow I may have underplayed the significance or importance or the worth of checking. Brian’s tweet said,
“I think the trendy distinction between “testing” and “checking” is a power play: which would you preface with “mere”? http://bit.ly/2Cuyj”
As a consequence, I was worried that I had at some point said “mere checking” or “merely checking” in one of my blog postings or on Twitter, so I researched it. Apparently I had not; that was a relief. However, the fact that I was suspicious even of myself suggests that maybe I need to clarify something.
The distinction between testing and checking is a power play, but it’s not a power play between (say) testers and programmers. It’s a power play about the glorification of mechanizable assertions over human intelligence. It’s a power play between sapient and non-sapient actions.
Recall that the action of a check has three parts to it. Part one is an observation of a product. Part two is a decision rule, by which we can compare that empirical observation of the product with an idea that someone had about it. Part three is the setting of a bit (pass or fail, yes or no, true or false) that represents the non-sapient application of both the observation and the decision rule. Note, too, that this means that a check can be performed by one of two agencies: 1) a machine; or 2) a sufficiently disengaged human; that is, a human who has been scripted to behave like a machine, and who has for whatever reason accepted that assignment.
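The three parts can be made concrete with a minimal sketch. The `Check` class and the `add` function are hypothetical names invented for illustration; they don't come from any particular test framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """Hypothetical model of a check: observation, decision rule, bit."""
    name: str
    observe: Callable[[], object]       # part one: observe the product
    decide: Callable[[object], bool]    # part two: the decision rule

    def run(self) -> bool:
        observation = self.observe()
        # Part three: set the bit. No judgment is exercised here.
        return bool(self.decide(observation))


def add(a, b):
    """Stand-in for the product being checked (an assumption)."""
    return a + b


check = Check(
    name="add(2, 3) equals 5",
    observe=lambda: add(2, 3),
    decide=lambda seen: seen == 5,
)
result = check.run()
print(check.name, "->", "pass" if result else "fail")
```

Everything inside `run` is mechanical. Choosing what to observe and which decision rule to apply happened earlier, when a person wrote the two lambdas.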
So checks can be hugely important. Checks are a means by which a programmer, engaged in test-driven development, checks his idea. Creating the check and analyzing its result are both testing activities. Checks are a valuable product (a by-product, some would say) of test-driven development. Checks are change detectors, tools that allow programmers to refactor with confidence. Checks built into continuous integration are mechanisms to make sure that our builds can work well enough to be tested, or, if we’re confident enough in the prior quality of our testing, can work well enough to be deployed. Checks tend to shorten the loop between the implementation of an idea and the discovery of a problem that the checks can detect, since the checks are typically designed and run (a lot, iteratively) by the person doing the implementation. Checks tend to speed up certain aspects of the post-programmer testing of the product, since good checks will find the kind of dopey, embarrassing errors that even the best programmers can make from time to time. The need for checks sometimes (alas, not always) prompts us to create interfaces that can be used by programmers or testers to aid in later exploration.
Checking represents the rediscovery of techniques that were around at least as early as 1957. “The first attack on the checkout problem may be made before coding has begun.” (D. D. McCracken, Digital Computer Programming, 1957. Thanks to Ben Simo for inspiring me to purchase a copy of this book.) In 2007, I had dinner with Jerry Weinberg and Josh Kerievsky. Josh asked Jerry if he did a lot of unit testing back in the day. Jerry practically did a spit-take, saying “Yes, of course. Computer time was hugely expensive, but we programmers were cheap. Getting the program right was really important, so we had to test a lot.” Then he added something that hadn’t occurred to me. “There was another reason, too. Apart from everything else, we tested because the machinery was so unreliable. We’d run a test program, then run the program we wrote, then run the test program again to make sure that we got the same result the second time. We had to make sure that no tubes had blown out.”
So, in those senses, checking rocks. Checking has always rocked. It seems that in some places, people forgot how much it rocks, and that the Agilists have rediscovered it.
Yet it’s important to note that checks on their own don’t deliver value unless there’s sapient engagement with them. What do I mean by that?
As James Bach says here, “A sapient process is any process that relies on skilled humans.” Sapience is the capacity to act with human intelligence, human judgment, and some degree of human wisdom.
It takes sapience to recognize the need for a check—a risk, or a potential vulnerability. It takes sapience—testing skill—to express that need in terms of a test idea. It takes sapience—more test design skill—to express that test idea in terms of a question that we could ask about the program. Sapience—in terms of testing skill, and probably some programming skill—is needed to frame that question as a yes-or-no, true-or-false, pass-or-fail question. Sapience, in the form of programming skill, is required to turn that question into executable code that can implement the check (or, far more expensively and with less value, into a test script for execution by a non-sapient human). We need sapience—testing skill again—to identify an event or condition that would trigger some agency to perform the check. We need sapience—programming skill again—to encode that trigger into executable code so that the process can be automated.
Sapience disappears while the check is being performed. By definition, the observation, the decision rule, and the setting of the bit all happen without the cognitive engagement of a skilled human.
Once the check has been performed, though, skill comes back into the picture for reporting. Checks are rarely done on their own, so they must be aggregated. The aggregation is typically handled by the application of programming skill. To make the outcome of the check observable, the aggregated results must be turned into a human-readable report of some kind, which requires both testing and programming skill. The human observation of the report, intake, is by definition a sapient process. Then comes interpretation. The human ascribes meaning to the various parts of the report, which requires skills of testing and of critical thinking. The human ascribes significance to the meaning, which again takes testing and critical thinking skill. Sapient activity by someone (a tester, a programmer, or a product owner) is needed to determine the response. Upon deciding on significance, more sapient action is required: fixing the application being checked (by the production programmer); fixing or updating the check (by the person who designed or programmed the check); adding a new check (by whoever might want to do so); or getting rid of the check (by one or more people who matter, and who have decided that the check is no longer relevant).
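The aggregation and reporting steps can be sketched in a few lines. The functions `aggregate` and `render`, and the check names in the sample data, are hypothetical and invented for illustration.

```python
from collections import Counter


def aggregate(results):
    """Collapse (name, passed) pairs into counts plus a list of failures."""
    counts = Counter("pass" if passed else "fail" for _, passed in results)
    failed = [name for name, passed in results if not passed]
    return {"pass": counts["pass"], "fail": counts["fail"], "failed": failed}


def render(summary):
    """Turn the aggregated bits into something a person can read."""
    lines = [f"{summary['pass']} passed, {summary['fail']} failed"]
    lines += [f"  FAILED: {name}" for name in summary["failed"]]
    return "\n".join(lines)


# Hypothetical check outcomes; in practice these bits come from a check run.
results = [
    ("login accepts valid password", True),
    ("cart total includes tax", False),
]
summary = aggregate(results)
report = render(summary)
print(report)
```

The code stops at the printed report. Intake, interpretation, significance, and response are not in it, because those are the parts that a person has to supply.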
So: the check in and of itself is relatively trivial. It’s all that stuff around the check—the testing and programming and analysis activity—that’s important, supremely important. And as is usual with important stuff, there are potential traps.
The first trap is that it might be easy to do any of the sapient aspects of checking badly. Since the checks are at their core software, there might be problems in requirements, design, coding, or interpretation, just as there might be with any software.
The second trap is that it can be easy to fall asleep somewhere between the report and interpretation stages of the checking process. The green bar tells us that All Is Well, but we must be careful about that. “All is well with respect to the checks that we’ve programmed” is a very different statement. Red tends to get our attention, but green is an addictive and narcotic colour. A passing test is another White Swan, confirmation of our existing beliefs, proof by induction. Now, we can’t live without proof by induction, but induction can’t alert us to new problems. Millions of repeated tests, repeated thousands of times, don’t tell us about the bugs that elude them. We only need one Black Swan to bump into a devastating effect.
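A small sketch shows how a green bar can coexist with a bug. The `is_leap_year` function is a hypothetical product, written with a deliberate flaw for illustration.

```python
def is_leap_year(year):
    """Hypothetical product code with a deliberate bug: it ignores the
    rule that century years must also be divisible by 400."""
    return year % 4 == 0


# The checks that someone happened to think of.
checks = {
    "2024 is a leap year": is_leap_year(2024) is True,
    "2023 is not a leap year": is_leap_year(2023) is False,
}

all_green = all(checks.values())
print("green bar" if all_green else "red bar")

# A case that no check covers: 1900 was not a leap year,
# yet the product says it was, and the bar stays green.
unchecked = is_leap_year(1900)
print("is_leap_year(1900) ->", unchecked)
```

Both checks pass, and neither says anything about century years. Finding the 1900 case takes someone asking a question that the checks never asked.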
The third trap is that we might believe that checking a program is all there is to testing it. Checking done well incorporates an enormous amount of testing and programming skill, but some quality attributes of a program are not machine-decidable. Checks are the kinds of tests that aren’t vulnerable to the halting problem. Someone on a mailing list once said, “Once all the (automated) acceptance tests pass (that is, all the checks), we know we’re done.” I liked Joe Rainsberger’s reply, “No, you’re not done; you’re ready to give it to a real tester to kick the snot out of it.” That kicking is usually expressed with greater emphasis on exploration, discovery, and investigation, and rather less on confirmation, verification, and validation.
The fourth trap is a close cousin of the third trap: at certain points, we might pay undue attention to the value of checking with respect to its cost. Cost vs. value is a dominating problem with any kind of testing, of course. One of the reasons that the Agile emphasis on testing remains exciting is that excellent checking lowers the cost of testing, and both help to defend the value of the program. Yet checks may not be Just The Thing for some purposes. Joe has expressed concerns in his series Integrated Tests are a Scam, and Brian Marick did too, a while ago, in An Alternative to Business-Facing TDD. I think they’re both making important points here, thinking of checks as a means to an end, rather than as a fetish.
Fifth: upon noting the previous four traps (and others), we might be tempted to diminish the value of checking. That would be a mistake. Pretty much any program is made more testable by someone removing problems before someone else sees them. Every bug or issue that we find could trigger investigation, reporting, fixing, and retesting, and that gives other (and potentially more serious) problems time to hide. Checking helps to prevent those unhappy discoveries. Excellent checking (which incorporates excellent testing) will tend to reduce the number of problems in the product at any given time, and thereby results in a more testable program. James Bach points out that a good manual test could never be automated (he’d say “sapient” now, I believe). But note that in that same post he says that “if you can truly automate a manual test, it couldn’t have been a good manual test”, and “if you have a great automated test, it’s not the same as the manual test that you believe you were automating”. The point is that there are such things as great automated tests, and some of them might be checks.
So the power play is over which we’re going to value: the checks (“we have 50,000 automated tests”) or the checking. Mere checks aren’t important; but checking—the activity required to build, maintain, and analyze the checks—is. To paraphrase Eisenhower, with respect to checking, the checks are nothing; the checking is everything. Yet the checking isn’t everything; neither is the testing. They’re both important, and to me, neither can be appropriately preceded with “mere”, or “merely”.
There’s one exception, though: If you’re only doing one or the other, it might be important to say, “You’ve merely been testing the program; wouldn’t you be better off checking it, too?” or “That program hasn’t been tested; it’s merely been checked.”