Blog: Testing vs. Checking

This posting is an expansion of a lightning talk that I gave at Agile 2009. Many thanks to Arlo Belshee and James Shore for providing the platform. Many thanks also to the programmers and testers at the session for the volume of support that the talk and the idea received. Special thanks to Joe (J.B.) Rainsberger. Spread the meme!

Postscript: Over the years, some people have misinterpreted this post as a rejection of checking, or of regression testing, or of testing that is assisted by automation. So in addition to reading this post, it is important that you also read this one.

Post-postscript: This is now officially an old document. In the years since it was posted, I’ve further refined my take on the subject of testing and checking, mostly in collaboration with my colleague James Bach. Our current thinking on the topic appears on his blog, and I provide some follow-up here. We’ve also benefitted from comments and questions from other colleagues, so we encourage you to read the comments, and to comment yourself. (April 2013)

There is confusion in the software development business over a distinction between testing and checking. I will now attempt to make the distinction clearer.

Checking Is Confirmation

Checking is something that we do with the motivation of confirming existing beliefs. Checking is a process of confirmation, verification, and validation. When we already believe something to be true, we verify our belief by checking. We check when we’ve made a change to the code and we want to make sure that everything that worked before still works. When we have an assumption that’s important, we check to make sure the assumption holds. Excellent programmers do a lot of checking as they write and modify their code, creating automated routines that they run frequently to check to make sure that the code hasn’t broken. Checking is focused on making sure that the program doesn’t fail.
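
To make that concrete, here is a minimal sketch of an automated check, using Python’s built-in unittest module. The add() function and the expected value are invented for illustration; the point is that the routine encodes an existing belief, and can be re-run mechanically after every change to confirm that the belief still holds.

    import unittest

    def add(a, b):
        # The code under check: we already believe this works.
        return a + b

    class AddChecks(unittest.TestCase):
        def test_small_integers(self):
            # Confirms an existing expectation; a pass tells us that
            # nothing we rely on here has broken since the last run.
            self.assertEqual(add(2, 3), 5)

    if __name__ == "__main__":
        unittest.main()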

Testing Is Exploration and Learning

Testing is something that we do with the motivation of finding new information. Testing is a process of exploration, discovery, investigation, and learning. When we configure, operate, and observe a product with the intention of evaluating it, or with the intention of recognizing a problem that we hadn’t anticipated, we’re testing. We’re testing when we’re trying to find out about the extents and limitations of the product and its design, and when we’re largely driven by questions that haven’t been answered or even asked before. As James Bach and I say in our Rapid Software Testing classes, testing is focused on “learning sufficiently everything that matters about how the program works and about how it might not work.”

Checks Are Machine-Decidable; Tests Require Sapience

A check provides a binary result—true or false, yes or no. Checking is all about asking and answering the question “Does this assertion pass or fail?” Such simple assertions tend to be machine-decidable and are, in and of themselves, value-neutral.

A test has an open-ended result. Testing is about asking and answering the question “Is there a problem here?” That kind of decision requires the application of many human observations combined with many value judgements.

When a check passes, we don’t know whether the program works; we only know that it’s still working within the scope of our expectations. The program might have serious problems, even though the check passes. To paraphrase Dijkstra, “checking can prove the presence of bugs, but not their absence.” Machines can recognize inconsistencies and problems that they have been programmed to recognize, but not new ones. Testing doesn’t tell us whether the program works either—certainty on such questions isn’t available—but testing may provide the basis of a strong inference addressing the question “problem or no problem?”
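
A hypothetical illustration of that limitation: the check below passes, yet the function still harbours a problem, because the check exercises only the inputs we thought of when we wrote it.

    def mean(values):
        # Broken for an empty list (division by zero), but the check
        # below never asks about that input.
        return sum(values) / len(values)

    # This check passes -- "green" -- yet the program can still fail.
    assert mean([2, 4, 6]) == 4

    # A question nobody has asked yet would reveal the problem:
    # mean([]) raises ZeroDivisionError.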

Testing is, in part, the process of finding out whether our checks have been good enough. When we find a problem through testing, one reasonable response is to write one or more checks to make sure that that particular problem doesn’t crop up again.

Whether we automate the process or not, if we could express our question such that a machine could ask and answer it via an assertion, it’s almost certainly checking. If it requires a human, it’s a sapient process, and is far more likely to be testing. In James Bach’s seminal blog entry on sapient processes, he says, “My business is software testing. I have heard many people say they are in my business, too. Sometimes, when these people talk about automating tests, I think they probably aren’t in my business, after all. They couldn’t be, because what I think I’m doing is very hard to automate in any meaningful way. So I wonder… what the heck are they automating?” I have an answer: they’re automating checks.

When we talk about “tests” at any level at which we delegate the pass-or-fail decision to the machine, we’re talking about automated checks. I propose, therefore, that those things that we usually call “unit tests” be called “unit checks”. By the same token, I propose that automated acceptance “tests” (of the kind Ron Jeffries refers to in his blog post on automating story “tests”) become known as automated acceptance checks. These proposals appeared to galvanize a group of skilled programmers and testers in a workshop at Agile 2009, something about which I’ll have more to say in a later blog post.

Testing Is Not Quality Assurance, But Checking Might Be

You can assure the quality of something over which you have control; that is, you can provide some level of assurance that it fulfills some requirement, and you can accept responsibility if it does not fulfill that requirement. If you don’t have authority to change something, you can’t assure its quality, although you can evaluate it and report on what you’ve found. (See pages 6 and 7 of this paper, in which Cem Kaner explains the distinction between testing and quality assurance and cites Johanna Rothman’s excellent set of questions that help to make the distinction.) Testing is not quality assurance, but acts in service to it; we supply information to programmers and managers who have the authority to make decisions about the project.

Checking, when done by a programmer, is mostly a quality assurance practice. When a programmer writes code, he checks his work. He might do this by running it directly and observing the results, or by observing the behaviour of the code under the debugger, but often he writes a set of routines that exercise the code and perform some assertions on it. We call these unit “tests”, but they’re really checks, since the idea is to confirm existing knowledge. In this context, finding new information would be considered a surprise, and typically an unpleasant one. A failing check prompts the programmer to change the code to make it work the way he expects. That’s the quality assurance angle: a programmer helps to assure the quality of his work by checking it.

Testing, the search for new information, is not a quality assurance practice per se. Instead, testing informs quality assurance. Testing, to paraphrase Jerry Weinberg, is gathering information with the intention of informing a decision, or as James Bach says, “questioning a product in order to evaluate it.” Evaluation of a product doesn’t assure its quality, but it can inform decisions that will have an impact on quality. Testing might involve a good deal of checking; I’ll discuss that at more length below.

Checkers Require Specifications; Testers Do Not

A tester, as Jerry Weinberg said, is “someone who knows that things can be different”. As testers, it’s our job to discover information; often that information is in terms of inconsistencies between what people think and what’s true in reality. (Cem Kaner’s definition of testing covers this nicely: “testing is an empirical, technical investigation of a product, done on behalf of stakeholders, with the intention of revealing quality-related information of the kind that they seek.”)

We often hear old-school “testing” proponents claim that good testing requires specifications that are clear, complete, up-to-date, and unambiguous. (I like to ask these people, “What do you mean by ‘unambiguous’?” They rarely take well to the joke. But I digress.) A tester does not require the certainty of a perfect specification to make useful observations and inferences about the product. Indeed, the tester’s task might be to gather information that exposes weakness or ambiguity in a specification, with the intention of providing information to the people who can clear it up. Part of the tester’s role might be to reveal problems when the plans for the product and the implementation have diverged at some point, even if part of the plan has never been written down. A tester’s task might be to reveal problems that occur when our excellent code calls buggy code in someone else’s library, for which we don’t have a specification. Capable testers can deal easily with such situations.

A person who needs a clear, complete, up-to-date, unambiguous specification to proceed is a checker, not a tester. A person who needs a test script to proceed is a checker, not a tester. A person who does nothing but compare a program against some reference is a checker, not a tester.

Testing vs. Checking Is A Leaky Abstraction

Joel Spolsky has named a law worthy of the general systems movement, the Law of Leaky Abstractions (“All non-trivial abstractions, to some degree, are leaky.”). In the process of developing a product, we might alternate very rapidly between checking and testing. The distinction between the two lies primarily in our motivations. Let’s look at some examples.

  • A programmer who is writing some new code might be exploring the problem space. In her mind, she has a question about how she should proceed. She writes an assertion—a check. Then she writes some code to make the assertion pass. The assertion doesn’t pass, so she changes the code. The assertion still doesn’t pass. She recognizes that her initial conception of the problem was incomplete, so she changes the assertion, and writes some more code. This time the check passes, indicating that the assertion and the code are in agreement. She has an idea to write another bit of code, and repeats the process of writing a check first, then writing some code to make it pass. She also makes sure that the original check passes. Next, she sees the possibility that the code could fail given a different input. She believes it will succeed, but writes a new check to make sure. It passes. She tries different input. It fails, so she has to investigate the problem. She realizes her mistake, and uses her learning to inform a new check; then she writes functional code to fix the problem and pass the check.

    So far, her process has been largely exploratory. Even though she’s been using checks to support the process, her focus has been on learning, exploring the problem space, discovering problems in the code, and investigating those problems. In that sense, she’s testing as she’s programming. At the end of this burst of development, she now has some functional code that will go into the product. As a happy side effect, she has another body of code that will help her to check automatically for problems if and when the functional code gets modified. (A minimal sketch of such a cycle appears after this list.)

    Mark Simpson, a programmer that I spoke to at Agile 2009, said that this cyclic process is like bushwhacking, hacking a new trail through the problem space. There are lots of ways that you could go, and you clear the bush of uncertainty around you in an attempt to get to where you’re going. Historically, this process has been called “test-driven development”, which is a little unfortunate in that TDD-style “tests” are actually checks. Yet it would be hard, and even a little unfair, to argue that the overall process is not exploratory to a significant degree. Programmers engaged in TDD have a goal, but the path to the goal is not necessarily clear. If you don’t know exactly where you’re going to end up and exactly how you’re going to get there, you have to do some amount of exploration. The moniker “behavior-driven development” (BDD) helps to clear up the confusion to some degree, but it’s not yet in widespread adoption. BDD uses checks in the form “(The program) should…”, but the development process requires a lot of testing of the ideas as they’re being shaped.
  • Now our programmer looks over her code, and realizes that one of the variables is named in an unclear way, that one line of code would be more readable and maintainable expressed as two, and that a group of three lines could more elegantly and clearly expressed as a for loop. She decides to refactor. She addresses the problems one at a time, running her checks after each change. Her intention in running these checks is not to explore; it’s confirm that nothing’s been messed up. She doesn’t develop new checks; she’s pretty sure the old ones will do. At this point, she’s not really testing the product; she’s checking her work.
  • Much of the traditional “testing” literature suggests that “testing” is a process of validation and verification, as though we already know how the code should work. Although testing does involve some checking, a program that is only checked is likely to be poorly tested. Much of the testing literature focuses on correctness—which can be checked—and ignores the sapience that is necessary to inform deeper questions about value, which must be tested. For example, that which is called “boundary testing” is usually boundary checking.

    The canonical example is that of the program that adds two two-digit integers, where values in the range from -99 to 99 are accepted, and everything else is rejected. The classic advice on how to “test” such a program focuses on boundary conditions, given in a form something like this: “Try -99 and 99 to verify that valid values are accepted, and try -100 and 100 to verify that invalid values are rejected.” I would argue that these “tests” are so weak as to be called checks; they’re frightfully obvious, they’re focused on confirmation, they focus on output rather than outcome, and they could be easily mechanized. (A sketch of just how easily appears after this list.)

    If you wanted to test a program like that, you’d configure, operate, and observe the product with eyes open to many more risks, including ones that aren’t at the forefront of your consciousness until a problem manifests itself. You’d be prepared to consider anything that might threaten the value of the product—problems related to performance, installability, usability, testability, and many other quality criteria. You’d tend to vary your tests, rather than repeating them. You’d engage curiosity, and perform a smattering of tests unrelated to your current models of risks and threats, with the goal of recognizing unanticipated risks. You might use automation to assist your exploration; perhaps you would use automation to generate data, to track coverage, to parse log files, to probe the registry or the file system for unanticipated effects. Even if you used automation to punch the keys for you, you’d use the automation in an exploratory way; you’d be prepared to change your line of investigation and your tactics when a test reveals surprising information.
  • The exploratory mindset is focused on questions like “What if…?” “I wonder…?” “How does this thing…?” “What happens when I…?” Even though we might be testing a program with a strongly exploratory approach, we will engage a number of confirmatory kinds of ideas. “If I press on that Cancel button, that dialog should go away.” “That field is asking for a U.S. ZIP code; the field should accept at least five digits.” “I’ll double-click on ‘foo.doc’, and that file should open in Microsoft Word on this system.” Excellent testers hold these and dozens of other assumptions and assertions as a matter of course. We may not even be conscious of them being checks, but we’re checking subconsciously as we explore and learn about the program. Should one of these checks fail, we might be prompted to seek new information; or, if the behaviour seems reasonable, we might instead change our model of how the program is supposed to work. That’s a heuristic process (a fallible means of solving a problem or making a decision, conducive to learning; we presume that a heuristic usually works but that it might fail).
  • Also at Agile 2009, Chris McMahon gave a presentation called “History of a Large Test Automation Project using Selenium”. He described an approach of using several thousand automated checks (he called them “tests”) to find problems in the application. How to describe the difference? Again, the difference is one of motivation. If you’re running thousands of automated checks with the intention of demonstrating that you’re okay today, just like you were yesterday, you’re checking. You could use those automated checks in a different way, though. If you are trying to answer new questions, such as “what would happen if we ran our checks on thirty machines at once to really pound the server?” (where we’re focused on stress testing), or “what would happen if we ran our automated checks on this new platform?” (where we’re focused on compatibility testing), or “what would happen if we were to run our automated checks 300 times in a row?” (where we’re focused on flow testing), the category of checks would be leaking into testing (which would be a fine thing in these cases). A sketch of that kind of repurposing appears after this list.
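
As promised in the first item above, here is a minimal, hypothetical sketch of that kind of check-first cycle. The word_count() function and its cases are invented; the comments narrate the exploratory steps that might have produced them.

    def word_count(text):
        # The functional code, arrived at after several check-driven revisions.
        return len(text.split())

    # Written before the code, to ask "how should this behave?"
    assert word_count("hello world") == 2

    # Added when a suspected edge case came to mind. An earlier
    # implementation failed here, prompting investigation and a revised
    # understanding: split() with no arguments collapses runs of whitespace.
    assert word_count("  hello   world  ") == 2

    # The programmer believed the empty string would work, and wrote a
    # check to make sure. It passes, and now guards against regression.
    assert word_count("") == 0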
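
And here are the boundary “checks” from the third item, mechanized. The accepts() function is a hypothetical stand-in for the canonical two-digit adder’s input validation; notice how little sapience the four classic boundary cases require once they are written down.

    def accepts(value):
        # Hypothetical validation rule for the canonical example:
        # integers from -99 to 99 are accepted, everything else rejected.
        return -99 <= value <= 99

    # The classic boundary "tests", expressed as machine-decidable checks.
    assert accepts(-99) and accepts(99)            # valid boundary values
    assert not accepts(-100) and not accepts(100)  # invalid boundary values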
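
Finally, a sketch of the repurposing described in the last item: the same checks pointed at a new question. run_all_checks() is a hypothetical stand-in for a whole suite; the interesting change is the motivation, from confirming yesterday’s behaviour to investigating what happens under concurrent load.

    import concurrent.futures

    def run_all_checks(client_id):
        # Hypothetical stand-in for a suite of automated checks that
        # exercises a server; returns True if every check passed.
        return True

    # Thirty suites at once: the question is no longer "do the checks
    # pass?" but "what does the server do when we really pound it?"
    with concurrent.futures.ThreadPoolExecutor(max_workers=30) as pool:
        results = list(pool.map(run_all_checks, range(30)))

    print(f"{sum(results)} of {len(results)} concurrent runs passed")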

There will be much, much more to say about testing vs. confirmation in the days ahead. I can guarantee that people won’t adopt this distinction across the board, nor will they do it overnight. But I encourage you to consider the distinction, and to make it explicit when you can.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds
