
Tests vs. Checks: The Motive for Distinguishing

The word “criticism” has several meanings and connotations. To criticize, these days, often means to speak reproachfully of someone or something, but criticism isn’t always disparaging. Way, way back when, I studied English literature, and read the work of many critics. Literary critics and film critics aren’t people who merely criticize, as we use the word in common parlance. Instead, the role of the critic is to contextualize—to observe and evaluate things, to shine light on some work so as to help people understand it better.

So when I say that Dale Emery is a critic, that’s a compliment. On the subject of testing vs. checking, Dale recently remarked to me, “I think I understand the distinction. I don’t yet understand what problem you’re trying to solve with your specific choice of terminology. Not the implications, but the problem.” That’s an excellent critical statement, in that Dale is not disparaging, but he’s trying to tell me something that I need to recognize and deal with.

My answer is that sometimes having different vocabulary allows us to recognize a problem and its solution more easily. As Jerry Weinberg says, “A rose by any other name should smell as sweet, yet nobody can seriously doubt that we are often fooled by the names of things.” (An Introduction to General Systems Thinking, p. 74). He also says “If we have limited memories, decomposing a system into noninteracting parts may enable us to predict behavior better than we could without the decomposition. This is the method of science, which would not be necessary were it not for our limited brains.” (ibid, p. 134).

The problem I’m trying to address, then, is that the word test lumps a large number of concepts into a single word, and testing lumps a similarly large number of activities together. As James Bach suggests, compiling is part of the activity of programming, yet we don’t mistake compiling for programming, nor do we mistake the compiler for the programmer.

If we have a conceptual item called a check, or an activity called checking, I contend that we suddenly have a new observational state available to us, and new observations to be made. That can help us to resolve differences in perception or opinion. It can help us to understand the process of testing at a finer level of detail, so that we can make better decisions about strategy and tactics.

In the Agile 2009 session, “Brittle and Slow: Replacing End-To-End Testing”, Arlo Belshee and James Shore took this as a point of departure:

End-to-end tests appear everywhere: test-driven development, story-test-driven development, acceptance testing, functional testing, and system testing. They’re also slow, brittle, and expensive.

This was confusing to me. My colleague Fiona Charles specializes in end-to-end system testing for great big projects. The teams that she leads are fast, compared to others that I’ve seen. Their tests are painstaking and detailed, but they’re flexible and adaptable, not brittle.

During the session, one person (presumably a programmer, but maybe not) said, “Manual testing sucks.” There was a loud murmur of agreement from both the testers and the programmers in the room.

I thought that was strange too. I love manual testing. I like operating the product interactively and making observations and evaluations. I like pretending that I’m a user of the program, with some task to accomplish or some problem to solve. I like looking at the program from a more analytical perspective, too—thinking about how all the components of the product interact with one another, and where the communication between them might be vulnerable if distorted or disturbed or interrupted. I like playing with the data, trying to figure out the pathological cases where the program might hiccup or die on certain inputs. In my interaction with the program, I discover lots of things that appear to be problems. Upon making such a discovery, I’m compelled to investigate it. As I investigate it, sometimes I find that it’s a real problem, and sometimes I find that it isn’t. In this process, I learn about the system, about the ways in which it can work and the ways in which it might fail. I learn about my preconceptions, which are sometimes right and sometimes wrong. As I test, I recognize new risks, whereupon I realize new test ideas. I act on those test ideas, often right away. (By the way, I’m trying to get out of the habit of calling this stuff manual testing; I’m learning to call it sapient testing, because it’s primarily the eyes and the brain, not the hands, that are doing the work.) Whatever you call it, manual testing doesn’t suck; it rocks.

So are the programmer in question and all the people who murmured agreement ignorant? That seems unlikely. They’re smart people, and they know tons about software development. Are they wrong? Well, that’s a value judgment, but it would seem to me that as smart people who solve problems for a living, it would be very surprising if they weren’t engaged by exploration and discovery and investigation and learning. So there must be another explanation.

Maybe when they’re talking about manual testing, they’re talking about something else. Maybe they’re talking about behaving like an automaton and precisely following a precisely described set of steps, the last of which is to compare some output of the program to a predicted, expected value. For a thinking human, that process is slow, and it’s tedious, and it doesn’t really engage the brain. And in the end, almost all the time, all we get is exactly what we expected to get in the first place.
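To make the contrast concrete, here’s a minimal sketch of what such a check looks like once we hand it to a machine. It’s Python, in the xUnit style; the invoice_total function is hypothetical, standing in for whatever output of the program we’re comparing to a predicted value:

```python
import unittest


# Hypothetical function standing in for "some output of the program";
# any product behaviour with a predictable result would do.
def invoice_total(prices, tax_rate):
    return round(sum(prices) * (1 + tax_rate), 2)


class InvoiceCheck(unittest.TestCase):
    def test_total_with_tax(self):
        # The check: follow the prescribed steps, then compare the output
        # to a single predicted, expected value. No observation or
        # evaluation happens beyond this one comparison.
        result = invoice_total([10.00, 20.00], tax_rate=0.13)
        self.assertEqual(result, 33.90)


if __name__ == "__main__":
    unittest.main()
```

The machine performs this comparison quickly and tirelessly, but it reports on exactly one anticipated fact; anything else that might have been noticed along the way goes unobserved.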

So if that’s what they’re talking about, I agree with them. Therefore: if we’re going to understand each other more clearly, it would help to make the distinction between some kinds of manual testing and other kinds. The thing that we don’t like, that apparently none of us likes, is manual checking.

Maybe Arlo and James were talking about end-to-end system checks being brittle and slow. Maybe it’s integration checks, rather than integration tests, that are a scam, as Joe (J.B.) Rainsberger puts it here, here, here, and here.

So having a handle for a particular concept may make it easier for us to make certain observations and to carry on certain conversations. This leads to a long list:

  • If we can differentiate between manual testing and manual checking, we might be more specific about what, specifically, sucks.
  • If we can comprehend the difference between automated checks and other forms of tool use, we can understand the circumstances in which one might be more valuable than the other.
  • If we tease out the elements of developing, performing, and evaluating a check (as I attempted to do here) we might better see specific opportunities for increasing value or reducing cost.
  • If we can recognize when we’re checking specific facts, we can better recognize the opportunity to hand the work over to a machine.
  • If we can recognize that we’re spending inordinate amounts of time and money preparing scripts directing outsourced testers in other countries to check, without giving them the freedom and responsibility to perform the rest of the test, we can recognize a waste of energy, time, money, and human potential. Testers are capable of so much more than merely checking. (We might also detect the odour of immorality in asking people in developing countries to behave like machines, and instead consider giving them the opportunity and the mandate to learn things about the product—things in which we might be very interested.)
  • If we can recognize that checking alone doesn’t yield new information, we can better recognize the need to de-emphasize checking and emphasize testing when that’s appropriate.
  • If we can recognize when testing is pointing us to areas of the product that appear to be vulnerable to breakage, we might choose to emphasize inserting more and/or better checks, so as to draw our attention to breakage should it occur (“change detectors”, as Cem Kaner calls them).
  • If we can distinguish between testing and checking, we can penetrate “the illusion that software systems are simple enough to define all the checks before any code is written”, as my colleague Ben Simo recently pointed out—never mind all the tests.
  • When someone asks, “Why didn’t testing find that bug when we spent all that money on all those automation tools?”, maybe we can point to the fact that the tools foster checking far more than they foster testing.
  • Maybe we can recognize that checking tends to be helpful in alerting us to bugs that we can anticipate, but not so helpful at finding problems that we didn’t anticipate. For that we need testing. Or, alas, sometimes, accidental discovery.
  • Maybe we’d be able to recognize that testing (but not checking) can reveal information on novel ways of using the product, information that can add to the perceived value of the product.
  • When someone asks, “Can’t we hire pretty much any programmer to write our test automation code?”, we can point out that the quality of checking is conditioned largely by the quality of the testing work that surrounds it, and emphasize that creating excellent checks requires excellent testing skill, in addition to programming skill.
  • If we’re interested in improving the efficiency and capacity of the test group, we can point out that test automation is far more than just check automation. Test automation is, in James Bach’s way of putting it, any use of tools to support testing. Testing tools help us to generate test data; to probe the internals of an application or an operating system; to produce oracles that use a different algorithm to produce a comparable result; to produce macros that automate a long sequence of actions in the application so that the tester can be quickly delivered to a place to start exploring and testing; to rapidly configure or reconfigure the application; to parse, sort, and search log files; to produce blink oracles for blink testing. (A sketch of one such tool, with no check in it at all, follows this list.)
  • When a programmer says to a tester, “You should only test this stuff; here are the boundary conditions,” the tester can respond “I will check that stuff, but I’m also going to test for boundary conditions that you might not have been aware of, or that you’ve forgotten to tell me about, and for other possible problems too.”
  • When we see a test group that is entirely focused on confirming that a product conforms to some requirements document, rather than investigating to discover things that might threaten the value of the product to its users, we can point out that they may be checking, but they’re not testing.
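To illustrate the tool-support point from the list above, here’s a minimal sketch of a testing tool that contains no check at all. The log-line format and the file name are hypothetical; the point is that the tool condenses a log so that a human tester can notice surprises, rather than comparing anything to a predicted result:

```python
import re
from collections import Counter

# Hypothetical log-line format: "2009-08-31 12:00:01 ERROR payment timed out"
LOG_LINE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def summarize_log(path):
    """Tally log levels and collect error messages so that a human tester
    can scan for surprises. There are no assertions and no expected
    values here; this is a tool that supports testing, not a check."""
    levels = Counter()
    errors = []
    with open(path) as log:
        for line in log:
            match = LOG_LINE.match(line.strip())
            if not match:
                continue  # a malformed line might itself be worth investigating
            timestamp, level, message = match.groups()
            levels[level] += 1
            if level in ("ERROR", "FATAL"):
                errors.append((timestamp, message))
    return levels, errors


if __name__ == "__main__":
    levels, errors = summarize_log("app.log")  # hypothetical file name
    print("Log levels:", dict(levels))
    for timestamp, message in errors[:10]:
        print(timestamp, message)
```

Nothing here passes or fails; the output is raw material for a tester’s observation, evaluation, and next test idea. That makes it tool-supported testing rather than check automation.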

Here’s a passage from Jerry Weinberg, one that I find inspiring and absolutely true:

One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.

Jerry Weinberg, Computer Programming Fundamentals, 1961

To me, that’s a magnificent paragraph. But just in case, let’s paraphrase it to make it (to my mind, at least) even clearer:

“One of the lessons to be learned … is that the sheer number of checks performed is of little significance in itself. Too often, the series of checks simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous checks. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.”

Thank you to Dale for your critical questions, and to the others who have asked questions about the motivation for making the distinction and hanging a new label on it. I hope this helps. If it doesn’t, please let me know, and we’ll try to work it out. In any case, there will be more to come.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

9 replies to “Tests vs. Checks: The Motive for Distinguishing”

  1. This is a difficult distinction, Michael.

    Most people don't like being told that what they are doing is trivial, especially if (a) it is trivial and (b) they don't want to change what they are doing.

    You'll never get this distinction past the ISTQB or the Agile Testing mailing list.

    (For clarification: I have a lot of respect for TDD, and teach this as a required course to our undergraduate software engineers. The unit tests developed this way are often referred to as design / development aids and change detectors rather than as traditional tests. The distinction between testing and checking is very different in that context. The distinctions that I think you're making, and that I'm commenting on, are in terms of the larger-scope system testing effort, which is widely and badly done via checking.)

  2. Does not testing encompass checking?
    Can testing alone be efficient without doing any checking?

    A tester does confirmatory tests as part of testing, so does that mean a tester should not do that?

    Should testers shun checking?

    Why not call checking "confirmative testing"?

    Checking might be brainless, but I believe it is required; testing builds upon checking and enhances it.

    Example:
    A bug comes to me which is claimed to have been fixed by a dev.

    I primarily do two tasks:

    1. I confirm that the exact problem is fixed, by exactly executing the steps mentioned in the bug report. By this I confirm whether the bug, and only that bug, is fixed.

    Brainless? Yes. Could a machine have done the same? Yes.
    BUT
    Is this required? YES. We might not be deriving new quality value from it, but we are CONFIRMING existing quality information from it.

    2. The second task I do is to look out for side effects, regression, and new test ideas, and to execute more tests, i.e. all the pillars of ET.

    But then more questions arise:

    1. How is task 2 useful without task 1?

    2. Should I tell my lead/manager, "Buddy, I am just a tester; find a checker to do this, or get a machine to do this"? If a machine needs to do this, who is going to code/script the machine to do it? Won't that be a tester himself?

    3. If I don't do task 1, there is a possibility that I might not arrive at the ideas mentioned in task 2! Might not get lightning bolts in my head?

  3. Thanks for your challenging thoughts.
    I was trying to find a relevant and simplified analogy in making a distinction between checking and testing in order to better understand your claims.
    I've realised that checking and testing are as similar/different as asking closed (yes/no) and open (how/what/why…) questions.
    Closed questions can help you get the information that you seek, but you need to be careful not to be fooled by confirmation bias.
    However, open questions can help you get information that you don’t expect and may need.

  4. Thank you all for your comments, and for continuing the discussion.

    @jason…

    Why not use "scripted testing" vs "sapient testing"?

    That's a good question. To me, it's like the exploratory/scripted continuum. A scripted test might consist of nothing but checks, and it may be performed by a machine or by a human. When a machine follows a script, it's guaranteed to do nothing but checking. When a human follows a script, the human has some choices. The script may also guide the human, rather than controlling him; it may afford the opportunity for the human to use some sapience. The tester may choose to be guided entirely by the script (in which case the tester would be doing nothing but checking), or the tester may choose to observe and report on things that are not explicitly specified, in which case the human is mostly checking, but doing some testing as well. What's it like to you?

    @Cem…

    I'm not telling people that what they're doing is trivial, but I am asking them to consider the cost and value of tests and checks in their context, and I am strongly advocating a balance of focus in their own work. Moreover, I'm asking the people who manage testers (and the people who train them, and the people who write about testing) to consider what they're asking for, and in what measure. The Agile community has strongly emphasized the value of checking at the programmer level, the role of checking in support of refactoring, the significance of automated checks at the integration level (and sometimes at the system level). For quite a while, the emphasis on checking appeared to dominate the discussion of testing—the most egregious example of which was Chapter 19 in Testing Extreme Programming. "No manual tests" is, to my mind, in general, extremely bad advice. Yet (again to me, and again in general) "no manual checks" is a reasonable proposal. Why waste human time, effort, and intellect on something that can be done rapidly and accurately by a machine?

    "You'll never get this distinction past the ISTQB or the Agile Testing mailing list."

    I'm more optimistic than that, at least in the case of the Agile Testing list. (Mind, I'd have to rejoin it if I were to be involved in the discussion. It may be time for that now.) As for the ISTQB… again, I'll cite McLuhan: I don't want them to agree with me, I just want them to think. I believe the distinction could be helpful in allowing us to be more specific about the activities and ideas surrounding our craft. The ISTQB says that it's in favour of that. If people aren't interested in it, they're probably not paying attention to me or anyone else on the issue, so… no loss.

    I'll have more to say about TDD in a future post, but for a quick summary: I agree with you. While the activity of TDD is going on, I think it's fair to call those xUnit thingies tests. They become checks when they're no longer at the forefront of our attention; when they become part of the ground, rather than the figure; when they're performed without the sapient activity that surrounds them.

    @Sunjeet…

    I'll have a reply for you in a new blog post. It's here.

  5. […] believe that, while there is an element of passive testing (and what i mean here is checking), a tester is more beneficial to a project IF they are being aggressive and proactive and looking […]

