The word “criticism” has several meanings and connotations. To criticize, these days, often means to speak reproachfully of someone or something, but criticism isn’t always disparaging. Way, way back when, I studied English literature, and read the work of many critics. Literary critics and film critics aren’t people who merely criticize, as we use the word in common parlance. Instead, the role of the critic is to contextualize—to observe and evaluate things, to shine light on some work so as to help people understand it better.
So when I say that Dale Emery is a critic, that’s a compliment. On the subject of testing vs. checking, Dale recently remarked to me, “I think I understand the distinction. I don’t yet understand what problem you’re trying to solve with your specific choice of terminology. Not the implications, but the problem.” That’s an excellent critical statement, in that Dale is not disparaging, but he’s trying to tell me something that I need to recognize and deal with.
My answer is that sometimes having different vocabulary allows us to recognize a problem and its solution more easily. As Jerry Weinberg says, “A rose by any other name should smell as sweet, yet nobody can seriously doubt that we are often fooled by the names of things.” (An Introduction to General Systems Thinking, p. 74). He also says “If we have limited memories, decomposing a system into noninteracting parts may enable us to predict behavior better than we could without the decomposition. This is the method of science, which would not be necessary were it not for our limited brains.” (ibid, p. 134).
The problem I’m trying to address, then, is that the word test lumps a large number of concepts into a single word, and testing lumps a similarly large number of activities together. As James Bach suggests, compiling is part of the activity of programming, yet we don’t mistake compiling for programming, nor do we mistake the compiler for the programmer.
If we have a conceptual item called a check, or an activity called checking, I contend that we suddenly have a new observational state available to us, and new observations to be made. That can help us to resolve differences in perception or opinion. It can help us to understand the process of testing at a finer level of detail, so that we can make better decisions about strategy and tactics.
In the Agile 2009 session, “Brittle and Slow: Replacing End-To-End Testing“, Arlo Belshee and James Shore took this as a point of departure:
End-to-end tests appear everywhere: test-driven development, story-test-driven development, acceptance testing, functional testing, and system testing. They’re also slow, brittle, and expensive.
This was confusing to me. My colleague Fiona Charles specializes in end-to-end system testing for great big projects. The teams that she leads are fast, compared to others that I’ve seen. Their tests are painstaking and detailed, but they’re flexible and adaptable, not brittle.
During the session, one person (presumably a programmer, but maybe not) said, “Manual testing sucks.” There was a loud murmur of agreement from both the testers and the programmers in the room.
I thought that was strange too. I love manual testing. I like operating the product interactively and making observations and evaluations. I like pretending that I’m a user of the program, with some task to accomplish or some problem to solve. I like looking at the program from a more analytical perspective, too—thinking about how all the components of the product interact with one another, and where the communication between them might be vulnerable if distorted or disturbed or interrupted. I like playing with the data, trying to figure out the pathological cases where the program might hiccupp or die on certain inputs. In my interaction with the program, I discover lots of things that appear to be problems. Upon making such a discovery, I’m compelled to investigate it. As I investigate it, sometimes I find that it’s a real problem, and sometimes I find that it isn’t. In this process, I learn about the system, about the ways in which it can work and the ways in which it might fail. I learn about my preconceptions, which are sometimes right and sometimes wrong. As I test, I recognize new risks, whereupon I realize new test ideas. I act on those test ideas, often right away. (By the way, I’m trying to get out of the habit of calling this stuff manual testing; I learning to call it sapient testing, because it’s primarily the eyes and the brain, not the hands, that are doing the work.) Whatever you call it, manual testing doesn’t suck; it rocks.
So are the programmer in question and all the people who applauded ignorant? That seems unlikely. They’re smart people, and they know tons about software development. Are they wrong? Well, that’s a value judgment, but it would seem to me that as smart people who solve problems for a living, it would be very surprising if they weren’t engaged by exploration and discovery and investigation and learning. So there must be another explanation.
Maybe when they’re talking about manual testing, they’re talking about something else. Maybe they’re talking about behaving like an automaton and precisely following a precisely described set of steps, the last of which is to compare some output of the program to a predicted, expected value. For a thinking human, that process is slow, and it’s tedious, and it doesn’t really engage the brain. And in the end, almost all the time, all we get is exactly what we expected to get in the first place.
So if that’s what they’re talking about, I agree with them. Therefore: if we’re going to understand each other more clearly, it would help to make the distinction between some kinds of manual testing and other kinds. The think that we don’t like, that none of us like apparently, is manual checking.
Maybe Arlo and James were talking about end-to-end system checks being brittle and slow. Maybe it’s integration checks, rather than integration tests, that are a scam, as Joe (J.B.) Rainsberger puts it here, here, here, and here.
So having a handle for a particular concept may make it easier for us to make certain observations and to carry on certain conversations.
- If we can differentiate between manual testing and manual checking, we might be more specific about what, specifically sucks.
- If we can comprehend the difference between automated tests and automated checks, we can understand the circumstances in which one might be more valuable than the other.
- If we tease out the elements of developing, performing, and evaluating a check (as I attempted to do here) we might better see specific opportunities for increasing value or reducing cost.
- If we can recognize when we’re checking, rather than testing, we can better recognize the opportunity to hand the work over to a machine.
- If we can recognize that we’re spending inordinate amounts of time and money preparing scripts directing outsourced testers in other countries to check, rather than test, we can recognize a waste of energy, time, money, and human potential, because testers are capable of so much more than merely checking. (We might also detect the odour of immorality in asking people in developing countries to behave like machines, and instead consider giving a modicum of freedom and responsibility to them so that they can learn things about the product—things in which we might be very interested.)
- If we can recognize that checking alone doesn’t yield new information, we can better recognize the need to de-emphasize checking and emphasize testing when that’s appropriate.
- If we can recognize when testing is pointing us to areas of the product that appear to be vulnerable to breakage, we might choose to emphasize inserting more and/or better checks, so as to draw our attention to breakage should it occur (“change detectors”, as Cem Kaner calls them).
- If we can distinguish between testing and checking, we can penetrate “the illusion that software systems are simple enough to define all the checks before any code is written”, as my colleague Ben Simo recently pointed out—never mind all the tests.
- When someone asks, “Why didn’t testing find that bug when we spent all that money on all those automation tools?”, maybe we can point to the fact that the tools foster checking far more than they foster testing.
- Maybe we can recognize that checking tends to be helpful in preventing bugs that we can anticipate, but not so helpful at finding problems that we didn’t anticipate. For that we need testing. Or, alas, sometimes, accidental discovery.
- Maybe we’d be able to recognize that testing (but not checking) can reveal information on novel ways of using the product, information that can add to the perceived value of the product.
- When someone asks, “Can’t we hire pretty much any programmer to write our test automation code?”, we can point out that the quality of checking is conditioned largely by the quality of the testing work that surrounds it, and emphasize that creating excellent checks requires excellent testing skill, in addition to programming skill.
- If we’re interested in improving the efficiency and capacity of the test group, we can point out that test automation is far more than just check automation. Test automation is, in James Bach’s way of putting it, any use of tools to support testing. Testing tools help us to generate test data; to probe the internals of an application or an operating system; to produce oracles that use a different algorithm to produce a comparable result; to produce macros that automate a long sequence of actionas in the application so that the tester can be quickly delivered to place to start exploring and testing; to rapidly configure or reconfigure the application; to parse, sort, and search log files; to produce blink oracles for blink testing…
- When a programmer says to a tester, “You should only test this stuff; here are the boundary conditions,” the tester can respond “I will check that stuff, but I’m also going to test for boundary conditions that you might not have been aware of, or that you’ve forgotten to tell me about, and for other possible problems too.”
- When we see a test group that is entirely focused on confirming that a product conforms to some requirements document, rather than investigating to discover things that might threaten the value of the product to its users, we can point out that they may be checking, but they’re not testing.
Here’s a passage from Jerry Weinberg, one that I find inspiring and absolutely true
“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.” Jerry Weinberg, Computer Programming Fundamentals, 1961.
To me, that’s a magnificent paragraph. But just in case, let’s paraphrase it to make it (to my mind, at least) even clearer:
“One of the lessons to be learned … is that the sheer number of checks performed is of little significance in itself. Too often, the series of checks simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous checks. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.
Thank you to Dale for your critical questions, and to the others who have asked questions about the motivation for making the distinction and hanging a new label on it. I hope this helps. If it doesn’t, please let me know, and we’ll try to work it out. In any case, there will be more to come.
Want to know more? Learn about Rapid Software Testing classes here.