Archive for the ‘Testing vs. Checking’ Category

Testing, Checking, and Convincing the Boss to Explore

Tuesday, November 10th, 2009

How is it useful to make the distinction between testing and checking? One colleague (let’s call him Andrew) recently found it very useful indeed. I’ve been asked not to reveal his real name or his company, but he has very generously permitted me to tell this story.

He works for a large, globally distributed company, which produces goods and services in a sector not always known for its nimbleness. He’s been a test manager with the company for about 10 years. He’s had a number of senior managers who have allowed him and his team to take an exploratory approach, almost a skunkworks inside the larger organization. Rather than depending on process manuals and paperwork, he manages by direct interaction and conversation. He hires bright people, trains them, and grants them a fairly high degree of autonomy, balanced by frequent check-ins.

Recently, on a Thursday the relatively new CEO came to town and held an all-hands meeting for Andrew’s division. Andrew was impressed; the CEO seemed genuinely interested in cutting bureaucracy and making the organization more flexible, adaptable, and responsive to change. After the CEO’s remarks, there was a question-and-answer period. Andrew asked if the company would be doing anything to make testing more effective and more efficient. The CEO seemed curious about that, and jotted down a note on a piece of paper. Andrew was given the mandate of following up with the VP responsible for that area.

Late that afternoon, Andrew called me. We chatted for a while on the phone. He hadn’t read my series on testing vs. checking, but he seemed intrigued. I suggested that he read it, and that we get together and talk about it.

As luck would have it, there was occasion to bring a few more people into the picture. That weekend, we had a timely conversation with Fiona Charles who reminded us to focus the issue of risk. Rob Sabourin, happened to be visiting on Saturday evening, so he, Andrew, and I sat down to compose a letter to the VP. Aside from changing the names that would identify the parties involved, this is an unedited version what we came up with:

[Our Letter]

Dear [Madam VP]…

[Mr. CEO] asked me to send you this email as a follow up to a question that I posed during his recent trip to the [OurTown] office on [SomeDate] on the current state of the testing at [OurCompany] and how our testing effectiveness should be improved.

The [OurTown]-based [OurDivision] test team has been very successful in finding serious issues with our products with a fairly small test team using exploratory test approaches. As an example, a couple of weeks ago one of my testers found a critical error in an emergency fix within his two days of exploratory testing in a load that had passed four person-weeks of regression testing (scripted checking) by another team.

Last week a Project Lead called me and asked if my team could perform a regression sweep on a third party delivery. I replied that we could provide the requested coverage with two person-days of effort without disrupting our other commitments. He seemed surprised and delighted. He had come to us because [OurCompany]‘s typical approach yielded a four-to-six person-week effort which would have caused a delay in the project’s subsequent release.

Our experience using exploratory testing in [OurDivision] has demonstrated improved flexibility and adaptability to respond to rapid changes in priorities.

Testing is not checking. Checking is a process of confirmation, validation, and verification in which we are comparing an output to a predicted and expected result. Testing is something more than that. Testing is a process of exploration, discovery, investigation, learning, and analysis with the goal of gathering information about risks, vulnerabilities, and threats to the value of the product. The current effectiveness of many groups’ automated scripts is quite excellent, yet without supplementing these checks with “brain-engaged” human testing we run the risk of serious problems in the field impacting our customers, and the consequential bad press that follow these critical events.

At [OurCompany] much of our “testing” is focused on checking. This has served us fairly well but there are many important reasons for broadening the focus of our current approach. While checking is very important, it is vulnerable to the “pesticide paradox”. As bacteria develop a resistance to antibiotics, software bugs are capable of avoiding detection by existing tests (checks), whether executed once or repeated over and over. In order to reduce our vulnerability to field issues and critical customer incidents, we must supplement our existing emphasis on scripted tests (both manual and automated) with an active search for new problems and new risks.

There are several strong reasons for integrating exploratory approaches into our current development and testing practices:

  • Scripted tests are perceived to be important for compliance with [regulatory] requirements. They are focused on being repeatable and defensible. Mere compliance is insufficient—we need our products to work.
  • Scripted checks take time and effort to design and prepare, whether they are run by machines or by humans. We should focus on reducing preparation cost wherever possible and reallocating that effort to more valuable pursuits.
  • Scripted checks take far more time and effort to execute when performed by a human than when performed by a machine. For scripted checks, machine execution is recommended over human execution, allowing more time for both human interaction with the product, and consequent observation and evaluation.
  • Exploratory tests take advantage of the human capacity for recognizing new risks and problems.
  • Exploratory testing is highly credible and accountable when done well by trained testers. The findings of exploratory tests are rich, risk-focused, and value-centered, revealing far more knowledge about the system than simple pass/fail results.

The quality of exploratory testing is based upon the skill set and the mindset of the individual tester. Therefore, I recommend that testers and managers across the organization be trained in the structures and disciplines of excellent exploratory testing. As teams become trained, we should systematically introduce exploratory sessions into the existing testing processes, observing and evaluating the results obtained from each approach.

I have been actively involved in improving testing, in general, outside of [OurCompany]. I am on the board of a testing association and I have been attending, organizing and facilitating meetings of testers for many years.

During this time, I have been exposed to much of the latest developments in software testing and I have led the implementation of Session Based Exploratory Testing within my department. In addition, over the past four years, I have been providing instruction in software testing both to the testers within my business unit and to companies outside of [OurCompany].

I look forward to the opportunity to talk with you about this further.

[/Our Letter]

Now, I thought that was pretty strong. But the response was far more gratifying than I expected. Andrew sent the message on Sunday afternoon. The VP responded by 8:45am on Monday morning. Her reply was in my mailbox before 10:00am. The reply read:

[The VP's Reply]

Dear Andrew

Thanks very much for the email. I find this very intriguing! I believe the distinction you make between testing and checking is quite insightful and I would like to connect with you to see how we can build these concepts and techniques into our quality management services as well as my central team verification tests. I will get a call together with [Mr. Bigwig] and [Mr. OtherBigwig] so that we can figure out the best way to incorporate your ideas. Again, many thanks!!

[/The VP's Reply]

A couple of key points:

  • The letter was much stronger thanks to collaboration. Any one of the four of us could have written a good letter; the result was better than any of us could have done on our own.
  • The letter is sticky, in the sense that Chip and Dan Heath talk about in their book Made to Stick: Why Some Ideas Survive and Others Die. It’s not a profound book, but it contains some useful points to ponder. The letter starts with two stories that are simple, unexpected, concrete, credible, and emotional (remember, the product manager was surprised and delighted). Those initials can be rearranged to SUCCES, which is the mnemonic that the Heaths use for successful communication.
  • The testing vs. checking distinction is simpler and memorable than “exploratory approaches” vs. “confirmatory scripted approaches”. The explanation is available (and in most cases necessary), but “testing” and “checking” roll off the tongue quickly after the explanation has been absorbed.
  • We managed to hit some of the most important aspects of good testing: cost vs. value, risk focus, diversification of approaches, flexibility and adaptability, and rapid service to the larger organization.

After reviewing this post, Andrew said, “I like the post a lot. Let’s hope we end up helping a lot of people with it.” Amen. You are, of course, welcome to use this letter as a point of departure for your own letter to the bigwigs. If you’d like help, please feel free to drop me a line.

See more on testing vs. checking.

Context-free Questions For Testing and Checking

Wednesday, September 30th, 2009

After a presentation on exploratory approaches and on testing vs. checking yesterday, a correspondent and old friend writes:

Although the presentation made good arguments for exploratory testing, I am not sure a small QA department can spare the resources unless a majority of regression checking can be moved to automation. Particularly in situations with short QA cycles.

(Notice that he and I are using “testing” and “checking” in this specific way.)

Any time someone makes an observation about what is or isn’t possible, irrespective of the kind of testing (or checking) that they’re doing, it suggests some questions for the testing, programming, and management teams. I’d ask my old friend

1) How much checking do you need to do?

2) What, specifically, suggests that checking needs to be done? What happens when you do it? What doesn’t happen when you do it? What happens when you don’t do it? What doesn’t happen when you don’t do it?

3) What specifically, might suggest that the testers are the best people to do the checking? What, specifically, might suggest that they aren’t the best people to do it?

4) Where do your testers spend their time? When you speak with the people who are actually testing, do they feel the time that they’re spending on checking is worthwhile? Do they have things to say about what slows down testing (or checking)?

5) What are the risks that checking addresses well? What risks are not addressed well by checking?

These are open questions that all teams can ask, regardless of the approach they’re using now. Feel free to replace the word “checking” with “testing”, and vice versa, wherever you like.

I encourage and, when asked, help people to ask and answer these questions, and others like them. I have no specific answers from the outset; I don’t know you, and I don’t know your context. But you do. Maybe the questions can be helpful to you. I hope so.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Should We Call Test-Driven Development Something Else?

Monday, September 28th, 2009

In the first post in this series, I proposed “that those things that we usually call ‘unit tests‘ be called ‘unit checks‘.” I stand by the proposal, but I should clarify something important about it. See, it’s all a matter of timing. And, of course, sapience.

After James Bach‘s blog post titled “Sapience and Blowing Peoples’ Minds“, Joe Rainsberger commented:

Sadly, the distinction between testing and checking makes describing test-driven development (TDD) somewhat awkward, because it’s a test when I write it and a check after I run it for the first time. Am I test-driving or check-driving?

Joe has put his finger on something that’s important: that in the mangle of practice, things are constantly changing, and so are our perspectives on them.

In The Elements of Testing and Checking, I broke down the process of developing, performing, and analyzing a check. The most important thing to note is that the check (an observation linked to a decision rule) can be performed non-sapiently, but that everything surrounding it—the development and analysis of the checkis sapient, and is testing. Test-driven development is first and foremost development, and development is a sapient process. The interactive process of developing a check and analyzing its outcome is a sapient process; the development cycle includes having an idea, testing it and responding to the information revealed by the test (the whole process), even when the result is supplied by a check (an atomic part of the test process). TDD is an exploratory, heuristic process. You don’t know in advance what your solution is going to look like; you explore the problem space and build your solution iteratively, and you stop when you decide to stop.

Several years ago, James and Jon Bach produced a set of exploratory skills, tactics, and dynamics:

  • Modeling
  • Resourcing
  • Chartering
  • Observing
  • Manipulating
  • Pairing (now called Collaborating)
  • Generation and Elaboration
  • Overproduction and Abandonment
  • Abandonment and Recovery
  • Refocusing (Focusing and Defocusing)
  • Alternating
  • Branching and Backtracking
  • Conjecturing
  • Recording
  • Reporting

(I believe that several other people have made contributions to the original list, including Jonathan Kohl and Mike Kelly. I’d also include tooling—building tools, rather than merely obtaining or resourcing them, and orienteering—figuring out where you are in relation to where you want to be. I think James disagrees. That’s okay; good colleagues do that. The cool thing about such lists is that they can evolve as we think and learn more, and disagreeing helps us to figure out what’s important eventually. Maybe I’ll drop them, or maybe James will adopt them.)

The point is that these exploratory skills, tactics and dynamics apply not only to testing, but to practically any open-ended heuristic process. Note how TDD, done well, incorporates practically all of the stuff from James and Jon’s original list, which was focused on testing.

So the answer to the question in the title of this blog is this: No; there’s no need to rename TDD. It really is test-driven development.

As James replied to Joe,

Strictly speaking you are “doing testing” by “writing checks“, but not actually “writing tests.” If you run the checks unattended and accept the green bar as is, then that is not testing. It requires absolutely no testing skill to do that, just as you wouldn’t say someone is doing programming just because they invoke a compiler and verify that the compile was successful. If the bar is NOT green, the process of investigating is testing, as well as debugging, programming, etc.

If you watch the tests (James means checks here, I think –MB) run or ponder the deeper meaning of the green bar, you are doing testing along with the checking.

Think of “test” as a verb rather than a noun, and it becomes clear that test-driven design is truly test-driven design, although the testing is rather simplistic, based primarily on those little checks. Once the design is done the automated checks become useful as change detectors against the risk of regression. They certainly aid the testing process, despite not being tests.

Checks definitely do NOT drive development. Development is never a rote and non-sapient process. It’s far better to say test-driven, because the design of the checks is a thoughtful process.

So what of the earlier business about calling unit tests “unit checks“?

For me, the distinction lies in the artifact—that xUnit thingy, or that rSpec assertion—and the way that you approach it. A minor gloss on Joe’s comment: the thingy might not be a check after you run it the first time, especially if it doesn’t pass. At that point, it is still very much part of your conscious interaction with the business of creating working code; it’s figure, rather than ground.

After you’ve solved the problem that your unit of code is intended to solve, the thingy’s prominence fades from figure into ground. You’re no longer really paying much attention to it. There’s no design activity going on with respect to it, it gets performed automatically and non-sapiently, and its result gets ignored, especially when the result is positive and aggregated with dozens, hundreds, or thousands of other positive results. At that point, it’s no longer shedding any particular cognitive light on what you’re doing, and its testing power has faded into a single pixel in a pale green glow. It’s now a check, no longer a test but a change detector. In fact, you might think of “check” as an abbreviation for “change detector”. The change from a test to a check is a kind of reverse metamorphosis, as though an intriuging, fluttering butterfly has turned into a not-very-interesting, ponderous little green caterpillar. That’s not to say that it’s not an important part of the Great Chain of Being; just that we tend not to pay much attention to it. However, we might pay more attention to the caterpillar when it’s red.

As I’ve said repeatedly, what you call them is less important than how you think of them. As James says,

I wouldn’t insist that people change their ordinary language. I see no problem calling whales “fish” or spiders “insects” in everyday life. But sometimes it matters to be precise. We should be ABLE to make strict distinctions when trying to help ourselves, or others, master our craft.

At Agile 2009, Joe pointed out that if we can produce more code with fewer errors in it, we can get our products to real testing, and then to market more quickly. And that means that we can get paid sooner. So I agree with Joe here, too:

I have to admit I like the pun of Check-Driven Development, even if it only works in American English.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

A Tester Asks About Checking

Monday, September 21st, 2009

In a previous comment, Sunjeet asks

Does not testing encompass checking? Can testing alone be efficient without doing any checking?

As I hope I made it clear in Elements of Testing and Checking, the development and analysis of checks is surrounded by plenty of testing activity, and testing may include a good deal of checking. Testing, I think, can be vastly more efficient if we consider the ways in which checking can be helpful. Cem Kaner, in his 2004 paper The Ongoing Revolution in Software Testing, said this:

I think the recent advances in unit testing (Astels, 2003; Rainsberger, 2004) have been the most exciting progress that I’ve seen in testing in the last 10 years.

With programmer-created and programmer-maintained change detectors (that is, checks -MB):
• There is a near-zero feedback delay between the discovery of a problem and awareness of the programmer who created the problem
• There is a near-zero communication cost between the discover of the problem and the programmer who will either fix the bug or fix the [check]
• The [check] is tightly tied to the underlying coding problem, making troubleshooting much cheaper and easier than system-level testing.

And I agree.

Should testers shun checking? Why not call checking as “confirmative testing“?

There is a role for testers to program and perform checks where cost is low and value is high, but I think that if practices associated with XP really begin to take hold, in the long run it behooves testers to get out of the checking business. That’s because the vast bulk of the checking work will be done by programmers; because checks at the system level tend to be time-consuming, error-prone, and expensive when performed by humans (and expensive to automate when not performed by humans); because they drive humans to inattentional blindness.

But there’s another reason: “confirmative testing” isn’t really testing; it’s confirming. It’s looking at a white swan and saying, “All swans are white;” at another and saying, “See? All swans are white;” and at yet another and saying “Just like I told you; all swans are white.” There’s an analogy in software, “It works on my machine.” “See? It works on my machine.” “Just like I told you; it works on my machine.” To find problems in a product, which is one of the key goals of testing, we need to get out of the confirmatory mindset.

I confirm that the exact problem is fixed – by exactly executing the steps mentioned in the bug report – by this I confirm that the bug and only that bug is fixed or not. Brainless? Yes… a machine could have done the same… yes BUT is this required …YES we might not be deriving new quality value from it but we are CONFIRMING existing quality info from it…

Let me suggest an alternative way of looking at this.

In an environment where the bug report is vague or your programmers are known or suspected of being unreliable with their bug fixes or you know that the programmer has not created an automated check, then what you’re describing might indeed be a very good idea. (When testing the fix, I might start by trying to reproduce the problem as exactly as I could, but I might also start with a slight variation on the problem to see if the general case has been fixed. Either way, I’ll likely end up performing a check to see if the special case of the problem has been fixed.)

Task 2 i do is …i look out for side effects …regression…new test ideas …execute more tests etc i.e. all the pillars of ET… How is task 2 useful without 1?

If you’re following exactly the steps mentioned in the bug report and the programmer has fixed the problem and the programmer has already set up an automated check for the problem, then what you’re doing is reproducing the programmer’s effort (and the machine’s effort) in performing a check, when it might be much more valuable to use your sapient skills and test. Note that I cannot decide either way. I’m not in your context or your immediate situation. But you can. To me, there’s at least one clear circumstance in which it would be greatly more valuable for you to focus on Task 2: When someone has already done Task 1.

2. Should i tell my lead/manager…buddy I am just a tester find a checker to do this! or get a machine to do this? if a machine needs to do this who is going to code/script a machine to do this …wont that be a tester himself?

In answer to the first question, let me ask this: Are you a tester or a checker? Again, I don’t know. The quality of the questions you’re asking suggest to me that you’re a tester; that is, you’re not accepting what I’m saying blindly, nor are you rejecting it out of hand. You’re thinking critically about the idea, even if I may have blown your mind at first.

Either way, I wouldn’t advise telling your lead or your manager that you’re refusing to do work. But thinking in terms of testing vs. checking, if you so choose, might trigger a productive conversation between you about the relative cost and value of the activities that you (and the programmers) are doing. It might indeed make more sense to get a machine to do the checking work—and for the programmers to insert some change detectors and take greater responsibility for the quality of their code, an idea that was one of the triggers for Extreme Programming and the Agile movement.

As for who does the programming, note the passage from the very posting upon which you commented:

“When someone asks, ‘Can’t we hire pretty much any programmer to write our test automation code?’, we can point out that the quality of checking is conditioned largely by the quality of the testing work that surrounds it, and emphasize that creating excellent checks requires excellent testing skill, in addition to programming skill.”

Thank you for your comments and questions.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Tests vs. Checks: The Motive for Distinguishing

Friday, September 18th, 2009

The word “criticism” has several meanings and connotations. To criticize, these days, often means to speak reproachfully of someone or something, but criticism isn’t always disparaging. Way, way back when, I studied English literature, and read the work of many critics. Literary critics and film critics aren’t people who merely criticize, as we use the word in common parlance. Instead, the role of the critic is to contextualize—to observe and evaluate things, to shine light on some work so as to help people understand it better.

So when I say that Dale Emery is a critic, that’s a compliment. On the subject of testing vs. checking, Dale recently remarked to me, “I think I understand the distinction. I don’t yet understand what problem you’re trying to solve with your specific choice of terminology. Not the implications, but the problem.” That’s an excellent critical statement, in that Dale is not disparaging, but he’s trying to tell me something that I need to recognize and deal with.

My answer is that sometimes having different vocabulary allows us to recognize a problem and its solution more easily. As Jerry Weinberg says, “A rose by any other name should smell as sweet, yet nobody can seriously doubt that we are often fooled by the names of things.” (An Introduction to General Systems Thinking, p. 74). He also says “If we have limited memories, decomposing a system into noninteracting parts may enable us to predict behavior better than we could without the decomposition. This is the method of science, which would not be necessary were it not for our limited brains.” (ibid, p. 134).

The problem I’m trying to address, then, is that the word test lumps a large number of concepts into a single word, and testing lumps a similarly large number of activities together. As James Bach suggests, compiling is part of the activity of programming, yet we don’t mistake compiling for programming, nor do we mistake the compiler for the programmer.

If we have a conceptual item called a check, or an activity called checking, I contend that we suddenly have a new observational state available to us, and new observations to be made. That can help us to resolve differences in perception or opinion. It can help us to understand the process of testing at a finer level of detail, so that we can make better decisions about strategy and tactics.

In the Agile 2009 session, “Brittle and Slow: Replacing End-To-End Testing“, Arlo Belshee and James Shore took this as a point of departure:

End-to-end tests appear everywhere: test-driven development, story-test-driven development, acceptance testing, functional testing, and system testing. They’re also slow, brittle, and expensive.

This was confusing to me. My colleague Fiona Charles specializes in end-to-end system testing for great big projects. The teams that she leads are fast, compared to others that I’ve seen. Their tests are painstaking and detailed, but they’re flexible and adaptable, not brittle.

During the session, one person (presumably a programmer, but maybe not) said, “Manual testing sucks.” There was a loud murmur of agreement from both the testers and the programmers in the room.

I thought that was strange too. I love manual testing. I like operating the product interactively and making observations and evaluations. I like pretending that I’m a user of the program, with some task to accomplish or some problem to solve. I like looking at the program from a more analytical perspective, too—thinking about how all the components of the product interact with one another, and where the communication between them might be vulnerable if distorted or disturbed or interrupted. I like playing with the data, trying to figure out the pathological cases where the program might hiccupp or die on certain inputs. In my interaction with the program, I discover lots of things that appear to be problems. Upon making such a discovery, I’m compelled to investigate it. As I investigate it, sometimes I find that it’s a real problem, and sometimes I find that it isn’t. In this process, I learn about the system, about the ways in which it can work and the ways in which it might fail. I learn about my preconceptions, which are sometimes right and sometimes wrong. As I test, I recognize new risks, whereupon I realize new test ideas. I act on those test ideas, often right away. (By the way, I’m trying to get out of the habit of calling this stuff manual testing; I learning to call it sapient testing, because it’s primarily the eyes and the brain, not the hands, that are doing the work.) Whatever you call it, manual testing doesn’t suck; it rocks.

So are the programmer in question and all the people who applauded ignorant? That seems unlikely. They’re smart people, and they know tons about software development. Are they wrong? Well, that’s a value judgment, but it would seem to me that as smart people who solve problems for a living, it would be very surprising if they weren’t engaged by exploration and discovery and investigation and learning. So there must be another explanation.

Maybe when they’re talking about manual testing, they’re talking about something else. Maybe they’re talking about behaving like an automaton and precisely following a precisely described set of steps, the last of which is to compare some output of the program to a predicted, expected value. For a thinking human, that process is slow, and it’s tedious, and it doesn’t really engage the brain. And in the end, almost all the time, all we get is exactly what we expected to get in the first place.

So if that’s what they’re talking about, I agree with them. Therefore: if we’re going to understand each other more clearly, it would help to make the distinction between some kinds of manual testing and other kinds. The think that we don’t like, that none of us like apparently, is manual checking.

Maybe Arlo and James were talking about end-to-end system checks being brittle and slow. Maybe it’s integration checks, rather than integration tests, that are a scam, as Joe (J.B.) Rainsberger puts it here, here, here, and here.

So having a handle for a particular concept may make it easier for us to make certain observations and to carry on certain conversations.

  • If we can differentiate between manual testing and manual checking, we might be more specific about what, specifically sucks.
  • If we can comprehend the difference between automated tests and automated checks, we can understand the circumstances in which one might be more valuable than the other.
  • If we tease out the elements of developing, performing, and evaluating a check (as I attempted to do here) we might better see specific opportunities for increasing value or reducing cost.
  • If we can recognize when we’re checking, rather than testing, we can better recognize the opportunity to hand the work over to a machine.
  • If we can recognize that we’re spending inordinate amounts of time and money preparing scripts directing outsourced testers in other countries to check, rather than test, we can recognize a waste of energy, time, money, and human potential, because testers are capable of so much more than merely checking. (We might also detect the odour of immorality in asking people in developing countries to behave like machines, and instead consider giving a modicum of freedom and responsibility to them so that they can learn things about the product—things in which we might be very interested.)
  • If we can recognize that checking alone doesn’t yield new information, we can better recognize the need to de-emphasize checking and emphasize testing when that’s appropriate.
  • If we can recognize when testing is pointing us to areas of the product that appear to be vulnerable to breakage, we might choose to emphasize inserting more and/or better checks, so as to draw our attention to breakage should it occur (“change detectors”, as Cem Kaner calls them).
  • If we can distinguish between testing and checking, we can penetrate “the illusion that software systems are simple enough to define all the checks before any code is written”, as my colleague Ben Simo recently pointed out—never mind all the tests.
  • When someone asks, “Why didn’t testing find that bug when we spent all that money on all those automation tools?”, maybe we can point to the fact that the tools foster checking far more than they foster testing.
  • Maybe we can recognize that checking tends to be helpful in preventing bugs that we can anticipate, but not so helpful at finding problems that we didn’t anticipate. For that we need testing. Or, alas, sometimes, accidental discovery.
  • Maybe we’d be able to recognize that testing (but not checking) can reveal information on novel ways of using the product, information that can add to the perceived value of the product.
  • When someone asks, “Can’t we hire pretty much any programmer to write our test automation code?”, we can point out that the quality of checking is conditioned largely by the quality of the testing work that surrounds it, and emphasize that creating excellent checks requires excellent testing skill, in addition to programming skill.
  • If we’re interested in improving the efficiency and capacity of the test group, we can point out that test automation is far more than just check automation. Test automation is, in James Bach’s way of putting it, any use of tools to support testing. Testing tools help us to generate test data; to probe the internals of an application or an operating system; to produce oracles that use a different algorithm to produce a comparable result; to produce macros that automate a long sequence of actionas in the application so that the tester can be quickly delivered to place to start exploring and testing; to rapidly configure or reconfigure the application; to parse, sort, and search log files; to produce blink oracles for blink testing
  • When a programmer says to a tester, “You should only test this stuff; here are the boundary conditions,” the tester can respond “I will check that stuff, but I’m also going to test for boundary conditions that you might not have been aware of, or that you’ve forgotten to tell me about, and for other possible problems too.”
  • When we see a test group that is entirely focused on confirming that a product conforms to some requirements document, rather than investigating to discover things that might threaten the value of the product to its users, we can point out that they may be checking, but they’re not testing.

Here’s a passage from Jerry Weinberg, one that I find inspiring and absolutely true

“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.” Jerry Weinberg, Computer Programming Fundamentals, 1961.

To me, that’s a magnificent paragraph. But just in case, let’s paraphrase it to make it (to my mind, at least) even clearer:

“One of the lessons to be learned … is that the sheer number of checks performed is of little significance in itself. Too often, the series of checks simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous checks. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.

Thank you to Dale for your critical questions, and to the others who have asked questions about the motivation for making the distinction and hanging a new label on it. I hope this helps. If it doesn’t, please let me know, and we’ll try to work it out. In any case, there will be more to come.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Testing, Checking, and Changing the Language

Wednesday, September 16th, 2009

In the course of trying to describe distinctions between testing and checking, a number of questions have come up:

  • Do you want to change the language?
  • Won’t saying “check” be confusing?
  • Won’t this undermine our goal of industry-standard terminology?
  • Won’t calling certain kinds of tests “checks” fly in the face of years of documentation and books?
  • Isn’t this yet another case of you wanting testing to be done the same way everywhere?

In addition to the gratifying remarks, there has been a spate of similar questions or comments, in replies to the blog post, on Twitter, and various places around the blogosphere.

Then, this evening, I got this:

I think it’s fine to emphasize the value of exploration vs asserting, However I find your attempt at creating a new dictionary to solve the worlds problems incredibly naive and manipulative.

Words mean what people on average choose them to mean; they don’t even obtain their meaning from mainstream dictionaries let alone ones concocted on a blog.

What was the bloody problem with just using the phrase ‘exploratory testing’? I think what you are trying to do is create a ‘newspeak’ in which naughty thoughts are harder to express.

Apparently I blew Anonymous’ mind.

I’ll get to Anonymous way below, but for everyone else, here are some answers to the questions above [her|his] post. To all, I hope we can continue the conversation and work things out in a way that helps to advance our several crafts.

Do you want to change the language?

That might be cool, but it’s not my goal. I made a proposal. Here’s what the Compact Oxford Dictionary says about “propose”:

• verb 1 put forward (an idea or plan) for consideration by others.

Right: consideration. The idea is on the table, and through conversation with various people, maybe we can sharpen the idea and make it more useful. I’d love it if people started expressing themselves more precisely sometimes—calling “unit tests” “unit checks” when they get that way, for example—but as I said in the first post in the series, that’s not my goal. I said then, “I can guarantee that people won’t adopt this distinction across the board, nor will they do it overnight. But I encourage you to consider the distinction, and to make it explicit when you can.” Got that? I’m encouraging, not commanding. I’d be completely happy if people—testers and programmers—were simply conscious of the distinction, and were able to make more informed choices based on that.

(And more on that in tomorrow’s post.)

For all those who keep asking, I said it again in the comments to the first post: “there’s no authority who can enforce the distinction. The question is whether the distinction works for you, if it’s helpful, if it triggers a different way of thinking. In addition, there’s no particular harm in using “test” and “check” interchangably in common parlance.” I said it again later in the comments: “@Gerard: I’m not really interested in “purifying” the usage, although it was fun to hear so many people picking up on the idea at Agile 2009…Nor is there a right or wrong way to use the words. If people adopted testing vs. checking universally, that would be cool, I guess, but it’s a pretty unrealistic to believe that they would. I’m glad you agree the distinction is useful.”

And I said it again in the comments to the first post: “@Declan: First, as both the original text and the comments outline, I’m not really interested in changing people’s language. Got something that is being run entirely by automation and that receives no human evaluation? Want to call it a test? Go ahead; I wouldn’t want to stop you.”

And I said it again in the comments to the second post: One more thing, though. I don’t want to enforce orthodoxy on speech; that way lies the ISTQB and the SWEBOK and all that. In casual conversation, it makes little difference what word you use. It’s when you want to think critically about what you’re up to that the distinction (and not the label) matters. “I want to check to make sure that these are good tests,” is a fine thing to say, even if what you’re doing is really very testerly, and far more than checkish.

The third post introduced both “test” and “check” with “for the purposes of this discussion”. Plus there’s all the stuff that’s new to this post, so I will now say this: please use whatever words you like. If either one of us is unclear to the other, we’ll talk it over—probably very quickly—and we’ll work it out.

Won’t saying “checks” be confusing?

I don’t think so. I think not having a word for checks is confusing.

Won’t this undermine our goal of industry-standard terminology?

Nah. First of all, that may be your goal, but it’s not ours; leave me out of it. The notion of industry-standard terminology is, from my perspective, silly and misguided; I don’t advocate it, and I often speak out against it. To begin with, is there even a testing industry? There isn’t, any more than there’s a “writing” industry. Testing, like writing, is done in all kinds of different business contexts, organizational models, development paradigms, spoken and written languages, social structures, and for all kinds of purposes within those situations. There’s too much diversity in the world for there to be any “standard” terminology. And that’s a good thing. Diversity is complex and messy, but it also addresses people’s different values, supports different context, and engenders innovation.

One more thing—and here I address the English-speaking people: if there is to be industry-standard terminology, what language should it be in? Why not Mandarin? Why not Spanish? Why not Hindi? If you say that the English words should be interpreted into other languages, how can you be sure that the nuances of your terms show up in the interpretation?

Won’t calling certain kinds of tests “checks” fly in the face of years of documentation and books?

If people were actually to adopt the term “checking”, maybe it would. I’m not holding my breath. But you know, I’m optimistic about the bright people in this business. I think people who are genuine students of their crafts would be able to cope by making an instant mental translation, and those who aren’t are unlikely to read books anyway. The world didn’t come to an end when people started calling functions “messages” or “methods”. Nobody panics when a development project deals with several meanings of “integer” simultaneously. Boris Beizer introduced “The Pesticide Paradox”; how did people react? The way they always do as they make sense of the world: “Oh, that’s what you mean.” “Oh, that’s what you call that around here.” “Really? At KnitWare, we called it this.” “That’s cool; I never thought of it that way.”

Within a project, we sort this kind of stuff out all the time, quickly and efficiently. There’s even a name for what we’re developing: ubiquitous language, by which Eric Evans means a shared language specific to a particular team in the context of a particular project. (Why hasn’t that term been adopted as an industry standard? Answer: even though everyone does it, the standards-bearers won’t brook it, because they refuse to deal with the fact that non-standard behaviour is standard.) There will always be new people arriving on the scene with new words. The same old people will coin neologisms. Other people will get confused by certain unfamiliar terms. New technologies will show up with new labels. We’ll occasionally run into archaic words and technologies. (Only a few years ago, I had to write a little routine to translate between ASCII and EBCDIC. Don’t know what I mean? Look it up.) It’s important to notice these patterns, but sorting them out is not a big deal.

Isn’t this yet another case of you wanting testing to be done your way, all the time, everywhere?

Of course not. I don’t believe that testing should be done the same way everywhere. I do have biases in favour of testing being done as inexpensively, as rapidly, and as thoughtfully as possible; in favour of testing that is oriented towards value for people, rather than technological fetish; in favour of documentation being pared down to the minimum required to completely satisfy the mission and the client; in favour of the elimination of waste at every turn; in favour of a wide diversity of oracles and coverage models; in favour of the empowerment and skill and responsibility of the individual tester; towards principles like those expressed in the Agile Manifesto (although I don’t claim AgilityTM).

Not everyone agrees with those principles and those biases. That’s okay; it takes all kinds of communities to make a world, and (as Cem Kaner says) there’s lots of room for reasonable people to disagree reasonably, since there are so many different forms of experience and context to inform our points of view.

Some people say that we context-driven testing advocates believe that all testing should be context-driven. That’s ridiculous, and we’ve said so (see the Commentary). Reasonable people should be able to recognize that the claim of context-driven imperialism is oxymoronic; to be a context-driven thinker absolutely requires us to identify and acknowledge circumstances where context-driven approaches aren’t a good idea (see Four Attitudes Towards Context, linked here).

As I’ve said numerous times, in various forums (you could look them up), the testing that we perform for a financial institution would be ludicrous overkill for an online dating service; the testing that we do for medical devices would be far too slow and detailed for computer games. And as I’ve said numerous times, the quality of testing is like the quality of anything else: value to some person who matters. Do I disagree with you, or you with me? If you want, you can dispatch the disagreement right away by saying that I don’t matter. On the other hand, if I do matter to you, let’s talk about it and work it out.

If after that, we’re still not in agreement, that’s okay too. At one point in the SHAPE Forum (may it rest in peace), a maintenance programmer bemoaned the irrationality he saw in others’ source code, and ask how he could deal with it. Jerry Weinberg responded that “your first step is to stop thinking of it as ‘irrational’, and to start thinking about it as ‘rational from the perspective of a different set of values’”. Or as philosopher/risk manager/stepbrother Ian Heppell once elegantly put it, “Most arguments seem to be about conclusions, when they’re really about premises.”

Now:

Dear Anonymous,

I’m sorry that I blew your mind.

I think it’s fine to emphasize the value of exploration vs asserting, However I find your attempt at creating a new dictionary to solve the worlds problems incredibly naive and manipulative.

I wonder if you’ve heard about the Association for Software Testing’s dictionary project. You can read about that here and here. My vision for the dictionary is that it be based on the Oxford English Dictionary model, as described in Simon Winchester’s fabulous book, The Meaning of Everything. From the outset, the OED was designed to be descriptive, not prescriptive. The idea was to produce the story of each word, and to track where and when each one had appeared, and to follow its different paths through history, language, and culture.

We don’t intend to solve the world’s problems, nor testing’s. A dictionary won’t do that, but it’s possible that recognition of alternative points of view might be an interesting first step. As Cem Kaner puts it in the linked post, “We are not imposing definitions on the community, we are honoring the fact that different approaches to testing yield different language usage. We are not advocating a best definition, we are advocating a good practice, which is to adopt your client’s vocabulary (at least, to adopt it when talking with that client).”

Oh, and it took 71 years to complete the first version of the OED. The third is scheduled to be completed in 2037. With the idea on the table at the AST for the last three years, and with no editor yet chosen, we’re following the OED development model very well indeed. But I digress.

I recognize that you may have an emotional investment in certain words. If that’s true, that’s okay. I do too, sometimes. But I realize that if I want to keep using a certain word, or a certain pronunciation, or a certain turn of phrase, no one can stop me. And no one can stop you either, Anon. You are an adult, you can choose your own path, and you can only be manipulated if you’re willing to be manipulated. Peace be upon you. Namaste. Shalom. Really.

As I said in the comments to the first post, “there is no the defintion of testing. It is always someone’s defintion of testing—just as there is no property of quality that exists in a product without reference to some person and his or her notion of value.”

Then you said,

Words mean what people on average choose them to mean; they don’t even obtain their meaning from mainstream dictionaries let alone ones concocted on a blog.

Two questions on that.

First, if you’re right about that (and I think you are), how is it that language evolves? I think there are several reasons. Sometimes a new technology appears, and we need a new name for it and the stuff around it. As I tweeted this very evening (note the neologism),“Wife now: “I’m going to CASECamp. I sent stuff out on Crowdvine, but I should have Twittered it.” Sentences unintelligible one year ago.” Sometimes it’s because people notice things that they want names for, and they name them. Sometimes it’s because people want to break something complex into simpler components. Sometimes it’s because people want to lump a bunch of elements into a single concept or model. I’m trying to de-lump a previously lumped part of testing.

Second, if you’re right (and I think you are) that words don’t obtain their meanings from the ones concocted on a blog, then we’re all safe. So what’s the problem?

But there is one thing I’d gently dispute. Words don’t acquire meanings based on averages; they acquire meanings as soon as two or more people are willing to share that meaning. One ex of mine had a word, “dido”; it meant “idiosyncracy or habit of the type exhibited by a cat, or a person acting like one”. She and I also developed “insurance pee”, meaning “an activity in which your kids (or your spouse, or your self) should indulge just before leaving on a long highway drive.” Another ex used “fleffing”, meaning “to dither”.

So far, a handful of people like the word “check” as I’ve described it. Some don’t. Others are willing to consider it. Most people, so it seems, just don’t care. What’s the big deal?

What was the bloody problem with just using the phrase ‘exploratory testing’? I think what you are trying to do is create a ‘newspeak’ in which naughty thoughts are harder to express.

Perhaps you don’t remember the somewhat epic controversy over the term “exploratory testing”. “You mean ‘ad hoc’ testing, right?” “You mean testing without knowing what you’re doing, right?” “You mean testing without documentation, right?” This went on for years. Well, thanks to Cem’s coining of the term (we believe that he was the first, in the first edition of Testing Computer Software), James Bach’s vigourous articulation of the idea, and a bunch of other material written by Jonathan Bach, Elisabeth Hendrickson, Mike Kelly, Jonathan Kohl, James Lyndsay, Brian Marick, Bret Pettichord, Harry Robinson, Rob Sabourin, James Whittaker, and many others, including me, we now have a really substantial body of thought on the subject so that people can discuss it, learn it, teach it, compare it different contexts, figure out how to do it better. “Exploratory testing” even has its own erratic Wikipedia entry—the hallmark of every powerful idea.

Similar controversies have been expressed over “agile” and “extreme programming”. I’ve observed (only half-jokingly) that no one who had ever played rugby would name a development model “Scrum”. But people appropriate words all the time. Nothing new there either. In 1661, Boyle was talking about the “spring of the air”, and Hobbes was launching vitriolic attacks not only on Boyle’s conclusions, but on the whole premise of “experimental philosophy”, another neologism of the age. Controversy is nothing new. The world is a story that we’re all editing as we go.

So thanks for your participation; it drives me to clarify the idea, to make it more useful. Feel free to post a more complete expression of your concerns on your blog, point me to it, and we’ll work it out. Or, as above, you can simply declare that I don’t matter.

Newspeak, in Orwell’s 1984, was not a language designed to suppress naughty thoughts; its ultimate purpose was to make it impossible to express any thoughts at all. Suppression of naughtiness was simply a beneficial (to Big Brother) side effect. My purpose is the opposite of that. My purpose is to foster more thinking, deeper thinking about our fascinating and complex craft. If some consider such thoughts naughty, great. Testing could use some titillation.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Elements of Testing and Checking

Tuesday, September 15th, 2009

In the last couple of weeks, I’ve been very gratified by the response to the testing-vs.-checking distinction. Thanks to all who have grabbed on to the idea and to those who have questioned it.

There’s a wonderful passage in Chapter 4 of Jerry Weinberg‘s Perfect Software and Other Illusions About Testing in which he breaks down the activities of a programmer engaged in testing activities—testing for discovery, discovering an unexpected problem, pinpointing the problem in the behaviour of the product, locating the problem in the source code, determining the significance of the problem, repairing the problem, troubleshooting, and testing to learn (or hacking, or reverse engineering). He points out that confusion among the differences in these different aspects of testing can lead to conflict, resentment, and failed projects.

I brought up the test-vs.-check idea because, like Jerry, I think that the word “test” lumps a large number of concepts into a single word, and (as any programmer will tell you) not knowing or noticing what’s going on inside an encapsulation can lead to trouble. I wanted to raise the issue that (as Dale Emery has helped me to articulate) excellent testing requires us to generate new knowledge, in addition to whatever confirmations we generate. Moreover, tests that generate new knowledge and tests (or checks) that confirm existing knowledge have different motivations, and therefore have different standards of excellence.

A test is a question (or set of questions) that we want to ask of the program. It might consist of a single idea, or many ideas. Designing a test requires us to model the test space (or consider the scope of the question we want to ask), and to determine the oracles we’ll use, the coverage we hope to obtain, and the test procedures that we intend to follow. These are the elements of test design. Performing the test requires us to configure, operate, observe, and evaluate the system, and then to report on what we’ve done, what we’ve observed, and our evaluation. These are the elements of test execution.

A check is a component of a confirmatory approach to testing. As James Bach and I reckoned, a check itself has three elements:

1) It involves an observation.
2) The observation is linked to a decision rule.
3) Both the observation and the decision rule can be performed without sapience (that is, without a human brain).

Although you can execute a check without sapience, you can’t design, implement, or interpret a check without sapience. What needs to be done to make a check happen and to respond to it?

  • We start the process when we recognize the need for the check. That’s the bit in which we consider some problem to solve or identify some risk, and come up with an observation that we’d like to make. That requires sapience; it’s an act of testing.
  • Once we’ve seen a need for the check, we must translate the check into a question for the agency that’s going to perform it, whether that agency is a human or a machine. That requires us to develop the decision rule, turning the test idea into a question with a binary outcome. That requires sapience too, and thus is a testing activity.
  • When we have a question that expreses our test idea, the next step is to program the check, interpreting the binary question into program code (or into a script for non-sapient human execution), and put it into some transferrable form, such as a source code file or a Word document. Part of this requires sapience (the design and expression of the idea), and part of it doesn’t (the typing). Maybe a machine could do the typing part (say, via voice recognition), but programming isn’t just typing; it’s typing and thinking.
  • When we have a check programmed, the next step it to initiate it, to start it up or kick it off. This too has a sapient and a non-sapient aspect. A machine could start a check automatically, either on a schedule or in response to an event, but someone has to tell the machine about the schedule and the event. So the decision to run a check and when to run it is sapient, but the actual kickoff isn’t; it can be done mechanically.
  • Once the check has been initiated, the agency (machine or human) will execute or run the check, going through a prescribed set of steps from start to end. By definition, that’s definitely machine-doable and non-sapient. Pre-scribed literally means written down beforehand. For a check, the script specifies exactly what the agency must do, exactly what the agency must observe, exactly how the agency must decide the result, and exactly how the agency must report, and the agency does no more and no less than that.
  • Upon completing the prescribed steps, the agency must decide the result of the check. Pass or fail? True or false? Yes or no? By definition, the check must be non-sapient; machine-decidable, whether a human or a machine makes the decision.
  • The agency will typically record the result, based on a program for doing it. Checks performed by a machine might record results in an alert or result pane in an IDE, or in a log file. Checks performed by a human might show up in a pass or fail checkbox in a test management tool, or in a column of a spreadsheet.
  • The agency may report the result, alerting some human that something has happened. The report might be passive—the agency may be programmed to leave a log file in a folder at the end of a check run, say; or it might be more active, taking the form of a green or red bar, or a lava lamp. Depending upon the degree to which his actions have been scripted, a tester may or may not actively or immediately report the result.
  • Someone may interpret the result of check, assigning meaning to it. Okay, so the output says “pass”, or “fail”. What does it mean? What is our oracle—that is, what is the heuristic principle or mechanism by which we might recognize a problem? Is the result what we expected? Problem or no problem? If there’s a problem, is the problem in the check or in the item that we’re testing? Ascribing meaning requires sapience.Note that this step is optional. It’s possible for someone to consider a check “complete” without a human observation of the result. This should trigger a Black Swan alert: failing checks tend to get noticed, and passing checks don’t.
  • Someone may evaluate the check, ascribing significance to the outcome and to the meaning that we’ve reckoned. After the check has passed or failed, and we’ve figured out what it means, we have to decide “big deal” or “not a big deal”; whether we need to do something about it; whether the check and its outcome supply us with sufficient information.This step is optional too. Whether it happens or not, evaluation is definitely a human thing. Machines don’t make value judgments. Another Black Swan alert: if we don’t go through the previous step, interpreting the result of the check, we won’t get to this step either. There’s a risk here: the narcotic comfort of the green bar.
  • Whether we’ve ascribed meaning and significance or not, there is a response. One response is to ignore the result of the check altogether. Another is to pay just enough attention to say that the check has passed, and otherwise ignore interpretation and evaluation. Ignoring the check—oblivion—doesn’t require sapience. However, a person could also choose to ignore the result of the check consciously, which is a sapient act.Alternatively, if the check has passed, are we okay with that? Shall we proceed to something else, or should we program and execute another check? If the check has failed, a typical response is to decide to perform some action. We could perform some further analysis by developing new checks or other forms of testing. We could fix the program, fix the check, or change them to be consistent with one another. We could delete the check, or kill the program. All of these decisions and the subsequent activities require sapience, a human.

In future posts, I’ll be talking about how we can put this miniature task analysis to work for us, which I hope in turn will help us to consider important issues in the quality of our testing.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Pass vs. Fail vs. Is There a Problem Here?

Friday, September 4th, 2009

A test, for the purposes of this discussion, is at its core a process of exploration. Initially, our community described exploratory testing as “simultaneous test design, test execution, and learning.” Later descriptions included “simultaneous test design, test execution, and learning, with an emphasis on learning“, “a parallel process of test design, test execution, test result interpretation, and learning, with an emphasis on learning”. At the Workshop on Heuristic and Exploratory Techniques in Palm Bay, Florida, 2006, we had a bunch of conversations (one might call them transpection) that ended up with Cem Kaner synthesizing this lengthy but very thorough description.

“Exploratory testing is a style of testing that emphasizes the personal freedom and responsibility of the individual tester to optimize the quality of his or her work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the project.”

A little unwieldy for conversation over drinks around the pool, but when I want to be really explicit about exploratory testing, that’s the description I turn to. When I want to be quick, I’ll say “parallel test design, test execution, result interpretation, and learning”.

Why mention this? Because I would argue that, at its core, all testing is exploratory in nature (apparently Boris Beizer would agree, according to an earlier comment), but checking (again, at least for the purposes of this discussion) isn’t. As James pointed out in our conversation, checking does require some element of testing; to create a check requires an act of test design, and to act upon the result of a check requires test result interpretation and learning.

This helps, I think, to distinguish testing and checking. Some person (a programmer, a tester, or anyone else for that matter) is required to create a check. Creating a check is a sapient process; it is an act of test design, and it requires a human to do it.

After someone designs and instantiates a check, someone or something else may perform the check, making an observation and applying a decision rule. If it’s a check, the rule has a binary answer that is closed and machine-decidable. The decision may or may not be made by a machine, but the important thing is that it could be; by definition, a check is a non-sapient process, and a non-sapient process, also by definition, is one that doesn’t require a human.

So the outcome of a check is simply a pass or fail result; the outcome doesn’t require human interpretation. I’ve seen organizations in which some programmer, business analyst, or test designer creates a list of checks for people to execute. The list often takes the form of a suite of assertions in a test tool or a spreadsheet, and there’s a checkbox in a form or a column in the spreadsheet headed “pass or fail”. And I’ve seen this many times, most commonly in certain kinds of outsourced testing. Taken to the extreme (and I have seen it taken to the extreme), it’s an immense waste of human capabilities. Humans are capable of providing vastly more information than a handful of check results can provide. In particular, they can see things around the check that a non-sapient agent would miss.

What happens after a check? If we want to interpret the result—that is, to determine its meaning and significance—a human is required again. That too is an act of testing, but note that it’s possible to perform a check without interpreting the result. This happens in at least two cases: 1) When we give a script to a machine to execute, or 2) when we give a script to a human to execute and when our requirements for the performance of the script emphasize pass-or-fail decisions, rather than the question, “Is there a problem here?”

If there is a problem, it takes sapience to recognize it and sapient acts to address it.

Some organizations emphasize checking vastly more than they emphasize testing. IMVU, at least at the time James Bach and I poked around it, seemed to be that kind of organization (or at least Timothy Fitz’s description of what he found cool about IMVU emphasized that aspect of it). Note that lots of checking is usually a good thing; it’s handy for finding regression problems, for example. Even exclusively checking is not intrinsically a bad thing to do. Right or wrong, good or bad, sufficient or insufficient are value judgements, and they’re up to the person making the decision. From my perspective, though, the problem with a check-focused approach to testing is that it becomes very easy to be satisfied by checks that pass. In the past, I’ve referred to this as the narcotic comfort of the green bar. In other forms, it’s sometimes also called the pesticde paradox, which Beizer named.

For many kinds of checks, a machine might be a much, much better choice than a human. A complex arithmetic calculation is a case in point; give me an algorithm and a machine and a conditional statement for that. Many test ideas can be designed and expressed as a set of machine-decidable checks, and that may be just the ticket for exercising those ideas. High-volume, randomized testing is an exploratory practice enabled by squillions of actions that may include many simple checks; we’re seeking problems that aren’t in our current risk models. Stress testing happens when we throw a squillion actions (which may include many checks) against a system, supposing ahead of time that it will break but not being sure where. Some kinds of performance testing (which may include many checks) may be checking (when we’re seeking to confirm that a system behaves within expected parameters), or it may be testing (when our objective is to vary things and exercise new ideas).

But in other contexts, things get dicier. For example, a set of checks that compare strings of text with a table of known words is faster than having a human read through a long document. Yet if a peace of text or a hole duck you meant whirr interpreted by dicked Asian soft wear, you’d probably want to test it, in addition to checking it, and if you wanted to evaluate the quality of the arguments in the document, checks wouldn’t really do the trick. More formally, checks may be adequate for syntactic evaluation, but semantic evaluation still needs humans to test. Notice that they’re called spelling checkers and grammar checkers, rather than spelling or grammar testers.

A development strategy that emphasizes checking at the expense of testing is one in which we’ll emphasize confirmation of existing knowledge over discovery of new knowledge. That might be okay. A few checks might constitute sufficient testing for some purpose; no testing and no checking at all might even be sufficient for some purposes. But the more we emphasize checking over testing, and the less testing we do generally, the more we leave ourselves vulnerable to the Black Swan.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Transpection and the Three Elements of Checking

Friday, September 4th, 2009

James Bach and I have a thing that we do called “transpection“. It’s not at all new (you do it, Socrates and his interlocutors did it in Plato’s dialogs, and people did it long before that and have done it ever since) but I think James’ word for it is new. Transpection is an exploratory conversation aimed (or chartered) towards discussing and refining a particular idea. The word is related to inspection (in which one examines and observes something) introspection (in which one examines and observes one’s mental or emotional processes) and retrospection (in which one examines the past). Transpection involves examining or observing an idea with another person.

James and I often disagree on certain ideas, but through transpection we come to sort them out and sharpen them. At first, it seemed that we disagreed on testing and checking, but through a transpection session yesterday, we came to agreement and James provided me with a set of attributes whereby we can test whether something is a test or a check.

A check, then, has three attributes:

1) It requires an observation.

2) The observation is linked to a decision rule.

3) The observation and the rule can be applied without sapience.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds

Testing vs. Checking

Saturday, August 29th, 2009

This posting is an expansion of a lightning talk that I gave at Agile 2009. Many thanks to Arlo Belshee and James Shore for providing the platform. Many thanks also to the programmers and testers at the session for the volume of support that the talk and the idea received. Special thanks to Joe (J.B.) Rainsberger. Spread the meme!

There is confusion in the software development business over a distinction between testing and checking. I will now attempt to make the distinction clearer.

Checking Is Confirmation

Checking is something that we do with the motivation of confirming existing beliefs. Checking is a process of confirmation, verification, and validation. When we already believe something to be true, we verify our belief by checking. We check when we’ve made a change to the code and we want to make sure that everything that worked before still works. When we have an assumption that’s important, we check to make sure the assumption holds. Excellent programmers do a lot of checking as they write and modify their code, creating automated routines that they run frequently to check to make sure that the code hasn’t broken. Checking is focused on making sure that the program doesn’t fail.

Testing Is Exploration and Learning

Testing is something that we do with the motivation of finding new information. Testing is a process of exploration, discovery, investigation, and learning. When we configure, operate, and observe a product with the intention of evaluating it, or with the intention of recognizing a problem that we hadn’t anticipated, we’re testing. We’re testing when we’re trying to find out about the extents and limitations of the product and its design, and when we’re largely driven by questions that haven’t been answered or even asked before. As James Bach and I say in our Rapid Software Testing classes, testing is focused on “learning sufficiently everything that matters about how the program works and about how it might not work.”

Checks Are Machine-Decidable; Tests Require Sapience

A check provides a binary result—true or false, yes or no. Checking is all about asking and answering the question “Does this assertion pass or fail?” Such simple assertions tend to be machine-decidable and are, in and of themselves, value-neutral.

A test has an open-ended result. Testing is about asking and answering the question “Is there a problem here?” That kind of decision requires the application of many human observations combined with many value judgements.

When a check passes, we don’t know whether the program works; we only know that it’s still working within the scope of our expectations. The program might have serious problems, even though the check passes. To paraphrase Dkijstra, “checking can prove the presence of bugs, but not their absence.” Machines can recognize inconsistencies and problems that they have been programmed to recognize, but not new ones. Testing doesn’t tell us whether the program works either—certainty on such questions isn’t available—but testing may provide the basis of a strong inference addressing the question “problem or no problem?”

Testing is, in part, the process of finding out whether our checks have been good enough. When we find a problem through testing, one reasonable response is to write one or more checks to make sure that that particular problem doesn’t crop up again.

Whether we automate the process or not, if we could express our question such that a machine could ask and answer it via an assertion, it’s almost certainly checking. If it requires a human, it’s a sapient process, and is far more likely to be testing. In James Bach‘s seminal blog entry on sapient processes, he says, “My business is software testing. I have heard many people say they are in my business, too. Sometimes, when these people talk about automating tests, I think they probably aren’t in my business, after all. They couldn’t be, because what I think I’m doing is very hard to automate in any meaningful way. So I wonder… what the heck are they automating?” I have an answer: they’re automating checks.

When we talk about “tests” at any level in which we delegate the pass or fail decision to the machine, we’re talking about automated checks. I propose, therefore, that those things that we usually call “unit tests” be called “unit checks“. By the same token, I propose that automated acceptance “tests” (of the kind Ron Jeffries refers to in his blog post on automating story “tests”) become known as automated acceptance checks. These proposals appeared to galvanize a group of skilled programmers and testers in a workshop at Agile 2009, something about which I’ll have more to say in a later blog post.)

Testing Is Not Quality Assurance, But Checking Might Be

You can assure the quality of something over which you have control; that is, you can provide some level of assurance to some degree that it fulfills some requirement, and you can accept responsiblity if it does not fulfill that requirement. If you don’t have authority to change something, you can’t assure its quality, although you can evaluate it and report on what you’ve found. (See pages 6 and 7 of this paper, in which Cem Kaner explains the distinction between testing and quality assurance and cites Johanna Rothman‘s excellent set of questions that help to make the distinction.) Testing is not quality assurance, but acts in service to it; we supply information to programmers and managers who have the authority to make decisions about the project.

Checking, when done by a programmer, is mostly a quality assurance practice. When an programmer writes code, he checks his work. He might do this by running it directly and observing the results, or observing the behaviour of the code under the debugger, but often he writes a set of routines that exercise the code and perform some assertions on it. We call these unit “tests”, but they’re really checks, since the idea is to confirm existing knowledge. In this context, finding new information would be considered a surprise, and typically an unpleasant one. A failing check prompts the programmer to change the code to make it work the way he expects. That’s the quality assurance angle: a programmer helps to assure the quality of his work by checking it.

Testing, the search for new information, is not a quality assurance practice per se. Instead, testing informs quality assurance. Testing, to paraphrase Jerry Weinberg, is gathering information with the intention of informing a decision, or as James Bach says, “questioning a product in order to evaluate it.” Evaluation of a product doesn’t assure its quality, but it can inform decisions that will have an impact on quality. Testing might involve a good deal of checking; I’ll discuss that at more length below.

Checkers Require Specifications; Testers Do Not

A tester, as Jerry Weinberg said, is “someone who knows that things can be different”. As testers, it’s our job to discover information; often that information is in terms of inconsistencies between what people think and what’s true in reality. (Cem Kaner‘s definition of testing covers this nicely: “testing is an empirical, technical investigation of a product, done on behalf of stakeholders, with the intention of revealing quality-related information of the kind that they seek.”)

We often hear old-school “testing” proponents claim that good testing requires specifications that are clear, complete, up-to-date, and unambiguous. (I like to ask these people, “What do you mean by ‘unambiguous’?” They rarely take well to the joke. But I digress.) A tester does not require the certainty of a perfect specification to make useful observations and inferences about the product. Indeed, the tester’s task might be to gather information that exposes weakness or ambiguity in a specification, with the intention of providing information to the people who can clear it up. Part of the tester’s role might be to reveal problems when the plans for the product and the implementation have diverged at some point, even if part of the plan has never been written down. A tester’s task might be to reveal problems that occur when our excellent code calls buggy code in someone else’s library, for which we don’t have a specification. Capable testers can deal easily with such situations.

A person who needs a clear, complete, up-to-date, unambiguous specification to proceed is a checker, not a tester. A person who needs a test script to proceed is a checker, not a tester. A person who does nothing but to compare a program against some reference is a checker, not a tester.

Testing vs. Checking Is A Leaky Abstraction

Joel Spolsky has named a law worthy of the general systems movement, the Law of Leaky Abstractions (“All non-trivial abstractions, to some degree, are leaky.”). In the process of developing a product, we might alternate very rapidly between checking and testing. The distinction between the two lies primarily in our motivations. Let’s look at some examples.

  • A programmer who is writing some new code might be exploring the problem space. In her mind, she has a question about how she should proceed. She writes an assertion—a check. Then she writes some code to make the assertion pass. The assertion doesn’t pass, so she changes the code. The assertion still doesn’t pass. She recognizes that her initial conception of the problem was incomplete, so she changes the assertion, and writes some more code. This time the check passes, indicating that the assertion and the code are in agreement. She has an idea to write another bit of code, and repeats the process of writing a check first, then writing some code to make it pass. She also makes sure that the original check passes. Next, she sees the possibility that the code could fail given a different input. She believes it will succeed, but writes a new check to make sure. It passes. She tries different input. It fails, so she has to investigate the problem. She realizes her mistake, and uses her learning to inform a new check; then she writes functional code to fix the problem and pass the check.So far, her process has been largely exploratory. Even though she’s been using checks to support the process, her focus has been on learning, exploring the problem space, discovering problems in the code, and investigating those problems. In that sense, she’s testing as she’s programming. At the end of this burst of development, she now has some functional code that will go into the product. As a happy side effect, she has another body of code that will help her to check automatically for problems if and when the functional code gets modified.Mark Simpson, a programmer that I spoke to at Agile 2009, said that this cyclic process is like bushwhacking, hacking a new trail through the problem space. There are lots of ways that you could go, and you clear the bush of uncertainty around you in an attempt to get to where you’re going Historically, this process has been called “test-driven development”, which is a little unfortunate in that TDD-style “tests” are actually checks. Yet it would be hard, and even a little unfair, to argue that the overall process is not exploratory to a significant degree. Programmers engaged in TDD have a goal, but the path to the goal is not necessarily clear. If you don’t know exactly where you’re going to end up and exactly how you’re going to get there, you have to do some amount of exploration. The moniker “behavior-driven development” (BDD) helps to clear up the confusion to some degree, but it’s not yet in widespread adoption. BDD uses checks in the form “(The program) should…”, but the development process requires a lot of testing of the ideas as they’re being shaped.
  • Now our programmer looks over her code, and realizes that one of the variables is named in an unclear way, that one line of code would be more readable and maintainable expressed as two, and that a group of three lines could more elegantly and clearly expressed as a for loop. She decides to refactor. She addresses the problems one at a time, running her checks after each change. Her intention in running these checks is not to explore; it’s confirm that nothing’s been messed up. She doesn’t develop new checks; she’s pretty sure the old ones will do. At this point, she’s not really testing the product; she’s checking her work.
  • Much of the traditional “testing” literature suggests that “testing” is a process of validation and verification, as though we already know how the code should work. Although testing does involve some checking, a program that is only checked is likely to be poorly tested. Much of the testing literature focused on correctness—which can be checked—and ignores the sapience that is necessary to inform deeper questions about value, which must be tested. For example, that which is called “boundary testing” is usually boundary checking.The canonical example is that of the program that adds two two-digit integers, where values in the range from -99 to 99 are accepted, and everything else is rejected. The classic advice on how to “test” such a program focuses on boundary conditions, given in a form something like this: “Try -99 and 99 to verify that valid values are accepted, and try -100 and 100 to verify that invalid values are rejected.” I would argue that these “tests” are so weak as to be called checks; they’re frightfully obvious, they’re focused on confirmation, they focus on output rather than outcome, and they could be easily mechanized.If you wanted to test a program like that, you’d configure, operate, observe the product with eyes open to many more risks, including ones that aren’t at the forefront of your consciousness until a problem manifests itself. You’d be prepared to consider anything that might threaten the value of the product—problems related to performance, installability, usability, testability, and many other quality criteria. You’d tend to vary your tests, rather than repeating them. You’d engage curiosity, and perform a smattering of tests unrelated to your current models of risks and threats, with the goal of recognizing unanticipated risks. You might use automation to assist your exploration; perhaps you would use automation to generate data, to track coverage, to parse log files, to probe the registry or the file system for unanticipted effects. Even if you used automation to punch the keys for you, you’d use the automation in an exploratory way; you’d be prepared to change your line of investigation and your tactics when a test reveals surprising information.
  • The exploratory mindset is focused on questions like “What if…?” “I wonder…?” “How does this thing…?” “What happens when I…?” Even though we might be testing a program with a strongly exploratory approach, we will engage a number of confirmatory kinds of ideas. “If I press on that Cancel button, that dialog should go away.” “That field is asking for U.S. ZIP code; the field should accept at least five digits.” “I’ll double-click on ‘foo.doc’, and that file should open in Microsoft Word on this system.” Excellent testers hold these and dozens of other assumptions and assertions as a matter of course. We may not even be conscious of them being checks, but we’re checking sub-consciously as we explore and learn about the program. Should one of these checks fail, we might be prompted to seek new information, or if the behaviour seems reasonable, we might instead change our model of how the program is supposed to work. That’s a heuristic process (a fallible means of solving a problem or making a decision, conducive to learning; we presume that a heuristic usually works but that it might fail).
  • Also at Agile 2009, Chris McMahon gave a presentation called “History of a Large Test Automation Project using Selenium”. He described an approach of using several thousand automated checks (he called them “tests”) to find problems in the application with testing. How to describe the difference? Again, the difference is one of motivation. If you’re running thousands of automated checks with the intention of demonstrating that you’re okay today, just like you were yesterday, you’re checking. You could use those automated checks in a different way, though. If you are trying to answer new questions, “what would happen if we ran our checks on thirty machines at once to really pound the server?” (where we’re focused on stress testing), or “what would happen if we ran our automated checks on this new platform?” (where we’re focused on compatibility testing), or “what would happen if we were to run our automated checks 300 times in a row?” (where we’re focused on flow testing), the category of checks would be leaking into testing (which would be a fine thing in these cases).

There will be much, much more to say about testing vs. confirmation in the days ahead. I can guarantee that people won’t adopt this distinction across the board, nor will they do it overnight. But I encourage you to consider the distinction, and to make it explicit when you can.

See more on testing vs. checking.

Related: James Bach on Sapience and Blowing People’s Minds