Blog Posts for the ‘Testing and Checking’ Category

The Rapid Software Testing Namespace

Monday, February 2nd, 2015

Just as no one has the right to tell you what language to speak at home, nobody outside of your project has the authority to tell you how to speak inside your project. Every project develops its own namespace, so to speak, and its own formal or informal criteria for naming things inside it.

Rapid Software Testing is, among other things, a project in that sense. For years, James Bach and I have been developing labels for ideas and activities that we talk about in our work and in our classes. While we’re happy to adopt useful ideas and terms from other places, we have the sole authority (for now) to set the vocabulary formally within Rapid Software Testing (RST).

We don’t have the right to impose our vocabulary on anyone else. So what do we do when other people use a word to mean something different from what we mean by the same word?

We invoke “the RST namespace” when we talk about testing and checking, for example, so that we can speak clearly and efficiently about ideas that we bring up in our classes and in the practice of Rapid Software Testing. From time to time, we also try to make it clear why we use words in a specific way.

For example, we make a big deal about testing and checking. We define checking as “the process of making evaluations by applying algorithmic decision rules to specific observations of a product” (and a check is an instance of checking). We define testing as “the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.” (and a test is an instance of testing).
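To make the distinction concrete, here is a minimal sketch in Python. The product (a toy Cart) and its behaviour are hypothetical, invented for illustration; the point is that the check itself is only the algorithmic core, and everything interesting around it is testing.

```python
# A minimal sketch of a "check" in the RST sense: an algorithmic decision
# rule applied to a specific observation of a product. The Cart class is
# a hypothetical product, for illustration only.

class Cart:
    def __init__(self):
        self.prices = []

    def add(self, price):
        self.prices.append(price)

    def total(self):
        return sum(self.prices)

def check_cart_total(prices, expected):
    """Observe the product (the total) and apply a decision rule
    (equality). The output is a bit: True or False, pass or fail."""
    cart = Cart()
    for price in prices:
        cart.add(price)
    return cart.total() == expected

print(check_cart_total([5, 10], 15))  # True -- the check "passes"

# Everything surrounding this call -- choosing which prices to try,
# wondering whether the total should include tax, deciding what to do
# when the bit comes back False -- is testing, not checking.
```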

Contrast our definitions with those of the ISTQB, whose Glossary defines “test” as “a set of test cases”—along with “test case” as “a set of input values, execution preconditions, expected results and execution postconditions, developed for a particular objective or test condition, such as to exercise a particular program path or to verify compliance with a specific requirement.”

Interesting, isn’t it: the ISTQB’s definition of test looks a lot like our definition of check. In Rapid Software Testing, we prefer to put learning and experimentation (rather than satisfying requirements and demonstrating fitness for purpose) at the centre of testing. We prefer to think of a test as something that people do as an act of investigation; as a performance, not as an artifact.

Because words convey meaning, we converse about (and occasionally argue over, sometimes passionately) the value we see in the words we choose and the ways we think of them. Our aim is to describe things that people haven’t noticed, or to make certain distinctions clear, reducing the risk that someone will misunderstand—or miss—something important.

Nonetheless, we freely acknowledge that we have no authority outside of Rapid Software Testing. There’s nothing to stop people from using the words we use in a different way; there are no language police in software development. So we’re also willing to agree to use other people’s labels for things when we’ve had the conversation about what those labels mean, and have come to agreement.

People who tout a “common language” often mean “my common language”, or “my namespace”. They also have the option to certify you as being able to pass a vocabulary test, if anyone thinks that’s important. We don’t.

We think that it’s important for people to notice when words are being used in different ways. We think it’s important for people to become polyglots—and that often means working out which namespace we might be using from one moment to the next.

In our future writing, conversation, classes, and other work, you might wonder what we’re talking about when we refer to “the RST namespace”. This post provides your answer.

Very Short Blog Posts (18): Ask for Testability

Saturday, May 3rd, 2014

Whether you’re working in an Agile environment or not, one of the tester’s most important tasks is to ask and advocate for things that make a product more testable. Where to start? Think about visibility—in its simplest form, log files—and controllability in the form of scriptable application programming interfaces (APIs).

Logs aren’t just for troubleshooting. Comprehensive log files can help to identify the data that was processed and the functions that were covered during testing. Logs can be parsed to gather statistics or processed with visualization tools to reveal interesting patterns of behaviour. Ask for consistent structure, precise time stamps, and configurable levels of logging.
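As a sketch of what that parsing might look like—the log format and field names here are hypothetical, not a prescription—a few lines of code can turn a consistently structured log into simple coverage statistics:

```python
# A sketch of parsing a structured log to gather statistics about which
# functions were exercised during testing. The format (timestamp | level |
# function | message) is hypothetical; ask for something comparably parseable.
from collections import Counter

def functions_covered(log_lines):
    counts = Counter()
    for line in log_lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4:
            _timestamp, _level, function, _message = fields
            counts[function] += 1
    return counts

sample_log = [
    "2014-05-03T10:15:02.113 | INFO  | login    | user=alice",
    "2014-05-03T10:15:02.487 | INFO  | search   | query=widgets",
    "2014-05-03T10:15:03.005 | ERROR | checkout | timeout after 2000ms",
    "2014-05-03T10:15:03.391 | INFO  | search   | query=gadgets",
]

for function, count in functions_covered(sample_log).most_common():
    print(f"{function}: {count}")
```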

A scriptable API affords the opportunity for testers to drive the program at high speed or high volume, in well-ordered, variable, or randomized sequences. A scripting interface can allow testers to observe the program’s data structures, query its internal states, or adjust its configuration quickly and easily. Use APIs and tools for more than functional checking; use them for sophisticated, automation-assisted exploration. As a bonus, an API can add to the value of your product by making its functions more accessible to your customers.
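Here is a sketch of the kind of automation-assisted exploration I mean. The product and its API are hypothetical stand-ins, but the pattern—a long randomized run, a recorded seed so that any surprise can be replayed, and a check embedded in the exploration that flags oddities for a human to investigate—carries over to real scriptable interfaces:

```python
# A sketch of driving a hypothetical scriptable API at high volume, in a
# randomized but reproducible sequence, while observing internal state.
import random

class Product:
    """A toy stand-in for a product that exposes a scriptable API."""
    def __init__(self):
        self.state = 0

    def add(self, amount):
        self.state += amount

    def reset(self):
        self.state = 0

    def query_state(self):
        return self.state

def explore(seed, steps=10000):
    rng = random.Random(seed)  # record the seed: surprises become replayable
    product = Product()
    surprises = []
    for step in range(steps):
        if rng.random() < 0.9:
            product.add(rng.randint(-10, 10))
        else:
            product.reset()
        # A check embedded in exploration: flag states that look surprising
        # for a human to investigate, rather than merely passing or failing.
        if abs(product.query_state()) > 75:
            surprises.append((step, product.query_state()))
    print(f"seed={seed}: {len(surprises)} surprising state(s); "
          f"first few: {surprises[:3]}")

explore(seed=2014)
```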

You can’t depend on getting log files and APIs without asking for them. So, starting with your current sprint, ask early and ask often.


Very Short Blog Posts (17): Regression Obsession

Thursday, April 24th, 2014

Regression testing is focused on the risk that something that used to work in some way no longer works that way. A lot of organizations (Agile ones in particular) seem fascinated by regression testing (or checking) above all other testing activities. It’s a good idea to check for the risk of regression, but it’s also a good idea to test for it. Moreover, it’s a good idea to make sure that, in your testing strategy, a focus on regression problems doesn’t overwhelm the search for problems generally—problems rooted in the innumerable risks that may beset products and projects, and that may remain undetected by the current suite of regression checks.

One thing for sure: if your regression checks are detecting a large number of regression problems, there’s likely a significant risk of other problems that those checks aren’t detecting. In that case, a tester’s first responsibility may not be to report any particular problem, but to report a much bigger one: regression-friendly environments ratchet up not only product risk, but also project risk, by giving bugs more time and more opportunity to hide. Lots of regression problems suggest a project is not currently maintaining a sustainable pace.

And after all, if a bug clobbers your customer’s data, is the customer’s first question “Is that a regression bug, or is that a new bug?” And if the answer is “That wasn’t a regression; that was a new bug,” do you expect the customer to feel any better?

Related material:

Regression Testing (a presentation from STAR East 2013)
Questions from Listeners (2a): Handling Regression Testing
Testing Problems Are Test Results
You’ve Got Issues

Harry Collins and The Motive for Distinctions

Monday, March 3rd, 2014

“Computers and their software are two things. As collections of interacting cogs they must be ‘checked’ to make sure there are no missing teeth and the wheels spin together nicely. Machines are also ‘social prostheses’, fitting into social life where a human once fitted. It is a characteristic of medical prostheses, like replacement hearts, that they do not do exactly the same job as the thing they replace; the surrounding body compensates.

“Contemporary computers cannot do just the same thing as humans because they do not fit into society as humans do, so the surrounding society must compensate for the way the computer fails to reproduce what it replaces. This means that a complex judgment is needed to test whether software fits well enough for the surrounding humans to happily ‘repair’ the differences between humans and machines. This is much more than a matter of deciding whether the cogs spin right.”

—Harry Collins

Harry Collins—sociologist of science, author, professor at Cardiff University, a researcher in the fields of the public understanding of science, the nature of expertise, and artificial intelligence—was slated to give a keynote speech at EuroSTAR 2013. Due to illness, he was unable to do so. The quote above is the abstract from the talk that Harry never gave. (The EuroSTAR community was very lucky and grateful to have his colleague, Rob Evans, step in at the last minute with his own terrific presentation.)

Since I was directed to Harry’s work in 2010 (thank you, Simon Schaffer), James Bach and I have been galvanized by it. As we’ve been trying to remind people for years, software testing is a complex, cognitive, social task that requires skill, tacit knowledge, and many kinds of expertise if we want people to do it well. Yet explaining testing is tricky, precisely because so much of what skilled testers do is tacit, and not explicit; learned by practice and by immersion in a culture, not from documents or other artifacts; not only mechanical and algorithmic, but heuristic and social.

Harry helps us by taking a scalpel to concepts and ideas that many people consider obvious or unimportant, and dissecting those ideas to reveal the subtle and crucial details under the surface.

As an example, in Tacit and Explicit Knowledge, he takes the idea of tacit knowledge—formerly, any kind of knowledge that was not told—and divides it into three kinds: relational, the kind of knowledge that resides in an individual human mind, and that in general could be told; somatic, resident in the system of a human body and a human mind; and collective, residing in society and in the ever-changing relationships between people in a culture.

How does that matter? Consider the Google car. On the surface, operating a car looks like a straightforward activity, easily made explicit in terms of the laws of physics and the rules of the road. Look deeper, and you’ll realize that driving is a social activity, and that interaction between drivers, cyclists, and pedestrians is negotiated in real time, in different ways, all over the world.

So we’ve got Google cars on the road experimentally in California and Washington; how will they do in Beijing, in Bangalore, or in Rome? How will they interact with human drivers in each society? How will they know, as human drivers do, the extent to which it is socially acceptable to bend the rules—and socially unacceptable not to bend them?

In many respects, machinery can do far better than humans in the mechanical aspects of driving. Yet testing the Google car will require far more than unit checks or a Cucumber suite—it will require complex evaluation and judgement by human testers to see whether the machinery—with no awareness or understanding of social interactions, for the foreseeable future—can be accommodated by the surrounding culture.

That will require a shift from the way testing is done at Google, according to some popular stories. If you want to find problems that matter to people before inflicting your product on them, you must test not only the product in isolation, but also its relationships with other people.

In Rapid Software Testing, our goal all the way along has been to probe into the nature of testing and the way we talk about it, with the intention of empowering people to do it well. Part of this task involves taking relational tacit knowledge and making it explicit. Another part involves realizing that certain skills cannot be transferred by books or diagrams or video tutorials, but must be learned through experience and immersion in the task. Rather than hand-waving about “intuition” and “error guessing”, we’d prefer to talk about and study specific, observable, trainable, and manageable skills.

We could talk about “test automation” as though it were a single subject, but it’s more helpful to distinguish the many ways that we could use tools to support and amplify our testing—for checking specific facts or states, for generating data, for visualization, for modeling, for coverage analysis… Instead of talking about “automated testing” as though machines and people were capable of the same things, we’d rather distinguish between checking (something that machines can do, an activity embedded in testing) and testing (which requires humans), so as to make both our checking and our testing more powerful.
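As one small example, here is a sketch of a tool amplifying testing without performing a single check: generating varied, risk-targeted data for a human to feed to the product and observe. The particular values are hypothetical; choose troublemakers suited to your own product.

```python
# A sketch of tool-supported test data generation: no checking involved,
# just amplification of a human tester's reach. Values are illustrative,
# chosen to probe encoding, length, escaping, and parsing risks.
import itertools

given_names = ["Mary", "", "Renée", "Robert'); DROP TABLE--", "あ" * 300]
separators  = [" ", "-", "  "]
surnames    = ["O'Brien", "null", "von Neumann"]

# 45 candidate full names for a tester to try, observe, and wonder about.
for given, sep, surname in itertools.product(given_names, separators, surnames):
    print(repr(given + sep + surname))
```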

The abstract for Prof. Collins’ talk, quoted above, is an astute, concise description of why skilled testing matters. It’s also why the distinction between testing and checking matters. For that, we are grateful.

There will be much more to come in these pages relating Harry’s work to our craft of testing; stay tuned. Meanwhile, I give his books my highest recommendation.

Tacit and Explicit Knowledge
Rethinking Expertise (co-authored with Rob Evans)
The Shape of Actions: What Humans and Machines Can Do (co-authored with Martin Kusch)
The Golem: What You Should Know About Science (co-authored with Trevor Pinch)
The Golem at Large: What You Should Know About Technology (co-authored with Trevor Pinch)
Changing Order: Replication and Induction in Scientific Practice
Artificial Experts: Social Knowledge and Intelligent Machines

Very Short Blog Posts (11): Passing Test Cases

Wednesday, January 29th, 2014

Testing is not about making sure that test cases pass. It’s about using any means to find problems that harm or annoy people.

Testing involves far more than checking to see that the program returns a functionally correct result from a calculation.

Testing means putting something to the test, investigating and learning about it through experimentation, interaction, and challenge. Yes, tools may help in important ways, but the point is to discover how the product serves human purposes, and how it might miss the mark.

So a skilled tester does not simply ask “Does this check pass or fail?” Instead, the skilled tester probes the product and asks a much richer and more fundamental question: is there a problem here?

Counting the Wagons

Monday, December 30th, 2013

A member of LinkedIn asks if “a test case can have multiple scenarios”. The question and the comments (now unreachable via the original link) reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by—waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were there more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.


Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. As with railway wagons, the contents—the cargo, not the containers—are the really interesting and important parts. And as with railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or imply multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases—piecework, units of production—simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • “What Counts”, an article that I wrote for Better Software magazine, on problems with counting things.
  • “Braiding the Stories” and “Delivering the News”, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

Interview and Interrogation

Friday, September 27th, 2013

In response to my post from a couple of days ago, Gus kindly provides a comment, and I think a discussion of it is worth a blog post on its own.

Michael, I appreciate what you are trying to say, but the simile doesn’t really work 100% for me; let me try to explain.

The simile has prompted you to think and to question, so in that sense, it works 100% for me. Triggering thought is, after all, why people use similes. (See also Surfaces and Essences: Analogy as the Fuel and Fire of Thinking.)

I would apply lean principles and cut some waste from your interview process. I will fail the candidate as soon as she gives me the first wrong answer.

I have 5 questions and all have to be answered correctly to hire the person for a junior position (release 1).

Interview candidate A:
Ask question 1 OK
Ask question 2 FAIL
Send candidate A home

Second interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 FAIL
Send candidate A home

Third interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 OK
Ask question 4 OK
Ask question 5 OK

Hire candidate A

All right. You seem to have left out something important in your process here, which I would apply after each step—indeed, after each question and answer: make a conscious decision about your next step. To me, that requires continuous review of your list of questions for relevance, significance, sufficiency, and information value. Interviewing is an exploratory process. A skilled interviewer will respond, in the moment, to what the candidate has said. A skilled interviewer will think less in terms of “pass or fail”, and more in terms of “What am I learning about this candidate? What does the last answer suggest I should ask next? What other information, exclusive of the answer, might I apply to my decision-making process? What else should I be looking for?” When the candidate gets the answer wrong, the skilled interviewer will ask “Was it really wrong? Maybe there are multiple right answers to the same question. Maybe she didn’t understand the question because I asked it in an ambiguous way, and she gave a right answer to an alternative interpretation. Maybe her answer was a question for me, intended to clarify my question.”

I can’t emphasize this enough: like interviewing, testing is about far more than pass or fail. Testing is about exploration, discovery, investigation, and learning, with the goal of imparting what we’ve learned to people who matter. Testing is about trying to understand the product that we’ve got, with the goal of revealing information that helps our clients decide if it’s the product they want. Testing is usually (but not always) focused on finding evident problems, apparent problems and potential problems, not only in our products, but in our ideas about our products. Testing is also about finding problems in our testing, and every one of the “fail” moments above is a point at which I would want to consider a problem with the test. (The “pass” moments are like that too, if I really want to do a great job.)

At this point, when candidate A wants to be promoted to a senior position (translate: the next release of the software), I will prepare 5 different questions probing the new skills and responsibilities, and as I have automated the first 5 questions, I can send her a link to a web site where she will have to prove that she hasn’t forgotten the first 5 before she can even be considered for the new position.

I’d do things slightly differently.

First I would ask “What would prompt me to ask the same questions again? Are those still the most important questions I could ask as she’s heading for her new role? What reason do I have to believe that she might have lost some capability she previously had? Are there other questions related to her old role—not necessarily to her new one—that I should ask that might be more revealing or more significant?” Note that there might be entirely legitimate reasons to believe that she might have backslid somehow—but at that point, I’d also want to ask “What are the conditions that would have allowed her to backslide without me noticing it—and what could I do to minimize those kinds of conditions?”

Then there would be another question I’d ask: “What if she has learned to answer a specific question properly, but is not adaptable to the general case? Should I be asking the same question in a different way, to see if she gives the same answer? Should I be asking a similar question that has a different answer, and see if she notices and handles the difference?”

Now: it might be costly to vary my questions, so I might simply shrug and decide just to go with the ones I’d asked before. But the point of evaluating my process is to ask, “How might I be fooling myself in the belief that I still know this person well?”

Assuming she answers the 5 automated questions correctly, at this point I will do the interview for the senior role.

Interview candidate A for senior role:
Ask question 6 OK
Ask question 7 FAIL
Send candidate A home

and so on.

I don’t see a problem with this process as long as I am allowed to use everything I learn from the feedback with the candidate up to question “N” to adapt and change all the questions greater than “N”.

Up until this point, you haven’t mentioned that, and your description of your process doesn’t make that at all clear. You’ve only mentioned the “pass” and “fail” parts of “everything I learn from the feedback”. Now, you might be taking that into account in your head, but notice how your description, your process model, doesn’t reflect that—so it becomes easy to misinterpret what you actually do. In addition, you’ve focused on adapting and changing all the questions greater than N—but I’d be interested in the possibility of adapting and changing all the questions less than or equal to N, too.

More importantly: qualifying someone for an important job is not about making sure that they can get the right answers on a canned test, just as testing a product is not about making sure that the functions produce expected results for some number of test cases. The specific answers might have some significance, but if I’m serious about hiring the right people for the job, I don’t want to make my decisions solely by putting them in front of a terminal, having them fill out an online form, and checking their answers. I want to evaluate them against a number of criteria: do they respond quickly, in a polite and friendly way? Do they work well with others? Are they appropriately discreet? Are they adaptable? Can they deal with heavy workloads? Do they learn quickly? In order to learn those things, I need to do more than ask pass-or-fail questions. I need to have unscripted, spontaneous, and free-flowing conversation with them; interview and interaction, and not just interrogation. You see?

Interview Questions

Wednesday, September 25th, 2013

Imagine that you are working for me, and that I want your help in qualifying and hiring new staff. I start by giving you my idea of how to interview a candidate for a job.

“Prepare a set of questions with specific, predetermined answers. Asking questions is expensive, so make sure to come up with as few questions as you can. Ask the candidate those questions, and only those questions. (It’s okay if someone else does the asking; anybody should be able to do that.) Check the candidate’s answers against what you expected. If he gives the answers that you expected, you can tell me that he’s good enough to hire. If he doesn’t, send him away. When he comes back, ask him the original questions. Keep asking those questions over and over, and when he starts giving the right answers consistently, then we’ll hire him.”

Now, a few questions for you.

1) Would you think me a capable manager? Why or why not?
2) What might you advise me about the assumptions and risks in my approach towards interviewing and qualifying a candidate?
3) What happens in your mind when you replace “interviewing a candidate” with “testing a product or service”, “questions” with “test cases”, “asking” with “testing”, “answers” with “results”, “hire” with “release”? Having done that, what problems do you see in the scenario above?
4) How do you do testing in your organization?

Very Short Blog Posts (4): Leaves and Trees

Tuesday, September 24th, 2013

Having trouble understanding why James Bach and I think it’s important to distinguish between checking and testing? Consider this: a pile of leaves is not a tree. Leaves are important parts of trees, but there’s a lot more to a tree than just its leaves. The leaves owe their existence to being part of a larger system of the tree. Nature makes sure that leaves drop off and are replaced periodically, especially in environments that undergo significant changes from time to time. And if you asked someone to describe his tree, you’d probably—and properly—think him strange if he pointed to a pile of leaves and said, “this is my tree”.

Versus != Opposite

Sunday, March 31st, 2013

Dale Emery, a colleague for whom we have great respect, submitted a comment on my last blog post, which in turn referred to Testing and Checking Refined on James Bach’s blog. Dale says:

I don’t see the link between your goals and your solution. Your solution seems to be (a) distinguishing what you call checking from what you call testing, (b) using the terms “checking” and “testing” to express the distinction, and (c) promoting both the distinction and the terminology. So, three elements: distinction, terminology, promotion.

How do these:

  • deepen understanding of the craft? (Also: Which craft?)
  • emphasize that tools and skilled use are essential?
  • illustrate the risks of asking humans to behave like machines?

I can see how your definitions contribute to the other goal you stated: to show that checking is deeply embedded in testing. And your recent refinements contribute better than your earlier definitions did.

But then there’s “versus,” which I think bumps smack into this goal. And not only the explicit use of “versus”; also the “versus” implied by repeatedly insisting that “That’s not testing, that’s checking!”

Also, I think your choice of terminology bumps up against this “deeply embedded” goal. Notice that you often express distinctions by adding modifiers. In James’s post: checking, human checking, machine checking, human/machine checking. The terms with modifiers are clearly related to (and likely a subset of) the unmodified term.

Your use of a distinct word (“checking”) rather than a modified term (e.g., “mechanizable testing” or “scripted testing”) has the natural effect of hinting at a relationship other than “this is a kind of that.” I read your choice of terminology (and what I interpret as insistence on the terminology) as expressing a more distant relationship than “deeply embedded in.”

James and I composed this reply together:

Our goal here is to improve the crafts of software testing, software engineering, and software project management. We use several tactics in our attempt to achieve that goal.

One tactic is to install linguistic guardrails to help prevent people from casually driving off a certain semantic cliff. At the bottom of that cliff is a gaggle of confused testers, programmers, and managers who are systematically—due to their confusion and not to any evil intent—releasing software that has been negligently tested.

Their testing is then less likely than they would wish to reveal important things that they would want to know about the software. You might believe that “negligently tested” is a strong way of putting it. We agree. To the extent that this unawareness brings harm to themselves or others, the software has been negligently tested. For virtual-world chat programs on the Web, that negligence might be no big deal (or at least, no big deal until they store your credit card information). However, we have experience working with people in financial domains, retail, medical devices, and educational software who are similarly confused on this specific issue: there’s more to testing a product than checking it.

Our tactic, we believe, deepens the understanding of the craft of testing quite literally: where there were no distinctions and people talked at cross-purposes, we install distinctions so that we can more easily detect when we are not talking about the same things. This adds an explicit dimension where there had been just a tacit and half-glimpsed one. That is exactly what it means to deepen understanding. In Chapter 4 of Perfect Software and Other Illusions about Testing, Jerry Weinberg performed a similar task, de-lumping (that’s his term) “testing”. There, he calls out components of testing and some related activities that are not, strictly speaking, testing at all: “testing for discovery”, “pinpointing”, “locating”, “determining significance”, “repairing”, “troubleshooting”, “testing to learn”, “task-switching”. We’re working along similar lines here.

Our tactic, we believe, helps to emphasize that tools and skilled use of tools are essential by creating explicit categories for processes amenable to tooling and processes not so amenable. These categories then become roosting places for our thoughts and our conversations about how tools relate to our work. At best, understanding is elusive and communication is difficult. Without words to mark them, understanding and communication are even more difficult. That is not necessarily a problem in everyday life. As testers, though, we work in a turbulent world of business, technology, and ideas. Problems in products (bugs) and in projects (issues) emerge from misunderstanding. The essence of testing work is to clear up misunderstandings over differences between what people want and what they say they want; what people produced and what they say they produced; what they did and what they say they did. People often tell us that they’ve tested a product. It often turns out that they mean that they’ve checked the functions in a product. We want to know what else they’ve done to test it. We need words, we claim, to mark those distinctions.

We’re aware that other people have come up with labels for what we might call “checks”; for example, Mike Hill speaks of “microtests” in a similar way, and others have picked up on that, presenting arguments on similar lines to ours. That’s cool. In the post on James’ blog, we make it explicit that we use this terminology in the domain we control—the Rapid Software Testing class and our writings—and we suggest that it might be useful for others. Some people borrow bits of Rapid Software Testing for their own work; some plagiarize. We encourage the former, and ask the latter to give attribution to their sources. But in the end, as we’ve said all along, it’s the ideas that matter, and it’s up to people to use the language they want. To us, it’s not a terrible thing to say “simplistic testing” any more than it would be a terrible thing to call a compiler an automatic programmer, but we think “compiler” works better.

We visit many projects and companies, including a lot of Agile projects, and we routinely find that talk of checking has drowned out talk of testing—except that people call it testing so nobody even notices how skewed their focus has become. Testers become increasingly selected for their enthusiasm as quasi-programmers and check-jockeys. Who studies testing then? What do testers on Agile projects normally talk about at their conferences or on the Web? Tools. Tools. And tools—most of which focus on checking, and the design of checkable requirements. This is not in itself a bad thing. We’re concerned by the absence of serious discussion of testing, critical investigation of the product. Sometimes there is an off-handed reference to exploratory testing, based on naïve or misbegotten ideas about it. Here’s a paradigmatic example, from only yesterday as we write: http://www.scrumalliance.org/articles/511-agile-methodology-is-not-all-about-exploratory-testing.

The fellow who wrote that article speaks of “validation criteria”, “building confidence” (Lord help us, at one point he says “guarantees confidence”), “defined expected results”. That is, he’s talking about checking.

Checking is deeply embedded in testing. It is also distinct from testing. That is not a contradiction. Distinction is not necessarily disjunction; “or” in common parlance is not necessarily “xor”. Our use of “versus” is exactly how we English speakers make sharp distinctions even among things that are strongly related, even when one is embedded in the other (the forest vs. the trees, playing hockey vs. skating). Consider people who believe they can eat nothing but bread and meat, as long as they gobble a daily handful of vitamin pills. We think it would be perfectly legitimate to say “That’s not nutrition, that’s vitamin supplements.” Yes, vitamins are part of nutrition. But they are not nutrition. It’s reasonable, we would argue, to talk about “nutrition versus vitamins” in that conversation.

For instance, we could say “mind vs. body.” Mind is obviously embedded in body. Deeply embedded. But don’t you agree that mind is quite a different sort of thing from body? Do you feel that some sort of violence is being done with that distinction? Perhaps some people do think so, but the distinction is undeniably a popular one, and it has been helpful to a great many thinkers over hundreds of years. Some people focus on their minds and neglect their bodies. Others focus on their bodies and neglect their minds. At least we have these categories so that we can have a conversation about them.

When Pierre Janet first distinguished between conscious and sub-conscious thought, that also was not an easy distinction. Today it is a commonplace. Everyone, even those who never took a class in psychology, is aware of the concept of the sub-conscious, and that not everything we do is driven by purely conscious forces. We believe our distinction between testing and checking could have a similar impact and similar effect—in time.

Meanwhile, Dale, we know you and we respect you. Please help us resolve our confusion: what’s YOUR goal? In your world, are there testers? Do the ambitious testers in your world diligently study testing? Or would you say that they study programming and how to use tools? How do you cope with that? Do you feel that your goal is served best by treating testing and whatever people do with tools to verify certain facts about a product as just one kind of activity? Would you suggest breaking it down a different way than we do? If so, how?