Blog Posts from May, 2010

Heuristics and Leadership

Friday, May 28th, 2010

In a recent blog post, James Bach discusses the essence of heuristics. A heuristic is a fallible method for solving a problem or making a decision. When used as an adjective, “heuristic” means fallible and conducive to learning. James ends the post by introducing a number of questions in order to test whether someone is teaching you a heuristic effectively. Meeta Prakash, in the comments, remarks “Your questions sound so much like my idea and interpretation of a ‘leader’.”

Yes, Meeta, the asking of those questions sounds like leadership to me too. What is leadership? Leadership is “creating an environment in which everyone is empowered” (that’s Jerry Weinberg’s definition). We believe in that. We believe that excellent testing starts with the skill set and the mindset of the individual tester. Other things might help, but excellence in testing must centre on the tester.

Discussion of the Method is Billy Vaughan Koen’s book on engineering. He describes the engineering method as “the use of heuristics to cause the best change in a poorly understood situation within the available resources”. He notes that the individual engineer, the engineering organization in which he works, and the engineering discipline overall each has a state of the art, which he abbreviates as “sota”. These sotas overlap one another in some places and cover new ground elsewhere. Each leads and lags the others in certain areas. The overall discipline has a sota that is more advanced than that of the organization and the individual in some places. The organization’s sota is aware of things that neither the individual nor the discipline has yet recognized. Each individual’s sota contains some knowledge that is unknown to both the organization and the discipline.

Each sota may advance before that of the other two, and the advances are ongoing. Each sota evolves in a different context. For each stage of evolution, we can’t be certain about what factors might be relevant to success or failure. Thus no practice, method, or approach can be deemed best. We don’t know, can’t know, whether some method is best; we don’t know, can’t know, whether someone might invent a better method. We don’t know, can’t know, whether someone has already invented a better method. What we can do, however, is to create environments in which everyone is empowered to discover and apply new heuristics along with those that we already know. That’s important because the discipline and the organization never advance on their own. They advance when someone has the inspiration, the initiative, the courage, and the opportunity to try something new, uncertain as to whether the new approach will work.

Organizations and individuals can foster initiative by empowering people (or at the very least, by leaving them alone). The overall discipline tends not to encourage initiative much, so it seems. With prescriptive, restrictive standards, disciplines often discourage innovation. Why would disciplines do this? One reason is that disciplines are typically led by experts who, as McLuhan said, are heavily invested in their own expertise, and therefore resist change that would threaten that investment. Propaganda and the threat of unemployment are among the crude tools that experts wield.  Want evidence for that in testing? You need look no further than the “experts”, the certifiers, and the standards enthusiasts in our craft; their narrow and flawed models of evidence and measurement; their intolerance of uncertainty; and their resistance to acknowledging the exploratory mindset. Yet without exploration, progress in any domain ceases while the rest of the world rushes past.

“All is heuristic,” declares Koen. That’s an absolute statement which appears to declare that there are no absolutes. But rather than denying the paradox, Koen embraces it. All is heuristic, he says, including the heuristic that all is heuristic. The notion that all is heuristic is pretty robust. Koen points out that even algorithms have contexts in which they work and contexts in which they fail, and that algorithms must be chosen by people applying judgement and skill in uncertain conditions. Yet he leaves open the possibility that someone, somewhere, some day, might discover an infallible method for solving a problem.

James, our colleagues, and I deal with a similar paradox when we contend that all testing is heuristic. There’s no test for infallibility! It might be that there are some occasions when there are infallible methods for testing. It’s just that we’ve never seen one, and that we can’t currently imagine a case in which a process or a standard could—without the application of judgement or skill—be guaranteed to solve some testing problem. That’s because skilled testers (let’s call them “we”, without claiming expertise, but asserting that we are students of the craft) have specific advantages over process models, methods, and tools (“they”):

  • We have situational awareness (they don’t).
  • We work from the assumption that we’re fallible (they don’t).
  • We have the capacity to make judgements on questions of cost vs. value (they don’t).
  • We have the ability to apply a stopping heuristic at any time (they don’t).
  • We have the intelligence to choose which heuristics are applicable and which are not (they don’t).
  • We have the opportunity to consult, in real time, with our clients (they don’t).
  • We have the inventiveness to work around a problem (they don’t).
  • We have the sensory apparatus to determine when a heuristic is failing (they don’t).
  • We have the humanity to notice something unexpected that might be a problem for people (they don’t).
  • We have the capacity to learn (they don’t).

And that’s only a partial list. A couple of notes: every one of these capabilities is itself heuristic; and it should be clear that I’m not talking only about skilled testers, but about any skilled discipline.

It’s not that bodies of knowledge or process models or standards never have anything interesting to say (although, in my experience, most of them do tend to be written in a style that induces immediate and profound slumber). It’s that none of these tools has any intrinsic relevance or value unless and until they are applied by people. If the tools are to be applied effectively, we need to recognize that they are all heuristic. We must test these heuristics and their validity with questions like the ones James poses. We must also recognize that the people applying them must have the skills to know how, when, and when not to apply them. To develop those heuristics and those skills, we need to create an environment in which everyone is empowered. That is what we believe.

Transpection Transpected

Tuesday, May 25th, 2010

Part of the joy of producing this blog is in seeing what happens when other people pick up the ideas and run with them.  That happened when I posted a scenario on management mistakes a few weeks ago, and Markus Gärtner responded with far more energy and thought than I would have expected. Thanks, Markus.

Last week I posted a transcript of a transpection session between me and James Bach.  The responses and the comments were very gratifying, but Oliver Vilson’s comment has sparked a discussion of its own. Oliver says,

I would have to say it is not only possible to test the clock-in-the-box but actually necessary.

I see it as an exercise when you have to test part of a system which you have no control over.

For example I’ve had problems with integration to the third party systems that gave absolute nonsense errors about things nobody could think of at that time and it messed up the correct behavior of the primary system pretty badly. We could do nothing but to observe what happened. Almost no possible way to change input data by end user. It either happened or not. But it ended up as very useful experience about testing.

I discussed your exercise with my colleague Rasmus and we found at least few ways to test it without giving it direct input itself

1) Expectations – for example: What format does it show time? Is it understandable?
2) End-values – turnover of seconds/minutes/hours where, for example, 59 -> 00
3) Load testing – how much does it starts to lie in 10 seconds, 1 minute, 1 hour, 1 day, 1 month, 1 year etc compared to let’s say quantum clock or NIST-F1.
4) What time zone time is it showing? Can be tricky because look at India’s time zone for example.
5) How long does the battery last before it shuts down? or before it starts to “lie”? How rapidly does it start to lie when batteries are running lower?
6) How are the digits shown? Are they visible via any other angle? Are they too small or too big?

And few ways to have direct input without moving or touching the box itself
1) Put powerful-enough magnet next to the box to see what happens.
2) Set EMP-bomb off near the box to see what happens.

With best regards
Oliver V.

I’ve had the pleasure of meeting Oliver Vilson a couple of times.  I find his thinking to be incisive and insightful, and he has provided me with a couple of excellent stories.  The first thing that Oliver has done here is to help with transfer:  the idea that our odd little thought experiment about the clock can be transferred to real-world contexts.  Oliver is right:  no matter what we test, much of the time we interact with things that are black boxes, closed to us.  Sometimes we have to take the operation of the black boxes on trust.  Other times we have to test them, and as we’re testing them, we’re nastily constrained by our inability to control or influence the factors in the experiment.  Identifying those factors, getting around those constraints (to the degree that we can), and figuring out what and how to observe are all central to testing skill.

As I was reading, it also occurred to me that Oliver’s list of test ideas could provide a very nice example of the way to use the HICCUPPS(F) mnemonic for oracle heuristics and the CRUSSPICSTMPL mnemonic (!) for quality criteria in both a retrospective and a generative way.

Let’s recall:  HICCUPPS(F) is a mnemonic by which we remember consistency heuristics for oracles, the principles or mechanisms by which we might recognize a problem.  We perceive no problem when all of the following heuristics hold, and we suspect a problem when any one of the following heuristics is violated:

History: The present version of the system is consistent with past versions of itself.
Image: The system is consistent with an image that the organization wants to project.
Comparable Products: The system is consistent with comparable systems.
Claims: The system is consistent with what important people say it’s supposed to be.
Users’ Expectations: The system is consistent with what users want.
Product: Each element of the system is consistent with comparable elements in the same system.
Purpose: The system is consistent with its purposes, both explicit and implicit.
Statutes: The system is consistent with applicable laws.

That’s the HICCUPPS part.  What’s with the (F)?  “F” stands for “Familiar problems”:

Familiarity: The system is not consistent with the pattern of any familiar problem.

That is, we suspect a problem in the item to be tested if we see some consistency with a problem that we’ve seen before.  We perceive “no problem” in the item to be tested when it doesn’t present a familiar problem to us while we’re testing.  I’ve written about an earlier version of this list of oracle heuristics here.
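To make the mechanics of the list a little more concrete, here is a minimal, hypothetical sketch (in Python) of what applying a few of these consistency heuristics might look like. All of the names, data shapes, and example observations below are invented for illustration; real oracles depend on human judgement, and every one of these checks is itself fallible.

```python
# Hypothetical sketch: a few consistency heuristics as named checks over
# an observation. A check returning False marks a *suspected* problem for
# a tester to investigate; the checks themselves are heuristic.

def history_check(current, previous):
    # History: the present version behaves consistently with past versions.
    return current == previous

def comparable_product_check(observed, reference):
    # Comparable Products: the system agrees with a comparable system.
    return observed == reference

def familiar_problem_check(observed, familiar_bug_patterns):
    # Familiarity: the observation does NOT match a known failure pattern.
    return not any(pattern(observed) for pattern in familiar_bug_patterns)

# Observe the clock in the box, and a comparable reference clock.
observation = "12:59:59"
reference = "12:59:58"

suspected = []
if not comparable_product_check(observation, reference):
    suspected.append("comparable products")
if not familiar_problem_check(observation, [lambda s: s.endswith(":60")]):
    suspected.append("familiar problems")
# suspected now names the heuristics whose violation we might investigate.
```

Here the one-second disagreement with the reference clock would be flagged under “comparable products”; whether that disagreement is actually a problem is still a matter of judgement and context.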

Update, August 22, 2014: HICCUPPS/F is now “FEW HICCUPPS”. Time and models march on!

The quality criteria for a product are those aspects of it that would tend to please favoured users—customers, or people who benefit from the efficient and accurate work of that customer.  Quality criteria can also be seen as things that would stymie disfavoured users—users that we don’t like, such as intruders, black hat hackers, snoops, denial-of-service enthusiasts, thieves, and so forth.

In the Rapid Software Testing course, we talk about quality criteria in terms of a set of guideword heuristics—labels for groups of ideas that trigger deeper analysis.  Our quality criteria include:

  • Capability
  • Reliability
  • Usability
  • Security
  • Scalability
  • Performance
  • Installability
  • Compatibility
  • Supportability
  • Testability
  • Maintainability
  • Portability
  • Localizability

These criteria are part of the Heuristic Test Strategy Model, first developed by James Bach.
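As a small illustration of how guideword heuristics can work generatively, here is a hypothetical sketch that turns each criterion into an open question. The guidewords are from the list above; the question template is my own invention, one of many ways to trigger deeper analysis.

```python
# Hypothetical sketch: quality-criteria guidewords as generative prompts.
# The criteria come from the Heuristic Test Strategy Model; the question
# template is invented for illustration.

CRITERIA = [
    "Capability", "Reliability", "Usability", "Security", "Scalability",
    "Performance", "Installability", "Compatibility", "Supportability",
    "Testability", "Maintainability", "Portability", "Localizability",
]

def prompts(product):
    # One open question per guideword, to trigger deeper analysis.
    return [f"What might threaten the {c.lower()} of {product}?"
            for c in CRITERIA]

ideas = prompts("the clock in the box")
```

Each prompt is a starting point, not a test; the value comes from the analysis each question provokes.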

So let’s look at Oliver’s example in terms of the oracles that are being used and the quality criteria that are being questioned here. I’ll start by tagging each test idea with one or more oracle heuristics and one or more quality criteria.

1) Expectations – for example: What format does it show time? Is it understandable?

Oracles:  user expectations, (implicit) purpose.  Quality criteria:  usability, localizability.

2) End-values – turnover of seconds/minutes/hours where, for example, 59 -> 00

Oracles:  user expectations, relevant standards.  Quality criteria:  capability, reliability.

3) Load testing – how much does it starts to lie in 10 seconds, 1 minute, 1 hour, 1 day, 1 month, 1 year etc compared to let’s say quantum clock or NIST-F1.

Oracles:  History, comparable products; familiar problem (clocks gaining or losing time).  Quality criteria:  reliability, performance.

4) What time zone time is it showing? Can be tricky because look at India’s time zone for example.

Oracles:  User expectations; implicit purpose.  Quality criteria:  usability, localizability.

5) How long does the battery last before it shuts down? or before it starts to “lie”? How rapidly does it start to lie when batteries are running lower?

Oracles:  history, user expectations.  Quality criteria:  reliability, performance.

6) How are the digits shown? Are they visible via any other angle? Are they too small or too big?

Oracles:  user expectations, implicit purpose.  Quality criteria:  usability, testability.
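Oliver’s third idea above (comparing the sealed clock against a trusted reference over increasing intervals) can be sketched as a simple drift measurement. The readings and the tolerance below are invented for illustration; a real session would log observations from both clocks, and the choice of tolerance would be a judgement call tied to the product and the mission.

```python
# Hypothetical sketch of Oliver's idea 3: measure how much the sealed
# clock "lies" against a trusted reference clock over time. Readings
# are in seconds of elapsed time; all values here are invented.

def drift_seconds(box_readings, reference_readings):
    # Per-observation drift: box reading minus reference reading.
    return [b - r for b, r in zip(box_readings, reference_readings)]

reference = [10, 60, 3600]      # readings taken at 10s, 1 min, 1 hour
box       = [10, 61, 3610]      # the box appears to gain time

drift = drift_seconds(box, reference)

# A comparable-products oracle with an invented tolerance: flag any
# observation where the clocks disagree by more than one second.
TOLERANCE = 1
flagged = [d for d in drift if abs(d) > TOLERANCE]
```

Notice that the oracle here is exactly the “comparable products” heuristic: the reference clock stands in for a comparable system, and the tolerance encodes a cost-vs-value judgement about how much disagreement matters.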

Now, I’d like you to notice a few things.  First, the classifications that I’ve set here are my own.  They’re arbitrary.  You can agree with them or disagree.  That doesn’t matter so much.

What matters more, I think, is the exercise in which we think about the relationships between the test ideas, the quality criteria, the oracles, and the risks.  For a product of any kind, there’s risk associated with the idea that a relevant quality criterion of some kind will not be fulfilled.  By using the oracle and quality criteria guidewords, we can become conscious of the chain of logic or “framing” of the test, which in turn helps us to compose, edit, narrate, and justify the product story and the testing story.

After we’ve applied oracle and quality-criteria tags to each of Oliver’s test ideas, we might start to notice some things. First, he has used a number of diverse heuristics by which he might recognize a problem.  In doing that, he has also identified tests that would address a number of quality criteria.  He did that quite spontaneously, without specifications or other documentation.  That is, as we’ve emphasized so often, it’s perfectly possible to test with incomplete or insufficient or inconsistent or ambiguous or out-of-date information, because

When information is missing, testing is a great way to generate it.

In providing a set of test ideas as he’s done, Oliver also brings to the surface a number of ideas and assumptions about the clock.  Whether those assumptions turn out to be right or wrong isn’t so important.  What’s far more important is getting started in observing similarities and differences between the assumptions and the reality.  The process of doing this is central to generating knowledge about the product.  This is very similar to Karl Weick’s observation, responding to a story in which a platoon of soldiers had a map that didn’t match the territory, but found their way home anyway:

“This raises the intriguing possibility that when you’re lost, any old map will do … maybe when you are confused any old strategic plan will do. Strategic plans are a lot like maps. They animate and orient people. Once people begin to act, they generate tangible outcomes in some context, and this helps them discover what is occurring, what needs to be explained, and what should be done next. Managers keep forgetting that it is what they do, not what they plan, that explains their success. They keep giving credit to the wrong thing—namely, the plan—and having made this error, they then spend more time planning and less time acting. They are astonished when more planning improves nothing.”  (Karl Weick, Sensemaking in Organizations, p. 54-55)

Oliver’s list (implicitly) includes test ideas that take advantage of the user expectations, comparable product, purpose, standards, and familiar problem heuristics.  We can see and justify what’s there by comparing it with the HICCUPPS(F) list, and noting that inconsistency with those items would point us to a problem.  “User expectations” seems to dominate the list of oracle heuristics.  One question we could ask is “how might we refine or expand the set of user expectations that we have?”  Another question is “are our ideas about oracles overloaded in the direction of user expectations?”

We can use the HICCUPPS(F) list to see what’s there, but with the list we can also see what might be missing:  questions about history (is there another clock like this?  is this the first one that we’ve ever seen?); about image (who is our client here?  what are possible perceptions that the client might want to project?); claims (what do people say about this clock, anyway? how is it supposed to work?  is there any useful information, whether documented or not, on this?); product (can we learn anything about the product by observing parts of it that should be consistent with one another? does the product include any internal sanity checks?).

Similarly, we can use the quality criteria list to help us generate ideas based on the things that might threaten the value of the product.  We can see some test ideas based on capability, reliability, usability, performance, and localizability.  What other factors might we choose to consider?  Which ones might be more important in our testing mission?  Less important?  Are there any that are crucial, or irrelevant?

Are there security concerns related to the clock?  Why is it in this box?  What would happen if someone were to get inside?  Could the functioning of the clock be affected by heat, cold, light, acceleration, bombardment?  What are the boundaries between the clock, its containers, and other systems?   Scalability:  is this a prototype clock, or are there going to be many like it?  Could it be used for very short-term or long-term measurements of time?  What if large numbers of people need access to the information it provides?  Installability:  How did it get there?  Can it be updated?  How would we get rid of it?  Compatibility:  does the clock interface with anything else?  How?  Supportability:  What do we do if someone has a problem with the clock? Can we get at it then?  How?  And if we can get at it then, why not now?  Testability:  You say that there is no way to provide input to the clock.  Really?  Is there some other way that you might be interpreting “input”?  What interfaces might be available?  What reference material?  What oracles?  Does the clock produce any information other than its display?  Are there any markings on it?  Guides to its internals?  Maintainability:  Supposedly I’m testing this because you want to be able to identify problems with it.  Do you want to be able to fix those problems?  Who would be responsible for doing that?  Is there source code or are there architectural drawings for the program that runs the clock?  Portability:  does that program work on other clocks?  What information can we learn about this clock that might be transferrable to other clocks?

As tools to help us see what’s there and see what’s missing, we can use the HICCUPPS(F) list to evaluate our oracles.  We can use the quality criteria list to evaluate our requirements coverage and make decisions about it.  At some point, we’ll also talk about product elements that point to coverage ideas.  We’ll also talk about the project environment that influences our context and our choices, both of which evolve over time.  But for now, that’s for later.  Thank you to Oliver for providing an excellent example on which, in this space, we could do a little something like transpection.

A Transpection Session: Inputs and Expected Results

Thursday, May 20th, 2010

A transpection is a dialog for learning. James Bach describes it here. Transpection is a technique we use a lot to refine ideas for presentations, for articles, for our course, or for our own understanding. Sometimes it’s all of them put together. Transpective sessions with James have led me to sharpen ideas and to do work of which I’m very proud—on test coverage, for example (articles here, here, and here). Sometimes the conversation happens in speech (as in this dialog on what scripts tell us, or this one between James and Mike Kelly titled “Is There A Problem Here?”). Sometimes the conversation happens in an instant message system like Skype. The former is more dynamic; the latter leaves a written transcript, which can be handy.

A few months back, James Bach and I took some time for a transpection session. Initially, the goal was to provide a demonstration of transpection for a particular student of James’, but then we realized that not only the session but its content might be of more general interest. James started the conversation.

James: Consider the definition: “A test consists of at least an input and an expectation.” Is that true? What does it mean?

Michael: Let’s apply your three-word heuristic for critical thinking: “Huh? Really? So?” 🙂

James: Can we go deeper with it?

Michael: Sure. Would sitting and observing a system that is already doing something (or not) be considered an “input”?

James: What do you think?

Michael: My first approach is to look at statements like the one you provide above and ask how broadly we could interpret them. I also try to falsify them. So… One could falsify it by asking, “Can I think of a case in which I don’t provide input, and it’s still a test. Where I don’t have an expectation, and it’s still a test?”

James: Okay.

Michael: One approach to answering to that question would be to ask “Well, what’s an input? What’s an expectation?”

James: So, let’s imagine an acrylic box. Inside the box runs a clock. You can see the clock face. You can’t interact with the clock (except to look at it in normal light). Can the clock be tested?

Michael: I can certainly “ask questions in order to evaluate it” (that’s the essence of your definition of testing), or “gather information with the intention of informing a decision” (which is Jerry Weinberg’s). I can conjecture. I can observe.

James: But can you provide an input? And can you test it?

Michael: In one sense, yes. I provide input by observing it. I don’t know if other people would be willing to take such a broad interpretation of input, but I could do that for the purpose of the exercise.

James: That seems like an odd definition of input. I don’t get it. It doesn’t process you looking at it. It doesn’t react to you looking at it.

Michael: It does seem like an odd definition. And yet…I put myself into the system of the clock. So what do YOU mean by “input”?

James: I will be happy to tell you. But first can you define what input means to you? What would be a reasonable definition?

Michael: In computer and testing talk, it usually means to provide data to a function for processing.

James: I agree. What does “provide data” mean?

Michael: It could mean to enter a sequence of keystrokes at the keyboard; to move a mouse and click on an element of the screen; to connect the computer to some stream of network traffic. To feed the machine with something to process. It more generally means, I think, to alter the current functioning of the system in some way.

James: To exert control over the system?

Michael: Very broadly, yes. Again, I’m not sure that some would accept such a general definition, but I would.

James: I think we can divide input into symbolic and non-symbolic input and explicit and implicit input. Symbolic input is data processed by the computer; data meaning “bits”, and “processed” meaning processed using the microprocessor. Non-symbolic input would be anything else, such as heat or shock. Then there’s explicit input—the input you knowingly provide—and implicit input: the input that influences the system without your knowledge or intent. Once you set an option, that option becomes implicit input for the function that refers to it. But in the case of the clock, there’s no explicit input. The implicit input is the previous state of the clock. Non-symbolic input includes the temperature of the room, voltage level, air pressure perhaps. But the tester does not control those in my example. So, can you test it?

Michael: Implicit input would also include the programming or design of the clock, yes? The engineering of the clock; its manufacture, its intended purpose?

James: Well, the programming is the structure of the product. It’s “input” for the microprocessor, but we aren’t testing that, are we? I don’t think those are inputs. The concept of input separates it from function and structure.

Michael: How do we know that we’re NOT testing it?

James: What do you mean?

Michael: Think of the Notepad exercise that we do in our course. Most people observe a bug in Notepad. They don’t realize that they’re testing Windows, which is where the bug actually is; Notepad calls that Windows function, and apparently few other products do. The people think they’re testing Notepad, and they are, of course. But they’re testing not only Notepad, but also the system in which Notepad lives.

James: Relate that to the clock, please. The clock is sealed. Can it be tested? You can see it run. Can you test it?

Michael: Well, you mentioned implicit input as the input that influences the system without your knowledge or intent. You can make inferences about it. You can observe it. You can evaluate the behaviour of the clock with respect to its implicit inputs. That’s testing, in my view.

James: You aren’t providing any input, and yet you are testing.

Michael: Input is there.

James: Can we imagine a system without input?

Michael: Yes; we’d call it a closed system.

James: Yes, I tried to construct one, with the clock.

Michael: And in real life, there aren’t any.

James: Not so. The clock is a closed system.

Michael: As soon as you put me in there, it’s not closed any more.

James: We’re not testing you. We’re testing the clock.

Michael: I’m involved.

James: No, you’re not.

Michael: Oh. I thought I was testing it.

James: You’re the tester, being asked to test the clock. You aren’t the product. The product is the clock. Can we test the clock? Furthermore, the clock has no implicit input.

Michael: Nope. But we’re not really testing only the product. We’re testing how the product relates to something (or more typically someone, or their values).

James: Of course. But don’t get that confused with input.

Michael: The clock absolutely has implicit input, unless we infer that God popped it into existence. In which case, the implicit input is God’s will.

James: You are not providing input to the clock.

Michael: I am not providing explicit input.

James: There is no implicit input either.

Michael: How did it get there, then?

James: “Getting there” is not input. That’s construction.

Michael: Bah. 🙂

James: You can’t just make things up, my friend. It helps to have definite ideas about what words mean.

Michael: Somebody made up the clock. Someone designed it. Someone wound it.

James: Yes, I’m talking about the word “input”.

Michael: Was winding the clock not input?

James: It’s an electronic clock with no internal data or state.

Michael: If it has no internal data or state, it’s gonna have a tough job keeping time.

James: It’s a set of instructions that run. It places data in memory but does not ever accept data or look at data. It’s a very long set of instructions, not decisions; a straight line program that places one value after another in memory. Pre-programmed.

Michael: There’s no loop?

James: No loop. It’s a fixed set of instructions that run in a straight line for, say, one year before the instructions run out.

Michael: It seems to me that one test would be to watch it for a year. Plus, I don’t know in advance what non-symbolic and implicit input it might take.

James: It takes non-symbolic input, in the form of temperature, etc. Let’s say those are fixed and out of your control and it takes no symbolic input at all, explicit or implicit.

Of course the program is input to the microprocessor, but we aren’t testing the microprocessor, as such. My point is: even if it is true that no symbolic input (the classic sort of computer input) is taken, and even if no explicit or implicit input happens (and certainly no explicit input) we can still test it. We agree that we can test it, even if you admit all my strange conditions. We can test it by watching it and evaluating it against our expectations, or against another clock. Now: do we have to plan the test in advance for it to be a test? Or can we concoct it as we go?

Michael: I’d say that at the moment we have a question, or an observation for which we can back-generate a question, we have a test.

James: What about expectation? Do we need that?

Michael: “I wonder if it will do that?” and “I wonder why it did that?” are testing questions, in my view.

James: Can either of those questions result in a bug found? I’m doubtful on that point.

Michael: Yes, absolutely. “Hey… the hands fell off the clock. I wonder why it did that?” I often find that I generate an expectation after the event.

James: So you do have an expectation. Or is it that you have an “expectation generator?”

Michael: I’m not sure if people would call “an expectation of which I became conscious later” an “expectation”.

James: By definition, it is an expectation. You say that it is. Why wouldn’t someone call that an expectation?

Michael: I observe that people use “expectation” in a couple of different ways, at least. One is in advance of an observation. The other is after the fact, often expressed in terms of a surprise. “I didn’t expect that!” “What DID you expect?” “Uh… I don’t know, but I didn’t expect that! I expected no penguin to suddenly appear on top of the television set.”

The problem that I see with that relates to the business of a test having to have an expected, predicted result. If we lock on to that, as long as the calculator says something like “4” after “2 + 2 =”, we can justify (or rationalize) missing too many bugs, in my view. Performance problems. Excessive precision, or imprecision. Usability problems. Reliability problems—what answer would the calculator give if we tried that test again? Might it give the answer “4” no matter what input we provide?

James: So “expected” can be conflated with “predicted” and that’s bad. That’s limiting?

Michael: I think so.

James: Me too. But if we have a real-time expectation generator, perhaps we could call that…what’s the word?… oracle. At least, “oracle” covers it.

Michael: I’ll suggest that the oracle can be developed and applied retrospectively. “Oh, that IS a problem. That WAS a problem.”

James: That’s different. Let me suggest a hierarchy. 1) A prediction. 2) An evaluation on the fly. 3) A retrospective evaluation at any later point in time. The first one is the classic “expectation”. The first two are the classic oracle that defines a test. The third can turn anything that wasn’t a test into a test retrospectively. A week later, a tour that I made can become a test that I ran, if I become aware of an expectation that applies to that memory of what happened.

Michael: Yes. When I use a new product, I’m testing it. I start off optimistic, and after three weeks of frustration, I can say that the test failed, even though I didn’t set out to test. I might have to rely on memory, or go dumpster-diving for data, or try to recreate the test that I now realize was a test. But it was a test.

James: Okay. So, applying this to the original question. “A test consists of at least an input and an expectation.” Want to try to rephrase that?

Michael: Okay, let me try. “A test consists of at least an input (which may be explicit or implicit, symbolic or non-symbolic) and an evaluation linked to an observation (where the evaluation may have been predicted, generated at the same time as the observation, or applied retrospectively) by an oracle.”

James: That doesn’t fit the clock example, which involves no input provided by the tester.

Michael: Okay, so… “which may be explicit or implicit, symbolic or non-symbolic, and which may or may not be provided by the tester…”

James: Do you really think that a test consists of an input?

Michael: Hmm…

James: We agree that some sort of input, in some sort of extremely broad sense is there, but just because it’s there, does that mean the test consists of it?

Michael: Yeah, you’re right. “Consists” seems like the wrong word in that light. Maybe a test is an observation—or a set of observations—over time, where the input is optional.

James: When we took away the explicit and implicit symbolic input, it still seemed obvious that we could test. Could we observe a static system and be testing it, or does the system have to be operating?

Michael: Yes, I agree we could observe a static system and be testing it.

James: So, you could stare at a CD and be testing the video game that’s stored there. I think that’s more commonly called inspection or review.

Michael: Yes. And as you and I once argued, it doesn’t matter if we call it inspection or review, it’s still testing. I lost that argument. You won it when you made that point: review and inspection are testing too. Questioning something in order to evaluate it.

James: Testing as commonly understood, I think, involves putting something through its paces. Exercising it.

Michael: Yes. I’m more careful these days to call that test execution.

James: But inspection might be part of the testing process. To avoid confusion, I no longer say “static testing” and “dynamic testing”. I say “inspection” and “testing”. To me, test execution means performing a test. Although some of my “testing” involves inspection, if my intent is to evaluate without having the product perform its function, I call that inspection outright.

Michael: We can test an idea too. When I’m reviewing or inspecting or performing a test, what I’m putting through the paces is my model of the object under test. That is, sometimes we’re not testing the object in and of itself. We’re testing the relationship between the object and our model of it. (By object, I mean the target of our investigation, which may be tangible or executable or neither.)

James: But we test an idea by putting it through its paces.

Michael: We might call that “transpection”, rather than inspection or testing!

James: I don’t want to get hung up on this too much. I’m just saying that my preference is to apply the word “testing” mainly to what I once would have called “dynamic testing”, even though the thought process applies to static things as well.

Michael: Yes. For the same reason, I don’t want to get hung up on the words that people use for “testing” and “checking”, as long as they’re conscious of the distinction. I’d like people to be alert to the possibilities available in ideas that are lumped into single words—and vice versa. Words and ideas are many-to-many, many-to-one, and one-to-many, but one-to-one correspondences are pretty rare. As soon as we come up with one, someone creative will come along and use it as a metaphor!

James: Given that, I want to say what my take on a “test” is, now. A test consists of two things: coverage and oracle; not “input and expectation”, but coverage and oracle. Coverage means an observation of some aspect of the product in action. Oracle means a principle or mechanism by which we recognize a problem. Applying that to the clock example: yes, we can obviously test the clock. We can observe it working, and we can have an oracle, such as a second clock.
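The “second clock” is a consistency oracle. As a toy sketch (the function name and tolerance are invented for illustration), such an oracle flags a suspected problem without asserting a single right answer:

```python
# A heuristic "second clock" oracle: compare the product's reading
# against a reference, and flag a *suspected* problem when they diverge
# beyond a tolerance. Fallible, of course: either clock could be wrong.
def clock_oracle(product_time, reference_time, tolerance_seconds=2.0):
    """Return True if the readings suggest a problem worth investigating."""
    return abs(product_time - reference_time) > tolerance_seconds

print(clock_oracle(1000.0, 1000.5))  # small drift: no problem signaled
print(clock_oracle(1000.0, 1090.0))  # large drift: investigate
```

Note that the oracle doesn’t say what the problem is, only that the observation deserves a closer look.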

Michael: I’d also like to point out the importance of emotional oracles, like surprise or confusion or frustration or amusement. Those are the kinds of oracles that don’t suggest a right answer on their own, but which definitely point to the possibility of a problem for some person. Those emotional reactions don’t tell us what the problem is, necessarily, but they are signals that tell us to look into something more deeply. If an oracle is a heuristic principle or mechanism by which we recognize a problem, then emotional reactions definitely qualify as oracles.

James: Yes. I think the IEEE concept of “input and expectation” is a special case created by a not-very-imaginative test manager. “Input” leads to coverage; “expectation” is one expression of an oracle. “Input” implies control by the tester, but control is not strictly necessary to test.

Michael: Yes—to test:  questioning a product in order to evaluate it, or gathering information to inform a decision.

James: Obviously, we don’t have control over most of what a product does, and yet we test it. Much of what the product does is invisible to us, and yet we test it.

Michael: That’s really important, isn’t it? When you think about the complexity of even a simple program and the system that it interacts with, there’s really very little that’s within our control, or within our capacity to observe.

James: True, but we can get leverage on that. There’s also a third thing that we talk about in our class: procedures. You need to know how to do the test; you need to observe something about the product as it runs (perhaps controlling it as it runs, but perhaps not); and you need to be able to recognize a problem if it occurs. Arguably the latter two things imply the first.

Michael: So, with the clock experiment or with any other test, we’re following what we whimsically call the Universal Test Procedure 2.0.

1. Model the test space.
2. Determine oracles.
3. Determine coverage.
4. Identify test procedures.

Those things are test design, “knowing how to do the test”, as you say.  Then we

5. Configure the product.
6. Operate the product.

That’s your “perhaps controlling it” part. In the case of the clock, configuration was already done, and operation is ongoing.

7. Observe the product.
8. Evaluate the product.

When we link the observation in (7) with the oracles in (2), that’s your “recognizing a problem if it occurs”, and that’s the basis for evaluation. And in a real testing situation, we’d also

9. Apply a stopping heuristic.
10. Report on what we’ve found.

James: Yeah, that’s an expansion of the ideas.

Michael: That’s a lot richer than “input and expected result”, isn’t it? Plus we’ve now got some ideas on implicit and explicit inputs, and symbolic and non-symbolic inputs. A very productive transpection session!

Questions from Listeners (1): Handling Inexperienced Testers

Wednesday, May 19th, 2010

On April 19, 2010, I was interviewed by Gil Broza.  In preparation for that interview, we solicited questions from the listeners, and I promised to answer them either in the interview or in my blog.  Here’s the first one.

How to deal with inexperienced testers? Is there a test approach that suits them better?

Here’s what I’d do: I’d train them. I’d pair them up with more experienced testers (typically test leads or coaches). I’d start them with things that are relatively easy to test, gradually increasing the difficulty of the assignments and the responsibilities of the tester. I’d watch closely to make sure that the test leads and the novices were talking to each other a lot during testing sessions.  I’d debrief them both together and individually.  I’d have the novices read books, articles, and blog posts, and ask them for summary reviews, and then we’d talk about them. I’d give them experiential exercises, in which they and other members of the team would try to solve a testing puzzle, and then we’d talk about it afterwards. I’d reward them for demonstrating increasing skill: participating in Weekend Testing sessions; writing blog posts or articles for internal or external consumption; building tools and contributing to our development group or to the wider community.

Some might object that this approach is time-consuming. It certainly consumes some time, but it’s the fastest way I know to develop skill, and it’s similar in a general way to the training approaches for doctors, pilots, and skilled trades: involve them in the work, supervise them closely, and make them increasingly responsible for increasingly challenging work.

Here’s what I wouldn’t do: I wouldn’t give them a script to follow unsupervised and then presume that they’re going to do or learn the work. Everything that we know about learning and about testing suggests that both will be highly limited with this hands-off approach.  My friend Joey McAllister recently tweeted “The Barista at this Target Starbucks didn’t know if a mocha came with a shot of espresso in it. And she was working alone.” Most of the products that we’re testing aren’t simple cups of mocha. People are likely to suffer loss or harm if we take the approach that someone took with this barista.

Black Box Software Testing Course in Toronto, June 23-25 2010

Thursday, May 6th, 2010

In 1996, I was working as a program manager for Quarterdeck, which at the time produced some of the best-selling utility software on the market. I took a three-day in-house training class that quite literally changed the course of my life. That class was the Black Box Software Testing course, by Cem Kaner.

Unlike anyone else that I was aware of at the time, Cem was writing and talking about a different kind of testing from what we were used to. Most of the books and testing models that I was aware of talked about things like timely, complete, and unambiguous requirements; they talked about process models; they talked about how, if you didn’t get what the books said you needed, you should refuse to test. They talked about testers as the gatekeepers of quality. Other books talked about test techniques in an abstract and largely mathematical way. All focused on some notion of functional correctness. Very few, if any, focused on value to the customer, and the idea that software testing was a very human part of software development, itself a very human thing.

Cem’s book, Testing Computer Software (written with Hung Nguyen and Jack Falk), was different. It was a book for testers who were working in environments where no one else followed “the rules”, the so-called best practices that were neither best nor practiced in real life. The BBST course took the same tack. Cem didn’t preach that we were quality gatekeepers; in fact, he demolished that myth. Instead, he offered an approach that was much more skills-oriented than process-focused, pragmatic rather than Platonic, and investigation-focused rather than confirmation-focused. In 2002, Cem released a new book (with James Bach and Brett Pettichord) called Lessons Learned in Software Testing. That book was strongly interconnected with the BBST course material (which, by then, credited James Bach with co-authorship). In that era, Cem began to release videos of the course lectures online, along with presentation slides, course notes, self-quizzes, extra material, reading lists, and references. Portions of the online BBST course are now being offered in an instructor-led form by the Association for Software Testing for its members, with more and more classes being added each year.

Now, after 15 years of continuous development on the Black Box Software Testing course, Cem is coming to Toronto to deliver a very rare live, public, version of the class, June 23 through June 25, 2010. He says, “The Black Box Software Testing course takes an explorer’s view of the core issues in software testing. We look at the primary test techniques (tests based on scenarios, risks, specifications, or attributes of the data under test), at the challenges of identifying and credibly reporting failures, and at the management challenges of adapting your practices to the project’s context (for example, regulatory or market requirements).

“Supplementing the course is a rich collection of multimedia instructional materials, available free, online. This gives us the freedom to tailor the course to the preferences of the students, leaving some topics to the videos, buying time for more activities, discussions, and exercises in class.”

The class is being sponsored by TASSQ, the Toronto Association of Systems and Software Quality. I’ll be there in a support role.

You can sign up for the class via the form at http://www.tassq.org/pdf/registration_form_black_box.pdf. If you mention the promotional code BLG, you’ll be able to register for the Early Bird Rate of $1400 through May 21.

Many thanks to the eagle-eyed testers who pointed out that the title of Lessons Learned in Software Testing was not, in fact Testing Computer Software as this post once erroneously claimed.

When Testers Are Asked For A Ship/No-Ship Opinion

Wednesday, May 5th, 2010

In response to my post, Testers:  Get Out of the Quality Assurance Business, my colleague Adam White writes,

I want to ask about your experience with your first 3 points for managers:

  • Provide them with the information they need to make informed decisions, and then let them make the decisions.
  • Remain fully aware that they’re making business decisions, not just technical ones.
  • Know that the product doesn’t necessarily have to hew to your standard of quality.

In my experience I took this to an extreme. I got to the point where I wouldn’t give my opinion on whether or not we should ship the product because I clearly didn’t have all the information to make a business decision like this.

I don’t think that’s taking things to an extreme.  I think that’s professional and responsible behaviour for a tester.

Back in the 90s, I was a program manager for several important mass-market software products. As I said in the original post, I believe that I was pretty good at it. Several times, I tried to quit, and the company kept asking me to take on the role again.  As a program manager, I would not have asked you for your opinion on a shipping decision.  If you had merely told me, “You have to ship this product!” or “You can’t ship this product!” I would have thanked you for your opinion, and probed into why you thought that way.  I would have also counselled you to frame your concerns in terms of threats to the value of the product, rather than telling me what I should or should not do.  I likely would have explained the business reasons for my decisions; I liked to keep people on the team informed.  But had you continued telling me what decision I “should” be making, and had you done it stridently enough, I would have stopped thanking you, and I would have insisted that you (and, if you persisted, your manager) stop telling me how to do my job.

On the other hand, I would have asked you constantly for solid technical information about the product.  I would have expected it as your primary deliverable, I would have paid very close attention to it, and would have weighed it very seriously in my decisions.

This issue is related to what Cem Kaner calls “bug advocacy”. “Bug advocacy” is an easy label to misinterpret. The idea is not that you’re advocating in favour of bugs, of course, but I don’t believe that you’re advocating that management do something specific with respect to a particular bug, either.  Instead, you’re “selling” bugs. You’re an advocate in the sense that you’re trying to show each bug as the most important bug it can be, presenting it in terms of its clearest manifestation, its richest story, its greatest possible impact, its maximum threat to the value of the product. Like a good salesman, you’re trying to sell the customer on all of the dimensions and features of the bug and trying to overcome all of the possible objections to the sale. A sale, in this case, results in a decision by management to address the bug. But (and this is a key point) like a good salesman, you’re not there to tell the customer that he must buy what you’re selling; you’re there to provide your customers with information about the product that allows them to make the best choice for them. And (corresponding key point) a responsible customer doesn’t let a salesman make his decisions for him.

I started to get pushback from my boss and others in development that it’s my job to give this opinion.

“Please, Tester! I’m scared to commit! Will you please do my job for me? At least part of it?”

What do you think? Should testers give their opinion on whether or not to ship the product or should they take the approach of presenting “just the facts ma’am”?

I remember being in something like your position when I was at Quarterdeck in the late 1990s. The decision to ship a certain product lay with the product manager, which at the time was a marketing function. She was young, and ambitious, but (justifiably) nervous about whether to ship, so she asked the rest of us what she should do. I insisted that it was her decision; that, since she was the owner of the product, it was up to her to make the decision. The testers, also being asked to provide their opinion, also declined. Then she wanted to put it to a vote of the development team. We declined to vote; businesses aren’t democracies. Anticipating the way James Bach would answer the question (a couple of years before I met him), I answered more or less with “I’m in the technical domain. Making this kind of business decision is not a service that I offer”.

But it’s not that we were obstacles to the product, and it’s not that we were unhelpful. Here are the services that we did offer:

  • We told her about what we considered the most serious problems in the product.
  • We made back-of-the-envelope calculations of technical support burdens for those problems (and were specific about the uncertainty of those calculations).
  • We contextualized those problems in terms of the corresponding benefits of the product.
  • We answered any questions for which we had reliable information.
  • We provided information about known features and limitations of competitive products.
  • We offered rapid help in answering questions about the known unknowns.

Yet we insisted, respectfully, that the decision was hers to make.  And in the end, she made it.

There are other potentially appropriate answers to the question, “Do you think we should ship this product?”

  • “I don’t know,” is a very good one, since it’s inarguable; you don’t know, and they can’t tell you that you do.
  • “Would anything I said change your decision?” is another, since if they’re sure of their decision, their answer would be the appropriate one: No. If the answer is Yes, then return to “I don’t know.”
  • “What if I said Yes?” immediately followed by “What if I said No?” might kick-start some thinking about business-based alternatives.
  • “Would I get a product owner’s position and salary if I gave you an answer?”, said with a smile and a wink, might work in some rare, low-pressure cases, although typically the product owner will have lost his sense of humour by the time you’re being asked for your ship/no-ship opinion.

As I said in the previous post, managers (and in particular, the product owners) are the brains of the project. Testers are sense organs, eyes for the project team. In dieting, car purchases, or romance, we know what happens when we let our eyes, instead of our brains, make decisions for us.

Testers: Get Out of the Quality Assurance Business

Monday, May 3rd, 2010

The other day, Cory Foy tweeted a challenge: “Having a QA department is a sign of incompetency in your Development department. Discuss.”

Here’s what I think: I’m a tester, and it’s time for our craft to grow up. Whatever the organizational structure of our development shops, it’s time for us testers to get out of the Quality Assurance business.

In the fall of 2008, I was at the Amplifying Your Effectiveness conference (AYE), assisting Fiona Charles and Jerry Weinberg with a session called “Testing Lies”. Jerry was sitting at the front of the room, and as people were milling in and getting settled, I heard him in the middle of a chat with a couple of people sitting close to him. “You’re in quality assurance?” he asked. Yes, came the answer. “So, are you allowed to change the source code for the programs you test?” No, definitely not. “That’s interesting. Then how can you assure quality?”

A good question, and one that immediately reminded me of a conversation more than ten years earlier. In the fall of 1996, Cem Kaner presented his Black Box Software Testing course at Quarterdeck. I was a program manager at the time, but the head of Quality Assurance (that is, testing) had invited me to attend the class. As part of it, Cem led a discussion as to whether the testing group should really be called “Quality Assurance”. His stance was that individuals—programmers and testers alike—could certainly assure the quality of their own work, but that testers could not assure the quality of the work of others, and shouldn’t try it. The quality assurance role in the company, Cem said, lay with the management and the CEO (the principal quality officer in the company), since it was they—and certainly not the testers—who had the authority to make decisions about quality. Over the years he has continued to develop this idea, principally in versions of his presentations and papers on The Ongoing Revolution in Software Testing, but the concept suffuses all of his work. The role for us is not quality assurance; we don’t have control over the schedule, the budget, programmer staffing, product scope, the development model, customer relationships, and so forth. But when we’re doing our best work, we’re providing valuable, timely information about the actual state of the product and the project. We don’t own quality; we’re helping the people who are responsible for quality and the things that influence it. “Quality assistance; that’s what we do.”

More recently, in an interview with Roy Osherove, James Bach also notes that we testers are not gatekeepers of quality. We don’t have responsibility for quality any more than anyone else; everyone on the development team has that responsibility. When he first became a test manager at Apple Computer way back when, James was energized by the idea that he was the quality gatekeeper. “I came to realize later that this was terribly insulting to everyone else on the team, because the subtle message to everyone else on the team is ‘You guys don’t really care, do you? You don’t care like I do. I’m a tester, that means I care.’ Except the developers are the people who create quality; they make the quality happen. Without the developers, nothing would be there; you’d have zero quality. So it’s quite insulting, and when you insult them like that, they don’t want to work with you.”

Last week, I attended the STAR East conference in Orlando. Many times, I was approached by testers and test managers who asked for my help. They wanted me to advise them on how they could get programmers to adopt “best practices”. They wanted to know how they could influence the managers to do a better job. They wanted to know how to impose one kind of process model or another on the development team. In one session, testers bemoaned the quality of the requirements that they were receiving from the business people (moreover, repeating a common mistake, the testers said “requirements” when they meant requirement documents). In response, one fellow declared that when he got back to work, the testers were going to take over the job of writing the requirements.

In answer to the request for advice, the most helpful reply I can give, by far, is this: these are not the businesses that skilled testers are in.

We are not the brains of the project. That is to say, we don’t control it. Our role is to provide ESP—not extra-sensory perception, but extra sensory perception. We’re extra eyes, ears, fingertips, noses, and taste buds for the programmers and the managers. We’re extensions of their senses. At our best, we’re like extremely sensitive and well-calibrated instruments—microscopes, telescopes, super-sensitive microphones, vernier calipers, mass spectrometers. Bomb-sniffing detectors. (The idea that we are the test instruments comes to me from Cem Kaner.) We help the programmers and the managers to see and hear and otherwise sense things that, in the limited time available to them, and in the mindset that they need to do their work, they might not be able to sense on their own.

Listen: if you really want to improve the quality of the code and think that you can, become a programmer. I’ve done that. I can assure you that if you do it too, you’ll quickly find out how challenging and humbling a job it is to be a truly excellent programmer—because like all tools, the computer extends your incompetence as quickly and powerfully as it extends your competence. If you want to manage the project, become a project manager. I’ve done that too, and I was pretty good at it. But try it, and you’ll soon discover that quality is value to some person or persons who matter, and that your own standards of quality are far less important than those who actually use the product and pay the bills. Become a project manager, and you’ll almost immediately realize that the decision to release a product is informed by technical issues but is ultimately a business decision, balancing the value of the product and bugs—threats to the value of the product—against the costs of not releasing it.

In neither case would I have found it helpful for someone who was neither a programmer nor a project manager to whine to me that I didn’t have respect for quality; worse still would have been instruction on how to do my job from people who had never done that kind of work. In both of those roles, what I wanted from testers was information. As a programmer, I wanted to know about problems the testers had found in my code, how they had found them, the ways in which the product might not work, and the steps and clues I needed to point me towards finding and fixing the problem myself. As a program manager, I wanted to know what information the testers had gathered about the product, and how they had configured, operated, observed, and evaluated the product to get that information. I wanted to know how our product worked differently from the ways in which it had always worked. I wanted to know about internal inconsistencies. I wanted to know how our product worked in comparison to other products on the market; how it was worse, how it was better. I wanted to know how the product stacked up against claims that we were making for it.

So:  you want to have an influence on quality, and on the team.  Want to know how to influence the programmers in a positive way?

  • Tell the programmers, as James suggests in the interview, that your principal goal is to help them look good—and then start believing it. Your job is not to shame, or to blame, or to be evil. I don’t think we should even joke about it, because it isn’t funny.
  • You’re often the bearer of bad news. Recognize that, and deliver it with compassion and humility.
  • You might be wrong too.  Be skeptical about your own conclusions.
  • Focus on exploring, discovering, investigating, and learning about the product, rather than on confirming things that we already know about it.
  • Report what you’ve learned about the product in a way that identifies its value and threats to that value.
  • Try to understand how the product works on all of the levels you can comprehend, from the highest to the lowest.  Appreciate that it’s complex, so that when you have some harebrained idea about how simple it is to fix a problem, or to find all the bugs in the code, you can take the opportunity to pause and reflect.
  • Express sincere interest in what programmers do, and learn to code if that suits you. At least, learn a little something about how code works and what it does.
  • Don’t ever tell programmers how they should be doing their work. If you actually believe that that’s your role, try a reality check: How do you like it when they do that to you?

Want to know how to influence the managers?

  • Provide them with the information they need to make informed decisions, and then let them make the decisions.
  • Remain fully aware that they’re making business decisions, not just technical ones.
  • Know that the product doesn’t necessarily have to hew to your standard of quality.
  • It’s not the development manager’s job, nor anyone else’s, to make you happy. It might be part of their job to help you to be more productive.  Help them understand how to do that.  Particularly draw attention to the fact that…
  • Issues that slow down testing are terribly important, because they give bugs the opportunity to hide longer and deeper. So report not only bugs in the product, but issues that slow down testing.
  • If you want to provide information to improve the development process, report on how you’re actually spending your time.
  • Note, as is so commonly the case, why testing is taking so long—how little time you’re spending on actually testing the product, and how much time you’re spending on bug investigation and reporting, setup, meetings, administrivia, and other interruptions to obtaining test coverage.
  • Focus on making these reports accurate (say to the nearest five or ten per cent) rather than precise, because most problems that we have in software development can be seen and solved with first-order measures rather than six-decimal analyses derived from models that are invalid anyway.
  • Show the managers that the majority of problems that you find aren’t exposed by mindless repetition of test cases, but by actions and observations that the test cases don’t cover—that is, by your sapient investigation of the product.
  • Help managers and programmers alike to recognize that test automation is more, far more, than programming a machine to pound keys.
  • Help everyone to understand that automation extends some kinds of testing and greatly limits others.
  • Help to keep people aware of the difference between testing and checking.  Help them to recognize the value of each, and that excellent checking requires a great deal of testing skill.
  • Work to demonstrate that your business is skilled exploration of the product, and help managers to realize that that’s how we really find problems.
  • Help the team to avoid the trap of thinking of software development as a linear process rather than an organic one.
  • Help managers and programmers to avoid confusing the “testing phase” with what it really is: the fixing phase.
  • When asked to “sign off” on the product, politely offer to report on the testing you’ve done, but leave approval to those whose approval really matters: the product owners.

Want to earn the respect of your team?

  • Be a service to the project, not an obstacle. You’re a provider of information, not a process enforcer.
  • Stop trying to “own” quality, and take as your default assumption that everyone else on the project is at least as concerned about quality as you are.
  • Recognize that your role is to think critically—to help to prevent programmers and managers from being fooled—and that that starts with not being fooled yourself.
  • Diversify your skills, your team, and your tactics. As Karl Weick says, “If you want to understand something complicated, you have to complicate yourself.”
  • As James Bach says, invent testing for yourself. That is, don’t stop at accepting what other people say (including me); field-test it against your own experience, knowledge, and skills. Check in with your community, and see what they’re doing.
  • If there’s something that you can learn that will help to sharpen your knowledge or your senses or your capacity to test, learn it.
  • Study the skills of testing, particularly your critical thinking skills, but also work on systems thinking, scientific thinking, the social life of information, human-computer interaction, data representation, programming, math, and measurement.
  • Practice the skills that you need and that you’ve read about. Sharpen your skills by joining the Weekend Testing movement.
  • Get a skilled mentor to help you. If you can’t find one locally, the Internet will provide. Ask for help!
  • Don’t bend to pressure to become a commodity. Certification means that you’re spending money to become indistinguishable from a hundred thousand other people. Make your own mark.
  • Be aware that process models are just that—models—and that all process models—particularly those involving human activities like software development—leave out volumes of detail on what’s really going on.
  • Focus on developing your individual skill set and mindset, so that you can be adaptable to any process model that comes along.
  • Share your copy of Lessons Learned in Software Testing and Perfect Software and Other Illusions About Testing.

Ultimately, if you’re a tester, get out of the quality assurance business. Be the best tester you can possibly be: a skilled investigator, rather than a programmer or project manager wannabe.

Note: every link above points to something that I’ve found to be valuable in developing the thinking and practice of testing. Some I wrote; most I didn’t.  Feast your mind!

Why We Do Scenario Testing

Saturday, May 1st, 2010

Last night I booked a hotel room using a Web-based discount travel service. The service’s particular shtick is that, in exchange for a heavy discount, you don’t get to know the name of the airline, hotel, or car company until you pay for the reservation. (Apparently the vendors are loath to admit that they’re offering these huge discounts—until they’ve received the cash; then they’re okay with the secret getting out.) When you’re booking a hotel, the service reveals the general location and the amenities. I made a choice that looked reasonable to me, and charged it to my credit card.

I had screwed up. When I got the confirmation, I noticed that I had booked for one night, when I should have booked for two. I wanted to extend my stay, but when I went back to the Web page, I couldn’t be sure that I was booking the same hotel. The names of the hotels are hidden, and I knew that the rates might change from night to night. One can obtain clues by looking at the amenities and the general location of the hotels, but I wanted to be sure. So instead of booking online, I called the travel service’s 1-800 number.

Jim answered the phone sympathetically. It turns out that not even the employees of the service can see the hotel name before a booking is made. However, this was a familiar problem to him, so it seemed, and he told me that he’d match the hotel by location and amenities, back out the first credit-card transaction for one night, and charge me for a new transaction of two nights. He managed to book the same hotel. So far so good.

I went to the hotel and checked in. The woman behind the counter asked for identification and a credit card for extras, and then she asked me, “How many keys will you be needing tonight, sir?” “Just one,” I said. She put a single key card into the electronic key programming machine and handed it to me. I took the elevator up to room 761, which had a comfortable bed and a desk in front of a window with a nice view. I unpacked some of my things and decided to go for a dip in the hot tub. When I came back upstairs, I changed into dry clothes, took out my laptop, plugged it in, and sat down at the desk.

The floor was shaking. I mean, it was really vibrating. Some big motor—an air-conditioning compressor? a water pump?—had turned the office chair into a massager. I stood up, and it seemed that half of the room, including the bed, was shaking. I tried to do a little work, but the vibration was enormously distracting. I called down to the front desk.

Peter answered the phone sympathetically. “I’ll send someone right up to check it out,” he said. Fair enough, but this problem was unlikely to go away any time soon, and until it did, I wanted another room. “No worries,” said Peter. “I’ll start the process now, and send someone up to check out the problem. Then you can come downstairs to exchange your key.” (“Why not send the new key up with the person coming upstairs?” I thought, but I didn’t say anything.) “I’ll need a few minutes to tidy up,” I said. “Very well, sir,” said Peter. I repacked my bags. A few minutes later, the phone rang, and Peter asked if I was ready for the staff member to arrive.  Yes.

After a short time, someone knocked on the door. He had a pair of new keys (two, not one), which he passed to me. He appeared skeptical at first, but I sat him down in the desk chair. “Oh, now I feel it,” he said. “Stand over here, next to the bed,” I said. He got up, moved over, and felt the shaking. “Wow,” he said. We chatted for a few more moments, speculating on where the shaking was coming from. He left to investigate, and I decamped to my new room, 1021, on another floor on the other side of the building. So far so good.

This morning on my way to the shower, I noticed that a piece of paper had been slipped under the door. It was the checkout statement for my stay, noting my arrival and departure dates and the various charges that had been made to my credit card, including state sales tax, county tax, and a service fee for Internet use. I noticed that the checkout date and time was this morning, but I’m not supposed to be leaving until tomorrow morning. I called the front desk.

Zhong-li answered the phone sympathetically. I explained the situation, noting that I had booked through a travel service twice, once for one night and then later for two, and that the first booking should have been backed out (but maybe the service hadn’t done that), plus I had changed rooms the night before, so maybe it was an issue with the service but maybe it was an issue with the hotel’s own system too. Or maybe it was only the hotel. “No problem,” he said. “We can extend your stay for another night. But you’ll have to come downstairs at some point today so that we can re-author your room keys.”

So here’s the thing: how many variables can you see here? How many interconnected systems? How many different hardware platforms are involved? What protocols do they use to communicate?  To create, read, update, and delete? What are the overall transactions here?  What are the atomic elements of each one?  How does each transaction influence others?  How is each influenced by others?  What are the chances that everything is going to work right, and that I will neither under nor overpay?  What are the chances that the travel service will overpay (or underpay) the hotel for my stay, even if my credit card shows the appropriate entries and reversals?

It’s not even a terribly complicated story, but look at how many subtleties there are to the scenario. Have you ever seen a user story that has the richness and complexity of even this relatively simple little story? And yet, if we pay attention, aren’t there lots of stories like mine every day? Does my story, long as it is, include everything that we’d need to program or test the scenario? Does the card below include everything?

[Image: an index card bearing the user story]

Next question: if you want to create automated acceptance tests, do you want a scenario like this to be static, using record and playback to lock in on checking specific values in specific fields? Are we really going to get value from the story if we use the same data and the same outputs over and over again? This approach will be hard enough to program, but it will tend to be very brittle, resistant to change and variation. It will tend to miss details in the scenario that we would only learn about through repeated human interaction with the product.

Or would you prefer to have a flexible framework that allows you to explore and vary the scenario, designing and acting upon new test ideas, and observing the flow of each piece of data through each interconnected system? Might you be able to do this by exploiting testing tools that you’ve developed for the lower levels of the system and assembling them into progressively more powerful suites? This second approach will likely be even harder to program, although you might be able to take advantage of lower-level test APIs, probes, and data generators that you and the programmers have developed as you’ve gone along. This approach, though, will tend to be far more powerful and more robust to change, to learning, and to incorporating new and varied test ideas. Think well, and choose wisely.
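To make the contrast concrete, here is a minimal sketch of the second approach in Python. Everything in it is invented for illustration (the function names, the data shapes, the rates are all stand-ins, not any real travel-service API); the point is that the scenario’s data varies on every run, while the check targets an invariant rather than a recorded value:

```python
import random

# A sketch only: these functions are hypothetical stand-ins for the
# travel service and its billing, not any real system.

def make_booking(nights, rate_cents):
    """Simulate the service charging a card for a stay."""
    return {"nights": nights, "charge": nights * rate_cents, "reversed": False}

def rebook(original, new_nights, rate_cents):
    """Back out the first charge and book a replacement stay."""
    original["reversed"] = True
    return make_booking(new_nights, rate_cents)

def net_charge(bookings):
    """What the card actually shows after any reversals."""
    return sum(b["charge"] for b in bookings if not b["reversed"])

# Rather than replaying one recorded path with fixed values, vary the
# data on each run and check an invariant: the net charge always
# matches the final two-night stay.
for _ in range(100):
    rate = random.randint(5000, 30000)   # nightly rate in cents
    first = make_booking(1, rate)
    second = rebook(first, 2, rate)
    assert net_charge([first, second]) == 2 * rate
```

The same shape scales up: swap the stand-ins for real probes into the booking, billing, and key-card systems, and the invariant checks travel along with whatever variation you introduce.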

In either case, unless you have people exploring and interacting with the product and the story directly, I guarantee you will miss important points in the story and important problems in the product. Your tools, as helpful as they are, won’t ever pause and say, “What if…?” or “I wonder…” or “That’s funny…” You’ll need people to exercise skill, judgment, imagination, and interaction with the system, not in a linear set of prescribed steps but in a thoughtful, inventive, risk-focused, and variable set of interactions.

In either case, you’ll also have a choice as to how to account for what you’re doing.  It’s one scenario, but is it only one test?  Is it dozens of tests?  Thousands?  If you use the second framework and induce variation, what does that do for your test count?  Or would it be better to report your work in an entirely different way, reporting on risks and test ideas and test activities, rather than try to quantify a complex intellectual interaction by using meaningless, quantitatively invalid units like “test cases” or “test steps”?

It’s been a while since I’ve posted this, but it’s time to do it again. This passage comes from a book on programming and on testing, written by Herbert Leeds and Gerald M. Weinberg (Jerry wrote this passage, he says). It’s understandable that people haven’t got the point yet, since the book is relatively new: it came out only 49 years ago (in 1961).  The emphases are mine.

“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.”

Amen.