“Why Didn’t We Catch This in QA?”

August 13th, 2020

My good friend Keith Klain recently posted this on LinkedIn:

“Why didn’t we catch this in QA” might possibly be the most psychologically terrorizing and dysfunctional software testing culture an organization can have. I’ve seen it literally destroy good people and careers. It flies in the face of systems thinking, complexity of failure, risk management, and just about everything we know about the psychology involved in testing, but the bully and blame culture in IT refuses to let it die…”

There’s a lot to unpack here. Let’s start with this: what is “QA”?

If “QA” is quality assurance, then it’s important to figure out who, or what, assures quality—value to some person(s) who matter(s).

Confusion abounds when “QA” is used as a misnomer for testing. Testing is not quality assurance, though it can inform quality assurance. Testing does not assure quality, any more than diagnosis assures good health.

In terms of health, there’s no question that we want good diagnoses so that we can become aware of particular pathologies or diseases. If we’re in poor health, and we’re not aware of it, and diagnosis doesn’t catch it, it’s reasonable to ask why not, so that we can improve the quality of diagnosis. The unreasonableness starts when someone foolishly believes that diagnosis is infallible, or that it assures good health, or that it prevents disease—like believing that lab technicians and epidemiologists are responsible for COVID-19, or for its spread.

Once again, it is high time that we dropped the idea that testing is quality assurance. Who perpetuates this? Everyone, so it seems, and it’s not a new problem. At the very least, it would be a great idea if testers stopped using the label to describe themselves. As long as testers persist in calling themselves “QA”, the pandemic of ignorance and blame will continue.

What, or who, does assure quality, then?

In one sense, everyone who performs work has agency or authority over it, which includes an implicit responsibility to assure its quality, just as everyone is responsible for maintaining the health of his or her mind and body. Assuring the quality of our work is a matter of craft; self-awareness; diligence; discipline; professionalism; and duty of care towards ourselves, our clients, and our social groups. If we’re adults, no one else is responsible for washing our hands.

In everyday life, we make choices about lifestyle, diet, and hygiene that influence our health and safety. As adults, those choices, whether wise or reckless, are our responsibility. At work, our agency affords freedom and responsibility to push back or ask for help when we’re pressed to do work in a way that might compromise our own sense of quality. And our agency enables us to leave any situation in which we are required to behave in ways that we consider unprofessional or unethical.

Part of maintaining personal health is maintaining awareness of it. That means asking ourselves how we feel, and soliciting the help of others who can sometimes help us become aware of things that we don’t see, like personal trainers, doctors, or counsellors. Similarly, assuring quality in our work involves evaluating it—often with the help of other people—to become aware of its state, and in particular, its limitations and problems.

Other people might help us, but as authors of our own work, we are responsible for making those evaluations, and we are responsible for what we do based on those evaluations. Choices that bear on our health, or on the quality of our work, are ours to make.

So, in this sense, “why didn’t we catch this in QA?” would mean “why did we not assure the quality of our own work?” And at the centre of that “we” is “I”.

In another sense, responsibility for the quality of work and workplace resides in the management role. While we’re responsible for washing our hands, management is responsible for providing an environment where handwashing is possible—and for ensuring that people aren’t pushed into conditions where they’re endangering themselves, each other, or the business.

Insofar as management engages people to do work and make products, management is responsible for determining what constitutes quality work, and deciding whether the product has met its goals. Management decides whether the product it’s got is the product it wants—and the product it wants to ship. Management can ask testers to learn about the product on management’s behalf, but management is ultimately responsible for assuming the risk of unknown problems in the product.

Management is responsible for setting the course; for co-ordinating people; for marshaling resources; for setting policy; for providing help when it’s needed; for listening and responding and acting appropriately when people are pushing back. While testers help management to become aware of the status of the product, management is responsible for evaluating the quality of the work and the workplace, and for deciding (based on information from everyone, not only testers) whether the work is ready for the outside world.

Management assures quality by creating the conditions that make it possible for people to assure the quality of their own work. And management fails to assure quality when it sets up conditions that make quality assurance impossible, or that undermine it. In that case, “why didn’t we catch this in QA?” would mean “why didn’t management assure the quality of the work for which it is responsible?”

When people get sick, it’s reasonable to ask how people got sick. It’s reasonable to ask what they might need and what they might do to take better care of themselves. It’s also reasonable to ask if government is providing sufficient support for individual health, public health, and public health workers. It’s even reasonable to ask how better epidemiology and diagnosis could help to sound the alarm when people and populations aren’t healthy. It’s not reasonable to put responsibility for personal or public health on the epidemiologists and diagnosticians and lab techs.

So “Why didn’t we catch this in QA?” is a fine question to ask when it means “Why did we not assure the quality of our own work?” or “Why didn’t management assure the quality of the work for which it is responsible?” But don’t mistake testing for quality assurance, and don’t mistake the question for “Why didn’t testers assure the quality of the product?” And if you’re a tester, and being asked the latter question, reframe it to refer to the previous two.

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

It’s Not About The Typing

August 6th, 2020

Garbage truckloads of marketing bumph are being dumped into the testing space about “codeless” testing tools. For the companies producing these tools, to “test” seems to mean “performing a sequence of keystrokes or mouse clicks or button presses on an app”. (You can see the same pattern in many tutorials on “test automation”; write a script that executes a sequence of actions, and that’s a “test”.) But the marketing material is mute on how the tool aids the tester in recognizing problems in the product. The marketing focus is on how quickly the product can repeat some sequence of keystrokes and clicks and pushes.

Automated data entry can be a useful part of a test, but automated data entry is not a test. Alas, the marketing approach works really well on managers (and, sadly, some testers) who can perceive only the visible activities of testing; not the cognitive aspects of it, and not the essential mission of testing: revealing the status of the product, and finding problems before it’s too late to do anything about them.

In Rapid Software Testing, testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, critical thinking, risk analysis, etc. Above all, testing requires us to focus on the risk that there are problems in the product, to anticipate problems, and to recognize problems that are present. That requires oracles; an oracle is a means by which we recognize a problem when we encounter one in testing.

The “no-code automation” tools supply weak oracles at best, typically checking for the presence of a particular element on the screen, or a particular value in some output field. If that element is there, or if that value matches some specified and presumably desirable result, the “test” “passes”. But that doesn’t mean that there is no problem; a product can have plenty of problems even when it arrives at a correct calculation, or drops the user on the requested page. You know this from your own experience. You’ve used iTunes, right? You’ve been on LinkedIn. You’ve tried to fix an indentation issue in Microsoft Word. Maybe not those things specifically, but I bet you’ve felt annoyed, frustrated, impatient, or baffled when trying to use software to get something done. Quite possibly today.
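
To make that concrete, here is a minimal sketch of the kind of check such tools typically generate behind the scenes. It’s written in Python with Selenium WebDriver; the page URL, element IDs, and expected value are hypothetical stand-ins, not taken from any particular tool or product.

```python
# A sketch of a typical "codeless"-style check: drive the UI through a fixed
# sequence of actions, then apply one weak oracle (does a single field show
# the expected value?). The URL, element IDs, and expected value are made up.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/checkout")             # hypothetical page
    driver.find_element(By.ID, "quantity").send_keys("3")
    driver.find_element(By.ID, "calculate").click()

    total = driver.find_element(By.ID, "total").text        # one output field
    assert total == "29.97", f"expected 29.97, got {total}"
    print("PASS")
finally:
    driver.quit()
```

The check “passes” whenever that one field matches. It says nothing about layout, performance, usability, misleading messages, or anything else a human would notice in the same moment.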

And there’s the rub: rather than a means to gain experience with the product, most of these tools represent a means to check the product for specific conditions that can be specified easily. There’s a seductive story to be told about that: you can run those checks over and over, really quickly, and find a few shallow bugs when something changes in a bad way. Yet the tools are fussy; a change in the product can throw the script off even when it’s a desirable change. Addressing that requires investigation, repair, and continuous maintenance, which takes time.

Then something even worse happens: testers, deliberately or not, don’t report the time it takes to deal with problems around the tool. Why wouldn’t they do that? One reason could be that management has spent a wad of money on the tool, and the vendor says it’s supposed to be simple. As a tester, to suggest that there are difficulties with the tool is to risk your reputation. Come on. It’s codeless. It’s supposed to be simple.

While all that is going on, the tool misses problems that would be easily apparent if testers were gaining real, human experience with the product. People find problems that tools miss because humans have a wonderful capacity to recognize problems that they have not been told about in advance. Humans bring rich sets of oracles to testing. Testers use their feelings, their social awareness, their memories, their tacit knowledge, their experience of the world, their familiarity with comparable or competitive products or features—all of these things, and more—to generate and apply oracles on the fly. But there’s less time available for gaining experience with the product and identifying unanticipated problems whenever the tester is repairing and maintaining the scripts.

Whether or not your testing tool is “codeless”, and whether your input is delivered by a script or entered directly via keyboard and mouse, the means of entering data is usually one of the least significant aspects of a test.

What matters is not typing quickly, but the capacity of the tester to recognize problems that matter. If there’s no oracle, there’s no test. If there are weak oracles, there’s weak testing.

Further reading:

A Context-Driven Approach to Automation in Testing
Oracles from the Inside Out


A Testopsy: Learning from Performance

July 27th, 2020

What’s the difference between Rapid Software Testing (RST) and other forms of testing? In RST, the process model is not the centre of testing; neither is formal documentation; nor are tools. All of those things play a role in testing, of course, but they’re not at the centre.

In RST, the centre of testing is the skill set and the mindset of the individual tester, and heuristics that testers apply.

A heuristic is a fallible means of solving a problem. That is, a heuristic might work, or it might fail. A heuristic will fail when it is applied to the wrong kind of problem; or when it is applied with insufficient judgement, wisdom, skill, or care; or when some context factor or another derails it. All of the models that we apply to the product and to the test space are heuristic. All test techniques are heuristic. All of the ways in which we could apply tools are heuristic. All the ways we have of deciding that there’s a problem (that is, all of our oracles) are heuristic. And this doesn’t apply only to testing; everything in software development, and in the broader field of engineering itself, is heuristic.

So, in order to get good at testing, we must learn about heuristics that we can apply powerfully in our work. We must also consider how our heuristics can fail. One of the better ways to do that is to review and evaluate our work periodically in a very detailed way. In Rapid Software Testing, we call that a testopsy.

Earlier this year, James Bach and I did a testopsy on a session of testing that we had performed together about six months earlier, in preparation for the Rapid Software Testing Applied class that we teach. By examining our performance, we were able to notice and name heuristics and patterns that help us to think about testing, to describe it, and to understand how testing can go right—and sometimes not so right.

Here are just a few things we learned—or learned more deeply—from that session and the testopsy we performed:

  • When we’re doing pair testing, a lot of tacit knowledge emerges into the explicit. Each person’s performance is visible to the other, raising observations and questions about things that have not been shared up to that point. Through that, knowledge gets shared, discussed, and refined.
  • Products often give us lists of their own features in odd places, in interesting ways, that afford some efficiency for identifying coverage ideas.
  • There’s a phenomenon that happens in testing that we’re calling a “bug cascade”—periods where we are stressed or even overwhelmed by overlapping and competing investigations of complex and confusing behaviour.
  • During a bug cascade, we often recognize that we don’t know enough about the product to perform good analysis and troubleshooting.
  • Bugs get noticed and then lost, or missed altogether, during a bug cascade…
  • …but having a video and reviewing it can help us to recover what we’ve lost.
  • Analyzing the product (which had been our original mission for the session) can be severely disrupted by a cascade of bugs.
  • We coined a term, “mutually disruptive processes”, to describe one of the consequences of the bug cascade—which, when you’re working alone, is self-disruptive.
  • We coined another term, “the money booth effect”, to account for the collapse of productivity that is the consequence of mutually- or self-disruptive processes.
  • It is a good idea to be forgiving of ourselves for these problems. Although we can try to manage them to some degree, they are intrinsic to the process of learning and testing a product.

There’s lots more to the testopsy, which you can see here.

Why is this all important? Because in order to do something well, we must understand it, and testing is often terribly misunderstood—by managers, by developers, and, sadly, by testers themselves. By doing deep study of our work from time to time, we can begin the process of framing it, describing it, discussing it, and developing expertise in it.


Breaking the Test Case Addiction (Part 12)

July 25th, 2020

In previous posts in this series, I made a claim about the audience for a test report:

They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

It’s true that managers frequently ask testers when the testing will be done. That’s a hard question to answer, but maybe not for reasons that you—or they—might have considered.

By definition, testers who are working for clients do not work independently. We are providing services to our clients. We gain experience with the product, explore it, and experiment with it so that our clients can determine the status of the product. Knowledge of the status of the product allows our clients to decide whether the product is ready to ship, or whether there is more development work to do.

Whatever testing we may have performed, we could always perform more; but once the client decides more development work won’t be worthwhile, development stops, and testing stops along with it. (At least, pre-release testing stops. Live-site monitoring and other forms of information gathering begin when the product is released, presenting an opportunity for learning about the quality of the product and about the quality of the testing that’s been done on it. Sometimes that learning comes with a big price tag.) The real question on the table, then, is not when testing work will be done, but when the development work will be done.

So, brace yourself: the fact is that no one really cares when testing will be done, because testing is never done; it only stops. Testing stops when the client determines that there is no more development work worth doing. The client—not the tester—decides when development is done. And how does the client decide that?

The client decides based on economics, reasoning, politics, and emotion. This is a complex decision, and here comes a long sentence that illustrates just how complex the decision is.

The client will decide to ship the product when she believes that

  • she knows enough about the product, the actual known problems about it, and the potential for unknown problems about it, such that…
  • the product provides sufficient benefits—that is, the product will help its users to accomplish a task, or some set of tasks; and
  • the product has a sufficiently small number of known bad problems about it; and
  • the product is sufficiently unlikely to have unknown bad problems; and
  • more development work—adding new features and fixing problems—will not be worthwhile, because
  • the benefits from the product outweigh the known problems to a sufficient degree that customers will obtain the value they want; and
  • the business can deal with known problems about the product, sufficiently inexpensively for the business to sustain the product and the business; and
  • the business can deal with whatever unknown problems may still exist; and
  • the client will not be in political trouble with her social group (including the team, management, and society at large) if she turns out to be wrong about any or all of this; and
  • she feels okay about all of these things.

So when will testing be done? The client can declare testing to be done at any moment when the client is satisfied that all of these conditions have been fulfilled. So when the client asks “When will testing be done?”, that question amounts to “When will I be satisfied that development work is done?” And how can you, the tester, predict when someone else will be satisfied by work being done by other people?

You can’t. So I would recommend that you don’t, and that you don’t try. Instead, I’d suggest that you negotiate your role and your commitments. At first, this may look like a long conversation.

Try something like this:

“I understand that you want to know when testing will be done, because you want to know when development will be done; that is, when you will be satisfied that the product is ready to ship. I don’t know how to make a reliable prediction about when you will be satisfied, but here’s something that I can propose in return.

“I will start testing right now; that is, I will start obtaining experience with the product, exploring it, performing experiments on it, analyzing it. I’ll learn rapidly about the technology, the clients for the product, and the contexts in which the product will be used. As a tester, my special focus will be on evaluating it like a good critic; finding problems that threaten the value of the product to people who matter—especially you.

“Things will tend to go better if I’m able to help find problems early on—in the design of the product, or in our understanding of how its users might get value from it, or in the context that surrounds it. I don’t presume to be the manager or designer of the product, but I may have some suggestions for it—especially in terms of how to make the product more practically testable.

“As the product is being built, I’ll work closely with you and with the developers to help everyone make sure that the product we’re building is reasonably close to the product we think we’re building. The testing we need for that tends to be relatively shallow, focusing on quick feedback that doesn’t slow down or interrupt the pace of development. I’d recommend that you give the developers time and support to do their work in a disciplined way, as good craftspeople do. That discipline includes review, testing, and checking their work as they go, so that easy-to-find problems don’t get buried and cause trouble for everyone later. I can offer help with that, to the degree that the developers welcome it.

“The more that the developers can cover that quick, shallower testing, the more I’ll be able to focus on deep testing to find rare, hidden, subtle, intermittent, platform-dependent, emergent, elusive problems that matter. Deep testing requires a different mindset from the builder’s mindset, and changing mental gears to do deep testing can really disrupt the developers’ flow. So I’ll try to do deep testing as much as I can in parallel with the shallower testing that the developers are doing all the way along.

“At every step, I’ll let you know about any problems that I see in the product. I’ll be giving you bug reports, of course. I’ll also let you know about how the testing is going—what has been covered and what hasn’t. I’ll use coverage outlines in some form to help illustrate that, and I’m happy to offer you a variety of formats for them so you can choose one that works for you.

“If I notice a lot of bugs that seem like they should have been easy to find, I’ll let you know right away. For one thing, when there are lots of shallow bugs, deep testing becomes harder and slower, because I’m obliged to pause to investigate and report those bugs. More significantly, though, lots of shallow bugs might indicate that the developers are working too fast, or are under too much pressure. When people are pressed, they tend to have a hard time maintaining discipline and mental control over their work. In software, that’s a Severity 0 project risk; it leads to bugs, and some of those bugs may be deep enough that they’ll get past us—especially if we’re investigating and reporting the shallower bugs.

“I’m prepared to test or review anything you give me at any time; I’ll let you know how that influences the pace of other work that you’ve asked me to do.

“If there is testing that must be done formally—that is, in a specific way, or to check specific facts—I can certainly do that. I’ll provide you (and the auditors, if necessary) with evidence to support claims about all of the testing that has been done, both formal and informal. I’ll also let you know about extra costs associated with formal work—the time and effort it takes—and how it might affect our ability to find problems that matter.

“Apropos of that, I’ll keep track of anything that might threaten the on-time, successful completion of whatever work we’re doing. If you like, I’ll help to maintain product and project risk lists. (I’d recommend that the project manager be responsible for those, though.)

“I’ll keep track of where my own time is going, so that I’ll be able to produce a credible account of anything that is slowing down my work or making it harder. I’ll let you know what I need or recommend to make testing go as quickly and as easily as possible, and I invite you to ask for anything that helps make the product status or the testing work more legible—visible, readable, or understandable—to you.

“My goal is to help you to be immediately aware of everything you need to know to anticipate and inform a shipping decision.

“I know that this doesn’t directly answer the question of when testing will be done; but testing ends when we know the development work is done. So perhaps the best thing is for us to go together to the designers and developers. You can ask them when they anticipate that the development work will be done, and when the problems we encounter along the way will be fixed. I will help them to identify problems and risks, and to remember to include time and resources for testability as they give their estimate. As we’re working together to build and test the product, we can develop and refine our understanding about it, and we can be continually aware of its status. When that’s the case, you’ll be able to decide quickly whether there’s more development work to do, or whether you believe the product is ready for release.”

That’s a fairly thorough description of testing work. It’s a pretty long statement, isn’t it? Reading it aloud takes me just over five minutes. In real life, it would probably be interrupted by questions from time to time, too. So let’s imagine that the whole conversation might take 15 minutes, or even half an hour. But let me leave this post—and this series of posts—with these questions:

In a project that can take weeks or months, wouldn’t one relatively short conversation describing the testing role and affirming the tester’s commitments be worthwhile?

In that thorough description of testing work, did you notice that the expression “test cases” didn’t come up?

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? See the full schedule, with links to register here.

Breaking the Test Case Addiction (Part 11)

July 24th, 2020

In the previous post in this series, I made these claims about the audience for test reports:

  • They almost certainly don’t want to know about test case counts (although they might think they do).
  • They almost certainly don’t want to know about pass-fail ratios (although they might think they do).
  • They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

It’s far more likely that they want an answer to these questions:

What is the actual status of the product? Are there problems that threaten the value of the product? How do you—the tester—know? Do these problems threaten the on-time, successful completion of our work?

In this post, I’ll address the first two claims; I’ll leave the latter claim for next time.

They almost certainly don’t want to know about test case counts (although they might think they do).

Imagine asking a tester to test a cheap pocket calculator for you. We will call him “Eccles” (in honour of The Goon Show). You tell him your intentions for it: you would like to use it mostly to help you divide the bill for a group of friends at a restaurant, and for other everyday tasks. Eccles disappears, and returns a few minutes later. You ask him if he has found any problems. He says No. You ask to see his results, and he shows you his two test cases:

Input: 1 + 1 Result: 2 (Pass)
Input: 2 + 2 Result: 4 (Pass)

You quite reasonably believe that Eccles’ testing is inadequate. You tell him that you want more test cases. He listens, appears to understand the problem, and nods. He disappears again, and considerably later he returns, telling you that he has run 100 test cases—50 times more than the first time! And he has carefully documented the results:

Input: 1 + 1 Result: 2 (Pass)
Input: 2 + 2 Result: 4 (Pass)
Input: 3 + 3 Result: 6 (Pass)
Input: 4 + 4 Result: 8 (Pass)
Input: 5 + 5 Result: 10 (Pass)
Input: 6 + 6 Result: 12 (Pass)
Input: 7 + 7 Result: 14 (Pass)
Input: 8 + 8 Result: 16 (Pass)
Input: 9 + 9 Result: 18 (Pass)
Input: 10 + 10 Result: 20 (Pass)
Input: 11 + 11 Result: 22 (Pass)
…
Input: 99 + 99 Result: 198 (Pass)
Input: 100 + 100 Result: 200 (Pass)

To the degree that more is better here, it’s not very much better.
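
For contrast, here is roughly what Eccles’ hundred “test cases” amount to when expressed as an automated check. This is a sketch in Python with pytest; add() is a hypothetical stand-in for the calculator’s addition feature, not code from any real product.

```python
# Eccles' hundred "test cases", expressed as one parametrized check.
# add() is a hypothetical stand-in for the product's addition feature.
import pytest

def add(a, b):
    return a + b   # imagine this invoking the calculator under test

@pytest.mark.parametrize("n", range(1, 101))
def test_doubling(n):
    assert add(n, n) == 2 * n
```

One hundred “test cases”; one idea. The count climbs, but the coverage of risk barely moves: nothing here touches subtraction, division, display limits, rounding, or the ways real people will use the product.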

The trouble, of course, is that the count doesn’t mean anything without context. What aspects of the product are being tested? Has the testing been limited to only mathematical functions within the product? If so, has the tester at least given some coverage to all of them—and if not, which ones has the tester not covered—and why not? Has the tester considered other things that could diminish, damage, or destroy the value of the product? Has the tester considered performance and reliability? Has the tester considered the different people who might use the product, and the ways in which they might use the product in the real world?

Testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, sensemaking, risk analysis, critical thinking—and many other things too. A test is an instance of testing. Not all tests are equal in terms of effort, time, skill, scope, risk focus,…

Test cases tend to represent things that are easy to describe about a test: directly observable behaviour that can be described or encoded explicitly, and observable and describable outputs. Test cases both assume and ignore tacit knowledge.

But neither tests nor test cases are commensurate—that is, they cannot be counted as though they were equivalent units—so “test case” is not a valid unit of measurement.

  • From one case to another, test cases vary widely in scope, in coverage, in cost, in risk focus, and in value.
  • The design of a test case is subjective, based at least to some degree on the mental models and mindset of individual testers.
  • Test cases involve different test techniques.
  • Test cases are not independent; the outcome of one might influence the outcome of another.
  • Test cases are not interchangeable. They’re different, depending on the feature, function, data, and product in front of us.
  • Test cases do not—and cannot—capture all the testing work that occurs, such as learning, conjecture, discoveries, bug investigation, and so forth.
  • Test cases don’t even capture the work of designing the test cases, nor of analyzing the results!
  • And finally… testers often don’t follow the test cases anyway—and certainly not in the same way every time! A test is a performance, and a test case is like a script and stage directions for that performance. As with actors working from a script, the performance will vary from tester to tester, and from time to time.

Note that none of these things is necessarily a problem. Indeed, in testing, there’s considerable value in variation and variability. Bugs aren’t all the same, and they’re not always in the same place. There is a big problem in trying to treat test cases as equivalent for the purposes of counting them. (I’ve talked about that many times before, including here, and here.)

Now, there is at least one argument in favour of test cases:

Perhaps someone wants to verify that a specific procedure can be followed, with specific preconditions and specific inputs, in order to show that the procedure and inputs will produce a specific result. And, in fact, perhaps that procedure, or some part of it at least, can be automated.

That’s okay, although there are at least two problems to consider. First, all that specification tends to take time and effort, which can be costly, and which can swamp the value of what we might learn from following the procedure. Second, demonstrating that something can work based on specific procedures and inputs doesn’t mean that it will work. A variation in the procedure, or the conditions, or the inputs will result in different output. And even when we hold the conditions and the procedure steady and obtain the correct output, the outcome might still be terribly wrong in some sense.

Perhaps someone wants certain conditions to be identified and covered. If that’s true, identify those conditions and cover them. There are plenty of ways to do that without over-formalizing or over-proceduralizing the testing work.

Consider

  • noting those conditions in guidance for human interaction with the product;
  • reviewing existing logs or records to see if those conditions have been covered, and if not, cover them; or
  • creating automated low- or middle-level checks for those conditions (see the sketch just after this list).
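
As a sketch of that last idea, suppose one condition that matters is “dividing by zero must be rejected cleanly rather than crashing or returning garbage”. A low-level automated check can cover that condition directly; here is a minimal example in Python with pytest, where divide() is a hypothetical stand-in for the product’s own code.

```python
# A low-level check for one specific condition: division by zero is rejected.
# divide() is a hypothetical stand-in for a function in the product under test.
import pytest

def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("cannot divide by zero")
    return a / b

def test_divide_by_zero_is_rejected():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)
```

Checks like these cover the agreed conditions cheaply, at a level where they run quickly and stay stable, without turning the rest of the testing into a procedure-following exercise.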

Over 50 years ago, Jerry Weinberg wrote this passage:

One of the lessons to be learned from such experiences is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test really does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.

Leeds and Weinberg, Computer Programming Fundamentals: Based on the IBM System/360, 1970

So, consider thinking in terms of testing, rather than test cases. And if you are applying test cases, please don’t count them. And if you count them, please don’t believe that the count means anything.

They almost certainly don’t want to know about pass-fail ratios (although they might think they do).

If a test case count is not a valid measure of test coverage, then a ratio derived from that count is invalid too, whether it’s used to evaluate the quality of the product or the quality of the testing. I’ve heard tell of organizations that have a policy that says “when 97% of the test cases pass, the product is ready for shipping”. It shouldn’t take long to see the foolishness of this policy; it’s like a doctor saying that when 97% of the data points in your medical checkup indicate no problem, you’re healthy.

Just as “the sheer number of tests performed is of little significance in itself”, the ratio of passing tests to failing ones is both insignificant and easy to game. Insignificant, because a product can be passing all of the tests that we’ve performed so far and still have terrible problems. Also insignificant, because a product can fail to pass hundreds of tests—but if those tests are outdated, inconsequential, overly precise, or otherwise irrelevant, there’s no problem. Easy to game, because if you want to make the product look better than it is, it’s a simple matter to perform more passing tests.

The point of testing is not to provide a pat on the head for the product; the point is to evaluate its true status, and to identify problems that threaten the value of the product to people who matter—to the users or customers of the software, or to anyone affected by it; to the support organization; to the operations people; and, ultimately, to the business.

Several years ago, a participant in one of my Rapid Software Testing classes approached me after I had mentioned this 97% pass rate business (which I’ll call 97PR henceforth). He said, “It’s funny you should mention it. I’ve worked at two companies where they used that measure to decide when to ship.”

“Really?” I replied. “Do you mind me asking—which ones?”

“Well,” he said. “One was Nortel.” I winced; Nortel was a huge Canadian success story until all of a sudden it wasn’t. “The other,” he said, “was RIM—Research in Motion. The Blackberry people.” I winced again.

Was 97PR responsible for the demise of these two companies? Probably not—certainly not directly. But to me, the 97PR suggests a company where engineering has been reduced to scorekeeping. If you want to fool people about something, providing numbers without context is a great way to do it. And if you want other people to fool you, ask for numbers without context.

For the calculator example above, what would a better test report look like? Here’s what I might offer:

“I’ve tested the calculator for basic math operations that seem likely to be important in calculating restaurant cheques: addition, multiplication, subtraction, and division. I imagined that you would be wanting to do this for groups of up to a dozen people. I did a handful of variations of each math operation, up to the limits of what the calculator’s display supports, including stuff like dividing by zero. Beware, because if you do that by accident, you’ll lose what you’ve entered so far. (Aside: Windows Calculator loses the operations before a divide-by-zero too.) I took notes, if you want to see them.”

The client, of course, could stop me at any time. What if she didn’t? What would a deeper test report look like? Given some time, I might offer this:

“I tested the memory-store and memory-recall functions, too, and didn’t observe any problems. Even though they’re present as buttons on the calculator, I didn’t bother to test the higher-order math functions like squares, square roots, and trigonometric functions, since I reckoned you wouldn’t need those for restaurant bills and I didn’t want to waste your time by testing them. But if you want me to, I can.

“The buttons provide haptic feedback, so it’s easy to tell when they’ve been pressed, and there’s no key-repeat function, so it’s easier to avoid accidental double keypresses on this calculator than it is on others. I looked at it in low-light conditions; its LCD screen may be a little hard to see in a dark restaurant. It’s solar-powered, and it turns itself off after five minutes. When that happens, it forgets whatever data you’ve entered.

“I dumped some water on the keypad, and it continued to perform without any problems. After I immersed it in a glass of water, though, I had to let it dry for a couple of days before it started working again, but it now seems to be working just fine.”

Yes; all that takes quite a bit longer to say—or to write—than “We’ve run 5163 tests, and of those, 118 are failing, for a pass rate of 97.7 per cent.” It’s also more informative—by a country mile—about the quality of the product and the quality of the testing.

So what do you do when a manager asks for test case counts or pass-fail ratios? Here’s a reply from James Bach: “I’m sorry, but misleading you is not a service that I offer.” Consider offering a three-part testing story instead.

We’ll get to that last claim about a test report’s audience (they almost certainly don’t want to know about when the testing is going to be done (although they might think they do)) in the next and final post in this all-too-long series.

Breaking the Test Case Addiction (Part 10)

June 8th, 2020

This post serves two purposes. It is yet another installment in The Series That Ate My Blog; and it’s a kind of personal exploration of work in progress on the Rapid Software Testing Guide to Test Reporting. Your feedback and questions on this post will help to inform the second project, so I welcome your comments.

As a tester, your mission is to evaluate the product and report on its status, typically with a special emphasis on finding problems that matter. We’ve discussed bug reporting in the Rapid Testing Guide to Making Good Bug Reports. In this installment of Breaking the Test Case Addiction, I’m describing test reporting as something that responsible testers do.

Sounds straightforward, right? But right away, I want to address the risk of misunderstanding, so let me clear up what I mean by certain terms here.

Responsible Testers
Responsible testers are people who assume the role of tester on a project, and who commit themselves to doing that job well over time. Supporting testers (which we used to call “helpers”) help the test effort temporarily or intermittently, but are not committed to the testing role. Supporting testers are generally not required to report on their testing work to the same degree as responsible testers are.

Test Project
In this post, when I say test project, I’m referring to any set of activities focused on testing of any product or service, or any part of it: a low-level unit, a function, a component, a feature, a story, a service, an entire system… A test project can contain lots of little test projects. Accordingly, depending on the level of granularity we’re referring to, a test project might happen over moments or minutes, days, weeks, or months. A report on a test project might cover similar spans of time—instants, episodes, sprints, releases…

“Test project” here could refer to something that happens outside of development. More typically, it refers to testing activity that happens inside a development project, in parallel with the other aspects of development, like design, programming, or other testing.

Product
When I say product here, I mean anything that anyone has produced that might be subject to testing. While that includes running code, “product” could include code that is not running yet; prototypes and mockups; specifications and other requirement documents; flowcharts, diagrams, or state models; user documentation; sales and marketing material; or ideas about any of those things. When we refer to testing activity pointed at things that are static, like most of the items in the preceding list, we usually call it “review”; we might also call it “performing a thought experiment”. Review is a kind of testing activity that may be closely or distantly associated with performing a test—which brings us to what we mean by “testing”.

Testing, Test Activities, and Review
When I say testing here, I am using the Rapid Software Testing definition. To us, testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting.

Testing includes many activities: questioning, studying, modeling, operating the product, manipulating it, making inferences, analyzing risk, thinking critically, recording the process, reporting on it, etc. Testing activities also include investigating and analyzing bugs and suspicious behaviour. Testing typically includes applying tools to help with any testing activities.

A test is an instance of testing, and to perform a test means to explore, experiment with, and gain experience of a product. In general, to perform a test implies that we will operate and observe a product or its output by some means.

In review, operation of the product as such typically isn’t available. In review, though, we engage in other testing activities as mentioned above. We can’t perform experiments on the running product but, as I mentioned above, we might perform thought experiments on it, imagining interactions between the product and the people using it. Of course, a thought experiment isn’t the same as a real-world experiment; that’s a key difference between review and performing a test.

Why go on about all this? Because reporting is central to our role as testers. We test; we learn; and we report on what we’ve learned.

Are you doing testing work of any kind, or even thinking about doing testing? Then you’ve got a test project on the go, and you can report on its status, even if your report starts with “I haven’t started testing the product yet, but here are some ideas about how we might go about it.”

Report
Next, let’s unpack the idea of a report. A report is a description, explanation, or justification of something. A report is a communication, but a report is not necessarily a document.

Communicating a report might happen as conversation in a hallway, or beside a coffee machine or a water cooler; as a couple of sentences uttered at a stand-up meeting; as a quick mention of a bug in passing to a developer; as a lengthy description of the status of the product and the status of testing at a go-live meeting. A report might be conveyed in writing as a paragraph, a page, or several pages of text; as (heaven help us) a PowerPoint presentation; or as hundreds of pages in bound books, formally presented to a government or regulatory body.

We might include or refer to artifacts collected or produced during the activity that led to the report—the reporter’s raw notes, data sets, program code, design notes for the activity itself. A report might be supplemented with illustrations, charts, graphs, or diagrams, sketched on a whiteboard or formally rendered on glossy paper. Or a report might be accompanied by photographs, audio, video, mind maps, tables, and references to other artifacts.

Test Report
A test report is any description, explanation, or justification of the status of a test project.

A comprehensive test report is all of those things together.

A professional test report is one that is competently, thoughtfully, and ethically designed to serve your clients in their context. A professional test report need not be a comprehensive test report, nor vice versa.

Some might say that a test report is “just the facts”, but it isn’t; it cannot be. A test report is based on facts, but it’s a story about facts—a story framed for the person or people receiving it. Stories always emphasize some things and leave other things out. We never have all the facts, and facts are sometimes in dispute. Stories are always, to some degree, biased by the storyteller and focused by what the storyteller wants the audience to hear, to learn, and to know. Those biases can be seen as problems in the report, features of it, or both.

The audience for your test report might include insiders who are directly involved in the testing and development work; other insiders (who might be overseeing that work, or affected by it without being directly involved); or outsiders.

For now, I’m going to assume your audience is in the first two categories. On that basis, it helps to consider what that audience probably wants to know above all else.

They almost certainly don’t want to know about test case counts (although they might think they do).
They almost certainly don’t want to know about pass-fail ratios (although they might think they do).
They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

(I realize that these claims may sound strange to you. I will address these (non-)desires in a future post.)

Having been a program manager, a developer, and having worked with lots of them, I can tell you what those people almost certainly do want to know:

What is the actual status of the product? Are there problems that threaten the value of the product? Do these problems threaten the on-time, successful completion of our work?

A test report addresses those questions.

Three Aspects of Test Reporting
A good test report braids three strands of story together:

  • a story about the product and its status; what the product is, what it does, how it works, how it doesn’t work, and how it might not work in ways that matter to our various clients. This is a story about bugs, problems, and risks about the product.
  • a story about how the testing was done—how the product story was obtained; how we configured, operated, observed, and evaluated the product. A thread in this second strand of the testing story involves describing the ways in which we recognized problems: our oracles. Another thread in this strand involves where we looked for problems: our coverage. Yet another thread includes what we haven’t covered yet, or won’t cover at all unless something changes.
  • a story about the quality of the testing work—why the testing that was done can be trusted or, to the degree that it is untrustworthy, the issues that present obstacles to the fastest, least expensive, most powerful testing we can do. In this strand, we also identify what we might need or recommend to make the testing better, and we may also provide context for, and an evaluation of, the quality of the report itself.

Most of the time, the client of the testing will be most interested in that first strand. Sometimes the client might be more interested in one of the other two. Nonetheless, whatever form the report might take, the reporter should at least be prepared to address all three strands.

(I’ve written more about this pattern here, here, and here.)

Credibility
If you’re not credible, your reports won’t be taken seriously. In your reporting, you may be delivering surprising or uncomfortable information. Your clients, unconsciously or deliberately, may assume that you’re mistaken or that you’re exaggerating risks, and they may try to micro-manage your reporting. Credibility is an antidote to all this.

To build and maintain credibility, it’s important to actually care about the project and the people on it. It’s important to take your work and your skills seriously, and to demonstrate that seriousness in your attitude, commitments, and behaviour. There will be more to say about this later, but for now…

  • Actually know how to do your job.
  • Gain experience with the product.
  • Study the technology in and around your project.
  • Read all of the relevant requirement, specification, and standards documents carefully, especially when you’re in a regulated environment.
  • Take notes diligently on your own work to inform your reporting.
  • Sweat the details in your own work.
  • Find things to appreciate about the work of others.
  • Acknowledge mistakes, correct them and learn from them.
  • Do not tell lies or exaggerate.

Examples
Note that Part 7 of this series included a number of test reports delivered verbally. Here I’m providing examples of test report documents.

As you survey them, you might want to consider the context for which they’re intended; the reporting levels that they focus on (product, testing, or quality-of-testing); the evidence or references included to support the report; and what the report might need or could leave out.

Note that while a couple of reports refer to specific things to be checked, there is rarely even a mention of test cases. The focus, instead, is usually on bugs or potential problems in the product that represent risk to the value of the product, and therefore risk to the business.

Spot Check Test Report

Click to access mpim-report.pdf


Here is an example of a real, comprehensive, professional test report, prepared by James Bach and edited by me. Over five pages, it describes a paired exploratory testing session that found problems in a real medical device. (The names, nouns and verbs have been changed to shield the identity of the company and the product.)

Cheese Grater Incident Report

Click to access cheesegrater.pdf


This is two reports in one: a whimsical yet serious report on repairing a broken Parmesan cheese dispenser; and a much longer, detailed set of notes on how to perform an investigation and report on it. Indeed, the latter section is a really worthwhile complement to this blog post.

OEW Case Tool

Click to access OEWCaseToolReport.pdf


An example of a two-page summary report (from 1994!) about a computer-aided software engineering (CASE) tool at Borland.

Y2K Compliance Report

Click to access Y2KComplianceReport.pdf


An eight-page report prepared for compliance with Y2K requirements, including notes on strategy; the test approaches that were applied (and risks that prompted those approaches); the results; and a list of specific items that needed to be checked.

OWL Quality Plan

Click to access OWLQualityPlan.pdf


This is a report on proposed plans for testing another Borland product, the Object Windows Library. The report includes a table linking product risks to testing work necessary to investigate those risks. It also includes a listing of components and sub-components in the product.

An Exploratory Tester’s Notebook

Click to access etnotebook.pdf


This paper on recording and reporting includes a report on my spontaneous investigation of an in-flight entertainment system, and a couple of session-based test management session sheets.

A Sticky Situation

Click to access 2012-02-AStickySituation.pdf


This is an example of a form of reporting that’s sometimes called an “information radiator”. It visualizes the status of a test project (and some degree of test coverage) using sticky notes.

The Low-Tech Testing Dashboard

Click to access dashboard.pdf


Of this, James Bach says “Back in 1997, I was challenged by top management to create a way to convey testing status at a glance. Thus was born the “low-tech testing dashboard” which has since been rendered in various electronic, distributed forms. The important thing about the dashboard is that there are no “measurements.” We don’t count anything. Instead there are assessments. These are subjective, yes, but always grounded in evidence.”

Who Killed My Battery?

Click to access boneh-www2012.pdf


A splendid research paper on what drains mobile phone batteries… and why. Also a presentation on YouTube: https://www.youtube.com/watch?v=_uv057DP2Vs

Once again, these reports don’t focus on test cases, but on testing. They’re examples of powerful and reasonable test reports that offer an alternative to management that is fixated on test cases.

Managers are more likely to relax their obsession with test cases when we provide them with reports that tell the product and testing stories.

Breaking the Test Case Addiction (Part 9)

February 15th, 2020

Last time, Frieda and I had been looking at visualizations of time spent on various testing activities, including work that fosters test coverage of the product (T time), bug investigation and reporting (B time), and setup work to get ready to test or to tidy up afterwards (S time).

“So…,” Frieda mused, “I could track T-time, and B-time, and S-time. But I’d be a little worried about watching the clock all the time, instead of concentrating on my testing. It’d be like micro-managing myself.”

“That is worth worrying about,” I replied. “The last thing we want to be is obsessive-compulsive clock watchers. So here’s a secret: to some degree, we misrepresent our accounting of session time.”

“Oh, great,” said Frieda. “I thought this whole discussion has been about establishing trust.”

“It is. But it’s also about accounting for what we do in a way that everybody can make sense of what’s happening. And although we care about accuracy, precision isn’t too big a deal. In session-based test management, we’re trying to account for the effort that we’ve put in, but we’re also trying to make things easy enough for our clients to comprehend. So we don’t watch the clock all the time. A reasonable estimate of how much time we spent on T, B, and S is good enough. Precision to the nearest five or ten per cent will do. We’re not lying, but we are simplifying; smoothing out the details so they don’t get overwhelmed or obsessed or fooled by the numbers. Remember, the point of all this isn’t score-keeping. It’s to prompt us to ask questions. Mostly: are we okay with how we’re spending time?”

“Here’s an example,” I continued. “One day, after the morning standup, I start working a charter that covers some area of the product. Things go smoothly for the first 20 minutes or so, and then a developer comes up and asks me to help him with reproducing a problem that someone else reported. That goes on for 15 minutes.

“Then I get back to work on the charter. There are quick little interruptions along the way—a phone call here, and an instant message there—but by and large I can handle them quickly and keep the flow going for an hour and a half. I run into some bugs, and I run into some problems with a test tool that amount to setup time.

“Then it’s lunch. When I come back, I’m still looking at the same area. I work at it for 25 minutes, and the development manager wanders by for a chat. That takes 20 minutes. I get back into testing for 45 minutes, and then it’s Paula’s birthday, so I go to the lunchroom and eat cake and chat for 15 minutes.

“I get back and do testing work for 40 minutes, and then another tester asks me to look over a coverage outline they’ve done. That takes 10 minutes. Then I get back to the charter, and work it for another 25 minutes, and wrap it up.

“Now: if we add all that up, that’s just over five hours of clock time, of which an hour in total was interruptions. 245 minutes were spent on actual testing. If we think of a session as 90 minutes, that’s pretty close to three sessions worth of work.

“So when I’m reporting, I’ll probably submit that as two session sheets, one to describe what I did in the morning and the other for the afternoon. I’ll account for the work as three sessions worth of time. I’ll make a reasonable guess as to how much I spent on T-time, B-time, and S-time for each one. Again, precision to the nearest five or ten per cent is good enough. With the TBS numbers, we’re trying to identify approximately how badly our coverage has been interrupted. If we’re not okay with what the approximation suggests, we’ll look into the specifics.”
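
To make that arithmetic concrete, here is a minimal sketch, in Python, of the back-of-the-envelope tally described above. It is not part of session-based test management itself; the activity labels and durations are simply the estimates from the example day, and a session is assumed to be a nominal 90 minutes.

# A rough tally of the example day above. Durations are estimates,
# not stopwatch readings; the goal is a reasonable approximation.

SESSION_MINUTES = 90  # nominal session length

# (activity, minutes, counts toward session time?)
day = [
    ("charter work before the developer dropped by", 20, True),
    ("helping a developer reproduce someone else's bug", 15, False),
    ("charter work, small interruptions handled in flow", 90, True),
    ("charter work after lunch", 25, True),
    ("chat with the development manager", 20, False),
    ("more charter work", 45, True),
    ("birthday cake in the lunchroom", 15, False),
    ("more charter work", 40, True),
    ("reviewing another tester's coverage outline", 10, False),
    ("wrapping up the charter", 25, True),
]

session_time = sum(m for _, m, in_session in day if in_session)
interruptions = sum(m for _, m, in_session in day if not in_session)

print(f"Clock time:    {session_time + interruptions} minutes")  # 305
print(f"Interruptions: {interruptions} minutes")                 # 60
print(f"Session time:  {session_time} minutes, about "
      f"{session_time / SESSION_MINUTES:.1f} sessions")          # 245, about 2.7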

“But won’t managers get upset if we don’t report the numbers precisely?” Frieda asked.

“Trust me,” I said, laughing. “They’re not watching that closely. They never are. They can’t watch that closely; it’s not possible. They don’t have time to scrutinize everyone’s work every minute of every day. There’d be no point to it. Plus supervising people’s every move would undermine the social nature of work. People need to be unsupervised to some degree in order to feel trusted, and to be responsible for what they’re doing.

“Plus,” I noted, “if managers were watching closely, they would be horrified at how much time was being wasted on the care and feeding of test cases, and how little time was being spent on actual testing and collaborative work.”

“Heh,” said Frieda. “That’s true.”

“On the other hand,” I continued, “it would be quite reasonable and important for them to know if your session time is being swamped by bug investigation and reporting, or by setup or followup work, or if interruptions from outside the session are preventing you from performing at least a couple of sessions worth of coverage a day.”

“Doesn’t that vary a lot?” Frieda asked. “I mean… some groups do a lot of stuff in meetings. You know, like design meetings and grooming meetings and project planning meetings. Should we track those?”

“Sure,” I said, “if you like. The key is this: if everyone is completely happy with a situation, don’t bother trying to measure anything in particular. But if someone is unhappy, or if someone has a feeling that there might be something to be unhappy about, then pay some attention to it. For instance, someone might say that testing is taking too long…”

“I’ve heard that before,” said Frieda.

“Uh huh. Too long compared to what? What part, or parts, specifically are taking too long? Get some data. After you’ve collected the data, ask questions about it. Analyze it. Are testers spending a lot of time in bug investigation? Why is that? Is it because they’re being overly detailed in preparing their reports? Are they investigating bugs for longer than necessary? Is it because the bugs are subtle and hard to reproduce? Or is it because there are so many bugs that it’s overwhelming the testing time, and any opportunity for test coverage is destroyed?

“Each of those things should prompt a different management action, or a different change in behaviour. Maybe the problem is not really that the testing is taking too long, but that the developers are under too much pressure. They’re producing code so quickly that they don’t have a good handle on what they’re building, and they don’t have time to check their work. Or maybe the problem is that the testers are spending tons of time writing up bug reports—and maybe a solution to that would be to have the testers work right next to the developers. Then, instead of doing unnecessary paperwork, the testers could simply demo some bugs to the developers right away.

“The point of activity-based test management is to avoid turning testing work into production of artifacts. To prevent testers from being turned into test case machines.”

“What happens when somebody wants artifacts?” asked Frieda. “That’s a big reason managers say they want test cases… so they can know for sure that the work got done.”

“You know there’s a term for that, in our lingo: test integrity. Test integrity is about making sure the testing we say we did matches up with the testing we actually did. Are test cases the only way that managers could know that work got done?”, I asked.

“Well…,” she replied. “I guess there’s debriefing, as we were talking about. But they want… evidence. You know, something in writing.”

“How about the tester’s notes?”

“Hmmm…” Frieda paused. “Most testers aren’t that great at taking notes.”

“I agree,” I said. “I’ve seen that too, and it can be a real problem. People doing good investigative work—journalists, lab researchers, detectives—need to keep good notes. Testers do too. I like to tell testers that it’s okay not to keep good notes… as long as you want to forget lots of important stuff.”

“Why aren’t testers good at taking notes?” Frieda asked.

“I think there’s a feedback loop at work,” I replied. “People don’t do good investigative work when they’re following formally scripted test cases — and they don’t tend to take good notes either. Why should they? They just do what the script tells them to do, and the mission turns from ‘test the product’ into ‘follow the script’. That makes testing rote, and boring, and it derails the task of looking for problems. Why even bother to take notes in that case? And then, since people don’t practice taking notes, their note-taking skills decline. And then when they’re given a chance to work in a less scripted way, they don’t take good notes. They forget important details of what they were up to, and even if they remember, they might not have evidence.”

“So,” I continued, “one way to get people to learn to keep good notes is to set them free from writing and executing test cases. But the deal is that, in return, they have to produce some kind of evidence of what they were thinking and doing. They can show me that stuff to supplement the debriefing, and we can review it together. Tidy notes, taken every couple of minutes or so, tend to be helpful. I’d like to see what their test ideas were, or what risks they considered as they went. If they’ve used specific test data and examined specific behaviours, they can show me lists or tables or mind maps. If they’ve written some code to help them test, they can show me the code and the output from it. Their notes don’t have to be ponderous or bureaucratic, but I want to see something that helps me to follow their thought process and develop trust.”

“Some managers are really worried about that integrity stuff,” said Frieda.

“That’s reasonable,” I replied. “If I were managing a project for which integrity were an issue, like in a medical hardware or software context, I still wouldn’t make people follow test cases most of the time. If stuff needs to be checked, automate it. For high integrity, I’d require formal session reports as part of the deliverables, and I’d give the testers constant feedback on them. In session-based test management, for instance, there’s this concept of the session sheet that combines test notes, data about the session, and references to artifacts that were generated during the session. Things like test results, snippets of test code, or even screen shots or videos if they’re helpful.

“Before the session, I might identify specific factors to examine, or output values to check. I might charter them to use tables of existing data. More often I’d get the tester to develop those things independently, and then show them to me along with the session sheet during the debriefing. Then we can discuss the tester’s choices and actions, and figure out how well we’re covering the product and what needs to be done next. And after that we can summarize session sheets into reports for managers, auditors, regulators, or anyone else who’s looking for something formal.”
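
For a rough idea of the shape such a sheet can take, here is a simplified sketch (not the canonical session sheet template), filled in along the lines of the debrief with Karla in Part 7 of this series; the percentages, file names, and durations are invented for illustration.

CHARTER
  Examine task updates from the management role; look for problems,
  especially around concurrent edits.

TESTER          Karla
START           (date and time)
DURATION        normal session, about 90 minutes

TASK BREAKDOWN
  Test design and execution        55%
  Bug investigation and reporting  25%
  Session setup                    20%

DATA FILES
  task-update-coverage.xlsx (example name); screen recording of the
  reassignment problem

TEST NOTES
  A narrative of what was tried, what was observed, and why; test ideas
  and risks considered along the way.

BUGS
  Reassigning a task while the assigned person has it open for updating
  doesn't stick; it remains assigned to the original person. Logged and
  discussed with Ron.

ISSUES
  Programmers working in the test environment cost about 20 minutes of setup.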

For more on note-taking and session sheets, see https://www.developsense.com/presentations/etnotebook.pdf

Breaking the Test Case Addiction (Part 8)

December 9th, 2019

Throughout this series, we’ve been looking at an alternative to artifact-based approaches to performing and accounting for testing: an activity-based approach.

Frieda, my coaching client, and I had been discussing how to manage testing without dependence on formalized, scripted, procedural test cases. Part of any approach to making work accountable is communication between a manager or test lead and the person who had done the work. In session-based test management, one part of this communication is a conversation that we call a debrief, and that’s what we talked about last time.

One of the important elements of a debrief is accounting for the time spent on the work. And that’s why one of the most important questions in the debrief is What did you spend your time doing in this session?

“Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

“Well,” I replied, “there’s testing, and then there’s other work that happens in the session. And there are pretty much inevitably interruptions of some kind.”

“For sure,” Frieda agreed. “I’m getting interrupted every day, all the time: instant messages, phone calls, other testers asking me for help, programmers claiming they can’t reproduce the bug on their machines…”

“Interruptions are a Thing, for sure,” I said. “Let’s talk about those in a bit. First, though, let’s consider what you’d be doing during a testing session in which you weren’t interrupted. Let’s leave the interruptions aside for a moment. What would you be doing?”

“Testing. Performing tests. Looking for bugs,” said Frieda.

“Right. Can you go deeper? More specific?”

“OK. I’d be learning about the product, exercising test conditions, increasing test coverage. I’d be keeping notes. If I were making a mind map, I’d be adding to it, filling in the empty areas where I hadn’t been before. Each bit of testing I performed would add to coverage.”

“‘Each bit of testing,’” I repeated. “All right; let’s imagine that you set up a 90-minute session where you could be uninterrupted. Lock the office door…”

“…the one that I don’t have…”, Frieda said.

“Natch. It’s cubicle-land where you work. But let’s say you put up a sign that said “Do not disturb! Testing is in Session!” Set the phone to Send Calls, shut off Slack and Skype and iMessage and what-all… In that session, let’s just say that you could do a bunch of two-minute tests, and with each one of those tests, you could learn something specific about the product.”

“That’s not how testing really works! That sounds like… test cases!” Frieda said.

“I know,” I grinned. “You’re right. I agree. But let’s suspend that objection for a bit while we work through this. Imagine that 90-minute session rendered as a nine-by-five table of 45 little microbursts of test activity. The kind of manager that you’ve been role-playing here thinks this will happen.”

A Manager's Fantasy of an Ideal Test Session

Frieda chuckled. “Manager’s Fantasy Edition. That’s about right.”

“Indeed,” I said. “But why?”

“Well, obviously, when I’m testing, I find bugs. When I do, I start investigating. I start figuring out how to reproduce the bug, so I can write it up. And then I write it up.”

“Right,” I said. “But even though it’s part of testing, it’s got a different flavour than the learning-focused stuff, doesn’t it?”

“Definitely,” said Frieda. “When I find a bug, I’m not covering new territory. It’s like I’m not adding to the map I’m making of the product. It’s more like I’m staying in the same place for a while, while I investigate.”

“Is that a good thing to do?”

“Well…, yes,” Frieda replied. “Obviously. Investigating bugs is a big part of my job.”

“Right. And it takes time. How much?”

“Well,” Frieda began, “A lot of the time I repeat the test to make sure I’m really seeing a bug. Then I try to find out how to reproduce it reliably, in some minimum set of steps, or with some particular data. Sometimes I try some variations to see if I can find other problems around that problem. Then I’ve got to turn all that into a bug report, and log it in the tracking system. Even if I don’t write it up formally, I have to talk to the developer about it.”

“So, quite a bit of time,” I said.

“Yep,” she said. “And another thing: some bugs block me and prevent me from getting to part of the product I want to test. Trying to work around the blockers takes time too. So… like I said, while I’m doing all those things, I’m not covering new ground. It’s like being stuck in the mud on a flooded road.”

“If I were your manager, and if I were concerned about your productivity, I’d want to know about stuff like that,” I said. “That’s why, in session-based test management, we keep track of several kinds of testing time. Let’s start with two: test design and execution, in which we’re performing tests, learning about the product, gaining a better understanding of it. Of course, our focus is on activity that will either find a bug, or help us to find a bug. We call that T-time, for short, and distinguish it from bug investigation and reporting—B-time—which includes the stuff that you were just talking about. The key thing is that B-time interrupts T-time.”

Frieda’s brow furrowed. “Or, to put it another way, investigating bugs reduces test coverage.”

“Yes. And when it does, it’s important for managers to know about it. As a manager, I don’t want to be fooled about coverage—that is, how much of the product that we’ve examined with respect to some model.

“You start a session with a charter that’s intended to cover something we want to know about. In a 90-minute session, it’s one thing if a tester spends 80 minutes covering some product area with testing and only ten minutes investigating bugs. It’s a completely different thing if the tester spends 80 minutes investigating bugs, and only ten minutes on tests that produced new coverage. If you only spend ten percent of the time addressing the charter, and the rest on investigating a bug that you’ve found, I’d hope you’d report that you hadn’t accomplished your charter.”

“Wait… what if I were nervous about that?” Frieda asked. “Doesn’t it look bad if I haven’t achieved the goal for the session?”

“Not necessarily,” I replied. “We can have the best of intentions and aspirations for a session before it starts. But the product is what it is, and whatever happens, happens. Whatever the charter suggests, there’s an overarching mission for every session: investigate the product and report on the problems in it. If you’re having to report lots of bugs because they’re there, and you’re doing it efficiently, that shouldn’t be held against you. Testers don’t put the bugs in. If there are problems to report, that takes time, and that’s going to reduce coverage time. If you’re finding and investigating a lot of bugs, there’s no shame in not covering what we might hope you’d cover. Plus, bug investigation helps the developers to understand what they’re dealing with, so that’s a service to the team.”

Frieda looked concerned. “Not very many managers I’ve worked with would understand that. They’d just say, ‘Finish the test cases!’ and be done with it.”

“That can be an issue, for sure. But a key part of testing work these days is to help managers to learn how to become good clients for testing. That sometimes means spelling out certain things explicitly. For instance: if you find a ton of bugs during a session, that’s bad enough, in that you’ve got a lot less than a session’s worth of test coverage. But there’s something that might be even worse on top of that: you have found only the shallowest bugs. By definition, the bugs you’ve found already were the easiest bugs to find. A swarm of shallow bugs is often associated with an infestation of deeper bugs.”

“So, in that situation, I’m going to need a few more sessions to obtain the coverage we intended to achieve with the first one,” said Frieda.

“Right. And if you’re concerned about risk, you may want to charter more, and deeper, testing sessions, because—again, by definition—deeper bugs are harder to find.”

Frieda paused. “You said there were several kinds of testing time. You mentioned T-time and B-time. That’s only two.”

“Yes. At very least, there’s also Setup time, S-time. While you’re setting up for a test, you aren’t obtaining coverage, and you’re not investigating or reporting a bug. Actually, setting up is only one thing covered by our notion of “Setup”. S-time is a kind of catch-all for time within the session in which you couldn’t have found a bug. Maybe you’re configuring the product or some tool; maybe you’re resetting the system after a problem; maybe you’re tidying up your notes.”

“Or reading about the product? Or talking with somebody about it?”, Frieda asked.

“Right. Anything that’s necessary to get the work done, but that isn’t T-time or B-time. So instead of that Manager’s Fantasy Version of the session, a real session often looks like this:”

A More Plausible Test Session

“Or even this.”

A Common Test Session

“Wow,” said Frieda. “I mean, that second one is totally realistic to me. And look at how little gets covered, and how much doesn’t get covered.”

“Yeah. When we visualize it like this, it makes an impression, doesn’t it? Trouble is, not very many testers help managers connect those dots. As you said, if you want to achieve the coverage that the manager hoped for in the Fantasy Edition, this helps to show that you’ll need something like four sessions to get it, not just one. Plus the bugs that you’ve found in that one session are by definition the shallowest bugs, the ones closest to the surface. Hidden, rare, subtle, intermittent, emergent bugs… they’re deeper.”

Frieda still had a few more questions, which we’ll get to next time.

Breaking the Test Case Addiction (Part 7)

June 10th, 2019

Throughout this series, we’ve been looking at an alternative to artifact-based approaches to testing: an activity-based approach.

In the previous post, we looked at a kind of scenario testing, using a one-page sheet to guide a tester through a session of testing. The one-pager replaces explicit, formal, procedural test cases with a theme and a set of test ideas, a set of guidelines, or a checklist. The charter helps to steer the tester to some degree, but the tester maintains agency over her work. She has substantial freedom to make her own choices from one moment to the next.

Frieda, my coaching client, anticipated what her managers would say. In our coaching session, she played the part of her boss. “With test cases,” she said, in character, “I can be sure about what has been tested. Without test cases, how will anyone know what the tester has done?”

A key first step in breaking the test case addiction is acknowledging the client’s concern. I started my reply to “the manager” carefully. “There’s certainly a reasonable basis for that question. It’s important for managers and other clients of testing to know what testing has been done, and how the testers have done it. My first step would be to ask them about those things.”

“How would that work?”, asked Frieda, still in her role. “I can’t be talking to them all the time! With test cases, I know that they’ve followed the test cases, at least. How am I supposed to trust them without test cases?”

“It seems to me that if you don’t trust them, that’s a pretty serious problem on its own—one of the first things to address if you’re a manager. And if you mistrust them, can you really trust them when they tell you that they’ve followed the test cases? And can you trust that they’ve done a good job in terms of the things that the test cases don’t mention?”

“Wait… what things?” asked “the manager” with a confused expression on her face. Frieda played the role well.

“Invisible things. Unwritten things. Most of the written test cases I’ve seen refer only to conditions or factors that can be observed or manipulated; behaviours that can be described or encoded in strings or sentences or numbers or bits. It seems to me that a test case rarely includes the motivation for the test; the intention for it; how to interpret the steps. Test cases don’t usually raise new questions, or encourage testers to look around at the sides of the path.

“Now,” I continued, “some testers deal with that stuff really well. They act on those unspoken, unwritten things as they perform the test. Other testers might follow the test case to the letter — yet not find any bugs. A tester might not even follow the test case at all, and just say that he followed it. Yet that tester might find lots of important bugs.”

“So what am I supposed to do? Watch them every minute of every day?”

“Oh, I don’t think you can do that,” I replied. “Watching everybody all the time isn’t reasonable and it isn’t sustainable. You’ve got plenty of important stuff to do, and besides, if you were watching people all the time, they wouldn’t like it any more than you would. As a manager, you must be able to give a fair degree of freedom and responsibility to your testers. You must be able to extend some degree of trust to them.”

“Why should I trust them? They miss lots of bugs!” Frieda seemed to have had a lot of experience with difficult managers.

“Do you know why they miss bugs?” I asked. “Maybe it’s not because they’re ignoring the test cases. Maybe it’s because they’re following them too closely. When you give someone very specific, formalized instructions and insist that they follow them, that’s what they’ll do. They’ll focus on following the instructions, but not on the overarching testing task, which is learning about the product and finding problems in it.”

“So how should I get them to do that?”, asked “the manager”.

“Don’t turn test cases into the mission. Make their mission learning about the product and finding problems in it.”

“But how can I trust them to do that?”

“Well,” I replied, “let’s look at other people who focus on investigation: journalists; scientific researchers; police detectives. Their jobs are to make discoveries. They don’t follow scripted procedures. No one sees that as a problem. They all work under some degree of supervision—journalists report to editors; researchers in a lab report to senior researchers and to managers; detectives report to their superiors. How do those bosses know what their people are doing?”

“I don’t know. I imagine they check in from time to time. They meet? They talk?”

“Yes. And when they do, they describe the work they’ve done, and provide evidence to back up the description.”

“A lot of the testers I work with aren’t very good at that,” said Frieda, suddenly as herself. “I worry sometimes that I’m not good at that.”

“That’s a good thing to be concerned about. As a tester, I would want to focus on that skill; the skill of telling the story of my testing. And as a manager, I’d want to prepare my testers to tell that story, and train them in how to do it any time they’re asked.”

“What would that be like?”, asked Frieda.

“It varies. It depends a lot on tacit knowledge.”

“Huh?”

“Tacit knowledge is what we know that hasn’t been made explicit—told, or written down, or diagrammed, or mapped out, or explained. It’s stuff that’s inside someone’s head; or it’s physical things that people do that have become second nature, like touch typing; or it’s cultural, social—The Way We Do Things Around Here.

“The profile of a debrief after a testing session varies pretty dramatically depending on a bunch of context factors: where we are in the project, how well the tester knows the product, and how well we know each other.

“Let me take you through one debrief. I’ll set the scene: we’re working on a product—a project management system. Karla is an experienced tester who’s been testing the product for a while. We’ve worked together for a long time too, and I know a lot about how she tests. When I debrief her, there’s a lot that goes unsaid, because I trust her to tell me what I need to know without me having to ask her too much. We both summarize. Here’s how the conversation with Karla might play out.”

Me: (scanning the session sheet) The charter was to look at task updates from the management role. Your notes look fine. How did it go?

Karla: Yeah. It’s not in bad shape. It feels okay, and I’m mostly done with it. There’s at least one concurrency problem, though. When a manager tries to reassign a task to another tester, and that task is open because the assigned tester is updating it, the reassignment doesn’t stick. It’s still assigned to the original tester, not the one the manager assigned. Seems to me that would be pretty rare, but it could happen. I logged that, and I talked about it to Ron.

Me: Anything else?

Karla: Given that bug, we might want to do another session on any kind of update. Maybe part of a session. Ron tells me async stuff in Javascript can be a bear. He’s looking into a way of handling the sequence properly, and he should have a fix by the end of the day. I wouldn’t mind using part of a session to script out some test data for that.

Me: Okay. Want to look at that tomorrow, when you look at the reporting module? And anything else I should know?

Karla: I can get to that stuff in the morning. It’d be cool to make sure the programmers aren’t mucking around in the test environment, though. That was 20 minutes of Setup.

Me: Okay, I’ll tell them to stay out.

“And that’s it,” I said.

“That’s it?”, asked Frieda. “I figured a debrief would be longer than that.”

“Oh, it could be,” I replied. “If the tester is inexperienced or new to me; if the test notes have problems; if the product or feature is new or gnarly; or if the tester found lots of bugs or ran into lots of obstacles, the debrief can take a while longer.

“When I want to co-ordinate testing work for a bunch of people, or when I anticipate that someone might want to scrutinize the work, or when I’m in a regulated environment, I might want to be extra-careful and structure the conversation more formally. I might even want to checklist the debriefing.

“No matter what, though, I have a kind of internal checklist. In broad terms, I’ve got three big questions: How’s the product? How do we know? Why should I trust what we know, and what do we need to get a better handle on things?”

“That sounds like four questions,” Frieda smiled. “But it also sounds like the three-part testing story.”

“Right you are. So when I’m asking focused questions, I’d start with the charter:

  • Did you fulfill your charter? Did you cover everything that the charter was intended to cover?
  • If you didn’t fulfill the charter, what aspects of the charter didn’t get done?
  • What else did you do, even if it was outside the scope of the mission?

“What I’m doing here is trying to figure out whether the charter was met as written, or if we need to adjust it to reflect what really happened. After we’ve established that, I’ll ask questions in three areas that overlap to some degree. I won’t necessarily ask them in any particular order, since each answer will affect my choice of the next question.”

“So a debriefing is an exploratory process too!” said Frieda.

“Absolutely!” I grinned. “I’ll tend to start by asking about the product:

  • How’s the product? What is it supposed to do? Does it do that?
  • How do you know it’s supposed to do that?
  • What did you find out or learn? In particular, what problems did you find?

“I’ll ask about the testing:

  • What happened in the course of the session?
  • What did you cover, and how did you cover it?
  • What product factors did you focus on?
  • What quality criteria were you paying the most attention to?
  • If you saw problems, how did you know that they were problems? What were your oracles?
  • Was there anything important from the charter that you didn’t cover?
  • What testing around this charter do you see as important, but has not yet been done?

“Based on things that come up in response to these questions, I’ll probably have some others:

  • What work products did you develop?
  • What evidence do you have to back the story? What makes it credible?
  • Where can people find that evidence? Why, or why not, should we hang on to it?
  • What testing activity should, or could, happen next or in the more distant future?
  • What might be necessary to enable that activity?

“That last question is about practical testability.”

“Geez, that’s a lot of questions,” said Frieda.

“I don’t necessarily ask them all every time. I usually don’t have to. I will go through a lot of them when a tester is new to this style of working, or new to me. In those cases, as a manager, I have to take more responsibility for making sure about what was tested—what we know and what we don’t. Plus these kinds of questions—and the answers—help me to figure out whether the tester is learning to be more self-guided.

“And then I’ve got three more on my list:

  • What factors might have affected the quality of the testing?
  • What got in the way, made things harder, made things slower, made the testing less valuable?
  • What ongoing problems are you having?
Frieda frowned. “A lot of the managers I’ve worked with don’t seem to want to know about the problems. They say stuff like, ‘Don’t come to me with problems; come to me with solutions.’”

I laughed. “Yeah, I’ve dealt with those kinds of managers. I usually don’t want to go to them at all. But when I do, I assure them that I’m really stuck and that I need management help to get unstuck. And I’ve often said this: ‘You probably don’t want to hear about problems; no one really does. But I think it would be worse for everyone if you didn’t know about them.’

“And that leads to one more important question:

  • What did you spend your time doing in this session?”

“Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

“Well,” I replied, “there’s testing, and then there’s other work that happens in the session.”

We’ll talk about that next time.

Breaking the Test Case Addiction (Part 6)

February 5th, 2019

In the last installment, we ended by asking “Once the tester has learned something about the product, how can you focus a tester’s work without over-focusing it?”

I provided some examples in Part 4 of this series. Here’s another: scenario testing. The examples I’ll provide here are based on work done by James Bach and Geordie Keitt several years ago. (I’ve helped several other organizations apply this approach much more recently, but they’re less willing to share details.)

The idea is to use scenarios to guide the tester to explore, experiment, and get experience with the product, acting on ideas about real-world use and about how the product might foreseeably be misused. It’s nice to believe that careful designs, unit testing, BDD, and automated checking will prevent bugs in the product — as they certainly help to do — but to paraphrase Gertrude Stein, experience teaches experience teaches. Pardon my words, but if you want to discover problems that people will encounter in using the product, it might be a good idea to try using the damned product.

The scenario approach that James and Geordie developed uses richer, more elaborate documentation than the one- to three-sentence charters of session-based test management. One goal is to prompt the tester to perform certain kinds of actions to obtain specific kinds of coverage, especially operational coverage. Another goal is to make the tester’s mission more explicit and legible for managers and the rest of the team.

Preparing for scenario testing involves learning about the product using artifacts, conversations, and preliminary forms of test activity (I’ve given examples throughout this series, but especially in Part 1). That work leads into developing and refining the scenarios to cover the product with testing.

Scenarios are typically based around user roles, representing people who might use the product in particular ways. Create at least a handful of them. Identify specifics about them, certainly about the jobs they do and the tasks they perform. You might also want to incorporate personal details about their lives, personalities, temperaments, and conditions under which they might be using the product.

(Some people refer to user roles as “personas”, as the examples below do. A word of caution over a potential namespace clash: what you’ll see below is a relatively lightweight notion of “persona”. Alan Cooper has a different one, which he articulated for design purposes, richer and more elaborate than what you’ll see here. You might seriously consider reading his books in any case, especially About Face (with Reimann, Cronin, and Noessel) and the older The Inmates are Running the Asylum.)

Consider not only a variety of roles, but a variety of experience levels within the roles. People may be new to our product; they may be new to the business domain in which our product is situated; or both. New users may be well or poorly trained, subject to constant scrutiny or not being observed at all. Other users might be expert in past versions of our products, and be irritated or confused by changes we’ve made.

Outline realistic work that people do within their roles. Identify specific tasks that they might want to accomplish, and look for things that might cause problems for them or for people affected by the product. Problems might take the form of harm, loss, or diminished value to some person who matters. Problems might also include feelings like confusion, irritation, frustration, or annoyance.

Remember that use cases or user stories typically omit lots of real-life activity. People are often inattentive, careless, distractable, under pressure. People answer instant messages, look things up on the web, cut and paste stuff between applications. They go outside, ride in elevators, get on airplanes and lose access to the internet; things that we all do every day that we don’t notice. And, very occasionally, they’re actively malicious.

Our product may be a participant in a system, or linked to other products via interfaces or add-ins or APIs. At very least, our product depends on platform elements: the hardware upon which it runs; peripherals to which it might be connected, like networks, printers, or other devices; application frameworks and libraries from outside our organization; frameworks and libraries that we developed in-house, but that are not within the scope of our current project.

Apropos of all this, the design of a set of scenarios includes activity patterns or moves that a tester might make during testing:

  • Assuming the role or persona of a particular user, and performing tasks that the user might reasonably perform.
  • Considering people who are new to the product and/or the domain in which the product operates (testing for problems with ease of learning)
  • Considering people who have substantial experience with the product (testing for problems with ease of use).
  • Deliberately making foreseeable mistakes that a user in a given role might make (testing for problems due to plausible errors).
  • Using lots of functions and features of the product in realistic but increasingly elaborate ways that trigger complex interactions between functions.
  • Working with records, objects, or other data elements to cover their entire lifespan: creating, revising, refining, retrieving, viewing, updating, merging, splitting, deleting, recovering… and thereby…
  • Developing rich, complex sets of data for experimentation over periods longer than single sessions.
  • Simulating turbulence or friction that a user might encounter: interruptions, distractions, obstacles, branching and backtracking, aborting processes in mid-stream, system updates, closing the laptop lid, going through a train tunnel…
  • Working with multiple instances of the product, tools, and/or multiple testers to introduce competition, contention, and conflict in accessing particular data items or resources.
  • Giving the product different peripherals, running it on different hardware and software platforms, connecting it to interacting applications, working in multiple languages (yes, we do that here in Canada).
  • Reproducing behaviours or workflows from comparable or competing products.
  • Considering not only the people using the product, but also the people who interact with them: their customers, clients, network support people, tech support people, or managers.

To put these ideas to work at ProChain (a company that produces project management software), James and Geordie developed a scenario playbook. Let’s look at some examples from it.

The first exhibit is a one-page document that outlines the general protocol for setting up scenario sessions.

PCE Scenario Testing Setup
PCE Scenario Testing General Setup Sheet

This document is an overview that applies to every session. It is designed primarily to give managers and supporting testers a brief overview of the process and how it should be carried out. (A supporting tester is someone who is not a full-time tester, but is performing testing under the guidance and supervision of a responsible tester — an experienced tester, test lead, or a test manager. A responsible tester is expected to have learned and internalized the instructions on this sheet.) There are general notes here for setting up and patterns of activities to be performed during the session.

Testers should be familiar with oracles by which we recognize problems, or should learn about oracles quickly. When this document was developed, there was a list of patterns of consistency with the mnemonic acronym HICCUPP; that’s now FEW HICCUPPS. For any given charter, there may be specific consistency patterns, artifacts, documents, tools, or mechanisms to apply that can help the tester to notice and describe problems.

Here’s an example of a charter for a specific testing mission:

PCE Scenario Testing Example Charter 1

The Theme section outlines the general purpose of the session, as a one- to three- line charter would in session-based test management. The Setup section identifies anything that should be done specifically for this session.

Note that the Activities section offers suggestions that are both specific and open. Openness helps to encourage variation that broadens coverage and helps to keep the tester engaged (“For some tasks…”; “…in some way,…”). The specificity helps to focus coverage (“set the task filter to show at least…”; the list of different ways to update tasks).

The Oracles section identifies specific ways for the tester to look for problems, in addition to more general oracle principles and mechanisms. The Variations section prompts the tester to try ideas that will introduce turbulence, increase stress, or cover more test conditions.
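
To give a feel for the shape of such a sheet without reproducing the original, here is a hypothetical sketch along the same lines; the section structure follows the description above, but the specific features, values, and data are invented for illustration.

THEME
  Update tasks from the manager role using several different methods, and
  look for problems in how updates are recorded and displayed.

SETUP
  Log in with manager-level access rights; open the shared sample project;
  set the task filter to show at least 30 tasks.

ACTIVITIES
  For some tasks, update status directly in the task list; for others, update
  through the task's detail view, or by reassigning the task to another user.
  Confirm in some way that each change shows up wherever the task appears.

ORACLES
  The task list, the task details, and the reports should be consistent with
  one another; apply the FEW HICCUPPS consistency heuristics.

VARIATIONS
  Try updating a task while another user has it open; cancel an update partway
  through; repeat a few updates with a much larger set of tasks than usual.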

A debrief and a review of the tester’s notes after the session helps to make sure that the tester obtained reasonable coverage.

Here’s another example from the same project:

Here the tester is being given a different role, which requires a different set of access rights and a different set of tasks. In the Activities and Variations sections, the tester is encouraged to explore and to put the system into states that cause conflicts and contention for resources.

Creating session sheets like these can be a lot more fun and less tedious than typing out instructions in formal, procedurally scripted test cases. Because they focus on themes and test ideas, rather than specific test conditions, the sheets are more compact and easier to review and maintain. If there are specific functions, conditions, or data values that must be checked, they can be noted directly on the sheet — or kept separately with a reference to them in the sheet.

The sheets provide plenty of guidance to the tester while giving him or her freedom to vary the details during the session. Since the tester has a general mission to investigate the product, but not a script to follow, he or she is also encouraged and empowered to follow up on anything that looks unusual or improper. All this helps to keep the tester engaged, and prevents him or her from being hypnotized by a script full of someone else’s ideas.

You can find more details on the development of the scenarios in the section “PCE Scenario Testing” in the Rapid Software Testing Appendices.

Back in our coaching session, Frieda once again picked up the role of the test-case-fixated manager. “If we don’t give them test cases, then there’s nothing to look at when they’re done? How will we know for sure what the tester has covered?”

It might seem as though a list of test cases with check marks beside them would solve the accountability problem — but would it? If you don’t trust a tester to perform testing without a script, can you really trust him to perform testing with one?

There are lots of ways to record testing work: the tester’s personal notes or SBTM session sheets, check marks and annotations on requirements and other artifacts, application log files, snapshot tools, video recording… Combine these supporting materials with a quick debriefing to make sure that the tester is working in professional way and getting the job done. If the tester is new, or a supporting tester, increase training, personal supervision and feedback until he or she gains your trust. And if you still can’t bring yourself to trust them, you probably shouldn’t have them testing for you at all.

Frieda, still in character, replied “Hmmm… I’d like to know more about debriefing.”

Next time!