Breaking the Test Case Addiction (Part 7)

June 10th, 2019

Throughout this series, we’ve been looking at an an alternative to artifact-based approaches to testing: an activity-based approach.

In the previous post, we looked at a kind of scenario testing, using a one-page sheet to guide a tester through a session of testing. The one-pager replaces explicit, formal, procedure test cases with a theme and a set of test ideas, a set of guidelines, or a checklist. The charter helps to steer the tester to some degree, but the tester maintains agency over her work. She has substantial freedom make her own choices from one moment to the next.

Frieda, my coaching client, anticipated what her managers would say. In our coaching session, she played the part of her boss. “With test cases,” she said, in character, “I can be sure about what has been tested. Without test cases, how will anyone know what the tester has done?”

A key first step in breaking the test case addiction is acknowledging the client’s concern. I started my reply to “the manager” carefully. “There’s certainly a reasonable basis for that question. It’s important for managers and other clients of testing to know what testing has been done, and how the testers have done it. My first step would be to ask them about those things.”

“How would that work?”, asked Frieda, still in her role. “I can’t be talking to them all the time! With test cases, I know that they’ve followed the test cases, at least. How am I supposed to trust them without test cases?”

“It seems to me that if you don’t trust them, that’s a pretty serious problem on its own—one of the first things to address if you’re a manager. And if you mistrust them, can you really trust them when they tell you that they’ve followed the test cases? And can you trust that they’ve done a good job in terms of the things that the test cases don’t mention?”

“Wait… what things?” asked “the manager” with a confused expression on her face. Frieda played the role well.

“Invisible things. Unwritten things. Most of the written test cases I’ve seen refer only to conditions or factors that can be observed or manipulated; behaviours that can be described or encoded in strings or sentences or numbers or bits. It seems to me that a test case rarely includes the motivation for the test; the intention for it; how to interpret the steps. Test cases don’t usually raise new questions, or encourage testers to look around at the sides of the path.

“Now,” I continued, “some testers deal with that stuff really well. They act on those unspoken, unwritten things as they perform the test. Other testers might follow the test case to the letter — yet not find any bugs. A tester might not even follow the test case at all, and just say that he followed it. Yet that tester might find lots of important bugs.”

“So what am I supposed to do? Watch them every minute of every day?”

“Oh, I don’t think you can do that,” I replied. “Watching everybody all the time isn’t reasonable and it isn’t sustainable. You’ve got plenty of important stuff to do, and besides, if you were watching people all the time, they wouldn’t like it any more than you would. As a manager, you must to be able to give a fair degree of freedom and responsibility to your testers. You must be able to extend some degree of trust to them.”

“Why should I trust them? They miss lots of bugs!” Frieda seemed to have had a lot of experience with difficult managers.

“Do you know why they miss bugs?” I asked. “Maybe it’s not because they’re ignoring the test cases. Maybe it’s because they’re following them too closely. When you give someone very specific, formalized instructions and insist that they follow them, that’s what they’ll do They’ll focus on following the instructions, but not on the overarching testing task, which is learning about the product and finding problems in it.”

“So how should I get them to do that?”, asked “the manager”.

“Don’t turn test cases into the mission. Make their mission learning about the product and finding problems in it.”

“But how can I trust them to do that?”

“Well,” I replied, “let’s look at other people who focus on investigation: journalists; scientific researchers; police detectives. Their jobs are to make discoveries. They don’t follow scripted procedures. No one sees that as a problem. They all work under some degree of supervision—journalists report to editors; researchers in a lab report to senior researchers and to managers; detectives report to their superiors. How do those bosses know what their people are doing?”

“I don’t know. I imagine they check in from time to time. They meet? They talk?”

“Yes. And when they do, they describe the work they’ve done, and provide evidence to back up the description.”

“A lot of the testers I work with aren’t very good at that,” said Frieda, suddenly as herself. “I worry sometimes that I’m not good at that.”

“That’s a good thing to be concerned about. As a tester, I would want to focus on that skill; the skill of telling the story of my testing. And as a manager, I’d want to prepare my testers to tell that story, and train them in how to do it any time they’re asked.”

“What would that be like?”, asked Frieda.

“It varies. It depends a lot on tacit knowledge.”

“Huh?”

“Tacit knowledge is what we know that hasn’t been made explicit—told, or written down, or diagrammed, or mapped out, or explained. It’s stuff that’s inside someone’s head; or it’s physical things that people do that has become second nature, like touch typing; or it’s cultural, social—The Way We Do Things Around Here.

“The profile of a debrief after a testing session varies pretty dramatically depending on a bunch of context factors: where we are in the project, how well the tester knows the product, and how well we know each other.

“Let me take you through one debrief. I’ll set the scene: we’re working on a product—a project management system. Karla is an experienced tester who’s been testing the product for a while. We’ve worked together for a long time too, and I know a lot about how she tests. When I debrief her, there’s a lot that goes unsaid, because I trust her to tell me what I need to know without me having to ask her too much. We both summarize. Here’s how the conversation with Karla might play out.”

Me: (scanning the session sheet) The charter was to look at task updates from the management role. Your notes look fine. How did it go?

Karla: Yeah. It’s not in bad shape. It feels okay, and I’m mostly done with it. There’s at least one concurrency problem, though. When a manager tries to reassign a task to another tester, and that task is open because the assigned tester is updating it, the reassignment doesn’t stick. It’s still assigned to the original tester, not the one the manager assigned. Seems to me that would be pretty rare, but it could happen. I logged that, and I talked about it to Ron.

Me: Anything else?

Karla: Given that bug, we might want to do another session on any kind of update. Maybe part of a session. Ron tells me async stuff in Javascript can be a bear. He’s looking into a way of handling the sequence properly, and he should have a fix by the end of the day. I wouldn’t mind using part of a session to script out some test data for that.

Me: Okay. Want to look at that tomorrow, when you look at the reporting module? And anything else I should know?

Karla: I can get to that stuff in the morning. It’d be cool to make sure the programmers aren’t mucking around in the test environment, though. That was 20 minutes of Setup.

Me: Okay, I’ll tell them to stay out.

“And that’s it,” I said.

“That’s it?”, asked Frieda. “I figured a debrief would be longer than that.”

“Oh, it could be,” I replied. “If the tester is inexperienced or new to me; if the test notes have problems; if the product or feature is new or gnarly; or if the tester found lots of bugs or ran into lots of obstacles, the debrief can take a while longer.

When I want to co-ordinate testing work for a bunch of people, or when I anticipate that someone might want to scrutinize the work, or when I’m in a regulated environment, I might want to be extra-careful and structure the conversation more formally. I might even want to checklist the debriefing.

No matter what, though, I have a kind of internal checklist. In broad terms, I’ve got three big questions: How’s the product? How do we know? Why should I trust what we know, and what do we need to get a better handle on things?”

“That sounds like four questions,” Frieda smiled. “But it also sounds like the three-part testing story.”

“Right you are. So when I’m asking focused questions, I’dd start with the charter:

  • Did you fulfill your charter? Did you cover everything that the charter was intended to cover?
  • If you didn’t fulfill the charter, what aspects of the charter didn’t get done?
  • What else did you do, even if it was outside the scope of the mission?

“What I’m doing here is trying to figure out whether the charter was met as written, or if we need to adjust the it to reflect what really happened. After we’ve established that, I’ll ask questions in three areas that overlap to some degree. I won’t necessarily ask them in any particular order, since each answer will affect my choice of the next question.”

“So a debriefing is an exploratory process too!” said Frieda.

“Absolutely!” I grinned. “I’ll tend to start by asking about the product:

  • How’s the product? What is it supposed to do? Does it do that?
  • How do you know it’s supposed to to that?
  • What did you find out or learn? In particular, what problems did you find?

“I’ll ask about the testing:

  • What happened in the course of the session?
  • What did you cover, and how did you cover it it?
  • What product factors did you focus on?
  • What quality criteria were you paying the most attention to?
  • If you saw problems, how did you know that they were problems? What were your oracles?
  • Was there anything important from the charter that you didn’t cover?
  • What testing around this charter do you see as important, but has not yet been done?

“Based on things that come up in response to these questions, I’ll probably have some others:

  • What work products did you develop?
  • What evidence do you have to back the story? What makes it credible?
  • Where can people find that evidence? Why, or why not, should we hang on to it?
  • What testing activity should, or could, happen next or in the more distant future?
  • What might be necessary to enable that activity?

“That last question is about practical testability.”

“Geez, that’s a lot of questions,” said Frieda.

“I don’t necessarily ask them all every time. I usually don’t have to. I will go through a lot of them when a tester is new to this style of working, or new to me. In those cases, as a manager, I have to take more responsibility for making sure about what was tested—what we know and what we don’t. Plus these kinds of questions—and the answers—help me to figure out whether the tester is learning to be more self-guided

“And then I’ve got three more on my list:

  • What factors might have affected the quality of the testing?
  • What got in the way, made things harder, made things slower, made the testing less valuable?
  • What ongoing problems are you having?
  • Frieda frowned. “A lot of the managers I’ve worked with don’t seem to want to know about the problems. They say stuff like, ‘Don’t come to me with problems; come to me with solutions.'”

    I laughed. “Yeah, I’ve dealt with those kinds of managers. I usually don’t want to go them at all. But when I do, I assure them that I’m really stuck and that I need management help to get unstuck. And I’ve often said this: ‘You probably don’t want to hear about problems; no one really does. But I think it would be worse for everyone if you didn’t know about them.’

    “And that leads to one more important question:

    • What did you spend your time doing in this session?”

    “Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

    “Well,” I replied, “there’s testing, and then there’s other work that happens in the session.”

    We’ll talk about that next time.

Breaking the Test Case Addiction (Part 6)

February 5th, 2019

In the last installment, we ended by asking “Once the tester has learned something about the product, how can you focus a tester’s work without over-focusing it?

I provided some examples in Part 4 of this series. Here’s another: scenario testing. The examples I’ll provide here are based on work done by James Bach and Geordie Keitt several years ago. (I’ve helped several other organizations apply this approach much more recently, but they’re less willing to share details.)

The idea is to use scenarios to guide the tester to explore, experiment, and get experience with the product, acting on ideas about real-world use and about how the product might foreseeably be misused. It’s nice to believe that careful designs, unit testing, BDD, and automated checking will prevent bugs in the product — as they certainly help to do — but to paraphrase Gertrude Stein, experience teaches experience teaches. Pardon my words, but if you want to discover problems that people will encounter in using the product, it might help to use the damned product.

The scenario approach that James and Geordie developed uses richer, more elaborate documentation than the one- to three-sentence charters of session-based test management. One goal is to prompt the tester to perform certain kinds of actions to obtain specific kinds of coverage, especially operational coverage. Another goal is to make the tester’s mission more explicit and legible for managers and the rest of the team.

Preparing for scenario testing involves learning about the product using artifacts, conversations, and preliminary forms of test activity (I’ve given examples throughout this series, but especially in Part 1). That work leads into developing and refining the scenarios to cover the product with testing.

Scenarios are typically based around user roles, representing people who might use the product in particular ways. Create at least a handful of them. Identify specifics about them, certainly about the jobs they do and the tasks they perform. You might also want to incorporate personal details about their lives, personalities, temperaments, and conditions under which they might be using the product.

(Some people refer to user roles as “personas”, as the examples below do. A word of caution over a potential namespace clash: what you’ll see below is a relatively lightweight notion of “persona”. Alan Cooper has a different one, which he articulated for design purposes, richer and more elaborate than what you’ll see here. You might seriously consider reading his books in any case, especially About Face (with Reimann, Cronin, and Noessel) and the older The Inmates are Running the Asylum.)

Consider not only a variety of roles, but a variety of experience levels within the roles. People may be new to our product; they may be new to the business domain in which our product is situated; or both. New users may be well or poorly trained, subject to constant scrutiny or not being observed at all. Other users might be expert in past versions of our products, and be irritated or confused by changes we’ve made.

Outline realistic work that people do within their roles. Identify specific tasks that they might want to accomplish, and look for things that might cause problems for them or for people affected by the product. Problems might take the form of harm, loss, or diminished value to some person who matters. Problems might also include feelings like confusion, irritation, frustration, or annoyance.

Remember that use cases or user stories typically omit lots of real-life activity. People are often inattentive, careless, distractable, under pressure. People answer instant messages, look things up on the web, cut and paste stuff between applications. They go outside, ride in elevators, get on airplanes and lose access to the internet; things that we all do every day that we don’t notice. And, very occasionally, they’re actively malicious.

Our product may be a participant in a system, or linked to other products via interfaces or add-ins or APIs. At very least, our product depends on platform elements: the hardware upon which it runs; peripherals to which it might be connected, like networks, printers, or other devices; application frameworks and libraries from outside our organization; frameworks and libraries that we developed in-house, but that are not within the scope of our current project.

Apropos of all this, the design of a set of scenarios includes activity patterns or moves that a tester might make during testing:

  • Assuming the role or persona of a particular user, and performing tasks that the user might reasonably perform.
  • Considering people who are new to the product and/or the domain in which the product operates (testing for problems with ease of learning)
  • Considering people who have substantial experience with the product (testing for problems with ease of use).
  • Deliberately making foreseeable mistakes that a user in a given role might make (testing for problems due to plausible errors).
  • Using lots of functions and features of the product in realistic but increasingly elaborate ways, and that trigger complex interactions between functions.
  • Working with records, objects, or other data elements to cover their entire lifespan: creating, revising, refining, retrieving, viewing, updating, merging, splitting, deleting, recovering… and thereby…
  • Developing rich, complex sets of data for experimentation over periods longer than single sessions.
  • Simulating turbulence or friction that a user might encounter: interruptions, distractions, obstacles, branching and backtracking, aborting processes in mid-stream, system updates, closing the laptop lid, going through a train tunnel…
  • Working with multiple instances of the product, tools, and/or multiple testers to introduce competition, contention, and conflict in accessing particular data items or resources.
  • Giving the product to different peripherals, running it on different hardware and software platforms, connecting it to interacting applications, working in multiple languages (yes, we do that here in Canada).
  • Reproducing behaviours or workflows from comparable or competing products.
  • Considering not only the people using the product, but the people that interact with them; their customers, clients, network support people, tech support people, or managers.

To put these ideas to work at ProChain (a company that produces project management software), James and Geordie developed a scenario playbook.
Let’s look at some examples from it.

The first exhibit is a one-page document that outlines the general protocol for setting up scenario sessions.

PCE Scenario Testing Setup
PCE Scenario Testing General Setup Sheet

This document is an overview that applies to every sessions. It is designed primarily to give managers and supporting testers a brief overview of the process and and how it should be carried out. (A supporting tester is someone who is not a full-time tester, but is performing testing under the guidance and supervision of a responsible tester — an experienced tester, test lead, or a test manager. A responsible tester is expected to have learned and internalized the instructions on this sheet.) There are general notes here for setting up and patterns of activities to be performed during the session.

Testers should be familiar with oracles by which we recognize problems, or should learn about oracles quickly. When this document was developed, there was a list of patterns of consistency with the mnemonic acronym HICCUPP; that’s now FEW HICCUPPS. For any given charter, there may be specific consistency patterns, artifacts, documents, tools, or mechanisms to apply that can help the tester to notice and describe problems.

Here’s an example of a charter for a specific testing mission:

PCE Scenario Testing Example Charter 1
PCE Scenario Testing Example Charter 1

The Theme section outlines the general purpose of the session, as a one- to three- line charter would in session-based test management. The Setup section identifies anything that should be done specifically for this session.

Note that the Activities section offers suggestions that are both specific and open. Openness helps to encourage variation that broadens coverage and helps to keep the tester engaged (“For some tasks…”; “…in some way,…”). The specificity helps to focus coverage (“set the task filter to show at least…”; the list of different ways to update tasks).

The Oracles section identifies specific ways for the tester to look for problems, in addition to more general oracle principles and mechanisms. The Variations section prompts the tester to try ideas that will introduce turbulence, increase stress, or cover more test conditions.

A debrief and a review of the tester’s notes after the session helps to make sure that the tester obtained reasonable coverage.

Here’s another example from the same project:

Here the tester is being given a different role, which requires a different set of access rights and a different set of tasks. In the Activities and Variations section, the tester is encouraged to explore and to put the system into states that cause conflicts and contention for resources.

Creating session sheets like these can be a lot more fun and less tedious than typing out instructions in formally procedurally scripted test cases. Because they focus on themes and test ideas, rather than specific test conditions, the sheets are more compact and easier to review and maintain. If there are specific functions, conditions, or data values that must be checked, they can be noted directly on the sheet — or kept separately with a reference to them in the sheet.

The sheets provide plenty of guidance to the tester while giving him or her freedom to vary the details during the session. Since the tester has a general mission to investigate the product, but not a script to follow, he or she is also encouraged and empowered to follow up on anything that looks unusual or improper. All this helps to keep the tester engaged, and prevents him or her from being hypnotized by a script full of someone else’s ideas.

You can find more details on the development of the scenarios in the section “PCE Scenario Testing” in the Rapid Software Testing Appendices.

Back in our coaching session, Frieda once again picked up the role of the test-case-fixated manager. “If we don’t give them test cases, then there’s nothing to look at when they’re done? How will we know for sure what the tester has covered?”

It might seem as though a list of test cases with check marks beside them would solve the accountability problem — but would it? If you don’t trust a tester to perform testing without a script, can you really trust him to perform testing with one?

There are lots of ways to record testing work: the tester’s personal notes or SBTM session sheets, check marks and annotations on requirements and other artifacts, application log files, snapshot tools, video recording… Combine these supporting materials with a quick debriefing to make sure that the tester is working in professional way and getting the job done. If the tester is new, or a supporting tester, increase training, personal supervision and feedback until he or she gains your trust. And if you still can’t bring yourself to trust them, you probably shouldn’t have them testing for you at all.

Frieda, still in character, replied “Hmmm… I’d like to know more about debriefing.” Next time!

Breaking the Test Case Addiction (Part 5)

January 29th, 2019

In our coaching session (which started here), Frieda was still playing the part of a manager who was fixated on test cases—and doing it very well. She played a typical management card: “What about learning about the product? Aren’t test cases a good way to do that?”

In Rapid Software Testing, we say that testing is evaluating a product by learning about it through exploration and experimentation, which includes questioning, modeling, studying, manipulating, making inferences, etc. So learning is an essential part of testing. There are lots of artifacts and people that testers could interact with to start learning about the product, which I’ve discussed already. Let’s look at why making a tester work through test cases might not be such a good approach.

Though test cases are touted as a means of learning about the product, my personal experience is that they’re not very helpful at all for that purpose. Have you ever driven somewhere, being guided by a list of instructions from Google Maps, synthesized speech from a navigation system, or even spoken instructions from another person? My experience is that having someone else direct my actions disconnects me from wayfinding and sensemaking. When I get to my destination, I’m not sure how I got there, and I’m not sure I could find my way back.

If I want to learn something and have it stick, a significant part of my learning must be self-guided. From time to time, I must make sense of where I’ve been, where I am, and where I’m going. I must experience some degree of confusion and little obstacles along the way. I must notice things that are interesting and important to me that I can connect to the journey. I must have the freedom to make and correct little mistakes.

Following detailed instructions might aid in accomplishing certain kinds of tasks efficiently. However, following instructions can get in the way of learning something, and the primary mission of testing is to learn about the product and its status.

You could change the assignment by challenging the tester to walk through a set of test cases to find problems in them, or to try to divine the motivation for them, and that may generate some useful insights.

But if you really want testers to learn about the product, here’s how I’d do it: give them a mission to learn about the product. Today we’ll look at instances of learning missions that you can apply early in the tester’s engagement or your own. Such missions tend to be broad and open, and less targeted towards specific risks and problems than they might be later. I’ll provide a few examples, with comments after each one.

“Interview the product manager about the new feature. Identify three to six user roles, and (in addition to your other notes) create sketches or whiteboard diagrams of some common instances of how they might use the feature. In your conversation, raise and discuss the possibility of obstacles or interruptions that might impede the workflow. Take notes and photos.”

As the principles of context-driven testing note, the product is a solution. If the problem isn’t solved, the product doesn’t work. When the product poses new problems, it might not be working either from the customer’s perspective.

“Attend the planning session for the new feature. Ask for descriptions of what we’re building; who we’re building it for; what kind of problems they might experience; and how we would recognize them as problems. Raise questions periodically about testability. Take minutes of the discussions in the meeting.”

Planning meetings tend to be focused on envisioning success; on intention. Those meetings present opportunities to talk anticipating failure; on how we or the customer might not achieve our goals, or might encounter problems. Planning a product involves planning ways of noticing how it might go wrong, too.

“Perform a walkthrough of this component’s functionality with a developer or a senior tester. Gather instances of functions in the product, or data that it processes, that might represent exceptions or extremes. Collect sets of ideas for test conditions that might trigger extreme or exceptional behaviour, or that might put the product in an unstable state. Create a risk list, with particular focus on threats to capability, reliability, and data integrity that might lead to functional errors or data loss.”

In Rapid Software Testing parlance, a test condition is something that can be examined during a test, or something that might change the outcome of a test. It seems to me that when people use formalized procedural test cases, often their intention is to examine particular test conditions. However, those conditions can be collected and examined using many different kinds of artifacts: tables, lists, annotated diagrams or flowcharts, mind maps…

“Review the specification for the product with the writer of the user manual. In addition to any notes or diagrams that you keep, code the contents of the specification. (Note: “code” is used here in the sense used in qualitative research; not in the sense of writing computer code.) That is, for each numbered paragraph, try to identify at least one and up to three quality criteria that are explicitly or implicitly mentioned. Collate the results and look for quality criteria that are barely mentioned or missing altogether, and be on the lookout for mysterious silences.”

There’s a common misconception about testing: that testers look for inconsistencies between the product and a description of the product, and that’s all. But excellent testers look at the product, at descriptions of the product, and at intentions for the product, and seek inconsistencies between all of those things. Many of our intentions are tacit, not explicit. Note also that the designer’s model of the user’s task may be significantly different from the user’s model.

Notice that each example above includes an information mission. Each one includes a mandate to produce specific, reviewable artifacts, so that the tester’s learning can be evaluated with conversation and documented evidence. Debriefing and relating learning to others is an important part of testing in general, and session-based test management in particular.

Each example also involves collaboration with other people on the team, so that inconsistencies between perspectives can be identified and discussed. And notice: these are examples. They are not templates to be followed. It’s important that you develop your own missions, suited to the context in which you’re working.

At early stages of the tester’s engagement, finding problems is not the focus. Learning is. Nonetheless, as one beneficial side effect, the learning may reveal some errors or inconsistencies before they can turn into bugs in the product. As another benefit, testers and teams can collect ideas for product and project risk lists. Finally, the learning might reveal test conditions that can usefully be checked with tools, or that might be important to verify via explicit procedures.

Back to the coaching session. “Sometimes managers say that it’s important to give testers explicit instructions when we’re dealing with an offshore team whose first language is not English”, said Frieda.

Would test cases really make that problem go away? Presumably the test cases and the product would be written in English too. If the testers don’t understand English well, then they’ll scarcely be able to read the test cases well, or to comprehend the requirements or the standards, or to understand what the product is trying to tell them through its (presumably also English) user interface.

Maybe the product and the surrounding artifacts are translated from English into the testers’ native language. That addresses one kind of problem, but introduces a new one: requirements and specifications and designs and jargon routinely get misinterpreted even when everyone is working in English. When that material is translated, some meaning is inevitably changed or lost in translation. All of these problems will need attention and management.

If a product does something important, presumably there’s a risk of important problems, many of which will be unanticipated by test cases. Wouldn’t it be a good idea to have skilled testers learn the product reasonably rapidly but also deeply to prepare them to seek and recognize problems that matter?

When testers are up and running on a project, there are several approaches towards focusing their work without over-focusing it. I’ve mentioned a few already. We’ll look at another one of those next.

Breaking the Test Case Addiction (Part 4)

January 21st, 2019

Note: this post is long from the perspective of the kitten-like attention spans that modern social media tends to encourage. Fear not. Reading it could help you to recognize how you might save you hours, weeks, months of excess and unnecessary work, especially if you’re working as a tester or manager in a regulated environment.

Testers frequently face problems associated with excessive emphasis on formal, procedurally scripted testing. Politics, bureaucracy, and paperwork combine with fixation on test cases. Project managers and internal auditors mandate test cases structured and written in a certain form “because FDA”. When someone tells you this, it’s a pretty good indication that they haven’t read the FDA’s guidance documentation.

Because here’s what it really says:

For each of the software life cycle activities, there are certain “typical” tasks that support a conclusion that the software is validated. However, the specific tasks to be performed, their order of performance, and the iteration and timing of their performance will be dictated by the specific software life cycle model that is selected and the safety risk associated with the software application. For very low risk applications, certain tasks may not be needed at all. However, the software developer should at least consider each of these tasks and should define and document which tasks are or are not appropriate for their specific application. The following discussion is generic and is not intended to prescribe any particular software life cycle model or any particular order in which tasks are to be performed.

General Principles of Software Validation;
Final Guidance for Industry and FDA Staff, 2002

The General Principles of Software Validation document is to some degree impressive for its time, 2002. It describes some important realities. Software problems are mostly due to design and development, far less to building and reproduction. Even trivial programs are complex. Testing can’t find all the problems in a product. Software doesn’t wear out like physical things do, and so problems often manifest without warning. Little changes can have big, wide-ranging, and unanticipated effects. Using standard and well-tested software components addresses one kind of risk, but integrating those components requires careful attention.

There are lots of problems with General Principles of Software Validation document, too. I’ll address several of these, I hope, in future posts.

Apropos of the present discussion, the document doesn’t describe what a test case is, nor how it should be documented. By my count, the document mentions “test case” or “test cases” 30 times. Here’s one instance:

“Test plans and test cases should be created as early in the software development process as feasible.”

Here are two more:

“A software product should be challenged with test cases based on its internal structure and with test cases based on its external specification.”

If you choose to interpret “test case” as an artifact, and consider that challenge sufficient, this would be pretty terrible advice. It would be analogous to saying that children should be fed with recipes, or that buildings should be constructed with blueprints. A shallow reading could suggest that the artifact and the performance guided by that artifact are the same thing; that you prepare the recipe before you find out what the kids can and can’t eat, and what’s in the fridge; that you evaluate the building by comparing it to the blueprints and then you’re done.

On the other hand, if you substitute “test cases” with “tests” or “testing”, it’s pretty great advice. It’s a really good idea to challenge a software product with tests, with testing, based on internal and external perspectives.

The FDA does not define “test case” in the guidance documentation. A definition does appear in Glossary of Computer System Software Development Terminology (8/95).

test case. (IEEE) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item. Syn: test case specification. See: test procedure

Okay, let’s see “test procedure”:

test procedure (NIST) A formal document developed from a test plan that presents detailed instructions for the setup, operation, and evaluation of the results for each defined test. See: test case.

So it is pretty terrible advice after all.

(Does that “8/95” refer to August 1995? Yes, it does. None of the source documents for the  Glossary of Computer System Software Development Terminology (8/95) is dated after 1994. For some perspective, that’s before Windows 95; before Google; before smartphones and tablets; before the Manifesto for Agile Software Development; before the principles of context-driven testing…)

But happily, in Section 2 of General Principles of Software Validation, before any of the guidance on testing itself, is the Principle of the Least Burdensome Approach:

We believe we should consider the least burdensome approach in all areas of medical device regulation. This guidance reflects our careful review of the relevant scientific and legal requirements and what we believe is the least burdensome way for you to comply with those requirements. However, if you believe that an alternative approach would be less burdensome, please contact us so we can consider your point of view.

The “careful review” happened in the period leading up to 2002, which is the publication date of this guidance document. In testing community of those days, anything other than ponderously scripted procedural test cases were viewed with great suspicion in writing and conference talks. Thanks to work led by Cem Kaner, James Bach, and other prominent voices in the testing community, the world is now a safer place for exploration in testing. And, as noted in the previous post in this series, the FDA itself has acknowledged the significance and importance of exploratory work.

Test documentation may take many forms more efficient and effective than formally scripted procedures, and the Least Burdensome Approach appears to allow a lot of leeway as long as evidence is sufficient and the actual regulations are followed. (For those playing along at home, the regulations include Title 21 Code of Federal Regulations (CFR) Part 11.10 and 820, and 61 Federal Register (FR) 52602.)

Several years ago, James Bach began some consulting work with a company that made medical devices. They had hired him to analyze, report on, and contribute to the testing work being done for a particular Class III device. (I have also done some work for this company.)

The device consisted of a Control Box, operated by a technician. The Control Box was connected to a Zapper Box that delivered Healing Energy to the patient’s body. (We’ve modified some of the specific words and language here to protect confidentiality and to summarize what the devices do.) Insufficient Healing Energy is just Energy. Too much Healing Energy, or the right amount for too long, turns into Hurting Energy or Killing Energy.

When James arrived, he examined the documentation being given to testers. He found more than a hundred pages of stuff like this:

9.8.1 To verify Power Accuracy

9.8.1.1Connect the components according to the General Setup document.
9.8.1.2Power on and connect Power Monitor (instead of electrodes).
9.8.1.3Power on the Zapper Box.
9.8.1.4Power on the Control Box.
9.8.1.5Set default settings of temperature and power for zapping.
9.8.1.6Set test jig load to nominal value.
9.8.1.7Select nominal duration and nominal power setting.
9.8.1.8Press the Start button.
9.8.1.9Verify Zapper reports the power setting value ±10% on display.

Is this good formal testing?

It’s certainly a formal procedure to follow, but where’s the testing part? The closest thing is that little molecule of actual testing in the last line: the tester is instructed to apply an oracle by comparing the power setting on the Control Box with what the Zapper reports on its display. There’s nothing to suggest examining the actual power being delivered by noting the results from the Power Monitor. There’s nothing about inducing variation to obtain and extend coverage, either.

At one point, James and another tester defrosted this procedure. They tried turning on the Control Box first, and then waited for a variety of intervals to turn on the Zapper Box. To their amazement, the Zapper Box could end up in one of four different states, depending on how long they waited to start it—and at least a couple of those states were potentially dangerous to the patient or to the operator.

James replaced 50 pages of this kind of stuff with two paragraphs containing things that had not been covered previously. He started by describing the test protocol:

3.1 General testing protocol

In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items a tester shall, at all times, be alert for any unexplained or erroneous behavior of the product. The tester shall bear in mind that, regardless of any specific requirements for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including any unacceptable risks due to reasonably foreseeable misuse.

Read that paragraph carefully, sentence by sentence, phrase by phrase. Notice the emphasis on looking for problems and risks—especially on the risk of human error.

Then he described the qualifications necessary for testers to work on this product:

3.2 Test personnel requirements

The tester shall be thoroughly familiar with the Zapper Box and Control Box Functional Requirements Specification, as well as with the working principles of the devices themselves. The tester shall also know the working principles of the Power Monitor Box test tool and associated software, including how to configure and calibrate it, and how to recognize if it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing, in the event that anomalies appear that require investigation. The tester shall know how to keep test records to credible, professional standard.

In summary: Be a scientist. Know the domain, know the tools, be an analyst, be an investigator, keep good lab notes.

Then James provided some concise test ideas, leaving plenty of room for variation designed to shake out bugs. Here’s an example like something from the real thing:

3.2.2 Fields and Screens

3.2.2.1With the Power Monitor test tool already running, start the Zapper Box and the Control Box. Vary the order and timing in which you start them, retain the Control Box and Power Monitor log files, and note any inconsistent or unexpected behaviour.
3.2.2.2Visually inspect the displays and verify conformance to the requirements and for the presence of any behaviour or attribute that could impair the performance or safety of the product in any material way.
3.2.2.3With the system settings at default values change the contents of every user-editable field through the range of all possible values for that field. (e.g. Use the knob to change the session duration from 1 to 300 seconds.) Visually verify that appropriate values appear and that everything that happens on the screen appears normal and acceptable.
3.2.2.4Repeat 3.2.2.3 with system settings changed to their most extreme possible values.
3.2.2.5Select at least one field and use the on-screen keyboard, knob, and external keyboard respectively to edit that field.
3.2.2.6Scan the Control Box and Power Monitor log files for any recorded error conditions or anomalies.

To examine certain aspects of the product and its behaviour, sometimes very specific test design matters. Here’s a representative snippet based on James’ test documentation:

3.5.2 Single Treatment Session Power Accuracy Measurement

3.5.2.3From the Power Monitor log file, extract the data for the measured electrode. This sample should comprise the entire power session, including cooldown, as well as the stable power period with at least 50 measurements (i.e., taken at least five times per second over 10 seconds of stable period data).
3.5.2.4From the Control Box log file, extract the corresponding data for the stable power period of the measured electrode.
3.5.2.5Calculate the deviation by subtracting the reported power for the measured electrode from the corresponding Power Monitor reading (use interpolation to synchronize the time stamp of the power meter and generation logs).
3.5.2.6Calculate the mean of the power sample X (bar) and its standard deviation (s).
3.5.2.7Find the 99% confidence and 99% two-sided tolerance interval k for the sample. (Use Table 5 of SOP-QAD-10, or use the equation below for large samples.)
3.5.2.8The equation for calculating the tolerance interval k is:
Zapper Formula
where χ2γ,N-1 is the critical value of the chi-square distribution with degrees of freedom N -1 that is exceeded with probability γ; and Z2(1-p)/2 is the critical value of the normal distribution which is exceeded with probability (1-p)/2. (See NIST Engineering Statistics Handbook.)

Now, that’s some real formal testing. And it was accepted just fine by the organization and the FDA auditors. Better yet, and following this protocol revealed some surprising behaviours that prompted more careful evaluation of the requirements for the product.

What are some lessons we could learn from this? One key point, it seems to me, is that when you’re working as a tester in a regulated environment, it’s crucial that you read the regulations and the guidance documentation. If you don’t, you run the risk of being pushed around by people who haven’t read them, and who are working on the basis of mythology and folklore.

Our over-arching mission as testers is to seek and find problems that threaten the value of the product. In contexts where human life, health, or safety are on the line, the primary job at hand is to learn about the product and problems that post risks and hazards to people. Excessive bureaucracy and paperwork can distract us from that mission; even displace it. Therefore, we must find ways to do the best testing possible, while still providing the best and least evidence that still completely satisfies auditors and regulators that we’ve done it.

Back in our coaching session, Frieda, acting the part of the manager, replied, “But… we don’t have the time to train testers to do that kind of stuff. We need them to be up to speed ASAP.”

“What does ‘up to speed’ actually mean?” I asked.

Frieda, still in character, replied “We want them to be banging on keys as quickly as possible.”

Uh huh. Imagine a development manager responsible for a medical device saying, “We don’t have time for the developers to learn what they’re developing. We want them up to speed as quickly as possible. (And, as we all know, programming is really just banging on keys.)”

The error in this line of thinking is that testing is about pushing buttons; producing widgets on a production line; flipping testburgers. If you treat testing as flipping testburgers, then there’s a risk that testers will flip whatever vaguely burger-shaped thing comes their way… burgers, frisbees, cow pies, hockey pucks… You may not get the burger you want.

If you think of testing as an investigation of the product, testers must be investigators, and skillful ones at that. Upon engaging with the product and the project, testers set to learning about the product they’re investigating and the domain in which it operates. Testers keep excellent lab notes and document their work carefully, but not to the degree that documentation displaces the goal of testing the system and finding problems in it. Testers are focused on risk, and trained to be aware of problems that they might encounter as they’re testing (per CFR Title 21 Part 820.25 (b)(2)) .

If they’re not sufficiently skilled when you hire them, you’ll supervise and train them until they are. And if they’re unskilled and can’t be trained… are you really sure you want them testing a device that could deliver Killing Energy?

How else might you guide testing work, whether in projects in regulated contexts or not? That’s a topic for next time.

Breaking the Test Case Addiction (Part 3)

January 17th, 2019

In the previous post, “Frieda”, my coaching client, asked about producing test cases for auditors or regulators. In Rapid Software Testing (RST), we find it helpful to frame that in terms of formal testing.

Testing is formal to the degree that it must be done in a specific way, or to verify specific facts. Formal testing typically has the goal of confirming or demonstrating something in particular about the product. There’s a continuum to testing formality in RST. My version, a tiny bit different from James Bach‘s, looks like this:

Some terminology notes: checking is the process of operating and observing a product; applying decision rules to those observations; and then reporting on the outcome of those rules; all mechanistically, algorithmically. A check can be turned into a formally scripted process that can be performed by a human or by a machine.

Procedurally scripted test cases are instances of human checking, where the tester is being substantially guided by what the script tells her to do. Since people are not machines and don’t stick to the algorithms, people are not checking in the strictest sense of our parlance.

A human transceiver is someone doing things based only on the instructions of some other person, behaving as that person’s eyes, ears, and hands.

Machine checking is the most formal mode of testing, in that machines perform checks in entirely specific ways, according to a program, entirely focused on specific facts. The motivation to check doesn’t come from the machine, but from some person. Notice that programs are formal, but programming is an informal activity. Toolsmiths and people who develop automated checks are not following scripts themselves.

The degree to which you formalize is a choice, based on a number of context factors. Your context guides your choices, and both of those evolve over time.

One of the most important context factors is your mission. You might be in a regulated environment, where regulators and auditors will eventually want you to demonstrate specific things about the product and the project in a highly formal way. If you are in that context, keeping the the auditors and the regulators happy may require certain kinds of formal testing. Nonetheless, even in that context, you must perform informal testing—lots of it—for at least two big reasons.

The first big reason is to learn the about the product and its context to prepare for excellent formal testing that will stand up to the regulators’ scrutiny. This is tied to another context factor: where you are in the life of the project and your understanding of the product.

Formal testing starts with informal work that is more exploratory and tacit, with the goal of learning; less scripted and explicit, with the goal of demonstrating. All the way along, but especially in between those poles, we’re searching for problems. No less than the Food and Drug Administration emphasizes how important this is.

Thorough and complete evaluation of the device during the exploratory stage results in a better understanding of the device and how it is expected to perform. This understanding can help to confirm that the intended use of the device will be aligned with sponsor expectations. It also can help with the selection of an appropriate pivotal study design.

Section 5: The Importance of Exploratory Studies in Pivotal Study Design
Design Considerations for Pivotal Clinical Investigations for Medical Devices
Guidance for Industry, Clinical Investigators, Institutional Review Boards
and Food and Drug Administration Staff

The pivotal stage of device development, says the FDA, focuses on developing what people need to know to evaluate the safety and effectiveness of a product. The pivotal stage usually consists of one or more pivotal studies. In other words, the FDA acknowledges that development happens in loops and cycles; that development is an iterative process.

James Bach emphasized this in his talk The Dirty Secret of Formal Testing and it’s an important point in RST. Development is an iterative process because at the beginning of any cycle of work, we don’t know for sure what all the requirements are; what they mean; what we can get; and how we might decide that we’ve got it. We don’t really know that until we’ve until we’ve tested the product… and we don’t know how to test the product until we’ve tried to test the product!

Just like developing automated checks, developing formally scripted test cases is an informal process. You don’t follow a script when you’re interpreting a specification; when you’re having a conversation with a developer or a designer; when you’re exploring the product and the test space to figure out where checking might be useful or important. You don’t follow a script when you recognize a new way of using tools to learn something about the product, and apply them. And you don’t follow a script when you investigate bugs that you’ve found—either during informal testing or the formal testing that might follow it.

If you try to develop formal procedural test cases without testing the actual product, they stand a good chance of being out of sync with it. The dirty secret of format testing is that all good formal testing begins with informal testing.

It might be a very good idea for programmers to develop some automated checks that helps them with the discipline of building clean code and getting rapid feedback on it. It’s also a good idea for developers, designers, testers, and business people to develop clear ideas about intentions for a product, envisioning success. It might also be a good idea to develop some automated checks above the unit level and apply them to the build process—but not too many and certainly not too early. The beginning of the work is usually a terrible time for excessive formalization.

Which brings us to the second big reason to perform informal testing continuously throughout any project: to address the risk that our formal testing to date will fail to reveal how the product might disappoint customers; lose someone’s money; blow something up; or hurt or kill people. We must be open to discovery, and to performing the testing and investigation that supports it, all the way throughout the project, because neither epiphanies nor bugs follow scripts or schedules.

The overarching mission of testing is focused on a question: “are there problems that threaten the value of the product, or the on-time, successful completion of our work?” That’s not a question that formal testing can ever answer on its own. Fixation on automated checks or test cases runs the risk of displacing time for experimentation, exploration, discovery, and learning.

Next time, we’ll look at an example of breaking test case addiction on a real medical device project. Stay tuned.

Breaking the Test Case Addiction (Part 2)

January 16th, 2019

Last time out, I was responding to a coaching client, a tester who was working in an organization fixated on test cases. Here, I’ll call her Frieda. She had some more questions about how to respond to her managers.

What if they want another tester to do your tests if you are not available?

“‘Your tests’, or ‘your testing’?”, I asked.

From what I’ve heard, your tests. I don’t agree with this but trying to see it from their point of view, said Frieda.

I wonder what would happen if we asked them “What happens when you want another manager to do your managing if you are not available?” Or “What happens when you want another programmer to do programming if the programmer is not available?” It seems to me that the last thing they would suggest would be a set of management cases, or programming cases. So why the fixation on test cases?

Fixation is excessive, obsessive focus on something to the exclusion of all else. Fixation on test cases displaces people’s attention from other important things: understanding of how the testing maps to the mission; whether the testers have sufficient skill to understand and perform the testing; the learning comes from testing and that feeds back into more testing; whether formalization is premature or even necessary…

A big problem, as I suggested last time, is a lack of managers’ awareness of alternatives to test cases. That lack of awareness feeds into a lack of imagination, and then loops back into a lack of awareness. What’s worse is that many testers suffer from the same problem, and therefore can’t help to break the loop. Why do managers keep asking for test cases? Because testers keep providing them. Why do testers keep providing them? Because managers keep asking for them, because testers keep providing them…, and the cycle continues.

That cycle also continues because there’s an attractive, even seductive, aspect to test cases: they can make testing appear legible. Legibility, as Venkatesh Rao puts it beautifully here, “quells the anxieties evoked by apparent chaos”.

Test cases help to make the messy, complex, volatile landscape of development and testing seem legible, readable, comprehensible, quantifiable. A test case either fails (problem!) or passes (no problem!). A test case makes the tester’s behaviours seem predictable and clear, so clear that the tester could even be replaced by a machine. At the beginning of the project, we develop 782 test cases. When we’ve completed 527 of them, the testing is 67.39% done!

Many people see testing as rote, step-by-step, repetitive, mechanical keypressing to demonstrate that the product can work. That gets emphasized by the domain we’re in: one that values the writing of programs. If you think keypressing is all there is to it, it makes a certain kind of sense to write programs for a human to follow so that you can control the testing.

Those programs become “your tests”. We would call those “your checks—where checking is the mechanistic process of applying decision rules to observations of the software.

On the other hand, if you are willing to recognize and accept testing as a complex, cognitive investigation of products, problems, and risks, your testing is a performance. No one else can do just as you do it. No one can do again just what you’ve done before. You yourself will never do it the same way twice. If managers want people to do “your testing” when you’re not available, it might be more practical and powerful to think of it as “performing their investigation on something you’ve been investigating”.

Investigation is structured and can be guided, but good investigation can’t be scripted. That’s because in the course of a real investigation, you can’t be sure of what you’re going to find and how you’re going to respond to it. Checking can be algorithmic; the testing that surrounds and contains checking cannot.

Investigation can be influenced or guided by plenty of things that are alternatives to test cases:

Last time out, I mentioned almost all of these as things that testers could develop while learning about the product or feature. That’s not a coincidence. Testing happens in tangled loops and spirals of learning, analysis, exploration, experimentation, discovery, and investigation, all feeding back into each other. As testing proceeds, these artifacts and—more importantly—the learning they represent can be further developed, expanded, refined, overproduced, put aside, abandoned, recovered, revisited…

Testers can use artifacts of these kinds as evidence of testing that has been done, problems that have been found, and learning that has happened. Testers can include these artifacts in test reports, too.

But what if you’re in an environment where you have to produce test cases for auditors or regulators?

Good question. We’ll talk about that next time.

Breaking the Test Case Addiction (Part 1)

January 15th, 2019

Recently, during a coaching session, a tester was wrestling with something that was a mystery to her. She asked:

Why do some tech leaders (for example, CTOs, development managers, test managers, and test leads) jump straight to test cases when they want to provide traceability, share testing efforts with stakeholders, and share feature knowledge with testers?

I’m not sure. I fear that most of the time, fixation on test cases is simply due to ignorance. Many people literally don’t know any other way to think about testing, and have never bothered to try. Alarmingly, that seems to apply not only to leaders, but to testers, too. Much of the business of testing seems to limp along on mythology, folklore, and inertia.

Testing, as we’ve pointed out (many times), is not test cases; testing is a performance. Testing, as we’ve pointed out, is the process of learning about a product through exploration and experimentation, which includes to some degree questioning, studying, modeling, observation, inference, etc. You don’t need test cases for that.

The obsession with procedurally scripted test cases is painful to see, because a mandate to follow a script removes agency, turning the tester into a robot instead of an investigator. Overly formalized procedures run a serious risk of over-focusing testing and testers alike. As James Bach has said, “testing shouldn’t be too focused… unless you want to miss lots of bugs.”

There may be specific conditions, elements of the product, notions of quality, interactions with other products, that we’d like to examine during a test, or that might change the outcome of a test. Keeping track of these could be very important. Is a procedurally scripted test case the only way to keep track? The only way to guide the testing? The best way? A good way, even?

Let’s look at alternatives for addressing the leaders’ desires (traceability, shared knowledge of testing effort, shared feature knowledge).

Traceability. It seems to me that the usual goal of traceability is be able to narrate and justify your testing by connecting test cases to requirements. From a positive perspective, it’s a good thing to make those connections to make sure that the tester isn’t wasting time on unimportant stuff.

On the other hand, testing isn’t only about confirming that the product is consistent with the requirements documents. Testing is about finding problems that matter to people. Among other things, that requires us to learn about things that the requirements documents get wrong or don’t discuss at all. If the requirements documents are incorrect or silent on a given point, “traceable” test cases won’t reveal problems reliably.

For that reason, we’ve proposed a more powerful alternative to traceability: test framing, which is the process of establishing and describing the logical connections between the outcome of the test at the bottom and the overarching mission of testing at the top.

Requirements documents and test cases may or may not appear in the chain of connections. That’s okay, as long as the tester is able to link the test with the testing mission explicitly. In a reasonable working environment, much of the time, the framing will be tacit. If you don’t believe that, pause for a moment and note how often test cases provide a set of instructions for the tester to follow, but don’t describe the motivation for the test, or the risk that informs it.

Some testers may not have sufficient skill to describe their test framing. If that’s so, giving test cases to those testers papers over that problem in an unhelpful and unsustainable way. A much better way to address the problem would, I believe, would be to train and supervise the testers to be powerful, independent, reliable agents, with freedom to design their work and responsibility to negotiate it and account for it.

Sharing efforts with stakeholders. One key responsibility for a tester is to describe the testing work. Again, using procedurally scripted test cases seems to be a peculiar and limited means for describing what a tester does. The most important things that testers do happen inside their heads: modeling the product, studying it, observing it, making conjectures about it, analyzing risk, designing experiments… A collection of test cases, and an assertion that someone has completed them, don’t represent the thinking part of testing very well.

A test case doesn’t tell people much about your modeling and evaluation of risk. A suite of test cases doesn’t either, and typical test cases certainly don’t do so efficiently. A conversation, a list, an outline, a mind map, or a report would tend to be more fitting ways of talking about your risk models, or the processes by which you developed them.

Perhaps the worst aspect of using test cases to describe effort is that tests—performances of testing activity—become reified, turned into things, widgets, testburgers. Effort becomes recast in terms of counting test cases, which leads to no end of mischief.

If you want people to know what you’ve done, record and report on what you’ve done. Tell the testing story, which is not only about the status of the product, but also about how you performed the work, and what made it more and less valuable; harder or easier; slower or faster.

Sharing feature knowledge with testers. There are lots of ways for testers to learn about the product, and almost all of them would foster learning better than procedurally scripted test cases. Giving a tester a script tends to focus the tester on following the script, rather than learning about the product, how people might value it, and how value might be threatened.

If you want a tester to learn about a product (or feature) quickly, provide the tester with something to examine or interact with, and give the tester a mission. Try putting the tester in front of

  • the product to be tested (if that’s available)
  • an old version of the product (while you’re waiting for a newer one)
  • a prototype of the product (if there is one)
  • a comparable or competitive product or feature (if there is one)
  • a specification to be analyzed (or compared with the product, if it’s available)
  • a requirements document to be studied
  • a standard to review
  • a user story to be expanded upon
  • a tutorial to walk through
  • a user manual to digest
  • a diagram to be interpreted
  • a product manager to be interviewed
  • another tester to pair with
  • a domain expert to outline a business process

Give the tester the mission to learn something based on one or more of these things. Require the tester to take notes, and then to provide some additional evidence of what he or she learned.

(What if none of the listed items is available? If none of that is available, is any development work going on at all? If so, what is guiding the developers? Hint: it won’t be development cases!)

Perhaps some people are concerned not that there’s too little information, but too much. A corresponding worry might be that the available information is inconsistent. When important information about the product is missing, or unclear, or inconsistent, that’s a test result with important information about the project. Bugs breed in those omissions or inconsistencies.

What could be used as evidence that the tester learned something? Supplemented by the tester’s notes, the tester could

  • have a conversation with a test lead or test manager
  • provide a report on the activities the tester performed, and what the tester learned (that is, a test report)
  • produce a description of the product or feature, bugs and all (see The Honest Manual Writer Heuristic)
  • offer proposed revisions, expansions, or refinements of any of the artifacts listed above
  • identify a list of problems about the product that the tester encountered
  • develop a list of ways in which testers might identify inconsistencies between the product and something desirable (that is, a list of useful oracles)
  • report on a list of problems that the tester had in fulfilling the information mission
  • in a mind map, outline a set of ideas about how the tester might learn more about the product (that is, a test strategy)
  • list out a set of ideas about potential problems in the product (that is, a risk list)
  • develop a set of ideas about where to look for problems in product (that is, a product coverage outline)

Then review the tester’s work. Provide feedback, coaching and mentoring. Offer praise where the tester has learned something well; course correction where the tester hasn’t. Testers will get a lot more from this interactive process than from following step-by-step instructions in a test case.

My coaching client had some more questions about test cases. We’ll get to those next time.

A Moment of Jerry Weinberg Zen

January 10th, 2019

The year was 2006. James Bach and I were running a workshop at the Amplifying Your Effectiveness conference (AYE). We were in one of those large-ish, high-ceiling conference rooms with about 15 programmers and software consultants.

We were showing them one of James Lyndsay’s wonderful testing machines. (You can find it here, but you’ll need Flash active to run it.) It looked like this:

James Lyndsday's Machine 1

At first, it’s all very confusing. When you press the buttons on the left, the red and blue balls on the right move in some way. The slider in the middle influences the range of motion somehow. In general, the mission of the exercise is to describe the behaviour of the machine.

Test cases are not testing. To illustrate this important fact, we give class participants the machines to investigate for a few moments, and then ask a question that James (Bach) asked in our AYE session.

“How many test cases,” he asked, “would you need to be able to understand and describe this product completely?”

Brows immediately furrowed. Clicking sounds from the buttons and murmured conversation between pairs of participants filled the room. “Two states to the power of five buttons with how many stops on that slider…?” “Wait, that button is just momentary…” “Seven hundred and sixt… no, that’s wrong.”

Whereupon, in a moment of perfect timing, a door opened, and Jerry Weinberg walked into the room. His walking stick and his bearing reminded me of Yoda and Gandalf and other sages and wizards.

“Hey, here’s Jerry Weinberg!” said James. “The world’s greatest living software tester! Jerry, how many test cases would you need to understand and describe this product completely?”

The room fell silent. Everyone wanted to know the answer. Jerry observed the laptop that James was holding. He didn’t touch the laptop, press a key, or move the mouse. He just looked for a few moments.

Then he said, “Three.” There was a pause.

Having worked with Jerry over a decade or so, James understood. “Three, Jerry?”, he asked dramatically, in mock astonishment.

“Hm.” (pause) “Yeah. Three,” replied Jerry. Another pause.

Then he peered at James. “Why? Were you expecting some other number?”

Rapid Software Testing is coming soon to Sydney, Brisbane, Utrecht, and Copenhagen. Learn more about it, and find places to register.

Pressing the Green Button

December 19th, 2018

For years at conferences and meetups and in social media, I have been hearing regularly from testers who tell me that they must “sign off” on the product or deployment before it is released to production, or to review by the client. The testers claim that, after they have performed some degree of testing work, they must “approve” or “reject” the product. Here’s a fairly typical verbatim report from one tester:

In my current context, despite my reasoned explanations to the contrary, I am viewed as the work product gatekeeper and am newly positioned as such within a software controlled workflow that literally presents a green “approve” or red “reject” button for me to select after I “do QC” on the work product which as a bonus might be provided with a list of ambiguous client requests and/or a sketchy mock-up with many scattered revision notes (often superseded by verbal requests not logged).

It’s important to note that in project work, a mess of competing information is normal — not completely desirable, necessarily, but normal. When information about the product is unclear, it’s typically part of the tester’s job to identify where and how it’s unclear. Confusion and uncertainty ratchet up product and project risk.

After all, if you’re not sure about what the product is supposed to do, how can the developers be sure about it? In the unlikely event that the developers know how the product should work and you don’t, how will you recognize all of the important bugs? Whether you, the developers, or both aren’t straight on what the client wants, bugs will have time and opportunity to breed and to survive.

The tester continues:

Delivery of the product to the client for their review is generally held up until I press the green “approve” button. The expectation when I “approve” is that the product (which I did not build) is “error free”, meets contradictory loosely defined “standards” and satisfies the client (whom I have not met). The structure is such that I am not to directly communicate to the client, so all clarifications and questions I have are filtered through a project manager.

I am now beginning to frustrate the developers with whom I have previously built a great rapport by repeatedly rejecting their work products for even a single minor infraction. I also frustrate project managers delaying product delivery and going over budget. I predict there will soon be pressure from all sides to “just approve” and then later repercussions of “how/why did this get approved”. I combat this by providing long lists of observations, potential issues, questions, obstacles and coverage notes with each “rejection” or “approval” and I communicate to project managers that they do not need my approval to proceed and may override at anytime.

Some testers seem happy with the authority implicit in “approving” or “rejecting” the product. Most express some level of discomfort with the decision. To me, the discomfort is appropriate.

In the Rapid Software Testing view of the world, it is not the job of the tester to approve or disapprove of things. It is the job of the tester to identify reasons to believe that some person who matters might approve or disapprove of something, and to explain the bases for that belief. Decisions about what to do with a product, including approving or rejecting it, lie with a role called management.

If you’re in a situation like the tester above, and someone offers you the responsibility to approve and reject products, and you desire to be a manager, this is your big chance! Seize the opportunity—but don’t do it without the manager’s title and authority—and salary, while you’re at it. If you’re offered the approval or rejection decision without becoming a manager, though, I’d recommend that you politely decline the “offer”, and make a counteroffer—perhaps one like this:

“Thank you for honouring me with the offer to approve or reject the product. However, as a tester, I don’t believe that it is appropriate for me to make such decisions without management authority. Here’s what I would do in this tester’s situation, though; I’d say this:

“I will gladly test the product, learning about it through exploration and experimentation. I will evaluate the product for consistency with these (contradictory, loosely-defined) standards. If the product appears to be inconsistent with them, I will certainly let you know about that. If I see inconsistencies or contradictions in those standards, I will let you know about those too, so that you can decide how the standards apply to our product. But I won’t limit my testing to that.

“I will tell you about any important problems that I find. I will tell you about problems that appear inconsistent with things desirable to important people. Here’s an example of how I might categorize desirable things, and here’s an example of how I might recognize problems in the product.

“I will report on anything that appears consistent with some notion of an ‘error’. However, I will not assert that the product is error-free. I don’t know how I could do that. I don’t know how anyone can do that.

“I would prefer to interact freely and directly with stakeholders, for the purposes of obtaining clarifications and answers to questions I have without bothering the project manager. (I will, of course, keep responsible records of my interactions; and I will not presume to make decisions about product or project scope, since that’s a management function.)

“If you would prefer to restrict or mediate my access to stakeholders, that’s OK; I can work that way too. Doing so will likely come with a cost of extra time on the part of the project manager, and the risk of broken-telephone-style miscommunication between the stakeholders and me. However, if you’re prepared to take responsibility for that risk, I’m fine with it too.

“Since I am manager of neither the product, nor the project, nor the developers, I do not have the authority to direct them. However, I am happy to report on everything I know about the product—and the apparent problems and risks in it—to those who do have the required authority and responsibility, and they can make the appropriate decisions based on everything they know about the product and business and its needs.

“I am not a gatekeeper, or owner, or ‘approver’ of the quality of the product. I am not a manager or decision maker. I am a reporter on the status of the product, and of the testing, and of the quality of the testing, and I’ll report accordingly. My “approval” is immaterial; what matters is what managers and the business want. It is they, not I, who decide whether a problem is a showstopper or something we’re prepared to live with. It is they, not I, who decide whether problems are significant enough to extend the schedule or increase the budget for the project.

“It’s my job to contribute information to any decision to approve or reject, but it’s not my job to make that decision. I would like someone else to be responsible for the ‘approve’ or ‘reject’ checkbox as such. However, if the tool that we’re using restricts me to ‘approve’ and ‘reject’, let me tell you what those mean, because what they say is inconsistent with their normal English meanings, and we should all be aware of that.

“Pressing ‘Approve’ means this, and only this: ‘I am not aware of any problem in this area that threatens the value of the product, project, or business to any person that matters.’

Pressing ‘Reject’ means ‘I am aware of a specific problem or I have some reason to believe that there could be a problem in this area that I have not had the opportunity to identify yet.’ In other words, ‘reject’ means that I see risk; there’s something about the product or about the testing that I believe managers or the programmers should be aware of. ‘Reject’ means no more than that.

“In either case, we should frequently discuss my observations, potential issues, questions, obstacles and coverage notes, to avoid the possibility that I’m overlooking something important, or that I’m over-emphasizing the significance of particular problems.”

How you’re viewed depends on the commitments (another example here) that you make and declare about what you do, what you’re willing to do, and what you’re not willing to do. If your role, your profile and your commitments don’t match, getting them lined up is your most urgent and important job.

Related reading:
Signing Off
When Testers Are Asked For a Ship/No-Ship Opinion
Testers: Get Out of the Quality Assurance Business

I Represent the User! And We All Do

December 15th, 2018

As a tester, I try to represent the interests of users. Saying the user, in the singular, feels like a trap to me. There are usually lots of users, and they tend to have diverse and sometimes competing interests. I’d like to represent and highlight the interests of users that might have been forgotten or overlooked.

There’s another trap, though. As Cem Kaner has pointed out, it’s worth remembering that practically everyone else on the team represents the interests of end users in some sense. “End users want this product in a timely way at a reasonable price, so let’s get things happening on schedule and on budget,” say the project managers. “End users like lots of features,” say the marketers. “End users want this specific feature right away,” say the sales people. “End users want this feature optimized like I’m making it now,” say the programmers. I’d be careful about claiming that I represent the end user—and especially insinuating that I’m the only one who does—when lots of other people can credibly make that claim.

Meanwhile, I aspire to test and find problems that threaten the value of the product for anyone who matters. That includes anyone who might have an interest in the success of the product, like managers and developers, of course. It also includes anyone whose interests might have been forgotten or neglected. Technical support people, customer service representatives, and documentors spring to mind as examples. There are others. Can you think of them? People who live in other countries or speak other languages, whether they’re end users or business partners or colleagues in remote offices, are often overlooked or ignored.

All of the people in our organization play a role in assuring quality. I can assure the quality of my own work, but not of the product overall. For that reason, it seems inappropriate to dub myself and my testing colleagues as “quality assurance”. The “quality assurance” moniker causes no end of confusion and angst. Alas, not much has changed over the last 35 years or so: no one, including the most of the testing community, seems willing to call testers what they are: testers.

That’s a title I believe we should wear proudly and humbly. Proudly, because we cheerfully and diligently investigate the product, learning deeply about it where most others merely prod it a little. Humbly, because we don’t create the product, design it, code it, or fix it if it has problems. Let’s honour those who do that, and not make the over-reaching claim that we assure the quality of their work.