Blog Posts for the ‘Documentation’ Category

Breaking the Test Case Addiction (Part 6)

Tuesday, February 5th, 2019

In the last installment, we ended by asking “Once the tester has learned something about the product, how can you focus a tester’s work without over-focusing it?”

I provided some examples in Part 4 of this series. Here’s another: scenario testing. The examples I’ll provide here are based on work done by James Bach and Geordie Keitt several years ago. (I’ve helped several other organizations apply this approach much more recently, but they’re less willing to share details.)

The idea is to use scenarios to guide the tester to explore, experiment, and get experience with the product, acting on ideas about real-world use and about how the product might foreseeably be misused. It’s nice to believe that careful designs, unit testing, BDD, and automated checking will prevent bugs in the product — as they certainly help to do — but to paraphrase Gertrude Stein, experience teaches experience teaches. Pardon my words, but if you want to discover problems that people will encounter in using the product, it might help to use the damned product.

The scenario approach that James and Geordie developed uses richer, more elaborate documentation than the one- to three-sentence charters of session-based test management. One goal is to prompt the tester to perform certain kinds of actions to obtain specific kinds of coverage, especially operational coverage. Another goal is to make the tester’s mission more explicit and legible for managers and the rest of the team.

Preparing for scenario testing involves learning about the product using artifacts, conversations, and preliminary forms of test activity (I’ve given examples throughout this series, but especially in Part 1). That work leads into developing and refining the scenarios to cover the product with testing.

Scenarios are typically based around user roles, representing people who might use the product in particular ways. Create at least a handful of them. Identify specifics about them, certainly about the jobs they do and the tasks they perform. You might also want to incorporate personal details about their lives, personalities, temperaments, and conditions under which they might be using the product.

(Some people refer to user roles as “personas”, as the examples below do. A word of caution over a potential namespace clash: what you’ll see below is a relatively lightweight notion of “persona”. Alan Cooper has a different one, which he articulated for design purposes, richer and more elaborate than what you’ll see here. You might seriously consider reading his books in any case, especially About Face (with Reimann, Cronin, and Noessel) and the older The Inmates are Running the Asylum.)

Consider not only a variety of roles, but a variety of experience levels within the roles. People may be new to our product; they may be new to the business domain in which our product is situated; or both. New users may be well or poorly trained, subject to constant scrutiny or not being observed at all. Other users might be expert in past versions of our products, and be irritated or confused by changes we’ve made.

Outline realistic work that people do within their roles. Identify specific tasks that they might want to accomplish, and look for things that might cause problems for them or for people affected by the product. Problems might take the form of harm, loss, or diminished value to some person who matters. Problems might also include feelings like confusion, irritation, frustration, or annoyance.

Remember that use cases or user stories typically omit lots of real-life activity. People are often inattentive, careless, distractible, under pressure. People answer instant messages, look things up on the web, cut and paste stuff between applications. They go outside, ride in elevators, get on airplanes and lose access to the internet; things that we all do every day that we don’t notice. And, very occasionally, they’re actively malicious.

Our product may be a participant in a system, or linked to other products via interfaces or add-ins or APIs. At the very least, our product depends on platform elements: the hardware upon which it runs; peripherals to which it might be connected, like networks, printers, or other devices; application frameworks and libraries from outside our organization; frameworks and libraries that we developed in-house, but that are not within the scope of our current project.

Apropos of all this, the design of a set of scenarios includes activity patterns or moves that a tester might make during testing:

  • Assuming the role or persona of a particular user, and performing tasks that the user might reasonably perform.
  • Considering people who are new to the product and/or the domain in which the product operates (testing for problems with ease of learning).
  • Considering people who have substantial experience with the product (testing for problems with ease of use).
  • Deliberately making foreseeable mistakes that a user in a given role might make (testing for problems due to plausible errors).
  • Using lots of functions and features of the product in realistic but increasingly elaborate ways that trigger complex interactions between functions.
  • Working with records, objects, or other data elements to cover their entire lifespan: creating, revising, refining, retrieving, viewing, updating, merging, splitting, deleting, recovering… and thereby…
  • Developing rich, complex sets of data for experimentation over periods longer than single sessions.
  • Simulating turbulence or friction that a user might encounter: interruptions, distractions, obstacles, branching and backtracking, aborting processes in mid-stream, system updates, closing the laptop lid, going through a train tunnel…
  • Working with multiple instances of the product, tools, and/or multiple testers to introduce competition, contention, and conflict in accessing particular data items or resources.
  • Attaching the product to different peripherals, running it on different hardware and software platforms, connecting it to interacting applications, working in multiple languages (yes, we do that here in Canada).
  • Reproducing behaviours or workflows from comparable or competing products.
  • Considering not only the people using the product, but the people who interact with them: their customers, clients, network support people, tech support people, or managers.

To put these ideas to work at ProChain (a company that produces project management software), James and Geordie developed a scenario playbook.
Let’s look at some examples from it.

The first exhibit is a one-page document that outlines the general protocol for setting up scenario sessions.

PCE Scenario Testing General Setup Sheet

This document is an overview that applies to every session. It is designed primarily to give managers and supporting testers a brief overview of the process and how it should be carried out. (A supporting tester is someone who is not a full-time tester, but is performing testing under the guidance and supervision of a responsible tester — an experienced tester, test lead, or a test manager. A responsible tester is expected to have learned and internalized the instructions on this sheet.) There are general notes here for setting up and patterns of activities to be performed during the session.

Testers should be familiar with oracles by which we recognize problems, or should learn about oracles quickly. When this document was developed, there was a list of patterns of consistency with the mnemonic acronym HICCUPP; that’s now FEW HICCUPPS. For any given charter, there may be specific consistency patterns, artifacts, documents, tools, or mechanisms to apply that can help the tester to notice and describe problems.

Here’s an example of a charter for a specific testing mission:

PCE Scenario Testing Example Charter 1

The Theme section outlines the general purpose of the session, as a one- to three-line charter would in session-based test management. The Setup section identifies anything that should be done specifically for this session.

Note that the Activities section offers suggestions that are both specific and open. Openness helps to encourage variation that broadens coverage and helps to keep the tester engaged (“For some tasks…”; “…in some way,…”). The specificity helps to focus coverage (“set the task filter to show at least…”; the list of different ways to update tasks).

The Oracles section identifies specific ways for the tester to look for problems, in addition to more general oracle principles and mechanisms. The Variations section prompts the tester to try ideas that will introduce turbulence, increase stress, or cover more test conditions.

A debrief and a review of the tester’s notes after the session helps to make sure that the tester obtained reasonable coverage.

Here’s another example from the same project:

Here the tester is being given a different role, which requires a different set of access rights and a different set of tasks. In the Activities and Variations sections, the tester is encouraged to explore and to put the system into states that cause conflicts and contention for resources.

Creating session sheets like these can be a lot more fun and less tedious than typing out instructions in formal, procedurally scripted test cases. Because they focus on themes and test ideas, rather than specific test conditions, the sheets are more compact and easier to review and maintain. If there are specific functions, conditions, or data values that must be checked, they can be noted directly on the sheet — or kept separately with a reference to them in the sheet.

The sheets provide plenty of guidance to the tester while giving him or her freedom to vary the details during the session. Since the tester has a general mission to investigate the product, but not a script to follow, he or she is also encouraged and empowered to follow up on anything that looks unusual or improper. All this helps to keep the tester engaged, and prevents him or her from being hypnotized by a script full of someone else’s ideas.

You can find more details on the development of the scenarios in the section “PCE Scenario Testing” in the Rapid Software Testing Appendices.

Back in our coaching session, Frieda once again picked up the role of the test-case-fixated manager. “If we don’t give them test cases, there’s nothing to look at when they’re done! How will we know for sure what the tester has covered?”

It might seem as though a list of test cases with check marks beside them would solve the accountability problem — but would it? If you don’t trust a tester to perform testing without a script, can you really trust him to perform testing with one?

There are lots of ways to record testing work: the tester’s personal notes or SBTM session sheets, check marks and annotations on requirements and other artifacts, application log files, snapshot tools, video recording… Combine these supporting materials with a quick debriefing to make sure that the tester is working in a professional way and getting the job done. If the tester is new, or a supporting tester, increase training, personal supervision and feedback until he or she gains your trust. And if you still can’t bring yourself to trust them, you probably shouldn’t have them testing for you at all.

Frieda, still in character, replied “Hmmm… I’d like to know more about debriefing.” Next time!

Breaking the Test Case Addiction (Part 5)

Tuesday, January 29th, 2019

In our coaching session (which started here), Frieda was still playing the part of a manager who was fixated on test cases—and doing it very well. She played a typical management card: “What about learning about the product? Aren’t test cases a good way to do that?”

In Rapid Software Testing, we say that testing is evaluating a product by learning about it through exploration and experimentation, which includes questioning, modeling, studying, manipulating, making inferences, etc. So learning is an essential part of testing. There are lots of artifacts and people that testers could interact with to start learning about the product, which I’ve discussed already. Let’s look at why making a tester work through test cases might not be such a good approach.

Though test cases are touted as a means of learning about the product, my personal experience is that they’re not very helpful at all for that purpose. Have you ever driven somewhere, being guided by a list of instructions from Google Maps, synthesized speech from a navigation system, or even spoken instructions from another person? My experience is that having someone else direct my actions disconnects me from wayfinding and sensemaking. When I get to my destination, I’m not sure how I got there, and I’m not sure I could find my way back.

If I want to learn something and have it stick, a significant part of my learning must be self-guided. From time to time, I must make sense of where I’ve been, where I am, and where I’m going. I must experience some degree of confusion and little obstacles along the way. I must notice things that are interesting and important to me that I can connect to the journey. I must have the freedom to make and correct little mistakes.

Following detailed instructions might aid in accomplishing certain kinds of tasks efficiently. However, following instructions can get in the way of learning something, and the primary mission of testing is to learn about the product and its status.

You could change the assignment by challenging the tester to walk through a set of test cases to find problems in them, or to try to divine the motivation for them, and that may generate some useful insights.

But if you really want testers to learn about the product, here’s how I’d do it: give them a mission to learn about the product. Today we’ll look at instances of learning missions that you can apply early in the tester’s engagement or your own. Such missions tend to be broad and open, and less targeted towards specific risks and problems than they might be later. I’ll provide a few examples, with comments after each one.

“Interview the product manager about the new feature. Identify three to six user roles, and (in addition to your other notes) create sketches or whiteboard diagrams of some common instances of how they might use the feature. In your conversation, raise and discuss the possibility of obstacles or interruptions that might impede the workflow. Take notes and photos.”

As the principles of context-driven testing note, the product is a solution. If the problem isn’t solved, the product doesn’t work. When the product poses new problems, it might not be working either from the customer’s perspective.

“Attend the planning session for the new feature. Ask for descriptions of what we’re building; who we’re building it for; what kind of problems they might experience; and how we would recognize them as problems. Raise questions periodically about testability. Take minutes of the discussions in the meeting.”

Planning meetings tend to be focused on envisioning success; on intention. Those meetings present opportunities to talk about anticipating failure; about how we or the customer might not achieve our goals, or might encounter problems. Planning a product involves planning ways of noticing how it might go wrong, too.

“Perform a walkthrough of this component’s functionality with a developer or a senior tester. Gather instances of functions in the product, or data that it processes, that might represent exceptions or extremes. Collect sets of ideas for test conditions that might trigger extreme or exceptional behaviour, or that might put the product in an unstable state. Create a risk list, with particular focus on threats to capability, reliability, and data integrity that might lead to functional errors or data loss.”

In Rapid Software Testing parlance, a test condition is something that can be examined during a test, or something that might change the outcome of a test. It seems to me that when people use formalized procedural test cases, often their intention is to examine particular test conditions. However, those conditions can be collected and examined using many different kinds of artifacts: tables, lists, annotated diagrams or flowcharts, mind maps…

“Review the specification for the product with the writer of the user manual. In addition to any notes or diagrams that you keep, code the contents of the specification. (Note: “code” is used here in the sense used in qualitative research; not in the sense of writing computer code.) That is, for each numbered paragraph, try to identify at least one and up to three quality criteria that are explicitly or implicitly mentioned. Collate the results and look for quality criteria that are barely mentioned or missing altogether, and be on the lookout for mysterious silences.”

There’s a common misconception about testing: that testers look for inconsistencies between the product and a description of the product, and that’s all. But excellent testers look at the product, at descriptions of the product, and at intentions for the product, and seek inconsistencies between all of those things. Many of our intentions are tacit, not explicit. Note also that the designer’s model of the user’s task may be significantly different from the user’s model.
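The collating step in the specification-review mission above can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the paragraph numbers, the quality-criteria names, and the coding itself are invented for the example, not taken from any real specification.

```python
from collections import Counter

# Hypothetical coding of a specification: paragraph number -> quality
# criteria the tester judged to be explicitly or implicitly mentioned there.
coded = {
    "4.1": ["capability", "reliability"],
    "4.2": ["capability"],
    "4.3": ["performance", "capability"],
    "4.4": ["usability"],
    "4.5": ["capability", "installability"],
}

# The full set of criteria we care about, so silences show up as zeros.
ALL_CRITERIA = ["capability", "reliability", "usability", "security",
                "performance", "installability", "compatibility"]

# Tally how often each criterion was coded across all paragraphs.
tally = Counter(c for codes in coded.values() for c in codes)

for criterion in ALL_CRITERIA:
    n = tally[criterion]
    note = "  <-- barely mentioned or missing" if n <= 1 else ""
    print(f"{criterion:15s} {n}{note}")
```

In this made-up sample, the tally would flag security and compatibility (never coded) and reliability and usability (coded once) as candidates for those “mysterious silences” worth raising with the team.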

Notice that each example above includes an information mission. Each one includes a mandate to produce specific, reviewable artifacts, so that the tester’s learning can be evaluated with conversation and documented evidence. Debriefing and relating learning to others is an important part of testing in general, and session-based test management in particular.

Each example also involves collaboration with other people on the team, so that inconsistencies between perspectives can be identified and discussed. And notice: these are examples. They are not templates to be followed. It’s important that you develop your own missions, suited to the context in which you’re working.

At early stages of the tester’s engagement, finding problems is not the focus. Learning is. Nonetheless, as one beneficial side effect, the learning may reveal some errors or inconsistencies before they can turn into bugs in the product. As another benefit, testers and teams can collect ideas for product and project risk lists. Finally, the learning might reveal test conditions that can usefully be checked with tools, or that might be important to verify via explicit procedures.

Back to the coaching session. “Sometimes managers say that it’s important to give testers explicit instructions when we’re dealing with an offshore team whose first language is not English”, said Frieda.

Would test cases really make that problem go away? Presumably the test cases and the product would be written in English too. If the testers don’t understand English well, then they’ll scarcely be able to read the test cases well, or to comprehend the requirements or the standards, or to understand what the product is trying to tell them through its (presumably also English) user interface.

Maybe the product and the surrounding artifacts are translated from English into the testers’ native language. That addresses one kind of problem, but introduces a new one: requirements and specifications and designs and jargon routinely get misinterpreted even when everyone is working in English. When that material is translated, some meaning is inevitably changed or lost in translation. All of these problems will need attention and management.

If a product does something important, presumably there’s a risk of important problems, many of which will be unanticipated by test cases. Wouldn’t it be a good idea to have skilled testers learn the product reasonably rapidly but also deeply to prepare them to seek and recognize problems that matter?

When testers are up and running on a project, there are several approaches towards focusing their work without over-focusing it. I’ve mentioned a few already. We’ll look at another one of those next.

Breaking the Test Case Addiction (Part 4)

Monday, January 21st, 2019

Note: this post is long from the perspective of the kitten-like attention spans that modern social media tends to encourage. Fear not. Reading it could help you to recognize how you might save hours, weeks, or months of excess and unnecessary work, especially if you’re working as a tester or manager in a regulated environment.

Testers frequently face problems associated with excessive emphasis on formal, procedurally scripted testing. Politics, bureaucracy, and paperwork combine with fixation on test cases. Project managers and internal auditors mandate test cases structured and written in a certain form “because FDA”. When someone tells you this, it’s a pretty good indication that they haven’t read the FDA’s guidance documentation.

Because here’s what it really says:

For each of the software life cycle activities, there are certain “typical” tasks that support a conclusion that the software is validated. However, the specific tasks to be performed, their order of performance, and the iteration and timing of their performance will be dictated by the specific software life cycle model that is selected and the safety risk associated with the software application. For very low risk applications, certain tasks may not be needed at all. However, the software developer should at least consider each of these tasks and should define and document which tasks are or are not appropriate for their specific application. The following discussion is generic and is not intended to prescribe any particular software life cycle model or any particular order in which tasks are to be performed.

General Principles of Software Validation;
Final Guidance for Industry and FDA Staff, 2002

The General Principles of Software Validation document is to some degree impressive for its time, 2002. It describes some important realities. Software problems are mostly due to design and development, far less to building and reproduction. Even trivial programs are complex. Testing can’t find all the problems in a product. Software doesn’t wear out like physical things do, and so problems often manifest without warning. Little changes can have big, wide-ranging, and unanticipated effects. Using standard and well-tested software components addresses one kind of risk, but integrating those components requires careful attention.

There are lots of problems with the General Principles of Software Validation document, too. I’ll address several of these, I hope, in future posts.

Apropos of the present discussion, the document doesn’t describe what a test case is, nor how it should be documented. By my count, the document mentions “test case” or “test cases” 30 times. Here’s one instance:

“Test plans and test cases should be created as early in the software development process as feasible.”

Here are two more:

“A software product should be challenged with test cases based on its internal structure and with test cases based on its external specification.”

If you choose to interpret “test case” as an artifact, and consider that challenge sufficient, this would be pretty terrible advice. It would be analogous to saying that children should be fed with recipes, or that buildings should be constructed with blueprints. A shallow reading could suggest that the artifact and the performance guided by that artifact are the same thing; that you prepare the recipe before you find out what the kids can and can’t eat, and what’s in the fridge; that you evaluate the building by comparing it to the blueprints and then you’re done.

On the other hand, if you substitute “test cases” with “tests” or “testing”, it’s pretty great advice. It’s a really good idea to challenge a software product with tests, with testing, based on internal and external perspectives.

The FDA does not define “test case” in the guidance documentation. A definition does appear in Glossary of Computer System Software Development Terminology (8/95).

test case. (IEEE) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item. Syn: test case specification. See: test procedure

Okay, let’s see “test procedure”:

test procedure (NIST) A formal document developed from a test plan that presents detailed instructions for the setup, operation, and evaluation of the results for each defined test. See: test case.

So it is pretty terrible advice after all.

(Does that “8/95” refer to August 1995? Yes, it does. None of the source documents for the Glossary of Computer System Software Development Terminology (8/95) is dated after 1994. For some perspective, that’s before Windows 95; before Google; before smartphones and tablets; before the Manifesto for Agile Software Development; before the principles of context-driven testing…)

But happily, in Section 2 of General Principles of Software Validation, before any of the guidance on testing itself, is the Principle of the Least Burdensome Approach:

We believe we should consider the least burdensome approach in all areas of medical device regulation. This guidance reflects our careful review of the relevant scientific and legal requirements and what we believe is the least burdensome way for you to comply with those requirements. However, if you believe that an alternative approach would be less burdensome, please contact us so we can consider your point of view.

The “careful review” happened in the period leading up to 2002, which is the publication date of this guidance document. In the testing community of those days, anything other than ponderously scripted procedural test cases was viewed with great suspicion in writing and in conference talks. Thanks to work led by Cem Kaner, James Bach, and other prominent voices in the testing community, the world is now a safer place for exploration in testing. And, as noted in the previous post in this series, the FDA itself has acknowledged the significance and importance of exploratory work.

Test documentation may take many forms more efficient and effective than formally scripted procedures, and the Least Burdensome Approach appears to allow a lot of leeway as long as evidence is sufficient and the actual regulations are followed. (For those playing along at home, the regulations include Title 21 Code of Federal Regulations (CFR) Part 11.10 and 820, and 61 Federal Register (FR) 52602.)

Several years ago, James Bach began some consulting work with a company that made medical devices. They had hired him to analyze, report on, and contribute to the testing work being done for a particular Class III device. (I have also done some work for this company.)

The device consisted of a Control Box, operated by a technician. The Control Box was connected to a Zapper Box that delivered Healing Energy to the patient’s body. (We’ve modified some of the specific words and language here to protect confidentiality and to summarize what the devices do.) Insufficient Healing Energy is just Energy. Too much Healing Energy, or the right amount for too long, turns into Hurting Energy or Killing Energy.

When James arrived, he examined the documentation being given to testers. He found more than a hundred pages of stuff like this:

9.8.1 To verify Power Accuracy

9.8.1.1 Connect the components according to the General Setup document.
9.8.1.2 Power on and connect Power Monitor (instead of electrodes).
9.8.1.3 Power on the Zapper Box.
9.8.1.4 Power on the Control Box.
9.8.1.5 Set default settings of temperature and power for zapping.
9.8.1.6 Set test jig load to nominal value.
9.8.1.7 Select nominal duration and nominal power setting.
9.8.1.8 Press the Start button.
9.8.1.9 Verify Zapper reports the power setting value ±10% on display.

Is this good formal testing?

It’s certainly a formal procedure to follow, but where’s the testing part? The closest thing is that little molecule of actual testing in the last line: the tester is instructed to apply an oracle by comparing the power setting on the Control Box with what the Zapper reports on its display. There’s nothing to suggest examining the actual power being delivered by noting the results from the Power Monitor. There’s nothing about inducing variation to obtain and extend coverage, either.

At one point, James and another tester defrosted this procedure. They tried turning on the Control Box first, and then waited for a variety of intervals to turn on the Zapper Box. To their amazement, the Zapper Box could end up in one of four different states, depending on how long they waited to start it—and at least a couple of those states were potentially dangerous to the patient or to the operator.

James replaced 50 pages of this kind of stuff with two paragraphs containing things that had not been covered previously. He started by describing the test protocol:

3.1 General testing protocol

In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items a tester shall, at all times, be alert for any unexplained or erroneous behavior of the product. The tester shall bear in mind that, regardless of any specific requirements for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including any unacceptable risks due to reasonably foreseeable misuse.

Read that paragraph carefully, sentence by sentence, phrase by phrase. Notice the emphasis on looking for problems and risks—especially on the risk of human error.

Then he described the qualifications necessary for testers to work on this product:

3.2 Test personnel requirements

The tester shall be thoroughly familiar with the Zapper Box and Control Box Functional Requirements Specification, as well as with the working principles of the devices themselves. The tester shall also know the working principles of the Power Monitor Box test tool and associated software, including how to configure and calibrate it, and how to recognize if it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing, in the event that anomalies appear that require investigation. The tester shall know how to keep test records to a credible, professional standard.

In summary: Be a scientist. Know the domain, know the tools, be an analyst, be an investigator, keep good lab notes.

Then James provided some concise test ideas, leaving plenty of room for variation designed to shake out bugs. Here’s an example like something from the real thing:

3.2.2 Fields and Screens

3.2.2.1 With the Power Monitor test tool already running, start the Zapper Box and the Control Box. Vary the order and timing in which you start them, retain the Control Box and Power Monitor log files, and note any inconsistent or unexpected behaviour.
3.2.2.2 Visually inspect the displays and verify conformance to the requirements and for the presence of any behaviour or attribute that could impair the performance or safety of the product in any material way.
3.2.2.3 With the system settings at default values change the contents of every user-editable field through the range of all possible values for that field. (e.g. Use the knob to change the session duration from 1 to 300 seconds.) Visually verify that appropriate values appear and that everything that happens on the screen appears normal and acceptable.
3.2.2.4 Repeat 3.2.2.3 with system settings changed to their most extreme possible values.
3.2.2.5 Select at least one field and use the on-screen keyboard, knob, and external keyboard respectively to edit that field.
3.2.2.6 Scan the Control Box and Power Monitor log files for any recorded error conditions or anomalies.

To examine certain aspects of the product and its behaviour, sometimes very specific test design matters. Here’s a representative snippet based on James’ test documentation:

3.5.2 Single Treatment Session Power Accuracy Measurement

3.5.2.3 From the Power Monitor log file, extract the data for the measured electrode. This sample should comprise the entire power session, including cooldown, as well as the stable power period with at least 50 measurements (i.e., taken at least five times per second over 10 seconds of stable period data).
3.5.2.4 From the Control Box log file, extract the corresponding data for the stable power period of the measured electrode.
3.5.2.5 Calculate the deviation by subtracting the reported power for the measured electrode from the corresponding Power Monitor reading (use interpolation to synchronize the time stamps of the power meter and generation logs).
3.5.2.6 Calculate the mean of the power sample (X̄) and its standard deviation (s).
3.5.2.7 Find the two-sided tolerance interval factor k for the sample, at 99% confidence and 99% coverage. (Use Table 5 of SOP-QAD-10, or use the equation below for large samples.)
3.5.2.8 The equation for calculating the tolerance interval factor k is:

k = sqrt[ (N − 1)(1 + 1/N) z²(1−p)/2 / χ²γ,N−1 ]

where χ²γ,N−1 is the critical value of the chi-square distribution with N − 1 degrees of freedom that is exceeded with probability γ; and z(1−p)/2 is the critical value of the normal distribution which is exceeded with probability (1 − p)/2. (See the NIST Engineering Statistics Handbook.)
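The calculation in steps 3.5.2.5 through 3.5.2.8 can be sketched in code. Everything here is illustrative rather than taken from the real protocol: the log format (lists of (timestamp, power) pairs) is a hypothetical stand-in, and since Python's standard library has no chi-square inverse CDF, the sketch substitutes the Wilson–Hilferty approximation for the table lookup:

```python
import math
from bisect import bisect_left
from statistics import NormalDist, mean, stdev

def interpolate(times, values, t):
    """Linearly interpolate a log series at time t (the synchronization in 3.5.2.5).
    Assumes times is sorted and times[0] <= t <= times[-1]."""
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def deviations(monitor_log, control_log):
    """Power Monitor reading minus Control Box reported power, at each
    Control Box timestamp (step 3.5.2.5). Logs are (timestamp, power) pairs."""
    m_times = [t for t, _ in monitor_log]
    m_power = [p for _, p in monitor_log]
    return [interpolate(m_times, m_power, t) - p for t, p in control_log]

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

def tolerance_factor(n, coverage=0.99, confidence=0.99):
    """Two-sided tolerance-interval factor k: the large-sample approximation
    described in the NIST Engineering Statistics Handbook."""
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    chi2 = chi2_quantile(1 - confidence, n - 1)
    return math.sqrt((n - 1) * (1 + 1 / n) * z ** 2 / chi2)

def tolerance_bounds(sample, coverage=0.99, confidence=0.99):
    """Mean +/- k*s for the deviation sample (steps 3.5.2.6 through 3.5.2.8)."""
    k = tolerance_factor(len(sample), coverage, confidence)
    m, s = mean(sample), stdev(sample)
    return m - k * s, m + k * s
```

For a 50-measurement sample at 99%/99%, tolerance_factor(50) comes out near 3.39, close to the tabulated exact value; the claim being checked is then that nearly all of the power deviations fall within the resulting bounds.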

Now, that’s some real formal testing. And it was accepted just fine by the organization and the FDA auditors. Better yet, following this protocol revealed some surprising behaviours that prompted more careful evaluation of the requirements for the product.

What are some lessons we could learn from this? One key point, it seems to me, is that when you’re working as a tester in a regulated environment, it’s crucial that you read the regulations and the guidance documentation. If you don’t, you run the risk of being pushed around by people who haven’t read them, and who are working on the basis of mythology and folklore.

Our over-arching mission as testers is to seek and find problems that threaten the value of the product. In contexts where human life, health, or safety are on the line, the primary job at hand is to learn about the product and about problems that pose risks and hazards to people. Excessive bureaucracy and paperwork can distract us from that mission, or even displace it. Therefore, we must find ways to do the best testing possible, while providing the best and least evidence that completely satisfies auditors and regulators that we’ve done it.

Back in our coaching session, Frieda, acting the part of the manager, replied, “But… we don’t have the time to train testers to do that kind of stuff. We need them to be up to speed ASAP.”

“What does ‘up to speed’ actually mean?” I asked.

Frieda, still in character, replied “We want them to be banging on keys as quickly as possible.”

Uh huh. Imagine a development manager responsible for a medical device saying, “We don’t have time for the developers to learn what they’re developing. We want them up to speed as quickly as possible. (And, as we all know, programming is really just banging on keys.)”

The error in this line of thinking is the notion that testing is about pushing buttons; producing widgets on a production line; flipping testburgers. If you treat testing as flipping testburgers, then there’s a risk that testers will flip whatever vaguely burger-shaped thing comes their way… burgers, frisbees, cow pies, hockey pucks… You may not get the burger you want.

If you think of testing as an investigation of the product, testers must be investigators, and skillful ones at that. Upon engaging with the product and the project, testers set about learning about the product they’re investigating and the domain in which it operates. Testers keep excellent lab notes and document their work carefully, but not to the degree that documentation displaces the goal of testing the system and finding problems in it. Testers are focused on risk, and trained to be aware of problems that they might encounter as they’re testing (per CFR Title 21 Part 820.25(b)(2)).

If they’re not sufficiently skilled when you hire them, you’ll supervise and train them until they are. And if they’re unskilled and can’t be trained… are you really sure you want them testing a device that could deliver Killing Energy?

How else might you guide testing work, whether in projects in regulated contexts or not? That’s a topic for next time.

Breaking the Test Case Addiction (Part 3)

Thursday, January 17th, 2019

In the previous post, “Frieda”, my coaching client, asked about producing test cases for auditors or regulators. In Rapid Software Testing (RST), we find it helpful to frame that in terms of formal testing.

Testing is formal to the degree that it must be done in a specific way, or to verify specific facts. Formal testing typically has the goal of confirming or demonstrating something in particular about the product. There’s a continuum to testing formality in RST. My version, a tiny bit different from James Bach‘s, looks like this:

Some terminology notes: checking is the process of operating and observing a product; applying decision rules to those observations; and then reporting on the outcome of those rules; all mechanistically, algorithmically. A check can be turned into a formally scripted process that can be performed by a human or by a machine.
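As a minimal sketch of a check in that sense (the product stub, the field, and the expected default here are all hypothetical stand-ins, not from any real product):

```python
class FakeZapperUI:
    """Hypothetical stand-in for the product under check."""
    def session_duration(self):
        return 60  # seconds shown in the session-duration field

def check_default_session_duration(product):
    observed = product.session_duration()              # operate and observe
    expected = 60                                      # the decision rule's expected value
    return "pass" if observed == expected else "fail"  # report, mechanistically
```

Note what the sketch leaves out: the check reports on exactly one decision rule. Noticing anything else that might be wrong on that screen is the tester’s job, not the check’s.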

Procedurally scripted test cases are instances of human checking, where the tester is being substantially guided by what the script tells her to do. Since people are not machines and don’t stick to the algorithms, people are not checking in the strictest sense of our parlance.

A human transceiver is someone doing things based only on the instructions of some other person, behaving as that person’s eyes, ears, and hands.

Machine checking is the most formal mode of testing, in that machines perform checks in entirely specific ways, according to a program, entirely focused on specific facts. The motivation to check doesn’t come from the machine, but from some person. Notice that programs are formal, but programming is an informal activity. Toolsmiths and people who develop automated checks are not following scripts themselves.

The degree to which you formalize is a choice, based on a number of context factors. Your context guides your choices, and both of those evolve over time.

One of the most important context factors is your mission. You might be in a regulated environment, where regulators and auditors will eventually want you to demonstrate specific things about the product and the project in a highly formal way. If you are in that context, keeping the auditors and the regulators happy may require certain kinds of formal testing. Nonetheless, even in that context, you must perform informal testing—lots of it—for at least two big reasons.

The first big reason is to learn about the product and its context, to prepare for excellent formal testing that will stand up to the regulators’ scrutiny. This is tied to another context factor: where you are in the life of the project and in your understanding of the product.

Formal testing starts with informal work that is more exploratory and tacit, with the goal of learning, and proceeds toward work that is more scripted and explicit, with the goal of demonstrating. All the way along, but especially in between those poles, we’re searching for problems. No less an authority than the Food and Drug Administration emphasizes how important this is.

Thorough and complete evaluation of the device during the exploratory stage results in a better understanding of the device and how it is expected to perform. This understanding can help to confirm that the intended use of the device will be aligned with sponsor expectations. It also can help with the selection of an appropriate pivotal study design.

Section 5: The Importance of Exploratory Studies in Pivotal Study Design
Design Considerations for Pivotal Clinical Investigations for Medical Devices
Guidance for Industry, Clinical Investigators, Institutional Review Boards
and Food and Drug Administration Staff

The pivotal stage of device development, says the FDA, focuses on developing what people need to know to evaluate the safety and effectiveness of a product. The pivotal stage usually consists of one or more pivotal studies. In other words, the FDA acknowledges that development happens in loops and cycles; that development is an iterative process.

James Bach emphasized this in his talk The Dirty Secret of Formal Testing and it’s an important point in RST. Development is an iterative process because at the beginning of any cycle of work, we don’t know for sure what all the requirements are; what they mean; what we can get; and how we might decide that we’ve got it. We don’t really know that until we’ve tested the product… and we don’t know how to test the product until we’ve tried to test the product!

Just like developing automated checks, developing formally scripted test cases is an informal process. You don’t follow a script when you’re interpreting a specification; when you’re having a conversation with a developer or a designer; when you’re exploring the product and the test space to figure out where checking might be useful or important. You don’t follow a script when you recognize a new way of using tools to learn something about the product, and apply them. And you don’t follow a script when you investigate bugs that you’ve found—either during informal testing or the formal testing that might follow it.

If you try to develop formal procedural test cases without testing the actual product, they stand a good chance of being out of sync with it. The dirty secret of formal testing is that all good formal testing begins with informal testing.

It might be a very good idea for programmers to develop some automated checks that help them with the discipline of building clean code and getting rapid feedback on it. It’s also a good idea for developers, designers, testers, and business people to develop clear ideas about intentions for a product, envisioning success. It might also be a good idea to develop some automated checks above the unit level and apply them to the build process—but not too many and certainly not too early. The beginning of the work is usually a terrible time for excessive formalization.

Which brings us to the second big reason to perform informal testing continuously throughout any project: to address the risk that our formal testing to date will fail to reveal how the product might disappoint customers; lose someone’s money; blow something up; or hurt or kill people. We must be open to discovery, and to performing the testing and investigation that supports it, all the way throughout the project, because neither epiphanies nor bugs follow scripts or schedules.

The overarching mission of testing is focused on a question: “are there problems that threaten the value of the product, or the on-time, successful completion of our work?” That’s not a question that formal testing can ever answer on its own. Fixation on automated checks or test cases runs the risk of displacing time for experimentation, exploration, discovery, and learning.

Next time, we’ll look at an example of breaking test case addiction on a real medical device project. Stay tuned.

Breaking the Test Case Addiction (Part 2)

Wednesday, January 16th, 2019

Last time out, I was responding to a coaching client, a tester who was working in an organization fixated on test cases. Here, I’ll call her Frieda. She had some more questions about how to respond to her managers.

What if they want another tester to do your tests if you are not available?

“‘Your tests’, or ‘your testing’?”, I asked.

From what I’ve heard, your tests. I don’t agree with this but trying to see it from their point of view, said Frieda.

I wonder what would happen if we asked them “What happens when you want another manager to do your managing if you are not available?” Or “What happens when you want another programmer to do programming if the programmer is not available?” It seems to me that the last thing they would suggest would be a set of management cases, or programming cases. So why the fixation on test cases?

Fixation is excessive, obsessive focus on something to the exclusion of all else. Fixation on test cases displaces people’s attention from other important things: understanding of how the testing maps to the mission; whether the testers have sufficient skill to understand and perform the testing; the learning that comes from testing and feeds back into more testing; whether formalization is premature or even necessary…

A big problem, as I suggested last time, is a lack of managers’ awareness of alternatives to test cases. That lack of awareness feeds into a lack of imagination, and then loops back into a lack of awareness. What’s worse is that many testers suffer from the same problem, and therefore can’t help to break the loop. Why do managers keep asking for test cases? Because testers keep providing them. Why do testers keep providing them? Because managers keep asking for them, because testers keep providing them…, and the cycle continues.

That cycle also continues because there’s an attractive, even seductive, aspect to test cases: they can make testing appear legible. Legibility, as Venkatesh Rao puts it beautifully here, “quells the anxieties evoked by apparent chaos”.

Test cases help to make the messy, complex, volatile landscape of development and testing seem legible, readable, comprehensible, quantifiable. A test case either fails (problem!) or passes (no problem!). A test case makes the tester’s behaviours seem predictable and clear, so clear that the tester could even be replaced by a machine. At the beginning of the project, we develop 782 test cases. When we’ve completed 527 of them, the testing is 67.39% done!

Many people see testing as rote, step-by-step, repetitive, mechanical keypressing to demonstrate that the product can work. That gets emphasized by the domain we’re in: one that values the writing of programs. If you think keypressing is all there is to it, it makes a certain kind of sense to write programs for a human to follow so that you can control the testing.

Those programs become “your tests”. We would call those “your checks”—where checking is the mechanistic process of applying decision rules to observations of the software.

On the other hand, if you are willing to recognize and accept testing as a complex, cognitive investigation of products, problems, and risks, your testing is a performance. No one else can do it just as you do it. No one can do again just what you’ve done before. You yourself will never do it the same way twice. If managers want people to do “your testing” when you’re not available, it might be more practical and powerful to think of it as “performing their investigation on something you’ve been investigating”.

Investigation is structured and can be guided, but good investigation can’t be scripted. That’s because in the course of a real investigation, you can’t be sure of what you’re going to find and how you’re going to respond to it. Checking can be algorithmic; the testing that surrounds and contains checking cannot.

Investigation can be influenced or guided by plenty of things that are alternatives to test cases:

Last time out, I mentioned almost all of these as things that testers could develop while learning about the product or feature. That’s not a coincidence. Testing happens in tangled loops and spirals of learning, analysis, exploration, experimentation, discovery, and investigation, all feeding back into each other. As testing proceeds, these artifacts and—more importantly—the learning they represent can be further developed, expanded, refined, overproduced, put aside, abandoned, recovered, revisited…

Testers can use artifacts of these kinds as evidence of testing that has been done, problems that have been found, and learning that has happened. Testers can include these artifacts in test reports, too.

But what if you’re in an environment where you have to produce test cases for auditors or regulators?

Good question. We’ll talk about that next time.

Breaking the Test Case Addiction (Part 1)

Tuesday, January 15th, 2019

Recently, during a coaching session, a tester was wrestling with something that was a mystery to her. She asked:

Why do some tech leaders (for example, CTOs, development managers, test managers, and test leads) jump straight to test cases when they want to provide traceability, share testing efforts with stakeholders, and share feature knowledge with testers?

I’m not sure. I fear that most of the time, fixation on test cases is simply due to ignorance. Many people literally don’t know any other way to think about testing, and have never bothered to try. Alarmingly, that seems to apply not only to leaders, but to testers, too. Much of the business of testing seems to limp along on mythology, folklore, and inertia.

Testing, as we’ve pointed out (many times), is not test cases; testing is a performance. Testing, as we’ve pointed out, is the process of learning about a product through exploration and experimentation, which includes to some degree questioning, studying, modeling, observation, inference, etc. You don’t need test cases for that.

The obsession with procedurally scripted test cases is painful to see, because a mandate to follow a script removes agency, turning the tester into a robot instead of an investigator. Overly formalized procedures run a serious risk of over-focusing testing and testers alike. As James Bach has said, “testing shouldn’t be too focused… unless you want to miss lots of bugs.”

There may be specific conditions, elements of the product, notions of quality, interactions with other products, that we’d like to examine during a test, or that might change the outcome of a test. Keeping track of these could be very important. Is a procedurally scripted test case the only way to keep track? The only way to guide the testing? The best way? A good way, even?

Let’s look at alternatives for addressing the leaders’ desires (traceability, shared knowledge of testing effort, shared feature knowledge).

Traceability. It seems to me that the usual goal of traceability is to be able to narrate and justify your testing by connecting test cases to requirements. From a positive perspective, it’s a good thing to make those connections, to help make sure that the tester isn’t wasting time on unimportant stuff.

On the other hand, testing isn’t only about confirming that the product is consistent with the requirements documents. Testing is about finding problems that matter to people. Among other things, that requires us to learn about things that the requirements documents get wrong or don’t discuss at all. If the requirements documents are incorrect or silent on a given point, “traceable” test cases won’t reveal problems reliably.

For that reason, we’ve proposed a more powerful alternative to traceability: test framing, which is the process of establishing and describing the logical connections between the outcome of the test at the bottom and the overarching mission of testing at the top.

Requirements documents and test cases may or may not appear in the chain of connections. That’s okay, as long as the tester is able to link the test with the testing mission explicitly. In a reasonable working environment, much of the time, the framing will be tacit. If you don’t believe that, pause for a moment and note how often test cases provide a set of instructions for the tester to follow, but don’t describe the motivation for the test, or the risk that informs it.

Some testers may not have sufficient skill to describe their test framing. If that’s so, giving test cases to those testers papers over that problem in an unhelpful and unsustainable way. A much better way to address the problem, I believe, would be to train and supervise the testers to be powerful, independent, reliable agents, with freedom to design their work and responsibility to negotiate it and account for it.

Sharing efforts with stakeholders. One key responsibility for a tester is to describe the testing work. Again, using procedurally scripted test cases seems to be a peculiar and limited means for describing what a tester does. The most important things that testers do happen inside their heads: modeling the product, studying it, observing it, making conjectures about it, analyzing risk, designing experiments… A collection of test cases, and an assertion that someone has completed them, don’t represent the thinking part of testing very well.

A test case doesn’t tell people much about your modeling and evaluation of risk. A suite of test cases doesn’t either, and typical test cases certainly don’t do so efficiently. A conversation, a list, an outline, a mind map, or a report would tend to be more fitting ways of talking about your risk models, or the processes by which you developed them.

Perhaps the worst aspect of using test cases to describe effort is that tests—performances of testing activity—become reified, turned into things, widgets, testburgers. Effort becomes recast in terms of counting test cases, which leads to no end of mischief.

If you want people to know what you’ve done, record and report on what you’ve done. Tell the testing story, which is not only about the status of the product, but also about how you performed the work, and what made it more and less valuable; harder or easier; slower or faster.

Sharing feature knowledge with testers. There are lots of ways for testers to learn about the product, and almost all of them would foster learning better than procedurally scripted test cases. Giving a tester a script tends to focus the tester on following the script, rather than learning about the product, how people might value it, and how value might be threatened.

If you want a tester to learn about a product (or feature) quickly, provide the tester with something to examine or interact with, and give the tester a mission. Try putting the tester in front of

  • the product to be tested (if that’s available)
  • an old version of the product (while you’re waiting for a newer one)
  • a prototype of the product (if there is one)
  • a comparable or competitive product or feature (if there is one)
  • a specification to be analyzed (or compared with the product, if it’s available)
  • a requirements document to be studied
  • a standard to review
  • a user story to be expanded upon
  • a tutorial to walk through
  • a user manual to digest
  • a diagram to be interpreted
  • a product manager to be interviewed
  • another tester to pair with
  • a domain expert to outline a business process

Give the tester the mission to learn something based on one or more of these things. Require the tester to take notes, and then to provide some additional evidence of what he or she learned.

(What if none of the listed items is available? If none of that is available, is any development work going on at all? If so, what is guiding the developers? Hint: it won’t be development cases!)

Perhaps some people are concerned not that there’s too little information, but too much. A corresponding worry might be that the available information is inconsistent. When important information about the product is missing, or unclear, or inconsistent, that’s a test result with important information about the project. Bugs breed in those omissions or inconsistencies.

What could be used as evidence that the tester learned something? Supplemented by the tester’s notes, the tester could

  • have a conversation with a test lead or test manager
  • provide a report on the activities the tester performed, and what the tester learned (that is, a test report)
  • produce a description of the product or feature, bugs and all (see The Honest Manual Writer Heuristic)
  • offer proposed revisions, expansions, or refinements of any of the artifacts listed above
  • identify a list of problems about the product that the tester encountered
  • develop a list of ways in which testers might identify inconsistencies between the product and something desirable (that is, a list of useful oracles)
  • report on a list of problems that the tester had in fulfilling the information mission
  • in a mind map, outline a set of ideas about how the tester might learn more about the product (that is, a test strategy)
  • list out a set of ideas about potential problems in the product (that is, a risk list)
  • develop a set of ideas about where to look for problems in the product (that is, a product coverage outline)

Then review the tester’s work. Provide feedback, coaching and mentoring. Offer praise where the tester has learned something well; course correction where the tester hasn’t. Testers will get a lot more from this interactive process than from following step-by-step instructions in a test case.

My coaching client had some more questions about test cases. We’ll get to those next time.

Oracles from the Inside Out, Part 5: Oracles as References as Media

Tuesday, September 15th, 2015

Try asking testers how they recognize problems. Many will respond that they compare the product to its specification, and when they see an inconsistency between the product and its specification, they report a bug. Others will talk about creating and running automated checks, using tools to compare output from the product to specific, pre-determined, expected results; when the product produces a result inconsistent with expectations, the check identifies a bug which the tester then reports to the developer or manager. It might be tempting to think of this as moving from the bottom right quadrant on this table to the bottom left.

Traditional talk about oracles refers almost exclusively to references. W.E. Howden, who introduced “oracle” as a term of testing art, described an oracle as “an external mechanism which can be used to check test output for correctness”. Yet thinking of oracles in terms of correctness leads to some pretty serious problems. (I’ve outlined some of them here).

In the Rapid Software Testing namespace, we take a different, broader view of oracles. Rather than focusing on correctness, we focus on problems: an oracle is a means by which we recognize a problem when we encounter one during testing. Checking for correctness, as Howden puts it, may severely limit our capacity to notice many kinds of problems. A product or service can be correct with respect to some principle, but have plenty of problems that aren’t identified by that principle; and a product can produce incorrect results without the incorrectness representing a problem for anyone. When testers fixate on documented requirements, there’s a risk that they will restrict their attention to looking for inconsistencies with specific claims; when testers fixate on automated checks, there’s a risk that they will restrict their focus to inconsistency with a comparable algorithm. Focus your attention too narrowly on a particular oracle—or a particular class of oracle—and you can be confident of one thing: you’ll miss lots of bugs.

Documents and tools are media. In the most general sense, “medium” is descriptive of something in between, like “small” and “large”. But “medium” as a noun, a medium, can be between lots of things. A communication medium like radio sits between performers and an audience; a psychic medium, so the claim goes, provides a bridge between a person and the spirit world; when people want to exchange things of value, they often use money as a medium for the exchange. Marshall McLuhan, an early and influential media theorist, said that a medium is anything that humans create or use to effect change. Media are tools, technologies that people use to extend, enhance, enable, accelerate, or intensify human capabilities. Extension is the most obvious and prominent effect of media. Most people think of media in terms of communications media. A medium can certainly be printed pages or television screens that enable messages to be conveyed from one person to another. McLuhan viewed the phonetic alphabet as a technology—a medium that extended the range of speech over great distances and accelerated its transmission. But a cup of coffee is a medium too; it extends alertness and wakefulness, and when consumed socially with others, it can extend conversation and friendliness. Media, placed between a product and our observation of it, extend our capacity to recognize bugs.

McLuhan emphasized that media change things in many different ways at the same time. In addition to extending or enabling or accelerating our capabilities, McLuhan said, every new medium obsolesces one or more existing media, grabbing our attention away from old things; every new medium retrieves notions of formerly obsolescent media, making old things new again. McLuhan used heat as a metaphor for the degree to which media require the involvement of the user; a “cool” medium like radio, he said, requires the listener to participate and fill in the missing pieces of the experience; a “hot” medium like a movie, provides stimulation to the ear and especially the eye, requiring less engagement from the viewer. Every medium, when “overheated” (McLuhan’s term for a medium that has been stretched or extended beyond its original or intended capacity), reverses into the opposite of what it might have been originally intended to accomplish. Socrates (and the King of Egypt) recognized that writing could extend memory, but could reverse into forgetfulness (see Plato’s dialogue Phaedrus). Coffee extends alertness and conversation, but too much of it and people become too wired to work and too addled to chat. A medium always draws attention to itself to some degree; an overheated medium may dazzle us so much that we begin to ignore what it contains or what we intended it to do for us. More importantly, a medium affects us. This is one of the implications of McLuhan’s famous but oblique statement “the medium is the message”. By “message”, he means “the change of scale or pace or pattern” that a new invention or innovation “introduces into human affairs.” (This explanation comes from Mark Federman, to whom I’m indebted for explaining McLuhan’s work to me over the years.)

When we pay attention, we can easily observe media overheating both in talk about testing and development work and in the work itself. Documents and tools frequently dominate conversations. In some organizations, a problem won’t be considered a bug unless it is inconsistent with an explicit statement in a specification or requirements document. Yet documents are only partial representations, subsets, of what people claim to have known or believed at some point in time, and times change. In some places, testing work is dominated by automated checking. Checks can be very valuable, providing great precision and fast feedback. But checks may focus on functional aspects of the product, and less on other parafunctional attributes.

McLuhan’s work emphasizes that media are essentially neutral, agnostic to our purposes. It is our engagement with media that produces good or bad outcomes—good and bad outcomes. Perhaps the most important implication of McLuhan’s work is that media amplify whatever we are. If we’re fabulous testers, our tools extend our capabilities, helping us to be even more fabulous. But if we’re incompetent, tools extend our incompetence, allowing us to do bad testing faster and worse than we’ve ever been able to do it before. To the degree that we are inclined to avoid conflict and arguments, we will use documents to help us avoid conflict and arguments; to the degree that we are inclined to welcome discussion and the refinement of ideas, then documents can help us do that. If we are disposed to be alert to a wide range of problems, automated checks will help us as we diversify our scope; if we are oblivious to certain kinds of problems in the product, automated checks will amplify our oblivion.

Reference oracles—documents, checking tools, representative data, comparable products—are unquestionably media, extending all of the other kinds of oracles: private and shared mental models, both private and shared feelings, conversations with others, and principles of consistency. How can we evaluate them? What do we use them for? And how can we use them to help us find problems without letting them overwhelm or displace all the other ways we might have of finding problems? That’s the subject of the next post.

Braiding The Stories (Test Reporting Part 2)

Friday, February 24th, 2012

We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay—an early version of one of the Black Box Test Machines. “How many test cases would you need to test this application?” he asked.

Just then Jerry Weinberg wandered into the room. “Ah! Jerry Weinberg!” said James. “One of the greatest testing experts in the world! He’ll know the answer to this one. How many test cases would you need to test this application, Jerry?”

Jerry looked at the screen for a moment. “Three,” he said, firmly and decisively.

James knew to play along. “Three?!“, he said, in a feigned combination of amazement, uncertainty, and curiosity. “How do you know it’s three? Is it really three, Jerry?”

“Yes,” said Jerry. “Three.” He paused, and then said drily, “Why? Were you expecting some other number?”

In yesterday’s post, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project. When I point out the mischief of pass vs. fail ratios, some people object. “In the real world,” they say, “we have to report pass vs. fail ratios to our managers, because that’s what they want.” Yet bogus reporting is antithetical to the “real world”. Pass vs. fail ratios come from the fake world, a world where numbers have magical properties to soothe troubled and uncertain souls. Still, there’s no question that managers want something. It’s our mandate to give them something of value.

Some people say that managers want numbers because they want to know that we’re measuring. I’ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond’s splendid paper “Software Engineering Metrics: What Do They Measure and How Do We Know?”: “Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” I think that’s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement. Pass vs. fail ratios fall down badly when they’re subjected to those tests.

Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want: “Measurement is the art and science of making reliable (and significant) observations.” (The main part of the definition comes from Quality Software Management, Vol. 2: First-Order Measurement; the parenthetical comes from recent correspondence over Twitter.) That’s a more general, inclusive definition. It incorporates Kaner and Bond’s notion of quantitative measurement, but it’s more welcoming to qualitative, first-order approaches. First-order measurement, as Jerry describes it, provides answers to questions like “What seems to be happening?” and “What should I do now?” It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It’s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order—less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.

My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information. “Immediate” here doesn’t only mean timely; it also means unmediated, without a bunch of stuff getting in between you and the observation. In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project. That knowledge requires more than abstract data; it requires information. So, as testers, how can we inform the decision-makers? In our Rapid Software Testing class, James Bach and I have lately taken to emphasizing this: We must learn to describe and report on the product, our testing, and the quality of our testing. This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid. Each line, or level, is its own story.

Level 1: Tell the product story. The product story is a qualitative report on how the product can work, how it fails, and how it might fail in ways that matter to our clients. “Working”, “failure”, and “what matters” are all qualitative evaluations. Quality is value to some person; in a business setting, quality is value to some person who matters to the business. A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.

Level 2: To make the product story credible, tell the testing story. The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives warrant to the product story; it helps our clients understand why they should believe and trust the product story we’re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied. Coverage is the extent to which we’ve tested the program; it’s about where we’ve looked and how we’ve looked, and it’s also about what’s uncovered—where we might not have looked yet, and where we don’t intend to look. Oracles are central to evaluation; they’re the principles and mechanisms that allow us to recognize a problem. The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious the problems might be. We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner. The product story and our clients’ reactions to it will influence the ongoing testing story, and vice versa.

Level 3: To make the testing story credible, tell a story about the quality of the testing. Just as the product story needs warrant, so too does the testing story. To tell a story about the quality of testing requires us to describe why the testing we’ve done has been good enough, and why the testing we haven’t done hasn’t been so important so far. The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information. The quality-of-testing story will shape and be shaped by the other two stories.

Develop skills to tell and frame stories. People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are “efficient”. I think they mean “fast”, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail…

Don’t think data feed; think the daily news. Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we’re interested, such that we get the level of detail that we seek. Some of those strategies include:

  • Headlines. A quick glance over each page tells us immediately what, in the editors’ judgement, are the most salient aspects of any given story. Headlines come in different sizes, relative to the editors’ assessment of the importance of the story.
  • Front page. The paper comes folded. The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.
  • Continuation. When an entire story is too long to fit on the front page, it’s abbreviated and the story continues elsewhere. This gives the reader the option of following the story or looking at other items on the front page.
  • Coverage areas. The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth). Each section comes with its own front page, which generally includes headlines and continuations of its own.
  • Structured storytelling. Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the inverted pyramid (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions—who, what, where, why, and when, plus how—and the story builds from there. The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she’s satisfied.
  • Identifying who is involved and who is affected. Reporters and editors contextualize their stories. Just as in testing, people are the most important element of the context. A story is far more compelling when it affects the reader or people that the reader cares about. A good story often helps to clarify why the reader should care.
  • Varying approaches to delivering information. Newspapers often use a picture to help illustrate or emphasize an important aspect of a story. In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories—first-order reports—are always given greater prominence than the tables of stock quotes, league standings, and line scores.
  • Sidebars. Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.
  • Daily (and in the world of the Web, continuous) delivery of information. My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle. The paper’s Web site is updated on a continuous basis. Information is available both on a supply and a demand basis; both when I expect it and when I seek it.
  • Identifiable sources. Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained. This helps to set up degrees of trust and skepticism in the reader.

One important note: These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but to oral discourse.

I’ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t have them. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be considered good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.
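To make the PEW heuristic concrete, here’s a rough sketch in code. The class, field names, and sample report are illustrative inventions of the moment, not part of any tool or template:

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    """A bug report examined against the PEW heuristic:
    a Problem, an Example, and Why it's a problem (the oracle)."""
    problem: str   # a description of the problem
    example: str   # something that shows the problem
    why: str       # the oracle: the principle by which we recognize the problem

    def follows_pew(self) -> bool:
        # A report follows PEW when all three elements are present.
        return all(s.strip() for s in (self.problem, self.example, self.why))

report = BugReport(
    problem="Cart total differs from the sum of the line items",
    example="Add items priced 19.99 and 5.01; cart shows 25.01",
    why="Inconsistent with arithmetic and with the invoice page",
)
print(report.follows_pew())  # True
```

The code isn’t the point; the point is that a report missing any of the three elements is incomplete, whatever form the report takes.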

You can argue that a good tester should include the needed information and steps in her bug report, but this raises (at least) two problems:

– The same information may be duplicated across many bugs, and even worse, it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you post above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seem obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve them with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but do not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out, are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, sometimes it involves behaving like an atypical customer, and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify it especially for tests involving many short trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?

And would scripts solve that problem when (for example) you’re recording a bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might be important if your test activity might be transferred to another offshore team someday (this happened to me a few weeks ago; I sent them a test document with only high-level details and hoped for the best), or your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s; we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex, cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a minuscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.

What Exploratory Testing Is Not (Part 5): Undocumented Testing

Wednesday, December 21st, 2011

This week I had the great misfortune of reading yet another article which makes the false and ridiculous claim that exploratory testing is “undocumented”. After years and years of plenty of people talking about and writing about and practicing excellent documentation as part of an exploratory testing approach, it’s depressing to see that there are still people shovelling fresh manure onto a pile that should have been carted off years ago.

Like the other approaches to test activities that have been discussed in this series (“touring“, “after-everything-else“, “tool-free“, and “quick testing“), “documented vs. undocumented” is in a category orthogonal to “exploratory vs. scripted”. True: usually scripted activities are performed by some agency following a set of instructions that has been written down somewhere. But we could choose to think of “scripted” in a slightly different and more expansive way, as “prescriptive”, or “mimeomorphic“. A scripted activity, in this sense, is one for which the actions to be performed have been established in advance, and the choices of the actions are not determined by the agency performing them. In that sense, a cook at McDonalds doesn’t read a script as he prepares your burger, but the preparation of a McDonald’s burger is a highly scripted activity.

Thus any kind of testing can be heavily documented or completely undocumented. A thoroughly documented test might be highly exploratory in nature, or it might be highly scripted.

In the Rapid Software Testing class, James Bach and I point out that when someone says “that should be documented”, what they’re really saying is “that should be documented if and how and when it serves our purposes.” So, let’s start by looking at the “when”.

When we question anything in order to evaluate it, there are moments in the process in which we might choose to record ideas or actions. I’ve broken these down into three basic categories that I hope you find helpful:

  • Before

  • During

  • After

There are “before”, “during”, and “after” moments with respect to any test activity, whether it’s a part of test design, test execution, result interpretation, or learning. Again, a hallmark of exploratory testing is the tester’s freedom and responsibility to optimize the value of the work as it’s happening. That means that when it’s important to record something, the tester is not only welcome but encouraged to

  • pick up a pen
  • take a screen shot
  • launch a session of Rapid Reporter
  • create or update a mind map
  • fire up a screen recorder
  • initiate logging (if it doesn’t start by default on the product you’re testing—and if logging isn’t available, you might consider identifying that as a testability problem and a related product and project risk)
  • sketch out a flowchart diagram
  • type notes into a private or shared repository
  • add to a table of data in Excel
  • fire off a note to a programmer or a product owner
and that’s an incomplete list. But they’re all forms of documentation.

Freedom to document at will should also mean that the tester is free to refrain from documenting something when the documentation doesn’t add value. At the same time, the tester is responsible and accountable for that decision. In Rapid Testing, we recommend writing down (or saving, or illustrating) only the things that are necessary or valuable to the project, and only when the value of doing so exceeds the cost. This doesn’t mean no documentation; it means the most informative yet fastest and least expensive documentation that completely fulfils the testing mission. Integrating that with testing work leads, we hold, to excellent testing—but it takes practice and skill.

For most test activities, it’s possible to relay information to other people orally, or even sometimes by allowing people to observe our behaviour. (At the beginning of the Rapid Testing class, I sometimes silently hold aloft a 5″ x 8″ index card in landscape orientation. I fold it in half along the horizontal axis, and write my first name on one side using a coloured marker. Everyone in the class mimics my actions. Without a single word of instruction being given or questions being asked, either verbally or in writing, the mission has been accomplished: each person now has a tent card in front of him.)

There’s a potential risk associated with an exploratory approach: that the tester might fail to document something important. In that case, we do what skilled people do with risk: we manage it. James Bach talks at length about managing exploratory testing sessions here. Producing appropriate documentation is partly a technical process, but the technical considerations are dominated by business imperatives: cost, value, and risk. There are social considerations, too. The tester, the test lead, the test manager, the programmers, other managers, and the product owner determine collaboratively what’s important to document and what’s not so important with respect to the current testing mission. In an exploratory approach, we’re more likely to be emphasizing the discovery of new information. So we’re less likely to spend time on documenting what we will do, and more likely to document what we are doing and what we have done. We could do a good deal of preparatory reading and writing, even in an exploratory approach—but we realize that there’s an ever-increasing risk that new discoveries will undermine the worth of what we write ahead of time.

That leads directly to “our purposes”, the task that we want to accomplish when documenting something. Just as testing itself has many possible missions, so too does test documentation. Here’s a decidedly non-exhaustive list, prepared over a couple of minutes:

  • to express testing strategy and tactics for an entire project, or for projects in general
  • to keep a set of personal notes to help structure a debriefing conversation
  • to outline testing activities for a test cycle
  • to report on activities during testing execution
  • to outline attributes of a particular quality criterion
  • to catalogue ideas about risk
  • to describe test coverage
  • to account for the work that we’ve done
  • to program a machine to perform a given set of actions
  • to alert people to potential problems in the product
  • to guide a tester’s actions over a test session
  • to identify structures in the application or service
  • to provide a description of how to use a particular test tool that we’ve crafted
  • to describe the tester’s role, skills, and qualifications
  • to explain business rules to someone else on the team
  • to outline scenarios in which the product might be used or tested
  • to identify, for a tester, a specific, explicit sequence of actions to perform, input to provide, and observations to make

That last item is the classic form of highly scripted testing, and that kind of documentation is usually absent from exploratory testing. Even so, a tester can take an exploratory approach using a script as a point of departure or as a reference, just as you might use a trail map to help guide an off-trail hike (among other things, you might want to discover shortcuts or avoid the usual pathways). So when someone says that “exploratory testing is undocumented”, I hear them saying something else. I hear them saying, “I only understand one form of test documentation, and I’ve successfully ignored every other approach to it or purpose for it.”

If you look in the appendices for the Rapid Software Testing class (you can find a .PDF at http://www.satisfice.com/rst-appendices.pdf), you’ll see a large number of examples of documentation that are entirely consistent with an exploratory approach. That’s just one source. For each item in my partial list above, here’s a partial list of approaches, examples, and tools.

Testing strategy and tactics for an entire project, or for projects in general.
Look at the Satisfice Heuristic Test Strategy Model and the Context Model for Heuristic Test Planning (these also appear in the RST Appendices).

An outline of testing activities for a test cycle.
Look at the General Functionality and Stability Test Procedure for Certified for Microsoft Windows Logo. See also the OWL Quality Plan (and the Risk and Task Correlation) in the RST Appendices.

Keeping a set of personal notes to help structure a debriefing or other conversation.
See the “Beans ‘R Us Test Report” in the RST Appendices; or see my notes on testing an in-flight entertainment system which I did for fun on a flight from India to Amsterdam.

Recording activities and ideas during test execution
A video camera or a screen recording tool can capture the specific actions of a tester for later playback and review. Well-designed log files may also provide a kind of retrospective record of what was tested. Still, neither of these provides insight into the tester’s mind. Recorded narration or conversation can do that; tools like BB Test Assistant, Camtasia, or Morae can help. The classic approach, of course, is to take notes. Have a look at my presentation, “An Exploratory Tester’s Notebook“, which has examples of freestyle notes taken during an impromptu testing session, and detailed, annotated examples of Session-Based Test Management sessions. Shmuel Gerson’s Rapid Reporter and Jonathan Kohl’s Session Tester are tools oriented towards taking notes (and, in the former case, including screen captures) of testing sessions on the fly.

Outlining many attributes of a particular quality criterion
See “Heuristics of Software Testability” in the RST Appendices for one example.

Cataloguing ideas about risk
Several examples of this in the RST Appendices, most extensively in the “Deployment Planning and Risk Analysis” example. You’ll also find an “Install Risk Catalog”; “The Risk of Incompatibility”; the Risk vs. Tasks section in the “OWL Quality Plan”; the “Y2K Compliance Report”; “Round Results Risk A”, which shows a mapping of Risk Areas vs. Test Strategy and Tasks.

Describing or outlining test coverage
A mapping establishes or illustrates relationships between things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories; we can use any of these to help us think about test coverage. These can be constructed before, after, or during a given test activity, with the goal of covering the map with tests, or using testing to extend the map. I catalogued several ways of thinking about coverage and reporting on it in three articles: Got You Covered, Cover or Discover, and A Map By Any Other Name. Several examples of lightweight coverage outlines can be found in the RST Appendices (“Putt Putt Saves the Zoo” and “Table Formatting Test Notes”). There are also coverage ideas incorporated into the Apollo mission notes that we’ve titled “Guideword Heuristics for Astronauts”.

Accounting for testing work that we’ve done.
See Session-Based Test Management, and see “An Exploratory Tester’s Notebook“. Darren McMillan provides excellent examples of annotated mind maps; scroll down to the section headed “Session Reports”, and continue through “Simplifying feedback to management” and “Simplifying feedback to groups”. A forthcoming article, written by me, shows how a senior test manager tracks testing sessions at a half-day granularity level.

Programming a machine to help you to explore
See all manner of books on programming, both references and cookbooks, but for testers in particular, have a look at Brian Marick’s Everyday Scripting with Ruby. Check out Pete Houghton’s splendid examples of exploratory test automation that begin here. Cem Kaner (often in collaboration with Doug Hoffman) writes extensively about automation-assisted exploratory testing; an example is here.
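Here’s a minimal sketch of that idea, with a toy function standing in for the product and a consistency oracle of my own choosing (nothing below is drawn from Kaner’s or Houghton’s examples): hammer the function with random inputs, and let the oracle flag surprises for a human to investigate.

```python
import random

def product_round_trip(value: float) -> float:
    """Toy stand-in for the product under test: format a price as text,
    then parse it back. The deliberate bug: truncation instead of rounding."""
    text = "%.2f" % (int(value * 100) / 100.0)  # truncates the cents
    return float(text)

# Exploratory harness: generate random prices and record every case where
# the round trip fails the oracle (the value should survive to the cent).
random.seed(1)
surprises = []
for _ in range(10_000):
    value = round(random.uniform(0, 100), 2)
    result = product_round_trip(value)
    if abs(result - value) > 0.005:  # oracle: off by more than half a cent
        surprises.append((value, result))

print(len(surprises), "surprising results to investigate")
```

Note that the harness doesn’t declare the product passed or failed; it surfaces candidates for investigation. The evaluation—deciding whether each surprise matters, and to whom—remains with the tester.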

Alerting people to potential problems in the product
In general, bug reporting systems provide one way to handle the task of recording and reporting problems in the product. James Bach provides an example of a report that he provided to a client (along with a more informal account of the session).

Guiding a tester’s actions over a test session
Guiding a tester involves skills like chartering and checklisting. Start with the documentation on Session Based Test Management (http://www.satisfice.com/sbtm). Selena Delesie has produced an excellent blog post on chartering exploratory testing sessions. The title of Cem Kaner’s presentation at CAST 2008, The Value of Checklists and the Danger of Scripts: What legal training suggests for testers describes the content perfectly. Michael Hunter’s You Are Not Done Yet lists can be used and adapted to your context as a set of checklists.

To identify structures in the application or service
The “Product Elements” section in the Heuristic Test Strategy Model provides a kind of framework for documenting product structures. In the RST Appendices, the test notes for “Putt Putt Saves the Zoo” and “Diskmapper”, and the “OWL Quality Plan” provide examples of identifying several different structures in the programs under test. Mind mapping provides a means of describing and illustrating structures, too; see Darren McMillan’s examples here and here. Ruud Cox and Ru Cindrea used a mind map of product elements to help win the Best Bug Report award in the Test Lab at EuroSTAR 2011. I’ve created a list of structures that support exploratory testing, and many of these are related to structures in the product.

Providing a description of how to use a particular test tool that we’ve crafted
While working at a bank, I developed (in Excel and VBA) a tool that could be used as an oracle and as a way of recording test results. (Thanks to non-disclosure agreements, I can describe the tool, but cannot provide examples.) When I left the project, I was obliged to document my work. I didn’t work on the assumption that anyone off the street would be reading the document. Instead, I presumed that anyone assigned to that testing job and to using that tool would have the rapid learning skill to explore the tool, the product, and the business domain in a mutually supportive way. So I crafted documentation that was intended to tell testers just enough to get them exploring.
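The tool itself is under NDA, but the general shape of an oracle-plus-recorder is easy to sketch. In this illustration (the fee calculation, account figures, and function names are invented, not the bank’s), a hypothetical independent calculation serves as the oracle, and every comparison is logged so the session leaves a record:

```python
import csv
from datetime import datetime, timezone

def oracle_fee(notional, rate_bps):
    """Hypothetical independent calculation of a fee, used as an oracle."""
    return round(notional * rate_bps / 10_000, 2)

def check_and_record(cases, product_fee, log_path="session_log.csv"):
    """Compare the product's answer to the oracle's; record every result."""
    results = []
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "notional", "rate_bps",
                         "expected", "actual", "verdict"])
        for notional, rate_bps in cases:
            expected = oracle_fee(notional, rate_bps)
            actual = product_fee(notional, rate_bps)   # the product under test
            verdict = "PASS" if expected == actual else "FAIL"
            writer.writerow([datetime.now(timezone.utc).isoformat(),
                             notional, rate_bps, expected, actual, verdict])
            results.append(verdict)
    return results
```

The log file is the kind of artifact that exploratory work can leave behind as a matter of course; the oracle is where the domain learning gets encoded.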

Explaining business rules to someone else on the team
I did include documentation for novices of one kind: within the documentation for that testing tool, I included a general description of how foreign exchange transactions worked from the bank’s perspective, and how appropriate accounts got credited and debited. I had learned this by reverse-engineering use cases and consulting with the local business analyst. I summarized it with a two-page document written in simple, direct language, referring directly to the simpler use cases and explaining the more confusing bits in more detail. For those whose learning style was oriented toward code, I also described the tables and array formulas that applied the business rules.
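As a rough illustration of the kind of rule that two-page document summarized (the account names and conventions here are invented, not the bank’s): a simple spot FX deal credits the customer’s account in the currency being bought and debits it in the currency being sold, with the bank’s positions moving the other way, so that each currency’s entries balance.

```python
def post_fx_deal(buy_ccy, sell_ccy, buy_amount, rate):
    """Produce debit/credit entries for a simplified spot FX deal.

    The customer buys `buy_amount` of `buy_ccy`, paying in `sell_ccy`
    at `rate` (units of sell_ccy per unit of buy_ccy). Account names
    and conventions are illustrative only.
    """
    sell_amount = round(buy_amount * rate, 2)
    return [
        # Customer receives the bought currency, pays the sold currency.
        {"account": f"customer:{buy_ccy}",  "debit": 0.0,         "credit": buy_amount},
        {"account": f"customer:{sell_ccy}", "debit": sell_amount, "credit": 0.0},
        # The bank's positions mirror the customer's.
        {"account": f"bank:{buy_ccy}",      "debit": buy_amount,  "credit": 0.0},
        {"account": f"bank:{sell_ccy}",     "debit": 0.0,         "credit": sell_amount},
    ]
```

A tester who has internalized the balancing rule can check any deal the product posts, whether or not a script anticipated that particular deal.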

Outlining scenarios in which the product might be used or tested
I discuss some issues about scenarios here—why they’re important, and why it’s important to keep them open-ended and open to interpretation. It’s more important to record than to prescribe, since in a good scenario, you’ll observe and discover much more than you’ve articulated in advance. Cem Kaner gives ideas on how to produce scenarios; Hans Buwalda presents examples of soap opera testing.

Identifying required tester skill
People with skill don’t need prescriptive documentation for every little thing. Responsible managers identify the skills needed to test, and commit to employing people who either have those skills or can develop them quickly. James Bach eliminated 50 pages of otiose documentation with two paragraphs. (Otiose is a marvelous word; it’s fun to look it up in a thesaurus.)

Identifying, for a tester, a particular explicit sequence of actions to perform, input to provide, and observations to make
Again, a document that attempts to specify exactly what a tester should do is the hallmark of scripted testing. James Bach articulates a paradox that has not yet been noted clearly in our craft: in order to perform a scripted test well, you need significant amounts of skill and tacit knowledge (and you also need to ignore the script on occasion, and you need to know when those occasions are). There’s another interesting issue here: preparing such documents usually depends on exploratory activity. There’s no script to tell you how to write a script. (You might argue there’s one exception. You can follow this script to write a test script: take each line of a requirements document, and add the words “Verify that” to the beginning of each line.)
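That one “exception” is so mechanical that a machine can follow it, which is rather the point. A few lines of Python (a sketch, of course, not a recommendation) will “write” such a script from any requirements document:

```python
def requirements_to_script(requirements_text):
    """Mechanically turn each non-empty requirement line into a 'test step'.

    No testing skill, product knowledge, or judgment required --
    which tells you something about the value of the output.
    """
    return [f"Verify that {line.strip()}"
            for line in requirements_text.splitlines()
            if line.strip()]
```

If a script can be generated this way, the thinking that matters clearly hasn’t happened yet; it happens when a skilled tester interprets, questions, and goes beyond each line.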

Now, just as you can perform testing badly using any approach, you can perform exploratory testing and document it inappropriately, either by under-documenting it OR over-documenting it using any of the kinds of documentation above. But, as this document shows, the notion that exploratory testing is by its nature undocumented is not only ignorant, but aggressively ignorant about both testing and documentation. Whenever you see someone claim that exploratory testing is undocumented, I’d ask you to help by setting the record straight. Feel free to refer to this blog post, if you find it helpful; also, please point me to other exemplars of excellent documentation that are consistent with exploratory approaches. If we all work together, we can bury this myth, while providing excellent records and reports for our clients.