Blog Posts for the ‘Test Framing’ Category

Breaking the Test Case Addiction (Part 6)

Tuesday, February 5th, 2019

In the last installment, we ended by asking “Once the tester has learned something about the product, how can you focus a tester’s work without over-focusing it?

I provided some examples in Part 4 of this series. Here’s another: scenario testing. The examples I’ll provide here are based on work done by James Bach and Geordie Keitt several years ago. (I’ve helped several other organizations apply this approach much more recently, but they’re less willing to share details.)

The idea is to use scenarios to guide the tester to explore, experiment, and get experience with the product, acting on ideas about real-world use and about how the product might foreseeably be misused. It’s nice to believe that careful designs, unit testing, BDD, and automated checking will prevent bugs in the product — as they certainly help to do — but to paraphrase Gertrude Stein, experience teaches experience teaches. Pardon my words, but if you want to discover problems that people will encounter in using the product, it might help to use the damned product.

The scenario approach that James and Geordie developed uses richer, more elaborate documentation than the one- to three-sentence charters of session-based test management. One goal is to prompt the tester to perform certain kinds of actions to obtain specific kinds of coverage, especially operational coverage. Another goal is to make the tester’s mission more explicit and legible for managers and the rest of the team.

Preparing for scenario testing involves learning about the product using artifacts, conversations, and preliminary forms of test activity (I’ve given examples throughout this series, but especially in Part 1). That work leads into developing and refining the scenarios to cover the product with testing.

Scenarios are typically based around user roles, representing people who might use the product in particular ways. Create at least a handful of them. Identify specifics about them, certainly about the jobs they do and the tasks they perform. You might also want to incorporate personal details about their lives, personalities, temperaments, and conditions under which they might be using the product.

(Some people refer to user roles as “personas”, as the examples below do. A word of caution over a potential namespace clash: what you’ll see below is a relatively lightweight notion of “persona”. Alan Cooper has a different one, which he articulated for design purposes, richer and more elaborate than what you’ll see here. You might seriously consider reading his books in any case, especially About Face (with Reimann, Cronin, and Noessel) and the older The Inmates are Running the Asylum.)

Consider not only a variety of roles, but a variety of experience levels within the roles. People may be new to our product; they may be new to the business domain in which our product is situated; or both. New users may be well or poorly trained, subject to constant scrutiny or not being observed at all. Other users might be expert in past versions of our products, and be irritated or confused by changes we’ve made.

Outline realistic work that people do within their roles. Identify specific tasks that they might want to accomplish, and look for things that might cause problems for them or for people affected by the product. Problems might take the form of harm, loss, or diminished value to some person who matters. Problems might also include feelings like confusion, irritation, frustration, or annoyance.

Remember that use cases or user stories typically omit lots of real-life activity. People are often inattentive, careless, distractable, under pressure. People answer instant messages, look things up on the web, cut and paste stuff between applications. They go outside, ride in elevators, get on airplanes and lose access to the internet; things that we all do every day that we don’t notice. And, very occasionally, they’re actively malicious.

Our product may be a participant in a system, or linked to other products via interfaces or add-ins or APIs. At very least, our product depends on platform elements: the hardware upon which it runs; peripherals to which it might be connected, like networks, printers, or other devices; application frameworks and libraries from outside our organization; frameworks and libraries that we developed in-house, but that are not within the scope of our current project.

Apropos of all this, the design of a set of scenarios includes activity patterns or moves that a tester might make during testing:

  • Assuming the role or persona of a particular user, and performing tasks that the user might reasonably perform.
  • Considering people who are new to the product and/or the domain in which the product operates (testing for problems with ease of learning)
  • Considering people who have substantial experience with the product (testing for problems with ease of use).
  • Deliberately making foreseeable mistakes that a user in a given role might make (testing for problems due to plausible errors).
  • Using lots of functions and features of the product in realistic but increasingly elaborate ways, and that trigger complex interactions between functions.
  • Working with records, objects, or other data elements to cover their entire lifespan: creating, revising, refining, retrieving, viewing, updating, merging, splitting, deleting, recovering… and thereby…
  • Developing rich, complex sets of data for experimentation over periods longer than single sessions.
  • Simulating turbulence or friction that a user might encounter: interruptions, distractions, obstacles, branching and backtracking, aborting processes in mid-stream, system updates, closing the laptop lid, going through a train tunnel…
  • Working with multiple instances of the product, tools, and/or multiple testers to introduce competition, contention, and conflict in accessing particular data items or resources.
  • Giving the product to different peripherals, running it on different hardware and software platforms, connecting it to interacting applications, working in multiple languages (yes, we do that here in Canada).
  • Reproducing behaviours or workflows from comparable or competing products.
  • Considering not only the people using the product, but the people that interact with them; their customers, clients, network support people, tech support people, or managers.

To put these ideas to work at ProChain (a company that produces project management software), James and Geordie developed a scenario playbook.
Let’s look at some examples from it.

The first exhibit is a one-page document that outlines the general protocol for setting up scenario sessions.

PCE Scenario Testing Setup
PCE Scenario Testing General Setup Sheet

This document is an overview that applies to every sessions. It is designed primarily to give managers and supporting testers a brief overview of the process and and how it should be carried out. (A supporting tester is someone who is not a full-time tester, but is performing testing under the guidance and supervision of a responsible tester — an experienced tester, test lead, or a test manager. A responsible tester is expected to have learned and internalized the instructions on this sheet.) There are general notes here for setting up and patterns of activities to be performed during the session.

Testers should be familiar with oracles by which we recognize problems, or should learn about oracles quickly. When this document was developed, there was a list of patterns of consistency with the mnemonic acronym HICCUPP; that’s now FEW HICCUPPS. For any given charter, there may be specific consistency patterns, artifacts, documents, tools, or mechanisms to apply that can help the tester to notice and describe problems.

Here’s an example of a charter for a specific testing mission:

PCE Scenario Testing Example Charter 1
PCE Scenario Testing Example Charter 1

The Theme section outlines the general purpose of the session, as a one- to three- line charter would in session-based test management. The Setup section identifies anything that should be done specifically for this session.

Note that the Activities section offers suggestions that are both specific and open. Openness helps to encourage variation that broadens coverage and helps to keep the tester engaged (“For some tasks…”; “…in some way,…”). The specificity helps to focus coverage (“set the task filter to show at least…”; the list of different ways to update tasks).

The Oracles section identifies specific ways for the tester to look for problems, in addition to more general oracle principles and mechanisms. The Variations section prompts the tester to try ideas that will introduce turbulence, increase stress, or cover more test conditions.

A debrief and a review of the tester’s notes after the session helps to make sure that the tester obtained reasonable coverage.

Here’s another example from the same project:

Here the tester is being given a different role, which requires a different set of access rights and a different set of tasks. In the Activities and Variations section, the tester is encouraged to explore and to put the system into states that cause conflicts and contention for resources.

Creating session sheets like these can be a lot more fun and less tedious than typing out instructions in formally procedurally scripted test cases. Because they focus on themes and test ideas, rather than specific test conditions, the sheets are more compact and easier to review and maintain. If there are specific functions, conditions, or data values that must be checked, they can be noted directly on the sheet — or kept separately with a reference to them in the sheet.

The sheets provide plenty of guidance to the tester while giving him or her freedom to vary the details during the session. Since the tester has a general mission to investigate the product, but not a script to follow, he or she is also encouraged and empowered to follow up on anything that looks unusual or improper. All this helps to keep the tester engaged, and prevents him or her from being hypnotized by a script full of someone else’s ideas.

You can find more details on the development of the scenarios in the section “PCE Scenario Testing” in the Rapid Software Testing Appendices.

Back in our coaching session, Frieda once again picked up the role of the test-case-fixated manager. “If we don’t give them test cases, then there’s nothing to look at when they’re done? How will we know for sure what the tester has covered?”

It might seem as though a list of test cases with check marks beside them would solve the accountability problem — but would it? If you don’t trust a tester to perform testing without a script, can you really trust him to perform testing with one?

There are lots of ways to record testing work: the tester’s personal notes or SBTM session sheets, check marks and annotations on requirements and other artifacts, application log files, snapshot tools, video recording… Combine these supporting materials with a quick debriefing to make sure that the tester is working in professional way and getting the job done. If the tester is new, or a supporting tester, increase training, personal supervision and feedback until he or she gains your trust. And if you still can’t bring yourself to trust them, you probably shouldn’t have them testing for you at all.

Frieda, still in character, replied “Hmmm… I’d like to know more about debriefing.” Next time!

Breaking the Test Case Addiction (Part 5)

Tuesday, January 29th, 2019

In our coaching session (which started here), Frieda was still playing the part of a manager who was fixated on test cases—and doing it very well. She played a typical management card: “What about learning about the product? Aren’t test cases a good way to do that?”

In Rapid Software Testing, we say that testing is evaluating a product by learning about it through exploration and experimentation, which includes questioning, modeling, studying, manipulating, making inferences, etc. So learning is an essential part of testing. There are lots of artifacts and people that testers could interact with to start learning about the product, which I’ve discussed already. Let’s look at why making a tester work through test cases might not be such a good approach.

Though test cases are touted as a means of learning about the product, my personal experience is that they’re not very helpful at all for that purpose. Have you ever driven somewhere, being guided by a list of instructions from Google Maps, synthesized speech from a navigation system, or even spoken instructions from another person? My experience is that having someone else direct my actions disconnects me from wayfinding and sensemaking. When I get to my destination, I’m not sure how I got there, and I’m not sure I could find my way back.

If I want to learn something and have it stick, a significant part of my learning must be self-guided. From time to time, I must make sense of where I’ve been, where I am, and where I’m going. I must experience some degree of confusion and little obstacles along the way. I must notice things that are interesting and important to me that I can connect to the journey. I must have the freedom to make and correct little mistakes.

Following detailed instructions might aid in accomplishing certain kinds of tasks efficiently. However, following instructions can get in the way of learning something, and the primary mission of testing is to learn about the product and its status.

You could change the assignment by challenging the tester to walk through a set of test cases to find problems in them, or to try to divine the motivation for them, and that may generate some useful insights.

But if you really want testers to learn about the product, here’s how I’d do it: give them a mission to learn about the product. Today we’ll look at instances of learning missions that you can apply early in the tester’s engagement or your own. Such missions tend to be broad and open, and less targeted towards specific risks and problems than they might be later. I’ll provide a few examples, with comments after each one.

“Interview the product manager about the new feature. Identify three to six user roles, and (in addition to your other notes) create sketches or whiteboard diagrams of some common instances of how they might use the feature. In your conversation, raise and discuss the possibility of obstacles or interruptions that might impede the workflow. Take notes and photos.”

As the principles of context-driven testing note, the product is a solution. If the problem isn’t solved, the product doesn’t work. When the product poses new problems, it might not be working either from the customer’s perspective.

“Attend the planning session for the new feature. Ask for descriptions of what we’re building; who we’re building it for; what kind of problems they might experience; and how we would recognize them as problems. Raise questions periodically about testability. Take minutes of the discussions in the meeting.”

Planning meetings tend to be focused on envisioning success; on intention. Those meetings present opportunities to talk anticipating failure; on how we or the customer might not achieve our goals, or might encounter problems. Planning a product involves planning ways of noticing how it might go wrong, too.

“Perform a walkthrough of this component’s functionality with a developer or a senior tester. Gather instances of functions in the product, or data that it processes, that might represent exceptions or extremes. Collect sets of ideas for test conditions that might trigger extreme or exceptional behaviour, or that might put the product in an unstable state. Create a risk list, with particular focus on threats to capability, reliability, and data integrity that might lead to functional errors or data loss.”

In Rapid Software Testing parlance, a test condition is something that can be examined during a test, or something that might change the outcome of a test. It seems to me that when people use formalized procedural test cases, often their intention is to examine particular test conditions. However, those conditions can be collected and examined using many different kinds of artifacts: tables, lists, annotated diagrams or flowcharts, mind maps…

“Review the specification for the product with the writer of the user manual. In addition to any notes or diagrams that you keep, code the contents of the specification. (Note: “code” is used here in the sense used in qualitative research; not in the sense of writing computer code.) That is, for each numbered paragraph, try to identify at least one and up to three quality criteria that are explicitly or implicitly mentioned. Collate the results and look for quality criteria that are barely mentioned or missing altogether, and be on the lookout for mysterious silences.”

There’s a common misconception about testing: that testers look for inconsistencies between the product and a description of the product, and that’s all. But excellent testers look at the product, at descriptions of the product, and at intentions for the product, and seek inconsistencies between all of those things. Many of our intentions are tacit, not explicit. Note also that the designer’s model of the user’s task may be significantly different from the user’s model.

Notice that each example above includes an information mission. Each one includes a mandate to produce specific, reviewable artifacts, so that the tester’s learning can be evaluated with conversation and documented evidence. Debriefing and relating learning to others is an important part of testing in general, and session-based test management in particular.

Each example also involves collaboration with other people on the team, so that inconsistencies between perspectives can be identified and discussed. And notice: these are examples. They are not templates to be followed. It’s important that you develop your own missions, suited to the context in which you’re working.

At early stages of the tester’s engagement, finding problems is not the focus. Learning is. Nonetheless, as one beneficial side effect, the learning may reveal some errors or inconsistencies before they can turn into bugs in the product. As another benefit, testers and teams can collect ideas for product and project risk lists. Finally, the learning might reveal test conditions that can usefully be checked with tools, or that might be important to verify via explicit procedures.

Back to the coaching session. “Sometimes managers say that it’s important to give testers explicit instructions when we’re dealing with an offshore team whose first language is not English”, said Frieda.

Would test cases really make that problem go away? Presumably the test cases and the product would be written in English too. If the testers don’t understand English well, then they’ll scarcely be able to read the test cases well, or to comprehend the requirements or the standards, or to understand what the product is trying to tell them through its (presumably also English) user interface.

Maybe the product and the surrounding artifacts are translated from English into the testers’ native language. That addresses one kind of problem, but introduces a new one: requirements and specifications and designs and jargon routinely get misinterpreted even when everyone is working in English. When that material is translated, some meaning is inevitably changed or lost in translation. All of these problems will need attention and management.

If a product does something important, presumably there’s a risk of important problems, many of which will be unanticipated by test cases. Wouldn’t it be a good idea to have skilled testers learn the product reasonably rapidly but also deeply to prepare them to seek and recognize problems that matter?

When testers are up and running on a project, there are several approaches towards focusing their work without over-focusing it. I’ve mentioned a few already. We’ll look at another one of those next.

Breaking the Test Case Addiction (Part 4)

Monday, January 21st, 2019

Note: this post is long from the perspective of the kitten-like attention spans that modern social media tends to encourage. Fear not. Reading it could help you to recognize how you might save you hours, weeks, months of excess and unnecessary work, especially if you’re working as a tester or manager in a regulated environment.

Testers frequently face problems associated with excessive emphasis on formal, procedurally scripted testing. Politics, bureaucracy, and paperwork combine with fixation on test cases. Project managers and internal auditors mandate test cases structured and written in a certain form “because FDA”. When someone tells you this, it’s a pretty good indication that they haven’t read the FDA’s guidance documentation.

Because here’s what it really says:

For each of the software life cycle activities, there are certain “typical” tasks that support a conclusion that the software is validated. However, the specific tasks to be performed, their order of performance, and the iteration and timing of their performance will be dictated by the specific software life cycle model that is selected and the safety risk associated with the software application. For very low risk applications, certain tasks may not be needed at all. However, the software developer should at least consider each of these tasks and should define and document which tasks are or are not appropriate for their specific application. The following discussion is generic and is not intended to prescribe any particular software life cycle model or any particular order in which tasks are to be performed.

General Principles of Software Validation;
Final Guidance for Industry and FDA Staff, 2002

The General Principles of Software Validation document is to some degree impressive for its time, 2002. It describes some important realities. Software problems are mostly due to design and development, far less to building and reproduction. Even trivial programs are complex. Testing can’t find all the problems in a product. Software doesn’t wear out like physical things do, and so problems often manifest without warning. Little changes can have big, wide-ranging, and unanticipated effects. Using standard and well-tested software components addresses one kind of risk, but integrating those components requires careful attention.

There are lots of problems with General Principles of Software Validation document, too. I’ll address several of these, I hope, in future posts.

Apropos of the present discussion, the document doesn’t describe what a test case is, nor how it should be documented. By my count, the document mentions “test case” or “test cases” 30 times. Here’s one instance:

“Test plans and test cases should be created as early in the software development process as feasible.”

Here are two more:

“A software product should be challenged with test cases based on its internal structure and with test cases based on its external specification.”

If you choose to interpret “test case” as an artifact, and consider that challenge sufficient, this would be pretty terrible advice. It would be analogous to saying that children should be fed with recipes, or that buildings should be constructed with blueprints. A shallow reading could suggest that the artifact and the performance guided by that artifact are the same thing; that you prepare the recipe before you find out what the kids can and can’t eat, and what’s in the fridge; that you evaluate the building by comparing it to the blueprints and then you’re done.

On the other hand, if you substitute “test cases” with “tests” or “testing”, it’s pretty great advice. It’s a really good idea to challenge a software product with tests, with testing, based on internal and external perspectives.

The FDA does not define “test case” in the guidance documentation. A definition does appear in Glossary of Computer System Software Development Terminology (8/95).

test case. (IEEE) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item. Syn: test case specification. See: test procedure

Okay, let’s see “test procedure”:

test procedure (NIST) A formal document developed from a test plan that presents detailed instructions for the setup, operation, and evaluation of the results for each defined test. See: test case.

So it is pretty terrible advice after all.

(Does that “8/95” refer to August 1995? Yes, it does. None of the source documents for the  Glossary of Computer System Software Development Terminology (8/95) is dated after 1994. For some perspective, that’s before Windows 95; before Google; before smartphones and tablets; before the Manifesto for Agile Software Development; before the principles of context-driven testing…)

But happily, in Section 2 of General Principles of Software Validation, before any of the guidance on testing itself, is the Principle of the Least Burdensome Approach:

We believe we should consider the least burdensome approach in all areas of medical device regulation. This guidance reflects our careful review of the relevant scientific and legal requirements and what we believe is the least burdensome way for you to comply with those requirements. However, if you believe that an alternative approach would be less burdensome, please contact us so we can consider your point of view.

The “careful review” happened in the period leading up to 2002, which is the publication date of this guidance document. In testing community of those days, anything other than ponderously scripted procedural test cases were viewed with great suspicion in writing and conference talks. Thanks to work led by Cem Kaner, James Bach, and other prominent voices in the testing community, the world is now a safer place for exploration in testing. And, as noted in the previous post in this series, the FDA itself has acknowledged the significance and importance of exploratory work.

Test documentation may take many forms more efficient and effective than formally scripted procedures, and the Least Burdensome Approach appears to allow a lot of leeway as long as evidence is sufficient and the actual regulations are followed. (For those playing along at home, the regulations include Title 21 Code of Federal Regulations (CFR) Part 11.10 and 820, and 61 Federal Register (FR) 52602.)

Several years ago, James Bach began some consulting work with a company that made medical devices. They had hired him to analyze, report on, and contribute to the testing work being done for a particular Class III device. (I have also done some work for this company.)

The device consisted of a Control Box, operated by a technician. The Control Box was connected to a Zapper Box that delivered Healing Energy to the patient’s body. (We’ve modified some of the specific words and language here to protect confidentiality and to summarize what the devices do.) Insufficient Healing Energy is just Energy. Too much Healing Energy, or the right amount for too long, turns into Hurting Energy or Killing Energy.

When James arrived, he examined the documentation being given to testers. He found more than a hundred pages of stuff like this:

9.8.1 To verify Power Accuracy

9.8.1.1Connect the components according to the General Setup document.
9.8.1.2Power on and connect Power Monitor (instead of electrodes).
9.8.1.3Power on the Zapper Box.
9.8.1.4Power on the Control Box.
9.8.1.5Set default settings of temperature and power for zapping.
9.8.1.6Set test jig load to nominal value.
9.8.1.7Select nominal duration and nominal power setting.
9.8.1.8Press the Start button.
9.8.1.9Verify Zapper reports the power setting value ±10% on display.

Is this good formal testing?

It’s certainly a formal procedure to follow, but where’s the testing part? The closest thing is that little molecule of actual testing in the last line: the tester is instructed to apply an oracle by comparing the power setting on the Control Box with what the Zapper reports on its display. There’s nothing to suggest examining the actual power being delivered by noting the results from the Power Monitor. There’s nothing about inducing variation to obtain and extend coverage, either.

At one point, James and another tester defrosted this procedure. They tried turning on the Control Box first, and then waited for a variety of intervals to turn on the Zapper Box. To their amazement, the Zapper Box could end up in one of four different states, depending on how long they waited to start it—and at least a couple of those states were potentially dangerous to the patient or to the operator.

James replaced 50 pages of this kind of stuff with two paragraphs containing things that had not been covered previously. He started by describing the test protocol:

3.1 General testing protocol

In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items a tester shall, at all times, be alert for any unexplained or erroneous behavior of the product. The tester shall bear in mind that, regardless of any specific requirements for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including any unacceptable risks due to reasonably foreseeable misuse.

Read that paragraph carefully, sentence by sentence, phrase by phrase. Notice the emphasis on looking for problems and risks—especially on the risk of human error.

Then he described the qualifications necessary for testers to work on this product:

3.2 Test personnel requirements

The tester shall be thoroughly familiar with the Zapper Box and Control Box Functional Requirements Specification, as well as with the working principles of the devices themselves. The tester shall also know the working principles of the Power Monitor Box test tool and associated software, including how to configure and calibrate it, and how to recognize if it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing, in the event that anomalies appear that require investigation. The tester shall know how to keep test records to credible, professional standard.

In summary: Be a scientist. Know the domain, know the tools, be an analyst, be an investigator, keep good lab notes.

Then James provided some concise test ideas, leaving plenty of room for variation designed to shake out bugs. Here’s an example like something from the real thing:

3.2.2 Fields and Screens

3.2.2.1With the Power Monitor test tool already running, start the Zapper Box and the Control Box. Vary the order and timing in which you start them, retain the Control Box and Power Monitor log files, and note any inconsistent or unexpected behaviour.
3.2.2.2Visually inspect the displays and verify conformance to the requirements and for the presence of any behaviour or attribute that could impair the performance or safety of the product in any material way.
3.2.2.3With the system settings at default values change the contents of every user-editable field through the range of all possible values for that field. (e.g. Use the knob to change the session duration from 1 to 300 seconds.) Visually verify that appropriate values appear and that everything that happens on the screen appears normal and acceptable.
3.2.2.4Repeat 3.2.2.3 with system settings changed to their most extreme possible values.
3.2.2.5Select at least one field and use the on-screen keyboard, knob, and external keyboard respectively to edit that field.
3.2.2.6Scan the Control Box and Power Monitor log files for any recorded error conditions or anomalies.

To examine certain aspects of the product and its behaviour, sometimes very specific test design matters. Here’s a representative snippet based on James’ test documentation:

3.5.2 Single Treatment Session Power Accuracy Measurement

3.5.2.3From the Power Monitor log file, extract the data for the measured electrode. This sample should comprise the entire power session, including cooldown, as well as the stable power period with at least 50 measurements (i.e., taken at least five times per second over 10 seconds of stable period data).
3.5.2.4From the Control Box log file, extract the corresponding data for the stable power period of the measured electrode.
3.5.2.5Calculate the deviation by subtracting the reported power for the measured electrode from the corresponding Power Monitor reading (use interpolation to synchronize the time stamp of the power meter and generation logs).
3.5.2.6Calculate the mean of the power sample X (bar) and its standard deviation (s).
3.5.2.7Find the 99% confidence and 99% two-sided tolerance interval k for the sample. (Use Table 5 of SOP-QAD-10, or use the equation below for large samples.)
3.5.2.8The equation for calculating the tolerance interval k is:
Zapper Formula
where χ2γ,N-1 is the critical value of the chi-square distribution with degrees of freedom N -1 that is exceeded with probability γ; and Z2(1-p)/2 is the critical value of the normal distribution which is exceeded with probability (1-p)/2. (See NIST Engineering Statistics Handbook.)

Now, that’s some real formal testing. And it was accepted just fine by the organization and the FDA auditors. Better yet, and following this protocol revealed some surprising behaviours that prompted more careful evaluation of the requirements for the product.

What are some lessons we could learn from this? One key point, it seems to me, is that when you’re working as a tester in a regulated environment, it’s crucial that you read the regulations and the guidance documentation. If you don’t, you run the risk of being pushed around by people who haven’t read them, and who are working on the basis of mythology and folklore.

Our over-arching mission as testers is to seek and find problems that threaten the value of the product. In contexts where human life, health, or safety are on the line, the primary job at hand is to learn about the product and problems that post risks and hazards to people. Excessive bureaucracy and paperwork can distract us from that mission; even displace it. Therefore, we must find ways to do the best testing possible, while still providing the best and least evidence that still completely satisfies auditors and regulators that we’ve done it.

Back in our coaching session, Frieda, acting the part of the manager, replied, “But… we don’t have the time to train testers to do that kind of stuff. We need them to be up to speed ASAP.”

“What does ‘up to speed’ actually mean?” I asked.

Frieda, still in character, replied “We want them to be banging on keys as quickly as possible.”

Uh huh. Imagine a development manager responsible for a medical device saying, “We don’t have time for the developers to learn what they’re developing. We want them up to speed as quickly as possible. (And, as we all know, programming is really just banging on keys.)”

The error in this line of thinking is that testing is about pushing buttons; producing widgets on a production line; flipping testburgers. If you treat testing as flipping testburgers, then there’s a risk that testers will flip whatever vaguely burger-shaped thing comes their way… burgers, frisbees, cow pies, hockey pucks… You may not get the burger you want.

If you think of testing as an investigation of the product, testers must be investigators, and skillful ones at that. Upon engaging with the product and the project, testers set to learning about the product they’re investigating and the domain in which it operates. Testers keep excellent lab notes and document their work carefully, but not to the degree that documentation displaces the goal of testing the system and finding problems in it. Testers are focused on risk, and trained to be aware of problems that they might encounter as they’re testing (per CFR Title 21 Part 820.25 (b)(2)) .

If they’re not sufficiently skilled when you hire them, you’ll supervise and train them until they are. And if they’re unskilled and can’t be trained… are you really sure you want them testing a device that could deliver Killing Energy?

How else might you guide testing work, whether in projects in regulated contexts or not? That’s a topic for next time.

Breaking the Test Case Addiction (Part 3)

Thursday, January 17th, 2019

In the previous post, “Frieda”, my coaching client, asked about producing test cases for auditors or regulators. In Rapid Software Testing (RST), we find it helpful to frame that in terms of formal testing.

Testing is formal to the degree that it must be done in a specific way, or to verify specific facts. Formal testing typically has the goal of confirming or demonstrating something in particular about the product. There’s a continuum to testing formality in RST. My version, a tiny bit different from James Bach‘s, looks like this:

Some terminology notes: checking is the process of operating and observing a product; applying decision rules to those observations; and then reporting on the outcome of those rules; all mechanistically, algorithmically. A check can be turned into a formally scripted process that can be performed by a human or by a machine.

Procedurally scripted test cases are instances of human checking, where the tester is being substantially guided by what the script tells her to do. Since people are not machines and don’t stick to the algorithms, people are not checking in the strictest sense of our parlance.

A human transceiver is someone doing things based only on the instructions of some other person, behaving as that person’s eyes, ears, and hands.

Machine checking is the most formal mode of testing, in that machines perform checks in entirely specific ways, according to a program, entirely focused on specific facts. The motivation to check doesn’t come from the machine, but from some person. Notice that programs are formal, but programming is an informal activity. Toolsmiths and people who develop automated checks are not following scripts themselves.

The degree to which you formalize is a choice, based on a number of context factors. Your context guides your choices, and both of those evolve over time.

One of the most important context factors is your mission. You might be in a regulated environment, where regulators and auditors will eventually want you to demonstrate specific things about the product and the project in a highly formal way. If you are in that context, keeping the the auditors and the regulators happy may require certain kinds of formal testing. Nonetheless, even in that context, you must perform informal testing—lots of it—for at least two big reasons.

The first big reason is to learn the about the product and its context to prepare for excellent formal testing that will stand up to the regulators’ scrutiny. This is tied to another context factor: where you are in the life of the project and your understanding of the product.

Formal testing starts with informal work that is more exploratory and tacit, with the goal of learning; less scripted and explicit, with the goal of demonstrating. All the way along, but especially in between those poles, we’re searching for problems. No less than the Food and Drug Administration emphasizes how important this is.

Thorough and complete evaluation of the device during the exploratory stage results in a better understanding of the device and how it is expected to perform. This understanding can help to confirm that the intended use of the device will be aligned with sponsor expectations. It also can help with the selection of an appropriate pivotal study design.

Section 5: The Importance of Exploratory Studies in Pivotal Study Design
Design Considerations for Pivotal Clinical Investigations for Medical Devices
Guidance for Industry, Clinical Investigators, Institutional Review Boards
and Food and Drug Administration Staff

The pivotal stage of device development, says the FDA, focuses on developing what people need to know to evaluate the safety and effectiveness of a product. The pivotal stage usually consists of one or more pivotal studies. In other words, the FDA acknowledges that development happens in loops and cycles; that development is an iterative process.

James Bach emphasized this in his talk The Dirty Secret of Formal Testing and it’s an important point in RST. Development is an iterative process because at the beginning of any cycle of work, we don’t know for sure what all the requirements are; what they mean; what we can get; and how we might decide that we’ve got it. We don’t really know that until we’ve until we’ve tested the product… and we don’t know how to test the product until we’ve tried to test the product!

Just like developing automated checks, developing formally scripted test cases is an informal process. You don’t follow a script when you’re interpreting a specification; when you’re having a conversation with a developer or a designer; when you’re exploring the product and the test space to figure out where checking might be useful or important. You don’t follow a script when you recognize a new way of using tools to learn something about the product, and apply them. And you don’t follow a script when you investigate bugs that you’ve found—either during informal testing or the formal testing that might follow it.

If you try to develop formal procedural test cases without testing the actual product, they stand a good chance of being out of sync with it. The dirty secret of format testing is that all good formal testing begins with informal testing.

It might be a very good idea for programmers to develop some automated checks that helps them with the discipline of building clean code and getting rapid feedback on it. It’s also a good idea for developers, designers, testers, and business people to develop clear ideas about intentions for a product, envisioning success. It might also be a good idea to develop some automated checks above the unit level and apply them to the build process—but not too many and certainly not too early. The beginning of the work is usually a terrible time for excessive formalization.

Which brings us to the second big reason to perform informal testing continuously throughout any project: to address the risk that our formal testing to date will fail to reveal how the product might disappoint customers; lose someone’s money; blow something up; or hurt or kill people. We must be open to discovery, and to performing the testing and investigation that supports it, all the way throughout the project, because neither epiphanies nor bugs follow scripts or schedules.

The overarching mission of testing is focused on a question: “are there problems that threaten the value of the product, or the on-time, successful completion of our work?” That’s not a question that formal testing can ever answer on its own. Fixation on automated checks or test cases runs the risk of displacing time for experimentation, exploration, discovery, and learning.

Next time, we’ll look at an example of breaking test case addiction on a real medical device project. Stay tuned.

Breaking the Test Case Addiction (Part 2)

Wednesday, January 16th, 2019

Last time out, I was responding to a coaching client, a tester who was working in an organization fixated on test cases. Here, I’ll call her Frieda. She had some more questions about how to respond to her managers.

What if they want another tester to do your tests if you are not available?

“‘Your tests’, or ‘your testing’?”, I asked.

From what I’ve heard, your tests. I don’t agree with this but trying to see it from their point of view, said Frieda.

I wonder what would happen if we asked them “What happens when you want another manager to do your managing if you are not available?” Or “What happens when you want another programmer to do programming if the programmer is not available?” It seems to me that the last thing they would suggest would be a set of management cases, or programming cases. So why the fixation on test cases?

Fixation is excessive, obsessive focus on something to the exclusion of all else. Fixation on test cases displaces people’s attention from other important things: understanding of how the testing maps to the mission; whether the testers have sufficient skill to understand and perform the testing; the learning comes from testing and that feeds back into more testing; whether formalization is premature or even necessary…

A big problem, as I suggested last time, is a lack of managers’ awareness of alternatives to test cases. That lack of awareness feeds into a lack of imagination, and then loops back into a lack of awareness. What’s worse is that many testers suffer from the same problem, and therefore can’t help to break the loop. Why do managers keep asking for test cases? Because testers keep providing them. Why do testers keep providing them? Because managers keep asking for them, because testers keep providing them…, and the cycle continues.

That cycle also continues because there’s an attractive, even seductive, aspect to test cases: they can make testing appear legible. Legibility, as Venkatesh Rao puts it beautifully here, “quells the anxieties evoked by apparent chaos”.

Test cases help to make the messy, complex, volatile landscape of development and testing seem legible, readable, comprehensible, quantifiable. A test case either fails (problem!) or passes (no problem!). A test case makes the tester’s behaviours seem predictable and clear, so clear that the tester could even be replaced by a machine. At the beginning of the project, we develop 782 test cases. When we’ve completed 527 of them, the testing is 67.39% done!

Many people see testing as rote, step-by-step, repetitive, mechanical keypressing to demonstrate that the product can work. That gets emphasized by the domain we’re in: one that values the writing of programs. If you think keypressing is all there is to it, it makes a certain kind of sense to write programs for a human to follow so that you can control the testing.

Those programs become “your tests”. We would call those “your checks—where checking is the mechanistic process of applying decision rules to observations of the software.

On the other hand, if you are willing to recognize and accept testing as a complex, cognitive investigation of products, problems, and risks, your testing is a performance. No one else can do just as you do it. No one can do again just what you’ve done before. You yourself will never do it the same way twice. If managers want people to do “your testing” when you’re not available, it might be more practical and powerful to think of it as “performing their investigation on something you’ve been investigating”.

Investigation is structured and can be guided, but good investigation can’t be scripted. That’s because in the course of a real investigation, you can’t be sure of what you’re going to find and how you’re going to respond to it. Checking can be algorithmic; the testing that surrounds and contains checking cannot.

Investigation can be influenced or guided by plenty of things that are alternatives to test cases:

Last time out, I mentioned almost all of these as things that testers could develop while learning about the product or feature. That’s not a coincidence. Testing happens in tangled loops and spirals of learning, analysis, exploration, experimentation, discovery, and investigation, all feeding back into each other. As testing proceeds, these artifacts and—more importantly—the learning they represent can be further developed, expanded, refined, overproduced, put aside, abandoned, recovered, revisited…

Testers can use artifacts of these kinds as evidence of testing that has been done, problems that have been found, and learning that has happened. Testers can include these artifacts in test reports, too.

But what if you’re in an environment where you have to produce test cases for auditors or regulators?

Good question. We’ll talk about that next time.

Breaking the Test Case Addiction (Part 1)

Tuesday, January 15th, 2019

Recently, during a coaching session, a tester was wrestling with something that was a mystery to her. She asked:

Why do some tech leaders (for example, CTOs, development managers, test managers, and test leads) jump straight to test cases when they want to provide traceability, share testing efforts with stakeholders, and share feature knowledge with testers?

I’m not sure. I fear that most of the time, fixation on test cases is simply due to ignorance. Many people literally don’t know any other way to think about testing, and have never bothered to try. Alarmingly, that seems to apply not only to leaders, but to testers, too. Much of the business of testing seems to limp along on mythology, folklore, and inertia.

Testing, as we’ve pointed out (many times), is not test cases; testing is a performance. Testing, as we’ve pointed out, is the process of learning about a product through exploration and experimentation, which includes to some degree questioning, studying, modeling, observation, inference, etc. You don’t need test cases for that.

The obsession with procedurally scripted test cases is painful to see, because a mandate to follow a script removes agency, turning the tester into a robot instead of an investigator. Overly formalized procedures run a serious risk of over-focusing testing and testers alike. As James Bach has said, “testing shouldn’t be too focused… unless you want to miss lots of bugs.”

There may be specific conditions, elements of the product, notions of quality, interactions with other products, that we’d like to examine during a test, or that might change the outcome of a test. Keeping track of these could be very important. Is a procedurally scripted test case the only way to keep track? The only way to guide the testing? The best way? A good way, even?

Let’s look at alternatives for addressing the leaders’ desires (traceability, shared knowledge of testing effort, shared feature knowledge).

Traceability. It seems to me that the usual goal of traceability is be able to narrate and justify your testing by connecting test cases to requirements. From a positive perspective, it’s a good thing to make those connections to make sure that the tester isn’t wasting time on unimportant stuff.

On the other hand, testing isn’t only about confirming that the product is consistent with the requirements documents. Testing is about finding problems that matter to people. Among other things, that requires us to learn about things that the requirements documents get wrong or don’t discuss at all. If the requirements documents are incorrect or silent on a given point, “traceable” test cases won’t reveal problems reliably.

For that reason, we’ve proposed a more powerful alternative to traceability: test framing, which is the process of establishing and describing the logical connections between the outcome of the test at the bottom and the overarching mission of testing at the top.

Requirements documents and test cases may or may not appear in the chain of connections. That’s okay, as long as the tester is able to link the test with the testing mission explicitly. In a reasonable working environment, much of the time, the framing will be tacit. If you don’t believe that, pause for a moment and note how often test cases provide a set of instructions for the tester to follow, but don’t describe the motivation for the test, or the risk that informs it.

Some testers may not have sufficient skill to describe their test framing. If that’s so, giving test cases to those testers papers over that problem in an unhelpful and unsustainable way. A much better way to address the problem would, I believe, would be to train and supervise the testers to be powerful, independent, reliable agents, with freedom to design their work and responsibility to negotiate it and account for it.

Sharing efforts with stakeholders. One key responsibility for a tester is to describe the testing work. Again, using procedurally scripted test cases seems to be a peculiar and limited means for describing what a tester does. The most important things that testers do happen inside their heads: modeling the product, studying it, observing it, making conjectures about it, analyzing risk, designing experiments… A collection of test cases, and an assertion that someone has completed them, don’t represent the thinking part of testing very well.

A test case doesn’t tell people much about your modeling and evaluation of risk. A suite of test cases doesn’t either, and typical test cases certainly don’t do so efficiently. A conversation, a list, an outline, a mind map, or a report would tend to be more fitting ways of talking about your risk models, or the processes by which you developed them.

Perhaps the worst aspect of using test cases to describe effort is that tests—performances of testing activity—become reified, turned into things, widgets, testburgers. Effort becomes recast in terms of counting test cases, which leads to no end of mischief.

If you want people to know what you’ve done, record and report on what you’ve done. Tell the testing story, which is not only about the status of the product, but also about how you performed the work, and what made it more and less valuable; harder or easier; slower or faster.

Sharing feature knowledge with testers. There are lots of ways for testers to learn about the product, and almost all of them would foster learning better than procedurally scripted test cases. Giving a tester a script tends to focus the tester on following the script, rather than learning about the product, how people might value it, and how value might be threatened.

If you want a tester to learn about a product (or feature) quickly, provide the tester with something to examine or interact with, and give the tester a mission. Try putting the tester in front of

  • the product to be tested (if that’s available)
  • an old version of the product (while you’re waiting for a newer one)
  • a prototype of the product (if there is one)
  • a comparable or competitive product or feature (if there is one)
  • a specification to be analyzed (or compared with the product, if it’s available)
  • a requirements document to be studied
  • a standard to review
  • a user story to be expanded upon
  • a tutorial to walk through
  • a user manual to digest
  • a diagram to be interpreted
  • a product manager to be interviewed
  • another tester to pair with
  • a domain expert to outline a business process

Give the tester the mission to learn something based on one or more of these things. Require the tester to take notes, and then to provide some additional evidence of what he or she learned.

(What if none of the listed items is available? If none of that is available, is any development work going on at all? If so, what is guiding the developers? Hint: it won’t be development cases!)

Perhaps some people are concerned not that there’s too little information, but too much. A corresponding worry might be that the available information is inconsistent. When important information about the product is missing, or unclear, or inconsistent, that’s a test result with important information about the project. Bugs breed in those omissions or inconsistencies.

What could be used as evidence that the tester learned something? Supplemented by the tester’s notes, the tester could

  • have a conversation with a test lead or test manager
  • provide a report on the activities the tester performed, and what the tester learned (that is, a test report)
  • produce a description of the product or feature, bugs and all (see The Honest Manual Writer Heuristic)
  • offer proposed revisions, expansions, or refinements of any of the artifacts listed above
  • identify a list of problems about the product that the tester encountered
  • develop a list of ways in which testers might identify inconsistencies between the product and something desirable (that is, a list of useful oracles)
  • report on a list of problems that the tester had in fulfilling the information mission
  • in a mind map, outline a set of ideas about how the tester might learn more about the product (that is, a test strategy)
  • list out a set of ideas about potential problems in the product (that is, a risk list)
  • develop a set of ideas about where to look for problems in product (that is, a product coverage outline)

Then review the tester’s work. Provide feedback, coaching and mentoring. Offer praise where the tester has learned something well; course correction where the tester hasn’t. Testers will get a lot more from this interactive process than from following step-by-step instructions in a test case.

My coaching client had some more questions about test cases. We’ll get to those next time.

How is the testing going?

Thursday, February 8th, 2018

Last week on Twitter, I posted this:

“The testing is going well.” Does this mean the product is in good shape, or that we’re obtaining good coverage, or finding lots of bugs? “The testing is going badly.” The product is in good shape? Testing is blocked? We’re noting lots of bugs erroneously?

People replied offering their interpretations. That wasn’t too surprising. Their interpretations differed; that wasn’t too surprising either. I was more surprised at how many people seemed to believe that there was a single basis on which we could say “the testing is going well” or “the testing is going badly”—along with the implicit assumption that people would automatically understand the answer.

To test is—among many other things—to construct, edit, narrate, and justify a story. Like any really good story, a testing story involves more than a single subject. A excellent, expert testing story has at least three significant elements, three plot lines that weave around each other like a braid. Miss one of those elements, and the chance of misinterpretation skyrockets. I’ve talked about this before, but it seems it’s time for a reminder.

In Rapid Software Testing, we emphasize the importance of a testing story with three strands, each of which is its own story.

We must tell a story about the product and its status. As we have tested, we have learned things about the product: what it is, what it does, how it works, how it doesn’t work, and how it might not work in ways that matter to our various clients. The overarching reason that most clients hire testers is to learn about problems that threaten the value of the product, so bugs—actual, manifest problems—tend to lead in the product story.

Risks—unobserved but potential problems—figure prominently in the product story too. From a social perspective, good news about the product is easier to deliver, and it does figure in a well-rounded report about the product’s state. But it’s the bad news—and the potential for more of it—that requires management attention.

We must tell a story about the testing. If we want management to trust us, our product story needs a warrant. Our product story becomes justified and is more trustworthy when we can describe how we configured, operated, observed, and evaluated the product. Part of this second strand of the testing story involves describing the ways in which we recognized problems; our oracles. Another part of this strand involves where we looked for problems; our coverage.

It’s important to talk about what we’ve covered with our testing. It may be far more important to talk about what we haven’t covered yet, or won’t cover at all unless something changes. Uncovered areas of the test space may conceal bugs and risks worse than any we’ve encountered so far.

Since we have limited time and resources, we must make choices about what to test. It’s our responsibility to make sure that our clients are aware of those choices, how we’re making them, and why we’re making them. We must highlight potentially important testing that hasn’t been done. When we do that, our clients can make informed decisions about the risks of leaving areas of the product untested—or provide the direction and resources to make sure that they do get tested.

We must tell a story about how good the testing is. If the second strand of the testing story supports the first, this third strand supports the second. Here it’s our job to describe why our testing is the most fabulous testing we could possibly do—or to the degree that it isn’t, why it isn’t, and what we need or recommend to make it better.

In particular, we must describe the issues that present obstacles to the fastest, least expensive, most powerful testing we can do. In the Rapid Software Testing namespace, a bug is a problem that threatens the value of the product; an issue is a problem that threatens the value of the testing. (Some people say “issue” for what we mean by “bug”, and “concern” for what we mean by “issue”. The labels don’t matter much, as long people recognize that there may be problems that get in the way of testing, and bring them to management’s attention.)

A key element in this third strand of the testing story is testability. Anything that makes testing harder, slower, or weaker gives bugs more time and more opportunity to survive undetected. Managers need to know about problems that impede testing, and must make management decisions to address them. As testers, we’re obliged to help managers make informed decisions.

On an unhappy day, some time in the future, when a manager asks “Why didn’t you find that bug?”, I want to be able to provide a reasonable response. For one thing, it’s not only that I didn’t notice the bug; no one on the team noticed the bug. For another, I want to be able to remind the manager that, during development, we all did our best and that we collaboratively decided where to direct our attention in testing and testability. Without talking about testing-related issues during development, those decisions will be poorly informed. And if we missed bugs, I want to make sure that we learn from whatever mistakes we’ve made. Allowing issues to remain hidden might be one of those mistakes.

In my experience, testers tend to recognize the importance of the first strand—reporting on the status of the product. It’s not often that I see testers who are good at the second strand—modeling and describing their coverage. Worse, I almost never encounter test reports in which testers describe what hasn’t been covered yet or will not be covered at all; important testing not done. As for the third strand, it seems to me that testers are pretty good at reporting problems that threaten the value of the testing work to each other. They’re not so good, alas, at reporting those problems to managers. Testers also aren’t necessarily so good at connecting problems with the testing to the risk that we’ll miss important problems in the product.

Managers: when you want a report from a tester and don’t want to be misled, ask about all three parts of the story. “How’s the product doing?” “How do you know? What have you covered, and what important testing hasn’t been done yet?” “Why should we be happy with the testing work? Why should we be concerned? What’s getting in the way of your doing the best testing you possibly could? How can we make the testing go faster, more easily, more comprehensively?”

Testers: when people ask “How is the testing going?”, they may be asking about any of the three strands in the testing story. When we don’t specify what we’re talking about, and reply with vague answers like “the testing is going well”, “the testing is going badly”, the person asking may apply the answer to the status of the product, the test coverage, or the quality of the testing work. The report that they hear may not be the report that we intended to deliver. To be safe, even when you answer briefly, make sure to provide a reply that touches on all three strands of the testing story.

Deeper Testing (2): Automating the Testing

Saturday, April 22nd, 2017

Here’s an easy-to-remember little substitution that you can perform when someone suggests “automating the testing”:

“Automate the evaluation
and learning
and exploration
and experimentation
and modeling
and studying of the specs
and observation of the product
and inference-drawing
and questioning
and risk assessment
and prioritization
and coverage analysis
and pattern recognition
and decision making
and design of the test lab
and preparation of the test lab
and sensemaking
and test code development
and tool selection
and recruiting of helpers
and making test notes
and preparing simulations
and bug advocacy
and triage
and relationship building
and analyzing platform dependencies
and product configuration
and application of oracles
and spontaneous playful interaction with the product
and discovery of new information
and preparation of reports for management
and recording of problems
and investigation of problems
and working out puzzling situations
and building the test team
and analyzing competitors
and resolving conflicting information
and benchmarking…”

And you can add things to this list too. Okay, so maybe it’s not so easy to remember. But that’s what it would mean to automate the testing.

Use tools? Absolutely! Tools are hugely important to amplify and extend and accelerate certain tasks within testing. We can talk about using tools in testing in powerful ways for specific purposes, including automated (or “programmed“) checking. Speaking more precisely costs very little, helps us establish our credibility, and affords deeper thinking about testing—and about how we might apply tools thoughtfully to testing work.

Just like research, design, programming, and management, testing can’t be automated. Trouble arises when we talk about “automated testing”: people who have not yet thought about testing too deeply (particularly naïve managers) might sustain the belief that testing can be automated. So let’s be helpful and careful not to enable that belief.

Drop the Crutches

Thursday, January 5th, 2017

This post is adapted from a recent blast of tweets. You may find answers to some of your questions in the links; as usual, questions and comments are welcome.

Update, 2017-01-07: In response to a couple of people asking, here’s how I’m thinking of “test case” for the purposes of this post: Test cases are formally structured, specific, proceduralized, explicit, documented, and largely confirmatory test ideas. And, often, excessively so. My concern here is directly proportional to the degree to which a given test case or a given test strategy emphasizes these things.

crutches

I had a fun chat with a client/colleague yesterday. He proposed—and I agreed—that test cases are like crutches. I added that the crutches are regularly foisted on people who weren’t limping to start with. It’s as though before the soccer game begins, we hand all the players a crutch. The crutches then hobble them.

We also agreed that test cases often lead to goal displacement. Instead of a thorough investigation of the product, the goal morphs into “finish the test cases!” Managers are inclined to ask “How’s the testing going?” But they usually don’t mean that. Instead, they almost certainly mean “How’s the product doing?” But, it seems to me, testers often interpret “How’s the testing going?” as “Are you done those test cases?”, which ramps up the goal displacement.

Of course, “How’s the testing going?” is an important part of the three-part testing story, especially if problems in the product or project are preventing us from learning more deeply about the product. But most of the time, that’s probably not the part of story we want to lead with. In my experience, both as a program manager and as a tester, managers want to know one thing above all:

Are there problems that threaten the on-time, successful completion of the project?

The most successful and respected testers—in my experience—are the ones that answer that question by actively investigating the product and telling the story of what they’ve found. The testers that overfocus on test cases distract themselves AND their teams and managers from that investigation, and from the problems investigation would reveal.

For a tester, there’s nothing wrong with checking quickly to see that the product can do something—but there’s not much right—or interesting—about it either. Checking seems to me to be a reasonably good thing to work into your programming practice; checks can be excellent alerts to unwanted low-level changes. But demonstration—showing that the product can work—is different from testing—investigating and experimenting to find out how it does (or doesn’t) work in a variety of circumstances and conditions.

Sometimes people object saying that they have to confirm that the product works and that they have don’t have time to investigate. To me, that’s getting things backwards. If you actively, vigorously look for problems and don’t find them, you’ll get that confirmation you crave, as a happy side effect.

No matter what, you must prepare yourself to realize this:

Nobody can be relied upon to anticipate all of the problems that can beset a non-trivial product.

fractalWe call it “development” for a reason. The product and everything around it, including the requirements and the test strategy, do not arrive fully-formed. We continuously refine what we know about the product, and how to test it, and what the requirements really are, and all of those things feed back into each other. Things are revealed to us as we go, not as a cascade of boxes on a process diagram, but more like a fractal.

The idea that we could know entirely what the requirements are before we’ve discussed and decided we’re done seems like total hubris to me. We humans have a poor track record in understanding and expressing exactly what we want. We’re no better at predicting the future. Deciding today what will make us happy ten months—or even days—from now combines both of those weaknesses and multiplies them.

For that reason, it seems to me that any hard or overly specific “Definition of Done” is antithetical to real agility. Let’s embrace unpredictability, learning, and change, and treat “Definition of Done” as a very unreliable heuristic. Better yet, consider a Definition of Not Done Yet: “we’re probably not done until at least These Things are done”. The “at least” part of DoNDY affords the possibility that we may recognize or discover important requirements along the way. And who knows?—we may at any time decide that we’re okay with dropping something from our DoNDY too. Maybe the only thing we can really depend upon is The Unsettling Rule.

Test cases—almost always prepared in advance of an actual test—are highly vulnerable to a constantly shifting landscape. They get old. And they pile up. There usually isn’t a lot of time to revisit them. But there’s typically little need to revisit many of them either. Many test cases lose relevance as the product changes or as it stabilizes.

Many people seem prone to say “We have to run a bunch of old test cases because we don’t know how changes to the code are affecting our product!” If you have lost your capacity to comprehend the product, why believe that you still comprehend those test cases? Why believe that they’re still relevant?

Therefore: just as you (appropriately) remain skeptical about the product, remain skeptical of your test ideas—especially test cases. Since requirements, products, and test ideas are subject to both gradual and explosive change, don’t overformalize or otherwise constrain your testing to stuff that you’ve already anticipated. You WILL learn as you go.

Instead of overfocusing on test cases and worrying about completing them, focus on risk. Ask “How might some person suffer loss, harm, annoyance, or diminished value?” Then learn about the product, the technologies, and the people around it. Map those things out. Don’t feel obliged to be overly or prematurely specific; recognize that your map won’t perfectly match the territory, and that that’s okay—and it might even be a Good Thing. Seek coverage of risks and interesting conditions. Design your test ideas and prepare to test in ways that allow you to embrace change and adapt to it. Explain what you’ve learned.

Do all that, and you’ll find yourself throwing away the crutches that you never needed anyway. You’ll provide a more valuable service to your client and to your team. You and your testing will remain relevant.

Happy New Year.

Further reading:

Testing By Percentages
Very Short Blog Posts (11): Passing Test Cases
A Test is a Performance
Test Cases Are Not Testing: Toward a Culture of Test Performance” by James Bach & Aaron Hodder (in http://www.testingcircus.com/documents/TestingTrapeze-2014-February.pdf#page=31)

As Expected

Tuesday, April 12th, 2016

This morning, I started a local backup. Moments later, I started an online backup. I was greeted with this dialog:

Looks a little sparse. Unhelpful. But there is that “More details” drop-down to click on. Let’s do that.

Ah. Well, that’s more information. But it’s confusing and unhelpful, but I suppose it holds the promise of something more helpful to come. I notice that there’s a URL, but that it’s not a clickable link. I notice that if the dialog means what it says, I should copy those error codes and be ready to paste them into the page that comes up. I can also infer that there’s not local help for these error codes. Well, let’s click on the Knowledge Base button.

Oh. The issue is that another backup is running, and starting a second one is not allowed.

As a tester, I wonder how this was tested.

Was an automated check programmed to start a backup, start a second backup, and then query to see if a dialog appeared with the words “Failed to run now: task not executed” in it? If so, the behaviour is as expected, and the check passed.

Was an automated check programmed to start a backup, start a second backup, and then check for any old dialog to appear? If so, the behaviour is as expected, and the check passed.

Was a test script given to a tester that included the instruction to start a backup, start a second backup, and then check for a dialog to appear, including the words “Failed to run now: task not executed”? Or any old dialog that hinted at something? If so, the behaviour is as expected, and the “manual” test passed.

Here’s what that first dialog could have said: “A backup is in progress. Please wait for that backup to complete before starting another.”

At this company, what is the basic premise for testing? When testing is designed, and when results are interpreted, is the focus on confirming that the product “works as expected”? If so, and if the expectations above are met, no bug will be noticed. To me, this illustrates the basic bankruptcy of testing to confirm expectations; to “make sure the tests all pass”; to show that the product “meets requirements”. “Meets requirements”, in practice, is typically taken to mean “is consistent with statements in a requirements document, however misbegotten those statements might be”.

Instead of confirmation, “pass or fail”, “meets the requirements (documents)” or “as expected”, let’s test from the perspective of two questions: “Is there a problem here?” and “Are we okay with this?” As we do so, let’s look at some of the observations that we might make were and questions we might ask. (Notice that I’m doing this without reference to a specification or requirements document.)

Upon starting a local backup and then attempting to start an online backup, I observe this dialog.

I am surprised by the dialog. My surprise is an oracle, a means by which I might recognize a problem. Why am I surprised? Is there a problem here?

I had a desire to create a local backup and an online backup at the same time. On a multi-tasking, multi-threaded operating system, that desire seems reasonable to me, and I’m surprised that it didn’t happen.

Inconsistency with reasonable user desire is an oracle principle, linked to quality criteria that might include capability, usability, performance, and charisma. The product apparently fails to fulfill quality criteria that, in my opinion, a reasonable user might have. Of course, as a tester, I don’t run the project. So I must ask the designer, or the developer, or the product manager: Are we okay with this?

This might be exactly the dialog that has been programmed to appear under this condition—whatever the condition is. I don’t know that condition, though, because the dialog doesn’t tell me anything specific about the problem that the software is having with fulfilling my desire. So I’m somewhat frustrated, and confused. Is there a problem here?

I can’t explain or even understand what’s going on, other than the fact that my desire has been thwarted. My oracle—pointing to a problem—is inconsistency with explainability, in addition to inconsistency with my desires. So I’m seeing a potential problem not only with the product’s behaviour, but also in the dialog. Are we okay with this?

Maybe more information will clear that up.

Still nothing more useful here. All I see is a bunch of error codes; no further explanation of why the product won’t do what I want. I remain frustrated, and even more confused than before. In fact, I’m getting annoyed. Is there a problem here?

One key purpose of a dialog is to provide a user with useful information, and the product seems inconsistent with that (the inconsistency-with-purpose oracle). Are these codes correct? Maybe these error codes are wildly wrong. If they are, that would be a problem too. If that’s the case, I don’t have a spec available, so that’s a problem I’m simply going to miss. Are we okay with that?

I have to accept that, as a human being, there are some problems I’m going to miss—although, if I were testing this in-house, there are things I could do to address the gaps in my knowledge and awareness. I could note the codes and ask the developer about them; or I could ask for a table of the available codes. (Oh… no one has collected a comprehensive listing of the error codes; they’re just scattered through the product’s source code. Are we okay with this?)

Back to the dialog. Maybe those error codes are precisely correct, but they’re not helping me. Are we okay with this?

All right, so there’s that Knowledge Base button. Let’s try it. When I click on the button, this appears:

Let’s look at this in detail. I observe the title: 32493: Acronis True Image: “Failed to run now: task not executed.” That’s consistent with the message that was in the dialog. I notice the dates; something like this has been appeared in the knowledgebase for a while. In that sense, it seems that the product is consistent with its history, but is that a desirable consistency? Is there a problem here?

The error codes being displayed on this Web page seem consistent with the error codes in the dialog, so if there’s a problem with that, I don’t see it. Then I notice the line that says “You cannot run two tasks simultaneously.” Reading down over a long list of products, and through the symptoms, I observe that the product is not intended to perform two tasks simultaneously. The workaround is to wait until the first task is done; then start the second one. In that sense, the product indeed “works as expected”. And yet…are we okay with this?

Once again, it seems to me that attempting to start a second task could be a reasonable user desire. The product doesn’t support that, but maybe we’re okay with that. Yet is there a problem here?

The product displays a terse, cryptic error message that confuses and annoys the user without fulfilling its apparent intended purpose to inform the user of something. The product sends the user to the Web (not even to a local Help file!) to find that the issue is an ordinary, easily anticipated limitation of the program. It does look kind of amateurish to deal with this situation in this convoluted way, instead of simply putting the relevant information in the initial dialog. Is there a problem here?

I believe that this behaviour is inconsistent with an image that the company might reasonably want to project. The behaviour is also inconsistent with the quality criteria we call usability and charisma. A usable product is one that behaves in a way that allows the user to accomplish a task (including dealing with the product’s limitations) quickly and smoothly. A charismatic product is one that does its thing in an elegant way; that engages the user instead of irritating the user; that doesn’t make the development group look silly; that doesn’t prompt a blog post from a customer highlighting the silliness.

So here’s my bug report. Note that I don’t mention expectations, but I do talk about desires, and I cite two oracles. The title is “Unhelpful dialog inconsistent with purpose.” The body would say “Upon attempting to start a second backup while one is in progress, a dialog appears saying ‘Failed to run now: task not executed.’ While technically correct, this message seems inconsistent with the purpose of informing the user that we can’t perform two backup tasks at once. The user is then sent to the (online) knowledge base to find this out. This also seems inconsistent with the product’s image of giving the user a seamless, reliable experience. Is all this desired behaviour?”

Finally: it could be that the testers discovered all of these problems, and laid them out for the the product’s designers, developers, and managers, just as I’ve done here. And maybe the reports were dismissed because the product works “as expected”. But “as expected” doesn’t mean “no problem”. If I can’t trust a backup product to post a simple, helpful dialog, can I really trust it to back up my data?