Blog Posts for the ‘Documentation’ Category

Breaking the Test Case Addiction (Part 10)

Monday, June 8th, 2020

This post serves two purposes. It is yet another installation in The Series That Ate My Blog; and it’s a kind of personal exploration of work in progress on the Rapid Software Testing Guide to Test Reporting. Your feedback and questions on this post will help to inform the second project, so I welcome your comments.

As a tester, your mission is to evaluate the product and report on its status, typically with a special emphasis on finding problems that matter. We’ve discussed bug reporting in the Rapid Testing Guide to Making Good Bug Reports. In this installment of Breaking the Test Case Addiction, I’m describing test reporting as something that responsible testers do.

Sounds straightforward, right? But right away, I want to address the risk of misunderstanding, so let me clear up what I mean by certain terms here.

Responsible Testers
Responsible testers are people who assume the role of tester on a project, and who commit themselves to doing that job well over time. Supporting testers (which we used to call “helpers”) help the test effort temporarily or intermittently, but are not committed to the testing role. Supporting testers are generally not required to report on their testing work to the same degree as responsible testers are.

Test Project
In this post, when I say test project, I’m referring to any set of activities focused on testing of any product or service, or any part of it: a low-level unit, a function, a component, a feature, a story, a service, an entire system… A test project can contain lots of little test projects. Accordingly, depending on the level of granularity we’re referring to, a test project might happen over moments or minutes, days, weeks, or months. A report on a test project might cover similar spans of time—instants, episodes, sprints, releases…

“Test project” here could refer to something that happens outside of development. More typically, it refers to testing activity that happens inside a development project, in parallel with the other aspects of development, like design, programming, or other testing.

Product
When I say product here, I mean anything that anyone has produced that might be subject to testing. While that includes running code, “product” could include code that is not running yet; prototypes and mockups; specifications and other requirement documents; flowcharts, diagrams, or state models; user documentation; sales and marketing material; or ideas about any of those things. When we refer to testing activity pointed at things that are static, like most of the items in the preceding list, we usually call it “review”; we might also call it “performing a thought experiment”. Review is a kind of testing activity that may be closely or distantly associated with performing a test—which brings us to what we mean by “testing”.

Testing, Test Activities, and Review
When I say testing here, I am using the Rapid Software Testing definition. To us, testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting.

Testing includes many activities: questioning, studying, modeling, operating the product, manipulating it, making inferences, analyzing risk, thinking critically, recording the process, reporting on it, etc. Testing activities also include investigating and analyzing bugs and suspicious behaviour. Testing typically includes applying tools to help with any testing activities.

A test is an instance of testing, and to perform a test means to explore, experiment with, and gain experience of a product. In general, to perform a test implies that we will operate and observe a product or its output by some means.

In review, operation of the product as such typically isn’t available. In review, though, we engage in other testing activities as mentioned above. We can’t perform experiments on the running product but, as I mentioned above, we might perform thought experiments on it, imagining interactions between the product and the people using it. Of course, a thought experiment isn’t the same as a real-world experiment; that’s a key difference between review and performing a test.

Why go on about all this? Because reporting is central to our role as testers. We test; we learn; and we report on what we’ve learned.

Are you doing testing work of any kind, or even thinking about doing testing? Then you’ve got a test project on the go, and you can report on its status, even if your report starts with “I haven’t started testing the product yet, but here are some ideas about how we might go about it.”

Report
Next, let’s unpack the idea of a report. A report is a description, explanation, or justification of something. A report is a communication, but a report is not necessarily a document.

Communicating a report might happen as conversation in a hallway, or beside a coffee machine or a water cooler; as a couple of sentences uttered at a stand-up meeting; as a quick mention of a bug in passing to a developer; as a lengthy description of the status of the product and the status of testing at a go-live meeting. A report might be conveyed in writing as a paragraph, a page, or several pages of text; as (heaven help us) a PowerPoint presentation; or as hundreds of pages in bound books, formally presented to a government or regulatory body.

We might include or refer to artifacts collected or produced during the activity that led to the report—the reporter’s raw notes, data sets, program code, design notes for the activity itself. A report might be supplemented with illustrations, charts, graphs, or diagrams, sketched on a whiteboard or formally rendered on glossy paper. Or a report might be accompanied by photographs, audio, video, mind maps, tables, and references to other artifacts.

Test Report
A test report is any description, explanation, or justification of the status of a test project.

A comprehensive test report is all of those things together.

A professional test report is one that is competently, thoughtfully, and ethically designed to serve your clients in their context. A professional test report need not be a comprehensive test report, nor vice versa.

Some might say that a test report is “just the facts”, but it isn’t; it cannot be. A test report is based on facts, but it’s a story about facts—a story framed for the person or people receiving it. Stories always emphasize some things and leave other things out. We never have all the facts, and facts are sometimes in dispute. Stories are always, to some degree, biased by the storyteller and focused by what the storyteller wants the audience to hear, to learn, and to know. Those biases can seen be as problems in the report, features of it, or both.

The audience for your test report might include insiders who are directly involved in the testing and development work; other insiders (who might be overseeing that work, or affected by it without being directly involved); or outsiders.

For now, I’m going to assume your audience is in the first two categories. On that basis, it helps to consider what the audiences for a test report probably wants to know above all else.

They almost certainly don’t want to know about test case counts (although they might think they do).
They almost certainly don’t want to know about pass-fail ratios (although they might think they do).
They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

(I realize that these claims may sound strange to you. I will address these (non-)desires in a future post.)

Having been a program manager, a developer, and having worked with lots of them, I can tell you what those people almost certainly do want to know:

What is the actual status of the product? Are there problems that threaten the value of the product? Do these problems threaten the on-time, successful completion of our work?

A test report addresses those questions.

Three Aspects of Test Reporting
A good test report braids three strands of story together:

  • a story about the product and its status; what the product is, what it does, how it works, how it doesn’t work, and how it might not work in ways that matter to our various clients. This is a story about bugs, problems, and risks about the product.
  • a story about how the testing was done—how the product story in was obtained; how we configured, operated, observed, and evaluated the product. A thread in this second strand of the testing story involves describing the ways in which we recognized problems; our oracles. Another thread in this strand involves where we looked for problems; our coverage. Yet another thread includes what we haven’t covered yet, or won’t cover at all unless something changes.
  • a story about the quality of the testing work—why the testing that was done can be trusted, or to the degree that it is untrustworthy, issues that present obstacles to the fastest, least expensive, most powerful testing we can do. In this strand, we also identify what we might need or recommend to the testing better, and we may also provide a context and and evaluation of the quality of the report itself.

Most of the time, the client of the testing will be most interested in that first strand. Sometimes the client might be more interested in one of the other two. Nonetheless, whatever form the report might take, the reporter should at least be prepared to address all three strands.

(I’ve written more about this pattern here, here, and here.)

Credibility
If you’re not credible, your reports won’t be taken seriously. In your reporting, you may be delivering surprising or uncomfortable information. Your clients, unconsciously or deliberately, may assume that you’re mistaken or that you’re exaggerating risks, and they may try to micro-manage your reporting. Credibility is an antidote to all this.

To build and maintain credibility, it’s important to actually care about the project and the people on it. It’s important to take your work and your skills seriously, and to demonstrate that seriousness in your attitude, commitments, and behaviour. There will be more to say about this later, but for now…

  • Actually know how to do your job.
  • Gain experience with the product.
  • Study the technology in and around your project.
  • Read all of the relevant requirement, specification, and standards documents carefully, especially when you’re in a regulated environment.
  • Take notes diligently on your own work to inform your reporting.
  • Sweat the details in your own work.
  • Find things to appreciate about the work of others.
  • Acknowledge mistakes, correct them and learn from them.
  • Do not tell lies or exaggerate.

Examples
Note that Part 7 of this series included a number of test reports delivered verbally. Here I’m providing examples of test report documents.

As you survey them, you might want to consider the context for which they’re intended; the reporting levels that they focus on (product, testing, or quality-of-testing); the evidence or references included to support the report; and what the report might need or could leave out.

Note that while a couple of reports refer to specific things to be checked, there is rarely even a mention of test cases. The focus, instead, is usually on bugs or potential problems in the product that represent risk to the value of the product, and therefore risk to the business.

Spot Check Test Report

Click to access mpim-report.pdf


Here is an example of a real, comprehensive, professional test report, prepared by James Bach and edited by me. Over five pages, it describes a paired exploratory testing session that found problems in a real medical device. (The names, nouns and verbs have been changed to shield the identity of the company and the product.)

Cheese Grater Incident Report

Click to access cheesegrater.pdf


This is two reports in one: a whimsical yet serious report on repairing a broken Parmesan cheese dispenser; and a much longer, detailed set of notes on how to perform an investigation and report on it. Indeed, the latter section is a really worthwhile complement to this blog post.

OEW Case Tool

Click to access OEWCaseToolReport.pdf


An example of a two-page summary report (from 1994!) about a computer-aided software engineering (CASE) tool at Borland.

Y2K Compliance Report

Click to access Y2KComplianceReport.pdf


An eight-page report prepared for compliance with Y2K requirements, including notes on strategy; the test approaches that were applied (and risks that prompted those approaches); the results; and a list of specific items that needed to be checked.

OWL Quality Plan

Click to access OWLQualityPlan.pdf


This is a report on proposed plans for testing another Borland product, the Object Windows Library. The report includes a table linking product risks to testing work necessary to investigate those risks. It also includes a listing of components and sub-components in the product.

An Exploratory Tester’s Notebook

Click to access etnotebook.pdf


This paper on recording and reporting includes a report on my spontaneous investigation of an in-flight entertainment system, and a couple of session-based test management session sheets.

A Sticky Situation

Click to access 2012-02-AStickySituation.pdf


This is an example of a form of reporting that’s sometimes called an “information radiator”. It visualizes the status of a test project (and some degree of test coverage) using sticky notes.

The Low-Tech Testing Dashboard

Click to access dashboard.pdf


Of this, James Bach says “Back in 1997, I was challenged by top management to create a way to convey testing status at a glance. Thus was born the “low-tech testing dashboard” which has since been rendered in various electronic, distributed forms. The important thing about the dashboard is that there are no “measurements.” We don’t count anything. Instead there are assessments. These are subjective, yes, but always grounded in evidence.

Who Killed My Battery?

Click to access boneh-www2012.pdf


A splendid research paper on what drains mobile phone batteries… and why. Also a presentation on YouTube: https://www.youtube.com/watch?v=_uv057DP2Vs

Once again, these reports don’t focus test cases, but on testing. They’re examples of powerful and reasonable test reports that offer an alternative to management that is fixated on test cases.

Managers are more likely to relax their obsession with test cases when we provide them with reports that tell the product and testing stories.

Breaking the Test Case Addiction (Part 6)

Tuesday, February 5th, 2019

In the last installment, we ended by asking “Once the tester has learned something about the product, how can you focus a tester’s work without over-focusing it?

I provided some examples in Part 4 of this series. Here’s another: scenario testing. The examples I’ll provide here are based on work done by James Bach and Geordie Keitt several years ago. (I’ve helped several other organizations apply this approach much more recently, but they’re less willing to share details.)

The idea is to use scenarios to guide the tester to explore, experiment, and get experience with the product, acting on ideas about real-world use and about how the product might foreseeably be misused. It’s nice to believe that careful designs, unit testing, BDD, and automated checking will prevent bugs in the product — as they certainly help to do — but to paraphrase Gertrude Stein, experience teaches experience teaches. Pardon my words, but if you want to discover problems that people will encounter in using the product, it might be a good idea to try using the damned product.

The scenario approach that James and Geordie developed uses richer, more elaborate documentation than the one- to three-sentence charters of session-based test management. One goal is to prompt the tester to perform certain kinds of actions to obtain specific kinds of coverage, especially operational coverage. Another goal is to make the tester’s mission more explicit and legible for managers and the rest of the team.

Preparing for scenario testing involves learning about the product using artifacts, conversations, and preliminary forms of test activity (I’ve given examples throughout this series, but especially in Part 1). That work leads into developing and refining the scenarios to cover the product with testing.

Scenarios are typically based around user roles, representing people who might use the product in particular ways. Create at least a handful of them. Identify specifics about them, certainly about the jobs they do and the tasks they perform. You might also want to incorporate personal details about their lives, personalities, temperaments, and conditions under which they might be using the product.

(Some people refer to user roles as “personas”, as the examples below do. A word of caution over a potential namespace clash: what you’ll see below is a relatively lightweight notion of “persona”. Alan Cooper has a different one, which he articulated for design purposes, richer and more elaborate than what you’ll see here. You might seriously consider reading his books in any case, especially About Face (with Reimann, Cronin, and Noessel) and the older The Inmates are Running the Asylum.)

Consider not only a variety of roles, but a variety of experience levels within the roles. People may be new to our product; they may be new to the business domain in which our product is situated; or both. New users may be well or poorly trained, subject to constant scrutiny or not being observed at all. Other users might be expert in past versions of our products, and be irritated or confused by changes we’ve made.

Outline realistic work that people do within their roles. Identify specific tasks that they might want to accomplish, and look for things that might cause problems for them or for people affected by the product. Problems might take the form of harm, loss, or diminished value to some person who matters. Problems might also include feelings like confusion, irritation, frustration, or annoyance.

Remember that use cases or user stories typically omit lots of real-life activity. People are often inattentive, careless, distractable, under pressure. People answer instant messages, look things up on the web, cut and paste stuff between applications. They go outside, ride in elevators, get on airplanes and lose access to the internet; things that we all do every day that we don’t notice. And, very occasionally, they’re actively malicious.

Our product may be a participant in a system, or linked to other products via interfaces or add-ins or APIs. At very least, our product depends on platform elements: the hardware upon which it runs; peripherals to which it might be connected, like networks, printers, or other devices; application frameworks and libraries from outside our organization; frameworks and libraries that we developed in-house, but that are not within the scope of our current project.

Apropos of all this, the design of a set of scenarios includes activity patterns or moves that a tester might make during testing:

  • Assuming the role or persona of a particular user, and performing tasks that the user might reasonably perform.
  • Considering people who are new to the product and/or the domain in which the product operates (testing for problems with ease of learning)
  • Considering people who have substantial experience with the product (testing for problems with ease of use).
  • Deliberately making foreseeable mistakes that a user in a given role might make (testing for problems due to plausible errors).
  • Using lots of functions and features of the product in realistic but increasingly elaborate ways, and that trigger complex interactions between functions.
  • Working with records, objects, or other data elements to cover their entire lifespan: creating, revising, refining, retrieving, viewing, updating, merging, splitting, deleting, recovering… and thereby…
  • Developing rich, complex sets of data for experimentation over periods longer than single sessions.
  • Simulating turbulence or friction that a user might encounter: interruptions, distractions, obstacles, branching and backtracking, aborting processes in mid-stream, system updates, closing the laptop lid, going through a train tunnel…
  • Working with multiple instances of the product, tools, and/or multiple testers to introduce competition, contention, and conflict in accessing particular data items or resources.
  • Giving the product to different peripherals, running it on different hardware and software platforms, connecting it to interacting applications, working in multiple languages (yes, we do that here in Canada).
  • Reproducing behaviours or workflows from comparable or competing products.
  • Considering not only the people using the product, but the people that interact with them; their customers, clients, network support people, tech support people, or managers.

To put these ideas to work at ProChain (a company that produces project management software), James and Geordie developed a scenario playbook.
Let’s look at some examples from it.

The first exhibit is a one-page document that outlines the general protocol for setting up scenario sessions.

PCE Scenario Testing Setup
PCE Scenario Testing General Setup Sheet

This document is an overview that applies to every sessions. It is designed primarily to give managers and supporting testers a brief overview of the process and and how it should be carried out. (A supporting tester is someone who is not a full-time tester, but is performing testing under the guidance and supervision of a responsible tester — an experienced tester, test lead, or a test manager. A responsible tester is expected to have learned and internalized the instructions on this sheet.) There are general notes here for setting up and patterns of activities to be performed during the session.

Testers should be familiar with oracles by which we recognize problems, or should learn about oracles quickly. When this document was developed, there was a list of patterns of consistency with the mnemonic acronym HICCUPP; that’s now FEW HICCUPPS. For any given charter, there may be specific consistency patterns, artifacts, documents, tools, or mechanisms to apply that can help the tester to notice and describe problems.

Here’s an example of a charter for a specific testing mission:

PCE Scenario Testing Example Charter 1
PCE Scenario Testing Example Charter 1

The Theme section outlines the general purpose of the session, as a one- to three- line charter would in session-based test management. The Setup section identifies anything that should be done specifically for this session.

Note that the Activities section offers suggestions that are both specific and open. Openness helps to encourage variation that broadens coverage and helps to keep the tester engaged (“For some tasks…”; “…in some way,…”). The specificity helps to focus coverage (“set the task filter to show at least…”; the list of different ways to update tasks).

The Oracles section identifies specific ways for the tester to look for problems, in addition to more general oracle principles and mechanisms. The Variations section prompts the tester to try ideas that will introduce turbulence, increase stress, or cover more test conditions.

A debrief and a review of the tester’s notes after the session helps to make sure that the tester obtained reasonable coverage.

Here’s another example from the same project:

Here the tester is being given a different role, which requires a different set of access rights and a different set of tasks. In the Activities and Variations section, the tester is encouraged to explore and to put the system into states that cause conflicts and contention for resources.

Creating session sheets like these can be a lot more fun and less tedious than typing out instructions in formally procedurally scripted test cases. Because they focus on themes and test ideas, rather than specific test conditions, the sheets are more compact and easier to review and maintain. If there are specific functions, conditions, or data values that must be checked, they can be noted directly on the sheet — or kept separately with a reference to them in the sheet.

The sheets provide plenty of guidance to the tester while giving him or her freedom to vary the details during the session. Since the tester has a general mission to investigate the product, but not a script to follow, he or she is also encouraged and empowered to follow up on anything that looks unusual or improper. All this helps to keep the tester engaged, and prevents him or her from being hypnotized by a script full of someone else’s ideas.

You can find more details on the development of the scenarios in the section “PCE Scenario Testing” in the Rapid Software Testing Appendices.

Back in our coaching session, Frieda once again picked up the role of the test-case-fixated manager. “If we don’t give them test cases, then there’s nothing to look at when they’re done? How will we know for sure what the tester has covered?”

It might seem as though a list of test cases with check marks beside them would solve the accountability problem — but would it? If you don’t trust a tester to perform testing without a script, can you really trust him to perform testing with one?

There are lots of ways to record testing work: the tester’s personal notes or SBTM session sheets, check marks and annotations on requirements and other artifacts, application log files, snapshot tools, video recording… Combine these supporting materials with a quick debriefing to make sure that the tester is working in professional way and getting the job done. If the tester is new, or a supporting tester, increase training, personal supervision and feedback until he or she gains your trust. And if you still can’t bring yourself to trust them, you probably shouldn’t have them testing for you at all.

Frieda, still in character, replied “Hmmm… I’d like to know more about debriefing.Next time!

Breaking the Test Case Addiction (Part 5)

Tuesday, January 29th, 2019

In our coaching session (which started here), Frieda was still playing the part of a manager who was fixated on test cases—and doing it very well. She played a typical management card: “What about learning about the product? Aren’t test cases a good way to do that?”

In Rapid Software Testing, we say that testing is evaluating a product by learning about it through exploration and experimentation, which includes questioning, modeling, studying, manipulating, making inferences, etc. So learning is an essential part of testing. There are lots of artifacts and people that testers could interact with to start learning about the product, which I’ve discussed already. Let’s look at why making a tester work through test cases might not be such a good approach.

Though test cases are touted as a means of learning about the product, my personal experience is that they’re not very helpful at all for that purpose. Have you ever driven somewhere, being guided by a list of instructions from Google Maps, synthesized speech from a navigation system, or even spoken instructions from another person? My experience is that having someone else direct my actions disconnects me from wayfinding and sensemaking. When I get to my destination, I’m not sure how I got there, and I’m not sure I could find my way back.

If I want to learn something and have it stick, a significant part of my learning must be self-guided. From time to time, I must make sense of where I’ve been, where I am, and where I’m going. I must experience some degree of confusion and little obstacles along the way. I must notice things that are interesting and important to me that I can connect to the journey. I must have the freedom to make and correct little mistakes.

Following detailed instructions might aid in accomplishing certain kinds of tasks efficiently. However, following instructions can get in the way of learning something, and the primary mission of testing is to learn about the product and its status.

You could change the assignment by challenging the tester to walk through a set of test cases to find problems in them, or to try to divine the motivation for them, and that may generate some useful insights.

But if you really want testers to learn about the product, here’s how I’d do it: give them a mission to learn about the product. Today we’ll look at instances of learning missions that you can apply early in the tester’s engagement or your own. Such missions tend to be broad and open, and less targeted towards specific risks and problems than they might be later. I’ll provide a few examples, with comments after each one.

“Interview the product manager about the new feature. Identify three to six user roles, and (in addition to your other notes) create sketches or whiteboard diagrams of some common instances of how they might use the feature. In your conversation, raise and discuss the possibility of obstacles or interruptions that might impede the workflow. Take notes and photos.”

As the principles of context-driven testing note, the product is a solution. If the problem isn’t solved, the product doesn’t work. When the product poses new problems, it might not be working either from the customer’s perspective.

“Attend the planning session for the new feature. Ask for descriptions of what we’re building; who we’re building it for; what kind of problems they might experience; and how we would recognize them as problems. Raise questions periodically about testability. Take minutes of the discussions in the meeting.”

Planning meetings tend to be focused on envisioning success; on intention. Those meetings present opportunities to talk anticipating failure; on how we or the customer might not achieve our goals, or might encounter problems. Planning a product involves planning ways of noticing how it might go wrong, too.

“Perform a walkthrough of this component’s functionality with a developer or a senior tester. Gather instances of functions in the product, or data that it processes, that might represent exceptions or extremes. Collect sets of ideas for test conditions that might trigger extreme or exceptional behaviour, or that might put the product in an unstable state. Create a risk list, with particular focus on threats to capability, reliability, and data integrity that might lead to functional errors or data loss.”

In Rapid Software Testing parlance, a test condition is something that can be examined during a test, or something that might change the outcome of a test. It seems to me that when people use formalized procedural test cases, often their intention is to examine particular test conditions. However, those conditions can be collected and examined using many different kinds of artifacts: tables, lists, annotated diagrams or flowcharts, mind maps…

“Review the specification for the product with the writer of the user manual. In addition to any notes or diagrams that you keep, code the contents of the specification. (Note: “code” is used here in the sense used in qualitative research; not in the sense of writing computer code.) That is, for each numbered paragraph, try to identify at least one and up to three quality criteria that are explicitly or implicitly mentioned. Collate the results and look for quality criteria that are barely mentioned or missing altogether, and be on the lookout for mysterious silences.”

There’s a common misconception about testing: that testers look for inconsistencies between the product and a description of the product, and that’s all. But excellent testers look at the product, at descriptions of the product, and at intentions for the product, and seek inconsistencies between all of those things. Many of our intentions are tacit, not explicit. Note also that the designer’s model of the user’s task may be significantly different from the user’s model.

Notice that each example above includes an information mission. Each one includes a mandate to produce specific, reviewable artifacts, so that the tester’s learning can be evaluated with conversation and documented evidence. Debriefing and relating learning to others is an important part of testing in general, and session-based test management in particular.

Each example also involves collaboration with other people on the team, so that inconsistencies between perspectives can be identified and discussed. And notice: these are examples. They are not templates to be followed. It’s important that you develop your own missions, suited to the context in which you’re working.

At early stages of the tester’s engagement, finding problems is not the focus. Learning is. Nonetheless, as one beneficial side effect, the learning may reveal some errors or inconsistencies before they can turn into bugs in the product. As another benefit, testers and teams can collect ideas for product and project risk lists. Finally, the learning might reveal test conditions that can usefully be checked with tools, or that might be important to verify via explicit procedures.

Back to the coaching session. “Sometimes managers say that it’s important to give testers explicit instructions when we’re dealing with an offshore team whose first language is not English”, said Frieda.

Would test cases really make that problem go away? Presumably the test cases and the product would be written in English too. If the testers don’t understand English well, then they’ll scarcely be able to read the test cases well, or to comprehend the requirements or the standards, or to understand what the product is trying to tell them through its (presumably also English) user interface.

Maybe the product and the surrounding artifacts are translated from English into the testers’ native language. That addresses one kind of problem, but introduces a new one: requirements and specifications and designs and jargon routinely get misinterpreted even when everyone is working in English. When that material is translated, some meaning is inevitably changed or lost in translation. All of these problems will need attention and management.

If a product does something important, presumably there’s a risk of important problems, many of which will be unanticipated by test cases. Wouldn’t it be a good idea to have skilled testers learn the product reasonably rapidly but also deeply to prepare them to seek and recognize problems that matter?

When testers are up and running on a project, there are several approaches towards focusing their work without over-focusing it. I’ve mentioned a few already. We’ll look at another one of those next.

Breaking the Test Case Addiction (Part 4)

Monday, January 21st, 2019

Note: this post is long from the perspective of the kitten-like attention spans that modern social media tends to encourage. Fear not. Reading it could help you to recognize how you might save you hours, weeks, months of excess and unnecessary work, especially if you’re working as a tester or manager in a regulated environment.

Testers frequently face problems associated with excessive emphasis on formal, procedurally scripted testing. Politics, bureaucracy, and paperwork combine with fixation on test cases. Project managers and internal auditors mandate test cases structured and written in a certain form “because FDA”. When someone tells you this, it’s a pretty good indication that they haven’t read the FDA’s guidance documentation.

Because here’s what it really says:

For each of the software life cycle activities, there are certain “typical” tasks that support a conclusion that the software is validated. However, the specific tasks to be performed, their order of performance, and the iteration and timing of their performance will be dictated by the specific software life cycle model that is selected and the safety risk associated with the software application. For very low risk applications, certain tasks may not be needed at all. However, the software developer should at least consider each of these tasks and should define and document which tasks are or are not appropriate for their specific application. The following discussion is generic and is not intended to prescribe any particular software life cycle model or any particular order in which tasks are to be performed.

General Principles of Software Validation;
Final Guidance for Industry and FDA Staff, 2002

The General Principles of Software Validation document is to some degree impressive for its time, 2002. It describes some important realities. Software problems are mostly due to design and development, far less to building and reproduction. Even trivial programs are complex. Testing can’t find all the problems in a product. Software doesn’t wear out like physical things do, and so problems often manifest without warning. Little changes can have big, wide-ranging, and unanticipated effects. Using standard and well-tested software components addresses one kind of risk, but integrating those components requires careful attention.

There are lots of problems with General Principles of Software Validation document, too. I’ll address several of these, I hope, in future posts.

Apropos of the present discussion, the document doesn’t describe what a test case is, nor how it should be documented. By my count, the document mentions “test case” or “test cases” 30 times. Here’s one instance:

“Test plans and test cases should be created as early in the software development process as feasible.”

Here are two more:

“A software product should be challenged with test cases based on its internal structure and with test cases based on its external specification.”

If you choose to interpret “test case” as an artifact, and consider that challenge sufficient, this would be pretty terrible advice. It would be analogous to saying that children should be fed with recipes, or that buildings should be constructed with blueprints. A shallow reading could suggest that the artifact and the performance guided by that artifact are the same thing; that you prepare the recipe before you find out what the kids can and can’t eat, and what’s in the fridge; that you evaluate the building by comparing it to the blueprints and then you’re done.

On the other hand, if you substitute “test cases” with “tests” or “testing”, it’s pretty great advice. It’s a really good idea to challenge a software product with tests, with testing, based on internal and external perspectives.

The FDA does not define “test case” in the guidance documentation. A definition does appear in Glossary of Computer System Software Development Terminology (8/95).

test case. (IEEE) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item. Syn: test case specification. See: test procedure

Okay, let’s see “test procedure”:

test procedure (NIST) A formal document developed from a test plan that presents detailed instructions for the setup, operation, and evaluation of the results for each defined test. See: test case.

So it is pretty terrible advice after all.

(Does that “8/95” refer to August 1995? Yes, it does. None of the source documents for the  Glossary of Computer System Software Development Terminology (8/95) is dated after 1994. For some perspective, that’s before Windows 95; before Google; before smartphones and tablets; before the Manifesto for Agile Software Development; before the principles of context-driven testing…)

But happily, in Section 2 of General Principles of Software Validation, before any of the guidance on testing itself, is the Principle of the Least Burdensome Approach:

We believe we should consider the least burdensome approach in all areas of medical device regulation. This guidance reflects our careful review of the relevant scientific and legal requirements and what we believe is the least burdensome way for you to comply with those requirements. However, if you believe that an alternative approach would be less burdensome, please contact us so we can consider your point of view.

The “careful review” happened in the period leading up to 2002, which is the publication date of this guidance document. In testing community of those days, anything other than ponderously scripted procedural test cases were viewed with great suspicion in writing and conference talks. Thanks to work led by Cem Kaner, James Bach, and other prominent voices in the testing community, the world is now a safer place for exploration in testing. And, as noted in the previous post in this series, the FDA itself has acknowledged the significance and importance of exploratory work.

Test documentation may take many forms more efficient and effective than formally scripted procedures, and the Least Burdensome Approach appears to allow a lot of leeway as long as evidence is sufficient and the actual regulations are followed. (For those playing along at home, the regulations include Title 21 Code of Federal Regulations (CFR) Part 11.10 and 820, and 61 Federal Register (FR) 52602.)

Several years ago, James Bach began some consulting work with a company that made medical devices. They had hired him to analyze, report on, and contribute to the testing work being done for a particular Class III device. (I have also done some work for this company.)

The device consisted of a Control Box, operated by a technician. The Control Box was connected to a Zapper Box that delivered Healing Energy to the patient’s body. (We’ve modified some of the specific words and language here to protect confidentiality and to summarize what the devices do.) Insufficient Healing Energy is just Energy. Too much Healing Energy, or the right amount for too long, turns into Hurting Energy or Killing Energy.

When James arrived, he examined the documentation being given to testers. He found more than a hundred pages of stuff like this:

9.8.1 To verify Power Accuracy

9.8.1.1Connect the components according to the General Setup document.
9.8.1.2Power on and connect Power Monitor (instead of electrodes).
9.8.1.3Power on the Zapper Box.
9.8.1.4Power on the Control Box.
9.8.1.5Set default settings of temperature and power for zapping.
9.8.1.6Set test jig load to nominal value.
9.8.1.7Select nominal duration and nominal power setting.
9.8.1.8Press the Start button.
9.8.1.9Verify Zapper reports the power setting value ±10% on display.

Is this good formal testing?

It’s certainly a formal procedure to follow, but where’s the testing part? The closest thing is that little molecule of actual testing in the last line: the tester is instructed to apply an oracle by comparing the power setting on the Control Box with what the Zapper reports on its display. There’s nothing to suggest examining the actual power being delivered by noting the results from the Power Monitor. There’s nothing about inducing variation to obtain and extend coverage, either.

At one point, James and another tester defrosted this procedure. They tried turning on the Control Box first, and then waited for a variety of intervals to turn on the Zapper Box. To their amazement, the Zapper Box could end up in one of four different states, depending on how long they waited to start it—and at least a couple of those states were potentially dangerous to the patient or to the operator.

James replaced 50 pages of this kind of stuff with two paragraphs containing things that had not been covered previously. He started by describing the test protocol:

3.1 General testing protocol
In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items a tester shall, at all times, be alert for any unexplained or erroneous behavior of the product. The tester shall bear in mind that, regardless of any specific requirements for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including any unacceptable risks due to reasonably foreseeable misuse.

Read that paragraph carefully, sentence by sentence, phrase by phrase. Notice the emphasis on looking for problems and risks—especially on the risk of human error.

Then he described the qualifications necessary for testers to work on this product:

3.2 Test personnel requirements

The tester shall be thoroughly familiar with the Zapper Box and Control Box Functional Requirements Specification, as well as with the working principles of the devices themselves. The tester shall also know the working principles of the Power Monitor Box test tool and associated software, including how to configure and calibrate it, and how to recognize if it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing, in the event that anomalies appear that require investigation. The tester shall know how to keep test records to credible, professional standard.

In summary: Be a scientist. Know the domain, know the tools, be an analyst, be an investigator, keep good lab notes.

Then James provided some concise test ideas, leaving plenty of room for variation designed to shake out bugs. Here’s an example like something from the real thing:

3.2.2 Fields and Screens

3.2.2.1With the Power Monitor test tool already running, start the Zapper Box and the Control Box. Vary the order and timing in which you start them, retain the Control Box and Power Monitor log files, and note any inconsistent or unexpected behaviour.
3.2.2.2Visually inspect the displays and verify conformance to the requirements and for the presence of any behaviour or attribute that could impair the performance or safety of the product in any material way.
3.2.2.3With the system settings at default values change the contents of every user-editable field through the range of all possible values for that field. (e.g. Use the knob to change the session duration from 1 to 300 seconds.) Visually verify that appropriate values appear and that everything that happens on the screen appears normal and acceptable.
3.2.2.4Repeat 3.2.2.3 with system settings changed to their most extreme possible values.
3.2.2.5Select at least one field and use the on-screen keyboard, knob, and external keyboard respectively to edit that field.
3.2.2.6Scan the Control Box and Power Monitor log files for any recorded error conditions or anomalies.

To examine certain aspects of the product and its behaviour, sometimes very specific test design matters. Here’s a representative snippet based on James’ test documentation:

3.5.2 Single Treatment Session Power Accuracy Measurement

3.5.2.3From the Power Monitor log file, extract the data for the measured electrode. This sample should comprise the entire power session, including cooldown, as well as the stable power period with at least 50 measurements (i.e., taken at least five times per second over 10 seconds of stable period data).
3.5.2.4From the Control Box log file, extract the corresponding data for the stable power period of the measured electrode.
3.5.2.5Calculate the deviation by subtracting the reported power for the measured electrode from the corresponding Power Monitor reading (use interpolation to synchronize the time stamp of the power meter and generation logs).
3.5.2.6Calculate the mean of the power sample X (bar) and its standard deviation (s).
3.5.2.7Find the 99% confidence and 99% two-sided tolerance interval k for the sample. (Use Table 5 of SOP-QAD-10, or use the equation below for large samples.)
3.5.2.8The equation for calculating the tolerance interval k is:
Zapper Formula
where χ2γ,N-1 is the critical value of the chi-square distribution with degrees of freedom N -1 that is exceeded with probability γ; and Z2(1-p)/2 is the critical value of the normal distribution which is exceeded with probability (1-p)/2. (See NIST Engineering Statistics Handbook.)

Now, that’s some real formal testing. And it was accepted just fine by the organization and the FDA auditors. Better yet, and following this protocol revealed some surprising behaviours that prompted more careful evaluation of the requirements for the product.

What are some lessons we could learn from this? One key point, it seems to me, is that when you’re working as a tester in a regulated environment, it’s crucial that you read the regulations and the guidance documentation. If you don’t, you run the risk of being pushed around by people who haven’t read them, and who are working on the basis of mythology and folklore.

Our over-arching mission as testers is to seek and find problems that threaten the value of the product. In contexts where human life, health, or safety are on the line, the primary job at hand is to learn about the product and problems that post risks and hazards to people. Excessive bureaucracy and paperwork can distract us from that mission; even displace it. Therefore, we must find ways to do the best testing possible, while still providing the best and least evidence that still completely satisfies auditors and regulators that we’ve done it.

Back in our coaching session, Frieda, acting the part of the manager, replied, “But… we don’t have the time to train testers to do that kind of stuff. We need them to be up to speed ASAP.”

“What does ‘up to speed’ actually mean?” I asked.

Frieda, still in character, replied “We want them to be banging on keys as quickly as possible.”

Uh huh. Imagine a development manager responsible for a medical device saying, “We don’t have time for the developers to learn what they’re developing. We want them up to speed as quickly as possible. (And, as we all know, programming is really just banging on keys.)”

The error in this line of thinking is that testing is about pushing buttons; producing widgets on a production line; flipping testburgers. If you treat testing as flipping testburgers, then there’s a risk that testers will flip whatever vaguely burger-shaped thing comes their way… burgers, frisbees, cow pies, hockey pucks… You may not get the burger you want.

If you think of testing as an investigation of the product, testers must be investigators, and skillful ones at that. Upon engaging with the product and the project, testers set to learning about the product they’re investigating and the domain in which it operates. Testers keep excellent lab notes and document their work carefully, but not to the degree that documentation displaces the goal of testing the system and finding problems in it. Testers are focused on risk, and trained to be aware of problems that they might encounter as they’re testing (per CFR Title 21 Part 820.25 (b)(2)) .

If they’re not sufficiently skilled when you hire them, you’ll supervise and train them until they are. And if they’re unskilled and can’t be trained… are you really sure you want them testing a device that could deliver Killing Energy?

How else might you guide testing work, whether in projects in regulated contexts or not? That’s a topic for next time.

Breaking the Test Case Addiction (Part 3)

Thursday, January 17th, 2019

In the previous post, “Frieda”, my coaching client, asked about producing test cases for auditors or regulators. In Rapid Software Testing (RST), we find it helpful to frame that in terms of formal testing.

Testing is formal to the degree that it must be done in a specific way, or to verify specific facts. Formal testing typically has the goal of confirming or demonstrating something in particular about the product. There’s a continuum to testing formality in RST. My version, a tiny bit different from James Bach‘s, looks like this:

Some terminology notes: checking is the process of operating and observing a product; applying decision rules to those observations; and then reporting on the outcome of those rules; all mechanistically, algorithmically. A check can be turned into a formally scripted process that can be performed by a human or by a machine.

Procedurally scripted test cases are instances of human checking, where the tester is being substantially guided by what the script tells her to do. Since people are not machines and don’t stick to the algorithms, people are not checking in the strictest sense of our parlance.

A human transceiver is someone doing things based only on the instructions of some other person, behaving as that person’s eyes, ears, and hands.

Machine checking is the most formal mode of testing, in that machines perform checks in entirely specific ways, according to a program, entirely focused on specific facts. The motivation to check doesn’t come from the machine, but from some person. Notice that programs are formal, but programming is an informal activity. Toolsmiths and people who develop automated checks are not following scripts themselves.

The degree to which you formalize is a choice, based on a number of context factors. Your context guides your choices, and both of those evolve over time.

One of the most important context factors is your mission. You might be in a regulated environment, where regulators and auditors will eventually want you to demonstrate specific things about the product and the project in a highly formal way. If you are in that context, keeping the the auditors and the regulators happy may require certain kinds of formal testing. Nonetheless, even in that context, you must perform informal testing—lots of it—for at least two big reasons.

The first big reason is to learn the about the product and its context to prepare for excellent formal testing that will stand up to the regulators’ scrutiny. This is tied to another context factor: where you are in the life of the project and your understanding of the product.

Formal testing starts with informal work that is more exploratory and tacit, with the goal of learning; less scripted and explicit, with the goal of demonstrating. All the way along, but especially in between those poles, we’re searching for problems. No less than the Food and Drug Administration emphasizes how important this is.

Thorough and complete evaluation of the device during the exploratory stage results in a better understanding of the device and how it is expected to perform. This understanding can help to confirm that the intended use of the device will be aligned with sponsor expectations. It also can help with the selection of an appropriate pivotal study design.

Section 5: The Importance of Exploratory Studies in Pivotal Study Design
Design Considerations for Pivotal Clinical Investigations for Medical Devices
Guidance for Industry, Clinical Investigators, Institutional Review Boards
and Food and Drug Administration Staff

The pivotal stage of device development, says the FDA, focuses on developing what people need to know to evaluate the safety and effectiveness of a product. The pivotal stage usually consists of one or more pivotal studies. In other words, the FDA acknowledges that development happens in loops and cycles; that development is an iterative process.

James Bach emphasized this in his talk The Dirty Secret of Formal Testing and it’s an important point in RST. Development is an iterative process because at the beginning of any cycle of work, we don’t know for sure what all the requirements are; what they mean; what we can get; and how we might decide that we’ve got it. We don’t really know that until we’ve until we’ve tested the product… and we don’t know how to test the product until we’ve tried to test the product!

Just like developing automated checks, developing formally scripted test cases is an informal process. You don’t follow a script when you’re interpreting a specification; when you’re having a conversation with a developer or a designer; when you’re exploring the product and the test space to figure out where checking might be useful or important. You don’t follow a script when you recognize a new way of using tools to learn something about the product, and apply them. And you don’t follow a script when you investigate bugs that you’ve found—either during informal testing or the formal testing that might follow it.

If you try to develop formal procedural test cases without testing the actual product, they stand a good chance of being out of sync with it. The dirty secret of formal testing is that all good formal testing begins with informal testing.

It might be a very good idea for programmers to develop some automated checks that helps them with the discipline of building clean code and getting rapid feedback on it. It’s also a good idea for developers, designers, testers, and business people to develop clear ideas about intentions for a product, envisioning success. It might also be a good idea to develop some automated checks above the unit level and apply them to the build process—but not too many and certainly not too early. The beginning of the work is usually a terrible time for excessive formalization.

Which brings us to the second big reason to perform informal testing continuously throughout any project: to address the risk that our formal testing to date will fail to reveal how the product might disappoint customers; lose someone’s money; blow something up; or hurt or kill people. We must be open to discovery, and to performing the testing and investigation that supports it, all the way throughout the project, because neither epiphanies nor bugs follow scripts or schedules.

The overarching mission of testing is focused on a question: “are there problems that threaten the value of the product, or the on-time, successful completion of our work?” That’s not a question that formal testing can ever answer on its own. Fixation on automated checks or test cases runs the risk of displacing time for experimentation, exploration, discovery, and learning.

Next time, we’ll look at an example of breaking test case addiction on a real medical device project. Stay tuned.

Breaking the Test Case Addiction (Part 2)

Wednesday, January 16th, 2019

Last time out, I was responding to a coaching client, a tester who was working in an organization fixated on test cases. Here, I’ll call her Frieda. She had some more questions about how to respond to her managers.

What if they want another tester to do your tests if you are not available?

“‘Your tests’, or ‘your testing’?”, I asked.

From what I’ve heard, your tests. I don’t agree with this but trying to see it from their point of view, said Frieda.

I wonder what would happen if we asked them “What happens when you want another manager to do your managing if you are not available?” Or “What happens when you want another programmer to do programming if the programmer is not available?” It seems to me that the last thing they would suggest would be a set of management cases, or programming cases. So why the fixation on test cases?

Fixation is excessive, obsessive focus on something to the exclusion of all else. Fixation on test cases displaces people’s attention from other important things: understanding of how the testing maps to the mission; whether the testers have sufficient skill to understand and perform the testing; the learning comes from testing and that feeds back into more testing; whether formalization is premature or even necessary…

A big problem, as I suggested last time, is a lack of managers’ awareness of alternatives to test cases. That lack of awareness feeds into a lack of imagination, and then loops back into a lack of awareness. What’s worse is that many testers suffer from the same problem, and therefore can’t help to break the loop. Why do managers keep asking for test cases? Because testers keep providing them. Why do testers keep providing them? Because managers keep asking for them, because testers keep providing them…, and the cycle continues.

That cycle also continues because there’s an attractive, even seductive, aspect to test cases: they can make testing appear legible. Legibility, as Venkatesh Rao puts it beautifully here, “quells the anxieties evoked by apparent chaos”.

Test cases help to make the messy, complex, volatile landscape of development and testing seem legible, readable, comprehensible, quantifiable. A test case either fails (problem!) or passes (no problem!). A test case makes the tester’s behaviours seem predictable and clear, so clear that the tester could even be replaced by a machine. At the beginning of the project, we develop 782 test cases. When we’ve completed 527 of them, the testing is 67.39% done!

Many people see testing as rote, step-by-step, repetitive, mechanical keypressing to demonstrate that the product can work. That gets emphasized by the domain we’re in: one that values the writing of programs. If you think keypressing is all there is to it, it makes a certain kind of sense to write programs for a human to follow so that you can control the testing.

Those programs become “your tests”. We would call those “your checks—where checking is the mechanistic process of applying decision rules to observations of the software.

On the other hand, if you are willing to recognize and accept testing as a complex, cognitive investigation of products, problems, and risks, your testing is a performance. No one else can do just as you do it. No one can do again just what you’ve done before. You yourself will never do it the same way twice. If managers want people to do “your testing” when you’re not available, it might be more practical and powerful to think of it as “performing their investigation on something you’ve been investigating”.

Investigation is structured and can be guided, but good investigation can’t be scripted. That’s because in the course of a real investigation, you can’t be sure of what you’re going to find and how you’re going to respond to it. Checking can be algorithmic; the testing that surrounds and contains checking cannot.

Investigation can be influenced or guided by plenty of things that are alternatives to test cases:

Last time out, I mentioned almost all of these as things that testers could develop while learning about the product or feature. That’s not a coincidence. Testing happens in tangled loops and spirals of learning, analysis, exploration, experimentation, discovery, and investigation, all feeding back into each other. As testing proceeds, these artifacts and—more importantly—the learning they represent can be further developed, expanded, refined, overproduced, put aside, abandoned, recovered, revisited…

Testers can use artifacts of these kinds as evidence of testing that has been done, problems that have been found, and learning that has happened. Testers can include these artifacts in test reports, too.

But what if you’re in an environment where you have to produce test cases for auditors or regulators?

Good question. We’ll talk about that next time.

Breaking the Test Case Addiction (Part 1)

Tuesday, January 15th, 2019

Recently, during a coaching session, a tester was wrestling with something that was a mystery to her. She asked:

Why do some tech leaders (for example, CTOs, development managers, test managers, and test leads) jump straight to test cases when they want to provide traceability, share testing efforts with stakeholders, and share feature knowledge with testers?

I’m not sure. I fear that most of the time, fixation on test cases is simply due to ignorance. Many people literally don’t know any other way to think about testing, and have never bothered to try. Alarmingly, that seems to apply not only to leaders, but to testers, too. Much of the business of testing seems to limp along on mythology, folklore, and inertia.

Testing, as we’ve pointed out (many times), is not test cases; testing is a performance. Testing, as we’ve pointed out, is the process of learning about a product through exploration and experimentation, which includes to some degree questioning, studying, modeling, observation, inference, etc. You don’t need test cases for that.

The obsession with procedurally scripted test cases is painful to see, because a mandate to follow a script removes agency, turning the tester into a robot instead of an investigator. Overly formalized procedures run a serious risk of over-focusing testing and testers alike. As James Bach has said, “testing shouldn’t be too focused… unless you want to miss lots of bugs.”

There may be specific conditions, elements of the product, notions of quality, interactions with other products, that we’d like to examine during a test, or that might change the outcome of a test. Keeping track of these could be very important. Is a procedurally scripted test case the only way to keep track? The only way to guide the testing? The best way? A good way, even?

Let’s look at alternatives for addressing the leaders’ desires (traceability, shared knowledge of testing effort, shared feature knowledge).

Traceability. It seems to me that the usual goal of traceability is be able to narrate and justify your testing by connecting test cases to requirements. From a positive perspective, it’s a good thing to make those connections to make sure that the tester isn’t wasting time on unimportant stuff.

On the other hand, testing isn’t only about confirming that the product is consistent with the requirements documents. Testing is about finding problems that matter to people. Among other things, that requires us to learn about things that the requirements documents get wrong or don’t discuss at all. If the requirements documents are incorrect or silent on a given point, “traceable” test cases won’t reveal problems reliably.

For that reason, we’ve proposed a more powerful alternative to traceability: test framing, which is the process of establishing and describing the logical connections between the outcome of the test at the bottom and the overarching mission of testing at the top.

Requirements documents and test cases may or may not appear in the chain of connections. That’s okay, as long as the tester is able to link the test with the testing mission explicitly. In a reasonable working environment, much of the time, the framing will be tacit. If you don’t believe that, pause for a moment and note how often test cases provide a set of instructions for the tester to follow, but don’t describe the motivation for the test, or the risk that informs it.

Some testers may not have sufficient skill to describe their test framing. If that’s so, giving test cases to those testers papers over that problem in an unhelpful and unsustainable way. A much better way to address the problem would, I believe, would be to train and supervise the testers to be powerful, independent, reliable agents, with freedom to design their work and responsibility to negotiate it and account for it.

Sharing efforts with stakeholders. One key responsibility for a tester is to describe the testing work. Again, using procedurally scripted test cases seems to be a peculiar and limited means for describing what a tester does. The most important things that testers do happen inside their heads: modeling the product, studying it, observing it, making conjectures about it, analyzing risk, designing experiments… A collection of test cases, and an assertion that someone has completed them, don’t represent the thinking part of testing very well.

A test case doesn’t tell people much about your modeling and evaluation of risk. A suite of test cases doesn’t either, and typical test cases certainly don’t do so efficiently. A conversation, a list, an outline, a mind map, or a report would tend to be more fitting ways of talking about your risk models, or the processes by which you developed them.

Perhaps the worst aspect of using test cases to describe effort is that tests—performances of testing activity—become reified, turned into things, widgets, testburgers. Effort becomes recast in terms of counting test cases, which leads to no end of mischief.

If you want people to know what you’ve done, record and report on what you’ve done. Tell the testing story, which is not only about the status of the product, but also about how you performed the work, and what made it more and less valuable; harder or easier; slower or faster.

Sharing feature knowledge with testers. There are lots of ways for testers to learn about the product, and almost all of them would foster learning better than procedurally scripted test cases. Giving a tester a script tends to focus the tester on following the script, rather than learning about the product, how people might value it, and how value might be threatened.

If you want a tester to learn about a product (or feature) quickly, provide the tester with something to examine or interact with, and give the tester a mission. Try putting the tester in front of

  • the product to be tested (if that’s available)
  • an old version of the product (while you’re waiting for a newer one)
  • a prototype of the product (if there is one)
  • a comparable or competitive product or feature (if there is one)
  • a specification to be analyzed (or compared with the product, if it’s available)
  • a requirements document to be studied
  • a standard to review
  • a user story to be expanded upon
  • a tutorial to walk through
  • a user manual to digest
  • a diagram to be interpreted
  • a product manager to be interviewed
  • another tester to pair with
  • a domain expert to outline a business process

Give the tester the mission to learn something based on one or more of these things. Require the tester to take notes, and then to provide some additional evidence of what he or she learned.

(What if none of the listed items is available? If none of that is available, is any development work going on at all? If so, what is guiding the developers? Hint: it won’t be development cases!)

Perhaps some people are concerned not that there’s too little information, but too much. A corresponding worry might be that the available information is inconsistent. When important information about the product is missing, or unclear, or inconsistent, that’s a test result with important information about the project. Bugs breed in those omissions or inconsistencies.

What could be used as evidence that the tester learned something? Supplemented by the tester’s notes, the tester could

  • have a conversation with a test lead or test manager
  • provide a report on the activities the tester performed, and what the tester learned (that is, a test report)
  • produce a description of the product or feature, bugs and all (see The Honest Manual Writer Heuristic)
  • offer proposed revisions, expansions, or refinements of any of the artifacts listed above
  • identify a list of problems about the product that the tester encountered
  • develop a list of ways in which testers might identify inconsistencies between the product and something desirable (that is, a list of useful oracles)
  • report on a list of problems that the tester had in fulfilling the information mission
  • in a mind map, outline a set of ideas about how the tester might learn more about the product (that is, a test strategy)
  • list out a set of ideas about potential problems in the product (that is, a risk list)
  • develop a set of ideas about where to look for problems in product (that is, a product coverage outline)

Then review the tester’s work. Provide feedback, coaching and mentoring. Offer praise where the tester has learned something well; course correction where the tester hasn’t. Testers will get a lot more from this interactive process than from following step-by-step instructions in a test case.

My coaching client had some more questions about test cases. We’ll get to those next time.

Oracles from the Inside Out, Part 5: Oracles as References as Media

Tuesday, September 15th, 2015

Try asking testers how they recognize problems. Many will respond that they compare the product to its specification, and when they see an inconsistency between the product and its specification, they report a bug. Others will talk about creating and running automated checks, using tools to compare output from the product to specific, pre-determined, expected results; when the product produces a result inconsistent with expectations, the check identifies a bug which the tester then reports to the developer or manager. It might be tempting to think of this as moving from the bottom right quadrant on this table to the bottom left.

Traditional talk about oracles refers almost exclusively to references. W.E. Howden, who introduced “oracle” as a term of testing art, said that an oracle as “an external mechanism which can be used to check test output for correctness”. Yet thinking of oracles in terms of correctness leads to some pretty serious problems. (I’ve outlined some of them here).

In the Rapid Software Testing namespace, we take a different, broader view of oracles. Rather than focusing on correctness, we focus on problems: an oracle is a means by which we recognize a problem when we encounter one during testing. Checking for correctness, as Howden puts it, may severely limit our capacity to notice many kinds of problems. A product or service can be correct with respect to some principle, but have plenty of problems that aren’t identified by that principle; and a product can produce incorrect results without the incorrectness representing a problem for anyone. When testers fixate on documented requirements, there’s a risk that they will restrict their attention to looking for inconsistencies with specific claims; when testers fixate on automated checks, there’s a risk that they will restrict their focus to inconsistency with a comparable algorithm. Focus your attention too narrowly on a particular oracle—or a particular class of oracle—and you can be confident of one thing: you’ll miss lots of bugs.

Documents and tools are media. In the most general sense, “medium” is descriptive of something in between, like “small” and “large”. But “medium” as a noun, a medium, can be between lots of things. A communication medium like radio sits between performers and an audience; a psychic medium, so the claim goes, provides a bridge between a person and the spirit world; when people want to exchange things of value, they use often use money as a medium for the exchange. Marshall McLuhan, an early and influential media theorist, said that a medium is anything that humans create or use to effect change. Media are tools, technologies that people use to extend, enhance, enable, accelerate, or intensify human capabilities. Extension is the most obvious and prominent effect of media. Most people think of media in terms of communications media. A medium can certainly be printed pages or television screens that enable messages to be conveyed from one person to another. McLuhan viewed the phonetic alphabet as a technology—a medium that extended the range of speech over great distances and accelerated its transmission. But a cup of coffee is a medium too; it extends alertness and wakefulness, and when consumed socially with others, it can extend conversation and friendliness. Media, placed between a product and our observation of it, extend our capacity to recognize bugs.

McLuhan emphasized that media change things in many different ways at the same time. In addition to extending or enabling or accelerating our capabilities, McLuhan said, every new medium obsolesces one or more existing media, grabbing our attention away from old things; every new medium retrieves notions of formerly obsolescent media, making old things new again. McLuhan used heat as a metaphor for the degree to which media require the involvement of the user; a “cool” medium like radio, he said, requires the listener to participate and fill in the missing pieces of the experience; a “hot” medium like a movie, provides stimulation to the ear and especially the eye, requiring less engagement from the viewer. Every medium, when “overheated” (McLuhan’s term for a medium that has been stretched or extended beyond its original or intended capacity), reverses into the opposite of what it might have been originally intended to accomplish. Socrates (and the King of Egypt) recognized that writing could extend memory, but could reverse into forgetfulness (see Plato’s dialogue Phaedrus). Coffee extends alertness and conversation, but too much of it and people become too wired work and too addled to chat. A medium always draws attention to itself to some degree; an overheated medium may dazzle us so much that we begin to ignore what it contains or what we intended it to do for us. More importantly, a medium affects us. This is one of the implications of McLuhan’s famous but oblique statement “the medium is the message”. By “message”, he means “the change of scale or pace or pattern” that a new invention or innovation “introduces into human affairs.” (This explanation comes from Mark Federman, to whom I’m indebted for explaining McLuhan’s work to me over the years.)

When we pay attention, we can easily observe media overheating both in talk about testing and development work and in the work itself. Documents and tools frequently dominate conversations. In some organizations, a problem won’t be considered a bug unless it is inconsistent with an explicit statement in a specification or requirements document. Yet documents are only partial representations, subsets, of what people claim to have known or believed at some point in time, and times change. In some places, testing work is dominated by automated checking. Checks can be very valuable, providing great precision and fast feedback. But checks may focus on functional aspects of the product, and less on other parafunctional attributes.

McLuhan’s work emphasizes that media are essentially neutral, agnostic to our purposes. It is our engagement with media that produces good or bad outcomes—good and bad outcomes. Perhaps the most important implication of McLuhan’s work is that media amplify whatever we are. If we’re fabulous testers, our tools extend our capabilities, helping us to be even more fabulous. But if we’re incompetent, tools extend our incompetence, allowing us to do bad testing faster and worse than we’ve ever been able to do it before. To the degee that we are inclined to avoid conflict and arguments, we will use documents to help us avoid conflict and arguments; to the degree that we are inclined to welcome discussion and the refinement of ideas, then documents can help us do that. If we are disposed to be alert to a wide range of problems, automated checks will help us as we diversify our scope; if we are oblivious to certain kinds of problems in the product, automated checks will amplify our oblivion.

Reference oracles—documents, checking tools, representative data, comparable products—are unquestionably media, extending all of the other kinds of oracles: private and shared mental models, both private and shared feelings, conversations with others, and principles of consistency. How can we evaluate them? What do we use them for? And how can we use them to help us find problems without letting them overwhelm or displace all the other ways we might have of finding problems? That’s the subject of the next post.

Braiding The Stories (Test Reporting Part 2)

Friday, February 24th, 2012

We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay—an early version of one of the Black Box Test Machines. “How many test cases would you need to test this application?” he asked.

Just then Jerry Weinberg wandered into the room. “Ah! Jerry Weinberg!” said James. “One of the greatest testing experts in the world! He’ll know the answer to this one. How many test cases would you need to test this application, Jerry?”

Jerry looked at the screen for a moment. “Three,” he said, firmly and decisively.

James knew to play along. “Three?!“, he said, in a feigned combination of amazement, uncertainty, and curiosity. “How do you know it’s three? Is it really three, Jerry?”

“Yes,” said Jerry. “Three.” He paused, and then said drily, “Why? Were you expecting some other number?”

In yesterday’s post, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project. When I point out the mischief of pass vs. fail ratios, some people object. “In the real world,” they say, “we have to report pass vs. fail ratios to our managers, because that’s what they want.” Yet bogus reporting is antithetical to the “real world”. Pass vs. fail ratios come from the the fake world, a world where numbers have magical properties to soothe troubled and uncertain souls. Still, there’s no question that managers want something. It’s our mandate to give them something of value.

Some people say that managers want numbers because they want to know that we’re measuring. I’ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond’s splendid paper “Software Engineering Metrics: What Do They Measure and How Do We Know?”: “Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” I think that’s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement. Pass vs. fail ratios fall down badly when they’re subjected to those tests.

Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want: “Measurement is the art and science of making reliable (and significant) observations.” (The main part of the definition comes from Quality Software Management, Vol. 2: First-Order Measurement; the parenthetical comes from recent correspondence over Twitter.) That’s a more general, inclusive definition. It incorporates Kaner and Bond’s notion of quantitative measurement, but it’s more welcoming to qualitative, first-order approaches. First-order measurement, as Jerry describes it, provides answers to questions like “What seems to be happening? and What should I do now?” It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It’s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order—less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.

My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information. “Immediate” here doesn’t only mean timely; it also means unmediated, without a bunch of stuff getting in between you and the observation. In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project. That knowledge requires more than abstract data; it requires information. So, as testers, how can we inform the decision-makers? In our Rapid Software Testing class, James Bach and I have lately taken to emphasizing this: We must learn to describe and report on the product, our testing, and the quality of our testing. This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid. Each line, or level, is its own story.

Level 1: Tell the product story. The product story is a qualitative report on how the product can work, how it fails, and how it might fail in ways that matter to our clients. “Working”, “failure”, and “what matters” are all qualitative evaluations. Quality is value to some person; in a business setting, quality is value to some person who matters to the business. A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.

Level 2: To make the product story credible, tell the testing story. The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives warrant to the product story; it helps our clients understand why they should believe and trust the product story we’re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied. Coverage is the extent to which we’ve tested the program; it’s about where we’ve looked and how we’ve looked, and it’s also about what’s uncovered—where we might not have looked yet, and where we don’t intend to look. Oracles are central to evaluation; they’re the principles and mechanisms that allow us to recognize a problem. The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious the problems it might be. We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner. The product story and our clients’ reactions to it will influence the ongoing testing story, and vice versa.

Level 3: To make the testing story credible, tell a story about the quality of the testing. Just as the product story needs warrant, so too does the testing story. To tell a story about the quality of testing requires us to describe why the testing we’ve done has been good enough, and why the testing we haven’t done hasn’t been so important so far. The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information. The quality-of-testing story will shape and be shaped by the other two stories.

Develop skills to tell and frame stories. People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are “efficient”. I think they mean “fast”, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail…

Don’t think data feed; think the daily news. Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we’re interested, such that we get the level of detail that we seek. Some of those strategies include:

  • Headlines. A quick glance over each page tells us immediately what, in the editors’ judgement, are the most salient aspects of any given story. Headlines come in different sizes, relative to the editors’ assessment of the importance of the story.
  • Front page. The paper comes folded. The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.
  • Continuation. When an entire story is too long to fit on the front page, it’s abbreviated and the story continues elsewhere. This gives the reader the option of following the story or looking at other items on the front page.
  • Coverage areas. The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth). Each section comes with its own front page, which generally includes headlines and continuations of its own.
  • Structured storytelling. Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the inverted pyramid (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions—who, what where, why, and when, plus how—and the the story builds from there. The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she’s satisfied.
  • Identifying who is involved and who is affected. Reporters and editors contextualize their stories. Just as in testing, people are the most important element of the context. A story is far more compelling when it affects the reader or people that the reader cares about. A good story often helps to clarify why the reader should care.
  • Varying approaches to delivering information. Newspapers often use a picture to helps to illustrate or emphasize an important aspect of a story. In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories—first-order reports—are always given greater prominence than the tables of stock quotes league standings, and line scores.
  • Sidebars. Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.
  • Daily (and in the world of the Web, continuous) delivery of information. My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle. The paper’s Web site is updated on a continuous basis. Information is available both on a supply and a demand basis; both when I expect it and when I seek it.
  • Identifiable sources. Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained. This helps to set up degrees of trust and skepticism in the reader.

One important note: These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but to oral discourse.

I’ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t have them. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be consider good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.

You can argue that a good tester should include the needed information and steps in her bug report, but this raise (at least) two problems:

– The same information may be duplicated across many bugs, and even worst it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you post above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seems obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve it with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but does not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, and sometimes it involves behaving like an atypical customer, or and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify it especially for tests involving many short trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?

And would scripts solve that problem when (for example) you’re a recording bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might important if your test activity might be transferred to another offshore team someday (happened to me a few weeks ago, I sent them a test document with only high level details and hoped for the best), or your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s, we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex, cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a miniscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.