Blog Posts for the ‘Documentation’ Category

Braiding The Stories (Test Reporting Part 2)

Friday, February 24th, 2012

We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay—an early version of one of the Black Box Test Machines. “How many test cases would you need to test this application?” he asked.

Just then Jerry Weinberg wandered into the room. “Ah! Jerry Weinberg!” said James. “One of the greatest testing experts in the world! He’ll know the answer to this one. How many test cases would you need to test this application, Jerry?”

Jerry looked at the screen for a moment. “Three,” he said, firmly and decisively.

James knew to play along. “Three?!” he said, in a feigned combination of amazement, uncertainty, and curiosity. “How do you know it’s three? Is it really three, Jerry?”

“Yes,” said Jerry. “Three.” He paused, and then said drily, “Why? Were you expecting some other number?”

In yesterday’s post, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project. When I point out the mischief of pass vs. fail ratios, some people object. “In the real world,” they say, “we have to report pass vs. fail ratios to our managers, because that’s what they want.” Yet bogus reporting is antithetical to the “real world”. Pass vs. fail ratios come from the fake world, a world where numbers have magical properties to soothe troubled and uncertain souls. Still, there’s no question that managers want something. It’s our mandate to give them something of value.

Some people say that managers want numbers because they want to know that we’re measuring. I’ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond’s splendid paper “Software Engineering Metrics: What Do They Measure and How Do We Know?”: “Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” I think that’s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement. Pass vs. fail ratios fall down badly when they’re subjected to those tests.

Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want: “Measurement is the art and science of making reliable (and significant) observations.” (The main part of the definition comes from Quality Software Management, Vol. 2: First-Order Measurement; the parenthetical comes from recent correspondence over Twitter.) That’s a more general, inclusive definition. It incorporates Kaner and Bond’s notion of quantitative measurement, but it’s more welcoming to qualitative, first-order approaches. First-order measurement, as Jerry describes it, provides answers to questions like “What seems to be happening?” and “What should I do now?” It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It’s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order—less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.

My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information. “Immediate” here doesn’t only mean timely; it also means unmediated, without a bunch of stuff getting in between you and the observation. In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project. That knowledge requires more than abstract data; it requires information. So, as testers, how can we inform the decision-makers? In our Rapid Software Testing class, James Bach and I have lately taken to emphasizing this: We must learn to describe and report on the product, our testing, and the quality of our testing. This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid. Each line, or level, is its own story.

Level 1: Tell the product story. The product story is a qualitative report on how the product can work, how it fails, and how it might fail in ways that matter to our clients. “Working”, “failure”, and “what matters” are all qualitative evaluations. Quality is value to some person; in a business setting, quality is value to some person who matters to the business. A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.

Level 2: To make the product story credible, tell the testing story. The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives warrant to the product story; it helps our clients understand why they should believe and trust the product story we’re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied. Coverage is the extent to which we’ve tested the program; it’s about where we’ve looked and how we’ve looked, and it’s also about what’s uncovered—where we might not have looked yet, and where we don’t intend to look. Oracles are central to evaluation; they’re the principles and mechanisms that allow us to recognize a problem. The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious the problems might be. We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner. The product story and our clients’ reactions to it will influence the ongoing testing story, and vice versa.

Level 3: To make the testing story credible, tell a story about the quality of the testing. Just as the product story needs warrant, so too does the testing story. To tell a story about the quality of testing requires us to describe why the testing we’ve done has been good enough, and why the testing we haven’t done hasn’t been so important so far. The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information. The quality-of-testing story will shape and be shaped by the other two stories.

Develop skills to tell and frame stories. People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are “efficient”. I think they mean “fast”, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail…

Don’t think data feed; think the daily news. Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we’re interested, such that we get the level of detail that we seek. Some of those strategies include:

  • Headlines. A quick glance over each page tells us immediately what, in the editors’ judgement, are the most salient aspects of any given story. Headlines come in different sizes, relative to the editors’ assessment of the importance of the story.
  • Front page. The paper comes folded. The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.
  • Continuation. When an entire story is too long to fit on the front page, it’s abbreviated and the story continues elsewhere. This gives the reader the option of following the story or looking at other items on the front page.
  • Coverage areas. The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth). Each section comes with its own front page, which generally includes headlines and continuations of its own.
  • Structured storytelling. Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the inverted pyramid (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions—who, what, where, why, and when, plus how—and the story builds from there. The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she’s satisfied.
  • Identifying who is involved and who is affected. Reporters and editors contextualize their stories. Just as in testing, people are the most important element of the context. A story is far more compelling when it affects the reader or people that the reader cares about. A good story often helps to clarify why the reader should care.
  • Varying approaches to delivering information. Newspapers often use a picture to help illustrate or emphasize an important aspect of a story. In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories—first-order reports—are always given greater prominence than the tables of stock quotes, league standings, and line scores.
  • Sidebars. Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.
  • Daily (and in the world of the Web, continuous) delivery of information. My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle. The paper’s Web site is updated on a continuous basis. Information is available both on a supply and a demand basis; both when I expect it and when I seek it.
  • Identifiable sources. Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained. This helps to set up degrees of trust and skepticism in the reader.

One important note: These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but to oral discourse.

I’ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t have them. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be considered good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.
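
To make the PEW idea a little more concrete, here’s a minimal sketch of what such a report skeleton might look like if you wanted to express it in code. The field names and the sample content are invented for illustration; a real report is shaped by conversation, not by a template.

```python
# A minimal sketch of the PEW heuristic (Problem, Example, Why) as a
# report skeleton. Field names and sample content are invented for
# illustration only.

def pew_report(problem, example, why):
    """Assemble a three-part problem report: what the problem is,
    something that shows it, and the oracle that makes it a problem."""
    return (
        f"PROBLEM: {problem}\n"
        f"EXAMPLE: {example}\n"
        f"WHY (oracle): {why}\n"
    )

if __name__ == "__main__":
    print(pew_report(
        problem="Amount field silently truncates values longer than 8 digits.",
        example="Enter 123456789.00; the saved record shows 12345678.00.",
        why="Inconsistent with the claim that amounts up to 12 digits are supported.",
    ))
```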

You can argue that a good tester should include the needed information and steps in her bug report, but this raise (at least) two problems:

- The same information may be duplicated across many bugs, and even worst it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you post above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seems obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve it with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but does not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out, are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, and sometimes it involves behaving like an atypical customer, and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify it especially for tests involving many short trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?

And would scripts solve that problem when (for example) you’re recording a bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might important if your test activity might be transferred to another offshore team someday (happened to me a few weeks ago, I sent them a test document with only high level details and hoped for the best), or your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s, we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex, cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a minuscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.

What Exploratory Testing Is Not (Part 5): Undocumented Testing

Wednesday, December 21st, 2011

This week I had the great misfortune of reading yet another article which makes the false and ridiculous claim that exploratory testing is “undocumented”. After years and years of plenty of people talking about and writing about and practicing excellent documentation as part of an exploratory testing approach, it’s depressing to see that there are still people shovelling fresh manure onto a pile that should have been carted off years ago.

Like the other approaches to test activities that have been discussed in this series (“touring”, “after-everything-else”, “tool-free”, and “quick testing”), “documented vs. undocumented” is in a category orthogonal to “exploratory vs. scripted”. True: usually scripted activities are performed by some agency following a set of instructions that has been written down somewhere. But we could choose to think of “scripted” in a slightly different and more expansive way, as “prescriptive”, or “mimeomorphic”. A scripted activity, in this sense, is one for which the actions to be performed have been established in advance, and the choices of the actions are not determined by the agency performing them. In that sense, a cook at McDonald’s doesn’t read a script as he prepares your burger, but the preparation of a McDonald’s burger is a highly scripted activity.

Thus any kind of testing can be heavily documented or completely undocumented. A thoroughly documented test might be highly exploratory in nature, or it might be highly scripted.

In the Rapid Software Testing class, James Bach and I point out that when someone says “that should be documented”, what they’re really saying is “that should be documented if and how and when it serves our purposes.” So, let’s start by looking at the “when”.

When we question anything in order to evaluate it, there are moments in the process in which we might choose to record ideas or actions. I’ve broken these down into three basic categories that I hope you find helpful:

  • Before

  • During

  • After

There are “before”, “during”, and “after” moments with respect to any test activity, whether it’s a part of test design, test execution, result interpretation, or learning. Again, a hallmark of exploratory testing is the tester’s freedom and responsibility to optimize the value of the work as it’s happening. That means that when it’s important to record something, the tester is not only welcome but encouraged to
  • pick up a pen
  • take a screen shot
  • launch a session of Rapid Reporter
  • create or update a mind map
  • fire up a screen recorder
  • initiate logging (if it doesn’t start by default on the product you’re testing—and if logging isn’t available, you might consider identifying that as a testability problem and a related product and project risk)
  • sketch out a flowchart diagram
  • type notes into a private or shared repository
  • add to a table of data in Excel
  • fire off a note to a programmer or a product owner
and that’s an incomplete list. But they’re all forms of documentation.
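
As one illustration of how little ceremony some of these forms need, here’s a minimal sketch of timestamped session notes that a tester might script for herself in a few minutes. The file name and the note format are arbitrary choices invented for illustration, not a recommendation.

```python
# A minimal sketch of lightweight, timestamped session notes -- one of the
# many forms of test documentation listed above. The file name and note
# format are arbitrary, invented for illustration.

import datetime
import sys

NOTES_FILE = "session-notes.txt"  # hypothetical location

def note(text):
    """Append a timestamped observation, question, or test idea to the notes file."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(NOTES_FILE, "a", encoding="utf-8") as f:
        f.write(f"{stamp}  {text}\n")

if __name__ == "__main__":
    # e.g. python note.py "BUG? Save dialog loses focus after Alt-Tab"
    note(" ".join(sys.argv[1:]) or "(empty note)")
```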

Freedom to document at will should also mean that the tester is free to refrain from documenting something when the documentation doesn’t add value. At the same time, the tester is responsible and accountable for that decision. In Rapid Testing, we recommend writing down (or saving, or illustrating) only the things that are necessary or valuable to the project, and only when the value of doing so exceeds the cost. This doesn’t mean no documentation; it means the most informative yet fastest and least expensive documentation that completely fulfils the testing mission. Integrating that with testing work leads, we hold, to excellent testing—but it takes practice and skill.

For most test activities, it’s possible to relay information to other people orally, or even sometimes by allowing people to observe our behaviour. (At the beginning of the Rapid Testing class, I sometimes silently hold aloft a 5″ x 8″ index card in landscape orientation. I fold it in half along the horizontal axis, and write my first name on one side using a coloured marker. Everyone in the class mimics my actions. Without a single word of instruction being given or questions being asked, either verbally or in writing, the mission has been accomplished: each person now has a tent card in front of him.)

There’s a potential risk associated with an exploratory approach: that the tester might fail to document something important. In that case, we do what skilled people do with risk: we manage it. James Bach talks at length about managing exploratory testing sessions here. Producing appropriate documentation is partly a technical process, but the technical considerations are dominated by business imperatives: cost, value, and risk. There are social considerations, too. The tester, the test lead, the test manager, the programmers, other managers, and the product owner determine collaboratively what’s important to document and what’s not so important with respect to the current testing mission. In an exploratory approach, we’re more likely to be emphasizing the discovery of new information. So we’re less likely to spend time on documenting what we will do, and more likely to document what we are doing and what we have done. We could do a good deal of preparatory reading and writing, even in an exploratory approach—but we realize that there’s an ever-increasing risk that new discoveries will undermine the worth of what we write ahead of time.

That leads directly to “our purposes”, the task that we want to accomplish when documenting something. Just as testing itself has many possible missions, so too does test documentation. Here’s a decidedly non-exhaustive list, prepared over a couple of minutes:

  • to express testing strategy and tactics for an entire project, or for projects in general
  • to keep a set of personal notes to help structure a debriefing conversation
  • to outline testing activities for a test cycle
  • to report on activities during test execution
  • to outline attributes of a particular quality criterion
  • to catalogue ideas about risk
  • to describe test coverage
  • to account for the work that we’ve done
  • to program a machine to perform a given set of actions
  • to alert people to potential problems in the product
  • to guide a tester’s actions over a test session
  • to identify structures in the application or service
  • to provide a description of how to use a particular test tool that we’ve crafted
  • to describe the tester’s role, skills, and qualifications
  • to explain business rules to someone else on the team
  • to outline scenarios in which the product might be used or tested
  • to identify, for a tester, a specific, explicit sequence of actions to perform, input to provide, and observations to make

That last item is the classic form of highly scripted testing, and that kind of documentation is usually absent from exploratory testing. Even so, a tester can take an exploratory approach using a script as a point of departure or as a reference, just as you might use a trail map to help guide an off-trail hike (among other things, you might want to discover shortcuts or avoid the usual pathways). So when someone says that “exploratory testing is undocumented”, I hear them saying something else. I hear them saying, “I only understand one form of test documentation, and I’ve successfully ignored every other approach to it or purpose for it.”

If you look in the appendices for the Rapid Software Testing class (you can find a .PDF at http://www.satisfice.com/rst-appendices.pdf), you’ll see a large number of examples of documentation that are entirely consistent with an exploratory approach. That’s just one source. For each item in my partial list above, here’s a partial list of approaches, examples, and tools.

Testing strategy and tactics for an entire project, or for projects in general.
Look at the Satisfice Heuristic Test Strategy Model and the Context Model for Heuristic Test Planning (these also appear in the RST Appendices).

An outline of testing activities for a test cycle.
Look at the General Functionality and Stability Test Procedure for Certified for Microsoft Windows Logo. See also the OWL Quality Plan (and the Risk and Task Correlation) in the RST Appendices.

Keeping a set of personal notes to help structure a debriefing or other conversation.
See the “Beans ‘R Us Test Report” in the RST Appendices; or see my notes on testing an in-flight entertainment system which I did for fun on a flight from India to Amsterdam.

Recording activities and ideas during test execution
A video camera or a screen recording tool can capture the specific actions of a tester for later playback and review. Well-designed log files may also provide a kind of retrospective record of what was tested. Still, neither of these provides insight into the tester’s mind. Recorded narration or conversation can do that; tools like BB Test Assistant, Camtasia, or Morae can help. The classic approach, of course, is to take notes. Have a look at my presentation, “An Exploratory Tester’s Notebook”, which has examples of freestyle notes taken during an impromptu testing session, and detailed, annotated examples of Session-Based Test Management sessions. Shmuel Gerson’s Rapid Reporter and Jonathan Kohl’s Session Tester are tools oriented towards taking notes (and, in the former case, including screen captures) of testing sessions on the fly.

Outlining many attributes of a particular quality criterion
See “Heuristics of Software Testability” in the RST Appendices for one example.

Cataloguing ideas about risk
There are several examples of this in the RST Appendices, most extensively in the “Deployment Planning and Risk Analysis” example. You’ll also find an “Install Risk Catalog”; “The Risk of Incompatibility”; the Risk vs. Tasks section in the “OWL Quality Plan”; the “Y2K Compliance Report”; and “Round Results Risk A”, which shows a mapping of Risk Areas vs. Test Strategy and Tasks.

Describing or outlining test coverage
A mapping establishes or illustrates relationships between things. We can use such mappings to help us think about test coverage. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. These can be constructed before, after, or during a given test activity, with the goal of covering the map with tests, or using testing to extend the map. I catalogued several ways of thinking about coverage and reporting on it in three articles: Got You Covered, Cover or Discover, and A Map By Any Other Name. Several examples of lightweight coverage outlines can be found in the RST Appendices (“Putt Putt Saves the Zoo” and “Table Formatting Test Notes”). There are also coverage ideas incorporated into the Apollo mission notes that we’ve titled “Guideword Heuristics for Astronauts”.
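
For a sense of how lightweight such an outline can be, here’s a sketch of a coverage list expressed as a small data structure. The product areas, depth ratings, and notes are all invented; the point is the shape, not the content.

```python
# A minimal sketch of a lightweight coverage outline: product areas
# annotated with how deeply each has been covered so far and what remains
# open. Areas, ratings, and notes are invented for illustration.

coverage_outline = {
    "Login and sessions":   ("deep",    "tested across browsers; timeout handling still open"),
    "Search":               ("shallow", "happy path only; no non-Latin queries yet"),
    "Reports and export":   ("none",    "blocked: large-account test data not yet available"),
    "Billing calculations": ("deep",    "compared against a spreadsheet oracle"),
}

for area, (depth, notes) in coverage_outline.items():
    print(f"{area:22} {depth:8} {notes}")
```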

Accounting for testing work that we’ve done.
See Session-Based Test Management, and see “An Exploratory Tester’s Notebook“. Darren McMillan provides excellent examples of annotated mind maps; scroll down to the section headed “Session Reports”, and continue through “Simplifying feedback to management” and “Simplifying feedback to groups”. A forthcoming article, written by me, shows how a senior test manager tracks testing sessions at a half-day granularity level.

Programming a machine to help you to explore
See all manner of books on programming, both references and cookbooks, but for testers in particular, have a look at Brian Marick’s Everyday Scripting with Ruby. Check out Pete Houghton’s splendid examples of exploratory test automation that begin here. Cem Kaner (often in collaboration with Doug Hoffman) writes extensively about automation-assisted exploratory testing; an example is here.
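
To give a flavour of what automation-assisted exploration can look like, here’s a minimal sketch: drive a stand-in function with lots of generated input, compare its answers against an independent oracle, and log disagreements for a human to investigate. The “product” function and its planted bug are invented for illustration; the oracle here happens to be Python’s calendar module.

```python
# A minimal sketch of one automation-assisted exploratory tactic: generate
# lots of input, compare a stand-in "product" function with an independent
# oracle, and log disagreements for a human to investigate. The function
# under test and its planted bug are invented; they come from no real product.

import calendar
import random

def product_days_in_month(year, month):
    """Imaginary product code, with a planted bug: it ignores the
    100-year and 400-year exceptions to the leap-year rule."""
    lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    days = lengths[month - 1]
    if month == 2 and year % 4 == 0:
        days = 29
    return days

def oracle_days_in_month(year, month):
    """An independent (and itself fallible) source of an expected answer."""
    return calendar.monthrange(year, month)[1]

def explore(trials=20000, seed=1):
    random.seed(seed)
    suspects = set()
    for _ in range(trials):
        year, month = random.randint(1600, 3000), random.randint(1, 12)
        got = product_days_in_month(year, month)
        expected = oracle_days_in_month(year, month)
        if got != expected:
            suspects.add((year, month, got, expected))
    return sorted(suspects)

if __name__ == "__main__":
    for s in explore()[:10]:     # a few leads worth investigating by hand
        print("Investigate:", s)
```

The tool doesn’t decide anything; it surfaces candidates for a person to look into, which is the point of calling this automation-assisted rather than automated.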

Alerting people to potential problems in the product
In general, bug reporting systems provide one way to handle the task of recording and reporting problems in the product. James Bach provides an example of a report that he provided to a client (along with a more informal account of the session).

Guiding a tester’s actions over a test session
Guiding a tester involves skills like chartering and checklisting. Start with the documentation on Session Based Test Management (http://www.satisfice.com/sbtm). Selena Delesie has produced an excellent blog post on chartering exploratory testing sessions. The title of Cem Kaner’s presentation at CAST 2008, The Value of Checklists and the Danger of Scripts: What legal training suggests for testers describes the content perfectly. Michael Hunter’s You Are Not Done Yet lists can be used and adapted to your context as a set of checklists.

To identify structures in the application or service
The “Product Elements” section in the Heuristic Test Strategy Model provides a kind of framework for documenting product structures. In the RST Appendices, the test notes for “Putt Putt Saves the Zoo” and “Diskmapper”, and the “OWL Quality Plan” provide examples of identifying several different structures in the programs under test. Mind mapping provides a means of describing and illustrating structures, too; see Darren McMillan’s examples here and here. Ruud Cox and Ru Cindrea used a mind map of product elements to help win the Best Bug Report award in the Test Lab at EuroSTAR 2011. I’ve created a list of structures that support exploratory testing, and many of these are related to structures in the product.

Providing a description of how to use a particular test tool that we’ve crafted
While working at a bank, I developed (in Excel and VBA) a tool that could be used as an oracle and as a way of recording test results. (Thanks to non-disclosure agreements, I can describe these, but cannot provide examples.) When I left the project, I was obliged to document my work. I didn’t work on the assumption that anyone off the street would be reading the document. Instead, I presumed that anyone assigned to that testing job and to using that tool, would have the rapid learning skill to explore the tool, the product, and the business domain in a mutually supportive way. So I crafted documentation that was intended to tell testers just enough to get them exploring.

Explaining business rules to someone else on the team
I did include documentation for novices of one kind: within the documentation for that testing tool, I included a general description of how foreign exchange transactions worked from the bank’s perspective, and how appropriate accounts got credited and debited. I had learned this by reverse-engineering use cases and consulting with the local business analyst. I summarized it with a two-page document written in simple, direct language, referring directly to the simpler use cases and explaining the more confusing bits in more detail. For those whose learning style was oriented toward code, I also described the tables and array formulas that applied the business rules.

Outlining scenarios in which the product might be used or tested
I discuss some issues about scenarios here—why they’re important, and why it’s important to keep them open-ended and open to interpretation. It’s more important to record than to prescribe, since in a good scenario, you’ll observe and discover much more than you’ve articulated in advance. Cem Kaner gives ideas on how to produce scenarios; Hans Buwalda presents examples of soap opera testing.

Identifying required tester skill
People with skill don’t need prescriptive documentation for every little thing. Responsible managers identify the skills needed to test, and commit to employing people who either have those skills or can develop them quickly. James Bach eliminated 50 pages of otiose documentation with two paragraphs. (Otiose is a marvelous word; it’s fun to look it up in a thesaurus.)

Identifying, for a tester, a particular explicit sequence of actions to perform, input to provide, and observations to make.
Again, a document that attempts to specify exactly what a tester should do is the hallmark of scripted testing. James Bach articulates a paradox that has not yet been noted clearly in our craft: in order to perform a scripted test well, you need significant amounts of skill and tacit knowledge (and you also need to ignore the script on occasion, and you need to know when those occasions are). There’s another interesting issue here: preparing such documents usually depends on exploratory activity. There’s no script to tell you how to write a script. (You might argue there’s one exception. You can follow this script to write a test script: take each line of a requirements document, and add the words “Verify that” to the beginning of each line.)
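
If you doubt how little that exception buys, here’s the whole “script-writing script” as a program; the file names are invented. That it fits in a handful of lines is rather the point: no testing skill has been encoded anywhere.

```python
# The entire "script for writing a test script" described above, automated.
# File names are invented for illustration.

with open("requirements.txt", encoding="utf-8") as src, \
     open("test_script.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if line.strip():                       # skip blank lines
            dst.write("Verify that " + line.lstrip())
```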

Now, just as you can perform testing badly using any approach, you can perform exploratory testing and document it inappropriately, either by under-documenting it OR over-documenting it using any of the kinds of documentation above. But, as this document shows, the notion that exploratory testing is by its nature undocumented is not only ignorant, but aggressively ignorant about both testing and documentation. Whenever you see someone claim that exploratory testing is undocumented, I’d ask you to help by setting the record straight. Feel free to refer to this blog post, if you find it helpful; also, please point me to other exemplars of excellent documentation that are consistent with exploratory approaches. If we all work together, we can bury this myth, while providing excellent records and reports for our clients.

Worthwhile Documentation

Monday, December 19th, 2011

In the Rapid Software Testing class, we focus on ways of doing the fastest, least expensive testing that still completely fulfills the mission. That involves doing some things more quickly, and it also involves doing other things less, or less wastefully. One of the prime candidates for radical waste reduction is documentation that’s incongruent with the testing mission.

Medical device projects typically present a high degree of risk. Excellent testing helps teams and product owners to identify risks and problems in the product. The quality of testing is a function of the skill of the tester; one would not set loose an incapable tester on a high-risk project. Yet some managers have told me that they commission people to write test documentation in a particular style. That style is, to me, overly elaborate and specific with respect to actions to perform and observations to make. Yet at the same time, that style is remarkably devoid of ideas about motivation or risk.

I sometimes ask managers why they use this style of instruction. They usually answer, “because we want anyone to be able to walk up to this system and test it.”

“Anyone?” I ask. “Why anyone?”

“You know how it is. If we have to test a new revision of this program a year from now, there’s a good chance that we won’t have the same testers.” (Dude. If you’re inflicting on your staff the idea of testing as writing or following instructions for an automaton, I might have an explanation for you.)

“Anyone?” I ask. “How about a cat?”

“Well, Michael, that’s silly. Cats can’t think. Cats can’t read.”

“How about my daughter? She’s seven, and she can read well enough to read that. And she could follow the steps pretty well, too.”

“We don’t hire children here!”

“Okay,” I offer. “Would you hire a completely incompetent tester who needed to be told absolutely everything, in painful detail?”

“We wouldn’t hire anyone like that.”

“Fair enough, and I’d hope not. So, why do you insist that people write instructions for them that way?”

Let me be clear: when the situation calls for skilled testers, you don’t need overly specific instructions for them. On the other hand, if you don’t have skilled testers, you’ve got a problem that scripted testing won’t be able to solve.

Here’s a splendid example of a machete that we believe managers could use to cut through jungles of waste. In a recent project that involved work with FDA-regulated medical devices, James Bach found a huge number of excruciatingly overspecified, low-value test cases aimed at “anyone”. The following two paragraphs replaced 50 pages of waste.

3.0 Test Procedures

3.1 General Testing Protocol

In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items, the tester shall at all times be alert for any unexplained or erroneous behaviour of the product. The tester shall bear in mind that, regardless of any specific requirement for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including an unacceptable risk due to reasonably foreseeable misuse.

Test personnel requirements: The tester shall be thoroughly familiar with the Generator and Workstation Function Requirement Specifications, as well as the working principles of the devices themselves. The tester shall also know the workings of the power test jig and associated software, including how to configure and calibrate it and how to recognize it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing in the event that anomalies appear that require investigation. The tester shall know how to keep test records to a credible professional standard.

To me, that’s something worth writing down. Follow those instructions, and your team will save time, save work, and put the emphasis in the right places: on risk, and on meeting and mitigating that risk with skills.

Doing Development Work vs. Doing Quality Assurance

Saturday, June 5th, 2010

Here’s a case where a comment and question were worthy of a post of their own.  In reference to my recent post, Testers:  Get Out of the Quality Assurance Business, Selim Mia writes:

Hi Michael,

I have started following your blog just from past few days and I like to thank you for all of your thoughtful posts by which reflects your craftsmanship.

Thank you for reading, and thank you for thanking me.

I have solely agreed all of your points/advice/discussions on this post. I had many confusion about the term QA and QC since the start of my testing career and still have many confusion, i think other testers have the same. i have been working in a department called “QA” in my organization but doing mostly testing tasks as like other companies in Bangladesh. But along with testing we have also doing some of the QA tasks (i think) and below i have mentioned some of these:

  • Check-in Review: we check, each developer at-least once in a day Check-in their source code into the svn repository (source code management system) with the comment what changes he made for this particular check-in and also reviewer name who pair reviewed the code before check-in.
  • Code review: we check, is the code reviewed by the technology expert in witch technology project is developing in the regular interval (at least for the new developer’s code, code of complex functionalities, etc) and also we ensure that actions has been taken for all the review comments.
  • Audit Process Framework: we check, are all the development processes are following by the all project members except their have enough justification and approval not to follow the particular process(es).
  • Audit Bug repository: we ensure all the reported bugs have been taken into action (not a bug, assigned, WIP, fix, won’t fix).
  • Audit Document Management System: we ensure that all the updated version of all documents of the particular project are stored on the DMS.

Are not all above activities are part (of course, not all) of QA? Your kind words will be very much helpful to me.

Regards,
- Selim

What a great question! Thank you for asking.

The overarching mission for a tester, in my view, is to be of service to the project. Now, that’s not only the case for testers; I think it’s the overarching mission of anyone, everyone, on the project. We’re all in service to our paramount clients—the product owners, the business owners, the gold owners and the goal donors (as some Agile wags have said)—but we’re also in service to each other. When we’re thinking that way, the testers help the programmers by testing the product using a different skill set and mind set from the programmers; the programmers help the testers by providing a more testable product (log files, scriptable interfaces, and so on). Testers may help programmers to pinpoint the circumstances in which a bug happens; programmers help testers by providing explanations, test programs, hints on what to test. Testers learn to program; programmers learn to test. We support each other and learn from each other.

The Agile people for years have been advocating the idea of the self-organizing team. I believe in that too. That means that, in principle, anyone on the team is empowered to do whatever work needs to be done. So if a programmer takes on the tasks of setting up and configuring test environments, or if the tester is recruited to review code or models or bugs—activities that help to assure quality as part of a collaborative process—I’d say that’s cool.

The audit stuff gives me pause. Auditing, in my view, is a kind of testing role: gathering information with the intention of informing a decision. Auditors don’t set policy or enforce rules; they provide information to management. In many process-model-obsessed organizations (here in the West, at least) the role has taken on a different slant: auditors are a kind of process police. In such organizations, people rearrange and reprioritize their work not to optimize its value, but to keep the auditors happy. This is a form of goal displacement. To me, the priority should be on providing service and value to our clients, including each other.

In my view, if auditors discover some deviation from a set policy or a process model, I’d argue that the first step is to question the reasons for the deviation. Maybe someone is being sloppy; maybe someone is cutting corners; maybe someone is adding risk. But maybe someone has discovered a faster, less expensive, more efficient, more informative, more productive way of handling a task. Models always leave out something. Process models often leave out means by which we can encourage beneficial variation and change. I’ve never heard of an auditor reporting on some fabulous new problem-solving approach that someone has discovered internally. Most often, in my experience, process models leave out adaptability and people, as this remarkable TED talk describes.

It’s neither a tester’s job nor an auditor’s job, in my view, to set or enforce policy, and I think it’s politically dangerous for us to be perceived that way. As soon as we are perceived to be responsible for enforcement, we run the risk of being seen as tattletales, busybodies, quality police. In that kind of environment, information will soon start to be hidden, which undermines the task of investigating the product and identifying problems with it that threaten its value.

So, to the extent that you’re doing development work that helps to assure quality; to the extent that your teammates themselves are asking you to assist them; to the extent that you’re providing a service to them; to the extent that they appreciate what you’re doing as a service to them; and to the extent that they thank you for it, I’d say “rock on”, and congratulations.

In another forum, a correspondent suggested “Maybe it’s all down to the “overall” thing – be part of the process, not a megalomaniac who thinks he owns it.” I absolutely agree with that.  To the extent that you’re doing “quality assurance”; to the extent that your managers are requiring you to impose on your teammates (or even worse, to the extent that you’re imposing without being asked by anyone); to the extent that you’re slowing down the project or inflicting help; to the extent that the programmers see your work as enforcing the contents of a process model or policy document; to the extent that you are barely tolerated or outright resented—well, as always, that’s up to you and your organization. But it’s not the kind of work that I would condone or accept myself.

Again, thanks for writing.

Automation Bias, Documentation Bias, and the Power of Humans

Tuesday, June 30th, 2009

A few weeks ago, I went down to the U.S. Consulate in Toronto to register Ariel, my daughter, as an American citizen born abroad. (She’s a dualie, because she was born in Canada to an American parent: me. I’m a dualie too, born in the U.S. to Canadian parents. Being born a dual citizen is a wonderful example of a best practice. You should follow it. But I digress.)

The application process is, naturally, fraught with complication and bureaucracy. There’s also a chilling and intimidating level of security; one isn’t allowed to bring anything electronic into the Consulate at all. No cell phones, no PDAs, and certainly no laptop computers. That means no electronic records, and no hope of looking anything up. So one has to prepare.

There’s a Web site for the Consular services. One of the first items that one sees on the site is a link for telephone inquiries. Note a couple of things here: the telephone services are for visa information, not for general information; and that visa information costs USD$0.90 per minute for a recorded system with no operator. (Oddly, that’s the price for calls from the U.S.; calls from Canada are cheaper, at CAD$0.69 per minute.) I didn’t test that.

With only a little digging, I was able to find information related to registering a birth abroad. I gathered the information and documents that I figured I needed, and took it all down to the Consulate. I was getting ready to travel the next day, and so in typical fashion, I pushed things out to the noon deadline for receiving applications. I watched the clock on the car anxiously, parking at 11:53 and getting to the Consulate at 11:55. “Wow, that’s pushing it,” said the security guard. “Last one today.”

When I spoke to the friendly, helpful lady behind the counter (I mean that; she was genuinely friendly and helpful), she told me some things that the Web site didn’t.

  • The application form itself is online, and these days it’s one of those PDFs that has input fields, so everything can be nice and tidy. Again, though, there are some fields in the form that have several possible answers. There is some helpful information available, but I still had questions.
  • The consular officers want to see original documents, but accept and keep only photocopies of them. You need to come with your own photocopies. If you don’t, it costs you $1.00 per document—and there are lots of documents. This isn’t noted anywhere on the Web site that I could see.
  • On one of the Web pages listing documentation requirements, it says “In certain cases, it may be necessary to submit additional documents, including affidavits of paternity and support, divorce decrees from prior marriages, or medical reports of blood compatibility.” Well, what cases? The page doesn’t tell me, and getting it wrong means an extra trip. The lady behind the counter reviewed what I had brought, answered a number of questions, and told me exactly what to bring next time.

As I travel around, I sometimes see an implicit assumption that documents tell us all we need to know. Yet documents are always a stand-in for some person, an incomplete representation of what they know or what they want. They’re time-bound, in that they represent someone’s ideas frozen at some point in the past. They can’t, and don’t answer followup questions. As Northrop Frye once said, “A book always says the same thing.” Yet if we look more closely, not even ideas that are carefully and thoroughly debated can be expressed unambiguously. That’s why we have judges. And lawyers.

The next thing that happened emphasized this. After I left the Consulate, I returned to my car. At the collection booth, the posted time was 12:20. I’d been less than half an hour, which is good because parking at that garage costs $3.00 per half-hour. I handed the attendant my ticket. The charge was $6.00.

“What?! I’ve only been gone for 25 minutes.”

She looked at the ticket. “Sorry, sir. You checked in at 11:40.”

“No way,” I said. “I know what time I checked in; I was running late. It was at least 12 minutes later than 11:40. I got to the entrance to the Consulate, just over there, at 11:55. No way I could have taken 15 minutes to walk 75 metres!” She showed me the ticket. It said 11:40. “That’s impossible. I want to check the clock.”

The difference was only $3.00, but I was furious. I exited the garage, drove around to the entrance, and checked the display. It read 12:24, the correct time. I pushed the button and pulled out a ticket; it too read 12:24. To her credit, the attendant appeared and checked the clock, and asked to see the ticket I had just printed. “12:24. I’m sorry, sir, there’s nothing I can do.” Quite true, no doubt.

In this case, the (clearly fallible) machinery and the (clearly fallible) documentation were more credible than I. I didn’t check the ticket on the way in. And yet I know when I arrived, and I know that there must have been some kind of failure with the machinery. A one-off? A consistent pattern? Happens only at a certain time of the day? A mechanical problem? A software problem?

All the way home, I pondered over how the failure had occurred, and how one might test for it. But what impressed me most about my experience with the Consulate’s Web site, and the consular officer, and the parking ticket machine, and the parking attendant, was the way in which we invest trust, to varying degrees and at various times, in machines and in documents and in people. When is that trust warranted, and when is it not?

Postscript: Just now, as I attempted to publish this post, the net connection at this hotel was suddenly unavailable. Again.

What Should A Test Plan Contain?

Wednesday, December 3rd, 2008

In response to this posting, Clive asks, “So in your opinion what should a test plan contain?”

First, Clive, thank you for asking.

Let’s consider first what we might mean by “plan”. The way James Bach and I talk about planning (and the way we teach it in the Rapid Software Testing course) is that a plan is the sum or intersection of strategy and logistics. Strategy is the set of ideas that guide your test design. Logistics is the set of ideas that guide your application of resources. Put those things together, and you have a plan. The most important thing to note about this is that a plan is not a physical thing; it’s a set of ideas. Thus, it’s important to keep clear the difference between the plan and the planning documents—that is, the documents that contain some information about the plan.

It’s possible to interpret your question in at least two ways—first, as a question about the plan, and second as a question about the planning document. Let’s start with the plan.

For ideas on strategy, I consult the Heuristic Test Strategy Model (download it and have a look). Heuristics are fallible methods for solving a problem or making a decision; “heuristic” as an adjective means “(fallibly) conducive to learning”. “Heuristic” here does triple duty, modifying “model” (all models are heuristic), “strategy” (all strategies are heuristic too), and “test” (all tests are heuristic). The HTSM is a set of guideword heuristics and associated questions; you can find it here. I’ve memorized the 36 guidewords. They’re not hard to remember, using the mnemonics found in the course notes and a little practice.

Having the guidewords in my head makes it fast and easy to come up with lots of questions to ask, ideas to follow, and risks to evaluate in four broad categories.

  • Project environment—questions about context (customer, information sources, developer relations, test team, equipment and tools, schedule, test item (what we’re testing), and deliverables (or work products));
  • Product elements and their dimensions, important in evaluating coverage (structure, function, data, platforms, operations, and time);
  • Quality criteria—questions about the characteristics of the product that appeal to the users we like and discourage users we don’t like (capability, reliability, usability, security, scalability, performance, installability, compatibility, supportability, testability, maintainability, portability, and localizability); and
  • Test techniques by which we might interact with the product (function tests, domain tests, stress tests, flow tests, scenario tests, claims-based tests, user tests, risk-based tests, and automatic tests).
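
Since I have the guidewords memorized, I don’t need a tool to use them, but here’s a sketch of what turning them into question prompts might look like. The guidewords are transcribed from the list above; the question wording and the product name are invented for illustration.

```python
# A rough sketch of using the HTSM guidewords (as listed above) as prompts
# for generating questions. The question template and product name are
# invented; the guidewords themselves come from the model.

HTSM = {
    "Project Environment": [
        "customer", "information sources", "developer relations", "test team",
        "equipment and tools", "schedule", "test item", "deliverables",
    ],
    "Product Elements": [
        "structure", "function", "data", "platforms", "operations", "time",
    ],
    "Quality Criteria": [
        "capability", "reliability", "usability", "security", "scalability",
        "performance", "installability", "compatibility", "supportability",
        "testability", "maintainability", "portability", "localizability",
    ],
    "Test Techniques": [
        "function tests", "domain tests", "stress tests", "flow tests",
        "scenario tests", "claims-based tests", "user tests",
        "risk-based tests", "automatic tests",
    ],
}

def prompts(product_name):
    """Yield one open-ended question per guideword, as conversation starters."""
    for category, words in HTSM.items():
        for word in words:
            yield (f"[{category}] {word}: what questions, risks, or test ideas "
                   f"does this suggest for {product_name}?")

if __name__ == "__main__":
    for question in prompts("the product we're testing"):
        print(question)
```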

For ideas on application of resources, I consider the Heuristic Test Planning Context Model (download this, too), which helps me think about who my clients are, the mission that I’m being asked to accomplish and the givens that I’ve got. Then I ask myself (and, if necessary, my client) questions about the things that might be missing from the givens. Based on risk, the value of the information we seek, and the cost of discovering it, we might decide to go with what we’ve got and exploit whatever resources we have, or we might decide to seek and apply more resources.

With these two tools, I have lots of ideas in my head, and the sum of those ideas constitutes the plan. The guidewords lend structure and stability to the plan; the ever-changing context and choices constantly focus and re-focus the plan. The planning ideas must be diversified, risk-focused, specific to the product or system that we’re testing, and practical and achievable. So, in one sense, the plan contains all of these ideas.

Note that the plan itself is entirely thought-stuff, so in a literal physical sense the plan doesn’t actually contain anything. Planning documents contain representations—literally, re-presentations—of elements of the plan. Planning documents are highly audience- and purpose-dependent. They might include

  • a subset of the ideas that might be in my head;
  • mass storage for things that I might forget;
  • a means of co-ordinating the test team’s work;
  • ideas about risk, including ideas about knowns, known unknowns, and potential unknown unknowns;
  • a checklist of known problems in our application;
  • a general set of ideas about test coverage;
  • a specific set of ideas on co-ordinating risk ideas with tasks that we believe will help us to investigate and understand them;
  • notes on how we intend to assign people to certain tasks;
  • a detailed schedule for specific activities that we anticipate;
  • a very broad schedule for activities that we haven’t hammered out yet; or
  • anything else that might be useful to impart to some person in some context.

The audience for the document might include

  • me;
  • my test team;
  • my manager;
  • someone else who might be advising me on strategy ideas;
  • my manager’s manager, and so forth up to the CEO;
  • my client’s clients;
  • other test teams (say, an outsource test team, or my client’s test team);
  • the programmers;
  • regulators or auditors;
  • or other people or agencies.

A test planning document might take the form of

  • a sketch on the back of a napkin or hotel stationery to help guide a conversation;
  • a few lines in a pocket Moleskine notebook;
  • several pages in a large Moleskine notebook;
  • an email message with some suggestions from some person to some tester;
  • a diagram on a whiteboard;
  • a mind map;
  • a hierarchical text outline;
  • a Gantt chart;
  • a specific set of test ideas or test cases in some ginormous requirements and test management tool;
  • a lengthy but rough Word document, supplemented with tables of data in Excel spreadsheets;
  • a highly formal, polished, formatted document for presentation to a customer or a regulatory agency;
  • or anything else you can think of that describes or models some part of your intentions.

In the Appendices to the Rapid Software Testing course, there’s an excellent set of examples of test planning documents, from the very sketchy to the broadly general to the highly specific. See pages 47 – 166. There’s also an example of a general procedure here. Note the comment on the page that links to it:

I produced this procedure for Microsoft to help them do a better job of assuring that applications that claim to be Windows 2000 compatible really are compatible. The procedure itself is documented in 6 pages. As far as I know it is the first published exploratory testing procedure. It’s used along with a second non-exploratory procedure (which is 400 pages long!) to perform the certification test. What’s interesting about that is the fact that my 6 pages represent about one third of the total test effort.

So what should a test planning document look like? The default answer, to me, is nothing, because the test plan is idea-stuff, and translating ideas into some other form costs something. However, there may be value associated with communicating the idea to someone else, so the default answer is not the final answer. The final answer is “the very least expensive representation of the idea that can sufficiently store or communicate the idea.” That might be vague—especially that “sufficiently” business. Yet there are a few important things to remember.

  • Planning comes with opportunity cost attached. In general, the more time we spend planning, the less time we spend interacting with the product. This is not to advocate no planning, but rather the least amount of planning that nonetheless sufficiently guides the accomplishment of the mission.
  • Writing plans (translating thoughts into some other format) tends to take more time than planning (thinking those thoughts).
  • Writing things down, sketching them, outlining them, can be a productive form of working, learning, and remembering for many people. So writing things down usually has some value.
  • Writing things down tends to become more important when people are separated in distance, time, and knowledge and when the knowledge is valuable. Documents can be useful in transmitting specific and detailed information to others. Again, in this context, writing things down can have some value.
  • Writing things down may help people to learn something, or it may give people the illusion of having learned it. It may even prevent people from learning things on their own. (This is not a new problem; see Plato’s Phaedrus, and the dialog between the King of Egypt and the god Theuth).
  • Conversation (or an internal questioning process) may be more useful in a circumstance where things are not known, but must be explored.
  • Aspects of the two approaches can supplement each other.
  • One useful heuristic, which I learned from Cem Kaner, is to ask whether your test planning document is intended to be a product, generally designed to supply someone else with something that they want, or a tool, generally designed to supply ourselves with something that helps us to accomplish our own mission. These are not distinct categories, but thinking about them might help to set priorities.
  • New information prompts us to change our plans. As testers, discovering new information is arguably the most important part of our role. We should be careful to limit our investments in planning, in light of the fact that we’ll know more this afternoon than we did this morning—and, for any given cycle in time, we’ll tend to know significantly more at the end of this cycle than we do now.
  • Our context drives the choices we make, and both evolve over time.

The White Glove Heuristic and The "Unless…" Heuristic

Wednesday, March 7th, 2007

Part of the Rapid Software Testing philosophy involves reducing waste wherever possible. For many organizations, documentation is an area where we might want to cut the clutter. It’s not that documentation is valueless, but every minute that we spend on documentation is a minute that we can’t spend on any other activity. Thus the value of the documentation has to be compared not only to its own cost, but to its opportunity cost. More documentation means less testing; that might be okay, and even important, but it might not.

This leads to the White Glove Heuristic: if we have documentation somewhere in our process, such that running a white-gloved finger over it would cause the glove to pick up a bunch of dust, let’s at least consider applying less work to that document, or eliminating it altogether.

In the RST class, there’s often push-back to this idea. That’s understandable; at one point, someone started producing the document in an attempt to solve some problem or mitigate some risk. The question then becomes, “Has the situation changed such that we no longer need that document?”–and the problem I see most often is that the question is begged.

On a recent trip to India, many of the participants in the class pushed back on the very idea of reducing documentation in any way, claiming “our project managers would never accept that.”

I was curious. “Have you asked them?” The answer was, as I suspected, No. “So suppose you’re producing a forty-page test report for a busy executive. What if that executive only ever reads the summary on the first page? Might she approve of a shorter document? If she had important questions about things in that document, could you answer those questions at a lower cost than preparing the big document?” Maybe, came the answer. “So: your project managers would never accept changes to your test documentation, unless they’re not reading the whole thing anyway. Or they’d never accept changes unless they were aware of the amount of testing time lost to preparing the document. Or they’d never accept changes unless they had the confidence that you could give them the information they needed on demand.” The class participants then began to recognize that a session-based test management approach might allow them to make their testing sufficiently accountable while satisfying the executives with more lightweight summary reports.
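
To make “lightweight summary reports” a little more concrete, here’s a rough sketch in Python. It assumes, purely for illustration, that each completed session is saved as a small text file with lines tagged TESTER:, CHARTER:, and BUG:; that layout is my invention, not the SBTM toolkit, and real session sheets carry far more than this. The point is only that a one-page roll-up for a busy executive can be cheap to produce on demand.

    # Illustrative roll-up of plain-text session sheets into a one-page
    # summary. The folder name, file layout, and tags are assumptions.

    from collections import Counter
    from pathlib import Path

    def summarize_sessions(folder="sessions"):
        sheets = sorted(Path(folder).glob("*.txt"))
        charters, bugs, testers = [], [], Counter()
        for sheet in sheets:
            for line in sheet.read_text().splitlines():
                tag, _, value = line.partition(":")
                tag, value = tag.strip().upper(), value.strip()
                if tag == "CHARTER":
                    charters.append(value)
                elif tag == "BUG":
                    bugs.append(value)
                elif tag == "TESTER":
                    testers[value] += 1
        print(f"Sessions: {len(sheets)}   Testers: {len(testers)}   Bugs reported: {len(bugs)}")
        print("Charters covered:")
        for charter in charters:
            print(f"  - {charter}")
        print("Bugs:")
        for bug in bugs:
            print(f"  - {bug}")

    if __name__ == "__main__":
        summarize_sessions()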

Later in the class, we were talking about oracles, and how slippery they can be. Oracles are heuristic; that means that they often work, but they can fail, and that we learn something either way. The class presents a list of consistency oracles (the list is now a little longer than in the linked article); for example, a product should behave in a manner consistent with its history, unless there’s a compelling reason for it to be otherwise, like a feature enhancement or a bug fix.

This led me to formulate The “Unless…” Heuristic: Take whatever statement you care to make about your product, your process, or your model, and append “unless…” to it. Then see where the rest of the sentence takes you.