Blog Posts for the ‘Management’ Category

Where Does All That Time Go?

Tuesday, October 30th, 2012

It had been a long day, so a few of the fellows from the class agreed to meet at a restaurant downtown. The main courses had been cleared off the table, some beer had been delivered, and we were waiting for dessert. Pedro (not his real name) was complaining, again, about how much time he had to spend doing administrivial tasks—meetings, filling out forms, time sheets, requisitions, and the like. “Everything takes so long. I want a pad of paper to take notes, I have to fill out a form for it. God help me if I run out of forms!”

“How much time do you spend on this kind of stuff each week?” I asked.

Pedro replied, “An hour a day. Maybe two, some days. Meetings…let’s say an hour and a half, on average.”

Wow, I thought—that’s a pretty good chunk of the week. I had an idea.

“Let’s visualize this,” I said. I took out my trusty Moleskine notebook. I prefer the version with the graph paper in it, for occasions just like this one. I outlined a grid, 20 squares across by two down.

Empty Week

“So you spend, on average, an hour and a half each day on compliance stuff. One-point-five times five, or 7.5 hours a week. Let’s make it eight. Put a C in eight squares.” He did that.

Compliance

“Okay,” I said. “You were griping today about how much time you spend wrestling with your test environments.”

Pedro’s eyes lit up. “Yes!” he said. “That’s the big one. See, it’s mobile stuff. We have a server component and a handset component to what we do, and the server stuff is a real bear.”

“Tell me more.”

“It’s a big deal. We’ve got one environment that models the production system. The software we’re developing has been so buggy that we can’t tell whether a given problem is general, or specific to the handset, so we have another one that we set up to do targeted testing every time we add support for a new handset. That’s the one I work with. Trouble is, setting it up takes ages and it’s really finicky. I have to do everything really carefully. I’ve asked for time to do scripting to automate some of it, but they won’t give that to me, because they’re always in such a rush. So, I do it by hand. It’s buggy, and I make the odd mistake. Either way, when I find out it doesn’t work, I have to troubleshoot it. That means I have to get on instant messaging or the phone to the developers, and figure out what’s wrong; then I have to figure out where to roll back to. And usually that’s right from the start. It wastes hours. And it’s every day.”

“Okay. Show me that on our little table, here. Use an S to represent each hour you spend each day.”

Whereupon Pedro proceeded to fill in squares. Ten of them. Ten more. And then, eight more.

Setup

“Really?!” I said. “28 hours a week divided by five days—that’s more than five hours a day. Seriously?”

“Totally,” said Pedro. “It’s most of the day, every day, honestly. Never mind the tedium. What’s really killing me is that I don’t feel like I’m getting any real testing work done.”

“No kidding. There’s no time for it. There are only four squares left in the week. Plus, something you said earlier today about tons of bugs that aren’t related to setting up?”

“Right. When it comes to the stuff that I’m actually being asked to test, there’s lots of bugs there too. So my ‘testing time’ isn’t really testing. It’s mostly taken up with trying to reproduce and document the bugs.”

“Yes. In session-based test management, that’s bug investigation and reporting—B-time. And it does interrupt test design and execution—T-time—which is what produces actual test coverage, learning about what’s actually going on in the product. So, how much B-time?” He filled in three of the squares with Bs.

Bug Investigation and Reporting

“And T-time?”

He had room left to put in one lonely little T in the lower right corner.

Testing Time

“Wow,” I laughed. “One-fortieth of your whole week is spent in getting actual test coverage. The rest is all overhead. Have you told them how it affects you?”

“I’ve mentioned it,” he said.

“So look at this,” I suggested. “It’s even more clear when we use colour for emphasis.”

With Colour

“Whoa. I never looked at it that way. And then,” he paused. “Then they ask me, ‘Why didn’t you find that bug?’”

“Well,” I said, “considering the illusion they’re probably working under, it’s not an unreasonable question.”

“What do you mean?” Pedro asked.

“What does it say on your business card?”

“‘Software Testing’.”

“And what does it say on the door of the test lab?”

“‘Test Lab’,” said Pedro.

“And they call you…?”

“Pedro.”

“No,” I laughed. “They say you’re a… what?”

“Oh. A tester.”

“So since you’re a tester, and since the door on the test lab says ‘Test Lab’, and your business card says ‘Testing’, they figure that’s all you do. The illusion is what Jerry Weinberg calls the Lumping Problem. All of those different activities—administrative compliance, setup, bug investigation and reporting, and test design and execution—are lumped into a single idea for them.” And I drew it for him.

Management's Dream

“That’s management’s illusion, there. Since, in their imagination, you’ve got forty hours of testing time in a week, it’s not unreasonable for them to wonder why you didn’t find that bug.”

“Hmmm. Right,” said Pedro.

“When in fact, what they’re getting from you is this.” And I drew it for him.

Testing Reality

“For testing—actual interaction with the product, looking for problems—you’ve got one-fortieth of the time they think you’ve got. One lonely little T. Is that part of your test report?”

“Oy,” he said. “Maybe I should show them something like this.”

“Maybe you should,” I said.

A couple of nights later, I showed that page of my notebook to James Bach over Skype. “Wow,” he said. “That guy could be forty times more productive!”

“Forty?”

“Well, no, not really, of course. But suppose the programmers checked their work a little more carefully, or suppose the testers practiced writing more concise bug reports and sharpened their investigating skill. One of those two things could cut the bug investigation time by a third. That would give more time for testing, when they’re not being interrupted by other stuff. What if they cut the setup time by a half, and that administrivia by half?”

“Four, fourteen…” I said. “That would give eighteen more hours for testing and bug investigation, for a total of 22 hours. And even if they’re still doing two hours of bug investigation for every one hour of testing time… well, that’s seven times more productive, at least.”

“Seven times the test coverage if they get some of those issues worked out, then,” said James.

“Maybe de-lumping is the kind of thing lots of testers would want to do in their test reports,” I said.
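If you want to play with the arithmetic, here’s a minimal sketch in Python; the numbers come straight from Pedro’s grid and James’s what-ifs, and the two-to-one ratio of B-time to T-time is the same assumption as above.

  # Pedro's 40-square week, one square per hour.
  WEEK = 40
  current = {"compliance": 8, "setup": 28, "bug_investigation": 3, "testing": 1}
  assert sum(current.values()) == WEEK

  # The what-if: halve compliance and setup...
  compliance = current["compliance"] / 2   # 8 -> 4, saving 4 hours
  setup = current["setup"] / 2             # 28 -> 14, saving 14 hours
  b_and_t = WEEK - compliance - setup      # 22 hours left for B-time and T-time

  # ...and keep two hours of bug investigation for every hour of testing.
  testing = b_and_t / 3                    # about 7.3 hours of T-time
  print(f"T-time: {current['testing']} hour -> {testing:.1f} hours "
        f"(roughly {testing / current['testing']:.0f} times the coverage)")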

How about you?

Delivering the News (Test Reporting Part 3)

Monday, February 27th, 2012

In the last post in this series, I noted some potentially useful structural similarities between bug reports (whether oral or written) and newspaper reports. This time, I’ll delve into that a little more.

To our clients, investigative problem reports are usually the most important part of the product story. The most respected newspapers don’t earn their reputations by reprinting press releases; they earn their reputations through investigative journalism. As testers (or, heaven help us, quality assurance people), we tend to be chartered to look for problems, and to investigate them in ways that are most helpful to programmers, managers, and our other clients. A failing test on its own tells us little, and a failing check even less; as I pointed out here, a failing test is only an allegation of a problem. Investigation and study of a failing test is likely to inform us of something more useful: whether someone will perceive a problem that threatens the value of the product. I’ll talk more about the nature of problems in a later post, but for now, think of a product problem in terms of a perceived absence of or threat to some dimension of quality. (See the Heuristic Test Strategy Model for one list of quality criteria; see Software Quality Characteristics, by Rikard Edgren, Henrik Emilsson and Martin Jansson for another.) Since the manager’s goal is generally to release a product at her desired level of quality, problems that could threaten that goal are likely to be interesting and important. Or, as they say in the newspaper business, “if it bleeds, it leads”.

Potential showstoppers are usually the most important stories. In the 1990s, I was a technical support person, a tester, a program manager, and a programmer for a mass-market, commercial shrink-wrap software company. Since we had millions of customers, even minor problems could have a big impact on technical support and on the reputation of our products in the market. The market was enormous, hardware and software were even less standardized than they are now, and we worked under a great deal of time pressure. Classifying and prioritizing problems was contentious. One of the important classification questions was “What should we consider a showstopper?” One of the senior programmers came up with an answer that I’ve used ever since:

Showstopper (n.): Something that makes more sense to fix than to ship.

(I talked about showstoppers here.) In a development project, a showstopper—any threat to the timely release of the project—is a page-one, above-the-fold story, a story that you can see and begin to read without opening the newspaper or picking it up.

There’s always one story that leads. The most important threat to a timely, successful release may be a single problem, or it may be a collection of problems—what Ian Mitroff calls a mess. Do we have a problem, or a couple of problems, or a mess? No matter what the answer, there’s only so much space on the front page above the fold. Will you have one headline, or two, or three? What will that headline say? What will the lead paragraph of each story look like? Does the lead paragraph cover the five Ws—who, what, where, when, and why? If not, are those questions answered shortly? Might there be a reasonable reason not to answer them?

There’s only one front page, and there’s almost always more than one story on it. Our clients need to be able to absorb the lead story and the other front-page stories quickly, so we need to be able to provide headlines, lead paragraphs, and details in appropriate proportions. See an example front page here, with details that follow.

Very infrequently, serious newspapers give their entire front page to a story. In those cases, it’s usually an overwhelmingly important story, or one that threatens the newspaper or journalism itself.

The most compelling stories are those that have an impact on people. Although product problems are often technical in nature, the “making sense” part of the showstopper decision is focused on the business. Testers must be able to connect technical problems with business risk. Problems related to technical correctness are often easy to describe, but they might not be important. The skill of bug advocacy—making sure that the customer is aware of the best possible motivations for fixing the bug—depends on your ability to report the bug in terms of its most significant effect on the business. Ben Simo has a lovely way to sum this up. Early in his career, when Ben was trying to advocate a bug fix, his project manager said, “Revenue is king. Liability is queen. Tell me how this bug impacts them.”

The number of stories usually isn’t as important as the significance of the stories. This is another way in which test reports can be like newspapers. We don’t usually evaluate the quality of a newspaper by the number of stories in it. Instead, we look at the significance, relevance, and credibility of the stories.

It may take time to distinguish between a breaking story and a major story. Sometimes the news cycle doesn’t afford time for investigation, even though the story might be important. Information gets passed around the project at various moments during the test and development cycle. Sometimes a discovery happens just before a meeting. Smart reporters know to balance urgency and restraint when there’s a breaking story. When I worked in commercial mass-market software in the 1990s, we sometimes discovered a terrible-looking problem a couple of hours before release. Such discoveries would trigger arousal (no, not sexual friskiness, but arousal in the psychological sense of being suddenly snapped awake and alert to danger). All of a sudden, we’d be noticing all kinds of things that we hadn’t noticed before, and most of them were non-problems of one kind or another. We were biased by fear. We called it the “snakes on everything” moment. When reporting, testers need to take stock of the emotional factors surrounding them, and report cautiously and accurately. An hour from now, an allegation or a rumour might be an important story—or it might be nothing.

Non-problems aren’t news. There’s a pattern of stories in the first section of the newspaper: they’re mostly stories about problems, and there’s a reason for that: problems compel attention. Our emotional systems evolved to help keep us out of trouble. Problems or threats trigger arousal. Things that are going well are nice to hear about, but they don’t engage emotions in the same way as problems do. In a software development project, non-problems have relatively little significance for project managers. Routine daily successes don’t threaten the project, and therefore need less attention.

Numbers, like pictures, are illustrations, not the whole story. A qualitative report is not quantity-free; after all, identifying the presence or absence of something involves counting to one, and the degree of some attribute of interest can be illustrated by a number. But just as a pictorial illustration isn’t the item it depicts, a numerical illustration isn’t the story it might help to describe. A picture looks at a part of a scene through a particular lens; a number focuses on one attribute using a particular metric. Each one may emphasize some observations at the expense of other observations. Each one may crop out detail. Each one may magnify or distort.

Since the product and testing stories are multi-dimensional, be prepared to show the dimensions. Newspaper reports always have a bias, but reporters and editors often attempt to manage the bias by providing alternative sources of information, and alternative interpretations. A story of any length often includes multiple stories, or multiple threads of the main story. When tables of data are appropriate, newspapers print tables (think stock quotes in the business section, or box scores or line scores in sports). Products, coverage, quality, and problems are all multi-dimensional, multi-variate, and qualitative. Where there’s a mass of data, consider using tables such as dashboards or coverage tables. Pin numbers to reliable measurements (see the slip charts, the detailed impact case methods, and the subjective impact methods in Weinberg’s Quality Software Management, Volume 2: First Order Measurement; and pay attention to validity—see Kirk and Miller’s Reliability and Validity in Qualitative Research and Shadish, Cook, and Campbell’s Experimental and Quasi-Experimental Designs for Generalized Causal Inference).

Describe your coverage. Boris Beizer described coverage as “any metric of test completeness with respect to a test selection criterion”. That suggests that it is possible to quantify coverage if you have a quantifiable test selection criterion. For example, if a single-digit field accepts any digit from 0 to 9, one could select 10 tests and claim complete coverage based on that criterion. Mind you, data coverage doesn’t account for flow or sequence coverage; suppose that a bug was triggered only when a 7 replaced a 3 in that field. Since the overall number of possible tests is infinite, test selection criteria are based on models. In practical terms, this means that overall test coverage is some finite number over an infinite number. If you report that accurately, you’re stuck with a number that remains asymptotically close to zero. Instead, focus on the qualitative, and describe your coverage on an ordinal scale. Level 0 means “We know nothing about this area of the product.” Level 1 means “We have done smoke or sanity testing; at this point, we’ve determined whether the product is even stable enough for serious testing.” Level 2 means “We’ve tested the common, the core, the critical, the happy path; our testing has been focused on ‘can it work’.” Level 3 means “We’ve tested the harsh, the complex, the challenging, the extreme, the exceptional; if there were a serious problem in this area, we’d probably know about it by now.” In this system, the numbers are barely more than labels for a qualitative evaluation, so don’t be tempted to do serious math with them.
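To make that concrete, here’s a rough sketch of how such an ordinal coverage report might be kept and presented; the product areas named here are invented for illustration.

  # The ordinal coverage scale described above. The numbers are labels for
  # qualitative judgements, not quantities to add or average.
  COVERAGE_LEVELS = {
      0: "We know nothing about this area of the product.",
      1: "Smoke or sanity testing done; we know whether it's stable enough for serious testing.",
      2: "The common, the core, the critical, the happy path tested; focused on 'can it work'.",
      3: "The harsh, complex, challenging, extreme, and exceptional tested; "
         "a serious problem here would probably be known by now.",
  }

  # Hypothetical product areas, for illustration only.
  report = {"login": 3, "checkout": 2, "new handset support": 1, "localization": 0}

  for area, level in sorted(report.items(), key=lambda item: item[1], reverse=True):
      print(f"Level {level} - {area}: {COVERAGE_LEVELS[level]}")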

Braiding The Stories (Test Reporting Part 2)

Friday, February 24th, 2012

We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay—an early version of one of the Black Box Test Machines. “How many test cases would you need to test this application?” he asked.

Just then Jerry Weinberg wandered into the room. “Ah! Jerry Weinberg!” said James. “One of the greatest testing experts in the world! He’ll know the answer to this one. How many test cases would you need to test this application, Jerry?”

Jerry looked at the screen for a moment. “Three,” he said, firmly and decisively.

James knew to play along. “Three?!”, he said, in a feigned combination of amazement, uncertainty, and curiosity. “How do you know it’s three? Is it really three, Jerry?”

“Yes,” said Jerry. “Three.” He paused, and then said drily, “Why? Were you expecting some other number?”

In yesterday’s post, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project. When I point out the mischief of pass vs. fail ratios, some people object. “In the real world,” they say, “we have to report pass vs. fail ratios to our managers, because that’s what they want.” Yet bogus reporting is antithetical to the “real world”. Pass vs. fail ratios come from the fake world, a world where numbers have magical properties to soothe troubled and uncertain souls. Still, there’s no question that managers want something. It’s our mandate to give them something of value.

Some people say that managers want numbers because they want to know that we’re measuring. I’ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond’s splendid paper “Software Engineering Metrics: What Do They Measure and How Do We Know?”: “Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” I think that’s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement. Pass vs. fail ratios fall down badly when they’re subjected to those tests.

Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want: “Measurement is the art and science of making reliable (and significant) observations.” (The main part of the definition comes from Quality Software Management, Vol. 2: First-Order Measurement; the parenthetical comes from recent correspondence over Twitter.) That’s a more general, inclusive definition. It incorporates Kaner and Bond’s notion of quantitative measurement, but it’s more welcoming to qualitative, first-order approaches. First-order measurement, as Jerry describes it, provides answers to questions like “What seems to be happening?” and “What should I do now?” It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It’s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order—less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.

My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information. “Immediate” here doesn’t only mean timely; it also means unmediated, without a bunch of stuff getting in between you and the observation. In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project. That knowledge requires more than abstract data; it requires information. So, as testers, how can we inform the decision-makers? In our Rapid Software Testing class, James Bach and I have lately taken to emphasizing this: We must learn to describe and report on the product, our testing, and the quality of our testing. This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid. Each line, or level, is its own story.

Level 1: Tell the product story. The product story is a qualitative report on how the product can work, how it fails, and how it might fail in ways that matter to our clients. “Working”, “failure”, and “what matters” are all qualitative evaluations. Quality is value to some person; in a business setting, quality is value to some person who matters to the business. A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.

Level 2: To make the product story credible, tell the testing story. The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives warrant to the product story; it helps our clients understand why they should believe and trust the product story we’re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied. Coverage is the extent to which we’ve tested the program; it’s about where we’ve looked and how we’ve looked, and it’s also about what’s uncovered—where we might not have looked yet, and where we don’t intend to look. Oracles are central to evaluation; they’re the principles and mechanisms that allow us to recognize a problem. The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious those problems might be. We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner. The product story and our clients’ reactions to it will influence the ongoing testing story, and vice versa.

Level 3: To make the testing story credible, tell a story about the quality of the testing. Just as the product story needs warrant, so too does the testing story. To tell a story about the quality of testing requires us to describe why the testing we’ve done has been good enough, and why the testing we haven’t done hasn’t been so important so far. The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information. The quality-of-testing story will shape and be shaped by the other two stories.

Develop skills to tell and frame stories. People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are “efficient”. I think they mean “fast”, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail…

Don’t think data feed; think the daily news. Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we’re interested, such that we get the level of detail that we seek. Some of those strategies include:

  • Headlines. A quick glance over each page tells us immediately what, in the editors’ judgement, are the most salient aspects of any given story. Headlines come in different sizes, relative to the editors’ assessment of the importance of the story.
  • Front page. The paper comes folded. The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.
  • Continuation. When an entire story is too long to fit on the front page, it’s abbreviated and the story continues elsewhere. This gives the reader the option of following the story or looking at other items on the front page.
  • Coverage areas. The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth). Each section comes with its own front page, which generally includes headlines and continuations of its own.
  • Structured storytelling. Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the inverted pyramid (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions—who, what, where, why, and when, plus how—and the story builds from there. The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she’s satisfied.
  • Identifying who is involved and who is affected. Reporters and editors contextualize their stories. Just as in testing, people are the most important element of the context. A story is far more compelling when it affects the reader or people that the reader cares about. A good story often helps to clarify why the reader should care.
  • Varying approaches to delivering information. Newspapers often use a picture to help illustrate or emphasize an important aspect of a story. In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories—first-order reports—are always given greater prominence than the tables of stock quotes, league standings, and line scores.
  • Sidebars. Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.
  • Daily (and in the world of the Web, continuous) delivery of information. My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle. The paper’s Web site is updated on a continuous basis. Information is available both on a supply and a demand basis; both when I expect it and when I seek it.
  • Identifiable sources. Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained. This helps to set up degrees of trust and skepticism in the reader.

One important note: These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but to oral discourse.

I’ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be considered good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.
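To make the heuristic concrete, here’s a minimal sketch of a PEW-shaped report as a structure a tester might fill in; the field names and the sample report are hypothetical, not from Oren’s project.

  # Problem, Example, Why: the PEW heuristic as a structure a tester might fill in.
  from dataclasses import dataclass

  @dataclass
  class ProblemReport:
      problem: str  # P: a concise description of the problem
      example: str  # E: something specific that shows the problem
      why: str      # W: the oracle; the principle or mechanism by which it's a problem

      def missing(self):
          """Names of the PEW elements this report still lacks."""
          return [name for name, value in vars(self).items() if not value.strip()]

  # Hypothetical example, for illustration only.
  report = ProblemReport(
      problem="Provisioning the test server fails after adding a new handset profile",
      example="Added profile 'X100' via the admin console; provisioning step 4 timed out",
      why="Blocks all handset testing; inconsistent with the provisioning guide",
  )
  assert not report.missing(), f"Report is incomplete: {report.missing()}"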

You can argue that a good tester should include the needed information and steps in her bug report, but this raises (at least) two problems:

– The same information may be duplicated across many bugs, and even worse, it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you post above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seem obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve it with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but do not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out, are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, sometimes it involves behaving like an atypical customer, and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify, especially for tests involving many short trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?

And would scripts solve that problem when (for example) you’re recording a bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might be important if your test activity might be transferred to another offshore team someday (happened to me a few weeks ago, I sent them a test document with only high level details and hoped for the best), or your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s, we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex, cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a minuscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.

What Exploratory Testing Is Not (Part 5): Undocumented Testing

Wednesday, December 21st, 2011

This week I had the great misfortune of reading yet another article which makes the false and ridiculous claim that exploratory testing is “undocumented”. After years and years of plenty of people talking about and writing about and practicing excellent documentation as part of an exploratory testing approach, it’s depressing to see that there are still people shovelling fresh manure onto a pile that should have been carted off years ago.

Like the other approaches to test activities that have been discussed in this series (“touring”, “after-everything-else”, “tool-free”, and “quick testing”), “documented vs. undocumented” is in a category orthogonal to “exploratory vs. scripted”. True: usually scripted activities are performed by some agency following a set of instructions that has been written down somewhere. But we could choose to think of “scripted” in a slightly different and more expansive way, as “prescriptive”, or “mimeomorphic”. A scripted activity, in this sense, is one for which the actions to be performed have been established in advance, and the choices of the actions are not determined by the agency performing them. In that sense, a cook at McDonald’s doesn’t read a script as he prepares your burger, but the preparation of a McDonald’s burger is a highly scripted activity.

Thus any kind of testing can be heavily documented or completely undocumented. A thoroughly documented test might be highly exploratory in nature, or it might be highly scripted.

In the Rapid Software Testing class, James Bach and I point out that when someone says “that should be documented”, what they’re really saying is “that should be documented if and how and when it serves our purposes.” So, let’s start by looking at the “when”.

When we question anything in order to evaluate it, there are moments in the process in which we might choose to record ideas or actions. I’ve broken these down into three basic categories that I hope you find helpful:

  • Before

  • During

  • After

There are “before”, “during”, and “after” moments with respect to any test activity, whether it’s a part of test design, test execution, result interpretation, or learning. Again, a hallmark of exploratory testing is the tester’s freedom and responsibility to optimize the value of the work as it’s happening. That means that when it’s important to record something, the tester is not only welcome but encouraged to

  • pick up a pen
  • take a screen shot
  • launch a session of Rapid Reporter
  • create or update a mind map
  • fire up a screen recorder
  • initiate logging (if it doesn’t start by default on the product you’re testing—and if logging isn’t available, you might consider identifying that as a testability problem and a related product and project risk)
  • sketch out a flowchart diagram
  • type notes into a private or shared repository
  • add to a table of data in Excel
  • fire off a note to a programmer or a product owner
and that’s an incomplete list. But they’re all forms of documentation.

Freedom to document at will should also mean that the tester is free to refrain from documenting something when the documentation doesn’t add value. At the same time, the tester is responsible and accountable for that decision. In Rapid Testing, we recommend writing down (or saving, or illustrating) only the things that are necessary or valuable to the project, and only when the value of doing so exceeds the cost. This doesn’t mean no documentation; it means the most informative yet fastest and least expensive documentation that completely fulfils the testing mission. Integrating that with testing work leads, we hold, to excellent testing—but it takes practice and skill.

For most test activities, it’s possible to relay information to other people orally, or even sometimes by allowing people to observe our behaviour. (At the beginning of the Rapid Testing class, I sometimes silently hold aloft a 5″ x 8″ index card in landscape orientation. I fold it in half along the horizontal axis, and write my first name on one side using a coloured marker. Everyone in the class mimics my actions. Without a single word of instruction being given or questions being asked, either verbally or in writing, the mission has been accomplished: each person now has a tent card in front of him.)

There’s a potential risk associated with an exploratory approach: that the tester might fail to document something important. In that case, we do what skilled people do with risk: we manage it. James Bach talks at length about managing exploratory testing sessions here. Producing appropriate documentation is partly a technical process, but the technical considerations are dominated by business imperatives: cost, value, and risk. There are social considerations, too. The tester, the test lead, the test manager, the programmers, other managers, and the product owner determine collaboratively what’s important to document and what’s not so important with respect to the current testing mission. In an exploratory approach, we’re more likely to be emphasizing the discovery of new information. So we’re less likely to spend time on documenting what we will do, and more likely to document what we are doing and what we have done. We could do a good deal of preparatory reading and writing, even in an exploratory approach—but we realize that there’s an ever-increasing risk that new discoveries will undermine the worth of what we write ahead of time.

That leads directly to “our purposes”, the task that we want to accomplish when documenting something. Just as testing itself has many possible missions, so too does test documentation. Here’s a decidedly non-exhaustive list, prepared over a couple of minutes:

  • to express testing strategy and tactics for an entire project, or for projects in general
  • to keep a set of personal notes to help structure a debriefing conversation
  • to outline testing activities for a test cycle
  • to report on activities during testing execution
  • to outline attributes of a particular quality criterion
  • to catalogue ideas about risk
  • to describe test coverage
  • to account for the work that we’ve done
  • to program a machine to perform a given set of actions
  • to alert people to potential problems in the product
  • to guide a tester’s actions over a test session
  • to identify structures in the application or service
  • to provide a description of how to use a particular test tool that we’ve crafted
  • to describe the tester’s role, skills, and qualifications
  • to explain business rules to someone else on the team
  • to outline scenarios in which the product might be used or tested
  • to identify, for a tester, a specific, explicit sequence of actions to perform, input to provide, and observations to make

That last item is the classic form of highly scripted testing, and that kind of documentation is usually absent from exploratory testing. Even so, a tester can take an exploratory approach using a script as a point of departure or as a reference, just as you might use a trail map to help guide an off-trail hike (among other things, you might want to discover shortcuts or avoid the usual pathways). So when someone says that “exploratory testing is undocumented”, I hear them saying something else. I hear them saying, “I only understand one form of test documentation, and I’ve successfully ignored every other approach to it or purpose for it.”

If you look in the appendices for the Rapid Software Testing class (you can find a .PDF at http://www.satisfice.com/rst-appendices.pdf), you’ll see a large number of examples of documentation that are entirely consistent with an exploratory approach. That’s just one source. For each item in my partial list above, here’s a partial list of approaches, examples, and tools.

Testing strategy and tactics for an entire project, or for projects in general.
Look at the Satisfice Heuristic Test Strategy Model and the Context Model for Heuristic Test Planning (these also appear in the RST Appendices).

An outline of testing activities for a test cycle.
Look at the General Functionality and Stability Test Procedure for Certified for Microsoft Windows Logo. See also the OWL Quality Plan (and the Risk and Task Correlation) in the RST Appendices.

Keeping a set of personal notes to help structure a debriefing or other conversation.
See the “Beans ‘R Us Test Report” in the RST Appendices; or see my notes on testing an in-flight entertainment system which I did for fun on a flight from India to Amsterdam.

Recording activities and ideas during test execution
A video camera or a screen recording tool can capture the specific actions of a tester for later playback and review. Well-designed log files may also provide a kind of retrospective record of what was tested. Still, neither of these provides insight into the tester’s mind. Recorded narration or conversation can do that; tools like BB Test Assistant, Camtasia, or Morae can help. The classic approach, of course, is to take notes. Have a look at my presentation, “An Exploratory Tester’s Notebook”, which has examples of freestyle notes taken during an impromptu testing session, and detailed, annotated examples of Session-Based Test Management sessions. Shmuel Gerson’s Rapid Reporter and Jonathan Kohl’s Session Tester are tools oriented towards taking notes (and, in the former case, including screen captures) of testing sessions on the fly.

Outlining many attributes of a particular quality criterion
See “Heuristics of Software Testability” in the RST Appendices for one example.

Cataloguing ideas about risk
Several examples of this appear in the RST Appendices, most extensively in the “Deployment Planning and Risk Analysis” example. You’ll also find an “Install Risk Catalog”; “The Risk of Incompatibility”; the Risk vs. Tasks section in the “OWL Quality Plan”; the “Y2K Compliance Report”; and “Round Results Risk A”, which shows a mapping of Risk Areas vs. Test Strategy and Tasks.

Describing or outlining test coverage
A mapping establishes or illustrates relationships between things. We can use any of these to help us think about test coverage. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. These can be constructed before, after, or during a given test activity, with the goal of covering the map with tests, or using testing to extend the map. I catalogued several ways of thinking about coverage and reporting on it in three articles: Got You Covered, Cover or Discover, and A Map By Any Other Name. Several examples of lightweight coverage outlines can be found in the RST Appendices (“Putt Putt Saves the Zoo” and “Table Formatting Test Notes”). There are also coverage ideas incorporated into the Apollo mission notes that we’ve titled “Guideword Heuristics for Astronauts”.

Accounting for testing work that we’ve done.
See Session-Based Test Management, and see “An Exploratory Tester’s Notebook“. Darren McMillan provides excellent examples of annotated mind maps; scroll down to the section headed “Session Reports”, and continue through “Simplifying feedback to management” and “Simplifying feedback to groups”. A forthcoming article, written by me, shows how a senior test manager tracks testing sessions at a half-day granularity level.

Programming a machine to help you to explore
See all manner of books on programming, both references and cookbooks, but for testers in particular, have a look at Brian Marick’s Everyday Scripting with Ruby. Check out Pete Houghton’s splendid examples of exploratory test automation that begin here. Cem Kaner (often in collaboration with Doug Hoffman) writes extensively about automation-assisted exploratory testing; an example is here.

Alerting people to potential problems in the product
In general, bug reporting systems provide one way to handle the task of recording and reporting problems in the product. James Bach provides an example of a report that he provided to a client (along with a more informal account of the session).

Guiding a tester’s actions over a test session
Guiding a tester involves skills like chartering and checklisting. Start with the documentation on Session Based Test Management (http://www.satisfice.com/sbtm). Selena Delesie has produced an excellent blog post on chartering exploratory testing sessions. The title of Cem Kaner’s presentation at CAST 2008, The Value of Checklists and the Danger of Scripts: What legal training suggests for testers describes the content perfectly. Michael Hunter’s You Are Not Done Yet lists can be used and adapted to your context as a set of checklists.

To identify structures in the application or service
The “Product Elements” section in the Heuristic Test Strategy Model provides a kind of framework for documenting product structures. In the RST Appendices, the test notes for “Putt Putt Saves the Zoo” and “Diskmapper”, and the “OWL Quality Plan” provide examples of identifying several different structures in the programs under test. Mind mapping provides a means of describing and illustrating structures, too; see Darren McMillan’s examples here and here. Ruud Cox and Ru Cindrea used a mind map of product elements to help win the Best Bug Report award in the Test Lab at EuroSTAR 2011. I’ve created a list of structures that support exploratory testing, and many of these are related to structures in the product.

Providing a description of how to use a particular test tool that we’ve crafted
While working at a bank, I developed (in Excel and VBA) a tool that could be used as an oracle and as a way of recording test results. (Thanks to non-disclosure agreements, I can describe these, but cannot provide examples.) When I left the project, I was obliged to document my work. I didn’t work on the assumption that anyone off the street would be reading the document. Instead, I presumed that anyone assigned to that testing job and to using that tool, would have the rapid learning skill to explore the tool, the product, and the business domain in a mutually supportive way. So I crafted documentation that was intended to tell testers just enough to get them exploring.

Explaining business rules to someone else on the team
I did include documentation for novices of one kind: within the documentation for that testing tool, I included a general description of how foreign exchange transactions worked from the bank’s perspective, and how appropriate accounts got credited and debited. I had learned this by reverse-engineering use cases and consulting with the local business analyst. I summarized it with a two-page document written in simple, direct language, referring directly to the simpler use cases and explaining the more confusing bits in more detail. For those whose learning style was oriented toward code, I also described the tables and array formulas that applied the business rules.

Outlining scenarios in which the product might be used or tested
I discuss some issues about scenarios here—why they’re important, and why it’s important to keep them open-ended and open to interpretation. It’s more important to record than to prescribe, since in a good scenario, you’ll observe and discover much more than you’ve articulated in advance. Cem Kaner gives ideas on how to produce scenarios; Hans Buwalda presents examples of soap opera testing.

Identifying required tester skill
People with skill don’t need prescriptive documentation for every little thing. Responsible managers identify the skills needed to test, and commit to employing people who either have those skills or can develop them quickly. James Bach eliminated 50 pages of otiose documentation with two paragraphs. (Otiose is a marvelous word; it’s fun to look it up in a thesaurus.)

Identifying, for a tester, a particular explicit sequence of actions to perform, input to provide, and observations to make.
Again, a document that attempts to specify exactly what a tester should do is the hallmark of scripted testing. James Bach articulates a paradox that has not yet been noted clearly in our craft: in order to perform a scripted test well, you need significant amounts of skill and tacit knowledge (and you also need to ignore the script on occasion, and you need to know when those occasions are). There’s another interesting issue here: preparing such documents usually depends on exploratory activity. There’s no script to tell you how to write a script. (You might argue there’s one exception. You can follow this script to write a test script: take each line of a requirements document, and add the words “Verify that” to the beginning of each line.)
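For what it’s worth, here’s that “script to write a test script” taken literally: a tongue-in-cheek sketch that assumes the requirements arrive one per line in a plain text file; the file names are hypothetical.

  # The "script to write a test script", taken literally.
  # Assumes requirements.txt holds one requirement per line (a hypothetical file).
  with open("requirements.txt") as requirements, open("test_script.txt", "w") as script:
      for line in requirements:
          requirement = line.strip()
          if requirement:
              script.write(f"Verify that {requirement[0].lower() + requirement[1:]}\n")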

Now, just as you can perform testing badly using any approach, you can perform exploratory testing and document it inappropriately, either by under-documenting it OR over-documenting it using any of the kinds of documentation above. But, as this document shows, the notion that exploratory testing is by its nature undocumented is not only ignorant, but aggressively ignorant about both testing and documentation. Whenever you see someone claim that exploratory testing is undocumented, I’d ask you to help by setting the record straight. Feel free to refer to this blog post, if you find it helpful; also, please point me to other exemplars of excellent documentation that are consistent with exploratory approaches. If we all work together, we can bury this myth, while providing excellent records and reports for our clients.

This is the end of the series “What Exploratory Testing Is Not”, for me. But James Bach has one more.

And, of course, in the face of all these instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

xMMwhy

Friday, October 28th, 2011

Several years ago, I worked for a few weeks as a tester on a big retail project. The project was spectacularly mismanaged, already a year behind schedule by the time I arrived. Just before I left, the oft-revised target date slipped by another three months. Three months later, the project was deployed, then pulled out of production for another six months to be fixed. Project managers and a CIO, among many others, lost their jobs. The company pinned an eight-figure loss on the project.

The software infrastructure was supplied by a big database company, and the software to glue everything together was supplied by a development organization in another country. That software was an embarrassment—bloated, incoherent, hard to use, and buggy. Fixes were rarely complete and often introduced new bugs. At one point during my short tenure, all effective work stopped for five days because the development organization’s servers crashed and no backups were available. All this despite the fact that the software development company claimed CMMI Level 5.

This morning, I was greeted by a Tweet that said

“Deloittes show how a level 5 CMMi company has bad test process at #TMMi conf in Korea! So CMMi needs TMMi – good.”

The TMMi is the Testing Maturity Model Integration. Here’s what the TMMi Foundation says about it:

“The Test Maturity Model Integration has been developed to complement the existing CMMI framework. It provides a structured presentation of maturity levels, allowing for standard TMMi assessments and certification, enabling a consistent deployment of the standards and the collection of industry metrics.”

Here’s what the SEI—the CMMi’s co-ordinator and sponsor—says about it:

“CMMI (Capability Maturity Model Integration) is a process improvement approach that provides organizations with the essential elements of effective processes, which will improve their performance. CMMI-based process improvement includes identifying your organization’s process strengths and weaknesses and making process changes to turn weaknesses into strengths.”

What conclusions could we draw from these three statements?

If a company has achieved CMMI Level 5, yet has a bad test process, then there’s a logical problem here. Either testing isn’t an essential element of effective processes (in which case the TMMI should be unnecessary) or it is (in which case the SEI’s claim of providing the essential processes is unsupportable).

One clear solution to the problem would be to adjudicate all this by way of a Maturity Model Maturity Model (Integrated), the MMMMI, whereby your organization can determine (in a mature fashion, of course) what essential processes are in the first place. Mind you, that could be flawed too. You’d need a set of essential processes to determine how to determine essential processes, so you’ll also need a Maturity Model Maturity Model Maturity Model (Integrated), an MMMMMMI. And in fairly short order, your organization will disappear up its own ass.

Jerry Weinberg points in a different direction, using very strong language. This is from Quality Software Management, Volume 1: Systems Thinking, p. 21:

“…cultural patterns are not more or less mature, they are just more or less fitting. Of course, some people have an emotional need for perfection, and they will impose this emotional need on everything they do. Their comparisons have nothing to do with the organization’s problems, but with their own.

“The quest for unjustified perfection is not mature, but infantile.

“Hitler was quite clear on who was the ‘master race’. His definition of Aryan race was supposed to represent the mature end product of all human history, and that allowed Hitler and the Nazis to justify atrocities on “less mature” cultures such as Gypsies, Catholics, Jews, Poles, Czechs, and anyone else who got in their way. Many would-be reformers of software engineering require their ‘targets’ to confess to their previous inferiority. These little Hitlers have not been very successful.

“Very few healthy people will make such a confession voluntarily, and even concentration camps didn’t cause many people to change their minds. This is not ‘just a matter of words’. Words are essential to any change project because they give us models of the world as it was and as we hope it to be. So if your goal is changing an organization, start by dropping the comparisons such as those implied in the loaded term ‘maturity.'”

It’s time for us, the worldwide testing community, to urge Deloitte, the SEI, the TMMI, and the unfortunate testers in Korea who are presently being exposed to the nonsense to recognize what many of us have known for years: maturity models have it backwards.

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, a pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL http://www.pmhut.com) answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask whether checking is being done automatically by the programmers at lower levels, where it tends to be fast, cheap, easy, useful, and timely; or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, less timely, and more tedious. (For what I mean by a low-level check, see the sketch after this list.)
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, pages 97-99), in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?
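
To make the distinction concrete, here’s a minimal sketch of what I mean by a low-level automated check, in the style of a Python unit test. The function and the numbers are invented for illustration; the point is the shape of the thing: a specific, machine-decidable assertion, applied by a tool, close to the code.

    # An invented example of a low-level automated check (pytest-style).
    # add_item stands in for a real function in the product under test.

    def add_item(cart, sku, quantity):
        """Toy stand-in for production code."""
        cart = dict(cart)
        cart[sku] = cart.get(sku, 0) + quantity
        return cart

    def test_add_item_accumulates_quantity():
        # The check itself: one machine-decidable assertion about one output.
        cart = add_item({}, "SKU-123", 2)
        cart = add_item(cart, "SKU-123", 3)
        assert cart["SKU-123"] == 5

Designing that check, and figuring out what a failure means, takes human judgement; the machine only applies the decision rule. That’s why checking tends to be cheap and timely at this level, and why the same activity, pushed up to the GUI and performed by hand, turns into drudgery.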

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail”. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.

Testing Problems Are Test Results

Tuesday, September 6th, 2011

I often do an exercise in the Rapid Software Testing class in which I ask people to catalog things that, for them, make testing harder or slower. Their lists fit a pattern I hear over and over from testers (you can see an example of the pattern in this recent question on Stack Exchange). Typical points include:

  • I’m a tester working alone with several programmers (or one of a handful of testers working with many programmers).
  • I’m under enormous time pressure. Builds are coming in continuously, and we’re organized on one- or two-week development cycles.
  • The product(s) I’m testing is (are) very complex.
  • There are many interdependencies between modules within the product, or between products.
  • I’m seeing a consistent pattern of failures specifically related to those interdependencies; the tiniest change here can have devastating impact there—or anywhere.
  • I believe that I have to run a complete regression test on every build to try to detect those failures.
  • I’m trying to cope by using automated checks, but the complexity makes the automation difficult, the program’s testing hooks are minimal at best, and frequent product changes make the whole relationship brittle.
  • The maintenance effort for the test automation is significant, at a cost to other testing I’d like to do.
  • I’m feeling overwhelmed by all this, but I’m trying to cope.

On top of that,

  • The organization in which I’m working calls itself Agile.
  • Other than the two-week iterations, we’re actually using at most two other practices associated with Agile development: (typically) daily scrums or Kanban boards.

Oh, and for extra points,

  • The builds that I’m getting are very unstable. The system falls over under the most basic of smoke tests. I have to do a lot of waiting or reconfiguring or both before I can even get started on the other stuff.

How might we consider these observations?

We could choose to interpret them as problems for testing, but we could think of them differently: as test results.

Test results don’t tell us whether something is good or bad, but they may inform a decision, or an evaluation, or more questions. People observe test results and decide whether there are problems, what the problems are, what further questions are warranted, and what decisions should be made. Doing that requires human judgement and wisdom, consideration of lots of factors, and a number of possible interpretations.

Just as for automated checks and other test results, it’s important to consider a variety of explanations and interpretations for testing meta-results—observations about testing. If we don’t do that, we risk missing important problems that threaten the quality of the testing effort, and the quality of the product, too.

As Jerry Weinberg points out in Perfect Software and Other Illusions About Testing, whatever else something might be, it’s information. If testing is, as Jerry says, gathering information with the intention of informing a decision, it seems a mistake to leave potentially valuable observations lying around on the floor.

We often run into problems when we test. But instead of thinking of them as problems for testing, we could also choose to think of them as symptoms of product or project problems—problems that testing can help to solve.

For example, when a tester feels outnumbered by programmers, or when a tester feels under time pressure, that’s a test result. The feeling often comes from the programmers generating more work and more complexity than the tester can handle without help.

Complexity, like quality, is a relationship between some person and something else. Complexity on its own isn’t necessarily a problem, but the way people react to it might be. When we observe the ways in which people react to perceived complexity and risk, we might learn a lot.

  • Do we, as testers, help people to become conscious of the risks—especially the Black Swans—that typically accompany complexity?
  • If people are conscious of risk, are they paying attention to it? Are they panicking over it? Or are they ignoring it and whistling past the graveyard? Or…
  • Are people reacting calmly and pragmatically? Are they acknowledging and dealing with the complexity of the product?
  • If they can’t make the product or the process that it models less complex, are they at least taking steps to make that product or process easier to understand?
  • Might the programmers be generating or modifying code so quickly that they’re not taking the time to understand what’s really going on with it?
  • If someone feels that more testers are needed, what’s behind that feeling? (I took a stab at an answer to that question a few years back.)

How might we figure out answers to those questions? One way might be to look at more of the test results and test meta-results.

  • Does someone perceive testing to be difficult or time-consuming? Who?
  • What’s the basis for that perception? What assumptions underlie it?
  • Does the need to investigate and report bugs overwhelm the testers’ capacity to obtain good test coverage? (I wrote about that problem here.)
  • Does testing consistently reveal consistent patterns of failure?
  • Are programmers consistently surprised by such failures and patterns?
  • Do small changes in the code cause problems that are disproportionately large or hard to find?
  • Do the programmers understand the product’s interdependencies clearly? Are those interdependencies necessary, or could they be eliminated?
  • Are programmers taking steps to anticipate or prevent problems related to interfaces and interactions?
  • If automated checks are difficult to develop and maintain, does that say something about the skill of the tester, the quality of the automation interfaces, or the scope of the checks? Or about something else? (There’s a sketch of the interface point after this list.)
  • Do unstable builds get in the way of deeper testing?
  • Could we interpret “unstable builds” as a sign that the product has problems so numerous and serious that even shallow testing reveals them?
  • When a “stable” build appears after a long series of unstable builds, how stable is it really?
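
On the automation-interface point: here’s a hypothetical contrast (the classes, names, and prices are invented) between a check written against a scriptable interface that the programmers expose, and the same check driven through a bare GUI. The difference in cost and brittleness says little about the tester’s skill.

    # Hypothetical sketch: the same check against two different interfaces.

    class OrderApi:
        """Stands in for a product API or testing hook exposed by the programmers."""
        def place_order(self, sku, qty, coupon=None):
            subtotal = 10.00 * qty
            discount = subtotal * 0.10 if coupon == "SAVE10" else 0.0
            return {"total": subtotal - discount}

    def check_discount_via_api():
        # Two lines, no timing games, indifferent to cosmetic UI changes.
        order = OrderApi().place_order("SKU-123", qty=2, coupon="SAVE10")
        assert order["total"] == 18.00

    check_discount_via_api()

    # The GUI-driven equivalent (left as comments, since it needs a real GUI
    # driver) tends to look like this, and breaks whenever layout or timing shifts:
    #
    #   ui.click("//div[3]/span[@class='add-to-cart']")   # brittle locator
    #   ui.type("#coupon", "SAVE10"); ui.click("button.checkout")
    #   ui.wait(5)                                        # hope the page is ready
    #   assert ui.read_text("#total") == "$18.00"

If the first version is out of reach because no such interface exists, that’s worth knowing; it’s a statement about the product’s testability, not about the tester.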

Perhaps, with the answers to those questions, we could raise even more questions.

  • What risks do those problems present for the success of the product, whether in the short term or the longer term?
  • When testing consistently reveals patterns of failures and attendant risk, what does the product team do with that information?
  • Are the programmers mandated to deliver code? Or are the programmers mandated to deliver code with a warrant that the code does what it should (and doesn’t do what it shouldn’t), to the best of their knowledge? Do the programmers adamantly prefer the latter mandate?
  • Is someone pressuring the programmers to make schedule or scope commitments that they can’t really fulfill?
  • Are the programmers and the testers empowered to push back on scope or schedule pressure when it adds to product or project risk?
  • Do the business people listen to the development team’s concerns? Are they aware of the risks that testers and programmers bring to their attention? When the development team points out risks, do managers and business people deal with them congruently?
  • Is the team working at a sustainable pace? Or is the product and the project being overwhelmed by complexity, interdependencies, fragility, and problems that lurk just beyond the reach of our development and testing effort?
  • Is the development team really Agile, in the sense of the precepts of the Agile Manifesto? Or is “agility” being used in a cargo-cult way, using practices or artifacts to mask over an incoherent project?

Testers often feel that their role is to find, investigate, and report on bugs in a running software product. That’s usually true, but it’s also a pretty limited view of what testers could test. A product can be anything that someone has produced: a program, a requirements document, a diagram, a specification, a flowchart, a prototype, a development process model, a development process, an idea. Testing can reveal information about all of those things, if we pay attention.

When seen one way, the problems that appear at the top of this article look like serious problems for testing. They may be, but they’re more than that too. When we remember Jerry’s definition of testing as “gathering information with the intention of informing a decision”, then everything that we notice or discover during testing is a test result.

Here’s a follow-up to this post. (See also this discussion for an example of looking beyond the test result for possible product and project risks.)

This post was edited in small ways, for clarity, on 2017-03-11.

Exploratory Testing is All Around You

Monday, May 16th, 2011

I regularly converse with people who say they want to introduce exploratory testing in their organization. They say that up until now, they’ve only used a scripted approach.

I reply that exploratory testing is already going on all the time at your organization. It’s just that no one notices, perhaps because they call it

  • “review”, or
  • “designing scripts”, or
  • “getting ready to test”, or
  • “investigating a bug”, or
  • “working around a problem in the script”, or
  • “retesting around the bug fix”, or
  • “going off the script, just for a moment”, or
  • “realizing the significance of what a programmer said in the hallway, and trying it out on the system”, or
  • “pausing for a second to look something up”, or
  • “test-driven development”, or
  • “Hey, watch this!”, or
  • “I’m learning how to use the product”, or
  • “I’m shaking it out a bit”, or
  • “Wait, let’s do this test first instead of that test”, or
  • “Hey, I wonder what would happen if…”, or
  • “Is that really the right phone number?”, or
  • “Bag it, let’s just play around for a while”, or
  • “How come what the script says and what the programmer says and what the spec says are all different from each other?”, or
  • “Geez, this feature is too broken to make further testing worthwhile; I’m going to go to talk to the programmer”, or
  • “I’m training that new tester in how to use this product”, or
  • “You know, we could automate that; let’s try to write a quickie Perl script right now” (a sketch of that sort of quickie script appears after this list), or
  • “Sure, I can test that…just gimme a sec”, or
  • “Wow… that looks like it could be a problem; I think I’ll write a quick note about that to remind me to talk to my test lead”, or
  • “Jimmy, I’m confused… could you help me interpret what’s going on on this screen?”, or
  • “Why are we always using ‘tester’ as the login account? Let’s try ‘tester2’ today”, or
  • “Hey, I could cancel this dialog and bring it up again and cancel it again and bring it up again”, or
  • “Cool! The return value for each call in this library is the round-trip transaction time—and look at these four transactions that took thirty times longer than average!”, or
  • “Holy frijoles! It blew up! I wonder if I can make it blow up even worse!”, or
  • “Let’s install this and see how it works”, or
  • “Weird… that’s not what the Help file says”, or
  • “That could be a cool tool; I’m going to try it when I get home”, or
  • “I’m sitting with a new tester, helping her to learn the product”, or (and this is the big one)
  • “I’m preparing a test script.”
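
A couple of those items shade into quickie tool-building: the throwaway Perl script, the library calls that return round-trip times. Here’s a minimal sketch, in Python and with invented numbers, of the sort of thing I mean: dump in a pile of timings and flag the transactions that are wildly slower than the typical one.

    # A quickie, throwaway script of the kind a tester might knock together
    # mid-session. The timings are invented; in practice they'd come from a log
    # or from the return values of the library calls mentioned above.

    from statistics import median

    round_trip_ms = [12, 15, 11, 14, 450, 13, 12, 480, 16, 520, 11, 610, 14]

    typical = median(round_trip_ms)
    outliers = [(i, t) for i, t in enumerate(round_trip_ms) if t > 30 * typical]

    print(f"typical (median) round trip: {typical} ms")
    for index, t in outliers:
        print(f"transaction {index}: {t} ms, about {t / typical:.0f}x the typical time")

Ten minutes of scripting like that isn’t separate from exploration; it is exploration, with a tool extending the tester’s reach.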

Now it’s possible that none of that stuff ever happens in your organization. Or maybe people aren’t paying attention or don’t know how to observe testing. Or both.

Then, just before I posted this blog entry, James Bach offered me two more sure-fire clues that people are doing exploratory testing: they say, “I am in no way doing exploratory testing”, or “we’re doing only highly rigorous formal testing”. In both cases, the emphatic nature of the claim guarantees that the claimant is not sufficiently observant about testing to realize that exploratory testing is happening all around them.

Update, October 12, 2015: In fact, in the Rapid Software Testing namespace, we now maintain it’s redundant to say “exploratory testing”, in the same way it’s redundant to say “carbon-based human” or “vegetarian potato”. It is formal scripting—not exploration—that is the interloper on testing’s territory. We explain that here.

You Won’t See It Until You Believe It

Thursday, March 24th, 2011

Not too long ago, I updated my copy of Quicken. I hesitate to say upgrade. I’ve been using Quicken for years, despite the fact that the user interface has never been wonderful and has consistently declined a little in each version.

One of these days, I’ll do a 90-minute session and record some observations about the product. But for now, here’s one.

The default sort order for transactions in an account listing is by date, from earliest to latest. There are options whereby you can sort by reference number, payee, the amount of money spent or received, or the category. On the right side, there’s a scroll bar. As with pretty much all scroll bars, there’s a thumb—the button-like thing that one drags to make the scrolling happen. No matter what I’ve chosen for the sorting order, the tooltip associated with the thumb stubbornly continues to display the date, and the listing doesn’t update until I have let go of the scroll bar. So the tooltip is useless, and I can’t tell how far I need to scroll.

There are a zillion little problems like that in the product that make it unnecessarily hard to use. As I’ve maintained so often before, you can’t tell from the outside whether anyone tested the scroll bars, but I can guarantee that no one fixed them.

Upon updating the product, I was asked to fill out a survey. Aha! A chance to provide feedback! One of the survey questions was “What was your primary reason for upgrading Quicken?”

I wanted to respond, “Anticipated bug fixes.” I wanted to respond “I was hoping against hope to see some of the user interface problems in the previous versions sorted out.” The choices that I was offered were very close to these (I didn’t record them at the time, but a later online survey offered the following, which match what I remember):

  • I received an email from Quicken/Intuit
  • My previous version was no longer supported
  • I saw it advertised
  • I wanted specific new features
  • I saw a new version in stores
  • Banker/Financial advisor recommended I upgrade
  • I read a news article that mentioned the new Quicken version

In the survey included as part of the product update, there was no “Other” with a text box to indicate why I was really updating. There was no “Other” at all. (There was an “other” option in a subsequent survey form, of which I was notified through email.) This is how marketers get to make the assertion, “No one is interested in bug fixes.” They don’t see the evidence for it. But if you systematically place blinders over your eyes, you won’t see the evidence for much of anything other than what’s right in front of you.



Marshall McLuhan is rumoured to have said, “I wouldn’t have seen it if I hadn’t believed it.” If you want to observe something, it helps to believe that it’s possible. At least, it helps not to constrain your capacity to observe something that you didn’t expect. For the same reason, test cases with pre-defined and closed outcomes intensify the risk that a tester will be blind to what’s going on around them. For the same reason, certification exams that present exactly four multiple-choice answers will fail to evaluate the nuances and subtleties of what a tester might observe and evaluate.

Managers, please take note!