Blog Posts for the ‘Testing Story’ Category

Deeper Testing (2): Automating the Testing

Saturday, April 22nd, 2017

Here’s an easy-to-remember little substitution that you can perform when someone suggests “automating the testing”:

“Automate the evaluation
and learning
and exploration
and experimentation
and modeling
and studying of the specs
and observation of the product
and inference-drawing
and questioning
and risk assessment
and prioritization
and coverage analysis
and pattern recognition
and decision making
and design of the test lab
and preparation of the test lab
and sensemaking
and test code development
and tool selection
and recruiting of helpers
and making test notes
and preparing simulations
and bug advocacy
and triage
and relationship building
and analyzing platform dependencies
and product configuration
and application of oracles
and spontaneous playful interaction with the product
and discovery of new information
and preparation of reports for management
and recording of problems
and investigation of problems
and working out puzzling situations
and building the test team
and analyzing competitors
and resolving conflicting information
and benchmarking…”

And you can add things to this list too. Okay, so maybe it’s not so easy to remember. But that’s what it would mean to automate the testing.

Use tools? Absolutely! Tools are hugely important to amplify and extend and accelerate certain tasks within testing. We can talk about using tools in testing in powerful ways for specific purposes, including automated (or “programmed“) checking. Speaking more precisely costs very little, helps us establish our credibility, and affords deeper thinking about testing—and about how we might apply tools thoughtfully to testing work.

Just like research, design, programming, and management, testing can’t be automated. Trouble arises when we talk about “automated testing”: people who have not yet thought about testing too deeply (particularly naïve managers) might sustain the belief that testing can be automated. So let’s be helpful and careful not to enable that belief.

What Is A Tester?

Thursday, June 25th, 2015

A junior tester relates some of the issues she’s encountering in describing her work.

To the people who thinks she “just breaks stuff all day”, here’s what I might reply:

It’s not that I don’t just break stuff; I don’t break stuff at all. The stuff that I’ve given to test is what it is; if it’s broken, it was broken when I got it. If I break anything, consider what my colleague James Bach says: I break dreams; I break the illusion that the software is doing what people want.

And when somebody doesn’t understand what a tester does, these are some of the metaphors upon which I can start a conversation. These are some things that, in my testing work, I am or that I aspire to be.

I’m a research scientist. My field of study is a product that’s in development. I research the product and everything around it to discover things that no one else has noticed so far. An important focus of my research is potential problems that threaten the value of the product. Other people—builders and managers—may know an immense amount about the product, but the majority of their attention is necessarily directed towards trying to make things work, and satisfaction about things that appear to work already. As a scientist, I’m attempting to falsify the theory that everything is okay with the product. So I study the technologies on which the product is built. I model the tasks and the problem space that the product is intended to address. I analyze each feature in the product, looking for problems in the way it was designed. I experiment with each part of the product, trying to disprove the theory that it will behave reasonably no matter what people throw at it. I recognize the difference between an experiment (investigating whether something works) and a demonstration (showing that something can work).

I’m an explorer. I start with a fuzzy idea of the product, and a large, empty notebook. I treat the product as a set of territories to be investigated, a country or city or landscape to learn about. I move through the space, sometimes following a safe route, and sometimes deviating from the usual path, and sometimes going to extremes. I might follow some of the same paths over and over again, but when I really want to learn about the territory, I turn off the marked roads, bushwhacking, branching and backtracking, getting lost sometimes, but always trying to see the landscape from new angles. I observe and reflect as I go. In my notebook I create pages of maps, diagrams, lists, journal entries, tables, photos, procedures. Mind you, I know that the book is only a pale representation of what I’ve seen and what I’ve learned, no matter how much I write and illustrate. I also know that many of the pages in the book are for myself, and that I’ll only show a few pages to others. The notebook is not the story of my exploration; it helps me tell the story of my exploration. (Here’s some more on notebooks.)

I’m a social scientist. I’m a sociologist and anthropologist, studying how people live and work; how they organize and interact; how things happen in their culture; and how the product will help them get things done. That’s because a product is not merely machinery and some code to make it work. A product fits into society, to fulfill a social purpose of some kind, and humans must repair the differences between what machines and humans can do. Thus testing requires a complex social judgement—which is much more than a matter of making sure that the wheels spin right. (I am indebted to Harry Collins for putting this idea so clearly.) What I’m doing has hard-science elements (just as anthropology has a strong biological component), but social sciences don’t always return hard answers. Instead, they provide “partial answers that might be useful”. (I am indebted to Cem Kaner for putting this idea so clearly.) As a social scientist, I strive to become aware of my biases so that I can manage them, thereby addressing certain threats to the validity of my research. So, I use and interact with the product in ways that represent actual customers’ behaviour, to discover problems that I and everyone else might have missed otherwise. I gather facts about the product; how it fits into the tasks that users perform with it, and how people might have to adapt themselves to handle the things that the product doesn’t do so well.

I’m a tool user. I’m always interacting with hardware, software, and other contrivances that help me to get things done. I use tools as media in the McLuhan sense: tools extend, enhance, intensify, enable, accelerate, amplify my capabilities. Tools can help me set systems up, generate data, and see things that might otherwise be harder to see. Tools can help me to sort and search through data. Tools can help me to produce results that I can compare to my product’s results. Tools can check to help me see what’s there and what might be missing. Tools can help me to feed input to the product, to control it, and to observe its output. Tools can help me with record-keeping and reporting. Sometimes the tools I’ve got aren’t up to the task at hand, so I use tools to help me build tools—whereupon I am also a tool builder. I’m aware of another aspect of McLuhan’s ideas about media: when extended beyond their original or intended capacity, tools reverse into producing the opposite of their original or intended effects.

I’m a critic. Like my favourite film critics, I study the work and how it might appeal—or not—to the audience for which it is intended. I study the technical aspects of the product, just as a film critic looks at lighting, framing of the shots, and other aspects of cinematography; at sound; at editing; at story construction; and so forth. I study culture and history—I study the culture and history of software—as a critic studies those of film—and of societies generally—to evaluate how well the product (story) fits in relation to its culture and its period and the genre in which the work fits. I might like the work or not, but as a critic, my personal preferences aren’t as important as analyzing the work on behalf of an audience. To do this well, I must recognize my preferences and my biases, and manage them. I fit all those things and more into an account that helps a potential audience decide whether they’ll like it or not. (A key difference is that the reader of my review is not the audience of a finished product; my review is for the cast, crew, and producers as the product is being built.)

I’m an investigative reporter. My beat is the product and everything and everyone around it. I ask the who, what, where, when, why, and how questions that reporters ask, and I’m continually figuring out and refining the next set of questions I need to ask. I’m interacting with the product myself, to learn all I can about it. I’m interviewing people who are asking for it, the people are who building it, and other people who might use it. I’m telling a story about what I discover, one that leads with a headline, begins with a summary overview and delves into to more detail. My story might be illustrated with charts, tables, and pictures. My story is truthful, but I realize the existence of different truths for different people, so I’m also prepared to bring several perspectives to the story.

There are other metaphors, of course. These are the prominent ones for me. What other ones can you see in your own work?

Very Short Blog Posts (24): You Are Not a Bureaucrat

Saturday, February 7th, 2015

Here’s a pattern I see fairly often at the end of bug reports:

Expected: “Total” field should update and display correct result.
Actual: “Total” field updates and displays incorrect result.

Come on. When you write a report like that, can you blame people for thinking you’re a little slow? Or that you’re a bureaucrat, and that testing work is mindless paperwork and form-filling? Or perhaps that you’re being condescending?

It is absolutely important that you describe a problem in your bug report, and how to observe that problem. In the end, a bug is an inconsistency between a desired state and an observed state; between what we want and what we’ve got. It’s very important to identify the nature of that inconsistency; oracles are our means of recognizing and describing problems. But in the relationship between your observation and the desired state, the expectation is the middleman. Your expectation is grounded in a principle based on some desirable consistency. If you need to make that principle explicit, leave out the expectation, and go directly for a good oracle instead.

Taking Severity Seriously

Wednesday, January 14th, 2015

There’s a flaw in the way most organizations classify the severity of a bug. Here’s an example from the Elementool Web site (as of 14 January, 2015); I’m sure you’ve seen something like it:

Critical: The bug causes a failure of the complete software system, subsystem or a program within the system.
High: The bug does not cause a failure, but causes the system to produce incorrect, incomplete, inconsistent results or impairs the system usability.
Medium: The bug does not cause a failure, does not impair usability, and does not interfere in the fluent work of the system and programs.
Low: The bug is an aesthetic (sic —MB), is an enhancement (ditto) or is a result of non-conformance to a standard.

These are serious problems, to be sure—and there are problems with the categorizations, too. (For example, non-conformance to a medical device standard can get you publicly reprimanded by the FDA; how is that low severity?) But there’s a more serious problem with models of severity like this: they’re all about the system as though no person used that system. There’s no empathy or emotion here; there’s no impact on people. The descriptions don’t mention the victims of the problem, and they certainly don’t identify consequences for the business. What would happen if we thought of those categories a little differently?

Critical: The bug will cause so much harm or loss that customers will sue us, regulators will launch a probe of our management, newspapers will run a front-page story about us, and comedians will talk about us on late night talk shows. Our company will spend buckets of money on lawyers, public relations, and technical support to try to keep the company afloat. Many capable people will leave voluntarily without even looking for a new job. Lots of people will get laid off. Or, the bug blocks testing such that we could miss problems of this magnitude; go back to the beginning of this paragraph.

High: The bug will cause loss, harm, or deep annoyance and inconvenience to our customers, prompting them to flood the technical support phones, overwhelm the online chat team, return the product demanding their money back, and buy the competitor’s product. And they’ll complain loudly on Twitter. The newspaper story will make it to the front page of the business section, and our product will be used for a gag in Dilbert. Sales will take a hit and revenue will fall. The Technical Support department will hold a grudge against Development and Product Management for years. And our best workers won’t leave right away, but they’ll be sufficiently demoralized to start shopping their résumés around.

Medium: The bug will cause our customers to be frustrated or impatient, and to lose faith in our product such that they won’t necessarily call or write, but they won’t be back for the next version. Most won’t initiate a tweet about us, but they’ll eagerly retweet someone else’s. Or, the bug will annoy the CEO’s daughter, whereupon the CEO will pay an uncomfortable visit to the development group. People won’t leave the company, but they’ll be demotivated and call in sick more often. Tech support will handle an increased number of calls. Meanwhile, the testers will have—with the best of intentions—taken time to investigate and report the bug, such that other, more serious bugs will be missed (see “High” and “Critical” above). And a few months later, some middle manager will ask, uncomprehendingly, “Why didn’t you find that bug?”

Low: The bug is visible; it makes our customers laugh at us because it makes our managers, programmers, and testers look incompetent and sloppy‐and it causes our customers to suspect deeper problems. Even people inside the company will tease others about the problem via grafitti in the stalls in the washroom (written with a non-washable Sharpie).Again, the testers will have spent some time on investigation and reporting, and again test coverage will suffer.

Of course, one really great way to avoid many of these kinds of problems is to focus on diligent craftsmanship supported by scrupulous testing. But when it comes to that discussion in that triage meeting, let’s consider the impact on real customers, on the real people in our company, and on our own reputations.

Rising Against the Rent-Seekers

Monday, August 25th, 2014

At CAST 2014, a quiet, modest, thoughtful, and very experienced man named James Christie gave a talk called “Standards: Promoting Quality or Restricting Competition?”. The talk followed on from his tutorial at EuroSTAR 2013 on working with auditors—James is a former auditor himself—and from his blogs on software standards over the years.

James’ talk introduced to our community the term rent-seeking. Rent-seeking is the act of using political means—the exercise of power—to obtain wealth without creating wealth; see http://www.econlib.org/library/Enc/RentSeeking.html and http://en.wikipedia.org/wiki/Rent-seeking. One form of rent-seeking is using regulations or standards in order to create or manipulate a market for consulting, training, and certification.

James’ CAST presentation galvanized several people in attendance to respond to ISO Standard 29119, the most recent rent-seeking scheme by a very persistent group of certificationists and standards promoters. Since the ISO standard on standards requires—at least in theory—consensus from industry experts, some people proposed a petition to demonstrate opposition and the absence of consensus amongst skilled testers. I have signed this petition, and I urge you to read it, and, if you agree, to sign it too.

Subsequently, a publication named Professional Tester published—under an anonymous byline—a post about the petition, with the provocative title “Book burners threaten (old) new testing standard”. Presumably such (literally) inflammatory language was meant as clickbait. Ordinarily such things would do little to foster thoughtful discussion about the issues, but it prompted some quite thoughtful reactions. Here’s one example; here’s another. Meanwhile, if the author wishes to characterize me as a book burner, here are (selected) contents of my library relevant to software testing. Even the lamest testing books (and some are mighty lame) have yet to be incinerated.

In the body text, the anonymous author mischaracterises the petition and its proponents, of which I am one. “Their objection,” (s)he says, “is that not everyone will agree with what the standard says: on that criterion nothing would ever be published.” I might not agree with what the standard says, but that’s mostly a side issue for the purposes of this post. I disagree with what the authors of the standard attempt to do with it.

1) To prescribe expensive, time-consuming, and wasteful focus on bloated process models and excessive documentation. My concern here is that organizations and institutions will engage in goal displacement: expending money, time and resources on demonstrating compliance with the standard, rather than on actually testing their products and services. Any kind of work presents opportunity cost; when you’re doing something, most of the time it prevents you from doing something else. Every minute that a tester spends on wasteful documentation is a minute that the tester cannot fulfill the overarching mission of testing: learning about the product, with an emphasis on discovering important problems that threaten value or safety, so that our clients can make informed decisions about problems and risks.

I am not objecting here to documentation, as the calumny from Professional Tester suggests. I am objecting to excessive and wasteful documentation. Ironically, the standard itself provides an example: the current version of ISO 29119-1 runs to 64 pages; 29119-2 has 68 pages; and 29119-3 has 138 pages. If those pages follow the pattern of earlier drafts, or of most other ISO documents, you have a long, pointless, and sleep-inducing read ahead of you. Want a summary model of the testing process? Try this example of what the rent-seekers propose as their model of of testing work. Note the model’s similarity to that of a (overly complex and poorly architected) computer program.

2) To set up an unnecessary market for training, certification, and consultancy in interpreting and applying the standard. The primary tactic here is to instill the fear of being de-certified. We’ve been here before, as shown in this post from Tom DeMarco (date uncertain, but it seems to have been written prior to 2000).

Rent-seeking is of the essence, and we’ve been here before in another sense: this was one of the key goals of the promulgators of the ISEB and ISTQB. In the image, they’ve saved the best for last.

The well-informed reader will note that the list of organizations behind those schemes and the members of the ISO 29119 international working group look strikingly similar.

If the working group happens to produce a massive and opaque set of documents, and you’re in an environment that claims conformance to the 29119 standards, and you want to get some actual testing work done, you’ll probably find it helpful to hire a consultant to help you understand them, or to help defend you from charges that you were not following the standard. Maybe you’ll want training and certification in interpreting the standard—services that the authors’ consultancies are primed to offer, with extra credibility because they wrote the standards! Good thing there are no ethical dilemmas around all of this.

3) To use the ISO’s standards development process to help suppress dissent. If you want to be on the international working group, it’s a commitment to six days of non-revenue work, somewhere in the world, twice a year. The ISO/IEC does not pay for travel expenses. Where have international working group meetings been held? According to the http://softwaretestingstandard.org/ Web site, meetings seem to have been held in Seoul, South Korea (2008); Hyderabad, India (2009); Niigata, Japan (2010); Mumbai, India (2011); Seoul, South Korea (2012); Wellington New Zealand (2013). Ask yourself these questions:

  • How many independent testers or testing consultants from Europe or North America have that kind of travel budget?

  • What kinds of consultants might be more likely to obtain funding for this kind of travel?

  • Who benefits from the creation of a standard whose opacity demands a consultant to interpret or to certify?

Meanwhile, if you join one of the local working groups, there are two ways that the group arrives at consensus.

  • By reaching broad agreement on the content. (Consensus, by the way, does not mean unanimity—that everyone agrees with the the content. It would be closer to say that in a consensus-based decision-making process, everyone agrees that they can live with the content.) But, if you can’t get to that, there’s another strategy.

  • By attrition. If your interest is in promulgating an unwieldy and opaque standard, there will probably be objectors. When there are, wait them out until they get frustrated enough to leave the decision-making process. Alan Richardson describes his experience with ISEB in this way.

In light of that, ask yourself these questions:

  • How many independent consultants have the time and energy to attend local working groups, often during otherwise billable hours?

  • What kinds of consultants might be more likely to support attendance at local working groups?

  • Who benefits from the creation of a standard that needs a consultant to interpret or to certify?

4) To undermine the role of skill in testing, and the reputations of people who discuss and promote it. “The real reason the book burners want to suppress it is that they don’t want there to be any standards at all,” says the polemicist from Professional Tester. I do want there to be standards for widgets and for communication protocols, but not for complex, cognitive, context-sensitive intellectual work. There should be standards for designed things that are intended to work together, but I’m not at all sure there should be mandated standards for how to do design. S/he goes on: “Effective, generic, documented systematic testing processes and methods impact their ability to depict testing as a mystic art and themselves as its gurus.” Far from treating testing as a mystic art, appealing to things like “intuition” and “experienced-based techniques”, my community has been trying to get to the heart of testing skills, flexible and responsive coverage reporting, tacit and explict knowledge, and the premises of the way we do testing. I’ve seen no such effort to dig deeper into these subjects—and to demystify them—from the rent-seekers.

Unlike the anonymous author at Professional Tester, I am willing to stand behind my work, my opinions, and my reputation by signing my name and encouraging comments. Feel free.

—Michael B.

Very Short Blog Posts (20): More About Testability

Monday, July 14th, 2014

A few weeks ago, I posted a Very Short Blog Post on the bare-bones basics of testability. Today, I saw a very good post from Adam Knight talking about telling the testability story. Adam focused, as I did, on intrinsic testability—things in the product itself that it more testable. But testability isn’t just a product attribute. In Heuristics of Testability (material we developed in a session of Rapid Software Testing Intensive Online), James Bach shows that testability is a set of relationships between product (“intrinsic testability”); project (“project-related testability”); tester (“subjective testability”); what we want from the product (“value-related testability”); and how we know what we know and what we need to know (“epistemic testability”).

Be sure of this: anything that makes testing harder or slower gives bugs more time or more opportunities to hide. In telling an expert and compelling story of our testing, it’s essential to identify and address things that make it harder to understand the product we’ve got—things that help to increase the risk that it won’t be the product our clients want.

Very Short Blog Posts (12): Scripted Testing Depends on Exploratory Testing

Sunday, February 23rd, 2014

People commonly say that exploratory testing “is a luxury” that “we do after we’ve finished our scripted testing”. Yet there is no scripted procedure for developing a script well. To develop a script, we must explore requirements, specifications, or interfaces. This requires us to investigate the product and the information available to us; to interpret them and to seek ambiguity, incompleteness, and inconsistency; to model the scope of the test space, the coverage, and our oracles; to conjecture, experiment, and make discoveries; and to perform testing and obtain feedback on how the scripts relate to the actual product, rather than the one imagined or described or modeled in an artifact; to observe and interpret and report the test results, and to feed them back into the process; and to do all of those things in loops and bursts of testing activity. All of these are exploratory activities. Scripted testing is preceded by and embedded in exploratory processes that are not luxuries, but essential.

Related posts:

http://www.satisfice.com/blog/archives/856
http://www.developsense.com/blog/2011/05/exploratory-testing-is-all-around-you/
http://www.satisfice.com/blog/archives/496

Counting the Wagons

Monday, December 30th, 2013

A member of Linked In asks if “a test case can have multiple scenarios”. The question and the comments reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by—waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.

train

Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. Like railway wagons, the contents—the cargo, and not the containers—are the really interesting and important parts. And like railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or infer multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases, piecework, units of production, simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • What Counts“, an article that I wrote for Better Software magazine, on problems with counting things.
  • Braiding the Stories” and “Delivering the News“, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

Very Short Blog Posts (8): Two Big Questions

Thursday, November 14th, 2013

Too often testing is focused on getting the right answers, rather than asking worthwhile questions and helping to get them answered. There are at least two overarching questions that a tester must ask. While looking directly at the product, at its customers, at the project, at the business, and at the relationships between them, the tester’s first question is “Is there a problem here?” When the tester observes anything looks like it might be a problem, the tester’s next question is for the testing client and for the rest of the project community: “Are we okay with this?

Where Does All That Time Go?

Tuesday, October 30th, 2012

It had been a long day, so a few of the fellows from the class agreed to meet a restaurant downtown. The main courses had been cleared off the table, some beer had been delivered, and we were waiting for dessert. Pedro (not his real name) was complaining, again, about how much time he had to spend doing administrivial tasks—meetings, filling out forms, time sheets, requisitions, and the like. “Everything takes so long. I want a pad of paper to take notes, I have to fill out a form for it. God help me if I run out of forms!”

“How much time do you spend on this kind of stuff each week?” I asked.

Pedro replied, “An hour a day. Maybe two, some days. Meetings…let’s say an hour and a half, on average.”

Wow, I thought—that’s a pretty good chunk of the week. I had an idea.

“Let’s visualize this, I said.” I took out my trusty Moleskine notebook. I prefer the version with the graph paper in it, for occasions just like this one. I outlined a grid, 20 squares across by two down.

Empty Week

“So you spend, on average, an hour and a half each day on compliance stuff. One-point-five times five, or 7.5 hours a week. Let’s make it eight. Put a C in eight squares.” He did that.

Compliance

“Okay,” I said. “You were griping today about how much time you spend wrestling with your test environments.”

Pedro’s eyes lit up. “Yes!” he said. “That’s the big one. See, it’s mobile stuff. We have a server component and a handset component to what we do, and the server stuff is a real bear.”

“Tell me more.”

“It’s a big deal. We’ve got one environment that models the production system. The software we’re developing has been so buggy that we can’t tell whether a given problem is general, or specific to the handset, so we have another one that we set up to do targeted testing every time we add support for a new handset. That’s the one I work with. Trouble is, setting it up takes ages and it’s really finicky. I have to do everything really carefully. I’ve asked for time to do scripting to automate some of it, but they won’t give that to me, because they’re always in such a rush. So, I do it by hand. It’s buggy, and I make the odd mistake. Either way, when I find out it doesn’t work, I have to troubleshoot it. That means I have to get on instant messaging or the phone to the developers, and figure out what’s wrong; then I have to figure out where to roll back to. And usually that’s right from the start. It wastes hours. And it’s every day.”

“Okay. Show me that on our little table, here. Use an S to represent each hour your spend each day.”

Whereupon Pedro proceded to fill in squares. Ten of them. Ten more. And then, eight more.

Setup

“Really?!” I said. “28 hours a week divided by five days—that’s more than five hours a day. Seriously?”

“Totally,” said Pedro. “It’s most of the day, every day, honestly. Never mind the tedium. What’s really killing me is that I don’t feel like I’m getting any real testing work done.”

“No kidding. There’s no time for it. There are only four squares left in the week. Plus, something you said earlier today about tons of bugs that aren’t related to setting up?”

“Right. When it comes to the stuff that I’m actually being asked to test, there’s lots of bugs there too. So my ‘testing time’ isn’t really testing. It’s mostly taken up with trying to reproduce and document the bugs.”

“Yes. In session-based test management, that’s bug investigation and reporting—B-time. And it does interrupt test design and execution—T-time—which is what produces actual test coverage, learning about what’s actually going on in the product. So, how much B-time?” He filled in three of the squares with Bs.

Bug Investigation and Reporting

“And T-time?”

He had room left to put in one lonely little T in the lower right corner.

Testing Time

“Wow,” I laughed. “One-fortieth of your whole week is spent in getting actual test coverage. The rest is all overhead. Have you told them how it affects you?”

“I’ve mentioned it,” he said.

“So look at this,” I suggested. “It’s even more clear when we use colour for emphasis.”

With Colour

“Whoa. I never looked at it that way. And then,” he paused. “Then they ask me, ‘Why didn’t you find that bug?'”

“Well,” I said, “considering the illusion they’re probably working under, it’s not an unreasonable question.”

“What do you mean?” Pedro asked.

“What does it say on your business card?”

“‘Software Testing’.”

“And what does it say on the door of the the test lab?”

“‘Test Lab’,” said Pedro.

“And they call you…?”

“Pedro.”

“No,” I laughed. “They say you’re a… what?”

“Oh. A tester.”

“So since you’re a tester, and since the door on the test lab says ‘Test Lab’, and your business card says ‘Testing’, they figure that’s all you do. The illusion is what Jerry Weinberg calls the Lumping Problem. All of those different activities—administrative compliance, setup, bug investigation and reporting, and test design and execution—are lumped into a single idea for them.” And I drew it for him.

Management's Dream

“That’s management’s illusion, there. Since, in their imagination, you’ve got forty hours of testing time in a week, it’s not unreasonable for them to wonder why you didn’t find that bug.”

“Hmmm. Right,” said Pedro.

“When in fact, what they’re getting from you is this.” And I drew it for him.

Testing Reality

“For testing—actual interaction with the product, looking for problems, you’ve got one-fortieth of the time they think you’ve got. One lonely little T. Is that part of your test report?”

“Oy,” he said. “Maybe I should show them something like this.”

“Maybe you should,” I said.

A couple of nights later, I showed that page of my notebook to James Bach over Skype. “Wow,” he said. “That guy could be forty times more productive!”

“Forty?”

“Well, no, not really, of course. But suppose the programmers checked their work a little more carefully, or suppose the testers practiced writing more concise bug reports and sharpened their investigating skill. One of those two things could cut the bug investigation time by a third. That would give more time for testing, when they’re not being interrupted by other stuff. What if they cut the setup time by a half, and that administrivia by half?”

“Four, fourteen…” I said. “That would give eighteen more hours for testing and bug investigation, for a total of 22 hours. And even if they’re still doing two hours of bug investigation for every one hour of testing time… well, that’s seven times more productive, at least.”

“Seven times the test coverage if they get some of those issues worked out, then,” said James.

“Maybe de-lumping is the kind of thing lots of testers would want to do in their test reports,” I said.

How about you?