Blog Posts for the ‘Management’ Category

Taking Severity Seriously

Wednesday, January 14th, 2015

There’s a flaw in the way most organizations classify the severity of a bug. Here’s an example from the Elementool Web site (as of 14 January, 2015); I’m sure you’ve seen something like it:

Critical: The bug causes a failure of the complete software system, subsystem or a program within the system.
High: The bug does not cause a failure, but causes the system to produce incorrect, incomplete, inconsistent results or impairs the system usability.
Medium: The bug does not cause a failure, does not impair usability, and does not interfere in the fluent work of the system and programs.
Low: The bug is an aesthetic (sic —MB), is an enhancement (ditto) or is a result of non-conformance to a standard.

These are serious problems, to be sure—and there are problems with the categorizations, too. (For example, non-conformance to a medical device standard can get you publicly reprimanded by the FDA; how is that low severity?) But there’s a more serious problem with models of severity like this: they’re all about the system as though no person used that system. There’s no empathy or emotion here; there’s no impact on people. The descriptions don’t mention the victims of the problem, and they certainly don’t identify consequences for the business. What would happen if we thought of those categories a little differently?

Critical: The bug will cause so much harm or loss that customers will sue us, regulators will launch a probe of our management, newspapers will run a front-page story about us, and comedians will talk about us on late night talk shows. Our company will spend buckets of money on lawyers, public relations, and technical support to try to keep the company afloat. Many capable people will leave voluntarily without even looking for a new job. Lots of people will get laid off. Or, the bug blocks testing such that we could miss problems of this magnitude; go back to the beginning of this paragraph.

High: The bug will cause loss, harm, or deep annoyance and inconvenience to our customers, prompting them to flood the technical support phones, overwhelm the online chat team, return the product demanding their money back, and buy the competitor’s product. And they’ll complain loudly on Twitter. The newspaper story will make it to the front page of the business section, and our product will be used for a gag in Dilbert. Sales will take a hit and revenue will fall. The Technical Support department will hold a grudge against Development and Product Management for years. And our best workers won’t leave right away, but they’ll be sufficiently demoralized to start shopping their résumés around.

Medium: The bug will cause our customers to be frustrated or impatient, and to lose faith in our product such that they won’t necessarily call or write, but they won’t be back for the next version. Most won’t initiate a tweet about us, but they’ll eagerly retweet someone else’s. Or, the bug will annoy the CEO’s daughter, whereupon the CEO will pay an uncomfortable visit to the development group. People won’t leave the company, but they’ll be demotivated and call in sick more often. Tech support will handle an increased number of calls. Meanwhile, the testers will have—with the best of intentions—taken time to investigate and report the bug, such that other, more serious bugs will be missed (see “High” and “Critical” above). And a few months later, some middle manager will ask, uncomprehendingly, “Why didn’t you find that bug?”

Low: The bug is visible; it makes our customers laugh at us because it makes our managers, programmers, and testers look incompetent and sloppy, and it causes our customers to suspect deeper problems. Even people inside the company will tease others about the problem via graffiti in the stalls in the washroom (written with a non-washable Sharpie). Again, the testers will have spent some time on investigation and reporting, and again test coverage will suffer.

Of course, one really great way to avoid many of these kinds of problems is to focus on diligent craftsmanship supported by scrupulous testing. But when it comes to that discussion in that triage meeting, let’s consider the impact on real customers, on the real people in our company, and on our own reputations.

Very Short Blog Posts (21): You Had It Last!

Tuesday, November 4th, 2014

Sometimes testers say to me “My development team (or the support people, or the managers) keep saying that any bugs in the product are the testers’ fault. ‘It’s obvious that any bug in the product is the tester’s responsibility,’ they say, ‘since the tester had the product last.’ How do I answer them?”

Well, you could say that the product’s problems are the responsibility of the tester because the tester had the product last—and that a successful product was successful because the programmers and the business people did such a good job at preventing bugs. But that would be to explain any failures in the product in one way, and to explain any successes in the product in a completely different way.

Instead, let’s be consistent. Testers don’t put the bugs in, and testers miss some of the bugs because bugs are, by their nature, hidden. Moreover, the bugs are hidden so well that not even the people who put them in could find them. The bugs are hidden by people, and by the consequences of how we choose to do software development. So let’s all work to prevent the bugs, and to find them more quickly. Let’s talk about problems in development that allow bugs to hide. Let’s all work on testability, so that we can find bugs earlier, and more easily, before the bugs have a chance to hide deeply. And let’s all share responsibility for our failures and our successes.

Rising Against the Rent-Seekers

Monday, August 25th, 2014

At CAST 2014, a quiet, modest, thoughtful, and very experienced man named James Christie gave a talk called “Standards: Promoting Quality or Restricting Competition?”. The talk followed on from his tutorial at EuroSTAR 2013 on working with auditors—James is a former auditor himself—and from his blogs on software standards over the years.

James’ talk introduced to our community the term rent-seeking. Rent-seeking is the act of using political means—the exercise of power—to obtain wealth without creating wealth; see http://www.econlib.org/library/Enc/RentSeeking.html and http://en.wikipedia.org/wiki/Rent-seeking. One form of rent-seeking is using regulations or standards in order to create or manipulate a market for consulting, training, and certification.

James’ CAST presentation galvanized several people in attendance to respond to ISO Standard 29119, the most recent rent-seeking scheme by a very persistent group of certificationists and standards promoters. Since the ISO standard on standards requires—at least in theory—consensus from industry experts, some people proposed a petition to demonstrate opposition and the absence of consensus amongst skilled testers. I have signed this petition, and I urge you to read it, and, if you agree, to sign it too.

Subsequently, a publication named Professional Tester published—under an anonymous byline—a post about the petition, with the provocative title “Book burners threaten (old) new testing standard”. Presumably such (literally) inflammatory language was meant as clickbait. Ordinarily such things would do little to foster thoughtful discussion about the issues, but this one prompted some quite thoughtful reactions. Here’s one example; here’s another. Meanwhile, if the author wishes to characterize me as a book burner, here are (selected) contents of my library relevant to software testing. Even the lamest testing books (and some are mighty lame) have yet to be incinerated.

In the body text, the anonymous author mischaracterises the petition and its proponents, of which I am one. “Their objection,” (s)he says, “is that not everyone will agree with what the standard says: on that criterion nothing would ever be published.” I might not agree with what the standard says, but that’s mostly a side issue for the purposes of this post. I disagree with what the authors of the standard attempt to do with it.

1) To prescribe expensive, time-consuming, and wasteful focus on bloated process models and excessive documentation. My concern here is that organizations and institutions will engage in goal displacement: expending money, time and resources on demonstrating compliance with the standard, rather than on actually testing their products and services. Any kind of work presents opportunity cost; when you’re doing something, most of the time it prevents you from doing something else. Every minute that a tester spends on wasteful documentation is a minute that the tester cannot fulfill the overarching mission of testing: learning about the product, with an emphasis on discovering important problems that threaten value or safety, so that our clients can make informed decisions about problems and risks.

I am not objecting here to documentation, as the calumny from Professional Tester suggests. I am objecting to excessive and wasteful documentation. Ironically, the standard itself provides an example: the current version of ISO 29119-1 runs to 64 pages; 29119-2 has 68 pages; and 29119-3 has 138 pages. If those pages follow the pattern of earlier drafts, or of most other ISO documents, you have a long, pointless, and sleep-inducing read ahead of you. Want a summary model of the testing process? Try this example of what the rent-seekers propose as their model of testing work. Note the model’s similarity to that of an (overly complex and poorly architected) computer program.

2) To set up an unnecessary market for training, certification, and consultancy in interpreting and applying the standard. The primary tactic here is to instill the fear of being de-certified. We’ve been here before, as shown in this post from Tom DeMarco (date uncertain, but it seems to have been written prior to 2000).

Rent-seeking is of the essence, and we’ve been here before in another sense: this was one of the key goals of the promulgators of the ISEB and ISTQB. In the image, they’ve saved the best for last.

The well-informed reader will note that the list of organizations behind those schemes and the members of the ISO 29119 international working group look strikingly similar.

If the working group happens to produce a massive and opaque set of documents, and you’re in an environment that claims conformance to the 29119 standards, and you want to get some actual testing work done, you’ll probably find it helpful to hire a consultant to help you understand them, or to help defend you from charges that you were not following the standard. Maybe you’ll want training and certification in interpreting the standard—services that the authors’ consultancies are primed to offer, with extra credibility because they wrote the standards! Good thing there are no ethical dilemmas around all of this.

3) To use the ISO’s standards development process to help suppress dissent. If you want to be on the international working group, it’s a commitment to six days of non-revenue work, somewhere in the world, twice a year. The ISO/IEC does not pay for travel expenses. Where have international working group meetings been held? According to the http://softwaretestingstandard.org/ Web site, meetings seem to have been held in Seoul, South Korea (2008); Hyderabad, India (2009); Niigata, Japan (2010); Mumbai, India (2011); Seoul, South Korea (2012); Wellington, New Zealand (2013). Ask yourself these questions:

  • How many independent testers or testing consultants from Europe or North America have that kind of travel budget?

  • What kinds of consultants might be more likely to obtain funding for this kind of travel?

  • Who benefits from the creation of a standard whose opacity demands a consultant to interpret or to certify?

Meanwhile, if you join one of the local working groups, there are two ways that the group arrives at consensus.

  • By reaching broad agreement on the content. (Consensus, by the way, does not mean unanimity—that everyone agrees with the content. It would be closer to say that in a consensus-based decision-making process, everyone agrees that they can live with the content.) But, if you can’t get to that, there’s another strategy.

  • By attrition. If your interest is in promulgating an unwieldy and opaque standard, there will probably be objectors. When there are, wait them out until they get frustrated enough to leave the decision-making process. Alan Richardson describes his experience with ISEB in this way.

In light of that, ask yourself these questions:

  • How many independent consultants have the time and energy to attend local working groups, often during otherwise billable hours?

  • What kinds of consultants might be more likely to support attendance at local working groups?

  • Who benefits from the creation of a standard that needs a consultant to interpret or to certify?

4) To undermine the role of skill in testing, and the reputations of people who discuss and promote it. “The real reason the book burners want to suppress it is that they don’t want there to be any standards at all,” says the polemicist from Professional Tester. I do want there to be standards for widgets and for communication protocols, but not for complex, cognitive, context-sensitive intellectual work. There should be standards for designed things that are intended to work together, but I’m not at all sure there should be mandated standards for how to do design. S/he goes on: “Effective, generic, documented systematic testing processes and methods impact their ability to depict testing as a mystic art and themselves as its gurus.” Far from treating testing as a mystic art, appealing to things like “intuition” and “experience-based techniques”, my community has been trying to get to the heart of testing skills, flexible and responsive coverage reporting, tacit and explicit knowledge, and the premises of the way we do testing. I’ve seen no such effort to dig deeper into these subjects—and to demystify them—from the rent-seekers.

Unlike the anonymous author at Professional Tester, I am willing to stand behind my work, my opinions, and my reputation by signing my name and encouraging comments. Feel free.

—Michael B.

Counting the Wagons

Monday, December 30th, 2013

A member of LinkedIn asks if “a test case can have multiple scenarios”. The question and the comments reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by—waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were there more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.

[Image: train]

Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. Like railway wagons, the contents—the cargo, and not the containers—are the really interesting and important parts. And like railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or imply multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases, piecework, units of production, simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • “What Counts”, an article that I wrote for Better Software magazine, on problems with counting things.
  • “Braiding the Stories” and “Delivering the News”, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

Can You Hear The Alarm Bells?

Monday, November 25th, 2013

Many people seem certain about what happened to cause the healthcare.gov fiasco. Stories are starting to trickle out, and eventually there’ll be an ocean of them. To anyone familiar with software development, especially in large organizations, these stories include familiar elements of character and plot. From those, it’s easy to extrapolate and fill in the details based on imagination and experience. We all know what happened.

Well, we don’t. In a project of that size, no one knows what happened. No one can know what happened. Imagine Rashomon scaled up to hundreds of people, each making his own observations and decisions along the way.

As time goes by, I anticipate some people saying that the project will represent a turning point in software development and project management. “Surely,” they will say, “after a project failure of this size and scope, people will finally learn.” Alas, I’m less optimistic. As the first three premises of rapid software testing describe it, software development is a human activity that is surrounded by 1) confusion, 2) complexity, 3) volatility, 4) urgency and… 5) ambition. Increasing ambition causes increases in the other four items too. In our societies, we could help to defend ourselves against future fiascos by restraining our ambitions, but I fear that people will put blindfolds on each other, pass around the keys, and scramble to get back into the driver’s seat of the school bus. How will they do this?

One form of the blindfold is to say “That’s not going to be a problem here because…”

…failure is not an option.
…we have our best people on it.
…we can’t disappoint the client.
…it doesn’t have to be perfect. (thanks, Joe Miller, @lilshieste)
…we’ll fix it in production.
…no user would ever do that.
…the users will figure it out.
…the users will never notice that.
…THAT bug is in someone else’s code.
…we don’t have to fix that; that’s a new feature request.
…it’s working exactly as designed.
…if there’s no test case for it, it’s not a bug.
…the clients will come to their senses before the ship date.
…we have thousands of automated tests that we run on every build.
…this time it will be different.
…we have budget to fix that before we deploy.
…at least the back end is working right.
…if there are performance problems, we’ll just add another few servers.
…we’ve done lots of projects just like this one.
…foreign-language support is something we could cut.
…that list there says that this is a level three threat, not a level one threat.
…the support people can handle whatever problems come up.
…this graph shows that the load will never get that high.
…now is too soon; we’ll tell the clients about the problems after we’ve fixed them.
…we’re thinking positively; that can-do spirit will see us through.
…we still have plenty of time left to fix that.
…the spec didn’t say anything about having to handle special characters. How are single quotes a big deal?
…the client should have thought of that before.
…seriously, that’s just a cosmetic problem.
…it’s important not to complicate things.
…everybody WILL put in some overtime and we WILL get this thing done.
…well, at least the front end looks good, and people will be happy with that.
…everyone here is committed to making sure this ships on time.
…we’ll just shorten the test cycle.
…if there’s a problem, the other/upstream/downstream team will let us know.
…they can take care of that in training.
…we’ve planned to make sure that nothing unexpected happens.
…we’ve got this fantastic new framework that’ll make things go faster.
…we’ll pull a bunch of people off other projects to work on this one.

I wonder whether these things were said, at one time or another, during the healthcare.gov project. I don’t know if they were. I don’t know what happened. I didn’t work on it. But I’ve heard these things on projects before, I know that I can listen for them, and I know that they’re a sign of trouble ahead. Are they being said on your project?

Interview and Interrogation

Friday, September 27th, 2013

In response to my post from a couple of days ago, Gus kindly provides a comment, and I think a discussion of it is worth a blog post on its own.

Michael, I appreciate what you are trying to say but the simile doesn’t really work 100% for me, let me try to explain.

The simile has prompted you to think and to question, so in that sense, it works 100% for me. Triggering thought is, after all, why people use similes. (See also Surfaces and Essences: Analogy as the Fuel and Fire of Thinking.)

I would apply lean principles and cut some waste from your interview process. I will fail the candidate as soon as she gives me the first wrong answer.

I have 5 questions and all have to be answered correctly to hire the person for a junior position (release 1).

Interview candidate A:
Ask question 1 OK
Ask question 2 FAIL
Send candidate A home

Second interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 FAIL
Send candidate A home

Third interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 OK
Ask question 4 OK
Ask question 5 OK

Hire candidate A

All right. You seem to have left out something important in your process here, which I would apply after each step—indeed, after each question and answer: make a conscious decision about your next step. To me, that requires continuous review of your list of questions for relevance, significance, sufficiency, and information value. Interviewing is an exploratory process. A skilled interviewer will respond, in the moment, to what the candidate has said. A skilled interviewer will think less in terms of “pass or fail”, and more in terms of “What am I learning about this candidate? What does the last answer suggest I should ask next? What other information, exclusive of the answer, might I apply to my decision-making process? What else should I be looking for?” When the candidate gets the answer wrong, the skilled interviewer will ask “Was it really wrong? Maybe there are multiple right answers to the same question. Maybe she didn’t understand the question because I asked it in an ambiguous way, and she gave a right answer to an alternative interpretation. Maybe her answer was a question for me, intended to clarify my question.”

I can’t emphasize this enough: like interviewing, testing is about far more than pass or fail. Testing is about exploration, discovery, investigation, and learning, with the goal of imparting what we’ve learned to people who matter. Testing is about trying to understand the product that we’ve got, with the goal of revealing information that helps our clients decide if it’s the product they want. Testing is usually (but not always) focused on finding evident problems, apparent problems and potential problems, not only in our products, but in our ideas about our products. Testing is also about finding problems in our testing, and every one of the “fail” moments above is a point at which I would want to consider a problem with the test. (The “pass” moments are like that too, if I really want to do a great job.)

At this point when candidate A will want to be promoted to a senior position (translate with next release of the software) I will prepare other 5 different questions probing against the new skills and responsibilities and as I have automated the first 5 questions I can send her a link to a web site where she will have to prove that she hasn’t forgotten about the first 5 before she can be even considered for the new position.

I’d do things slightly differently.

First I would ask “What would prompt me to ask the same questions again? Are those still the most important questions I could ask as she’s heading for her new role? What reason do I have to believe that she might have lost some capability she previously had? Are there other questions related to her old role—not necessarily to her new one—that I should ask that might be more revealing or more significant?” Note that there might be entirely legitimate reasons to believe that she might have backslid somehow—but at that point, I’d also want to ask “What are the conditions that would have allowed her to backslide without me noticing it—and what could I do to minimize those kinds of conditions?”

Then there would be another question I’d ask: “What if she has learned to answer a specific question properly, but is not adaptable to the general case? Should I be asking the same question in a different way, to see if she gives the same answer? Should I be asking a similar question that has a different answer, and see if she notices and handles the difference?”

Now: it might be costly to vary my questions, so I might simply shrug and decide just to go with the ones I’d asked before. But the point of evaluating my process is to ask, “How might I be fooling myself in the belief that I still know this person well?”

Assumes she answers correctly the 5 automated questions, at this point I will do the interview for the senior role.

Interview candidate A for senior role:
Ask question 6 OK
Ask question 7 FAIL
Send candidate A home

and so on.

I don’t see a problem with this process as long as I am allowed using everything I learn from the feedback with the candidate up to question “N” to adapt and change all the questions greater than “N”

Up until this point, you haven’t mentioned that, and your description of your process doesn’t make that at all clear. You’ve only mentioned the “pass” and “fail” parts of “everything I learn from the feedback”. Now, you might be taking that into account in your head, but notice how your description, your process model, doesn’t reflect that—so it becomes easy to misinterpret what you actually do. In addition, you’ve focused on adapting and changing all the questions greater than N—but I’d be interested in the possibility of adapting and changing all the questions less than or equal to N, too.

More importantly: qualifying someone for an important job is not about making sure that they can get the right answers on a canned test, just as testing a product is not about making sure that the functions produce expected results for some number of test cases. The specific answers might have some significance, but if I’m serious about hiring the right people for the job, I don’t want to make my decisions solely by putting them in front of a terminal, having them fill out an online form, and checking their answers. I want to evaluate them from a number of criteria: do they respond quickly, in a polite and friendly way? Do they work well with others? Are they appropriately discreet? Are they adaptable? Can they deal with heavy workloads? Do they learn quickly? In order to learn those things, I need to do more than ask pass-or-fail questions. I need to have unscripted, spontaneous, and free-flowing conversation with them; interview and interaction, and not just interrogation. You see?

Where Does All That Time Go?

Tuesday, October 30th, 2012

It had been a long day, so a few of the fellows from the class agreed to meet at a restaurant downtown. The main courses had been cleared off the table, some beer had been delivered, and we were waiting for dessert. Pedro (not his real name) was complaining, again, about how much time he had to spend doing administrivial tasks—meetings, filling out forms, time sheets, requisitions, and the like. “Everything takes so long. I want a pad of paper to take notes, I have to fill out a form for it. God help me if I run out of forms!”

“How much time do you spend on this kind of stuff each week?” I asked.

Pedro replied, “An hour a day. Maybe two, some days. Meetings…let’s say an hour and a half, on average.”

Wow, I thought—that’s a pretty good chunk of the week. I had an idea.

“Let’s visualize this,” I said. I took out my trusty Moleskine notebook. I prefer the version with the graph paper in it, for occasions just like this one. I outlined a grid, 20 squares across by two down.

[Image: Empty Week]

“So you spend, on average, an hour and a half each day on compliance stuff. One-point-five times five, or 7.5 hours a week. Let’s make it eight. Put a C in eight squares.” He did that.

[Image: Compliance]

“Okay,” I said. “You were griping today about how much time you spend wrestling with your test environments.”

Pedro’s eyes lit up. “Yes!” he said. “That’s the big one. See, it’s mobile stuff. We have a server component and a handset component to what we do, and the server stuff is a real bear.”

“Tell me more.”

“It’s a big deal. We’ve got one environment that models the production system. The software we’re developing has been so buggy that we can’t tell whether a given problem is general, or specific to the handset, so we have another one that we set up to do targeted testing every time we add support for a new handset. That’s the one I work with. Trouble is, setting it up takes ages and it’s really finicky. I have to do everything really carefully. I’ve asked for time to do scripting to automate some of it, but they won’t give that to me, because they’re always in such a rush. So, I do it by hand. It’s buggy, and I make the odd mistake. Either way, when I find out it doesn’t work, I have to troubleshoot it. That means I have to get on instant messaging or the phone to the developers, and figure out what’s wrong; then I have to figure out where to roll back to. And usually that’s right from the start. It wastes hours. And it’s every day.”

“Okay. Show me that on our little table, here. Use an S to represent each hour you spend each day.”

Whereupon Pedro proceeded to fill in squares. Ten of them. Ten more. And then, eight more.

[Image: Setup]

“Really?!” I said. “28 hours a week divided by five days—that’s more than five hours a day. Seriously?”

“Totally,” said Pedro. “It’s most of the day, every day, honestly. Never mind the tedium. What’s really killing me is that I don’t feel like I’m getting any real testing work done.”

“No kidding. There’s no time for it. There are only four squares left in the week. Plus, something you said earlier today about tons of bugs that aren’t related to setting up?”

“Right. When it comes to the stuff that I’m actually being asked to test, there’s lots of bugs there too. So my ‘testing time’ isn’t really testing. It’s mostly taken up with trying to reproduce and document the bugs.”

“Yes. In session-based test management, that’s bug investigation and reporting—B-time. And it does interrupt test design and execution—T-time—which is what produces actual test coverage, learning about what’s actually going on in the product. So, how much B-time?” He filled in three of the squares with Bs.

[Image: Bug Investigation and Reporting]

“And T-time?”

He had room left to put in one lonely little T in the lower right corner.

[Image: Testing Time]

“Wow,” I laughed. “One-fortieth of your whole week is spent in getting actual test coverage. The rest is all overhead. Have you told them how it affects you?”

“I’ve mentioned it,” he said.

“So look at this,” I suggested. “It’s even more clear when we use colour for emphasis.”

[Image: With Colour]

“Whoa. I never looked at it that way. And then,” he paused. “Then they ask me, ‘Why didn’t you find that bug?'”

“Well,” I said, “considering the illusion they’re probably working under, it’s not an unreasonable question.”

“What do you mean?” Pedro asked.

“What does it say on your business card?”

“‘Software Testing’.”

“And what does it say on the door of the test lab?”

“‘Test Lab’,” said Pedro.

“And they call you…?”

“Pedro.”

“No,” I laughed. “They say you’re a… what?”

“Oh. A tester.”

“So since you’re a tester, and since the door on the test lab says ‘Test Lab’, and your business card says ‘Testing’, they figure that’s all you do. The illusion is what Jerry Weinberg calls the Lumping Problem. All of those different activities—administrative compliance, setup, bug investigation and reporting, and test design and execution—are lumped into a single idea for them.” And I drew it for him.

[Image: Management's Dream]

“That’s management’s illusion, there. Since, in their imagination, you’ve got forty hours of testing time in a week, it’s not unreasonable for them to wonder why you didn’t find that bug.”

“Hmmm. Right,” said Pedro.

“When in fact, what they’re getting from you is this.” And I drew it for him.

[Image: Testing Reality]

“For testing—actual interaction with the product, looking for problems, you’ve got one-fortieth of the time they think you’ve got. One lonely little T. Is that part of your test report?”

“Oy,” he said. “Maybe I should show them something like this.”

“Maybe you should,” I said.

A couple of nights later, I showed that page of my notebook to James Bach over Skype. “Wow,” he said. “That guy could be forty times more productive!”

“Forty?”

“Well, no, not really, of course. But suppose the programmers checked their work a little more carefully, or suppose the testers practiced writing more concise bug reports and sharpened their investigating skill. One of those two things could cut the bug investigation time by a third. That would give more time for testing, when they’re not being interrupted by other stuff. What if they cut the setup time by a half, and that administrivia by half?”

“Four, fourteen…” I said. “That would give eighteen more hours for testing and bug investigation, for a total of 22 hours. And even if they’re still doing two hours of bug investigation for every one hour of testing time… well, that’s seven times more productive, at least.”

“Seven times the test coverage if they get some of those issues worked out, then,” said James.

“Maybe de-lumping is the kind of thing lots of testers would want to do in their test reports,” I said.
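(If you want to de-lump your own week the same way, here is a rough sketch in Python of the kind of grid I drew in the notebook. The hours below are Pedro’s: eight of compliance, twenty-eight of setup, three of bug investigation, one of testing. Substitute your own numbers.)

    # A rough sketch of the notebook grid: 40 squares for a 40-hour week,
    # one letter per hour. The numbers are Pedro's; substitute your own.
    activities = [
        ("C", 8),   # administrative compliance
        ("S", 28),  # setting up and troubleshooting test environments
        ("B", 3),   # bug investigation and reporting
        ("T", 1),   # test design and execution: actual test coverage
    ]

    squares = [letter for letter, hours in activities for _ in range(hours)]
    assert len(squares) == 40  # a 40-hour week

    # Print the week as two rows of twenty squares, like the grid in the Moleskine.
    for row in range(2):
        print(" ".join(squares[row * 20:(row + 1) * 20]))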

How about you?

Delivering the News (Test Reporting Part 3)

Monday, February 27th, 2012

In the last post in this series, I noted some potentially useful structural similarities between bug reports (whether oral or written) and newspaper reports. This time, I’ll delve into that a little more.

To our clients, investigative problem reports are usually the most important part of the product story. The most respected newspapers don’t earn their reputations by reprinting press releases; they earn their reputations through investigative journalism. As testers (or, heaven help us, quality assurance people), we tend to be chartered to look for problems, and to investigate them in ways that are most helpful to programmers, managers, and our other clients. A failing test on its own tells us little, and a failing check even less; as I pointed out here, a failing test is only an allegation of a problem. Investigation and study of a failing test is likely to inform us of something more useful: whether someone will perceive a problem that threatens the value of the product. I’ll talk more about the nature of problems in a later post, but for now, think of a product problem in terms of a perceived absence of or threat to some dimension of quality. (See the Heuristic Test Strategy Model for one list of quality criteria; see Software Quality Characteristics, by Rikard Edgren, Henrik Emilsson and Martin Jansson for another.) Since the manager’s goal is generally to release a product at her desired level of quality, problems that could threaten that goal are likely to be interesting and important. Or, as they say in the newspaper business, “if it bleeds, it leads”.

Potential showstoppers are usually the most important stories. In the 1990s, I was a technical support person, a tester, a program manager, and a programmer for a mass-market, commercial shrink-wrap software company. Since we had millions of customers, even minor problems could have a big impact on technical support and on the reputation of our products in the market. The market was enormous, hardware and software were even less standardized than they are now, and we worked under a great deal of time pressure. Classifying and prioritizing problems was contentious. One of the important classification questions was “What should we consider a showstopper?” One of the senior programmers came up with an answer that I’ve used ever since:

Showstopper (n.): Something that makes more sense to fix than to ship.

(I talked about showstoppers here.) In a development project, a showstopper—any threat to the timely release of the project—is a page-one, above-the-fold story, a story that you can see and begin to read without opening the newspaper or picking it up.

There’s always one story that leads. The most important threat to a timely, successful release may be a single problem, or it may be a collection of problems—what Ian Mitroff calls a mess. Do we have a problem, or a couple of problems, or a mess? No matter what the answer, there’s only so much space on the front page above the fold. Will you have one headline, or two, or three? What will that headline say? What will the lead paragraph of each story look like? Does the lead paragraph cover the five Ws—who, what, where, when, and why? If not, are those questions answered shortly? Might there be a reasonable reason not to answer them?

There’s only one front page, and there’s almost always more than one story on it. Our clients need to be able to absorb the lead story and the other front-page stories quickly, so we need to be able to provide headlines, lead paragraphs, and details in appropriate proportions. See an example front page here, with details that follow.

Very infrequently, serious newspapers give their entire front page to a story. In those cases, it’s usually an overwhelmingly important story, or one that threatens the newspaper or journalism itself.

The most compelling stories are those that have an impact on people. Although product problems are often technical in nature, the “making sense” part of the showstopper decision is focused on the business. Testers must be able to connect technical problems with business risk. Problems related to technical correctness are often easy to describe, but they might not be important. The skill of bug advocacy—making sure that the customer is aware of the best possible motivations for fixing the bug—depends on your ability to report the bug in terms of its most significant effect on the business. Ben Simo has a lovely way to sum this up. Early in his career, when Ben was trying to advocate a bug fix, his project manager said, “Revenue is king. Liability is queen. Tell me how this bug impacts them.”

The number of stories usually isn’t as important as the significance of the stories. This is another way in which test reports can be like newspapers. We don’t usually evaluate the quality of a newspaper by the number of stories in it. Instead, we look at the significance, relevance, and credibility of the stories.

It may take time to distinguish between a breaking story and a major story. Sometimes the news cycle doesn’t afford time for investigation, even though the story might be important. Information gets passed around the project at various moments during the test and development cycle. Sometimes a discovery happens just before a meeting. Smart reporters know to balance urgency and restraint when there’s a breaking story. When I worked in commercial mass-market software in the 1990s, we sometimes discovered a terrible-looking problem a couple of hours before release. Such discoveries would trigger arousal (no, not sexual friskiness, but arousal in the psychological sense of being suddenly snapped awake and alert to danger). All of a sudden, we’d be noticing all kinds of things that we hadn’t noticed before, and most of them were non-problems of one kind or another. We were biased by fear. We called it the “snakes on everything” moment. When reporting, testers need to take stock of the emotional factors surrounding them, and report cautiously and accurately. An hour from now, an allegation or a rumour might be an important story—or it might be nothing.

Non-problems aren’t news. There’s a pattern of stories in the first section of the newspaper: they’re mostly stories about problems, and there’s a reason for that: problems compel attention. Our emotional systems evolved to help keep us out of trouble. Problems or threats trigger arousal. Things that are going well are nice to hear about, but they don’t engage emotions in the same way as problems do. In a software development project, non-problems have relatively little significance for project managers. Routine daily successes don’t threaten the project, and therefore need less attention.

Numbers, like pictures, are illustrations, not the whole story. A qualitative report is not quantity-free; after all, identifying the presence or absence of something involves counting to one, and the degree of some attribute of interest can be illustrated by number. But just as a pictorial illustration isn’t the item it depicts, a numerical illustration isn’t the story it might help to describe. A picture looks at part of a scene through a particular lens; a number focuses on one attribute using a particular metric. Each one may emphasize some observations at the expense of other observations. Each one may crop out detail. Each one may magnify or distort.

Since the product and testing stories are multi-dimensional, be prepared to show the dimensions. Newspaper reports always have a bias, but reporters and editors often attempt to manage the bias by providing alternative sources of information, and alternative interpretations. A story of any length often includes multiple stories, or multiple threads of the main story. When tables of data are appropriate, newspapers print tables (think stock quotes in the business section, or box scores or line scores in sports). Products, coverage, quality, and problems are all multi-dimensional, multi-variate, and qualitative. Where there’s a mass of data, consider using tables such as dashboards or coverage tables. Pin numbers to reliable measurements (see the slip charts, the detailed impact case methods, and the subjective impact methods in Weinberg’s Quality Software Management, Volume 2: First Order Measurement; and pay attention to validity—see Kirk and Miller’s Reliability and Validity in Qualitative Research and Shadish, Cook, and Campbell’s Experimental and Quasi-Experimental Designs for Generalized Causal Inference).

Describe your coverage. Boris Beizer described coverage as “any metric of test completeness with respect to a test selection criterion”. That suggests that it is possible to quantify coverage if you have a quantifiable test selection criterion. For example, if a single-digit field accepts any digit from 0 to 9, one could select 10 tests and claim complete coverage based on that criterion. Mind, that data coverage doesn’t account for flow or sequence coverage; suppose that a bug was triggered only when a 7 replaced a 3 in that field. Since the overall number of possible tests is infinite, test selection criteria are based on models. In practical terms, this means that overall test coverage is some finite number over an infinite number. If you report that accurately, you’re stuck with a number that remains asymptotically close to zero. Instead, focus on the qualitative, and describe your coverage on an ordinal scale. Level 0 means “We know nothing about this area of the product.” Use Level 1 to say “We have done smoke or sanity testing; at this point, we’ve determined whether the product is even stable enough for serious testing.” Level 2 means “we’ve tested the common, the core, the critical, the happy path; our testing has been focused on can it work.” Level 3 means “We’ve tested the harsh, the complex, the challenging, the extreme, the exceptional; if there were a serious problem in this area, we’d probably know about it by now.” In this system, the numbers are barely more than labels for a qualitative evaluation, so don’t be tempted to do serious math with them.
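Returning to the single-digit example above, here is a small sketch in Python. The DigitField class is invented purely for illustration (it stands in for whatever the real product does), with a planted bug that appears only when a 7 replaces a 3. Ten data-coverage checks all pass; the sequence that the selection criterion never chose reveals the problem.

    # An invented toy: a single-digit field with a planted sequence bug.
    # It misbehaves only when a 7 replaces a 3.
    class DigitField:
        def __init__(self):
            self.value = None

        def set(self, digit):
            if self.value == 3 and digit == 7:
                self.value = 1   # the planted bug: 7-after-3 is stored incorrectly
            else:
                self.value = digit

    # "Complete" coverage by the data-selection criterion: each digit, once.
    for d in range(10):
        field = DigitField()
        field.set(d)
        assert field.value == d   # all ten checks pass

    # A sequence that the criterion never selects: enter 3, then replace it with 7.
    field = DigitField()
    field.set(3)
    field.set(7)
    print(field.value)   # prints 1, not 7: the bug that data coverage missed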

Braiding The Stories (Test Reporting Part 2)

Friday, February 24th, 2012

We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay—an early version of one of the Black Box Test Machines. “How many test cases would you need to test this application?” he asked.

Just then Jerry Weinberg wandered into the room. “Ah! Jerry Weinberg!” said James. “One of the greatest testing experts in the world! He’ll know the answer to this one. How many test cases would you need to test this application, Jerry?”

Jerry looked at the screen for a moment. “Three,” he said, firmly and decisively.

James knew to play along. “Three?!” he said, in a feigned combination of amazement, uncertainty, and curiosity. “How do you know it’s three? Is it really three, Jerry?”

“Yes,” said Jerry. “Three.” He paused, and then said drily, “Why? Were you expecting some other number?”

In yesterday’s post, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project. When I point out the mischief of pass vs. fail ratios, some people object. “In the real world,” they say, “we have to report pass vs. fail ratios to our managers, because that’s what they want.” Yet bogus reporting is antithetical to the “real world”. Pass vs. fail ratios come from the fake world, a world where numbers have magical properties to soothe troubled and uncertain souls. Still, there’s no question that managers want something. It’s our mandate to give them something of value.

Some people say that managers want numbers because they want to know that we’re measuring. I’ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond’s splendid paper “Software Engineering Metrics: What Do They Measure and How Do We Know?”: “Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” I think that’s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement. Pass vs. fail ratios fall down badly when they’re subjected to those tests.

Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want: “Measurement is the art and science of making reliable (and significant) observations.” (The main part of the definition comes from Quality Software Management, Vol. 2: First-Order Measurement; the parenthetical comes from recent correspondence over Twitter.) That’s a more general, inclusive definition. It incorporates Kaner and Bond’s notion of quantitative measurement, but it’s more welcoming to qualitative, first-order approaches. First-order measurement, as Jerry describes it, provides answers to questions like “What seems to be happening?” and “What should I do now?” It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It’s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order—less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.

My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information. “Immediate” here doesn’t only mean timely; it also means unmediated, without a bunch of stuff getting in between you and the observation. In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project. That knowledge requires more than abstract data; it requires information. So, as testers, how can we inform the decision-makers? In our Rapid Software Testing class, James Bach and I have lately taken to emphasizing this: We must learn to describe and report on the product, our testing, and the quality of our testing. This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid. Each line, or level, is its own story.

Level 1: Tell the product story. The product story is a qualitative report on how the product can work, how it fails, and how it might fail in ways that matter to our clients. “Working”, “failure”, and “what matters” are all qualitative evaluations. Quality is value to some person; in a business setting, quality is value to some person who matters to the business. A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.

Level 2: To make the product story credible, tell the testing story. The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives warrant to the product story; it helps our clients understand why they should believe and trust the product story we’re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied. Coverage is the extent to which we’ve tested the program; it’s about where we’ve looked and how we’ve looked, and it’s also about what’s uncovered—where we might not have looked yet, and where we don’t intend to look. Oracles are central to evaluation; they’re the principles and mechanisms that allow us to recognize a problem. The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious the problems might be. We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner. The product story and our clients’ reactions to it will influence the ongoing testing story, and vice versa.

Level 3: To make the testing story credible, tell a story about the quality of the testing. Just as the product story needs warrant, so too does the testing story. To tell a story about the quality of testing requires us to describe why the testing we’ve done has been good enough, and why the testing we haven’t done hasn’t been so important so far. The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information. The quality-of-testing story will shape and be shaped by the other two stories.
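If it helps to see the braid as an outline, here is a minimal sketch of a report skeleton in Python. The three level names come from the braid above; the sample bullet points are invented, and a real report would be prose and conversation, not a data structure.

    # A minimal, invented skeleton for a three-level report. The level names
    # come from the braid described above; the sample content is made up.
    report = {
        "The product story": [
            "how the product can work, how it fails, and how it might fail",
            "problems and risks that matter to people who matter to the business",
        ],
        "The testing story": [
            "how we configured, operated, observed, and evaluated the product",
            "coverage obtained (and not yet obtained); oracles applied",
        ],
        "The quality-of-testing story": [
            "why the testing done so far has been good enough",
            "what made testing harder or slower; what we need in order to test better",
        ],
    }

    for level, points in report.items():
        print(level)
        for point in points:
            print(" -", point)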

Develop skills to tell and frame stories. People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are “efficient”. I think they mean “fast”, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail…

Don’t think data feed; think the daily news. Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we’re interested, such that we get the level of detail that we seek. Some of those strategies include:

  • Headlines. A quick glance over each page tells us immediately what, in the editors’ judgement, are the most salient aspects of any given story. Headlines come in different sizes, relative to the editors’ assessment of the importance of the story.
  • Front page. The paper comes folded. The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.
  • Continuation. When an entire story is too long to fit on the front page, it’s abbreviated and the story continues elsewhere. This gives the reader the option of following the story or looking at other items on the front page.
  • Coverage areas. The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth). Each section comes with its own front page, which generally includes headlines and continuations of its own.
  • Structured storytelling. Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the inverted pyramid (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions—who, what, where, why, and when, plus how—and the story builds from there. The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she’s satisfied.
  • Identifying who is involved and who is affected. Reporters and editors contextualize their stories. Just as in testing, people are the most important element of the context. A story is far more compelling when it affects the reader or people that the reader cares about. A good story often helps to clarify why the reader should care.
  • Varying approaches to delivering information. Newspapers often use a picture to help illustrate or emphasize an important aspect of a story. In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories—first-order reports—are always given greater prominence than the tables of stock quotes, league standings, and line scores.
  • Sidebars. Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.
  • Daily (and in the world of the Web, continuous) delivery of information. My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle. The paper’s Web site is updated on a continuous basis. Information is available both on a supply and a demand basis; both when I expect it and when I seek it.
  • Identifiable sources. Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained. This helps to set up degrees of trust and skepticism in the reader.

One important note: These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but also to oral discourse.

I’ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t have them. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be considered good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.
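To make the PEW idea concrete, here is a minimal sketch, in Python, of how a bug-report template might prompt for the three elements and flag a report that lacks one. The class, the field names, and the sample report are my illustration only; nothing here is prescribed by the heuristic itself.

  from dataclasses import dataclass

  @dataclass
  class ProblemReport:
      problem: str  # P: a concise description of the problem
      example: str  # E: something that shows the problem (steps, data, or both, as needed)
      why: str      # W: the oracle; the principle or mechanism by which it looks like a problem

      def missing_elements(self):
          """Return the PEW elements this report still lacks, for review or coaching."""
          return [name for name, value in
                  (("problem", self.problem), ("example", self.example), ("why", self.why))
                  if not value.strip()]

  # Example: a reviewer (or the tester herself) can spot an incomplete report quickly.
  report = ProblemReport(
      problem="Export silently drops records with accented names",
      example="Export customers.csv containing a row for 'Renée'; that row is absent from the output",
      why="Inconsistent with the product's claimed Unicode support (a claims-based oracle)",
  )
  assert report.missing_elements() == []

The point of such a template is not to mechanize judgement, but to make it easy to notice, and to coach, reports that are missing one of the three elements.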

You can argue that a good tester should include the needed information and steps in her bug report, but this raises (at least) two problems:

– The same information may be duplicated across many bugs, and even worse, it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you pose above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seem obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve it with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but do not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out, are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, sometimes it involves behaving like an atypical customer, and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify it, especially for tests involving many short, trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?
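As a rough sketch of the kind of logging I mean (my own illustration; not a specific tool or format that anyone is obliged to use), a lightweight session log that timestamps what the tester actually did and saw can later back up, or correct, the account of the work:

  import logging

  # A minimal, hypothetical session logger: timestamped notes of actions actually
  # performed and observations actually made during a test session.
  logging.basicConfig(
      filename="session.log",
      level=logging.INFO,
      format="%(asctime)s %(message)s",
  )

  def note(action: str, observation: str = "") -> None:
      """Record an action the tester actually performed, and anything observed."""
      logging.info("ACTION: %s | OBSERVED: %s", action, observation)

  # Example entries from one session:
  note("Imported customers.csv with 10,000 rows", "import completed in about 40 seconds")
  note("Sorted the customer grid by Last Name", "rows with accented names sort to the bottom")

A record like this is evidence of what actually happened, rather than a claim about what was supposed to happen.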

And would scripts solve that problem when (for example) you’re recording a bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might be important if your test activity might be transferred to another offshore team someday (this happened to me a few weeks ago; I sent them a test document with only high level details and hoped for the best), or if your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s; we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a minuscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.