Blog Posts for the ‘Management’ Category

Breaking the Test Case Addiction (Part 7)

Monday, June 10th, 2019

Throughout this series, we’ve been looking at an alternative to artifact-based approaches to testing: an activity-based approach.

In the previous post, we looked at a kind of scenario testing, using a one-page sheet to guide a tester through a session of testing. The one-pager replaces explicit, formal, procedural test cases with a theme and a set of test ideas, a set of guidelines, or a checklist. The charter helps to steer the tester to some degree, but the tester maintains agency over her work. She has substantial freedom to make her own choices from one moment to the next.

Frieda, my coaching client, anticipated what her managers would say. In our coaching session, she played the part of her boss. “With test cases,” she said, in character, “I can be sure about what has been tested. Without test cases, how will anyone know what the tester has done?”

A key first step in breaking the test case addiction is acknowledging the client’s concern. I started my reply to “the manager” carefully. “There’s certainly a reasonable basis for that question. It’s important for managers and other clients of testing to know what testing has been done, and how the testers have done it. My first step would be to ask them about those things.”

“How would that work?”, asked Frieda, still in her role. “I can’t be talking to them all the time! With test cases, I know that they’ve followed the test cases, at least. How am I supposed to trust them without test cases?”

“It seems to me that if you don’t trust them, that’s a pretty serious problem on its own—one of the first things to address if you’re a manager. And if you mistrust them, can you really trust them when they tell you that they’ve followed the test cases? And can you trust that they’ve done a good job in terms of the things that the test cases don’t mention?”

“Wait… what things?” asked “the manager” with a confused expression on her face. Frieda played the role well.

“Invisible things. Unwritten things. Most of the written test cases I’ve seen refer only to conditions or factors that can be observed or manipulated; behaviours that can be described or encoded in strings or sentences or numbers or bits. It seems to me that a test case rarely includes the motivation for the test; the intention for it; how to interpret the steps. Test cases don’t usually raise new questions, or encourage testers to look around at the sides of the path.

“Now,” I continued, “some testers deal with that stuff really well. They act on those unspoken, unwritten things as they perform the test. Other testers might follow the test case to the letter — yet not find any bugs. A tester might not even follow the test case at all, and just say that he followed it. Yet that tester might find lots of important bugs.”

“So what am I supposed to do? Watch them every minute of every day?”

“Oh, I don’t think you can do that,” I replied. “Watching everybody all the time isn’t reasonable and it isn’t sustainable. You’ve got plenty of important stuff to do, and besides, if you were watching people all the time, they wouldn’t like it any more than you would. As a manager, you must be able to give a fair degree of freedom and responsibility to your testers. You must be able to extend some degree of trust to them.”

“Why should I trust them? They miss lots of bugs!” Frieda seemed to have had a lot of experience with difficult managers.

“Do you know why they miss bugs?” I asked. “Maybe it’s not because they’re ignoring the test cases. Maybe it’s because they’re following them too closely. When you give someone very specific, formalized instructions and insist that they follow them, that’s what they’ll do. They’ll focus on following the instructions, but not on the overarching testing task, which is learning about the product and finding problems in it.”

“So how should I get them to do that?”, asked “the manager”.

“Don’t turn test cases into the mission. Make their mission learning about the product and finding problems in it.”

“But how can I trust them to do that?”

“Well,” I replied, “let’s look at other people who focus on investigation: journalists; scientific researchers; police detectives. Their jobs are to make discoveries. They don’t follow scripted procedures. No one sees that as a problem. They all work under some degree of supervision—journalists report to editors; researchers in a lab report to senior researchers and to managers; detectives report to their superiors. How do those bosses know what their people are doing?”

“I don’t know. I imagine they check in from time to time. They meet? They talk?”

“Yes. And when they do, they describe the work they’ve done, and provide evidence to back up the description.”

“A lot of the testers I work with aren’t very good at that,” said Frieda, suddenly as herself. “I worry sometimes that I’m not good at that.”

“That’s a good thing to be concerned about. As a tester, I would want to focus on that skill; the skill of telling the story of my testing. And as a manager, I’d want to prepare my testers to tell that story, and train them in how to do it any time they’re asked.”

“What would that be like?”, asked Frieda.

“It varies. It depends a lot on tacit knowledge.”

“Huh?”

“Tacit knowledge is what we know that hasn’t been made explicit—told, or written down, or diagrammed, or mapped out, or explained. It’s stuff that’s inside someone’s head; or it’s physical things that people do that have become second nature, like touch typing; or it’s cultural, social—The Way We Do Things Around Here.

“The profile of a debrief after a testing session varies pretty dramatically depending on a bunch of context factors: where we are in the project, how well the tester knows the product, and how well we know each other.

“Let me take you through one debrief. I’ll set the scene: we’re working on a product—a project management system. Karla is an experienced tester who’s been testing the product for a while. We’ve worked together for a long time too, and I know a lot about how she tests. When I debrief her, there’s a lot that goes unsaid, because I trust her to tell me what I need to know without me having to ask her too much. We both summarize. Here’s how the conversation with Karla might play out.”

Me: (scanning the session sheet) The charter was to look at task updates from the management role. Your notes look fine. How did it go?

Karla: Yeah. It’s not in bad shape. It feels okay, and I’m mostly done with it. There’s at least one concurrency problem, though. When a manager tries to reassign a task to another tester, and that task is open because the assigned tester is updating it, the reassignment doesn’t stick. It’s still assigned to the original tester, not the one the manager assigned. Seems to me that would be pretty rare, but it could happen. I logged that, and I talked about it to Ron.

Me: Anything else?

Karla: Given that bug, we might want to do another session on any kind of update. Maybe part of a session. Ron tells me async stuff in Javascript can be a bear. He’s looking into a way of handling the sequence properly, and he should have a fix by the end of the day. I wouldn’t mind using part of a session to script out some test data for that.

Me: Okay. Want to look at that tomorrow, when you look at the reporting module? And anything else I should know?

Karla: I can get to that stuff in the morning. It’d be cool to make sure the programmers aren’t mucking around in the test environment, though. That was 20 minutes of Setup.

Me: Okay, I’ll tell them to stay out.

“And that’s it,” I said.

“That’s it?”, asked Frieda. “I figured a debrief would be longer than that.”

“Oh, it could be,” I replied. “If the tester is inexperienced or new to me; if the test notes have problems; if the product or feature is new or gnarly; or if the tester found lots of bugs or ran into lots of obstacles, the debrief can take a while longer.

“When I want to co-ordinate testing work for a bunch of people, or when I anticipate that someone might want to scrutinize the work, or when I’m in a regulated environment, I might want to be extra-careful and structure the conversation more formally. I might even want to checklist the debriefing.

“No matter what, though, I have a kind of internal checklist. In broad terms, I’ve got three big questions: How’s the product? How do we know? Why should I trust what we know, and what do we need to get a better handle on things?”

“That sounds like four questions,” Frieda smiled. “But it also sounds like the three-part testing story.”

“Right you are. So when I’m asking focused questions, I’d start with the charter:

  • Did you fulfill your charter? Did you cover everything that the charter was intended to cover?
  • If you didn’t fulfill the charter, what aspects of the charter didn’t get done?
  • What else did you do, even if it was outside the scope of the mission?

“What I’m doing here is trying to figure out whether the charter was met as written, or if we need to adjust it to reflect what really happened. After we’ve established that, I’ll ask questions in three areas that overlap to some degree. I won’t necessarily ask them in any particular order, since each answer will affect my choice of the next question.”

“So a debriefing is an exploratory process too!” said Frieda.

“Absolutely!” I grinned. “I’ll tend to start by asking about the product:

  • How’s the product? What is it supposed to do? Does it do that?
  • How do you know it’s supposed to do that?
  • What did you find out or learn? In particular, what problems did you find?

“I’ll ask about the testing:

  • What happened in the course of the session?
  • What did you cover, and how did you cover it?
  • What product factors did you focus on?
  • What quality criteria were you paying the most attention to?
  • If you saw problems, how did you know that they were problems? What were your oracles?
  • Was there anything important from the charter that you didn’t cover?
  • What testing around this charter do you see as important but not yet done?

“Based on things that come up in response to these questions, I’ll probably have some others:

  • What work products did you develop?
  • What evidence do you have to back the story? What makes it credible?
  • Where can people find that evidence? Should we hang on to it? Why, or why not?
  • What testing activity should, or could, happen next or in the more distant future?
  • What might be necessary to enable that activity?

“That last question is about practical testability.”

“Geez, that’s a lot of questions,” said Frieda.

“I don’t necessarily ask them all every time. I usually don’t have to. I will go through a lot of them when a tester is new to this style of working, or new to me. In those cases, as a manager, I have to take more responsibility for making sure about what was tested—what we know and what we don’t. Plus these kinds of questions—and the answers—help me to figure out whether the tester is learning to be more self-guided.

“And then I’ve got three more on my list:

  • What factors might have affected the quality of the testing?
  • What got in the way, made things harder, made things slower, made the testing less valuable?
  • What ongoing problems are you having?
Frieda frowned. “A lot of the managers I’ve worked with don’t seem to want to know about the problems. They say stuff like, ‘Don’t come to me with problems; come to me with solutions.’”

I laughed. “Yeah, I’ve dealt with those kinds of managers. I usually don’t want to go to them at all. But when I do, I assure them that I’m really stuck and that I need management help to get unstuck. And I’ve often said this: ‘You probably don’t want to hear about problems; no one really does. But I think it would be worse for everyone if you didn’t know about them.’

“And that leads to one more important question:

  • What did you spend your time doing in this session?”

“Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

“Well,” I replied, “there’s testing, and then there’s other work that happens in the session.”

We’ll talk about that next time.
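
(A technical footnote on the bug Karla describes in the debrief above: the “reassignment doesn’t stick” behaviour is a classic lost-update race, in which a stale copy of a record overwrites a newer one. Here is a minimal, hypothetical sketch of how that can happen. The names in it, such as loadTask and saveTask, are invented for illustration and aren’t from any real product discussed in this post.)

    // A hypothetical sketch of the "lost update" race described above.
    // The task store and its API are toys, invented for illustration only.
    const store = new Map([[1, { id: 1, assignee: "tester-A", notes: "" }]]);

    // Toy async "API": reads and writes whole task records.
    const loadTask = async (id) => ({ ...store.get(id) });
    const saveTask = async (task) => { store.set(task.id, { ...task }); };

    async function demo() {
      // The assigned tester opens the task, getting a copy of the whole record.
      const testersCopy = await loadTask(1);

      // Meanwhile, the manager reassigns the task to tester-B and saves.
      const managersCopy = await loadTask(1);
      managersCopy.assignee = "tester-B";
      await saveTask(managersCopy);

      // The tester finishes editing and saves a now-stale copy. Because the
      // save writes the whole record, the stale assignee rides along, and the
      // manager's reassignment is silently lost.
      testersCopy.notes = "updated by tester-A";
      await saveTask(testersCopy);

      console.log(store.get(1));
      // { id: 1, assignee: 'tester-A', notes: 'updated by tester-A' }
    }

    demo();

One common remedy is some form of optimistic concurrency: store a version number on each task record and have the save reject any write whose version no longer matches, so that the later writer has to reload and merge rather than silently overwrite.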

Pressing the Green Button

Wednesday, December 19th, 2018

For years at conferences and meetups and in social media, I have been hearing regularly from testers who tell me that they must “sign off” on the product or deployment before it is released to production, or to review by the client. The testers claim that, after they have performed some degree of testing work, they must “approve” or “reject” the product. Here’s a fairly typical verbatim report from one tester:

In my current context, despite my reasoned explanations to the contrary, I am viewed as the work product gatekeeper and am newly positioned as such within a software controlled workflow that literally presents a green “approve” or red “reject” button for me to select after I “do QC” on the work product which as a bonus might be provided with a list of ambiguous client requests and/or a sketchy mock-up with many scattered revision notes (often superseded by verbal requests not logged).

It’s important to note that in project work, a mess of competing information is normal — not completely desirable, necessarily, but normal. When information about the product is unclear, it’s typically part of the tester’s job to identify where and how it’s unclear. Confusion and uncertainty ratchet up product and project risk.

After all, if you’re not sure about what the product is supposed to do, how can the developers be sure about it? In the unlikely event that the developers know how the product should work and you don’t, how will you recognize all of the important bugs? If you, the developers, or both of you aren’t straight on what the client wants, bugs will have time and opportunity to breed and to survive.

The tester continues:

Delivery of the product to the client for their review is generally held up until I press the green “approve” button. The expectation when I “approve” is that the product (which I did not build) is “error free”, meets contradictory loosely defined “standards” and satisfies the client (whom I have not met). The structure is such that I am not to directly communicate to the client, so all clarifications and questions I have are filtered through a project manager.

I am now beginning to frustrate the developers with whom I have previously built a great rapport by repeatedly rejecting their work products for even a single minor infraction. I also frustrate project managers delaying product delivery and going over budget. I predict there will soon be pressure from all sides to “just approve” and then later repercussions of “how/why did this get approved”. I combat this by providing long lists of observations, potential issues, questions, obstacles and coverage notes with each “rejection” or “approval” and I communicate to project managers that they do not need my approval to proceed and may override at anytime.

Some testers seem happy with the authority implicit in “approving” or “rejecting” the product. Most express some level of discomfort with the decision. To me, the discomfort is appropriate.

In the Rapid Software Testing view of the world, it is not the job of the tester to approve or disapprove of things. It is the job of the tester to identify reasons to believe that some person who matters might approve or disapprove of something, and to explain the bases for that belief. Decisions about what to do with a product, including approving or rejecting it, lie with a role called management.

If you’re in a situation like the tester above, and someone offers you the responsibility to approve and reject products, and you desire to be a manager, this is your big chance! Seize the opportunity—but don’t do it without the manager’s title and authority—and salary, while you’re at it. If you’re offered the approval or rejection decision without becoming a manager, though, I’d recommend that you politely decline the “offer”, and make a counteroffer—perhaps one like this:

“Thank you for honouring me with the offer to approve or reject the product. However, as a tester, I don’t believe that it is appropriate for me to make such decisions without management authority. Here’s what I would do in this tester’s situation, though; I’d say this:

“I will gladly test the product, learning about it through exploration and experimentation. I will evaluate the product for consistency with these (contradictory, loosely-defined) standards. If the product appears to be inconsistent with them, I will certainly let you know about that. If I see inconsistencies or contradictions in those standards, I will let you know about those too, so that you can decide how the standards apply to our product. But I won’t limit my testing to that.

“I will tell you about any important problems that I find. I will tell you about problems that appear inconsistent with things desirable to important people. Here’s an example of how I might categorize desirable things, and here’s an example of how I might recognize problems in the product.

“I will report on anything that appears consistent with some notion of an ‘error’. However, I will not assert that the product is error-free. I don’t know how I could do that. I don’t know how anyone can do that.

“I would prefer to interact freely and directly with stakeholders, for the purposes of obtaining clarifications and answers to questions I have without bothering the project manager. (I will, of course, keep responsible records of my interactions; and I will not presume to make decisions about product or project scope, since that’s a management function.)

“If you would prefer to restrict or mediate my access to stakeholders, that’s OK; I can work that way too. Doing so will likely come with a cost of extra time on the part of the project manager, and the risk of broken-telephone-style miscommunication between the stakeholders and me. However, if you’re prepared to take responsibility for that risk, I’m fine with it too.

“Since I am manager of neither the product, nor the project, nor the developers, I do not have the authority to direct them. However, I am happy to report on everything I know about the product—and the apparent problems and risks in it—to those who do have the required authority and responsibility, and they can make the appropriate decisions based on everything they know about the product and business and its needs.

“I am not a gatekeeper, or owner, or ‘approver’ of the quality of the product. I am not a manager or decision maker. I am a reporter on the status of the product, and of the testing, and of the quality of the testing, and I’ll report accordingly. My “approval” is immaterial; what matters is what managers and the business want. It is they, not I, who decide whether a problem is a showstopper or something we’re prepared to live with. It is they, not I, who decide whether problems are significant enough to extend the schedule or increase the budget for the project.

“It’s my job to contribute information to any decision to approve or reject, but it’s not my job to make that decision. I would like someone else to be responsible for the ‘approve’ or ‘reject’ checkbox as such. However, if the tool that we’re using restricts me to ‘approve’ and ‘reject’, let me tell you what those mean, because what they say is inconsistent with their normal English meanings, and we should all be aware of that.

“Pressing ‘Approve’ means this, and only this: ‘I am not aware of any problem in this area that threatens the value of the product, project, or business to any person that matters.’

“Pressing ‘Reject’ means ‘I am aware of a specific problem or I have some reason to believe that there could be a problem in this area that I have not had the opportunity to identify yet.’ In other words, ‘reject’ means that I see risk; there’s something about the product or about the testing that I believe managers or the programmers should be aware of. ‘Reject’ means no more than that.

“In either case, we should frequently discuss my observations, potential issues, questions, obstacles and coverage notes, to avoid the possibility that I’m overlooking something important, or that I’m over-emphasizing the significance of particular problems.”

How you’re viewed depends on the commitments (another example here) that you make and declare about what you do, what you’re willing to do, and what you’re not willing to do. If your role, your profile and your commitments don’t match, getting them lined up is your most urgent and important job.

Related reading:
Signing Off
When Testers Are Asked For a Ship/No-Ship Opinion
Testers: Get Out of the Quality Assurance Business

How To Get What You Want From Testing (for Managers): The Q & A

Wednesday, November 22nd, 2017

On November 21, 2017, I delivered a webinar as part of the regular Technobility Webinar Series presented by my friend and colleague Peter de Jager. The webinar was called “How To Get What You Want from Testing (for Managers)”, and you can find it here.

Alas, we had only a one-hour slot, and there were plenty of questions afterwards. Each of the questions I received is potentially worthy of a blog post on its own. Here, though, I’ll try to summarize. I’ll give brief replies, suggesting links to where I might have already provided some answers, and offering an IOU for future blog posts if there are further questions or comments.

If the CEO doesn’t appreciate testing and QA enough in order to develop application of high quality how QA Manager can get battle?

First, in my view, testers should think seriously about whether they’re in the quality assurance business at all. We don’t run the business, we don’t manage the product, and we can’t assure quality. It’s our job to shine light on the product we’ve got, so that people who do run the business can decide whether it’s the product they want.

Second, the CEO is also the CQAO—the Chief Quality Assurance Officer. If there’s a disconnect between what you and the CEO believe to be important, at least one of two things is probably true: the CEO knows or values something you don’t, or you know or value something the CEO doesn’t. The knowledge problem may be resolved through communication—primarily conversation. If that doesn’t solve the disconnect, it’s more likely a question of a different set of values. If you don’t share those, your options are to adopt the CEO’s values, or to accept them, or to find a CEO whose values you do share.

How to change managers to focus on the whole (e.g. lead time) instead of testing metrics (when vendor X is coding and vendor Y is testing?

As a tester, it’s not my job to change a manager, as such. I’m not the manager’s manager, and I don’t change people. I might try to steer the manager’s focus to some degree, which I can do by pointing to problems and to risks.

Is there a risk associated with focusing on testing metrics instead of on the whole? My answer, based on experience, is Yes; many managers have trouble observing software and software development. If you also believe that there’s risk associated with metrics fixation, have you made those risks clear to the manager? If your answer to that question is “I’ve tried, but I haven’t been successful,” get in touch with me and perhaps I can be of help.

One way I might be able to help right away is to recommend some reading: “Software Engineering Metrics: What Do They Measure and How Do We Know?” (Kaner and Bond); Measuring and Managing Performance in Organizations, by Robert Austin; and Jerry Weinberg’s Quality Management series, especially Quality Software Management, Vol. 2: First-Order Measurement (also available as two e-books, “How to Observe Software Systems” and “Responding to Significant Software Events”). Those works outline the problems, and suggest some solutions.

An important suggestion related to this is to offer, to agree upon, and to live up to a set of commitments, along with an understanding of what roles involve.

How do you ensure testers put under the microscope the important bugs and ignore the trivial stuff?

My answers here include: training, continuous feedback, and continuous learning; continuous development, review, and refinement of risk models; providing testers with knowledge of and access to stakeholders that matter. Good models of quality criteria, product elements, and test techniques (the Heuristic Test Strategy Model is an example) help a lot, too.

Suggestions for the scalability of regression testing, specifically when developers say “this touches everything.”

When the developer says “this touches everything”, one interpretation is “I’m not sure what this touches” (since “this touches everything” is, at best, rarely true). “I’m not sure what this touches,” in turn, really means “I don’t actually understand what I’m modifying”. That interpretation (like others revealed in the “Testing without Machinery” chapter in Perfect Software and Other Illusions About Testing) points to a Severity 0 product and project risk.

So this is not simply a question about the scalability of regression testing. This is a question about the scalability of architecture, design, development, and programming. These are issues for the whole team—including managers—to address. The management suggestion is to steer things towards getting the developer’s model of the code more clear, refactoring the code and the design until it’s comprehensible.

I’ve done some presentations on regression testing before, and written some blog posts here and here (although, ten years later, we no longer talk about “manual tests” and “automated tests”; we do talk about testing and checking).

How can we avoid a lot of back and forth, e.g. submitting issues piecemeal vs. submitting in a larger set.

One way would be to practice telling a three-part testing story and delivering the news. That said, I’m not sure I have enough context to help you out. Please feel free to leave a comment or otherwise get in touch.

How much of what you teach can expand to testing in other contexts?

Plenty of it, I’d say. In Rapid Software Testing, we take an expansive view of testing; evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, and plenty of other stuff. That’s applicable to lots of contexts. We also teach applied critical thinking—that is, thinking about thinking with the intention of avoiding being fooled. That’s extensible to lots of domains.

I’m curious about the other contexts you might have in mind, and why you ask. If I can be of assistance, please let me know.

I agree 100% with everything you say, and I feel daily meetings are not working, because everybody says what they did and what they will do and there is no talk about the state of the product.

That’s a significant test result—another example of the sort of thing that Jerry Weinberg is referring to in the “Testing without Machinery” chapter in Perfect Software and Other Illusions About Testing. I’m not sure what your role is. As a manager, I would mandate reports on the status of the product as part of the daily conversation. As a tester, I would provide that information to the rest of the team, since that’s my job. What I found about the product, what I did, and what happened when I did it are part of that three-part testing story.

Perhaps test cases are used as they are quantifiable and helps organise the testing of parts of the product.

Sure. Email messages are also quantifiable and they also help to organise the testing of parts of the product. Yet we don’t evaluate a business by the number of email messages it produces. Let’s agree to think all the way through this.

“Writing” is not “factory work” either, but as you say, many, many tools help check spelling, grammar, awkward phrases, etc.

Of course. As I said in a recent blog post, “A good editor uses the spelling checker, while carefully monitoring and systematically distrusting it.” We do not dispute the power and usefulness of tools. We do remain skeptical about what will happen when tools are put in the hands of those who don’t have the skill or wisdom to apply them appropriately.

I appreciate the “testing/checking” distinction, but wish you had applied the labels in the other way, given the rise of ‘the checklist’ as a crucial tool in both piloting and surgery—application of the checklist is NOT algorithmic, when done as recommended.

Indeed. I’m confident, though, that smart people can keep track of the differences between “checking” or “a check” and a “checklist”, just as they can keep track of the differences between “testing” or “a test” and “testimony”.

I would include “breaking” the product (to see if it fails gracefully and to find its limits).

I prefer to say “stressing” the product, and I agree absolutely with the goal to discover its limits, whether it fails gracefully, and what happens when it doesn’t.

“Breaking” is a common metaphor, to be sure. It worries me to some degree because of the public relations issue: “The software was fine until the testers broke it.” “We could ship our wonderful product on time if only the testers would stop breaking it.” “Normal customers wouldn’t have problems with our wonderful product; it’s just that the testers break it.” “There are no systemic management or development problems that have been leading to problems in the product. Nuh-uh. No way. The testers broke it.”

What about problems with the customer/user? Uninformed use or, even more problematical, unexpected use (especially if many users do the unexpected — using Lotus 123 as a word processor, for instance). I would argue you need to “model the human (including his or her context of machine, attempted use, level of understanding, et al.).

And I would not argue with you. I’d agree, especially with respect to the tendency of users to do surprising things.

I think stories about the product should also include items about what the product did right (or less bad than expected), especially items that were NOT the focus of the design (ie, how well does Lotus 123 work as a word processor?).

They could certainly include that to some degree. After all, the first items in our list of what to report in the product story are what it is and what it does, presumably successfully. Those things are important, and it’s worth reporting good news when there’s some available. My observation and experience suggest that reporting the good news is less urgent than noting what the product doesn’t do, or doesn’t do well, relative to what people want it to do, or hope that it does. It’s important, to me, that the good news doesn’t displace the important bad news. We’re not in the quality reassurance business.

So far, you appear to be missing much of the user side– what will they know, what context will they be in, what will they be doing with the product? (A friend was a hero for getting over-sized keyboards for football players to deal with all the typing problems that appeared in the software but were the consequence of huge fingers trying to hit normal-sized keytops.

That’s a great story.

I am really missing any mention of users, especially in an age of sensitivity to disabilities like lack of sight, hearing, manual dexterity, et al. It wasn’t until 11:45 or so that I heard about “use” and “abuse.” Again, all good stuff, but missing a huge part of what I think is the source of risk.

I was mostly talking to project managers and their relationships with testers in this talk (and at that, I went over time). In other talks (and especially when talking to testers), I would underscore the importance of modeling the end user’s desires and actions.

Please in the blog, talk about risk from unexpected use (abuse and unusual uses).

The short blog post that I mentioned above talks about that a bit. Here is a more detailed one. Lately, I’ve been enjoying Gojko Adzic’s book Humans vs. Computers, and can recommend it.

Taking Severity Seriously

Wednesday, January 14th, 2015

There’s a flaw in the way most organizations classify the severity of a bug. Here’s an example from the Elementool Web site (as of 14 January, 2015); I’m sure you’ve seen something like it:

Critical: The bug causes a failure of the complete software system, subsystem or a program within the system.
High: The bug does not cause a failure, but causes the system to produce incorrect, incomplete, inconsistent results or impairs the system usability.
Medium: The bug does not cause a failure, does not impair usability, and does not interfere in the fluent work of the system and programs.
Low: The bug is an aesthetic (sic —MB), is an enhancement (ditto) or is a result of non-conformance to a standard.

These are serious problems, to be sure—and there are problems with the categorizations, too. (For example, non-conformance to a medical device standard can get you publicly reprimanded by the FDA; how is that low severity?) But there’s a more serious problem with models of severity like this: they’re all about the system as though no person used that system. There’s no empathy or emotion here; there’s no impact on people. The descriptions don’t mention the victims of the problem, and they certainly don’t identify consequences for the business. What would happen if we thought of those categories a little differently?

Critical: The bug will cause so much harm or loss that customers will sue us, regulators will launch a probe of our management, newspapers will run a front-page story about us, and comedians will talk about us on late night talk shows. Our company will spend buckets of money on lawyers, public relations, and technical support to try to keep the company afloat. Many capable people will leave voluntarily without even looking for a new job. Lots of people will get laid off. Or, the bug blocks testing such that we could miss problems of this magnitude; go back to the beginning of this paragraph.

High: The bug will cause loss, harm, or deep annoyance and inconvenience to our customers, prompting them to flood the technical support phones, overwhelm the online chat team, return the product demanding their money back, and buy the competitor’s product. And they’ll complain loudly on Twitter. The newspaper story will make it to the front page of the business section, and our product will be used for a gag in Dilbert. Sales will take a hit and revenue will fall. The Technical Support department will hold a grudge against Development and Product Management for years. And our best workers won’t leave right away, but they’ll be sufficiently demoralized to start shopping their résumés around.

Medium: The bug will cause our customers to be frustrated or impatient, and to lose faith in our product such that they won’t necessarily call or write, but they won’t be back for the next version. Most won’t initiate a tweet about us, but they’ll eagerly retweet someone else’s. Or, the bug will annoy the CEO’s daughter, whereupon the CEO will pay an uncomfortable visit to the development group. People won’t leave the company, but they’ll be demotivated and call in sick more often. Tech support will handle an increased number of calls. Meanwhile, the testers will have—with the best of intentions—taken time to investigate and report the bug, such that other, more serious bugs will be missed (see “High” and “Critical” above). And a few months later, some middle manager will ask, uncomprehendingly, “Why didn’t you find that bug?”

Low: The bug is visible; it makes our customers laugh at us because it makes our managers, programmers, and testers look incompetent and sloppy—and it causes our customers to suspect deeper problems. Even people inside the company will tease others about the problem via graffiti in the stalls in the washroom (written with a non-washable Sharpie). Again, the testers will have spent some time on investigation and reporting, and again test coverage will suffer.

Of course, one really great way to avoid many of these kinds of problems is to focus on diligent craftsmanship supported by scrupulous testing. But when it comes to that discussion in that triage meeting, let’s consider the impact on real customers, on the real people in our company, and on our own reputations.

Very Short Blog Posts (21): You Had It Last!

Tuesday, November 4th, 2014

Sometimes testers say to me “My development team (or the support people, or the managers) keep saying that any bugs in the product are the testers’ fault. ‘It’s obvious that any bug in the product is the tester’s responsibility,’ they say, ‘since the tester had the product last.’ How do I answer them?”

Well, you could say that the product’s problems are the responsibility of the tester because the tester had the product last—and that a successful product was successful because the programmers and the business people did such a good job at preventing bugs. But that would be to explain any failures in the product in one way, and to explain any successes in the product in a completely different way.

Instead, let’s be consistent. Testers don’t put the bugs in, and testers miss some of the bugs because bugs are, by their nature, hidden. Moreover, the bugs are hidden so well that not even the people who put them in could find them. The bugs are hidden by people, and by the consequences of how we choose to do software development. So let’s all work to prevent the bugs, and to find them more quickly. Let’s talk about problems in development that allow bugs to hide. Let’s all work on testability, so that we can find bugs earlier, and more easily, before the bugs have a chance to hide deeply. And let’s all share responsibility for our failures and our successes.

Rising Against the Rent-Seekers

Monday, August 25th, 2014

At CAST 2014, a quiet, modest, thoughtful, and very experienced man named James Christie gave a talk called “Standards: Promoting Quality or Restricting Competition?”. The talk followed on from his tutorial at EuroSTAR 2013 on working with auditors—James is a former auditor himself—and from his blogs on software standards over the years.

James’ talk introduced to our community the term rent-seeking. Rent-seeking is the act of using political means—the exercise of power—to obtain wealth without creating wealth; see http://www.econlib.org/library/Enc/RentSeeking.html and http://en.wikipedia.org/wiki/Rent-seeking. One form of rent-seeking is using regulations or standards in order to create or manipulate a market for consulting, training, and certification.

James’ CAST presentation galvanized several people in attendance to respond to ISO Standard 29119, the most recent rent-seeking scheme by a very persistent group of certificationists and standards promoters. Since the ISO standard on standards requires—at least in theory—consensus from industry experts, some people proposed a petition to demonstrate opposition and the absence of consensus amongst skilled testers. I have signed this petition, and I urge you to read it, and, if you agree, to sign it too.

Subsequently, a publication named Professional Tester published—under an anonymous byline—a post about the petition, with the provocative title “Book burners threaten (old) new testing standard”. Presumably such (literally) inflammatory language was meant as clickbait. Ordinarily such things would do little to foster thoughtful discussion about the issues, but it prompted some quite thoughtful reactions. Here’s one example; here’s another. Meanwhile, if the author wishes to characterize me as a book burner, here are (selected) contents of my library relevant to software testing. Even the lamest testing books (and some are mighty lame) have yet to be incinerated.

In the body text, the anonymous author mischaracterises the petition and its proponents, of which I am one. “Their objection,” (s)he says, “is that not everyone will agree with what the standard says: on that criterion nothing would ever be published.” I might not agree with what the standard says, but that’s mostly a side issue for the purposes of this post. I disagree with what the authors of the standard attempt to do with it.

1) To prescribe expensive, time-consuming, and wasteful focus on bloated process models and excessive documentation. My concern here is that organizations and institutions will engage in goal displacement: expending money, time and resources on demonstrating compliance with the standard, rather than on actually testing their products and services. Any kind of work presents opportunity cost; when you’re doing something, most of the time it prevents you from doing something else. Every minute that a tester spends on wasteful documentation is a minute that the tester cannot fulfill the overarching mission of testing: learning about the product, with an emphasis on discovering important problems that threaten value or safety, so that our clients can make informed decisions about problems and risks.

I am not objecting here to documentation, as the calumny from Professional Tester suggests. I am objecting to excessive and wasteful documentation. Ironically, the standard itself provides an example: the current version of ISO 29119-1 runs to 64 pages; 29119-2 has 68 pages; and 29119-3 has 138 pages. If those pages follow the pattern of earlier drafts, or of most other ISO documents, you have a long, pointless, and sleep-inducing read ahead of you. Want a summary model of the testing process? Try this example of what the rent-seekers propose as their model of testing work. Note the model’s similarity to that of an (overly complex and poorly architected) computer program.

2) To set up an unnecessary market for training, certification, and consultancy in interpreting and applying the standard. The primary tactic here is to instill the fear of being de-certified. We’ve been here before, as shown in this post from Tom DeMarco (date uncertain, but it seems to have been written prior to 2000).

Rent-seeking is of the essence, and we’ve been here before in another sense: this was one of the key goals of the promulgators of the ISEB and ISTQB. In the image, they’ve saved the best for last.

The well-informed reader will note that the list of organizations behind those schemes and the members of the ISO 29119 international working group look strikingly similar.

If the working group happens to produce a massive and opaque set of documents, and you’re in an environment that claims conformance to the 29119 standards, and you want to get some actual testing work done, you’ll probably find it helpful to hire a consultant to help you understand them, or to help defend you from charges that you were not following the standard. Maybe you’ll want training and certification in interpreting the standard—services that the authors’ consultancies are primed to offer, with extra credibility because they wrote the standards! Good thing there are no ethical dilemmas around all of this.

3) To use the ISO’s standards development process to help suppress dissent. If you want to be on the international working group, it’s a commitment to six days of non-revenue work, somewhere in the world, twice a year. The ISO/IEC does not pay for travel expenses. Where have international working group meetings been held? According to the http://softwaretestingstandard.org/ Web site, meetings seem to have been held in Seoul, South Korea (2008); Hyderabad, India (2009); Niigata, Japan (2010); Mumbai, India (2011); Seoul, South Korea (2012); and Wellington, New Zealand (2013). Ask yourself these questions:

  • How many independent testers or testing consultants from Europe or North America have that kind of travel budget?

  • What kinds of consultants might be more likely to obtain funding for this kind of travel?

  • Who benefits from the creation of a standard whose opacity demands a consultant to interpret or to certify?

Meanwhile, if you join one of the local working groups, there are two ways that the group arrives at consensus.

  • By reaching broad agreement on the content. (Consensus, by the way, does not mean unanimity—that everyone agrees with the content. It would be closer to say that in a consensus-based decision-making process, everyone agrees that they can live with the content.) But, if you can’t get to that, there’s another strategy.

  • By attrition. If your interest is in promulgating an unwieldy and opaque standard, there will probably be objectors. When there are, wait them out until they get frustrated enough to leave the decision-making process. Alan Richardson describes his experience with ISEB in this way.

In light of that, ask yourself these questions:

  • How many independent consultants have the time and energy to attend local working groups, often during otherwise billable hours?

  • What kinds of consultants might be more likely to support attendance at local working groups?

  • Who benefits from the creation of a standard that needs a consultant to interpret or to certify?

4) To undermine the role of skill in testing, and the reputations of people who discuss and promote it. “The real reason the book burners want to suppress it is that they don’t want there to be any standards at all,” says the polemicist from Professional Tester. I do want there to be standards for widgets and for communication protocols, but not for complex, cognitive, context-sensitive intellectual work. There should be standards for designed things that are intended to work together, but I’m not at all sure there should be mandated standards for how to do design. S/he goes on: “Effective, generic, documented systematic testing processes and methods impact their ability to depict testing as a mystic art and themselves as its gurus.” Far from treating testing as a mystic art, appealing to things like “intuition” and “experience-based techniques”, my community has been trying to get to the heart of testing skills, flexible and responsive coverage reporting, tacit and explicit knowledge, and the premises of the way we do testing. I’ve seen no such effort to dig deeper into these subjects—and to demystify them—from the rent-seekers.

Unlike the anonymous author at Professional Tester, I am willing to stand behind my work, my opinions, and my reputation by signing my name and encouraging comments. Feel free.

—Michael B.

Counting the Wagons

Monday, December 30th, 2013

A member of Linked In asks if “a test case can have multiple scenarios”. The question and the comments (now unreachable via the original link) reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by—waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were there more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.


Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. Like railway wagons, the contents—the cargo, and not the containers—are the really interesting and important parts. And like railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or imply multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases, piecework, units of production, simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • “What Counts”, an article that I wrote for Better Software magazine, on problems with counting things.
  • “Braiding the Stories” and “Delivering the News”, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

Can You Hear The Alarm Bells?

Monday, November 25th, 2013

Many people seem certain about what happened to cause the healthcare.gov fiasco. Stories are starting to trickle out, and eventually there’ll be an ocean of them. To anyone familiar with software development, especially in large organizations, these stories include familiar elements of character and plot. From those, it’s easy to extrapolate and fill in the details based on imagination and experience. We all know what happened.

Well, we don’t. In a project of that size, no one knows what happened. No one can know what happened. Imagine Rashomon scaled up to hundreds of people, each making his own observations and decisions along the way.

As time goes by, I anticipate some people saying that the project will represent a turning point in software development and project management. “Surely,” they will say, “after a project failure of this size and scope, people will finally learn.” Alas, I’m less optimistic. As the first three premises of rapid software testing describe it, software development is a human activity that is surrounded by 1) confusion, 2) complexity, 3) volatility, 4) urgency and… 5) ambition. Increasing ambition causes increases in the other four items too. In our societies, we could help to defend ourselves against future fiascos by restraining our ambitions, but I fear that people will put blindfolds on each other, pass around the keys, and scramble to get back into the driver’s seat of the school bus. How will they do this?

One form of the blindfold is to say “That’s not going to be a problem here because…”

…failure is not an option.
…we have our best people on it.
…we can’t disappoint the client.
…it doesn’t have to be perfect. (thanks, Joe Miller, @lilshieste)
…we’ll fix it in production.
…no user would ever do that.
…the users will figure it out.
…the users will never notice that.
…THAT bug is in someone else’s code.
…we don’t have to fix that; that’s a new feature request.
…it’s working exactly as designed.
…if there’s no test case for it, it’s not a bug.
…the clients will come to their senses before the ship date.
…we have thousands of automated tests that we run on every build.
…this time it will be different.
…we have budget to fix that before we deploy.
…at least the back end is working right.
…if there are performance problems, we’ll just add another few servers.
…we’ve done lots of projects just like this one.
…foreign-language support is something we could cut.
…that list there says that this is a level three threat, not a level one threat.
…the support people can handle whatever problems come up.
…this graph shows that the load will never get that high.
…now is too soon; we’ll tell the clients about the problems after we’ve fixed them.
…we’re thinking positively—that can-do spirit will see us through.
…we still have plenty of time left to fix that.
…the spec didn’t say anything about having to handle special characters. How are single quotes a big deal?
…the client should have thought of that before.
…seriously, that’s just a cosmetic problem.
…it’s important not to complicate things.
…everybody WILL put in some overtime and we WILL get this thing done.
…well, at least the front end looks good, and people will be happy with that.
…everyone here is committed to making sure this ships on time.
…we’ll just shorten the test cycle.
…if there’s a problem, the other/upstream/downstream team will let us know.
…they can take care of that in training.
…we’ve planned to make sure that nothing unexpected happens.
…we’ve got this fantastic new framework that’ll make things go faster.
…we’ll pull a bunch of people off other projects to work on this one.

I wonder whether these things were said, at one time or another, during the healthcare.gov project. I don’t know if they were. I don’t know what happened. I didn’t work on it. But I’ve heard these things on projects before, I know that I can listen for them, and I know that they’re a sign of trouble ahead. Are they being said on your project?

Interview and Interrogation

Friday, September 27th, 2013

In response to my post from a couple of days ago, Gus kindly provides a comment, and I think a discussion of it is worth a blog post on its own.

Michael, I appreciate what you are trying to say but the simile doesn’t really work 100% for me, let me try to explain.

The simile has prompted you to think and to question, so in that sense, it works 100% for me. Triggering thought is, after all, why people use similes. (See also Surfaces and Essences: Analogy as the Fuel and Fire of Thinking.)

I would apply lean principles and cut some waste from your interview process. I will fail the candidate as soon as she gives me the first wrong answer.

I have 5 questions and all have to be answered correctly to hire the person for a junior position (release 1).

Interview candidate A:
Ask question 1 OK
Ask question 2 FAIL
Send candidate A home

Second interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 FAIL
Send candidate A home

Third interview to candidate A:
Ask question 1 OK
Ask question 2 OK
Ask question 3 OK
Ask question 4 OK
Ask question 5 OK

Hire candidate A

All right. You seem to have left out something important in your process here, which I would apply after each step—indeed, after each question and answer: make a conscious decision about your next step. To me, that requires continuous review of your list of questions for relevance, significance, sufficiency, and information value. Interviewing is an exploratory process. A skilled interviewer will respond, in the moment, to what the candidate has said. A skilled interviewer will think less in terms of “pass or fail”, and more in terms of “What am I learning about this candidate? What does the last answer suggest I should ask next? What other information, exclusive of the answer, might I apply to my decision-making process? What else should I be looking for?” When the candidate gets the answer wrong, the skilled interviewer will ask “Was it really wrong? Maybe there are multiple right answers to the same question. Maybe she didn’t understand the question because I asked it in an ambiguous way, and she gave a right answer to an alternative interpretation. Maybe her answer was a question for me, intended to clarify my question.”

I can’t emphasize this enough: like interviewing, testing is about far more than pass or fail. Testing is about exploration, discovery, investigation, and learning, with the goal of imparting what we’ve learned to people who matter. Testing is about trying to understand the product that we’ve got, with the goal of revealing information that helps our clients decide if it’s the product they want. Testing is usually (but not always) focused on finding evident problems, apparent problems and potential problems, not only in our products, but in our ideas about our products. Testing is also about finding problems in our testing, and every one of the “fail” moments above is a point at which I would want to consider a problem with the test. (The “pass” moments are like that too, if I really want to do a great job.)

At the point when candidate A wants to be promoted to a senior position (translate: the next release of the software), I will prepare another 5 questions probing the new skills and responsibilities, and since I have automated the first 5 questions, I can send her a link to a web site where she will have to prove that she hasn’t forgotten about the first 5 before she can even be considered for the new position.

I’d do things slightly differently.

First I would ask “What would prompt me to ask the same questions again? Are those still the most important questions I could ask as she’s heading for her new role? What reason do I have to believe that she might have lost some capability she previously had? Are there other questions related to her old role—not necessarily to her new one—that I should ask that might be more revealing or more significant?” Note that there might be entirely legitimate reasons to believe that she might have backslid somehow—but at that point, I’d also want to ask “What are the conditions that would have allowed her to backslide without me noticing it—and what could I do to minimize those kinds of conditions?”

Then there would be another question I’d ask: “What if she has learned to answer a specific question properly, but is not adaptable to the general case? Should I be asking the same question in a different way, to see if she gives the same answer? Should I be asking a similar question that has a different answer, and see if she notices and handles the difference?”

Now: it might be costly to vary my questions, so I might simply shrug and decide just to go with the ones I’d asked before. But the point of evaluating my process is to ask, “How might I be fooling myself in the belief that I still know this person well?”

Assuming she answers the 5 automated questions correctly, at this point I will do the interview for the senior role.

Interview candidate A for senior role:
Ask question 6 OK
Ask question 7 FAIL
Send candidate A home

and so on.

I don’t see a problem with this process as long as I am allowed to use everything I learn from the feedback with the candidate up to question “N” to adapt and change all the questions greater than “N”.

Up until this point, you haven’t mentioned that, and your description of your process doesn’t make that at all clear. You’ve only mentioned the “pass” and “fail” parts of “everything I learn from the feedback”. Now, you might be taking that into account in your head, but notice how your description, your process model, doesn’t reflect that—so it becomes easy to misinterpret what you actually do. In addition, you’ve focused on adapting and changing all the questions greater than N—but I’d be interested in the possibility of adapting and changing all the questions less than or equal to N, too.

More importantly: qualifying someone for an important job is not about making sure that they can get the right answers on a canned test, just as testing a product is not about making sure that the functions produce expected results for some number of test cases. The specific answers might have some significance, but if I’m serious about hiring the right people for the job, I don’t want to make my decisions solely by putting them in front of a terminal, having them fill out an online form, and checking their answers. I want to evaluate them against a number of criteria: Do they respond quickly, in a polite and friendly way? Do they work well with others? Are they appropriately discreet? Are they adaptable? Can they deal with heavy workloads? Do they learn quickly? In order to learn those things, I need to do more than ask pass-or-fail questions. I need to have unscripted, spontaneous, and free-flowing conversation with them; interview and interaction, and not just interrogation. You see?

Where Does All That Time Go?

Tuesday, October 30th, 2012

It had been a long day, so a few of the fellows from the class agreed to meet at a restaurant downtown. The main courses had been cleared off the table, some beer had been delivered, and we were waiting for dessert. Pedro (not his real name) was complaining, again, about how much time he had to spend doing administrivial tasks—meetings, filling out forms, time sheets, requisitions, and the like. “Everything takes so long. If I want a pad of paper to take notes, I have to fill out a form for it. God help me if I run out of forms!”

“How much time do you spend on this kind of stuff each week?” I asked.

Pedro replied, “An hour a day. Maybe two, some days. Meetings…let’s say an hour and a half, on average.”

Wow, I thought—that’s a pretty good chunk of the week. I had an idea.

“Let’s visualize this,” I said. I took out my trusty Moleskine notebook. I prefer the version with the graph paper in it, for occasions just like this one. I outlined a grid, 20 squares across by two down.

[Figure: Empty Week]

“So you spend, on average, an hour and a half each day on compliance stuff. One-point-five times five, or 7.5 hours a week. Let’s make it eight. Put a C in eight squares.” He did that.

[Figure: Compliance]

“Okay,” I said. “You were griping today about how much time you spend wrestling with your test environments.”

Pedro’s eyes lit up. “Yes!” he said. “That’s the big one. See, it’s mobile stuff. We have a server component and a handset component to what we do, and the server stuff is a real bear.”

“Tell me more.”

“It’s a big deal. We’ve got one environment that models the production system. The software we’re developing has been so buggy that we can’t tell whether a given problem is general, or specific to the handset, so we have another one that we set up to do targeted testing every time we add support for a new handset. That’s the one I work with. Trouble is, setting it up takes ages and it’s really finicky. I have to do everything really carefully. I’ve asked for time to do scripting to automate some of it, but they won’t give that to me, because they’re always in such a rush. So, I do it by hand. It’s buggy, and I make the odd mistake. Either way, when I find out it doesn’t work, I have to troubleshoot it. That means I have to get on instant messaging or the phone to the developers, and figure out what’s wrong; then I have to figure out where to roll back to. And usually that’s right from the start. It wastes hours. And it’s every day.”

“Okay. Show me that on our little table, here. Use an S to represent each hour you spend each day.”

Whereupon Pedro proceeded to fill in squares. Ten of them. Ten more. And then, eight more.

[Figure: Setup]

“Really?!” I said. “28 hours a week divided by five days—that’s more than five hours a day. Seriously?”

“Totally,” said Pedro. “It’s most of the day, every day, honestly. Never mind the tedium. What’s really killing me is that I don’t feel like I’m getting any real testing work done.”

“No kidding. There’s no time for it. There are only four squares left in the week. Plus, something you said earlier today about tons of bugs that aren’t related to setting up?”

“Right. When it comes to the stuff that I’m actually being asked to test, there’s lots of bugs there too. So my ‘testing time’ isn’t really testing. It’s mostly taken up with trying to reproduce and document the bugs.”

“Yes. In session-based test management, that’s bug investigation and reporting—B-time. And it does interrupt test design and execution—T-time—which is what produces actual test coverage, learning about what’s actually going on in the product. So, how much B-time?” He filled in three of the squares with Bs.

[Figure: Bug Investigation and Reporting]

“And T-time?”

He had room left to put in one lonely little T in the lower right corner.

[Figure: Testing Time]

“Wow,” I laughed. “One-fortieth of your whole week is spent in getting actual test coverage. The rest is all overhead. Have you told them how it affects you?”

“I’ve mentioned it,” he said.

“So look at this,” I suggested. “It’s even more clear when we use colour for emphasis.”

[Figure: With Colour]

“Whoa. I never looked at it that way. And then,” he paused. “Then they ask me, ‘Why didn’t you find that bug?'”

“Well,” I said, “considering the illusion they’re probably working under, it’s not an unreasonable question.”

“What do you mean?” Pedro asked.

“What does it say on your business card?”

“‘Software Testing’.”

“And what does it say on the door of the test lab?”

“‘Test Lab’,” said Pedro.

“And they call you…?”

“Pedro.”

“No,” I laughed. “They say you’re a… what?”

“Oh. A tester.”

“So since you’re a tester, and since the door on the test lab says ‘Test Lab’, and your business card says ‘Testing’, they figure that’s all you do. The illusion is what Jerry Weinberg calls the Lumping Problem. All of those different activities—administrative compliance, setup, bug investigation and reporting, and test design and execution—are lumped into a single idea for them.” And I drew it for him.

[Figure: Management's Dream]

“That’s management’s illusion, there. Since, in their imagination, you’ve got forty hours of testing time in a week, it’s not unreasonable for them to wonder why you didn’t find that bug.”

“Hmmm. Right,” said Pedro.

“When in fact, what they’re getting from you is this.” And I drew it for him.

[Figure: Testing Reality]

“For testing—actual interaction with the product, looking for problems, you’ve got one-fortieth of the time they think you’ve got. One lonely little T. Is that part of your test report?”

“Oy,” he said. “Maybe I should show them something like this.”

“Maybe you should,” I said.
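(An aside: you don’t need a Moleskine to draw a week like Pedro’s. Here is a minimal sketch in Python, my own and not anything Pedro’s team actually uses, that lays out the same forty squares and works out how much of the week is actual testing time. The hour counts are the ones from our napkin exercise.)

# A rough sketch of Pedro's forty-hour week, one letter per hour.
# C = compliance/administrivia, S = setup and troubleshooting,
# B = bug investigation and reporting, T = test design and execution.
hours = {"C": 8, "S": 28, "B": 3, "T": 1}

week = "".join(letter * count for letter, count in hours.items())
assert len(week) == 40  # two rows of twenty squares

# Print the grid the way it looked in the notebook.
for row in (week[:20], week[20:]):
    print(" ".join(row))

testing_fraction = hours["T"] / sum(hours.values())
print(f"Actual testing time: {testing_fraction:.1%} of the week")  # 2.5%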

A couple of nights later, I showed that page of my notebook to James Bach over Skype. “Wow,” he said. “That guy could be forty times more productive!”

“Forty?”

“Well, no, not really, of course. But suppose the programmers checked their work a little more carefully, or suppose the testers practiced writing more concise bug reports and sharpened their investigating skill. One of those two things could cut the bug investigation time by a third. That would give more time for testing, when they’re not being interrupted by other stuff. What if they cut the setup time by a half, and that administrivia by half?”

“Four, fourteen…” I said. “That would give eighteen more hours for testing and bug investigation, for a total of 22 hours. And even if they’re still doing two hours of bug investigation for every one hour of testing time… well, that’s seven times more productive, at least.”

“Seven times the test coverage if they get some of those issues worked out, then,” said James.
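(If you want to check the arithmetic we were doing over Skype, here is a rough sketch. The halving of setup and administrivia, and the two-to-one ratio of bug investigation to testing, are just the assumptions James and I were playing with, not measurements from any real project.)

# Back-of-the-napkin check on the "seven times" claim.
compliance, setup, bugs, testing = 8, 28, 3, 1  # Pedro's current week

freed = setup / 2 + compliance / 2                    # 14 + 4 = 18 hours freed up
investigation_and_testing = bugs + testing + freed    # 4 + 18 = 22 hours

# Even at two hours of bug investigation for every hour of testing...
new_testing = investigation_and_testing / 3           # about 7.3 hours of T-time
print(f"Roughly {new_testing / testing:.1f}x the testing time")  # ~7.3x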

“Maybe de-lumping is the kind of thing lots of testers would want to do in their test reports,” I said.

How about you?