Blog Posts for the ‘Management’ Category

“Why Didn’t We Catch This in QA?”

Thursday, August 13th, 2020

My good friend Keith Klain recently posted this on LinkedIn:

“Why didn’t we catch this in QA” might possibly be the most psychologically terrorizing and dysfunctional software testing culture an organization can have. I’ve seen it literally destroy good people and careers. It flies in the face of systems thinking, complexity of failure, risk management, and just about everything we know about the psychology involved in testing, but the bully and blame culture in IT refuses to let it die…”

There’s a lot to unpack here. Let’s start with this: what is “QA”?

If “QA” is quality assurance, then it’s important to figure out who, or what, assures quality—value to some person(s) who matter(s).

Confusion abounds when “QA” is used as a misnomer for testing. Testing is not quality assurance, though it can inform quality assurance. Testing does not assure quality, any more than diagnosis assures good health.

In terms of health, there’s no question that we want good diagnoses so that we can become aware of particular pathologies or diseases. If we’re in poor health, and we’re not aware of it, and diagnosis doesn’t catch it, it’s reasonable to ask why not, so that we can improve the quality of diagnosis. The unreasonableness starts when someone foolishly believes that diagnosis is infallible, or that it assures good health, or that it prevents disease—like believing that lab technicians and epidemiologists are responsible for COVID-19, or for its spread.

Once again, it is high time that we dropped the idea that testing is quality assurance. Who perpetuates this? Everyone, so it seems, and it’s not a new problem. At very least, it would be a great idea if testers stopped using the label to describe themselves. As long as testers persist in calling themselves “QA”, the pandemic of ignorance and blame will continue.

What, or who, does assure quality, then?

In one sense, everyone who performs work has agency or authority over it, which includes an implicit responsibility to assure its quality, just as everyone is responsible for maintaining the health of his or her mind and body. Assuring the quality of our work is a matter of craft; self-awareness; diligence; discipline; professionalism; and duty of care towards ourselves, our clients and our social groups. If we’re adults, no one else is responsible for washing our hands.

In everyday life, we make choices about lifestyle, diet, and hygiene that influence our health and safety. As adults, those choices, whether wise or reckless, are our responsibility. At work, our agency affords freedom and responsibility to push back or ask for help when we’re pressed to do work in a way that might compromise our own sense of quality. And our agency enables us to leave any situation in which we are required to behave in ways that we consider unprofessional or unethical.

Part of maintaining personal health is maintaining awareness of it. That means asking ourselves how we feel, and soliciting the help of others who can sometimes help us become aware of things that we don’t see, like personal trainers, doctors, or counsellors. Similarly, assuring quality in our work involves evaluating it—often with the help of other people—to become aware of its state, and in particular, its limitations and problems.

Other people might help us, but as authors of our own work, we are responsible for making those evaluations, and we are responsible for what we do based on those evaluations. Choices that bear on our health, or on the quality of our work, are ours to make.

So, in this sense, “why didn’t we catch this in QA?” would mean “why did we not assure the quality of our own work?” And at the centre of that “we” is “I”.

In another sense, responsibility for the quality of work and workplace resides in the management role. While we’re responsible for washing our hands, management is responsible for providing an environment where handwashing is possible—and for ensuring that people aren’t pushed into conditions where they’re endangering themselves, each other, or the business.

Insofar as management engages people to do work and make products, management is responsible for determining what constitutes quality work, and deciding whether the product has met its goals. Management decides whether the product it’s got is the product it wants—and the product it wants to ship. Management can ask testers to learn about the product on management’s behalf, but management is ultimately responsible for assuming the risk of unknown problems in the product.

Management is responsible for setting the course; for co-ordinating people; for marshaling resources; for setting policy; for providing help when it’s needed; for listening and responding and acting appropriately when people are pushing back. While testers help management to become aware of the status of the product, management is responsible for evaluating the quality of the work and the workplace, and for deciding (based on information from everyone, not only testers) whether the work is ready for the outside world.

Management assures quality by creating the conditions that make it possible for people to assure the quality of their own work. And management fails to assure quality when it sets up conditions that make quality assurance impossible, or that undermine it. In that case, “why didn’t we catch this in QA?” would mean “why didn’t management assure the quality of the work for which it is responsible?”

When people get sick, it’s reasonable to ask how people got sick. It’s reasonable to ask what they might need and what they might do to take better care of themselves. It’s also reasonable to ask if government is providing sufficient support for individual health, public health, and public health workers. It’s even reasonable to ask how better epidemiology and diagnosis could help to sound the alarm when people and populations aren’t healthy. It’s not reasonable to put responsibility for personal or public health on the epidemiologists and diagnosticians and lab techs.

So “Why didn’t we catch this in QA?” is a fine question to ask when it means “Why did we not assure the quality of our own work?” or “Why didn’t management assure the quality of the work for which it is responsible?” But don’t mistake testing for quality assurance, and don’t mistake the question for “Why didn’t testers assure the quality of the product?” And if you’re a tester, and being asked the latter question, reframe it to refer to the previous two.

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, runs November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register, here.

Breaking the Test Case Addiction (Part 10)

Monday, June 8th, 2020

This post serves two purposes. It is yet another installment in The Series That Ate My Blog; and it’s a kind of personal exploration of work in progress on the Rapid Software Testing Guide to Test Reporting. Your feedback and questions on this post will help to inform the second project, so I welcome your comments.

As a tester, your mission is to evaluate the product and report on its status, typically with a special emphasis on finding problems that matter. We’ve discussed bug reporting in the Rapid Testing Guide to Making Good Bug Reports. In this installment of Breaking the Test Case Addiction, I’m describing test reporting as something that responsible testers do.

Sounds straightforward, right? But right away, I want to address the risk of misunderstanding, so let me clear up what I mean by certain terms here.

Responsible Testers
Responsible testers are people who assume the role of tester on a project, and who commit themselves to doing that job well over time. Supporting testers (which we used to call “helpers”) help the test effort temporarily or intermittently, but are not committed to the testing role. Supporting testers are generally not required to report on their testing work to the same degree as responsible testers are.

Test Project
In this post, when I say test project, I’m referring to any set of activities focused on testing of any product or service, or any part of it: a low-level unit, a function, a component, a feature, a story, a service, an entire system… A test project can contain lots of little test projects. Accordingly, depending on the level of granularity we’re referring to, a test project might happen over moments or minutes, days, weeks, or months. A report on a test project might cover similar spans of time—instants, episodes, sprints, releases…

“Test project” here could refer to something that happens outside of development. More typically, it refers to testing activity that happens inside a development project, in parallel with the other aspects of development, like design, programming, or other testing.

Product
When I say product here, I mean anything that anyone has produced that might be subject to testing. While that includes running code, “product” could include code that is not running yet; prototypes and mockups; specifications and other requirement documents; flowcharts, diagrams, or state models; user documentation; sales and marketing material; or ideas about any of those things. When we refer to testing activity pointed at things that are static, like most of the items in the preceding list, we usually call it “review”; we might also call it “performing a thought experiment”. Review is a kind of testing activity that may be closely or distantly associated with performing a test—which brings us to what we mean by “testing”.

Testing, Test Activities, and Review
When I say testing here, I am using the Rapid Software Testing definition. To us, testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting.

Testing includes many activities: questioning, studying, modeling, operating the product, manipulating it, making inferences, analyzing risk, thinking critically, recording the process, reporting on it, etc. Testing activities also include investigating and analyzing bugs and suspicious behaviour. Testing typically includes applying tools to help with any testing activities.

A test is an instance of testing, and to perform a test means to explore, experiment with, and gain experience of a product. In general, to perform a test implies that we will operate and observe a product or its output by some means.

In review, operation of the product as such typically isn’t available. In review, though, we engage in other testing activities as mentioned above. We can’t perform experiments on the running product but, as I mentioned above, we might perform thought experiments on it, imagining interactions between the product and the people using it. Of course, a thought experiment isn’t the same as a real-world experiment; that’s a key difference between review and performing a test.

Why go on about all this? Because reporting is central to our role as testers. We test; we learn; and we report on what we’ve learned.

Are you doing testing work of any kind, or even thinking about doing testing? Then you’ve got a test project on the go, and you can report on its status, even if your report starts with “I haven’t started testing the product yet, but here are some ideas about how we might go about it.”

Report
Next, let’s unpack the idea of a report. A report is a description, explanation, or justification of something. A report is a communication, but a report is not necessarily a document.

Communicating a report might happen as conversation in a hallway, or beside a coffee machine or a water cooler; as a couple of sentences uttered at a stand-up meeting; as a quick mention of a bug in passing to a developer; as a lengthy description of the status of the product and the status of testing at a go-live meeting. A report might be conveyed in writing as a paragraph, a page, or several pages of text; as (heaven help us) a PowerPoint presentation; or as hundreds of pages in bound books, formally presented to a government or regulatory body.

We might include or refer to artifacts collected or produced during the activity that led to the report—the reporter’s raw notes, data sets, program code, design notes for the activity itself. A report might be supplemented with illustrations, charts, graphs, or diagrams, sketched on a whiteboard or formally rendered on glossy paper. Or a report might be accompanied by photographs, audio, video, mind maps, tables, and references to other artifacts.

Test Report
A test report is any description, explanation, or justification of the status of a test project.

A comprehensive test report is all of those things together.

A professional test report is one that is competently, thoughtfully, and ethically designed to serve your clients in their context. A professional test report need not be a comprehensive test report, nor vice versa.

Some might say that a test report is “just the facts”, but it isn’t; it cannot be. A test report is based on facts, but it’s a story about facts—a story framed for the person or people receiving it. Stories always emphasize some things and leave other things out. We never have all the facts, and facts are sometimes in dispute. Stories are always, to some degree, biased by the storyteller and focused by what the storyteller wants the audience to hear, to learn, and to know. Those biases can be seen as problems in the report, features of it, or both.

The audience for your test report might include insiders who are directly involved in the testing and development work; other insiders (who might be overseeing that work, or affected by it without being directly involved); or outsiders.

For now, I’m going to assume your audience is in the first two categories. On that basis, it helps to consider what the audience for a test report probably wants to know above all else.

  • They almost certainly don’t want to know about test case counts (although they might think they do).
  • They almost certainly don’t want to know about pass-fail ratios (although they might think they do).
  • They almost certainly don’t want to know when the testing is going to be done (although they might think they do).

(I realize that these claims may sound strange to you. I will address these (non-)desires in a future post.)

Having been a program manager and a developer, and having worked with lots of both, I can tell you what those people almost certainly do want to know:

What is the actual status of the product? Are there problems that threaten the value of the product? Do these problems threaten the on-time, successful completion of our work?

A test report addresses those questions.

Three Aspects of Test Reporting
A good test report braids three strands of story together:

  • a story about the product and its status: what the product is, what it does, how it works, how it doesn’t work, and how it might not work in ways that matter to our various clients. This is a story about bugs, problems, and risks in the product.
  • a story about how the testing was done—how the product story was obtained; how we configured, operated, observed, and evaluated the product. A thread in this second strand of the testing story involves describing the ways in which we recognized problems: our oracles. Another thread in this strand involves where we looked for problems: our coverage. Yet another thread includes what we haven’t covered yet, or won’t cover at all unless something changes.
  • a story about the quality of the testing work—why the testing that was done can be trusted, or, to the degree that it is untrustworthy, issues that present obstacles to the fastest, least expensive, most powerful testing we can do. In this strand, we also identify what we might need or recommend to make the testing better, and we may also provide a context for and an evaluation of the quality of the report itself.

Most of the time, the client of the testing will be most interested in that first strand. Sometimes the client might be more interested in one of the other two. Nonetheless, whatever form the report might take, the reporter should at least be prepared to address all three strands.

(I’ve written more about this pattern here, here, and here.)
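If it helps to see those three strands as a concrete structure, here is a minimal sketch in Python. It is an illustration only, not a prescribed format from Rapid Software Testing; the field names and the sample content are my own assumptions, and a real report might be a conversation, a page of prose, or a whiteboard rather than a data structure.

```python
# A minimal sketch of the three-strand test report structure described above.
# The field names and sample content are illustrative assumptions, not a
# prescribed format.

from dataclasses import dataclass, field
from typing import List


@dataclass
class TestReport:
    # Strand 1: the product story -- problems and risks that matter.
    product_story: List[str] = field(default_factory=list)
    # Strand 2: the testing story -- coverage, oracles, what's not covered yet.
    testing_story: List[str] = field(default_factory=list)
    # Strand 3: the quality-of-testing story -- obstacles, needs, caveats.
    testing_quality_story: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Braid the three strands into a plain-text report."""
        sections = [
            ("How's the product?", self.product_story),
            ("How was it tested?", self.testing_story),
            ("How good was the testing?", self.testing_quality_story),
        ]
        lines = []
        for heading, items in sections:
            lines.append(heading)
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


if __name__ == "__main__":
    report = TestReport(
        product_story=["Reassigning an open task silently fails (concurrency problem)."],
        testing_story=["Covered task updates in the manager role; reporting module not yet covered."],
        testing_quality_story=["Programmers working in the test environment cost about 20 minutes of setup."],
    )
    print(report.render())
```

Rendering it produces a three-part summary that answers, in order: how's the product, how do we know, and how trustworthy is the knowing.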

Credibility
If you’re not credible, your reports won’t be taken seriously. In your reporting, you may be delivering surprising or uncomfortable information. Your clients, unconsciously or deliberately, may assume that you’re mistaken or that you’re exaggerating risks, and they may try to micro-manage your reporting. Credibility is an antidote to all this.

To build and maintain credibility, it’s important to actually care about the project and the people on it. It’s important to take your work and your skills seriously, and to demonstrate that seriousness in your attitude, commitments, and behaviour. There will be more to say about this later, but for now…

  • Actually know how to do your job.
  • Gain experience with the product.
  • Study the technology in and around your project.
  • Read all of the relevant requirement, specification, and standards documents carefully, especially when you’re in a regulated environment.
  • Take notes diligently on your own work to inform your reporting.
  • Sweat the details in your own work.
  • Find things to appreciate about the work of others.
  • Acknowledge mistakes, correct them and learn from them.
  • Do not tell lies or exaggerate.

Examples
Note that Part 7 of this series included a number of test reports delivered verbally. Here I’m providing examples of test report documents.

As you survey them, you might want to consider the context for which they’re intended; the reporting levels that they focus on (product, testing, or quality-of-testing); the evidence or references included to support the report; and what the report might need or could leave out.

Note that while a couple of reports refer to specific things to be checked, there is rarely even a mention of test cases. The focus, instead, is usually on bugs or potential problems in the product that represent risk to the value of the product, and therefore risk to the business.

Spot Check Test Report

Click to access mpim-report.pdf

Here is an example of a real, comprehensive, professional test report, prepared by James Bach and edited by me. Over five pages, it describes a paired exploratory testing session that found problems in a real medical device. (The names, nouns and verbs have been changed to shield the identity of the company and the product.)

Cheese Grater Incident Report

Click to access cheesegrater.pdf

This is two reports in one: a whimsical yet serious report on repairing a broken Parmesan cheese dispenser; and a much longer, detailed set of notes on how to perform an investigation and report on it. Indeed, the latter section is a really worthwhile complement to this blog post.

OEW Case Tool

Click to access OEWCaseToolReport.pdf

An example of a two-page summary report (from 1994!) about a computer-aided software engineering (CASE) tool at Borland.

Y2K Compliance Report

Click to access Y2KComplianceReport.pdf

An eight-page report prepared for compliance with Y2K requirements, including notes on strategy; the test approaches that were applied (and risks that prompted those approaches); the results; and a list of specific items that needed to be checked.

OWL Quality Plan

Click to access OWLQualityPlan.pdf

This is a report on proposed plans for testing another Borland product, the Object Windows Library. The report includes a table linking product risks to testing work necessary to investigate those risks. It also includes a listing of components and sub-components in the product.

An Exploratory Tester’s Notebook

Click to access etnotebook.pdf

This paper on recording and reporting includes a report on my spontaneous investigation of an in-flight entertainment system, and a couple of session-based test management session sheets.

A Sticky Situation

Click to access 2012-02-AStickySituation.pdf

This is an example of a form of reporting that’s sometimes called an “information radiator”. It visualizes the status of a test project (and some degree of test coverage) using sticky notes.

The Low-Tech Testing Dashboard

Click to access dashboard.pdf

Of this, James Bach says, “Back in 1997, I was challenged by top management to create a way to convey testing status at a glance. Thus was born the “low-tech testing dashboard” which has since been rendered in various electronic, distributed forms. The important thing about the dashboard is that there are no “measurements.” We don’t count anything. Instead there are assessments. These are subjective, yes, but always grounded in evidence.”

Who Killed My Battery?

Click to access boneh-www2012.pdf

A splendid research paper on what drains mobile phone batteries… and why. Also a presentation on YouTube: https://www.youtube.com/watch?v=_uv057DP2Vs

Once again, these reports don’t focus on test cases, but on testing. They’re examples of powerful and reasonable test reports that offer an alternative to management that is fixated on test cases.

Managers are more likely to relax their obsession with test cases when we provide them with reports that tell the product and testing stories.

Two more posts to go. Next!

Breaking the Test Case Addiction (Part 8)

Monday, December 9th, 2019

Throughout this series, we’ve been looking at an alternative to artifact-based approaches to performing and accounting for testing: an activity-based approach.

Frieda, my coaching client, and I had been discussing how to manage testing without dependence on formalized, scripted, procedural test cases. Part of any approach to making work accountable is communication between a manager or test lead and the person who has done the work. In session-based test management, one part of this communication is a conversation that we call a debrief, and that’s what we talked about last time.

One of the important elements of a debrief is accounting for the time spent on the work. And that’s why one of the most important questions in the debrief is What did you spend your time doing in this session?

“Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

“Well,” I replied, “there’s testing, and then there’s other work that happens in the session. And there are pretty much inevitably interruptions of some kind.”

“For sure,” Frieda agreed. “I’m getting interrupted every day, all the time: instant messages, phone calls, other testers asking me for help, programmers claiming they can’t reproduce the bug on their machines…”

“Interruptions are a Thing, for sure,” I said. “Let’s talk about those in a bit. First, though, let’s consider what you’d be doing during a testing session in which you weren’t interrupted. Setting the interruptions aside for a moment: what would you be doing?”

“Testing. Performing tests. Looking for bugs,” said Frieda.

“Right. Can you go deeper? More specific?”

“OK. I’d be learning about the product, exercising test conditions, increasing test coverage. I’d be keeping notes. If I were making a mind map, I’d be adding to it, filling in the empty areas where I hadn’t been before. Each bit of testing I performed would add to coverage.”

“‘Each bit of testing,'”, I repeated. “All right; let’s imagine that you set up a 90-minute session where you could be uninterrupted. Lock the office door…”

“…the one that I don’t have…”, Frieda said.

“Natch. It’s cubicle-land where you work. But let’s say you put up a sign that said “Do not disturb! Testing is in Session!” Set the phone to Send Calls, shut off Slack and Skype and iMessage and what-all… In that session, let’s just say that you could do a bunch of two-minute tests, and with each one of those tests, you could learn something specific about the product.”

“That’s not how testing really works! That sounds like… test cases!” Frieda said.

“I know,” I grinned. “You’re right. I agree. But let’s suspend that objection for a bit while we work through this. Imagine that 90-minute session rendered as a nine-by-five table of 45 little microbursts of test activity. The kind of manager that you’ve been role-playing here thinks this will happen.”

A Manager's Fantasy of an Ideal Test Session

Frieda chuckled. “Manager’s Fantasy Edition. That’s about right.”

“Indeed,” I said. “But why?”

“Well, obviously, when I’m testing, I find bugs. When I do, I start investigating. I start figuring out how to reproduce the bug, so I can write it up. And then I write it up.”

“Right,” I said. “But even though it’s part of testing, it’s got a different flavour than the learning-focused stuff, doesn’t it?”

“Definitely,” said Frieda. “When I find a bug, I’m not covering new territory. It’s like I’m not adding to the map I’m making of the product. It’s more like I’m staying in the same place for a while while I investigate.”

“Is that a good thing to do?”

“Well…, yes,” Frieda replied. “Obviously. Investigating bugs is a big part of my job.”

“Right. And it takes time. How much?”

“Well,” Frieda began, “A lot of the time I repeat the test to make sure I’m really seeing a bug. Then I try to find out how to reproduce it reliably, in some minimum set of steps, or with some particular data. Sometimes I try some variations to see if I can find other problems around that problem. Then I’ve got to turn all that into a bug report, and log it in the tracking system. Even if I don’t write it up formally, I have to talk to the developer about it.”

“So, quite a bit of time,” I said.

“Yep,” she said. “And another thing: some bugs block me and prevent me from getting to part of the product I want to test. Trying to work around the blockers takes time too. So… like I said, while I’m doing all those things, I’m not covering new ground. It’s like being stuck in the mud on a flooded road.”

“If I were your manager, and if I were concerned about your productivity, I’d want to know about stuff like that,” I said. “That’s why, in session-based test management, we keep track of several kinds of testing time. Let’s start with two: test design and execution, in which we’re performing tests, learning about the product, gaining a better understanding of it. Of course, our focus is on activity that will either find a bug, or help us to find a bug. We call that T-time, for short, and distinguish it from bug investigation and reporting—B-time—which includes the stuff that you were just talking about. The key thing is that B-time interrupts T-time.”

Frieda’s brow furrowed. “Or, to put it another way, investigating bugs reduces test coverage.”

“Yes. And when it does, it’s important for managers to know about it. As a manager, I don’t want to be fooled about coverage—that is, how much of the product we’ve examined with respect to some model.

“You start a session with a charter that’s intended to cover something we want to know about. In a 90-minute session, it’s one thing if a tester spends 80 minutes covering some product area with testing and only ten minutes investigating bugs. It’s a completely different thing if the tester spends 80 minutes investigating bugs, and only ten minutes on tests that produced new coverage. If you only spend ten percent of the time addressing the charter, and the rest on investigating a bug that you’ve found, I’d hope you’d report that you hadn’t accomplished your charter.”

“Wait… what if I were nervous about that?” Frieda asked. “Doesn’t it look bad if I haven’t achieved the goal for the session?”

“Not necessarily,” I replied. “We can have the best of intentions and aspirations for a session before it starts. But the product is what it is, and whatever happens, happens. Whatever the charter suggests, there’s an overarching mission for every session: investigate the product and report on the problems in it. If you’re having to report lots of bugs because they’re there, and you’re doing it efficiently, that shouldn’t be held against you. Testers don’t put the bugs in. If there are problems to report, that takes time, and that’s going to reduce coverage time. If you’re finding and investigating a lot of bugs, there’s no shame in not covering what we might hope you’d cover. Plus, bug investigation helps the developers to understand what they’re dealing with, so that’s a service to the team.”

Frieda looked concerned. “Not very many managers I’ve worked with would understand that. They’d just say, ‘Finish the test cases!’ and be done with it.”

“That can be an issue, for sure. But a key part of testing work these days is to help managers to learn how to become good clients for testing. That sometimes means spelling out certain things explicitly. For instance: if you find a ton of bugs during a session, that’s bad enough, in that you’ve got a lot less than a session’s worth of test coverage. But there’s something that might be even worse on top of that: you have found only the shallowest bugs. By definition, the bugs you’ve found already were the easiest bugs to find. A swarm of shallow bugs is often associated with an infestation of deeper bugs.”

“So, in that situation, I’m going to need a few more sessions to obtain the coverage we intended to achieve with the first one,” said Frieda.

“Right. And if you’re concerned about risk, you may want to charter more, deeper testing sessions, because—again, by definition—deeper bugs are harder to find.”

Frieda paused. “You said there were several kinds of testing time. You mentioned T-time and B-time. That’s only two.”

“Yes. At very least, there’s also Setup time, S-time. While you’re setting up for a test, you aren’t obtaining coverage, and you’re not investigating or reporting a bug. Actually, setting up is only one thing covered by our notion of “Setup”. S-time is a kind of catch-all for time within the session in which you couldn’t have found a bug. Maybe you’re configuring the product or some tool; maybe you’re resetting the system after a problem; maybe you’re tidying up your notes.”

“Or reading about the product? Or talking with somebody about it?”, Frieda asked.

“Right. Anything that’s necessary to get the work done, but that isn’t T-time or B-time. So instead of that Manager’s Fantasy Version of the session, a real session often looks like this:”

A More Plausible Test Session

“Or even this.”

A Common Test Session

“Wow,” said Frieda. “I mean, that second one is totally realistic to me. And look at how little gets covered, and how much doesn’t get covered.”

“Yeah. When we visualize it like this, it makes an impression, doesn’t it? Trouble is, not very many testers help managers connect those dots. As you said, if you want to achieve the coverage that the manager hoped for in the Fantasy Edition, this helps to show that you’ll need something like four sessions to get it, not just one. Plus the bugs that you’ve found in that one session are by definition the shallowest bugs, the ones closest to the surface. Hidden, rare, subtle, intermittent, emergent bugs… they’re deeper.”

Frieda still had a few more questions, which we’ll get to next time.
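To make the arithmetic behind those session pictures concrete, here is a minimal sketch, with invented numbers, of how a session's time might be tallied into T-time, B-time, and S-time, and what that implies for coverage. The categories come from session-based test management as described above; the function and the example session are assumptions of mine, not part of any tool.

```python
# A minimal sketch of tallying session time into T-time (test design and
# execution), B-time (bug investigation and reporting), and S-time (setup).
# The example numbers are invented for illustration.

from collections import Counter
from typing import Iterable, Tuple


def summarize_session(chunks: Iterable[Tuple[str, int]]) -> None:
    """chunks: (category, minutes) pairs, where category is 'T', 'B', or 'S'."""
    totals = Counter()
    for category, minutes in chunks:
        totals[category] += minutes
    session_length = sum(totals.values())
    for category, label in (("T", "T-time"), ("B", "B-time"), ("S", "S-time")):
        minutes = totals.get(category, 0)
        print(f"{label}: {minutes:3d} min ({minutes / session_length:4.0%})")
    # Only T-time produces new coverage; B-time and S-time interrupt it.
    coverage_fraction = totals.get("T", 0) / session_length
    if coverage_fraction:
        print(f"Roughly {round(1 / coverage_fraction)} sessions like this one "
              f"would be needed for one full session's worth of coverage.")


# A 90-minute session that looks a lot like the "common" picture above:
summarize_session([("S", 20), ("T", 10), ("B", 35), ("T", 15), ("B", 10)])
```

Run on that example, the tally shows that only about a quarter of the session produced new coverage, which is roughly where the "four sessions, not one" observation above comes from.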

Breaking the Test Case Addiction (Part 7)

Monday, June 10th, 2019

Throughout this series, we’ve been looking at an alternative to artifact-based approaches to testing: an activity-based approach.

In the previous post, we looked at a kind of scenario testing, using a one-page sheet to guide a tester through a session of testing. The one-pager replaces explicit, formal, procedural test cases with a theme and a set of test ideas, a set of guidelines, or a checklist. The charter helps to steer the tester to some degree, but the tester maintains agency over her work. She has substantial freedom to make her own choices from one moment to the next.

Frieda, my coaching client, anticipated what her managers would say. In our coaching session, she played the part of her boss. “With test cases,” she said, in character, “I can be sure about what has been tested. Without test cases, how will anyone know what the tester has done?”

A key first step in breaking the test case addiction is acknowledging the client’s concern. I started my reply to “the manager” carefully. “There’s certainly a reasonable basis for that question. It’s important for managers and other clients of testing to know what testing has been done, and how the testers have done it. My first step would be to ask them about those things.”

“How would that work?”, asked Frieda, still in her role. “I can’t be talking to them all the time! With test cases, I know that they’ve followed the test cases, at least. How am I supposed to trust them without test cases?”

“It seems to me that if you don’t trust them, that’s a pretty serious problem on its own—one of the first things to address if you’re a manager. And if you mistrust them, can you really trust them when they tell you that they’ve followed the test cases? And can you trust that they’ve done a good job in terms of the things that the test cases don’t mention?”

“Wait… what things?” asked “the manager” with a confused expression on her face. Frieda played the role well.

“Invisible things. Unwritten things. Most of the written test cases I’ve seen refer only to conditions or factors that can be observed or manipulated; behaviours that can be described or encoded in strings or sentences or numbers or bits. It seems to me that a test case rarely includes the motivation for the test; the intention for it; how to interpret the steps. Test cases don’t usually raise new questions, or encourage testers to look around at the sides of the path.

“Now,” I continued, “some testers deal with that stuff really well. They act on those unspoken, unwritten things as they perform the test. Other testers might follow the test case to the letter — yet not find any bugs. A tester might not even follow the test case at all, and just say that he followed it. Yet that tester might find lots of important bugs.”

“So what am I supposed to do? Watch them every minute of every day?”

“Oh, I don’t think you can do that,” I replied. “Watching everybody all the time isn’t reasonable and it isn’t sustainable. You’ve got plenty of important stuff to do, and besides, if you were watching people all the time, they wouldn’t like it any more than you would. As a manager, you must be able to give a fair degree of freedom and responsibility to your testers. You must be able to extend some degree of trust to them.”

“Why should I trust them? They miss lots of bugs!” Frieda seemed to have had a lot of experience with difficult managers.

“Do you know why they miss bugs?” I asked. “Maybe it’s not because they’re ignoring the test cases. Maybe it’s because they’re following them too closely. When you give someone very specific, formalized instructions and insist that they follow them, that’s what they’ll do. They’ll focus on following the instructions, but not on the overarching testing task, which is learning about the product and finding problems in it.”

“So how should I get them to do that?”, asked “the manager”.

“Don’t turn test cases into the mission. Make their mission learning about the product and finding problems in it.”

“But how can I trust them to do that?”

“Well,” I replied, “let’s look at other people who focus on investigation: journalists; scientific researchers; police detectives. Their jobs are to make discoveries. They don’t follow scripted procedures. No one sees that as a problem. They all work under some degree of supervision—journalists report to editors; researchers in a lab report to senior researchers and to managers; detectives report to their superiors. How do those bosses know what their people are doing?”

“I don’t know. I imagine they check in from time to time. They meet? They talk?”

“Yes. And when they do, they describe the work they’ve done, and provide evidence to back up the description.”

“A lot of the testers I work with aren’t very good at that,” said Frieda, suddenly as herself. “I worry sometimes that I’m not good at that.”

“That’s a good thing to be concerned about. As a tester, I would want to focus on that skill; the skill of telling the story of my testing. And as a manager, I’d want to prepare my testers to tell that story, and train them in how to do it any time they’re asked.”

“What would that be like?”, asked Frieda.

“It varies. It depends a lot on tacit knowledge.”

“Huh?”

“Tacit knowledge is what we know that hasn’t been made explicit—told, or written down, or diagrammed, or mapped out, or explained. It’s stuff that’s inside someone’s head; or it’s physical things that people do that have become second nature, like touch typing; or it’s cultural, social—The Way We Do Things Around Here.

“The profile of a debrief after a testing session varies pretty dramatically depending on a bunch of context factors: where we are in the project, how well the tester knows the product, and how well we know each other.

“Let me take you through one debrief. I’ll set the scene: we’re working on a product—a project management system. Karla is an experienced tester who’s been testing the product for a while. We’ve worked together for a long time too, and I know a lot about how she tests. When I debrief her, there’s a lot that goes unsaid, because I trust her to tell me what I need to know without me having to ask her too much. We both summarize. Here’s how the conversation with Karla might play out.”

Me: (scanning the session sheet) The charter was to look at task updates from the management role. Your notes look fine. How did it go?

Karla: Yeah. It’s not in bad shape. It feels okay, and I’m mostly done with it. There’s at least one concurrency problem, though. When a manager tries to reassign a task to another tester, and that task is open because the assigned tester is updating it, the reassignment doesn’t stick. It’s still assigned to the original tester, not the one the manager assigned. Seems to me that would be pretty rare, but it could happen. I logged that, and I talked about it to Ron.

Me: Anything else?

Karla: Given that bug, we might want to do another session on any kind of update. Maybe part of a session. Ron tells me async stuff in Javascript can be a bear. He’s looking into a way of handling the sequence properly, and he should have a fix by the end of the day. I wouldn’t mind using part of a session to script out some test data for that.

Me: Okay. Want to look at that tomorrow, when you look at the reporting module? And anything else I should know?

Karla: I can get to that stuff in the morning. It’d be cool to make sure the programmers aren’t mucking around in the test environment, though. That was 20 minutes of Setup.

Me: Okay, I’ll tell them to stay out.

“And that’s it,” I said.

“That’s it?”, asked Frieda. “I figured a debrief would be longer than that.”

“Oh, it could be,” I replied. “If the tester is inexperienced or new to me; if the test notes have problems; if the product or feature is new or gnarly; or if the tester found lots of bugs or ran into lots of obstacles, the debrief can take a while longer.

“When I want to co-ordinate testing work for a bunch of people, or when I anticipate that someone might want to scrutinize the work, or when I’m in a regulated environment, I might want to be extra-careful and structure the conversation more formally. I might even want to checklist the debriefing.

“No matter what, though, I have a kind of internal checklist. In broad terms, I’ve got three big questions: How’s the product? How do we know? Why should I trust what we know, and what do we need to get a better handle on things?”

“That sounds like four questions,” Frieda smiled. “But it also sounds like the three-part testing story.”

“Right you are. So when I’m asking focused questions, I’d start with the charter:

  • Did you fulfill your charter? Did you cover everything that the charter was intended to cover?
  • If you didn’t fulfill the charter, what aspects of the charter didn’t get done?
  • What else did you do, even if it was outside the scope of the mission?

“What I’m doing here is trying to figure out whether the charter was met as written, or if we need to adjust it to reflect what really happened. After we’ve established that, I’ll ask questions in three areas that overlap to some degree. I won’t necessarily ask them in any particular order, since each answer will affect my choice of the next question.”

“So a debriefing is an exploratory process too!” said Frieda.

“Absolutely!” I grinned. “I’ll tend to start by asking about the product:

  • How’s the product? What is it supposed to do? Does it do that?
  • How do you know it’s supposed to do that?
  • What did you find out or learn? In particular, what problems did you find?

“I’ll ask about the testing:

  • What happened in the course of the session?
  • What did you cover, and how did you cover it?
  • What product factors did you focus on?
  • What quality criteria were you paying the most attention to?
  • If you saw problems, how did you know that they were problems? What were your oracles?
  • Was there anything important from the charter that you didn’t cover?
  • What testing around this charter do you see as important, but has not yet been done?

“Based on things that come up in response to these questions, I’ll probably have some others:

  • What work products did you develop?
  • What evidence do you have to back the story? What makes it credible?
  • Where can people find that evidence? Why, or why not, should we hang on to it?
  • What testing activity should, or could, happen next or in the more distant future?
  • What might be necessary to enable that activity?

“That last question is about practical testability.”

“Geez, that’s a lot of questions,” said Frieda.

“I don’t necessarily ask them all every time. I usually don’t have to. I will go through a lot of them when a tester is new to this style of working, or new to me. In those cases, as a manager, I have to take more responsibility for making sure about what was tested—what we know and what we don’t. Plus these kinds of questions—and the answers—help me to figure out whether the tester is learning to be more self-guided.

“And then I’ve got three more on my list:

  • What factors might have affected the quality of the testing?
  • What got in the way, made things harder, made things slower, made the testing less valuable?
  • What ongoing problems are you having?
Frieda frowned. “A lot of the managers I’ve worked with don’t seem to want to know about the problems. They say stuff like, ‘Don’t come to me with problems; come to me with solutions.’”

I laughed. “Yeah, I’ve dealt with those kinds of managers. I usually don’t want to go to them at all. But when I do, I assure them that I’m really stuck and that I need management help to get unstuck. And I’ve often said this: ‘You probably don’t want to hear about problems; no one really does. But I think it would be worse for everyone if you didn’t know about them.’

“And that leads to one more important question:

  • What did you spend your time doing in this session?”

“Ummm… That would be ‘testing’, presumably, wouldn’t it?” Frieda asked.

“Well,” I replied, “there’s testing, and then there’s other work that happens in the session.”

We’ll talk about that next time.
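If you do want to “checklist the debriefing”, as mentioned above, here is a minimal sketch of one way to hold those question groups in code so they can be printed onto a session sheet or a debrief form. The grouping follows the questions in this post; the structure and the function are illustrative assumptions, not part of any session-based test management tooling.

```python
# A minimal sketch of a debrief checklist, grouping the questions from this
# post so they can be printed onto a session sheet or debrief form. The
# grouping follows the post; the data structure is illustrative only.

DEBRIEF_CHECKLIST = {
    "Charter": [
        "Did you fulfill your charter? Did you cover what it was intended to cover?",
        "If not, what aspects of the charter didn't get done?",
        "What else did you do, even if it was outside the scope of the mission?",
    ],
    "Product": [
        "How's the product? What is it supposed to do? Does it do that?",
        "What did you find out or learn? In particular, what problems did you find?",
    ],
    "Testing": [
        "What did you cover, and how did you cover it?",
        "If you saw problems, how did you know? What were your oracles?",
        "What important testing around this charter has not yet been done?",
    ],
    "Quality of testing": [
        "What factors might have affected the quality of the testing?",
        "What got in the way, or made the testing slower or less valuable?",
        "What did you spend your time doing in this session?",
    ],
}


def print_checklist(checklist):
    """Render the checklist as plain text for a debrief form."""
    for area, questions in checklist.items():
        print(area)
        for question in questions:
            print(f"  [ ] {question}")


if __name__ == "__main__":
    print_checklist(DEBRIEF_CHECKLIST)
```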

Pressing the Green Button

Wednesday, December 19th, 2018

For years at conferences and meetups and in social media, I have been hearing regularly from testers who tell me that they must “sign off” on the product or deployment before it is released to production, or to review by the client. The testers claim that, after they have performed some degree of testing work, they must “approve” or “reject” the product. Here’s a fairly typical verbatim report from one tester:

In my current context, despite my reasoned explanations to the contrary, I am viewed as the work product gatekeeper and am newly positioned as such within a software controlled workflow that literally presents a green “approve” or red “reject” button for me to select after I “do QC” on the work product which as a bonus might be provided with a list of ambiguous client requests and/or a sketchy mock-up with many scattered revision notes (often superseded by verbal requests not logged).

It’s important to note that in project work, a mess of competing information is normal — not completely desirable, necessarily, but normal. When information about the product is unclear, it’s typically part of the tester’s job to identify where and how it’s unclear. Confusion and uncertainty ratchet up product and project risk.

After all, if you’re not sure about what the product is supposed to do, how can the developers be sure about it? In the unlikely event that the developers know how the product should work and you don’t, how will you recognize all of the important bugs? If you, the developers, or both aren’t straight on what the client wants, bugs will have time and opportunity to breed and to survive.

The tester continues:

Delivery of the product to the client for their review is generally held up until I press the green “approve” button. The expectation when I “approve” is that the product (which I did not build) is “error free”, meets contradictory loosely defined “standards” and satisfies the client (whom I have not met). The structure is such that I am not to directly communicate to the client, so all clarifications and questions I have are filtered through a project manager.

I am now beginning to frustrate the developers with whom I have previously built a great rapport by repeatedly rejecting their work products for even a single minor infraction. I also frustrate project managers delaying product delivery and going over budget. I predict there will soon be pressure from all sides to “just approve” and then later repercussions of “how/why did this get approved”. I combat this by providing long lists of observations, potential issues, questions, obstacles and coverage notes with each “rejection” or “approval” and I communicate to project managers that they do not need my approval to proceed and may override at anytime.

Some testers seem happy with the authority implicit in “approving” or “rejecting” the product. Most express some level of discomfort with the decision. To me, the discomfort is appropriate.

In the Rapid Software Testing view of the world, it is not the job of the tester to approve or disapprove of things. It is the job of the tester to identify reasons to believe that some person who matters might approve or disapprove of something, and to explain the bases for that belief. Decisions about what to do with a product, including approving or rejecting it, lie with a role called management.

If you’re in a situation like the tester above, and someone offers you the responsibility to approve and reject products, and you desire to be a manager, this is your big chance! Seize the opportunity—but don’t do it without the manager’s title and authority—and salary, while you’re at it. If you’re offered the approval or rejection decision without becoming a manager, though, I’d recommend that you politely decline the “offer”, and make a counteroffer—perhaps one like this:

“Thank you for honouring me with the offer to approve or reject the product. However, as a tester, I don’t believe that it is appropriate for me to make such decisions without management authority. Here’s what I would do in this tester’s situation, though; I’d say this:

“I will gladly test the product, learning about it through exploration and experimentation. I will evaluate the product for consistency with these (contradictory, loosely-defined) standards. If the product appears to be inconsistent with them, I will certainly let you know about that. If I see inconsistencies or contradictions in those standards, I will let you know about those too, so that you can decide how the standards apply to our product. But I won’t limit my testing to that.

“I will tell you about any important problems that I find. I will tell you about problems that appear inconsistent with things desirable to important people. Here’s an example of how I might categorize desirable things, and here’s an example of how I might recognize problems in the product.

“I will report on anything that appears consistent with some notion of an ‘error’. However, I will not assert that the product is error-free. I don’t know how I could do that. I don’t know how anyone can do that.

“I would prefer to interact freely and directly with stakeholders, for the purposes of obtaining clarifications and answers to questions I have without bothering the project manager. (I will, of course, keep responsible records of my interactions; and I will not presume to make decisions about product or project scope, since that’s a management function.)

“If you would prefer to restrict or mediate my access to stakeholders, that’s OK; I can work that way too. Doing so will likely come with a cost of extra time on the part of the project manager, and the risk of broken-telephone-style miscommunication between the stakeholders and me. However, if you’re prepared to take responsibility for that risk, I’m fine with it too.

“Since I am manager of neither the product, nor the project, nor the developers, I do not have the authority to direct them. However, I am happy to report on everything I know about the product—and the apparent problems and risks in it—to those who do have the required authority and responsibility, and they can make the appropriate decisions based on everything they know about the product and business and its needs.

“I am not a gatekeeper, or owner, or ‘approver’ of the quality of the product. I am not a manager or decision maker. I am a reporter on the status of the product, and of the testing, and of the quality of the testing, and I’ll report accordingly. My “approval” is immaterial; what matters is what managers and the business want. It is they, not I, who decide whether a problem is a showstopper or something we’re prepared to live with. It is they, not I, who decide whether problems are significant enough to extend the schedule or increase the budget for the project.

“It’s my job to contribute information to any decision to approve or reject, but it’s not my job to make that decision. I would like someone else to be responsible for the ‘approve’ or ‘reject’ checkbox as such. However, if the tool that we’re using restricts me to ‘approve’ and ‘reject’, let me tell you what those mean, because what they say is inconsistent with their normal English meanings, and we should all be aware of that.

“Pressing ‘Approve’ means this, and only this: ‘I am not aware of any problem in this area that threatens the value of the product, project, or business to any person that matters.’

“Pressing ‘Reject’ means ‘I am aware of a specific problem or I have some reason to believe that there could be a problem in this area that I have not had the opportunity to identify yet.’ In other words, ‘reject’ means that I see risk; there’s something about the product or about the testing that I believe managers or the programmers should be aware of. ‘Reject’ means no more than that.

“In either case, we should frequently discuss my observations, potential issues, questions, obstacles and coverage notes, to avoid the possibility that I’m overlooking something important, or that I’m over-emphasizing the significance of particular problems.”

How you’re viewed depends on the commitments (another example here) that you make and declare about what you do, what you’re willing to do, and what you’re not willing to do. If your role, your profile and your commitments don’t match, getting them lined up is your most urgent and important job.

Related reading:
Signing Off
When Testers Are Asked For a Ship/No-Ship Opinion
Testers: Get Out of the Quality Assurance Business

How Long Will the Testing Take?

Friday, April 27th, 2018

Today yet another tester asked “When a client asks ‘How long will the testing take for our development project?’, how should I reply?”

The simplest answer I can offer is this: if you’re testing as part of a development project, testing will take exactly as long as development will take. That’s because effective, efficient testing is not separate from development; it is woven into development.

When we develop a software product or service, we build things: designs, prototypes, functions, modules, components, interfaces, integrated components, services, complete systems… There is some degree of risk—the possibility of problems—associated with everything we build. To be successful, each element of a system must fit with other elements of the system, and must satisfy the people who are using and building the system.

Despite our best intentions, we’re likely to make some mistakes and experience some misunderstandings along the way. If we review things as we go, we’ll eliminate many problems before we turn them into code. Still, there’s risk in using elements when we haven’t built, configured, operated, observed, and evaluated them. Unless we actually try to use what we’ve built, look for problems in it, and develop a well-grounded, empirical understanding of it, we risk fooling ourselves about how it really works—and doesn’t.

So it’s a good idea to start testing things as developers are creating or assembling them—and to test how they fit with other elements of the system—from the moment we start to build them or put them together.

Testing, for a given element or system, stops when we have a reasoned belief that there is no more development work to be done. Testing stops when people are satisfied that they know enough about the system and its elements to have discovered and resolved all the important problems. That satisfaction will be easier to establish relatively quickly, for any given element or system, when the product is designed to be easy to test. If you’ve been testing each part of the product deeply, investigating risk, and addressing problems throughout development, the whole product will be easier to test.

For the project to be successful, it’s important for the whole team to keep discovering and identifying ways in which people might be dissatisfied. That includes not only finding problems in the product, but also finding problems in our concept of what it might mean to be “done”. It’s not the tester’s job to build confidence in the product, but to reveal where confidence is unwarranted. That’s central to testing work, even though it’s difficult, disappointing, and to some degree socially disruptive.

Asking how long testing will take is an odd question for managers to ask, because it’s just like asking how long management will take. Management of a development project starts as development starts, and ends as development ends. Testing enables awareness about the product and problems in it, so that managers and developers can make decisions about it. So testing starts when the project starts, and testing stops when managers and developers have made their final decisions about it. Testing doesn’t stop until development and management of the project is done.

People often stop testing a product after it’s been put into production. Now: it might be a good idea to monitor and test some aspects of the system after it’s been put into production, too. (We call that live-site testing.) Live-site testing is often a very good idea, but like all other forms of testing, ultimately, it’s optional. Here’s one good reason to continue live-site testing: when you believe that there will be more development work done on the system.

So: development and management and testing go hand in hand. When a manager asks “How long will the testing take?”, it seems to me that the appropriate answer is “Testing will take just as long as development will take.” When people are satisfied that there’s no more important development work to do, they will also be satisfied that there’s no more important testing work to do either.

It’s important to remember, though: determining when people will be satisfied is something that we can guess, but cannot calculate. Satisfaction is a feeling, not a finish line.

Further reading:

Test Project Estimation, The Rapid Way
Test Estimation Is Really Negotiation
Project Estimation and Black Swans (Part 1)
Project Estimation and Black Swans (Part 5): Test Estimation

Very Short Blog Posts (35): Make Things Visible

Tuesday, April 24th, 2018

I hear a lot from testers who discover problems late in development, and who get grief for bringing them up.

On one level, the complaints are baseless, like holding an investigative journalist responsible for a corrupt government. On another level, there’s a way for testers to anticipate bad news and reduce the surprises. Try producing a product coverage outline and a risk list.

A product coverage outline is an artifact (a mind map, or list, or table) that identifies factors, dimensions, or elements of a product that might be relevant to testing it. Those factors might include the product’s structure, the functions it performs, the data it processes, the interfaces it provides, the platforms upon which it depends, the operations that people might perform with it, and the way the product is affected by time. (Look here for more detail.) Sketches or diagrams can help too.

As you learn more through deeper testing, add branches to the map, or create more detailed maps of particular areas. Highlight areas that have been tested so far. Use colour to indicate the testing effort that has been applied to each area—and where coverage is shallower.
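
For illustration only, here’s a minimal sketch of how a coverage outline might be captured as a simple data structure; the product areas and depth labels are invented, and a mind map, list, or table would serve just as well.

# A hypothetical sketch of a product coverage outline as a nested structure.
# The factor names follow the list above; the product areas and depth labels
# ("untested", "shallow", "deep") are invented for illustration.

coverage_outline = {
    "Structure": {"installer": "shallow", "plugin loader": "untested"},
    "Functions": {"search": "deep", "report export": "shallow"},
    "Data": {"import formats": "shallow", "unicode handling": "untested"},
    "Interfaces": {"REST API": "deep", "command line": "shallow"},
    "Platforms": {"Windows": "shallow", "macOS": "untested"},
    "Operations": {"first-run setup": "deep", "upgrade": "untested"},
    "Time": {"timezone changes": "untested", "long-running sessions": "shallow"},
}

def areas_needing_attention(outline):
    """Return (factor, area) pairs where testing so far is shallow or absent."""
    return [
        (factor, area)
        for factor, areas in outline.items()
        for area, depth in areas.items()
        if depth in ("untested", "shallow")
    ]

for factor, area in areas_needing_attention(coverage_outline):
    print(f"{factor}: {area} needs deeper coverage")

The format matters far less than the visibility: anyone glancing at it can see which areas have been visited, and how deeply.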

A risk list is a list of bad things that might happen: Some person(s) will experience a problem with respect to something desirable that can be detected in some set of conditions because of a vulnerability in the system. Generate ideas on that, rank them, and list them.
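
Here’s an equally hypothetical sketch of how such a risk list might be recorded; the victims, problems, and vulnerabilities below are invented, and the ranking is something to work out in conversation with the team.

# A hypothetical sketch of a risk list entry, following the template above:
# some person(s) / some problem / something desirable / some conditions /
# some vulnerability. All specifics here are invented for illustration.

from dataclasses import dataclass

@dataclass
class Risk:
    victim: str            # some person(s) who would experience the problem
    problem: str           # the bad thing that might happen
    threatened_value: str  # something desirable that is at stake
    conditions: str        # the conditions in which the problem could be detected
    vulnerability: str     # the weakness in the system that enables it
    rank: int              # relative importance, agreed with the team

risk_list = [
    Risk("account holders", "double-charged payments", "accurate billing",
         "retries after a network timeout", "non-idempotent payment call", 1),
    Risk("support staff", "unreadable error logs", "fast diagnosis",
         "crashes under load", "exceptions swallowed without context", 2),
]

for risk in sorted(risk_list, key=lambda r: r.rank):
    print(f"[{risk.rank}] {risk.victim}: {risk.problem} ({risk.vulnerability})")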

At the beginning of the project, or as early as possible, post your coverage outline and risk list in places where people will see and read them. Update them daily. Invite questions and conversations. This can help you change “why didn’t you find that bug?” to “why didn’t we find that bug?”

How To Get What You Want From Testing (for Managers): The Q & A

Wednesday, November 22nd, 2017

On November 21, 2017, I delivered a webinar as part of the regular Technobility Webinar Series presented by my friend and colleague Peter de Jager. The webinar was called “How To Get What You Want from Testing (for Managers)”, and you can find it here.

Alas, we had only a one-hour slot, and there were plenty of questions afterwards. Each of the questions I received is potentially worthy of a blog post on its own. Here, though, I’ll try to summarize. I’ll give brief replies, suggesting links to where I might have already provided some answers, and offering an IOU for future blog posts if there are further questions or comments.

If the CEO doesn’t appreciate testing and QA enough to develop a high-quality application, how can a QA manager win that battle?

First, in my view, testers should think seriously about whether they’re in the quality assurance business at all. We don’t run the business, we don’t manage the product, and we can’t assure quality. It’s our job to shine light on the product we’ve got, so that people who do run the business can decide whether it’s the product they want.

Second, the CEO is also the CQAO—the Chief Quality Assurance Officer. If there’s a disconnect between what you and the CEO believe to be important, at least one of two things is probably true: the CEO knows or values something you don’t, or you know or value something the CEO doesn’t. The knowledge problem may be resolved through communication—primarily conversation. If that doesn’t solve the disconnect, it’s more likely a question of a different set of values. If you don’t share those, your options are to adopt the CEO’s values, or to accept them, or to find a CEO whose values you do share.

How to change managers to focus on the whole (e.g. lead time) instead of testing metrics (when vendor X is coding and vendor Y is testing)?

As a tester, it’s not my job to change a manager, as such. I’m not the manager’s manager, and I don’t change people. I might try to steer the manager’s focus to some degree, which I can do by pointing to problems and to risks.

Is there a risk associated with focusing on testing metrics instead of on the whole? My answer, based on experience, is Yes; many managers have trouble observing software and software development. If you also believe that there’s risk associated with metrics fixation, have you made those risks clear to the manager? If your answer to that question is “I’ve tried, but I haven’t been successful,” get in touch with me and perhaps I can be of help.

One way I might be able to help right away is to recommend some reading: “Software Engineering Metrics: What Do They Measure and How Do We Know?” by Kaner and Bond; Measuring and Managing Performance in Organizations by Robert Austin; and Jerry Weinberg’s Quality Software Management series, especially Quality Software Management, Vol. 2: First-Order Measurement (also available as two e-books, “How to Observe Software Systems” and “Responding to Significant Software Events”). Those works outline the problems, and suggest some solutions.

An important suggestion related to this is to offer, to agree upon, and to live up to a set of commitments, along with an understanding of what roles involve.

How do you ensure testers put under the microscope the important bugs and ignore the trivial stuff?

My answers here include: training, continuous feedback, and continuous learning; continuous development, review, and refinement of risk models; providing testers with knowledge of and access to stakeholders that matter. Good models of quality criteria, product elements, and test techniques (the Heuristic Test Strategy Model is an example) help a lot, too.

Suggestions for the scalability of regression testing, specifically when developers say “this touches everything.”

When the developer says “this touches everything”, one interpretation is “I’m not sure what this touches” (since “this touches everything” is, at best, rarely true). “I’m not sure what this touches,” in turn, really means “I don’t actually understand what I’m modifying”. That interpretation (like others revealed in the “Testing without Machinery” chapter in Perfect Software and Other Illusions About Testing) points to a Severity 0 product and project risk.

So this is not simply a question about the scalability of regression testing. This is a question about the scalability of architecture, design, development, and programming. These are issues for the whole team—including managers—to address. The management suggestion is to steer things towards getting the developer’s model of the code more clear, refactoring the code and the design until it’s comprehensible.

I’ve done some presentations on regression testing before, and written some blog posts here and here (although, ten years later, we no longer talk about “manual tests” and “automated tests”; we do talk about testing and checking).

How can we avoid a lot of back and forth, e.g. submitting issues piecemeal vs. submitting in a larger set?

One way would be to practice telling a three-part testing story and delivering the news. That said, I’m not sure I have enough context to help you out. Please feel free to leave a comment or otherwise get in touch.

How much of what you teach can expand to testing in other contexts?

Plenty of it, I’d say. In Rapid Software Testing, we take an expansive view of testing: evaluating a product by learning about it through exploration and experimentation, which includes to some degree questioning, study, modeling, observation, inference, and plenty of other stuff. That’s applicable to lots of contexts. We also teach applied critical thinking—that is, thinking about thinking with the intention of avoiding being fooled. That’s extensible to lots of domains.

I’m curious about the other contexts you might have in mind, and why you ask. If I can be of assistance, please let me know.

I agree 100% with everything you say, and I feel daily meetings are not working, because everybody says what they did and what they will do and there is no talk about the state of the product.

That’s a significant test result—another example of the sort of thing that Jerry Weinberg is referring to in the “Testing without Machinery” chapter in Perfect Software and Other Illusions About Testing. I’m not sure what your role is. As a manager, I would mandate reports on the status of the product as part of the daily conversation. As a tester, I would provide that information to the rest of the team, since that’s my job. What I found about the product, what I did, and what happened when I did it are part of that three-part testing story.

Perhaps test cases are used as they are quantifiable and help organise the testing of parts of the product.

Sure. Email messages are also quantifiable and they also help to organise the testing of parts of the product. Yet we don’t evaluate a business by the number of email messages it produces. Let’s agree to think all the way through this.

“Writing” is not “factory work” either, but as you say, many, many tools help check spelling, grammar, awkward phrases, etc.

Of course. As I said in a recent blog post, “A good editor uses the spelling checker, while carefully monitoring and systematically distrusting it.” We do not dispute the power and usefulness of tools. We do remain skeptical about what will happen when tools are put in the hands of those who don’t have the skill or wisdom to apply them appropriately.

I appreciate the “testing/checking” distinction, but wish you had applied the labels in the other way, given the rise of “the checklist” as a crucial tool in both piloting and surgery—application of the checklist is NOT algorithmic, when done as recommended.

Indeed. I’m confident, though, that smart people can keep track of the differences between “checking” or “a check” and a “checklist”, just as they can keep track of the differences between “testing” or “a test” and “testimony”.

I would include “breaking” the product (to see if it fails gracefully and to find its limits).

I prefer to say “stressing” the product, and I agree absolutely with the goal to discover its limits, whether it fails gracefully, and what happens when it doesn’t.

“Breaking” is a common metaphor, to be sure. It worries me to some degree because of the public relations issue: “The software was fine until the testers broke it.” “We could ship our wonderful product on time if only the testers would stop breaking it.” “Normal customers wouldn’t have problems with our wonderful product; it’s just that the testers break it.” “There are no systemic management or development problems that have been leading to problems in the product. Nuh-uh. No way. The testers broke it.”

What about problems with the customer/user? Uninformed use or, even more problematical, unexpected use (especially if many users do the unexpected — using Lotus 123 as a word processor, for instance). I would argue you need to “model the human” (including his or her context of machine, attempted use, level of understanding, et al.).

And I would not argue with you. I’d agree, especially with respect to the tendency of users to do surprising things.

I think stories about the product should also include items about what the product did right (or less bad than expected), especially items that were NOT the focus of the design (i.e., how well does Lotus 123 work as a word processor?).

They could certainly include that to some degree. After all, the first items in our list of what to report in the product story are what it is and what it does, presumably successfully. Those things are important, and it’s worth reporting good news when there’s some available. My observation and experience suggest that reporting the good news is less urgent than noting what the product doesn’t do, or doesn’t do well, relative to what people want it to do, or hope that it does. It’s important, to me, that the good news doesn’t displace the important bad news. We’re not in the quality reassurance business.

So far, you appear to be missing much of the user side– what will they know, what context will they be in, what will they be doing with the product? (A friend was a hero for getting over-sized keyboards for football players to deal with all the typing problems that appeared in the software but were the consequence of huge fingers trying to hit normal-sized keytops.)

That’s a great story.

I am really missing any mention of users, especially in an age of sensitivity to disabilities like lack of sight, hearing, manual dexterity, et al. It wasn’t until 11:45 or so that I heard about “use” and “abuse.” Again, all good stuff, but missing a huge part of what I think is the source of risk.

I was mostly talking to project managers and their relationships with testers in this talk (and at that, I went over time). In other talks (and especially when talking to testers), I would underscore the importance of modeling the end user’s desires and actions.

Please in the blog, talk about risk from unexpected use (abuse and unusual uses).

The short blog post that I mentioned above talks about that a bit. Here is a more detailed one. Lately, I’ve been enjoying Gojko Adzic’s book Humans vs. Computers, and can recommend it.

Is There a Simple Coverage Metric?

Tuesday, April 26th, 2016

In response to my recent blog post, 100% Coverage is Possible, reader Hema Khurana asked:

“Also some measure is required otherwise we wouldn’t know about the depth of coverage. Any straight measures available?”

I replied, “I don’t know what you mean by a ‘straight’ measure. Can you explain what you mean by that?”

Hema responded: “I meant a metric some X/Y.”

In all honesty, it’s sometimes hard to remain patient when this question seems to come up at every conference, in every class, week upon week, year upon year. Asking me about this is a little like asking Chris Hadfield—since he’s a well-known astronaut and a pretty smart guy—if he could provide a way of measuring the area of the flat, rectangular earth. But Hema hasn’t asked me before, and we’ve never met, so I don’t want to be immediately dismissive.

My answer, my fast answer, is No. One key problem here is related to what Y could possibly represent. What counts? Maybe we could talk about Y in terms of a number of test cases, and X as how many of those test cases we’ve executed so far. If Y is 600 and X is 540, we could say that testing is 90% done. But that ignores at least two fundamental problems.

The first problem is that, irrespective of the number of test cases we have, we could choose to add more at any time as (via testing) we discover different conditions that we would like to evaluate. Or maybe we could choose to drop test cases when we realize that they’re out of date or irrelevant or erroneous. That is, unless we decide to ignore what we’ve learned, Y will, quite appropriately, change over time.
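
To make that first problem concrete, here’s a tiny arithmetic sketch with invented numbers: the “percent done” figure can move backwards even though real testing work has been accomplished.

# Invented numbers, for illustration only: "percent done" expressed as X/Y,
# where X is the number of test cases executed and Y is the current total.

def percent_done(executed, total):
    return 100.0 * executed / total

print(percent_done(540, 600))  # 90.0 -- "testing is 90% done"

# Testing teaches us about new conditions worth examining, so we add 150
# test ideas to the list...
print(percent_done(540, 750))  # 72.0 -- "done-ness" just moved backwards

# ...and then we drop 50 cases that turn out to be irrelevant or erroneous.
print(percent_done(540, 700))  # ~77.1 -- the figure shifts again, even
                               # though the product hasn't changed at all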

The second problem is that—at least in my view, and in the view of my colleagues—test cases are a ludicrous way to think about testing.

Another almost-as-quick answer would be to encourage people to re-read that 100% Coverage is Possible post (and the Further Reading links), and to keep re-reading until they get it.

But that’s probably not very encouraging to someone who is asking a naive question, and I’d like to be more helpful than that.

Here’s one thing we could do, if someone were desperate for numbers that summarize coverage: we could make a qualitative evaluation of coverage, and put numbers (or letters, or symbols) on a scale that is nominal and very weakly ordinal.

Our qualitative evaluation would be rooted in analysis of many dimensions of coverage. The Product Elements and Quality Criteria sections of the Heuristic Test Strategy Model provide a framework for generating coverage ideas or for reviewing our coverage retrospectively. We would review and discuss how much testing we’ve done of specific features, or particular functional areas, or perceived risks, and summarize our evaluation using a simple scale that would go something like this:

Level 0 (or X, or an empty circle, or…): We know nothing at all about this area of the product.

Level 1 (or C, or a glassy-eyed emoticon, or…): We have done a very cursory evaluation of this area. Smoke- or sanity-level; we’ve visited this feature and had a brief look at it, but we don’t really know very much about it; we haven’t probed it in any real depth.

Level 2 (or B, or a normal-looking emoticon, or…): We’ve had a reasonable look at this area, although we haven’t gone all the way deep. We’ve examined the common, the core, the critical, the happy paths, the handling of everyday errors or exceptions. We’re pretty familiar with this area. We’ve done the kind of testing that would expose some significant bugs, if they were there.

Level 3 (or A, or a determined-looking angel emoticon, or…): We’ve really kicked this area harshly and hard. We’ve looked at unusual and complex conditions or states. We’ve probed deeply for subtle or hidden bugs. We’ve exposed the product to the extreme, the exceptional, the rare, the improbable. We’ve looked for bugs that are deep in the corners or hidden in the dark. If there were a serious bug, we’re pretty sure we would have found it by now.

Strictly speaking, these numbers are placed on an ordinal scale, in the sense that Level 3 coverage is deeper than Level 2, which is deeper than Level 1. (If you don’t know about scales of measurement, you should learn about them before providing or asking for metrics. And there are some other things to look at.) The numbers are certainly not an interval scale, or a ratio scale. They may not be commensurate from one feature area to the next; that is, they may represent different notions of coverage, different amounts of effort, different modes of evaluation. By design, these numbers should not be treated as valid measurements, and we should make sure that everyone on the project knows it. They are little labels that summarize evaluations and product elements and effort, factors that must be discussed to be understood. But those discussions can lead to understanding and consensus between ourselves, our colleagues, and our clients.
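
If someone still wanted to record those summaries somewhere, a minimal sketch might look like the following; the area names and levels are invented, and the comments repeat the caveat above: these labels are prompts for conversation, not measurements to be averaged or totalled.

# A hypothetical sketch of a qualitative coverage summary on a weakly
# ordinal scale (Levels 0-3, as described above). Area names and levels
# are invented. These labels are prompts for conversation, not measurements:
# don't average them, total them, or compare them across projects.

from enum import IntEnum

class CoverageLevel(IntEnum):
    NOTHING = 0   # we know nothing at all about this area
    CURSORY = 1   # smoke/sanity-level visit only
    COMMON = 2    # core, critical, and happy paths examined
    DEEP = 3      # harsh, deep testing of rare and extreme conditions

coverage_summary = {
    "login and sessions": CoverageLevel.DEEP,
    "report generation": CoverageLevel.COMMON,
    "data import": CoverageLevel.CURSORY,
    "localization": CoverageLevel.NOTHING,
}

for area, level in coverage_summary.items():
    print(f"{area:20s} Level {int(level)} ({level.name.lower()})")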

On a Role

Monday, June 15th, 2015

This article was originally published in the February 2015 edition of Testing Trapeze, an excellent online testing magazine produced by our testing friends in New Zealand. There are small edits here from the version I submitted.

Once upon a time, before I was a tester, I worked in theatre. Throughout my career, I took on many roles—but maybe not in the way you’d immediately expect.

In my early days, I was a performer, acting in roles in the sense that springs to mind for most people when they think of theatre: characters in a play. Most of the time, though, I was in the role of a stage manager, which is a little like being a program manager in a software development group. Sometimes my role was that of a lighting designer, sound engineer, or stagehand. I worked in the wardrobe of the Toronto production of CATS for six months, too.

Recent discussions about software development have prompted me to think about the role of roles in our work, and in work generally. For example, in a typical theatre piece, an actor performs in three different roles at once. Here, I’ll classify them…

a first-order role, in which a person is a member of the theatre company throughout the rehearsal period and run of the play. If someone asks him “What are you working on these days?”, he’ll reply “I’m doing a show with the Mistytown Theatre Company.”

a second-order role that the person takes on when he arrives at the theatre, defocusing from his day-to-day role as a husband and father, and focusing his energy on being an actor, or stagehand, or lighting designer. He typically holds that second-order role over the course of the working day, and abandons it when it’s time to go home.

a third-order role that the actor performs as a specific character at some point during the show. In many cases, the actor takes on one character per performance. Occasionally an actor takes on several different characters throughout the course of the performance, playing a new third-order role from one moment to another. In an improvisational theatre company, a performer may pick up and drop third-order roles as quickly as you or I would don or doff a hat. In a more traditional style of theatre, roles are more sharply defined, and things can get confusing when actors suddenly and unexpectedly change roles mid-performance.

(I saw that happen once during my theatre career. An elderly performer took ill during the middle of the first act, and her much younger understudy stepped in for the remainder of the show. It was necessary on that occasion, of course, but the relationships between the performers were shaken up for the rest of the evening, and there was no telling what sense the audience was able to make of the sudden switch until intermission when the stage manager made an announcement.)

It’s natural and normal to deal simultaneously with roles of different orders, but it’s hard to handle two roles of the same order at exactly the same time. For example, a person may be both a member of a theatre company and a parent, but it’s not easy to supervise a child while you’re on stage in the middle of a show. In a small theatre company, the same person might hold two second-order roles—as both an actor and a costume designer, say—but in a given moment, that person is focusing on either acting or costume design, but not both at once.

People in a performer role tend not to play two different third-order roles—two different characters—at the same moment. There are rare exceptions, as in those weird Star Trek episodes or in movies like All of Me, in which one character is inhabiting the body of another. To perform successfully in two simultaneous third-order roles takes spectacular amounts of discipline and skill, and the occasions where it’s necessary to do so aren’t terribly common.

Some roles are more temporary than others. At the end of the performance, people drop their second-order roles to go home and live out their other, more long-term roles: husbands and wives, parents, daughters and sons. They may adopt other roles too: volunteer in the community soup kitchen; declarer in this hand of the bridge game; parishioner at the church; pitcher on the softball team.

Roles can be refined and redefined; in a dramatic television series, an actor performs in a third-order role in each episode, as a particular character. If it’s an interesting character, aspects of the role change and develop over time.

At the end of the run of a show, people may continue in their first-order roles with the same theatre company; they may become directors or choreographers with that company; or they may move on to another role in another company. They may take on another career altogether. Other roles evolve too, from friend to lover to spouse to parent.

In theatre, a role is an identity that a person takes on to fulfill some purpose in service of the theatre company, the production, or the nightly show. More generally, a role is a position or function that a person adopts and performs temporarily. A role represents a set of services offered, and often includes tacit or explicit commitments to do certain things for and with other people.

A role is a way to summarize ideas about services people offer, activities they perform, and the goals that guide them.

Now: to software. As a member of a software development team within an organization, I’m an individual contributor. In that first-order role, I’m a generalist. I’ve been a program manager, programmer, tech support person, technical writer, network administrator, phone system administrator, business owner, bookkeeper, teacher, musician… Those experiences have helped me to be aware of the diversity of roles on a project, to recognize and respect the people who perform them, and to be able to perform them effectively to some extent if necessary.

In the individual contributor role, I commit to taking on work to help the company to achieve success, just as (I hope) everyone else in the company does.

Normally I’m taking on the everyday, second-order role of a tester, just as a member of a theatre company might walk through the door in the evening as a lighting technician. By adopting the testing role, I’m declaring my commitment to specialize in providing testing services for the project.

That doesn’t limit me to testing, of course. If I’m asked, I might also do some programming or documentation work, especially in small development groups—just as an actor in a very small theatre company might help in the box office and take ticket orders from time to time. Nonetheless, my commitment and responsibility to provide testing services requires me to be very cautious about taking on things outside the testing role.

When I’m hired as a tester, my default belief is that there’s going to be more than enough testing work to do. If I’m being asked to perform in a different role such that important testing work might be neglected or compromised, I must figure out the priorities with my client.

Within my testing role, I might take on a third-order role as a responsible tester (James Bach has blogged on the role of the responsible tester) for a given project, but I might take on a variety of third-order roles as a test jumper (James has blogged about test jumpers, too).

Like parts of an outfit that I choose to wear, a role is a heuristic that can help to suggest who I am and what I do. In a hospital, the medical staff are easy to identify, wearing uniforms, lab coats, or scrubs that distinguish them from civilian life. Everyone wears badges that allow others to identify them. Surgical staff wear personalized caps—some plain and ordinary, others colourful and whimsical. Doctors often have stethoscopes stuffed into a coat pocket, and certificates from medical schools on their walls.

Yet what we might see remains a hint, not a certainty; someone dressed like a nurse may not be a nurse. The role is not a guarantee that the person is qualified to do the work, so it’s worthwhile to see if the garb is a good fit for the person wearing it.

The “team member” role is one thing; the role within the team is another. In a FIFA soccer match, the goalkeeper is dressed differently to make the distinct role—with its special responsibilities and expectations—clearly visible to everyone else, including his team members.

The goalkeeper’s role is to mind the net, not to run downfield trying to score goals. There’s no rule against a goalie trying to do what a striker does, but to do so would be disruptive to the dynamics of the team. When a goalkeeper runs downfield trying to score goals, he leaves the net unattended—and those who choose to defend the goal crease aren’t allowed to use their hands.

In well-organized, self-organized teamwork, roles help to identify whether people are in appropriate places. If I’m known as a tester on the project and I am suddenly indisposed, unavailable, or out of position, people are more likely to recognize that some of the testing work won’t get done.

Conversely, if someone else can’t fulfill their role for some reason, I’m prepared to step up and volunteer to help. Yet to be helpful, I need to coordinate consistently with the rest of the team to make sure our perceptions line up. On the one hand, I may not have noticed important and necessary work. On the other, I don’t want to inflict help on the project, nor would it be respectful or wise for me to usurp anyone else’s role.

Shifting positions to adapt to a changing situation can be a lot easier when roles help to frame where we’re coming from, where we are, and where we’re going.

A role is not a full-body tattoo, permanently inscribed on me, difficult and painful to remove. A role is not a straitjacket. I wouldn’t volunteer to wear a straitjacket, and I’ll resist if someone tries to put me into one. As Kent Beck has said, “Responsibility cannot be assigned; it can only be accepted. If someone tries to give you responsibility, only you can decide if you are responsible or if you aren’t.” (from Extreme Programming Explained: Embrace Change)

I also (metaphorically) study escape artistry in the unlikely event that someone manages to constrain me. When I adopt a role, I must do so voluntarily, understanding the commitment I’m making and believing that I can perform it well—or learn it in a hurry.

I might temporarily adopt a third-order role normally taken by someone else, but in the long run, I can’t commit to a role without full and ongoing understanding, agreement, and consent between me and my clients.

If I resist accepting a role, I don’t do so capriciously or arbitrarily, but for deeply practical reasons related to three important problems.

The Expertise Problem. I’m willing to do or to learn almost anything, but there is often work for which I may be incompetent, unprepared or underqualified. Each set of tasks in software development requires a significant and distinct set of skills which must be learned and practiced if they are to be performed expertly.

I don’t want to fool my client or my team into believing that the work will be done well until I’m capable, so I’ll push back on working in certain roles unless my client is willing to accept the attendant risks.

For example, becoming an expert programmer takes years of focused study, experience, and determination. As Collins and Evans suggest, real expertise requires not only skill, but also ongoing maintenance; immersion in a way of life. James Bach remarked to me recently, “The only reason that I’m not an expert programmer now is that I haven’t tried it. I’ve been in the software business for thirty years, and if I had focused on programming, I’d be a kick-ass programmer by now. But I chose to be a tester instead.”

I feel the same way. Programming is a valuable means to an end for me—it helps me get certain kinds of testing work done. I can be a quite capable programmer when I put my mind to it, but I find I have to do programming constantly—almost obsessively—to maintain my skills to my own standards. (These days, if I were asked to do any kind of production programming—even minor changes to the code—I would insist on both close collaboration with peers and careful review by an expert.)

I believe I can perform competently, adequately, eventually, in any role. Yet competence and adequacy aren’t enough when I aspire to achieving excellence and mastery.

At a certain point in my life, I decided to focus my time and energy on testing and the teaching of it; the testing and teaching roles are the ones that attract me most. Their skills are the ones that I am most interested in trying to master—just as others are focused on mastering programming skills.

So: roles represent a heuristic for focusing my development of expertise, and for distributing expertise around the team.

The Mindset Problem. Building a product demands a certain mindset; testing it deeply demands another. When I’m programming or writing (as I’m doing now), I tend to be in the builder’s mindset. As such, I’m at close “critical distance” to the work. I’m seeing it from the position of an insider—me—rather than as an outsider.

When I’m in the builder’s mindset, it’s relatively easy for me to perform shallow testing and spot coding errors, or spelling and grammatical mistakes—although after I’ve been looking at the work for a while, I may start to miss those as well.

In the builder’s mindset, it’s quite a bit harder for me to notice deeper structural or thematic problems, because I’ve invested time and energy in building the piece as I have, converging towards something I believe that I want. To see deeper problems, I need the greater critical distance that’s available in the tester’s mindset—what testers or editors do.

It’s not a trivial matter to switch between mindsets, especially with respect to one’s own work. Switching mindsets is not impossible, but shifting from building into good critical and analytical work is effortful and time-consuming, and messes with the flow.

One heuristic for identifying deep problems in my writing work would be to walk away from writing—from the builder’s mindset—and come back later with the tester’s mindset—just as I’ve done several times with this essay. However, the change in mindset takes time, and even after days or weeks, part of me remains in the writer’s mindset—because it’s my writing.

Similarly, a programmer in the flow of developing a product may find it disruptive—both logistically and intellectually—to switch mindsets and start looking for problems. In fact, the required effort likely explains a good deal of some programmers’ stated reluctance to do deep testing on their own.

So another useful heuristic is for the builder to show the work to other people. As they are different people, other builders naturally have critical distance, but that distance gets emphasized when they agree to take on a testing role.

I’ve done that with this article too, by enlisting helpers—other writers who adopt the roles of editors and reviewers. A reviewer might usually identify herself as a writer, just as someone in a testing role might normally identify as a programmer. Yet temporarily adopting a reviewer’s role and a testing mindset frames the approach to the task at hand—finding important problems in the work that are harder to see quickly from the builder’s mindset.

In publishing, some people by inclination, experience, training, and skills specialize in editing, rather than writing. The editing role is analogous to that of the dedicated tester—someone who remains consistently in the tester’s mindset, at even farther critical distance from the work than the builder-helpers are—more quickly and easily able to observe deep, rare, or subtle problems that builders might not notice.

The Workspace Problem. Tasks in software development may require careful preparation, ongoing design, and day-to-day, long-term maintenance of environments and tools. Different jobs require different workspaces.

Programmers, in the building role, set up their environments and tools to do development and building work most simply and efficiently. Setting up a test lab for all of its different purposes—investigation of problems from the field; testing for adaptability and platform support; benchmarking for performance—takes time and focus away from valuable development tasks. The testing role provides a heuristic for distributing and organizing the work of maintaining the test lab.

People sometimes say “on an Agile project, everybody does everything” or “there are no roles on an Agile project”. To me, that’s like saying that there is no particular focusing heuristic for the services that people offer; throwing out the baby of skill with the bathwater of overspecialization and isolation.

Indeed, “everybody doing everything” seems to run counter to another idea important to Agile development: expertise and craftsmanship. A successful team is one in which people with diversified skills, interests, temperaments, and experiences work together to produce something that they could not have produced individually.

Roles are powerful heuristics for helping to organize and structure the relationships between those people. Even though I’m willing to do anything, I can serve the project best in the testing role, just as others serve the project best in the developer role.

That’s the end of the article. However, my colleague James Bach offered these observations on roles, which were included as a sidebar to the article in the magazine.

A role is probably not:

  • a declaration of the only things you are allowed to do. (It is neither a prison cell nor a destiny from which escape is not possible.)
  • a declaration of the things that you and you only are allowed to do. (It is not a fortress that prevents entry from anyone outside.)
  • a one-size-fits-all, exclusive, permanent, or generic structure.

A role is:

  • a declaration of what one can be relied upon to do; a promise to perform a service or services well. (Some of those services may be explicit; others are tacit.)
  • a unifying idea serving to focus commitment, preparation, performance, and delivery of services.
  • a heuristic for helping people manage their time on a project, and to be able to determine spontaneously who to approach, consult with, or make requests to (or sometimes avoid), in order to get things done.
  • a heuristic for fostering personal engagement and responsibility.
  • a heuristic for defining or explaining the meaning of your work.
  • a flexible and non-exclusive structure that may exist over a span of moments or years.
  • a label that represents these things.
  • a voluntary commitment.

A role may or may not be:

  • an identity.
  • a component of identity.

—James Bach