Blog Posts from July, 2015

Very Short Blog Posts (29): Defective Detection Effectiveness

Tuesday, July 14th, 2015

Managers are responsible for hiring testers, for training them, and for removing any obstacles that make testing harder or slower. Managers are also responsible for hiring developers and designers, and providing appropriate training when it’s needed. If there are problems in development, managers are responsible for helping the developers to address them.

Managers are also responsible for the scope of the product, the budget, the staffing, and the schedule. As such, they’re responsible for maintaining awareness of the product, of product development, and anything that threatens the value of either of these. Finally, managers are responsible for the release decision: is this product ready for deployment or release into the market?

Misbegotten metrics like “Defect Detection Percentage” (I won’t dignify references to them with a link) continue to plague the software development world, and are sometimes used to evaluate “testing effectiveness”. But since it’s management’s job to understand the product and to decide when the product ships, a too-low defect detection percentage suggests the possibility of development or testing problems, unaware management, or a rash shipping decision. Testers don’t decide whether or when to ship the product; that’s management’s responsibility. In other words: Defect Detection Percentage—to the degree that it has any validity at all—measures management effectiveness.
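For what it’s worth, the metric is usually calculated along these lines; here is a minimal sketch, assuming the common formulation (the figures are invented for illustration):

```python
# A sketch of "Defect Detection Percentage" as it is commonly formulated:
# bugs found before release, divided by all bugs eventually known
# (bugs found before release plus bugs reported afterwards).
def defect_detection_percentage(found_before_release, found_after_release):
    total = found_before_release + found_after_release
    return 100.0 * found_before_release / total if total else None

# Invented figures: 90 bugs found during development, 10 reported from the field.
print(defect_detection_percentage(90, 10))  # 90.0
```

Notice that the denominator grows only after the product has shipped, which is exactly why the number reflects the shipping decision at least as much as it reflects the testing.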

On Green

Tuesday, July 7th, 2015

A little while ago, I took a look at what happens when a check runs red. Since then, comments and conversations with colleagues have emphasized this point from the post: it’s overwhelmingly common first to doubt the red result, and then to doubt the check. For some testers, a red check can provoke something close to panic, because it takes away a green check’s comforting—even narcotic—confirmation that Everything Is Going Just Fine.

Skepticism about any kind of test result is reasonable, of course. Before delivering painful news, it’s natural and responsible for a tester to examine the evidence for it carefully. All software projects—and all decisions about quality—are to some degree loaded with politics and emotions. This is normal. When a tester’s technical and social skills are strong, and self-esteem is high, those political and emotional considerations are manageable. When we encounter a red check—a suggestion that there might be a problem in the product—we must be prepared for powerful feelings, potential controversy, and cognitive dissonance all around. When people feel politically or emotionally vulnerable, the cognitive dissonance can start to overwhelm the desire to investigate the problem. Several colleagues have recalled circumstances in which intermittent red checks were considered sufficiently pesky by someone on the project team—even by testers themselves, on occasion—that the checks were ignored or disabled, as one might do with a cooking detector.

So what happens when checks return “green” results?

As my colleague James Bach puts it, checks are like motion detectors around the boundaries of our attention. When the check runs green, it’s easy to remain relaxed. The alarm doesn’t sound; the emergency lighting doesn’t come on; the dog doesn’t bark. If we’re insufficiently attentive and skeptical, every green check helps to confirm that everything is okay.

Kirk and Miller identified a big problem with confirmation:

Most of the technology of “confirmatory” non-qualitative research in both the social and natural sciences is aimed at preventing discovery. When confirmatory research goes smoothly, everything comes out precisely as expected. Received theory is supported by one more example of its usefulness, and requires no change. As in everyday social life, confirmation is exactly the absence of insight. In science, as in life, dramatic new discoveries must almost by definition be accidental (“serendipitous”). Indeed, they occur only in consequence of some mistake.

Kirk, Jerome, and Marc L. Miller. Reliability and Validity in Qualitative Research (Qualitative Research Methods). Thousand Oaks, CA: Sage Publications, 1985.

It’s the relationship between our checks and our models of them that matters here. When we have unjustified trust in our checks, we have the opposite of the problem we have with the cooking detector: we’re unlikely to notice that the alarm doesn’t go off when it should. That is, we don’t pay attention. The good news is that being inattentive is optional. We can choose to hold on to the possibility that something might be wrong with our checks, and to treat the absence of red checks as meta-information: a suspicious silence, instead of a comforting one. The responsible homeowner checks the batteries on the smoke alarm, and the savvy explorer knows when to say “The forest is quiet tonight… maybe too quiet.”

By putting variation into our testing, we rescue ourselves from the possibility that our checks are too narrow, too specific, or focused on too few kinds of risk. If you’re aware of the possibility that your alarm clock might fail to wake you, you’re more likely to take alternative measures to avoid sleeping too long.
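As a minimal sketch of what that variation might look like in an automated check (the function, rule, and values here are invented for illustration), compare a check that always exercises one fixed input with one that varies its data on each run:

```python
import random

# Hypothetical product behaviour to check; the rule and values are invented.
def shipping_fee(order_total):
    return 0 if order_total >= 100 else 8

# A narrow check: one fixed input, one specific expected output. It can run
# green forever while problems lurk at values it never exercises.
def check_fixed():
    assert shipping_fee(150) == 0

# A varied check: different inputs on every run, deliberately including values
# near the boundary, so that a silent (green) run means a little more.
def check_varied(runs=20):
    for _ in range(runs):
        total = random.choice([0, 1, 99, 100, 101, random.randint(0, 500)])
        expected = 0 if total >= 100 else 8
        assert shipping_fee(total) == expected, f"unexpected fee for total {total}"

check_fixed()
check_varied()
```

Neither check replaces a tester’s attention, but the varied one at least gives the silence a chance to be broken.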

Valuable conversations with James Bach and Chris Tranter contributed to this post.

On Scripting

Saturday, July 4th, 2015

A script, in the general sense, is something that constrains our actions in some way.

In common talk about testing, there’s one fairly specific and narrow sense of the word “script”—a formal sequence of steps that are intended to specify behaviour on the part of some agent—the tester, a program, or a tool. Let’s call that “formal scripting”. In Rapid Software Testing, we also talk about scripts as something more general, in the same kind of way that some psychologists might talk about “behavioural scripts”: things that direct, constrain, or program our behaviour in some way. Scripts of that nature might be formal or informal, explicit or tacit, and we might follow them consciously or unconsciously. Scripts shape the ways in which people behave, influencing what we might expect people to do in a scenario as the action plays out.

As James Bach says in the comments to our blog post Exploratory Testing 3.0, “By ‘script’ we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.” (my emphasis, there)

When I’m driving to a party out in the country, the list of directions that I got from the host scripts me. Many other things script me too. The starting time of the party—combined with cultural norms that establish whether I should be very prompt or fashionably late—prompts me to leave home at a certain time. The traffic laws and the local driving culture condition my behaviour and my interactions with other people on the road. The marked detour along the route scripts me, as do the weather and the driving conditions. My temperament and my current emotional state script me too. In this more general sense of “scripting”, any activity can become heavily scripted, even if it isn’t written down in a formal way.

Scripts are not universally bad things, of course. They often provide compelling advantages. Scripts can save cognitive effort; the more my behaviour is scripted, the less I have to think, do research, make choices, or get confused. In my driving example, a certain degree of scripting helps me to get where I’m going, to get along with other drivers, and to avoid certain kinds of trouble. Still, if I want to get to the party without harm to myself or other people, I must bring my own agency to the task and stay vigilant, present, and attentive, making conscious and intentional choices. Scripts might influence my choices, and may even help me make better choices, but they should not control me; I must remain in control. Following a script means giving up some engagement and responsibility for that part of the action.

From time to time, testing might include formal testing—testing that must be done in a specific way, or to check specific facts. On those occasions, formal scripting—especially the kind of formal script followed by a machine—might be a reasonable approach for enabling certain kinds of tasks and managing them successfully. A highly scripted approach could be helpful for rote activities like operating the product following explicitly declared steps and then checking for specific outputs. A highly scripted approach might also enable or extend certain kinds of variation—randomizing data, for example. But there are many other activities in testing: learning about the product, designing a test strategy, interviewing a domain expert, recognizing a new risk, investigating a bug—and dealing with problems in formally scripted activities. In those cases, variability and adaptation are essential, and an overly formal approach is likely to be damaging, time-consuming, or outright impossible. Here’s something else that is almost never formally scripted: the behaviour of normal people using software.
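To make that concrete, here is a minimal sketch of a formally scripted, machine-followed check: explicitly declared steps, one specific expected output, and a variant that randomizes its data. The product interface here (FakeShop, login, add_to_cart, cart_total) is invented for illustration:

```python
import random

# A stand-in "product" so the sketch runs on its own; in real life this would be
# the application under test, driven through whatever interface or tool is available.
class FakeShop:
    UNIT_PRICE = 19.99

    def __init__(self):
        self.items = []

    def login(self, user, password):
        self.user = user

    def add_to_cart(self, sku, quantity):
        self.items.append((sku, quantity))

    def cart_total(self):
        return round(sum(quantity * self.UNIT_PRICE for _, quantity in self.items), 2)

# A formally scripted check: explicitly declared steps, one specific expected output.
def scripted_cart_check(app):
    app.login("test_user", "test_password")    # Step 1: fixed action
    app.add_to_cart("SKU-1234", quantity=2)    # Step 2: fixed data
    total = app.cart_total()                   # Step 3: observe one specific output
    assert total == 39.98, f"expected 39.98, got {total}"

# The same script can extend variation a little by randomizing its data, while the
# steps and the form of the check stay fixed.
def scripted_cart_check_randomized(app):
    quantity = random.randint(1, 10)
    app.login("test_user", "test_password")
    app.add_to_cart("SKU-1234", quantity=quantity)
    expected = round(quantity * FakeShop.UNIT_PRICE, 2)
    total = app.cart_total()
    assert total == expected, f"expected {expected}, got {total}"

scripted_cart_check(FakeShop())
scripted_cart_check_randomized(FakeShop())
```

The script handles the rote part; noticing that the price is wrong in the first place, or that the cart misbehaves in conditions the script never declares, still depends on a person exploring.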

Notice on the one hand that formal testing is, by its nature, highly scripted; most of the time, scripting limits or even prevents exploration by constraining variation. On the other hand, if you want to make really good decisions about what to test formally, how to test formally, and why to test formally, it helps enormously to learn about the product in unscripted and informal ways: conversation, experimentation, investigation… So excellent scripted testing and excellent checking are rooted in exploratory work. They begin with exploratory work and depend on exploratory work. To use language as Harry Collins might, scripted testing is parasitic on exploration.

We say that any testing worthy of the name is fundamentally exploratory. We say that to test a product means to evaluate it by learning about it through experimentation and exploration. To explore a product means to investigate it, to examine it, to create and travel over maps and models of it. Testing includes studying the product, modeling it, questioning it, making inferences about it, operating it, observing it. Testing includes reporting, which itself includes choosing what to report and how to contextualize it. We believe these activities cannot be encoded in explicit procedural scripting in the narrow sense that I mentioned earlier, even though they are all scripted to some degree in the more general sense. Excellent testing—excellent learning—requires us to think and to make choices, which includes thinking about what might be scripting us, and deciding whether to control those scripts or to be controlled by them. We must remain aware of the factors that are scripting us so that we can manage them, taking advantage of them when they help and resisting them when they interfere with our mission.