Blog Posts for the ‘Context’ Category

(At Least) Four Things for Testers To Do in Planning Meetings

Wednesday, October 18th, 2017

There’s much talk these days of DevOps, and Agile development, and “shift left”. Apparently, in these process models, it’s a revelation that testers can do more than test a built product, and that testers can and should be involved at every step of development.

In Rapid Software Testing, that’s not exactly news. From the beginning, we’ve rejected the idea that the product has to be complete, or has to pass some kind of “quality gate” or meet “acceptance criteria” before we start testing. We welcome the opportunity to test anything that anyone is willing to give us. We’ll happily do testing at any time from the moment someone has an idea for a product until long after the product has been released.

When testers are invited to planning meetings, there’s clearly no product to test. So what are we there for?

We’re there to learn. Testing is evaluating a product by learning about it through exploration and experimentation. And at the meeting, there is indeed a product to test. Running code is not the only kind of product we can test—not by a long shot. Ideas, designs, documents, drawings, and prototypes are products too. We can explore them, and perform thought experiments on them—and we can learn about them and evaluate them.

At the meeting, we’re there to learn about the product; to learn about the technology; to learn about the contexts in which the product will be used; to learn about plans for building the product. Our role is to become aware of all of the sources of information that might aid in our testing, and in development of the product generally. We’re there to find out about risks that threaten the value of the product in the short and long term, and about problems that might threaten the on-time, successful completion of the product.

We’re there to advocate for testability. Testability might happen by accident, without our help. It’s the role of a responsible tester to make sure that testability happens intentionally, by design. Note that testability is not just about stuff that’s intrinsic to the product. There are factors in the project, in our notions of value, and in our understanding of the risk gap that influence testability. Testability is also subjective with respect to us, our knowledge and skills, and our relationship to the team. So part of our jobs during preparation for development is to ask for the help we’ll need to make ourselves more powerful testers.

We’re there to challenge. Other people are in roles oriented towards building the product. They are focused on synthesis, and envisioning success. If they’re designers, they might be focused on helping the user to accomplish a task, on efficiency, or effectiveness, or on esthetics. If they’re business people, they might be focused on accomplishing some business goal, or meeting a deadline. Developers are often focused more on the details than on the big picture. All of those people may be anxious to declare and meet a definition of “done”.

The testing role is to think critically about the product and the project; to ask how we might be fooling ourselves. We’re tilted towards asking good questions instead of getting “the right answer”; towards analysis more than synthesis; towards skepticism and suspicion more than optimism; towards anticipating problems more than seeking solutions. We can do those other things, but when we do, we pop for that moment out of a testing role and into a building role.

As testers, we’re trying to notice problems in what people are talking about in the meetings. We’re trying to identify obstacles that might hinder the user’s task; ways in which the product might be ineffective, inefficient, or unappealing. We’re trying to recognize how the business goal might not be met, or how the deadline could be blown. We’re alternating between small details and the big picture. We’re trying to figure out how our definition of done might be inadequate; how we might be fooling ourselves into believing we’re done when we’re not. We’re here to challenge the idea that something is okay when it might not be okay.

We’re there to establish our roles as testers. A role is a heuristic that helps in managing time, focus, and responsibility. The testing role is a commitment to perform valuable and necessary services: to focus on discovering problems, ideally early when they’re small, so that they can be prevented from turning into bigger problems later; to build a product and a project that affords rapid, inexpensive discovery and learning; and to challenge the ideas and artifacts that represent what we think we know about the product and its design. These tasks are socially, psychologically, emotionally, and politically difficult. Unless we handle them gracefully, our questioning, problem-focused role will be seen as merely disruptive, rather than an essential part of the process of building something excellent.

In Rapid Software Testing, we don’t claim that someone must be in the testing role, or must have the job title “tester”. We do believe that having someone responsible for the testing role helps to put focus on the task of providing helpful feedback. This should be a service to the project, not an obstacle. It requires us to maintain close social distance while maintaining a good deal of critical distance.

Of course, the four things that I’ve mentioned here can be done in any development model. They can be done not only in planning meetings, but at any time when we are engaging with others, at any time in the product’s development, at any level of granularity or formality. DevOps and Agile and “shift left” are context. Testing is always testing.

Some related posts:

What Exploratory Testing Is Not (Part 2): After-Everything-Else Testing

Exploratory Testing and Review

Exploratory Testing is All Around You

Testers Don’t Prevent Problems

What Is A Tester?

Testing is…

A Context-Driven Approach to Automation in Testing

Sunday, January 31st, 2016

(We interrupt the previously-scheduled—and long—series on oracles for a public service announcement.)

Over the last year James Bach and I have been refining our ideas about the relationships between testing and tools in Rapid Software Testing. The result is this paper. It’s not a short piece, because it’s not a light subject. Here’s the abstract:

There are many wonderful ways tools can be used to help software testing. Yet, all across industry, tools are poorly applied, which adds terrible waste, confusion, and pain to what is already a hard problem. Why is this so? What can be done? We think the basic problem is a shallow, narrow, and ritualistic approach to tool use. This is encouraged by the pandemic, rarely examined, and absolutely false belief that testing is a mechanical, repetitive process.

Good testing, like programming, is instead a challenging intellectual process. Tool use in testing must therefore be mediated by people who understand the complexities of tools and of tests. This is as true for testing as for development, or indeed as it is for any skilled occupation from carpentry to medicine.

You can find the article here. Enjoy!

Rising Against the Rent-Seekers

Monday, August 25th, 2014

At CAST 2014, a quiet, modest, thoughtful, and very experienced man named James Christie gave a talk called “Standards: Promoting Quality or Restricting Competition?”. The talk followed on from his tutorial at EuroSTAR 2013 on working with auditors—James is a former auditor himself—and from his blogs on software standards over the years.

James’ talk introduced to our community the term rent-seeking. Rent-seeking is the act of using political means—the exercise of power—to obtain wealth without creating wealth. One form of rent-seeking is using regulations or standards in order to create or manipulate a market for consulting, training, and certification.

James’ CAST presentation galvanized several people in attendance to respond to ISO Standard 29119, the most recent rent-seeking scheme by a very persistent group of certificationists and standards promoters. Since the ISO standard on standards requires—at least in theory—consensus from industry experts, some people proposed a petition to demonstrate opposition and the absence of consensus amongst skilled testers. I have signed this petition, and I urge you to read it, and, if you agree, to sign it too.

Subsequently, a publication named Professional Tester published—under an anonymous byline—a post about the petition, with the provocative title “Book burners threaten (old) new testing standard”. Presumably such (literally) inflammatory language was meant as clickbait. Ordinarily such things would do little to foster thoughtful discussion about the issues, but it prompted some quite thoughtful reactions. Here’s one example; here’s another. Meanwhile, if the author wishes to characterize me as a book burner, here are (selected) contents of my library relevant to software testing. Even the lamest testing books (and some are mighty lame) have yet to be incinerated.

In the body text, the anonymous author mischaracterises the petition and its proponents, of which I am one. “Their objection,” (s)he says, “is that not everyone will agree with what the standard says: on that criterion nothing would ever be published.” I might not agree with what the standard says, but that’s mostly a side issue for the purposes of this post. I disagree with what the authors of the standard attempt to do with it.

1) To prescribe expensive, time-consuming, and wasteful focus on bloated process models and excessive documentation. My concern here is that organizations and institutions will engage in goal displacement: expending money, time and resources on demonstrating compliance with the standard, rather than on actually testing their products and services. Any kind of work presents opportunity cost; when you’re doing something, most of the time it prevents you from doing something else. Every minute that a tester spends on wasteful documentation is a minute that the tester cannot fulfill the overarching mission of testing: learning about the product, with an emphasis on discovering important problems that threaten value or safety, so that our clients can make informed decisions about problems and risks.

I am not objecting here to documentation, as the calumny from Professional Tester suggests. I am objecting to excessive and wasteful documentation. Ironically, the standard itself provides an example: the current version of ISO 29119-1 runs to 64 pages; 29119-2 has 68 pages; and 29119-3 has 138 pages. If those pages follow the pattern of earlier drafts, or of most other ISO documents, you have a long, pointless, and sleep-inducing read ahead of you. Want a summary model of the testing process? Try this example of what the rent-seekers propose as their model of testing work. Note the model’s similarity to that of an (overly complex and poorly architected) computer program.

2) To set up an unnecessary market for training, certification, and consultancy in interpreting and applying the standard. The primary tactic here is to instill the fear of being de-certified. We’ve been here before, as shown in this post from Tom DeMarco (date uncertain, but it seems to have been written prior to 2000).

Rent-seeking is of the essence, and we’ve been here before in another sense: this was one of the key goals of the promulgators of the ISEB and ISTQB.

The well-informed reader will note that the list of organizations behind those schemes and the members of the ISO 29119 international working group look strikingly similar.

If the working group happens to produce a massive and opaque set of documents, and you’re in an environment that claims conformance to the 29119 standards, and you want to get some actual testing work done, you’ll probably find it helpful to hire a consultant to help you understand them, or to help defend you from charges that you were not following the standard. Maybe you’ll want training and certification in interpreting the standard—services that the authors’ consultancies are primed to offer, with extra credibility because they wrote the standards! Good thing there are no ethical dilemmas around all of this.

3) To use the ISO’s standards development process to help suppress dissent. If you want to be on the international working group, it’s a commitment to six days of non-revenue work, somewhere in the world, twice a year. The ISO/IEC does not pay for travel expenses. Where have international working group meetings been held? According to the Web site, meetings seem to have been held in Seoul, South Korea (2008); Hyderabad, India (2009); Niigata, Japan (2010); Mumbai, India (2011); Seoul, South Korea (2012); and Wellington, New Zealand (2013). Ask yourself these questions:

  • How many independent testers or testing consultants from Europe or North America have that kind of travel budget?

  • What kinds of consultants might be more likely to obtain funding for this kind of travel?

  • Who benefits from the creation of a standard whose opacity demands a consultant to interpret or to certify?

Meanwhile, if you join one of the local working groups, there are two ways that the group arrives at consensus.

  • By reaching broad agreement on the content. (Consensus, by the way, does not mean unanimity—that everyone agrees with the content. It would be closer to say that in a consensus-based decision-making process, everyone agrees that they can live with the content.) But, if you can’t get to that, there’s another strategy.

  • By attrition. If your interest is in promulgating an unwieldy and opaque standard, there will probably be objectors. When there are, wait them out until they get frustrated enough to leave the decision-making process. Alan Richardson describes his experience with ISEB in this way.

In light of that, ask yourself these questions:

  • How many independent consultants have the time and energy to attend local working groups, often during otherwise billable hours?

  • What kinds of consultants might be more likely to support attendance at local working groups?

  • Who benefits from the creation of a standard that needs a consultant to interpret or to certify?

4) To undermine the role of skill in testing, and the reputations of people who discuss and promote it. “The real reason the book burners want to suppress it is that they don’t want there to be any standards at all,” says the polemicist from Professional Tester. I do want there to be standards for widgets and for communication protocols, but not for complex, cognitive, context-sensitive intellectual work. There should be standards for designed things that are intended to work together, but I’m not at all sure there should be mandated standards for how to do design. S/he goes on: “Effective, generic, documented systematic testing processes and methods impact their ability to depict testing as a mystic art and themselves as its gurus.” Far from treating testing as a mystic art, appealing to things like “intuition” and “experience-based techniques”, my community has been trying to get to the heart of testing skills, flexible and responsive coverage reporting, tacit and explicit knowledge, and the premises of the way we do testing. I’ve seen no such effort to dig deeper into these subjects—and to demystify them—from the rent-seekers.

Unlike the anonymous author at Professional Tester, I am willing to stand behind my work, my opinions, and my reputation by signing my name and encouraging comments. Feel free.

—Michael B.

What Do You Mean By “Arguing Over Semantics”?

Wednesday, April 3rd, 2013

Commenting on testing and checking, one correspondent responds:

“To be honest, I don’t care what these types of verification are called be it automated checking or manual testing or ministry of John Cleese walks. What I would like to see is investment and respect being paid to testing as a profession rather than arguing with ourselves over semantics.”

My very first job in software development was as a database programmer at a personnel agency. Many times when I wrote a bit of new code, I got a reality check: the computer always did exactly what I said, and not necessarily what I meant. The difference was something that I experienced as a bug. Sometimes the things that I told the computer were consistent with what I meant to tell it, but the way I understood something and the way my clients understood something was different. In that case, the difference was something that my clients experienced as a bug, even though I didn’t, at first. The issue was usually that my clients and I didn’t agree on what we said or what we meant. That wasn’t out of ignorance or ill-will. The problem was often that my clients and I had shallow agreement on a concept. A big part of the job was refining our words for things—and when we did that, we often found that the conversation refined our ideas about things too. Those revelations (Eric Evans calls them “knowledge crunching”) are part of the process of software development.
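
To make the say-versus-mean gap concrete, here’s a tiny sketch—entirely invented, not from that personnel-agency system:

```python
# The computer does exactly what we say, not what we mean. Asked for the
# average of 5 and 6, this code "says" floor division, so it answers 5.
def average(a, b):
    return (a + b) // 2   # said: integer division; meant: (a + b) / 2

print(average(5, 6))  # 5   -- exactly what was said
print((5 + 6) / 2)    # 5.5 -- what the client probably meant
```

The fix is trivial; noticing that there’s a difference between what was said and what was meant is the hard, human part.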

As the only person on my development team, I was also responsible for preparing end-user documentation for the program. My spelling and grammar could be impeccable, and spelling and grammar checkers could check my words for syntactic correctness. When my description of how to use the program was vague, inaccurate, or imprecise, the agents who used the application would get confused, or would make mistakes, or would miss out on something important. There was a real risk that the company’s clients wouldn’t get the candidates they wanted, or that some qualified person wouldn’t get a shot at a job. Being unclear had real consequences for real people.

A few years later, my friend Dan Spear—at the time, Quarterdeck’s chief scientist, and formerly the principal programmer of QEMM-386—accepted my request for some lessons in assembly language programming. He began the first lesson while we were both sitting back from the keyboard. “Programming a computer,” he began, “is the most humbling thing that you can do. The computer is like a mirror. It does exactly what you tell it to do, and in doing that, it reflects any sloppiness in your thinking or in your way of expressing yourself.”

I was a program manager (a technical position) for the company for four years. Towards the end of my tenure, we began working on an antivirus product. One of the product managers (“product manager” was a marketing position) wanted to put a badge on the retail box: “24 hour support response time!” In a team meeting, we technical people made it clear that we didn’t provide 24-hour monitoring of our support channels. The company’s senior management clearly had no intention of staffing or funding 24-hour support, either. We were in Los Angeles, and the product was developed in Israel. It took development time—sometimes hours, but sometimes days—to analyse a virus and figure out ways to detect and to eradicate it. Nonetheless, the marketing guy (let’s call him Mark) continued to insist that that’s what he wanted to put on the box. One of the programming liaisons (let’s call him Paul) spoke first:

Paul: “I doubt that some of the problems we’re going to see can be turned around in 24 hours. Polymorphic viruses can be tricky to identify and pin down. So what do you mean by 24-hour response time?”

Mark: “Well, we’ll respond within 24 hours.”

Paul: “With a fix?”

Mark: “Not necessarily, but with a response.”

Paul: “With a promise of a fix? A schedule for a fix?”

Mark: “Not necessarily, but we will respond.”

Paul: “What does it mean to respond?”

Mark: “When someone calls in, we’ll answer the phone.”

Sam (a support person): “We don’t have people here on the weekends.”

Mark: “Well, 24 hours during the week.”

Sam: “We don’t have people here before 7:00am, or after 5:00pm.”

Mark: “Well… we’ll put someone to check voicemail as soon as they get in… and, on the weekends… I don’t know… maybe we can get someone assigned to check voicemail on the weekend too, and they can… maybe, uh… send an email to Israel. And then they can turn it around.”

At this point, as the program manager for the product, I’d had enough. I took a deep breath, and said, “Mark, if you put ’24-hour response time’ on the box, I will guarantee that that will mislead some people. And if we mislead people to take advantage of them, knowing that we’re doing it, we’re lying. And if they give us money because of a lie we’re telling, we’re also stealing. I don’t think our CEO wants to be the CEO of a lying, stealing company.”

There’s a common thread that runs through these stories: they’re about what we say, about what we mean, and about whether we say what we mean and mean what we say. That’s semantics: the relationships between words and meaning. Those relationships are central to testing work.

If you feel yourself tempted to object to something by saying “We’re arguing about semantics,” try a macro expansion: “We’re arguing about what we mean by the words we’re choosing,” which can then be shortened to “We’re arguing about what we mean.” If we can’t settle on the premises of a conversation, we’re going to have an awfully hard time agreeing on conclusions.

I’ll have more to say on this tomorrow.

Premises of Rapid Software Testing, Part 3

Thursday, September 27th, 2012

Over the last two days, I’ve published the premises of the Rapid Software Testing classes and methodology, as developed by James Bach and me. The first set addresses the nature of Rapid Testing’s engagement with software development—an ambitious activity, performed by fallible humans for other fallible humans, under conditions of uncertainty and time pressure. The second set addresses the nature of testing as an investigative activity focused on understanding the product and discovering problems that threaten its value. Today I present the last three premises, which deal with our relationship to our clients and to quality.

6. We commit to performing credible, cost-effective testing, and we will inform our clients of anything that threatens that commitment. Rapid Testing seeks the fastest, least expensive testing that completely fulfills the mission of testing. We should not suggest million dollar testing when ten dollar testing will do the job. It’s not enough that we test well; we must test well given the limitations of the project. Furthermore, when we are under constraints that may prevent us from doing a good job, testers must work with the client to resolve those problems. Whatever we do, we must be ready to justify and explain it.

7. We will not knowingly or negligently mislead our clients and colleagues. This ethical premise drives a lot of the structure of Rapid Software Testing. Testers are frequently the target of well-meaning but unreasonable or ignorant requests by their clients. We may be asked to suppress bad news, to create test documentation that we have no intention of using, or to produce invalid metrics to measure progress. We must politely but firmly resist such requests unless, in our judgment, they serve the better interests of our clients. At minimum we must advise our clients of the impact of any task or mode of working that prevents us from testing, or creates a false impression of the testing.

8. Testers accept responsibility for the quality of their work, although they cannot control the quality of the product. Testing requires many interlocking skills. Testing is an engineering activity requiring considerable design work to conceive and perform. Like many other highly cognitive jobs, such as investigative reporting, piloting an airplane, or programming, it is difficult for anyone not actually doing the work to supervise it effectively. Therefore, testers must not abdicate responsibility for the quality of their own work. By the same token, we cannot accept responsibility for the quality of the product itself, since it is not within our span of control. Only programmers and their management control that. Sometimes testing is called “QA.” If so, we choose to think of it as quality assistance (an idea due to Cem Kaner) or quality awareness, rather than quality assurance.

Premises of Rapid Software Testing, Part 2

Wednesday, September 26th, 2012

Yesterday I published the first three premises that underlie the Rapid Software Testing methodology developed and taught by James Bach and me. Today’s two are on the nature of “test” as an activity—a verb, rather than a noun—and the purpose of testing as we see it: understanding the product and imparting that understanding to our clients, with emphasis on problems that threaten the product’s value.

4. A test is an activity; it is a performance, not an artifact. Most testers will casually say that they “write tests” or that they “create test cases.” That’s fine, as far as it goes. That means they have conceived of ideas, data, procedures, and perhaps programs that automate some task or another; and they may have represented those ideas in writing or in program code. Trouble occurs when any of those things is confused with the ideas they represent, and when the representations become confused with actually testing the product. This is a fallacy called reification, the error of treating abstractions as though they were things. Until some tester engages with the product, observes it and interprets those observations, no testing has occurred. Even if you write a completely automatic checking process, the results of that process must be reviewed and interpreted by a responsible person.
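
As a minimal, invented sketch of that last point—an automatic checking process whose output is not yet testing:

```python
# An automated check: the machine can compare an output against a coded
# expectation, but the verdict still has to be reviewed and interpreted.
# The function and data here are hypothetical, for illustration only.

def check_invoice_total(line_amounts, reported_total):
    """Return 'pass' or 'fail' by comparing sums to the cent."""
    expected = round(sum(line_amounts), 2)
    return "pass" if expected == round(reported_total, 2) else "fail"

print(check_invoice_total([19.99, 5.01], 25.00))
# A "pass" doesn't mean the product is fine (the coded expectation itself
# may be wrong); a "fail" doesn't necessarily mean it's broken. Until a
# responsible person interprets the result, no testing has occurred.
```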

5. Testing’s purpose is to discover the status of the product and any threats to its value, so that our clients can make informed decisions about it. There are people that have other purposes in mind when they use the word “test.” For some, testing may be a ritual of checking that basic functions appear to work. This is not our view. We are on the hunt for important problems. We seek a comprehensive understanding of the product. We do this in support of the needs of our clients, whoever they are. The level of testing necessary to serve our clients will vary. In some cases the testing will be more formal and simple, in other cases, informal and elaborate. In all cases, testers are suppliers of vital information about the product to those who must make decisions about it. Testers light the way.

I’ll continue with the last three premises of Rapid Software Testing tomorrow.

Premises of Rapid Software Testing, Part 1

Tuesday, September 25th, 2012

In February of 2012, James Bach and I got together for a week of work one-on-one, face-to-face—something that happens all too rarely. We worked on a number of things, but the principal outcome was a statement of the premises on which Rapid Software Testing—our classes and our methodology—are based. In deference to Twitter-sized attention spans like mine, I’ll post the premises over the next few days. Here’s the preamble and the first three points:

These are the premises of the Rapid Software Testing methodology. Everything in the methodology derives in some way from this foundation. These premises derive from our experience, study, and discussions over a period of decades. They have been shaped by the influence of two thinkers above all: Cem Kaner and Jerry Weinberg, both of whom have worked as programmers, managers, social scientists, authors, teachers, and of course, testers. (We do not claim that Cem or Jerry will always agree with James or me, or with each other. Sometimes they will disagree. We are not claiming their endorsement here, but instead we are gratefully acknowledging their positive impact on our thinking and on our work. We urge thinking testers everywhere to study the writings and ideas of these two men.)

1. Software projects and products are relationships between people, who are creatures both of emotion and rational thought. Yes, there are technical, physical, and logical elements as well, and those elements are very substantial. But software development is dominated by human aspects: politics, emotions, psychology, perception, and cognition. A project manager may declare that any given technical problem is not a problem at all for the business. Users may demand features they will never use. Your fabulous work may be rejected because the programmer doesn’t like you. Sufficiently fast performance for a novice user may be unacceptable to an experienced user. Quality is always value to some person who matters. Product quality is a relationship between a product and people, never an attribute that can be isolated from a human context.

2. Each project occurs under conditions of uncertainty and time pressure. Some degree of confusion, complexity, volatility, and urgency besets each project. The confusion may be crippling, the complexity overwhelming, the volatility shocking, and the urgency desperate. There are simple reasons for this: novelty, ambition, and economy. Every software project is an attempt to produce something new, in order to solve a problem. People in software development are eager to solve these problems. At the same time, they often try to do a whole lot more than they can comfortably do with the resources they have. This is not any kind of moral fault of humans. Rather, it’s a consequence of the so-called “Red Queen” effect from evolutionary theory (the name for which comes from Through the Looking Glass): you must run as fast as you can just to stay in the same place. If your organization doesn’t run with the risk, your competitors will—and eventually you will be working for them, or not working at all.

3. Despite our best hopes and intentions, some degree of inexperience, carelessness, and incompetence is normal. This premise is easy to verify. Start by taking an honest look at yourself. Do you have all of the knowledge and experience you need to work in an unfamiliar domain, or with an unfamiliar product? Have you ever made a spelling mistake that you didn’t catch? Which testing textbooks have you read carefully? How many academic papers have you pored over? Are you up to speed on set theory, graph theory, and combinatorics? Are you fluent in at least one programming language? Could you sit down right now and use a de Bruijn sequence to optimize your test data? Would you know when to avoid using it? Are you thoroughly familiar with all the technologies being used in the product you are testing? Probably not—and that’s okay. It is the nature of innovative software development work to stretch the limits of even the most competent people. Other testing and development methodologies seem to assume that everyone can and will do the right thing at the right time. We find that incredible. Any methodology that ignores human fallibility is a fantasy. By saying that human fallibility is normal, we’re not trying to defend it or apologize for it, but we are pointing out that we must expect to encounter it in ourselves and in others, to deal with it compassionately, and make the most of our opportunities to learn our craft and build our skills.
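
For the curious: a de Bruijn sequence B(k, n) packs every possible length-n string over a k-symbol alphabet into a single cyclic sequence, which can shrink test data dramatically. Here’s a sketch of the standard construction—and knowing when not to use it (for instance, when the overlapping inputs would mask stateful interactions) is exactly the kind of judgment the paragraph above is pointing at:

```python
def de_bruijn(k, n):
    """Shortest cyclic sequence containing every length-n string over
    the alphabet 0..k-1 (standard Lyndon-word construction)."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(str(s) for s in sequence)

# All 8 possible 3-bit inputs covered by 8 symbols (read cyclically)
# instead of 24 symbols for the 8 separate strings:
print(de_bruijn(2, 3))  # "00010111"
```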

I’ll continue with more Rapid Software Testing premises tomorrow.

I Might Be Wrong (But Not For Me)

Tuesday, March 6th, 2012

Jerry Weinberg tells a story (yes, it’s me; I’m telling yet another Jerry Weinberg story) of meeting an old friend who looked distraught.

“What’s the matter?” Jerry asked.

The fellow replied, “Well, I’m kind of shellshocked. My wife just left me.”

“Was that a surprise?”

“Yes, it really was,” the fellow said. “I mean, we had had some problems, but I thought they were all settled.”

Jerry paused for a moment. Then he said, “Nothing is ever settled.”

Several years after hearing that story I recognized its power as a general systems law. Obviously, I didn’t discover it, but I did name it. I call it “The Unsettling Rule”: Nothing is ever settled.

In Lessons Learned in Software Testing by Kaner, Bach, and Pettichord, Lesson 145 is “Use the IEEE Standard 829 for Test Documentation”. Lesson 146, on the facing page, is “Don’t Use the IEEE Standard 829”. When the book was published, some reviewers said “What’s the problem with these guys? They can’t even get it together to tell a consistent story!” Others, including me, thought that this pair of pages in particular was wonderful. It underscored the degree to which issues in the world of software testing are not settled, the degree to which our craft is a long dialogue in which there are many voices to be heard, many options to be discussed, and many contexts to be considered.

The difference between the context-driven school (or approach; there’s now apparently disagreement about whether it’s a school or an approach!) and other schools or approaches is that these disagreements can get aired in public. There are some fundamental principles on which we agree, and there are some other things on which we don’t agree. Whatever else happens, in this community, we try to make sure that there’s no fake consensus. This is alarming and disturbing, sometimes, to some people, and it can be stressful to the participants. But when it comes up, it’s a hallmark of our community that we try to deal with it. It helps to keep us sharp, and it helps to keep us honest.

Recently I wrote a blog post in which I took the position that the often-used pass-vs.-fail ratio is an invalid and misleading measurement. To summarize the post, I said, “At best, if everyone ignores it entirely, it’s simply playing with numbers. Otherwise, producing a pass/fail ratio is irresponsible, unethical, and unprofessional… The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to a number means reducing its relationship with people to a number. By extension, that means reducing people to numbers too. So to irresponsible, unethical, and unprofessional, we can add unscientific and inhumane.”

I recognize that, coming from someone who claims to be context-driven, that’s pretty extreme stuff. Yet, in its form, it’s consistent with one of those pages or the other in Lessons Learned in Software Testing (with some omissions, which I’ll address shortly). It is also consistent with a set of principles that James Bach and I espouse as part of our Rapid Software Testing class:

We will not knowingly or negligently mislead our clients and colleagues. This ethical premise drives a lot of the structure of Rapid Software Testing. Testers are frequently the target of well-meaning but unreasonable or ignorant requests by their clients. We may be asked to suppress bad news, to create test documentation that we have no intention of using, or to produce invalid metrics to measure progress. We must politely but firmly resist such requests unless, in our judgment, they serve the better interests of our clients. At minimum we must advise our clients of the impact of any task or mode of working that prevents us from testing, or creates a false impression of the testing.

To me, that statement is both in tension with and consistent with several of the principles of the context-driven school, the first and second (“The value of any practice depends on its context” and “There are good practices in context, but there are no best practices”) and the seventh (“Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.”)

Pass-vs.-fail ratios, to me, fly in the face of one of the context-driven community’s “principles in action”: “Metrics that are not valid are dangerous.”

Cem Kaner disagrees with the position expressed in my post. It seems to me that Cem’s disagreement hangs on the degree of danger and our reactions to it. I hold that in practical contexts, pass-vs.-fail ratios are so dangerous that, for almost all cases, they cross over the line into “unethical”, like giving the car keys to someone who is obviously drunk, or like planting land mines near a community well, even though in some rare contexts, such things could be done in good faith and without harm. Cem’s position seems to be (and I welcome correction, if it’s warranted) that although pass-vs.-fail ratios are exemplary of dangerous metrics, they’re not unethical.

Let’s start with two points that I’d like to make about the “unethical” label. One is that my ethical sense is personal, and so are the views posted on my blog. Although I’m happy when other people share them, unless otherwise stated, I don’t represent the view of any community, including my own. I don’t make claims to universal ethics. Second, Cem refers to “using the accusation of unethical as a way of shutting down discussion of whether an idea (unethical!) was any good or not.” I’m not using it that way. I have no intention whatsoever of shutting down debate (as if I could in any case!). Unless claimed otherwise, I am stating personal principles; not Right and Wrong, but right and wrong for me. I don’t know of any agency (other than society) who can make claims of Right or Wrong, and even then claims seem always context-specific.

Whether pass-vs.-fail ratios are wrong or Wrong, they’re certainly wrong for me, wrong enough that I’m uncomfortable with using them on the job. I’m sufficiently uncomfortable that I’m usually going to decline to provide them, just as I would not accept a job in which I was obliged to shoot people. Other people might choose to become mercenaries or to go to war for their countries; I’d be a conscientious objector. That wrongness is relative too, of course. It’s subject to the Relative Rule: that any abstract X is X to some person, at some time. I can only warrant my own ethical stance for the moment. My position on some issues has changed over the years, courtesy of some pleasant and unpleasant experiences. I’m not currently aware of things that might cause my stand to change in the future, but I have to leave the possibility open.

So, is providing pass vs. fail rates unethical? On reflection, I have to say reluctantly, yeah, I think so; not absolutely, but in most practical circumstances. For me, the crucial test is in the last of Cem’s questions about ethics: “Are you helping someone else lie, cheat, steal, intimidate, or cause harm?” My answer is that I see a great deal of risk—and admittedly risk is only potential harm—that I will be aiding the client in some form of oppression or deception, either to himself or to his superiors. (The latter is a situation that I have been in before, with pass-vs.-fail ratios at the centre of the story in a project associated with a $33 million loss.) Most of the time, providing pass-vs.-fail ratios is a test activity that I would stop immediately, using the “mission rejected” stopping heuristic (one that I hadn’t noted until Cem himself pointed it out).

Cem doesn’t provide any contexts in which pass-vs.-fail ratios might be useful, but as a context-driven tester, it’s my obligation to accept his critique and his challenge, and consider some contexts in which I might use them. (This is the omission from my post that I mentioned above, and it’s the way that the controversy was handled in Lessons Learned: with a serving of context.) I present them in order from the least plausible to the most plausible.

“Your daughter will die” or “we’ll shoot this dog.” If someone employs a threat of harm to some person or being or something of value, I have to evaluate the relative damage afforded by providing the measure or not.

When mandated by force of law. If I were on the witness stand, and a lawyer asked me, “What were the pass-vs.-fail ratios at release time for this project,” I’d be required by law to respond. I can imagine a likely way it would play out, too: “92.7%, but I’d also like to make it clear that—” “No further questions, Your Honour.”

If I provided the data with all of the appropriate disclaimers AND I were sure that the disclaimer would be heard. If the client (and the client’s client, and so forth) were to relay the data and the disclaimer reliably to the point where the data would be used, I might be persuaded to provide the data. But I’d have to weigh that against the risk that I was wrong about the disclaimer being heard. Moreover, in my professional judgement, it would be wasting my client(s)’s time.

As a placebo. I might give a pass vs. fail ratio long enough to convince my client that it’s not helpful or necessary, while doing other things to test well and provide her with other forms of reliable information. I’d remain pretty uncomfortable with dispensing the sugar pills, though, and would work at ways of getting around it.

In the course of demonstrating that pass-vs.-fail ratios are a bad idea. In some contexts, pass-vs.-fail ratios provide what Kirk and Miller call quixotic reliability. That is, the measurement seems to correlate with other measurements of the state of the project. I might provide pass-vs.-fail ratios long enough to show a divergence between that data and other measures of project or product health.

If I were aware that the person receiving the data was in possession of all the contextual information that I believe they needed to put it to appropriate and non-harmful use. We use this in one of the exercises in our class, based on a bug from an actual product. We present a very specific set of tests that are the same in every material way but for two variables. The total domain space for these variables in combination is a set with 2304 elements. When used in a test that covers all of these elements, 510 provide a “fail” result. All of the test cases are of the same kind, and our students know that those test cases are comparable for the purposes that they’re considering. In that case, that kind of ratio in that kind of context has some value in describing that kind of coverage. So there might be some pedagogical or rhetorical value to reporting a pass-vs.-fail ratio there. Interestingly, the root of the problem is a data type problem in a single line of code. That helps to illuminate the discussion of “one bug or 510?” which in turn illuminates how bug counts and failure counts aren’t well correlated. It also helps to illuminate opportunity cost in paying overmuch attention to this problem when there are many other things that we might test.
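
To show the shape of that exercise—with an invented bug and an invented domain, not the actual classroom example—here is how a single-line data-type problem can manufacture hundreds of “failures”:

```python
# One bug, many "failures": a single truncation error inflates the failure
# count, so a pass-vs.-fail ratio counts symptoms, not problems.
# All names, ranges, and the bug itself are hypothetical.

def buggy_total(quantity, unit_price_cents):
    total = quantity * unit_price_cents
    return total & 0xFFFF   # the one bad line: result truncated to 16 bits

domain = [(q, p) for q in range(1, 49)          # 48 quantities...
          for p in range(1000, 49000, 1000)]    # ...x 48 prices = 2304 cases
failures = sum(1 for q, p in domain
               if buggy_total(q, p) != q * p)   # oracle: untruncated arithmetic
print(f"{failures} of {len(domain)} checks fail")  # hundreds fail; one bug
```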

To me, the real challenge is in coming up with a case in which this invalid, dangerous metric in its most common applications might be used for good. In the contexts where they’re commonly discussed and used—overwhelmingly commonly, in my view—pass-vs.-fail ratios are used to express the quality of testing, the health of the project, or the readiness of the product. In those contexts, the risk of misuse, whether intentional or inadvertent, is high—like placing a loaded gun with the safety off in a crowded subway car. As I’ve heard Cem say before, “I’d like to call them an Industry Worst Practice, but being context-driven, I can’t.” Once again, Cem has reminded me of why I can’t commit to the “unethical” charge absolutely and in all cases. He’s provided me with a challenge and an opportunity to sharpen my analysis, and I thank him for that.

Postscript, March 28, 2012: In private correspondence and conversation, Cem suggested a different interpretation of a paragraph from this post that I quoted above to provide context for this post. In order to ward off that interpretation, here’s how I might write that paragraph today:

“The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to this invalid number without additional information means reducing the product’s relationship with people to this invalid number. By extension when this invalid number is being used to evaluate people, that means reducing people to this invalid number too. So to irresponsible, unethical, and unprofessional, in this case we could add unscientific and inhumane.”

To be clear: these two posts have not been a blanket condemnation of all measurement, but of a particular metric that fails spectacularly when subjected to the tests of construct validity and reasonable and foreseeable side effects in Kaner and Bond’s Software Engineering Metrics: What Do They Measure and How Do We Know?. Pass vs. fail is not an imperfect metric; this is a metric that has no discernable construct validity to me (or even to Cem). I’ve both experienced and seen pain and systematic deception with this metric at the centre of it. In this, it’s not like imperfect financial figures that are generated by legitimate companies subject to scrutiny by regulators, by auditors, by shareholders, and by markets. It’s more like financial forecasting data dreamed up by Bernie Madoff. I don’t mind dealing with imperfect but plausibly valid information; that’s all a tester ever gets to do, really. But if Bernie Madoff were to ask me to lend my credibility to his models, data, or business practices, I’d feel personally bound to decline that particular request.

Should Testers Play Planning Poker?

Wednesday, October 26th, 2011

My colleague and friend Eric Jacobson, who recently (as I write) did a bang-up job on his first conference presentation at STAR West 2011, asks a question in response to this blog post from 2006. (I like it when people reflect on an issue for a few years.) Eric asks:

You are suggesting it may not make sense for testers to give time-based estimates to their teams, but what about relative estimates? Let’s say a Rapid Software Tester is asked to participate in Planning Poker (relative-based story estimation) on an Agile Scrum team. I’ve always considered this a golden opportunity. Are you suggesting said tester may want to refuse to participate in the Planning Poker?

Having observed Planning Poker in action, I’m conflicted. Estimating anything is always a bit of a dodgy business, even at the best of times. That’s especially true for investigation and in particular for discovery. (I’ve written about some of the problems with estimation here and in subsequent posts, and with how those problems pertain to testing here.) Yet Planning Poker may be one way to get a good deal closer to the best of times. I like the idea of testers hearing what’s going on in planning sessions, and of offering perspective on the possible implications of work or change. On the other hand, at Planning Poker sessions I’ve observed or participated in, testers are often pressured to lower their numbers. In an environment where there’s trust, there tends to be much less pressure; in an environment where there’s less trust, I’d take pressure to lower the estimate as a test result with several possible interpretations. (I leave those interpretations as an exercise for the reader, but don’t stop until you get to five, at least.)

In any case, some fundamental problems remain: First, testing is oriented towards discovering things, not building things. At the root of it all, any estimate of how long it will take to test something is like estimating how long it will take you to evaluate someone’s ability to speak Spanish (which I wrote about here), and discovering problems in their ability to express themselves. If you already know something or can reasonably anticipate it, that helps a lot, and the Planning Poker approach (among many others) can help with that to some degree.

The second problem is that there’s not necessarily symmetry between the effort in creating something and the effort in testing it. A function or feature that takes very little effort to program might take an enormous amount of effort to test. What kinds of variation could we put into data, workflow, timing, platform dependencies and interactions, scenarios, and so forth? Meanwhile, a feature that takes significant amounts of programming effort could take almost no time to test (since “programming effort” could include an enormous amount of testing effort). There are dozens of factors involved, including the amount of testing the programmers do as they code; what kind of review is being done; what the scope of the change is; when particular discoveries get made (during “development time” or “testing time”); the skill of the parties involved; the testability of the product under test; how buggy the finished feature is (in which case there will be more time needed for investigation and reporting)… Planning Poker doesn’t solve the asymmetry problem, but it provides a venue for discussing it and getting started on sorting it out.
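
As a back-of-the-envelope sketch of that asymmetry (every dimension name and size below is invented for illustration):

```python
import math

# A feature that took a day to program can still imply a large test space:
# a handful of modest variation dimensions multiply quickly.
dimensions = {
    "input data classes": 12,
    "workflows": 5,
    "timing / sequencing variations": 4,
    "platforms / browsers": 6,
    "interacting features": 8,
}
combinations = math.prod(dimensions.values())
print(f"{combinations:,} distinct conditions")  # 11,520 -- before we even
# consider scenarios, configurations, or investigation time for bugs found
```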

The third problem, closely related to the second, is this idea that all testing work associated with developing something must and shall happen within the same iteration. Testing never ends; it only stops. So it’s folly to think that all testing for a given amount of programming work can always fit into the same iteration in which the work is done. I’d argue that we need a more nuanced perspective and more options than that. The decision as to how much testing we’ll need is informed by many factors. Paradoxically, we’ll need some testing to help reveal and inform our notions of how much testing we’ll need.

I understand the desire to close the book on a development story within the sprint. I often—even usually—share that desire. Yet many kinds of testing work must respond to development work, and in such cases the development work has to be complete in some lesser sense than “fully tested”. Many kinds of confirmatory checking work, it seems to me, can be done within the same sprint as the programming work; no problem there. Yet it seems to me that other kinds of testing can reasonably wait for subsequent sprints—indeed, must wait for subsequent sprints, unless we’d like to have programmers stop all programming work altogether after a certain day in the sprint. Let me give you an example: in big banks, some kinds of transactions take several days to wend their way through batch processes that are run overnight. The testing work associated with that can be simulated, for sure (indeed, one would hope that most of such work would be simulated), but only at the expense of some loss of realism. For the test, whether the realism is important or not is always an open question with a fallible answer. Instead of making sure that there’s NO testing debt, consider reasonable, small, and sustainable amounts of testing debt that span iterations. Agile can be about actual agility, instead of dogma.
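
To illustrate the simulation trade-off in that example—with an entirely hypothetical pipeline—here’s the kind of injected-clock sketch I have in mind:

```python
from datetime import date, timedelta

# Simulating a multi-day batch pipeline with an injected clock: the check
# runs in milliseconds, but we've traded away the realism of the actual
# overnight runs. Whether that matters is an open, fallible question.
class FakeClock:
    def __init__(self, start):
        self.today = start
    def advance_one_night(self):
        self.today += timedelta(days=1)

def run_batches(transaction, clock,
                stages=("capture", "clearing", "settlement")):
    for stage in stages:                        # each stage "runs" overnight
        transaction["history"].append((clock.today, stage))
        clock.advance_one_night()
    return transaction

txn = run_batches({"id": 1, "history": []}, FakeClock(date(2011, 10, 24)))
for day, stage in txn["history"]:
    print(day, stage)   # three stages across three simulated days
```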

So… If playing Planning Poker is part of the context, go for it. It’s a heuristic approach to getting people to consider testing more consciously and thoughtfully, and there’s something to that. It’s oriented towards estimating things in a more comprehensible time frame, and in digestible chunks of task and effort. Planning Poker is fallible, and one approach among many possible approaches. Like everything else, its usefulness depends mostly on the people using it, and how they use it.

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, a pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL of a project management site) answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask if checking is being done automatically by the programmers at lower levels, where it tends to be fast, cheap, easy, useful, and timely—or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, less timely, and more tedious. (For a sketch of the kind of low-level check I mean, see the example after this list.)
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, pages 97-99), in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?
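
Here’s the kind of low-level, machine-decidable check mentioned in the list above, as a minimal sketch with invented names and values:

```python
import unittest

# Low-level automated checking: fast, cheap, and close to the code.
# Checks like this are well suited to machines; the exploration and
# interpretation around them are not.

def normalize_phone(raw):
    """Keep only the digits of a phone number (hypothetical function)."""
    return "".join(ch for ch in raw if ch.isdigit())

class NormalizePhoneChecks(unittest.TestCase):
    def test_strips_punctuation(self):
        self.assertEqual(normalize_phone("(416) 555-0199"), "4165550199")

    def test_empty_input(self):
        self.assertEqual(normalize_phone(""), "")

if __name__ == "__main__":
    unittest.main()
```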

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail”. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.