Archive for the ‘Context’ Category

Premises of Rapid Software Testing, Part 3

Thursday, September 27th, 2012

Over the last two days, I’ve published the premises of the Rapid Software Testing classes and methodology, as developed by James Bach and me. The first set addresses the nature of Rapid Testing’s engagement with software development—an ambitious activity, performed by fallible humans for other fallible humans, under conditions of uncertainty and time pressure. The second set addresses the nature of testing as an investigative activity focused on understanding the product and discovering problems that threaten its value. Today I present the last three premises, which deal with our relationship to our clients and to quality.

6. We commit to performing credible, cost-effective testing, and we will inform our clients of anything that threatens that commitment. Rapid Testing seeks the fastest, least expensive testing that completely fulfills the mission of testing. We should not suggest million dollar testing when ten dollar testing will do the job. It’s not enough that we test well; we must test well given the limitations of the project. Furthermore, when we are under constraints that may prevent us from doing a good job, testers must work with the client to resolve those problems. Whatever we do, we must be ready to justify and explain it.

7. We will not knowingly or negligently mislead our clients and colleagues. This ethical premise drives a lot of the structure of Rapid Software Testing. Testers are frequently the target of well-meaning but unreasonable or ignorant requests by their clients. We may be asked to suppress bad news, to create test documentation that we have no intention of using, or to produce invalid metrics to measure progress. We must politely but firmly resist such requests unless, in our judgment, they serve the better interests of our clients. At minimum we must advise our clients of the impact of any task or mode of working that prevents us from testing, or creates a false impression of the testing.

8. Testers accept responsibility for the quality of their work, although they cannot control the quality of the product. Testing requires many interlocking skills. Testing is an engineering activity requiring considerable design work to conceive and perform. Like many other highly cognitive jobs, such as investigative reporting, piloting an airplane, or programming, it is difficult for anyone not actually doing the work to supervise it effectively. Therefore, testers must not abdicate responsibility for the quality of their own work. By the same token, we cannot accept responsibility for the quality of the product itself, since it is not within our span of control. Only programmers and their management control that. Sometimes testing is called “QA.” If so, we choose to think of it as quality assistance (an idea due to Cem Kaner) or quality awareness, rather than quality assurance.

Premises of Rapid Software Testing, Part 2

Wednesday, September 26th, 2012

Yesterday I published the first three premises that underlie the Rapid Software Testing methodology developed and taught by James Bach and me. Today’s two are on the nature of “test” as an activity—a verb, rather than a noun—and the purpose of testing as we see it: understanding the product and imparting that understanding to our clients, with emphasis on problems that threaten the product’s value.

4. A test is an activity; it is a performance, not an artifact. Most testers will casually say that they “write tests” or that they “create test cases.” That’s fine, as far as it goes. That means they have conceived of ideas, data, procedures, and perhaps programs that automate some task or another; and they may have represented those ideas in writing or in program code. Trouble occurs when any of those things is confused with the ideas they represent, and when the representations become confused with actually testing the product. This is a fallacy called reification, the error of treating abstractions as though they were things. Until some tester engages with the product, observes it and interprets those observations, no testing has occurred. Even if you write a completely automatic checking process, the results of that process must be reviewed and interpreted by a responsible person.

5. Testing’s purpose is to discover the status of the product and any threats to its value, so that our clients can make informed decisions about it. There are people that have other purposes in mind when they use the word “test.” For some, testing may be a ritual of checking that basic functions appear to work. This is not our view. We are on the hunt for important problems. We seek a comprehensive understanding of the product. We do this in support of the needs of our clients, whoever they are. The level of testing necessary to serve our clients will vary. In some cases the testing will be more formal and simple, in other cases, informal and elaborate. In all cases, testers are suppliers of vital information about the product to those who must make decisions about it. Testers light the way.

I’ll continue with the last three premises of Rapid Software Testing tomorrow.

Premises of Rapid Software Testing, Part 1

Tuesday, September 25th, 2012

In February of 2012, James Bach and I got together for a week of work one-on-one, face-to-face—something that happens all too rarely. We worked on a number of things, but the principal outcome was a statement of the premises on which Rapid Software Testing—our classes and our methodology—are based. In deference to Twitter-sized attention spans like mine, I’ll post the premises over the next few days. Here’s the preamble and the first three points:

These are the premises of the Rapid Software Testing methodology. Everything in the methodology derives in some way from this foundation. These premises derive from our experience, study, and discussions over a period of decades. They have been shaped by the influence of two thinkers above all: Cem Kaner and Jerry Weinberg, both of whom have worked as programmers, managers, social scientists, authors, teachers, and of course, testers. (We do not claim that Cem or Jerry will always agree with James or me, or with each other. Sometimes they will disagree. We are not claiming their endorsement here, but instead we are gratefully acknowledging their positive impact on our thinking and on our work. We urge thinking testers everywhere to study the writings and ideas of these two men.)

1. Software projects and products are relationships between people, who are creatures both of emotion and rational thought. Yes, there are technical, physical, and logical elements as well, and those elements are very substantial. But software development is dominated by human aspects: politics, emotions, psychology, perception, and cognition. A project manager may declare that any given technical problem is not a problem at all for the business. Users may demand features they will never use. Your fabulous work may be rejected because the programmer doesn’t like you. Sufficiently fast performance for a novice user may be unacceptable to an experienced user. Quality is always value to some person who matters. Product quality is a relationship between a product and people, never an attribute that can be isolated from a human context.

2. Each project occurs under conditions of uncertainty and time pressure. Some degree of confusion, complexity, volatility, and urgency besets each project. The confusion may be crippling, the complexity overwhelming, the volatility shocking, and the urgency desperate. There are simple reasons for this: novelty, ambition, and economy. Every software project is an attempt to produce something new, in order to solve a problem. People in software development are eager to solve these problems. At the same time, they often try to do a whole lot more than they can comfortably do with the resources they have. This is not any kind of moral fault of humans. Rather, it’s a consequence of the so-called “Red Queen” effect from evolutionary theory (the name for which comes from Through the Looking Glass): you must run as fast as you can just to stay in the same place. If your organization doesn’t run with the risk, your competitors will—and eventually you will be working for them, or not working at all.

3. Despite our best hopes and intentions, some degree of inexperience, carelessness, and incompetence is normal. This premise is easy to verify. Start by taking an honest look at yourself. Do you have all of the knowledge and experience you need to work in an unfamiliar domain, or with an unfamiliar product? Have you ever made a spelling mistake that you didn’t catch? Which testing textbooks have you read carefully? How many academic papers have you pored over? Are you up to speed on set theory, graph theory, and combinatorics? Are you fluent in at least one programming language? Could you sit down right now and use a de Bruijn sequence to optimize your test data? Would you know when to avoid using it? Are you thoroughly familiar with all the technologies being used in the product you are testing? Probably not—and that’s okay. It is the nature of innovative software development work to stretch the limits of even the most competent people. Other testing and development methodologies seem to assume that everyone can and will do the right thing at the right time. We find that incredible. Any methodology that ignores human fallibility is a fantasy. By saying that human fallibility is normal, we’re not trying to defend it or apologize for it, but we are pointing out that we must expect to encounter it in ourselves and in others, to deal with it compassionately, and make the most of our opportunities to learn our craft and build our skills.
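For readers who have not met the term: a de Bruijn sequence of order n over an alphabet of k symbols is a cyclic string in which every possible length-n string appears exactly once. The sketch below is an illustration only and is not part of the premises. It uses the well-known Lyndon-word construction; the function name `de_bruijn` and the two-symbol alphabet in the demo are choices made for this example.

```python
def de_bruijn(alphabet, n):
    """Return a de Bruijn sequence of order n over the given alphabet.

    Read cyclically, every length-n string over the alphabet appears
    exactly once. Built by concatenating Lyndon words whose length
    divides n, in lexicographic order.
    """
    k = len(alphabet)
    a = [0] * (k * n)   # scratch buffer for the word under construction
    sequence = []       # indices into the alphabet

    def db(t, p):
        if t > n:
            # Keep the prefix only when its period p divides n
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


if __name__ == "__main__":
    # Every 3-bit pattern occurs once in this 8-character cyclic string
    print(de_bruijn("01", 3))
```

Used as input to a field, a sequence like this covers every n-symbol combination in the shortest possible string. Whether that coverage is worth anything for a given product is a separate judgment, which is what the question about knowing when to avoid it is getting at.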

I’ll continue with more Rapid Software Testing premises tomorrow.

I Might Be Wrong (But Not For Me)

Tuesday, March 6th, 2012

Jerry Weinberg tells a story (yes, it’s me; I’m telling yet another Jerry Weinberg story) of meeting an old friend who looked distraught.

“What’s the matter?” Jerry asked.

The fellow replied, “Well, I’m kind of shellshocked. My wife just left me.”

“Was that a surprise?”

“Yes, it really was,” the fellow said. “I mean, we had had some problems, but I thought they were all settled.”

Jerry paused for a moment. Then he said, “nothing is ever settled.”

Several years after hearing that story I recognized its power as a general systems law. Obviously, I didn’t discover it, but I did name it. I call it “The Unsettling Rule”: Nothing is ever settled.

In Lessons Learned in Software Testing by Kaner, Bach, and Pettichord, Lesson 145 is “Use the IEEE Standard 829 for Test Documentation”. Lesson 146, on the facing page, is “Don’t Use the IEEE Standard 829”. When the book was published, some reviewers said “What’s the problem with these guys? They can’t even get it together to tell a consistent story!” Others, including me, thought that this pair of pages in particular was wonderful. It underscored the degree to which issues in the world of software testing are not settled, the degree to which our craft is a long dialogue in which there are many voices to be heard, many options to be discussed, and many contexts to be considered.

The difference between the context-driven school (or approach; there’s now apparently disagreement about whether it’s a school or an approach!) and other schools or approaches is that these disagreements can get aired in public. There are some fundamental principles on which we agree, and there are some other things on which we don’t agree. Whatever else happens, in this community, we try to make sure that there’s no fake consensus. This is alarming and disturbing, sometimes, to some people, and it can be stressful to the participants. But when it comes up, it’s a hallmark of our community that we try to deal with it. It helps to keep us sharp, and it helps to keep us honest.

Recently I wrote a blog post in which I took the position that the often-used pass-vs.-fail ratio is an invalid and misleading measurement. To summarize the post, I said, “At best, if everyone ignores it entirely, it’s simply playing with numbers. Otherwise, producing a pass/fail ratio is irresponsible, unethical, and unprofessional… The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to a number means reducing its relationship with people to a number. By extension, that means reducing people to numbers too. So to irresponsible, unethical, and unprofessional, we can add unscientific and inhumane.”

I recognize that, coming from someone who claims to be context-driven, that’s pretty extreme stuff. Yet, in its form, it’s consistent with one of those pages or the other in Lessons Learned in Software Testing (with some omissions, which I’ll address shortly). It is also consistent with a set of principles that James Bach and I espouse as part of our Rapid Software Testing class:

We will not knowingly or negligently mislead our clients and colleagues. This ethical premise drives a lot of the structure of Rapid Software Testing. Testers are frequently the target of well-meaning but unreasonable or ignorant requests by their clients. We may be asked to suppress bad news, to create test documentation that we have no intention of using, or to produce invalid metrics to measure progress. We must politely but firmly resist such requests unless, in our judgment, they serve the better interests of our clients. At minimum we must advise our clients of the impact of any task or mode of working that prevents us from testing, or creates a false impression of the testing.

To me, that statement is both in tension with and consistent with several of the principles of the context-driven school, the first and second (“The value of any practice depends on its context” and “There are good practices in context, but there are no best practices”) and the seventh (“Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.”)

Pass-vs.-fail ratios, to me, fly in the face of one of the “principles in action” listed at http://www.context-driven-testing.com: “Metrics that are not valid are dangerous.”

Cem Kaner disagrees with the position expressed in my post. It seems to me that Cem’s disagreement hangs on the degree of danger and our reactions to it. I hold that in practical contexts, pass-vs.-fail ratios are so dangerous that for almost all cases, they cross over the line into “unethical”, like giving the car keys to someone who is obviously drunk, or like planting land mines near a community well, even though in some rare contexts, such things could be done in good faith and without harm. Cem’s position seems to be (and I welcome correction, if it’s warranted) that although pass-vs.-fail ratios are exemplary of dangerous metrics, they’re not unethical.

Let’s start with two points that I’d like to make about the “unethical” label. One is that my ethical sense is personal, and so are the views posted on my blog. Although I’m happy when other people share them, unless otherwise stated, I don’t represent the view of any community, including my own. I don’t make claims to universal ethics. Second, Cem refers to “using the accusation of unethical as a way of shutting down discussion of whether an idea (unethical!) was any good or not.” I’m not using it that way. I have no intention whatsoever of shutting down debate (as if I could in any case!). Unless claimed otherwise, I am stating personal principles; not Right and Wrong, but right and wrong for me. I don’t know of any agency (other than society) who can make claims of Right or Wrong, and even then claims seem always context-specific.

Whether pass-vs.-fail ratios are wrong or Wrong, they’re certainly wrong for me, wrong enough that I’m uncomfortable with using them on the job. I’m sufficiently uncomfortable that I’m usually going to decline to provide them, just as I would not accept a job in which I was obliged to shoot people. Other people might choose to become mercenaries or to go to war for their countries; I’d be a conscientious objector. That wrongness is relative too, of course. It’s subject to the Relative Rule: any abstract X is X to some person, at some time. I can only warrant my own ethical stance for the moment. My position on some issues has changed over the years, courtesy of some pleasant and unpleasant experiences. I’m not currently aware of things that might cause my stand to change in the future, but I have to leave the possibility open.

So, is providing pass vs. fail rates unethical? On reflection, I have to say reluctantly, yeah, I think so; not absolutely, but in most practical circumstances. For me, the crucial test is in the last of Cem’s questions about ethics: “Are you helping someone else lie, cheat, steal, intimidate, or cause harm?” My answer is that I see a great deal of risk (and admittedly risk is only potential harm) that I will be aiding the client in some form of oppression or deception, either to himself or to his superiors. (The latter is a situation that I have been in before, with pass-vs.-fail ratios at the centre of the story in a project associated with a $33 million loss.) Most of the time, providing pass-vs.-fail ratios is a test activity that I would stop immediately, using the “mission rejected” stopping heuristic (one that I hadn’t noted until Cem himself pointed it out).

Cem doesn’t provide any contexts in which pass-vs.-fail ratios might be useful, but as a context-driven tester, it’s my obligation to accept his critique and his challenge, and consider some contexts in which I might use them. (This is the omission from my post that I mentioned above, and it’s the way that the controversy was handled in Lessons Learned: with a serving of context.) I present them in order from the least plausible to the most plausible.

“Your daughter will die” or “we’ll shoot this dog.” If someone employs a threat of harm to some person or being or something of value, I have to evaluate the relative damage afforded by providing the measure or not.

When mandated by force of law. If I were on the witness stand, and a lawyer asked me, “What were the pass-vs.-fail ratios at release time for this project,” I’d be required by law to respond. I can imagine a likely way it would play out, too: “92.7%, but I’d also like to make it clear that…” “No further questions, Your Honour.”

If I provided the data with all of the appropriate disclaimers AND I was sure that the disclaimer would be heard. If the client (and the client’s client, and so forth) were to relay the data and the disclaimer reliably to the point where the data would be used, I might be persuaded to provide the data. But I’d have to weigh that against the risk that I was wrong about the disclaimer being heard. Moreover, in my professional judgement, it would be wasting my client(s)’s time.

As a placebo. I might give a pass vs. fail ratio long enough to convince my client that it’s not helpful or necessary, while doing other things to test well and provide her with other forms of reliable information. I’d remain pretty uncomfortable with dispensing the sugar pills, though, and would work at ways of getting around it.

In the course of demonstrating that pass-vs.-fail ratios are a bad idea. In some contexts, pass-vs.-fail ratios provide what Kirk and Miller call quixotic reliability. That is, the measurement seems to correlate with other measurements of the state of the project. I might provide pass-vs.-fail ratios long enough to show a divergence between that data and other measures of project or product health.

If I were aware that the person receiving the data was in possession of all the contextual information that I believe they needed to put it to appropriate and non-harmful use. We use this in one of the exercises in our class, based on a bug from an actual product. We present a very specific set of tests that are the same in every material way but for two variables. The total domain space to put these variables in combination is a set with 2304 elements. When used in a test that covers all of these elements, 510 provide a “fail” result. All of the test cases are of the same kind, and our students know that those test cases are comparable for the purposes that they’re considering. In that case, that kind of ratio in that kind of context has some value in describing that kind of coverage. So there might be some pedagogical or rhetorical value to reporting a pass-vs.-fail ratio there. Interestingly, the root of the problem is a data type problem in a single line of code. That helps to illuminate the discussion of “one bug or 510?” which in turn illuminates how bug counts and failure counts aren’t well correlated. It also helps to illuminate opportunity cost in paying overmuch attention to this problem when there are many other things that we might test.
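The “one bug or 510?” question can be sketched in a few lines. What follows is not the bug from the class exercise. It is an invented stand-in, assuming a hypothetical function whose result is silently squeezed into an unsigned 8-bit value, exercised over two variables with 48 values each (2304 combinations). The names `add_buggy`, `add_fixed`, and `VALUES` are made up for this illustration, and the failure count it produces is not meant to match the 510 above.

```python
from itertools import product

# Hypothetical domain: 48 values per variable, so 48 * 48 = 2304 combinations
VALUES = range(0, 240, 5)


def add_buggy(a, b):
    # Invented single-line defect: the sum is stored as if it were an
    # unsigned 8-bit integer, so anything above 255 wraps around silently.
    return (a + b) & 0xFF


def add_fixed(a, b):
    # Same function with the data type problem removed
    return a + b


def count_failures(func):
    """Run every combination and count results that differ from the true sum."""
    return sum(1 for a, b in product(VALUES, repeat=2) if func(a, b) != a + b)


total = len(VALUES) ** 2

if __name__ == "__main__":
    print("combinations:", total)
    print("failures with the defect:", count_failures(add_buggy))
    print("failures after the one-line fix:", count_failures(add_fixed))
```

Many failing combinations, one root cause, one line to change: the failure count describes coverage of this particular domain, not the number of problems in the product.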

To me, the real challenge is in coming up with a case in which this invalid, dangerous metric in its most common applications might be used for good. In the contexts where they’re commonly discussed and used—overwhelmingly commonly, in my view—pass-vs.-fail ratios are used to express the quality of testing, the health of the project, or the readiness of the product. In those contexts, the risk of misuse, whether intentional or inadvertent, is high—like placing a loaded gun with the safety off in a crowded subway car. As I’ve heard Cem say before, “I’d like to call them an Industry Worst Practice, but being context-driven, I can’t.” Once again, Cem has reminded me of why I can’t commit to the “unethical” charge absolutely and in all cases. He’s provided me with a challenge and an opportunity to sharpen my analysis, and I thank him for that.


Postscript, March 28, 2012: In private correspondence and conversation, Cem suggested a different interpretation of a paragraph from this post that I quoted above to provide context for this post. In order to ward off that interpretation, here’s how I might write that paragraph today:

“The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to this invalid number without additional information means reducing the product’s relationship with people to this invalid number. By extension when this invalid number is being used to evaluate people, that means reducing people to this invalid number too. So to irresponsible, unethical, and unprofessional, in this case we could add unscientific and inhumane.”

To be clear: these two posts have not been a blanket condemnation of all measurement, but of a particular metric that fails spectacularly when subjected to the tests of construct validity and reasonable and foreseeable side effects in Kaner and Bond’s Software Engineering Metrics: What Do They Measure and How Do We Know?. Pass vs. fail is not an imperfect metric; this is a metric that has no discernable construct validity to me (or even to Cem). I’ve both experienced and seen pain and systematic deception with this metric at the centre of it. In this, it’s not like imperfect financial figures that are generated by legitimate companies subject to scrutiny by regulators, by auditors, by shareholders, and by markets. It’s more like financial forecasting data dreamed up by Bernie Madoff. I don’t mind dealing with imperfect but plausibly valid information; that’s all a tester ever gets to do, really. But if Bernie Madoff were to ask me to lend my credibility to his models, data, or business practices, I’d feel personally bound to decline that particular request.
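To make the construct-validity concern concrete, here is a minimal sketch using invented data. The `Result` record, its `blocks_release` flag, and both test runs are hypothetical; nothing here comes from a real project. It shows only the arithmetic: two runs can yield the same ratio while telling very different stories about the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    passed: bool
    blocks_release: bool = False  # hypothetical flag, for illustration only


def pass_ratio(results):
    """Fraction of results marked as passed."""
    return sum(r.passed for r in results) / len(results)


# Two invented runs: nine passing checks each, plus one failure
passing = [Result(f"check-{i}", True) for i in range(9)]
run_a = passing + [Result("tooltip has a typo", False)]
run_b = passing + [Result("customer data is lost on save", False, blocks_release=True)]

if __name__ == "__main__":
    for label, run in (("A", run_a), ("B", run_b)):
        failures = [r.name for r in run if not r.passed]
        print(label, pass_ratio(run), failures)
```

Both runs report the same ratio. The difference that matters to a client is in the description of the failure, which the number discards.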

Should Testers Play Planning Poker?

Wednesday, October 26th, 2011

My colleague and friend Eric Jacobson, who recently (as I write) did a bang-up job on his first conference presentation at STAR West 2011, asks a question in response to this blog post from 2006. (I like it when people reflect on an issue for a few years.) Eric asks:

You are suggesting it may not make sense for testers to give time-based estimates to their teams, but what about relative estimates? Let’s say a Rapid Software Tester is asked to participate in Planning Poker (relative-based story estimation) on an Agile Scrum team. I’ve always considered this a golden opportunity. Are you suggesting said tester may want to refuse to participate in the Planning Poker?

Having observed Planning Poker in action, I’m conflicted. Estimating anything is always a bit of a dodgy business, even at the best of times. That’s especially true for investigation and in particular for discovery. (I’ve written about some of the problems with estimation here and in subsequent posts, and with how those problems pertain to testing here.) Yet Planning Poker may be one way to get a good deal closer to the best of times. I like the idea of testers hearing what’s going on in planning sessions, and of offering perspective on the possible implications of work or change. On the other hand, at Planning Poker sessions I’ve observed or participated in, testers are often pressured to lower their numbers. In an environment where there’s trust, there tends to be much less pressure; in an environment where there’s less trust, I’d take pressure to lower the estimate as a test result with several possible interpretations. (I leave those interpretations as an exercise for the reader, but don’t stop until you get to five, at least.)

In any case, some fundamental problems remain: First, testing is oriented towards discovering things, not building things. At the root of it all, any estimate of how long it will take to test something is like estimating how long it will take you to evaluate someone’s ability to speak Spanish (which I wrote about here), and discovering problems in their ability to express themselves. If you already know something or can reasonably anticipate it, that helps a lot, and the Planning Poker approach (among many others) can help with that to some degree.

The second problem is that there’s not necessarily symmetry between the effort in creating something and the effort in testing it. A function or feature that takes very little effort to program might take an enormous amount of effort to test. What kinds of variation could we put into data, workflow, timing, platform dependencies and interactions, scenarios, and so forth? Meanwhile, a feature that takes significant amounts of programming effort could take almost no time to test (since “programming effort” could include an enormous amount of testing effort). There are dozens of factors involved, including the amount of testing the programmers do as they code; what kind of review is being done; what the scope of the change is; when particular discoveries get made (during “development time” or “testing time”); the skill of the parties involved; the testability of the product under test; how buggy the finished feature is (in which case there will be more time needed for investigation and reporting)… Planning Poker doesn’t solve the asymmetry problem, but it provides a venue for discussing it and getting started on sorting it out.

The third problem, closely related to the second, is this idea that all testing work associated with developing something must and shall happen within the same iteration. Testing never ends; it only stops. So it’s folly to think that all testing for a given amount of programming work can always fit into the same iteration in which the work is done. I’d argue that we need a more nuanced perspective and more options than that. The decision as to how much testing we’ll need is informed by many factors. Paradoxically, we’ll need some testing to help reveal and inform our notions of how much testing we’ll need.

I understand the desire to close the book on a development story within the sprint. I often—even usually—share that desire. Yet many kinds of testing work must respond to development work, and in such cases the development work has to be complete in some lesser sense than “fully tested”. Many kinds of confirmatory checking work, it seems to me, can be done within the same sprint as the programming work; no problem there. Yet it seems to me that other kinds of testing can reasonably wait for subsequent sprints—indeed, must wait for subsequent sprints, unless we’d like to have programmers stop all programming work altogether after a certain day in the sprint. Let me give you an example: in big banks, some kinds of transactions take several days to wend their way through batch processes that are run overnight. The testing work associated with that can be simulated, for sure (indeed, one would hope that most of such work would be simulated), but only at the expense of some loss of realism. For the test, whether the realism is important or not is always an open question with a fallible answer. Instead of making sure that there’s NO testing debt, consider reasonable, small, and sustainable amounts of testing debt that spans iterations. Agile can be about actual agility, instead of dogma.

So… If playing Planning Poker is part of the context, go for it. It’s a heuristic approach to getting people to consider testing more consciously and thoughtfully, and there’s something to that. It’s oriented towards estimating things in a more comprehensible time frame, and in digestible chunks of task and effort. Planning Poker is fallible, and one approach among many possible approaches. Like everything else, its usefulness depends mostly on the people using it, and how they use it.

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, a pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL http://www.pmhut.com) answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask if checking is being done automatically by the programmers at lower levels where it tends to be fast, cheap, easy, useful and timely—or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, and less timely—and tedious?
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, pages 97-99), in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail”. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.

The Best Tour

Thursday, June 30th, 2011

Cem Kaner recently wrote a reply to my blog post Of Testing Tours and Dashboards. One way to address the best practice issue is to go back to the metaphor and ask “What would be the best tour of London?” That question should give rise to plenty of other questions.

  • Are you touring for your own purposes, or in support of someone else’s interests? To what degree are other people interested in what you learn on the tour? Are you working for them? Who are they? Might they be a travel agency? A cultural organization? A newspaper? A food and travel show on TV? The history department of a university? What’s your information objective? Does the client want quick, practical, or deep questions answered? What’s your budget?
  • How well do you know London already?  How much would you like to leave open the possibility of new discoveries?  What maps or books or other documentation do you have to help to guide or structure your tour?  Is updating those documents part of your purpose?
  • Is someone else guiding your tour? What’s their reputation? To what extent do you know and trust them? Are they going to allow you the opportunity and the time to follow your own lights and explore, or do they have a very strict itinerary for you to follow? What might you see—or miss—as a result?
  • Are you traveling with other people? What are they interested in? To what degree do you share your discovery and learning?
  • How would you prefer to get around? By Tube, to get around quickly? By a London Taxi (which includes some interesting information from the cabbie)? By bus, so you can see things from the top deck? On foot? By tour bus, where someone else is doing all the driving and all the guiding (that’s scripted touring)?
  • What do you need to bring with you? Notepad? Computer? Mobile phone? Still camera? Video camera? Umbrella? Sunscreen? (It’s London; you’ll probably need the umbrella.)
  • How much time do you have available?   An afternoon?  A day?  A few days? A week?  A month?
  • What are you (or your clients) interested in? Historical sites? Art galleries? Food? Museums? Architecture? Churches? Shopping? How focused do you want your tour to be? Very specialized, or a little of this and a little of that? What do you consider “in London”, and what’s outside of it?
  • How are you going to organize your time? How are you going to account for time spent in active investigation and research versus moving from place to place, breaks, and eating? How are you going to budget time to collect your findings, structure and summarize your experience, and present a report?
  • How do you want to record your tour? If you’re working for a client, what kind of report do they want? A conversation? Written descriptions? Pictures? Do they want things in a specific format?

(Note, by the way, that these questions are largely structured around the CIDTESTD guidewords in the Heuristic Test Strategy Model (Customer, Information, Developer Relations, Team, Equipment and Tools, Schedule, Test Item, and Deliverables)—and that there are context-specific questions that we can add as we model and explore the mission space and the testing assignment.)

There is no best tour of London; every tour has its strengths and weaknesses. Reasonable people who think about it for a moment realize that the “best” tour of London is a) relative to some person; b) relative to that person’s purposes and interests; c) relative to what the person already knows; d) relative to the amount of time available. And such a reasonable person would be able to apply that metaphor to software testing tours too.

Common Languages Ain’t So Common

Tuesday, June 28th, 2011

A friend told me about a payment system he worked on once. In the system models (and in the source code), the person sending notification of a pending payment was the payer. The person who got that notice was called the payee. That person could designate someone else—the recipient—to pick up the money. The transfer agent would credit the account of the recipient, and debit the account of the person who sent notification—the payer, who at that point in the model suddenly became known as the sender. So, to make that clear: the payer sends email to the payee, who receives it. The sender pays money to the recipient (who accepts the payment). Got that clear? It turns out there was a logical, historical reason for all this. Everything seemed okay at the beginning of the project; there was one entity named “payer” and another named “payee”. Payer A and Payee B exchanged both email and money, until someone realized that B might give someone else, C, the right to pick up the money. Needing another word for C, the development group settled on “recipient”, and then added “sender” to the model for symmetry, even though there was no real way for A to split into two roles as B had. Uh, so far.
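To make the naming tangle concrete, here is a minimal Python sketch of the model as described. Everything in it is hypothetical: the class names, the fields, and the rule that the payee collects the money when no recipient has been designated are assumptions for illustration, not details of the actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    name: str
    balance: int = 0  # whole currency units, to keep the arithmetic simple


@dataclass
class Payment:
    payer: Party                       # sends notification of a pending payment
    payee: Party                       # receives that notification
    amount: int
    recipient: Optional[Party] = None  # whoever the payee designates to collect

    @property
    def sender(self) -> Party:
        # "sender" is not a new party, only a second name for the payer
        # that appears at transfer time.
        return self.payer

    def designate(self, someone: Party) -> None:
        self.recipient = someone

    def transfer(self) -> None:
        # Assumption: if nobody was designated, the payee collects.
        collector = self.recipient if self.recipient is not None else self.payee
        self.sender.balance -= self.amount
        collector.balance += self.amount


a, b, c = Party("A", balance=100), Party("B"), Party("C")
payment = Payment(payer=a, payee=b, amount=40)
payment.designate(c)
payment.transfer()
```

Note that `sender` and `payer` are the same object under two labels, while `payee` and `recipient` can be two different parties. The vocabulary looks symmetrical; the model is not.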

There’s a pro-certification argument that keeps coming back to the discussion like raccoons to a garage: the claim that, whatever its flaws, “at least certification training provides us with a common language for testing.” It’s bizarre enough that some people tout this rationalization; it’s even weirder that people accept it without argument. Fortunately, there’s an appropriate and accurate response: No, it doesn’t. The “common language” argument is riddled with problems, several of which on their own would be showstoppers.

  • Which certification training, specifically, gives us a common language for testing? Aren’t there several different certification tribes? Do they all speak the same language? Do they agree, or disagree on the “common language”? What if we believe certification tribes present (at best) a shallow understanding and a shallow description of the ideas that they’re describing?
  • Who is the “us” referred to in the claim? Some might argue that “us” refers to the testing “industry”, but there isn’t one. Testing is practiced in dozens of industries, each with its own contexts, problems, and jargon.
  • Maybe “us” refers to our organization, or our development shop. Yet within our own organization, which testers have attended the training? Of those, has everyone bought into the common language? Have people learned the material for practical purposes, or have they learned it simply to pass the certification exam? Who remembers it after the exam? For how long? Even if they remember it, do they always and ever after use the language that has been taught in the class?
  • While we’re at it, have the programmers attended the classes? The managers? The product owners? Have they bought in too?
  • With that last question still hanging, who within the organization decides how we’ll label things? How does the idea of a universal language for testing fit with the notion of the self-organizing team? Shouldn’t choices about domain-specific terms in domain-specific teams be up to those teams, and specific to those domains?
  • What’s the difference between naming something and knowing something? It’s easy enough to remember a label, but what’s the underlying idea? Terms of art are labels for constructs—categories, concepts, ideas, thought-stuff. What’s in and what’s out with respect to a given category or label? Does a “common language” give us a deep understanding of such things? Please, please have a look at Richard Feynman’s take on differences between naming and knowing, http://www.youtube.com/watch?v=05WS0WN7zMQ.
  • The certification scheme has representatives from over 25 different countries, and must be translated into a roughly equivalent number of languages. Who translates? How good are the translations?
  • What happens when our understanding evolves? Exploratory testing, in some literature, is equated with “ad hoc” testing, or (worse) “error guessing”. In the 1990s, James Bach and Cem Kaner described exploratory testing as “simultaneous test design, test execution, and learning”. In 2006, participants in the Workshop on Heuristic and Exploratory Techniques discussed and elaborated their ideas on exploratory testing. Each contributed a piece to a definition synthesized by Cem Kaner: “Exploratory software testing is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.” That doesn’t roll off the tongue quite so quickly, but it’s a much more thorough treatment of the idea, identifying exploratory testing as an approach, a way that you do something, rather than something that you do. Exploratory work is going on all the time in any kind of complex cognitive activity, and our understanding of the work and of exploration itself evolves (as we’ve pointed out here, and here, and here, and here, and here). Just as everyday, general-purpose languages adopt new words and ideas, so do the languages that we use in our crafts, in our communities, and with our clients.

In software development, we’re always solving new problems. Those new problems may require people to work with entirely new technological or business domains, or to bridge existing domains with new interactions and new relationships. What happens when people don’t have a common language for testing, or for anything else in that kind of development process? Answer: they work it out. As Peter Galison notes in his work on trading zones, “Cultures in interaction frequently establish contact languages, systems of discourse that can vary from the most function-specific jargons, through semispecific pidgins, to full-fledged creoles rich enough to support activities as complex as poetry and metalinguistic reflection.” Each person in a development group brings elements of his or her culture along for the ride; each project community develops its own culture and its own language.

Yes, we do need common language for testing. Anthropology shows us that meaningful language develops organically when people gather for a common purpose in a particular context. Just as we need tests that are specific to a given context, we need terms that are that way too. So instead of focusing training on memorizing glossary entries, let’s teach testers more about the relationships between words and ideas. Let’s challenge each other to ask better questions about the language we’re using, and how it might be fooling us.

More of What Testers Find

Wednesday, March 30th, 2011

Damn that James Bach, for publishing his ideas before I had a chance to publish his ideas! Now I’ll have to do even more work!

A couple of weeks back, James introduced a few ideas to me about things that testers find in addition to bugs.  He enumerated issues, artifacts, and curios.  The other day I was delighted to find an elaboration of these ideas (to which he added risks and testability issues) in his blog post called What Testers Find.  Delighted, because it notes so many important things that testers learn and report beyond bugs.  Delighted, because it gives me an opportunity and an incentive to dive into James’ ideas more deeply. Delighted, because it gives us all a chance to explore and identify a much richer view of testing than the simplistic notion that “testers find bugs”.

Despite the fact that testers find much more than bugs, let’s start with bugs.  James begins his list of what testers find by saying

Testers find bugs. In other words, we look for anything that threatens the value of the product.

How do we know that something threatens the value of the product?  The fact is, we don’t know for sure.  Quality is value to some person, and different people will have different perceptions of value.  Since we don’t own the product, the project, or the business, we can’t make absolute declarations of whether something is a bug or whether it’s worth fixing.  The programmers, the managers, and the project owner will make those determinations, and often they’re running in different directions.  Some will see a problem as a bug; some won’t.  Some won’t even see a problem. It seems like the only certain thing here is uncertainty.  So what can we testers do?

We find problems that might threaten the value of the product to some person who matters. How do we do that? We identify quality criteria–aspects of the product that provide some kind of value to customers or users that we like, or that help to defend the product from users that we don’t like, such as unethical hackers or fraudsters or thieves. If we’re doing a great job, we also account for the fact that users we do like will make mistakes from time to time. So defending value also means making the product robust to human ineptitude and imperfection. In the Heuristic Test Strategy Model (which we teach as part of the Rapid Software Testing course), we identify these quality criteria:

  • Capability (or functionality)
  • Reliability
  • Usability
  • Security
  • Scalability
  • Performance
  • Installability
  • Compatibility
  • Supportability
  • Testability
  • Maintainability
  • Portability
  • Localizability

In order to identify threats to the quality of the product, we use oracles.  Oracles are heuristic (useful, fast, inexpensive, and fallible) principles or mechanisms by which we recognize problems.  Most oracles are based on the notion of consistency.  We expect a product to be consistent with

  • History (the product’s own history, prior results from earlier test runs, our experience with the product or other products like it…)
  • Image (a reputation our development organization wants to project, our brand identity,…)
  • Comparable products (products like this one that we develop, competitors’ products, test programs or algorithms,…)
  • Claims (things that important people say about the product, requirements, specifications, user documentation, marketing material,…)
  • User expectations (what reasonable people might anticipate the product could or should do, new features, fixed bugs,…)
  • Product (behaviour of the interface and UI elements, values that should be the same in different views,…)
  • Purpose (explicitly stated uses of the product, uses that might be implicit or inferred from the product’s design, no excessive bells and whistles,…)
  • Standards (relevant published guidelines, conventions for use or appearance for products of this class or in this domain, behaviour appropriate to the local market,…)
  • Statutes (relevant laws, relevant regulations,…)

In addition to these consistency heuristics, there’s an inconsistency heuristic too:  we’d like the product to be inconsistent with patterns of problems that we’ve seen before.  Typically those problems are founded in one of the consistency heuristics listed above. Yet it’s perfectly reasonable to observe a problem and recognize it first by its familiarity. We’ve seen lots of testers do that over the years.
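One rough way to picture consistency oracles is as small functions that each compare an observation against one thing the product should be consistent with. The Python sketch below is illustrative only: the function names, the observation dictionary, and its keys are hypothetical, and only two of the oracles in the list (History and Claims) are shown. The output is a list of possible problems for a person to investigate, not a verdict, because oracles are fallible.

```python
from typing import Callable, List, Optional

# An oracle takes an observation and returns a description of a possible
# problem, or None if it notices nothing. It can be wrong either way.
Oracle = Callable[[dict], Optional[str]]


def consistent_with_history(observation: dict) -> Optional[str]:
    if observation["result"] != observation["previous_result"]:
        return "History: result differs from the earlier test run"
    return None


def consistent_with_claims(observation: dict) -> Optional[str]:
    if observation["result"] != observation["documented_result"]:
        return "Claims: result differs from what the documentation says"
    return None


def apply_oracles(observation: dict, oracles: List[Oracle]) -> List[str]:
    """Collect possible problems for a human to investigate."""
    findings = []
    for oracle in oracles:
        finding = oracle(observation)
        if finding is not None:
            findings.append(finding)
    return findings


# Hypothetical observation: the product formats a number the same way it
# did last time, but not the way the user documentation shows it.
observation = {
    "result": "1,000.50",
    "previous_result": "1,000.50",
    "documented_result": "1000.50",
}
findings = apply_oracles(
    observation, [consistent_with_history, consistent_with_claims]
)
```

Here the History oracle stays silent and the Claims oracle speaks up. Whether the product or the documentation is wrong is still a question for a person; the oracle only points at the inconsistency.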

We encourage people to come up with their own lists, or modifications to ours. You don’t have to use the Heuristic Test Strategy Model if it doesn’t work for you. You can create your own models for testing, and we actively encourage people who want to become great testers to do that. Testers find models, ways of looking at the product, the project, and testing itself, in the effort to wrestle down the complexity of the systems we’re testing and the approaches that we need to test them.

In your context, do you see a useful distinction between compatibility (playing nice with other programs that happen to co-exist on the system) and interoperability (working well with programs with which your application specifically interacts)? Put interoperability on your quality criteria list. Is accessibility for disabled users so important for your product that you want to highlight it in a separate quality criterion? Put it on your list. Recently, James noticed that explicability is a consistency heuristic that can act as an oracle too: when we see behaviour we can’t explain or make sense of, we have reason to suspect that there might be a problem. Testers find factors, relevant and material aspects of our models, products, projects, businesses, and test strategies.

When testers see some inconsistency in the product that threatens one or more of the quality criteria, we report. For the report to be relevant and meaningful, it must link quality criteria, oracles, and risk in ways that are clear, meaningful, and important to our clients. Rather than simply noticing an inconsistency, we must show why the inconsistency threatens some quality criterion for some person who matters. Establishing and describing those links in a chain of logic from the test mission to the test result is an activity that James and I call test framing. So: Testers find frames, the logical relationships between the test mission, our observations of the product, potential problems, and why we think they might be problems. James gave an example of a bug (“a list of countries in a form is missing ‘France’”). That might mean a minor usability problem based on one quality criterion, with a simple workaround (the customer trying to choose a time zone from a list of countries presented as examples; so pick Spain, which is in the same time zone). Based on another criterion like localizability, we’d perceive a more devastating problem (the customer is trying to choose a language, so despite the fact that the Web site has been translated, it won’t be presented in French, cutting our service off from a nation of 65 million people).

In finding bugs, testers find many other things too.  Excellent testing depends on our being able to identify and articulate what we find, how we find it, and how we contextualize it. That’s an ongoing process.  Testers find testing itself.

And there’s more, if you follow the link.

Exegesis Saves (Part 3) Beyond the Bromides

Sunday, January 23rd, 2011

Over the last few blog posts, some colleagues and I have been analyzing this sentence:

“In successful agile development teams, every team member takes responsibility for quality.”

Now, in one sense, it’s unfair for me to pick on this sentence, because I’ve taken it out of context. It’s not unique, though; a quick search on Google reveals lots of similar sentences:

“Agile teams work in a more collaborative and open manner which reduces the need for documentation-heavy, bureaucratic approaches. The good news is that they have a greater focus on quality-oriented, disciplined, and value-adding techniques than traditionalists do. However, the challenge is that the separation of work isn’t there any more—everyone on the team is responsible for quality, not just ‘quality practitioners’. This requires you to be willing to work closely with other IT professionals, to transfer your skills and knowledge to them and to pick up new skills and knowledge from them.”
http://www.ambysoft.com/essays/agileTesting.html

“In Agile software development, the whole team is responsible for quality, but there are many barriers to accomplishing that goal.”
http://www.rallydev.com/downloads/document/191-the-best-kept-secret-of-agile-software-quality.html

“So what does testing now need to know and do to work effectively within a team to deliver a system using an agile method? The concept of ‘the team being responsible for quality’ i.e. ‘the whole team concept’ and not just the testing team, is a key value of agile methods. Agile methods need the development team writing Unit tests and/or following Test First Design (TDD) practices (don’t confuse TDD as a test activity as in fact it is a mechanism to help with designing the code). The goal here is to get as much feedback on code and build quality as early as possible.”
http://agiletesting.com.au/agile-methodology/agile-methods-and-software-testing/

“As we have seen quality is the responsibility of every team member in an agile team, not just the developers. Every team member has a role to play in building quality in.”
http://www.catosplace.net/blogs/personal/?p=580

“The responsibility for quality was shifted to the whole team. Each of the different roles is responsible for doing some form of testing. Programmers, architects, analysts, and even managers are all intimately involved with testing activities and work closely together to achieve quality goals.”
http://www.ciol.com/resources/UserFiles/developer/Effective-utilization-of-Agile-Methods-in-QA.doc

“In traditional systems, the responsibility for quality is mainly delegated to testing teams that must make sure the code is of high quality. Agile thinking makes quality a collective responsibility of the customer, the developers and the testers all the time from the first, to the last minute of a project. The customer is involved in quality by defining acceptance tests. The developers are involved in quality by helping the customers write the tests, by writing unit tests for all the production code they write and the testers are involved by helping the developers automate acceptance (customer) tests and by extending the suite of automated tests.”
http://danbunea.blogspot.com/2008/05/chapter-6-quality-and-testing.html

“I’m an advocate of any methodology that empowers engineers to work toward higher standards of software integrity. Agile and test-driven methodologies have found a way to do that by distributing ownership and responsibility for quality. And with so many ways to make static analysis boost the type of automated testing this requires, agile is a better formula for better code that can keep up with shorter scrum cycles and produce frequently “potentially shippable” products.”
http://blog.coverity.com/development/why-go-agile/

“This then leads back to my original question. Who is responsible for quality? The answer is everyone.”
http://www.basilv.com/psd/blog/2010/who-is-responsible-for-quality

“The corollary to this rule is that testers cannot be responsible for quality; developers must be. The Agile methods put the responsibility for quality precisely where it belongs, with the developers.”
http://www.cmcrossroads.com/cm-journal-articles/9688-build-quality-in-the-agile-methods-are-right

“In a successful team, every member feels responsible for the quality of the product. Responsibility for quality cannot be delegated from one team member to another team member or function. Similarly, every team member must be a customer advocate, considering the eventual usability of the product throughout its development cycle. Quality has to be built into the plans and schedules. Use bug allotments, iterations devoted to fixing bugs, to bring down bug debt. This will reduce velocity which may provide enough slack in future iterations to reduce bug rates.”
http://www.yoursharepointexperts.com/microsoft/Documents/MSF%20for%20Agile%20Overview.pdf

So the sentence that I quoted is by no means unique. Here’s the source for it (http://www.agilejournal.com/articles/columns/column-articles/2722-whats-a-tester-without-a-qa-team). It’s an article by Lisa Crispin and Janet Gregory. You can read the rest of the article, and make your own decisions about whether the rest of it is helpful. You may decide (and I’d agree with you) that there are several worthy points in the article. You may decide that I’ve been unfair (and I’d agree with you) in excerpting the sentence without the rest of the article, especially considering that until now I’ve ignored the sentence that follows immediately, “This means that although testers bring special expertise and a unique viewpoint to the team, they are not the only ones responsible for testing or for the quality of the product.”

The latter sentence certainly clarifies the former. I would argue (and this is why I brought the whole matter up) that, in context, the second sentence renders the first unnecessary and even a distraction. Please note: my intention was not to critique the article, nor to launch a personal attack on Lisa and Janet. As I’ve said, I have no doubt that Lisa and Janet mean well, and their article contains several worthy points. My goal instead was to test this often-uttered tenet of agile lore. That’s why I didn’t reveal the source of the sentence initially; the article itself is not at issue here.

Yet the sentence, found in so many similar forms, is a bromide. My Oxford Dictionary of English says that a bromide is “a trite statement intended to soothe or placate”. As philosophers might say, the sentence doesn’t do any real explanatory work; it doesn’t make any useful distinctions. Even in context, it’s often unclear as to whether the sentence is intended to be descriptive (“this is the way things are”) or normative (“this is the way things should be”).

To highlight my greatest misgivings about the slogan, let’s restate it, replacing the word “quality” with Jerry Weinberg’s definition of it:

“In successful agile development teams, every team member takes responsibility for value to some person(s).”

Every time I see the whole-team-responsible-for-quality trope, I find myself wondering responsibility to whom, quality according to whom, and what it means to take responsibility. Figuring out what we intend to achieve, and how we are to achieve it, are among the harder problems of software development. (One can see this being played out in the Craftsmanship Skirmishes that are going on as I write.)

Here’s what would make the conversation more meaningful to me. I suggest that we who develop and write about software…

  • Consider quality not as something simple, objective, and abstract, but as something messy, subjective and very human. Quality isn’t a thing, but rather a complex set of relationships between products, people, and systems. As such, quality shouldn’t be swept under the rug of a vague slogan.
  • Remember that there are as many ways to think about quality as there are people to think about them, and that in a software development project, people’s interests will clash from time to time. Those conflicts will require some people to concede certain values in favour of other values. As Jerry Weinberg has pointed out, decisions about quality are political and emotional. Sorting out those conflicts will require us to identify who has the power to make the decision, and to contribute to empowering that person when appropriate. That in turn will require us to develop skill, courage, and confidence, tempered with patience and humility.
  • Apropos of the last point, we must keep in mind that teams are collections—or systems—of individuals. It’s a nice idea to think that the whole team makes decisions collectively, but in reality some person—the product owner—must be granted the ultimate authority to make a decision. If the team is to be successful, there’s an implicit contract: the product owner must enable, trust and empower the members of the team to design and perform their work collaboratively, in the best ways that they know how; and the members of the team must inform, trust and empower the product owner, such that she can make the best decisions possible.
  • Since quality means “value to some person(s)”, be specific about what particular dimension of value we mean, and to whom. For example, in a particular context, if we want “quality” to mean “bug prevention,” let’s say precisely that. Then let’s recognize the ways in which certain approaches towards preventing bugs might represent a threat to someone’s current interests or to their personal safety rules. If, in a particular context, we want quality to mean “problems solved for customers”, let’s say precisely that. Then let’s recognize that there are many approaches to solving problems, and that some problems might be solved by writing less software, not more. If we want “quality” to mean “many features in a product”, let’s say precisely that. Then let’s recognize how “many features” can satisfy some people—while adding complexity, development time, and expense to a product, thereby confusing and annoying other people. In other words, let’s use the word “quality” in a more careful way, which starts with deciding whether we want to use the word at all.
  • Since quality is a complex notion, we must learn not only to declare quality in terms of things that we expect and prescribe. Those things are important, to be sure. Yet we must also prepare to seek the unexpected, to make discoveries, and to respond to change.
  • Rather than striving for influence (which to me has echoes of the quality assurance or quality control mindset), testers in particular must strive to develop understanding and awareness of the system that we’re being asked to test. To me, that’s our principal product: learning about the system and reporting what we’ve learned to help the project community to gain greater understanding of the system, the problems, and the risks.
  • Consider responsibility not in terms of something people take, but as something that people accept, and also as something that people grant, share, and extend to one another. To me, that means contributing to an environment in which everyone is empowered to create and defend the value of each other’s work. That includes offering, asking for and accepting help—and dealing with unpleasant news appropriately, whether we’re delivering it or receiving it.

There have been several pleasures in working through this exercise—most notably the transpection with my colleagues on Twitter and in Skype. But there’s been another: rediscovering and re-reading the Declaration of Interdependence and George Orwell’s Politics and the English Language.

Note that I’m saying that these ideas would make the conversation more meaningful to me. You may have other thoughts, and I’d like to hear them, so I hope you’re willing to share them.