Blog Posts for the ‘Risk’ Category

Drop the Crutches

Thursday, January 5th, 2017

This post is adapted from a recent blast of tweets. You may find answers to some of your questions in the links; as usual, questions and comments are welcome.

Update, 2017-01-07: In response to a couple of people asking, here’s how I’m thinking of “test case” for the purposes of this post: Test cases are formally structured, specific, proceduralized, explicit, documented, and largely confirmatory test ideas. And, often, excessively so. My concern here is directly proportional to the degree to which a given test case or a given test strategy emphasizes these things.


I had a fun chat with a client/colleague yesterday. He proposed—and I agreed—that test cases are like crutches. I added that the crutches are regularly foisted on people who weren’t limping to start with. It’s as though before the soccer game begins, we hand all the players a crutch. The crutches then hobble them.

We also agreed that test cases often lead to goal displacement. Instead of a thorough investigation of the product, the goal morphs into “finish the test cases!” Managers are inclined to ask “How’s the testing going?” But they usually don’t mean that. Instead, they almost certainly mean “How’s the product doing?” But, it seems to me, testers often interpret “How’s the testing going?” as “Are you done those test cases?”, which ramps up the goal displacement.

Of course, “How’s the testing going?” is an important part of the three-part testing story, especially if problems in the product or project are preventing us from learning more deeply about the product. But most of the time, that’s probably not the part of the story we want to lead with. In my experience, both as a program manager and as a tester, managers want to know one thing above all:

Are there problems that threaten the on-time, successful completion of the project?

The most successful and respected testers—in my experience—are the ones that answer that question by actively investigating the product and telling the story of what they’ve found. The testers that overfocus on test cases distract themselves AND their teams and managers from that investigation, and from the problems investigation would reveal.

For a tester, there’s nothing wrong with checking quickly to see that the product can do something—but there’s not much right—or interesting—about it either. Checking seems to me to be a reasonably good thing to work into your programming practice; checks can be excellent alerts to unwanted low-level changes. But when you’re testing, showing that the product can work—essentially, demonstration—is different from investigating and experimenting to find out how it does (or doesn’t) work in a variety of circumstances and conditions. Sometimes people object, saying that they have to confirm that the product works and that they don’t have time to investigate. To me, that’s getting things backwards. If you actively, vigorously look for problems and don’t find them, you’ll get that confirmation you crave, as a happy side effect.
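To make the distinction concrete, here’s a minimal sketch of what I mean by a low-level check, in Python. The function and its behaviour are hypothetical, invented purely for illustration:

# A minimal, hypothetical example of a low-level check. It can alert us
# to an unwanted change in one known behaviour, but it demonstrates only
# what we told it to demonstrate; it investigates nothing beyond that.

def parse_quantity(text):
    """Toy function under check: parse a non-negative integer quantity."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("quantity cannot be negative")
    return value

def test_parse_quantity_accepts_simple_input():
    # Flags an unwanted change to this one known-good case...
    assert parse_quantity(" 42 ") == 42
    # ...but says nothing about huge values, junk input, locale-specific
    # digits, or anything else nobody anticipated when writing the check.

Run under a tool like pytest, a failure here is a useful alert; a pass tells you very little about how the product behaves in circumstances nobody anticipated.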

No matter what, you must prepare yourself to realize this:

Nobody can be relied upon to anticipate all of the problems that can beset a non-trivial product.

We call it “development” for a reason. The product and everything around it, including the requirements and the test strategy, do not arrive fully-formed. We continuously refine what we know about the product, and how to test it, and what the requirements really are, and all of those things feed back into each other. Things are revealed to us as we go, not as a cascade of boxes on a process diagram, but more like a fractal.

The idea that we could know entirely what the requirements are before we’ve discussed and decided we’re done seems like total hubris to me. We humans have a poor track record in understanding and expressing exactly what we want. We’re no better at predicting the future. Deciding today what will make us happy ten months—or even days—from now combines both of those weaknesses and multiplies them.

For that reason, it seems to me that any hard or overly specific “Definition of Done” is antithetical to real agility. Let’s embrace unpredictability, learning, and change, and treat “Definition of Done” as a very unreliable heuristic. Better yet, consider a Definition of Not Done Yet: “we’re probably not done until at least These Things are done”. The “at least” part of DoNDY affords the possibility that we may recognize or discover important requirements along the way. And who knows?—we may at any time decide that we’re okay with dropping something from our DoNDY too. Maybe the only thing we can really depend upon is The Unsettling Rule.

Test cases—almost always prepared in advance of an actual test—are highly vulnerable to a constantly shifting landscape. They get old. And they pile up. There usually isn’t a lot of time to revisit them. But there’s typically little need to revisit many of them either. Many test cases lose relevance as the product changes or as it stabilizes.

Many people seem prone to say “We have to run a bunch of old test cases because we don’t know how changes to the code are affecting our product!” If you have lost your capacity to comprehend the product, why believe that you still comprehend those test cases? Why believe that they’re still relevant?

Therefore: just as you (appropriately) remain skeptical about the product, remain skeptical of your test ideas—especially test cases. Since requirements, products, and test ideas are subject to both gradual and explosive change, don’t overformalize or otherwise constrain your testing to stuff that you’ve already anticipated. You WILL learn as you go.

Instead of overfocusing on test cases and worrying about completing them, focus on risk. Ask “How might some person suffer loss, harm, annoyance, or diminished value?” Then learn about the product, the technologies, and the people around it. Map those things out. Don’t feel obliged to be overly or prematurely specific; recognize that your map won’t perfectly match the territory, and that that’s okay—and it might even be a Good Thing. Seek coverage of risks and interesting conditions. Design your test ideas and prepare to test in ways that allow you to embrace change and adapt to it. Explain what you’ve learned.

Do all that, and you’ll find yourself throwing away the crutches that you never needed anyway. You’ll provide a more valuable service to your client and to your team. You and your testing will remain relevant.

Happy New Year.

Further reading:

Testing By Percentages
Very Short Blog Posts (11): Passing Test Cases
A Test is a Performance
“Test Cases Are Not Testing: Toward a Culture of Test Performance” by James Bach & Aaron Hodder (in http://www.testingcircus.com/documents/TestingTrapeze-2014-February.pdf#page=31)

The Honest Manual Writer Heuristic

Monday, May 30th, 2016

Want a quick idea for a burst of activity that will reveal both bugs and opportunities for further exploration? Play “Honest Manual Writer”.

Here’s how it works: imagine you’re the world’s most organized, most thorough, and—above all—most honest documentation writer. Your client has assigned you to write a user manual, including both reference and tutorial material, that describes the product or a particular feature of it. The catch is that, unlike other documentation writers, you won’t base your manual on what the product should do, but on what it does do.

You’re also highly skeptical. If other people have helpfully provided you with requirements documents, specifications, process diagrams or the like, you’re grateful for them, but you treat them as rumours to be mistrusted and challenged. Maybe someone has told you some things about the product. You treat those as rumours too. You know that even with the best of intentions, there’s a risk that even the most skillful people will make mistakes from time to time, so the product may not perform exactly as they have intended or declared. If you’ve got use cases in hand, you recognize that they were written by optimists. You know that in real life, there’s a risk that people will inadvertently blunder or actively misuse the product in ways that its designers and builders never imagined. You’ll definitely keep that possibility in mind as you do the research for the manual.

You’re skeptical about your own understanding of the product, too. You realize that when the product appears to be doing something appropriately, it might be fooling you, or it might be doing something inappropriate at the same time. To reduce the risk of being fooled, you model the product and look at it from lots of perspectives (for example, consider its structure, functions, data, interfaces, platform, operations, and its relationship to time; and business risk, and technical risk). You’re also humble enough to realize that you can be fooled in another way: even when you think you see a problem, the product might be working just fine.

Your diligence and your ethics require you to envision multiple kinds of users and to consider their needs and desires for the product (capability, reliability, usability, charisma, security, scalability, performance, installability, supportability…). Your tutorial will be based on plausible stories about how people would use the product in ways that bring value to them.

You aspire to provide a full accounting of how the product works, how it doesn’t work, and how it might not work—warts and all. To do that well, you’ll have to study the product carefully, exploring it and experimenting with it so that your description of it is as complete and as accurate as it can be.

There’s a risk that problems could happen, and if they do, you certainly don’t want either your client or the reader of your manual to be surprised. So you’ll develop a diversified set of ways to recognize problems that might cause loss, harm, annoyance, or diminished value. Armed with those, you’ll try out the product’s functions, using a wide variety of data. You’ll try to stress out the product, doing one thing after another, just like people do in real life. You’ll involve other people and apply lots of tools to assist you as you go.

For the next 90 minutes, your job is to prepare to write this manual (not to write it, but to do the research you would need to write it well) by interacting with the product or feature. To reduce the risk that you’ll lose track of something important, you’ll probably find it a good idea to map out the product, take notes, make sketches, and so forth. At the end of 90 minutes, check in with your client. Present your findings so far and discuss them. If you have reason to believe that there’s still work to be done, identify what it is, and describe it to your client. If you didn’t do as thorough a job as you could have done, report that forthrightly (remember, you’re super-honest). If anything got in the way of your research or made it more difficult, highlight that; tell your client what you need or recommend. Then have a discussion with your client to agree on what you’ll do next.

Did you notice that I’ve just described testing without using the word “testing”?

Very Short Blog Posts (1): “We Don’t Have Time for Testing”

Sunday, September 8th, 2013

When someone says “We don’t have time for testing”, try translating that into “We don’t have time to think critically about the product, to experiment with it, and to learn about ways in which it might fail.” Then ask if people feel okay about that.

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, a pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL http://www.pmhut.com) answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask if checking is being done automatically by the programmers at lower levels where it tends to be fast, cheap, easy, useful and timely—or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, and less timely—and tedious?
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, pages 97-99), in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail”. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.

Testing Problems Are Test Results

Tuesday, September 6th, 2011

I often do an exercise in the Rapid Software Testing class in which I ask people to catalog things that, for them, make testing harder or slower. Their lists fit a pattern I hear over and over from testers (you can see an example of the pattern in this recent question on Stack Exchange). Typical points include:

  • I’m a tester working alone with several programmers (or one of a handful of testers working with many programmers).
  • I’m under enormous time pressure. Builds are coming in continuously, and we’re organized on one- or two-week development cycles.
  • The product(s) I’m testing is (are) very complex.
  • There are many interdependencies between modules within the product, or between products.
  • I’m seeing a consistent pattern of failures specifically related to those interdependencies; the tiniest change here can have devastating impact there—or anywhere.
  • I believe that I have to run a complete regression test on every build to try to detect those failures.
  • I’m trying to cope by using automated checks, but the complexity makes the automation difficult, the program’s testing hooks are minimal at best, and frequent product changes make the whole relationship brittle.
  • The maintenance effort for the test automation is significant, at a cost to other testing I’d like to do.
  • I’m feeling overwhelmed by all this, but I’m trying to cope.

On top of that,

  • The organization in which I’m working calls itself Agile.
  • Other than the two-week iterations, we’re actually using at most two other practices associated with Agile development: (typically) daily scrums or Kanban boards.

Oh, and for extra points,

  • The builds that I’m getting are very unstable. The system falls over under the most basic of smoke tests. I have to do a lot of waiting or reconfiguring or both before I can even get started on the other stuff.

How might we consider these observations?

We could choose to interpret them as problems for testing, but we could think of them differently: as test results.

Test results don’t tell us whether something is good or bad, but they may inform a decision, or an evaluation, or more questions. People observe test results and decide whether there are problems, what the problems are, what further questions are warranted, and what decisions should be made. Doing that requires human judgement and wisdom, consideration of lots of factors, and a number of possible interpretations.

Just as for automated checks and other test results, it’s important to consider a variety of explanations and interpretations for testing meta-results—observations about testing. If we don’t do that, we risk missing important problems that threaten the quality of the testing effort, and the quality of the product, too.

As Jerry Weinberg points out in Perfect Software and Other Illusions About Testing, whatever else something might be, it’s information. If testing is, as Jerry says, gathering information with the intention of informing a decision, it seems a mistake to leave potentially valuable observations lying around on the floor.

We often run into problems when we test. But instead of thinking of them as problems for testing, we could also choose to think of them as symptoms of product or project problems—problems that testing can help to solve.

For example, when a tester feels outnumbered by programmers, or when a tester feels under time pressure, that’s a test result. The feeling often comes from the programmers generating more work and more complexity than the tester can handle without help.

Complexity, like quality, is a relationship between some person and something else. Complexity on its own isn’t necessarily a problem, but the way people react to it might be. When we observe the ways in which people react to perceived complexity and risk, we might learn a lot.

  • Do we, as testers, help people to become conscious of the risks—especially the Black Swans—that typically accompany complexity?
  • If people are conscious of risk, are they paying attention to it? Are they panicking over it? Or are they ignoring it and whistling past the graveyard? Or…
  • Are people reacting calmly and pragmatically? Are they acknowledging and dealing with the complexity of the product?
  • If they can’t make the product or the process that it models less complex, are they at least taking steps to make that product or process easier to understand?
  • Might the programmers be generating or modifying code so quickly that they’re not taking the time to understand what’s really going on with it?
  • If someone feels that more testers are needed, what’s behind that feeling? (I took a stab at an answer to that question a few years back.)

How might we figure out answers to those questions? One way might be to look at more of the test results and test meta-results.

  • Does someone perceive testing to be difficult or time-consuming? Who?
  • What’s the basis for that perception? What assumptions underlie it?
  • Does the need to investigate and report bugs overwhelm the testers’ capacity to obtain good test coverage? (I wrote about that problem here.)
  • Does testing consistently reveal consistent patterns of failure?
  • Are programmers consistently surprised by such failures and patterns?
  • Do small changes in the code cause problems that are disproportionately large or hard to find?
  • Do the programmers understand the product’s interdependencies clearly? Are those interdependencies necessary, or could they be eliminated?
  • Are programmers taking steps to anticipate or prevent problems related to interfaces and interactions?
  • If automated checks are difficult to develop and maintain, does that say something about the skill of the tester, the quality of the automation interfaces, or the scope of checks? Or about something else?
  • Do unstable builds get in the way of deeper testing?
  • Could we interpret “unstable builds” as a sign that the product has problems so numerous and serious that even shallow testing reveals them?
  • When a “stable” build appears after a long series of unstable builds, how stable is it really?

Perhaps, with the answers to those questions, we could raise even more questions.

  • What risks do those problems present for the success of the product, whether in the short term or the longer term?
  • When testing consistently reveals patterns of failures and attendant risk, what does the product team do with that information?
  • Are the programmers mandated to deliver code? Or are the programmers mandated to deliver code with a warrant that the code does what it should (and doesn’t do what it shouldn’t), to the best of their knowledge? Do the programmers adamantly prefer the latter mandate?
  • Is someone pressuring the programmers to make schedule or scope commitments that they can’t really fulfill?
  • Are the programmers and the testers empowered to push back on scope or schedule pressure when it adds to product or project risk?
  • Do the business people listen to the development team’s concerns? Are they aware of the risks that testers and programmers bring to their attention? When the development team points out risks, do managers and business people deal with them congruently?
  • Is the team working at a sustainable pace? Or are the product and the project being overwhelmed by complexity, interdependencies, fragility, and problems that lurk just beyond the reach of our development and testing effort?
  • Is the development team really Agile, in the sense of the precepts of the Agile Manifesto? Or is “agility” being used in a cargo-cult way, using practices or artifacts to mask over an incoherent project?

Testers often feel that their role is to find, investigate, and report on bugs in a running software product. That’s usually true, but it’s also a pretty limited view of what testers could test. A product can be anything that someone has produced: a program, a requirements document, a diagram, a specification, a flowchart, a prototype, a development process model, a development process, an idea. Testing can reveal information about all of those things, if we pay attention.

When seen one way, the problems that appear at the top of this article look like serious problems for testing. They may be, but they’re more than that too. When we remember Jerry’s definition of testing as “gathering information with the intention of informing a decision”, then everything that we notice or discover during testing is a test result.

(See also this discussion for an example of looking beyond the test result for possible product and project risks.)

This post was edited in small ways, for clarity, on 2017-03-11.

More of What Testers Find, Part II

Friday, April 1st, 2011

As a followup to “More of What Testers Find”, here are some more ideas inspired by James Bach’s blog post, What Testers Find. Today we’ll talk about risk. James noted that…

Testers also find risks. We notice situations that seem likely to produce bugs. We notice behaviors of the product that look likely to go wrong in important ways, even if we haven’t yet seen that happen. Example: A web form is using a deprecated HTML tag, which works fine in current browsers, but may stop working in future browsers. This suggests that we ought to do a validation scan. Maybe there are more things like that on the site.
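To make that example a little more concrete: a first, crude pass at such a validation scan might look something like the Python sketch below. The list of deprecated tags is only a sample I’ve chosen for illustration, not a complete validator.

from html.parser import HTMLParser

# A rough sketch, not a real validator: flag tags that current browsers
# still tolerate but that the HTML specification has deprecated.
DEPRECATED_TAGS = {"font", "center", "big", "strike", "tt", "acronym"}

class DeprecatedTagScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in DEPRECATED_TAGS:
            # getpos() reports the (line, column) of the tag just seen
            self.findings.append((tag, self.getpos()))

scanner = DeprecatedTagScanner()
scanner.feed("<form><center><font size='2'>Name:</font></center></form>")
for tag, (line, column) in scanner.findings:
    print(f"deprecated <{tag}> at line {line}, column {column}")

Even a toy like this points toward further testing: if one page uses deprecated markup, maybe there are more things like that on the site.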

A long time ago, James developed The Four-Part Risk Story, which we teach in the Rapid Software Testing class that we co-author. The Four-Part Risk Story is a general pattern for describing and considering risk. It goes like this:

  1. Some victim
  2. will suffer loss or harm
  3. due to a vulnerability in the product
  4. triggered by some threat.

A legitimate risk requires all four elements. A problem is only a problem with respect to some person, so if a person isn’t affected, there’s no problem. Even if there’s a flaw in a product, there’s no problem unless some person becomes a victim, suffering loss or harm. If there’s no trigger to make a particular vulnerability manifest, there’s no problem. If there’s no flaw to be triggered, a trigger is irrelevant. Testers find risk stories, and the victims, harm, vulnerabilities, and threats around which they are built.
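If it helps to see the pattern in another form, here’s a small Python sketch of the four parts as a simple data structure. This is my own illustration, not anything from the RST materials, and the example values are made up:

from dataclasses import dataclass

@dataclass
class RiskStory:
    victim: str          # some person who could be affected
    harm: str            # the loss or harm they might suffer
    vulnerability: str   # the flaw in the product
    threat: str          # the trigger that makes the flaw manifest

    def is_legitimate(self) -> bool:
        # A legitimate risk requires all four elements; if any one is
        # missing, there's no problem to talk about.
        return all([self.victim, self.harm, self.vulnerability, self.threat])

story = RiskStory(
    victim="a visitor using a future browser",
    harm="cannot submit the web form at all",
    vulnerability="the form relies on a deprecated HTML tag",
    threat="a browser update drops support for that tag",
)
assert story.is_legitimate()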

In this analysis, though, a meta-risk lurks: failure of imagination, something at which humans appear to be expert. People often have a hard time imagining potential threats, and discount the possibility or severity of threats they have imagined. People fail to notice vulnerabilities in a product, or having noticed them, fail to recognize their potential to become problems for other people. People often have trouble making the connection between inanimate objects (like nuclear reactor vessels), the commons (like the atmosphere or sea water), or intangible things (like trust) on the one hand, and people who are affected by damage to those things on the other. Excellent testers recognize that a ten-cent problem multiplied by a hundred thousand instances is a ten-thousand dollar problem (see Chapter 10 of Jerry Weinberg’s Quality Software Management, Volume 2: First Order Measurement). Testers find connections and extrapolations for risks.

In order to do all that, we have to construct and narrate and edit and justify coherent risk stories. To do that well, we must (as Jerry Weinberg put it in Computer Programming Fundamentals in 1961) develop a suspicious nature and a lively imagination. We must ask the basic questions about our products and how they will be used: who? what? when? where? why? how? and how much? We must anticipate and forestall future Five Whys by asking Five What Ifs. Testers find questions to ask about risks.

When James introduced me to his risk model, I realized that people held at least three different but intersecting notions of risk.

  1. A Bad Thing might happen. A programmer might make a coding error. A programming team might design a data structure poorly. A business analyst might mischaracterize some required feature. A tester might fail to investigate some part of the product. These are, essentially, technical risks.
  2. A Bad Thing might have consequences. The coding error could result in miscalculation that misrepresents the amount of money that a business should collect. The poorly designed data structure might lead to someone without authorization getting access to privileged information. The mischaracterized feature might lead to weeks of wasted work until the misunderstanding is detected. The failure to investigate might lead to an important problem being released into production. These are, in essence, business risks that follow from technical risks.
  3. A risk might not be a Bad Thing, but an Uncertain Thing on which the business is willing to take a chance. Businesses are always evaluating and acting on this kind of risk. Businesses never know for sure whether the Good Things about the product are sufficiently compelling for the business to produce it or for people to buy it. Correspondingly, the business might consider Bad Things (or the absence of Good Things) and dismiss them as Not Bad Enough to prevent shipment of the product.

So: Testers find not only risks, but links between technical risk and business risk. Establishing and articulating those links depend on the related skills of test framing and bug advocacy. Test framing is the set of logical connections that structure and inform a test. Bug advocacy is the skill of determining the meaning and significance of a bug, and reporting the bug in terms of potential risks and consequences that other people might have overlooked. Bug advocacy doesn’t mean jumping up and down and screaming until every bug—or even one particular bug—is fixed. It means providing context for your bug report, helping managers to understand and decide why they might choose to fix a problem, right now, later, or never.

In my travels around the world and around the Web, I observe that some people in our craft have some fuzzy notions about risk. There are at least three serious problems that I see with that.

Tests are focused on (documented) requirements. That is, test strategies are centred around making sure that requirements are checked, or (in Agile contexts) that acceptance tests derived from user stories pass. The result is that tests are focused on showing that a product can meet some requirement, typically in a controlled circumstance in which certain stated conditions assumed necessary have been met. That’s not a bad thing on its own. Risk, however, lives in places where necessary conditions haven’t been stated, where stated conditions haven’t been met, or where assumptions have been buried, unfulfilled, or inaccurate. Testing is not only about demonstrating that some instance of a requirement has been satisfied. It’s also about identifying things that threaten the successful fulfillment of that requirement. Testers find alternative ideas about risk.

Tests don’t get framed in terms of important risks. Many organizations and many testers focus on functional correctness. That can often lead to noisy testing—lots of problems reported, where those problems might not be the most important problems. Testers find ways to help prioritize risks.

Important risks aren’t addressed by tests. A focus on stated requirements and functional correctness can leave parafunctional aspects of the product in (at best) peripheral vision. To address that problem, instead of starting with the requirements, start with an idea of a Bad Thing happening. Think of a quality criterion (see this post) and test for its presence or its absence, or for problems that might threaten it. Want to go farther? My colleague Fiona Charles likes to mention “story on the front page of the Wall Street Journal” or “question raised in Parliament” as triggers for risk stories. Testers find ways of developing risk ideas.

James’ post will doubtless trigger more ideas about what testers find. Stay tuned!

P.S. I’ll be at the London Testing Gathering, Wednesday, April 6, 2011 starting at around 6:00pm. It’s at The Shooting Star pub (near Liverpool St. Station), 129 Middlesex St., London, UK. All welcome!

A Letter To The Programmer

Tuesday, September 29th, 2009

This is a letter that I would not show to a programmer in a real-life situation. I’ve often thought of bits of it at a time, and those bits come up in conversation occasionally, but not all at once.

This is based on an observation of the chat window in Skype 4.0.0.226.

Dear Programmer,

I discovered a bug today. I’ll tell you how I found it. It’s pretty easy to reproduce. There’s this input field in our program. I didn’t know what the intended limit was. It was documented somewhere, but that part of the spec got deleted when the CM system went down last week. I could have asked you, but you were downstairs getting another latte.

Plus, it’s really quick and easy to find out empirically; quicker than looking it up, quicker than asking you, even if you were here. There’s this tool called PerlClip that allows me to create strings that look like this

*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*…

As you’ll notice, the string itself tells you about its own length. The number to the left of each asterisk tells you the offset position of that asterisk in the string. (You can use whatever character you like for a delimiter, including letters and numbers, so that you can test fields that filter unwanted characters.)

It takes a handful of keystrokes to generate a string of tremendous length, millions of characters. The tool automatically copies it to the Windows clipboard, whereupon you can paste it into an input field. Right away, you get to see the apparent limit of the field; find an asterisk, and you can figure out in a moment exactly how many characters it accepts. It makes it easy to produce all kinds of strings using Perl syntax, which saves you having to write a line of Perl script to do it and another few lines to get it into the clipboard. In fact, you can give PerlClip to a less-experienced tester that doesn’t know Perl syntax at all (yet), show them a few examples and the online help, and they can get plenty of bang for the buck. They get to learn something about Perl, too. This little tool is like a keychain version of a Swiss Army knife for data generation. It’s dead handy for analyzing input constraints. It allows you to create all kinds of cool patterns, or data that describes itself, and you can store the output wherever you can paste from the clipboard. Oh, and it’s free.

You can get a copy of PerlClip here, by the way. It was written by James Bach and Danny Faught. The idea started with a Perl one-liner by Danny, and they built on each other’s ideas for it. I don’t think it took them very long to write it. Once you’ve had the idea, it’s a pretty trivial program to implement. But still, kind of a cool idea, don’t you think?
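In case you’re curious about how a self-describing string like that can be built, here’s a rough Python sketch of the idea. It isn’t PerlClip’s actual code, and its output differs slightly in form from the example above, but each number still marks the position of the asterisk that follows it:

def counterstring(target_length, marker="*"):
    # Build a self-describing string: the digits before each marker give
    # that marker's 1-based position, so a truncated paste still tells
    # you how many characters survived.
    chunks = []
    position = 0  # characters emitted so far
    while True:
        # Find where the next marker lands if we append the marker's own
        # position-number followed by the marker itself.
        p = position + 2
        while p != position + len(str(p)) + 1:
            p += 1
        chunk = str(p) + marker
        if position + len(chunk) > target_length:
            break
        chunks.append(chunk)
        position += len(chunk)
    return "".join(chunks)

print(counterstring(30))              # 2*4*6*8*11*14*17*20*23*26*29*
print(len(counterstring(1000000)))    # about a million characters

Paste the result into a field, look at the last complete number before the cutoff, and you can tell the field’s limit without counting anything.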

So anyway, I created a string a million characters long, and I pasted it into the chat window input field. I saw that the input field apparently accepted 32768 characters before it truncated the rest of the input. So I guess your limit is 32768 characters.

Then I pressed “Send”, and the text appeared in the output field. Well, not all of it. I saw the first 29996 characters, and then two periods, and then nothing else. The rest of the text had vanished.

That’s weird. It doesn’t seem like a big deal, does it? Yet there’s this thing called representativeness bias. It’s a critical thinking error, the phenomenon that causes us to believe that a big problem always looks big from every angle, and that an observation of a problem with little manifestations always has little consequences.

Our biases are influenced by our world views. For example, last week when that tester found that crash in that critical routine, everyone else panicked, but you realized that it was only a one-byte fix and we were back in business within a few minutes. It also goes the other way, though: something that looks trivial or harmless can have dire and shocking consequences, made all the more risky because of the trivial nature of the symptom. If we think symptoms and problems and fixes are all alike in terms of significance, when we see a trivial symptom, no one bothers to investigate the problem. It’s only a little rounding error, and it only happens on one transaction in ten, and it only costs half a cent at most. When that rounding error is multiplied over hundreds of transactions a minute, tens of thousands an hour… well you get the point.

I’m well aware that, as a test, this is a toy. It’s like a security check where you rattle the doorknob. It’s like testing a car by kicking the tires. And the result that I’m seeing is like the doorknob falling off, or the door opening, or a tire suddenly hissing. For a tester, this is a mere bagatelle. It’s a trivial test. Yet when a trivial test reveals something that we can’t explain immediately, it might be a good idea to seek an explanation.

A few things occurred to me as possibilities.

  • The first one is that someone, somewhere, is missing some kind of internal check in the code. Maybe it’s you; maybe it’s the guy who wrote the parser downstream, or maybe it’s the guy that’s writing the display engine. But it seems to me as though you figured that you could send 32768 bytes, but someone else has a limit of 29998 bytes. Or 29996, probably. Well, maybe.
  • Maybe one of you isn’t aware of the published limits of the third-party toolkits you’re using. That wouldn’t be the first time. It wouldn’t necessarily be negligence on your part, either—the docs for those toolkits are terrible, I know.
  • Maybe the published limit is available, but there’s simply a bug in one of those toolkits. In that case, maybe there isn’t a big problem here, but there’s a much bigger problem that the toolkit causes elsewhere in the code.
  • Maybe you’re not using third-party toolkits. Maybe they’re toolkits that we developed here. Mind you, that’s exactly the same as the last problem; if you’re not aware of the limits, or if there’s a bug, who produced the code has no bearing on the behaviour of the code.
  • Maybe you’re not using toolkits at all, for any given function. Mind you, that doesn’t change the nature of the problems above either.
  • Maybe some downstream guy is truncating everything over 29996 bytes, placing those two dots at the end, and ignoring everything else, and he’s not sending a return value to you to let you know that he’s doing it.
  • Maybe he is sending you a return value, but the wrong one.
  • Maybe he’s sending you a return value, and you’re ignoring it.
  • Maybe he’s sending you a return value, and you are paying attention to it, but there’s some confusion about what it means and how it should be handled.
  • Maybe you’re truncating the last two and a half kilobytes or so of data before you send it on, and we’re not telling the user about it. Maybe that’s your intention. Seems a little rude to me to do that, but to you, it works as designed. To some user, it doesn’t work—as designed.
  • Maybe there’s no one else involved, and it’s just you working on all those bits of the code, but the program has now become sufficiently complex that you’re unable to keep everything in your head. That stands to reason; it is a complicated program, with lots of bits and pieces.
  • Maybe you’re depending on unit tests to tell you if anything is wrong with the individual functions or objects. But maybe nothing is wrong with any particular one of them in isolation; maybe it’s the interaction between them that’s problematic.
  • Maybe you don’t have any unit tests at all.
  • Maybe you do have unit tests for this stuff. From right here, I can’t tell. If you do have them, I can’t tell whether your checks are really great and you just missed one this time, or if you missed a few, or if you missed a bunch of them, or whether there’s a ton of them and they’re all really lousy.
  • Any of the above explanations could be in play, many of them simultaneously. No matter what, though, all your unit tests could pass, and you’d never know about the problem until we took out all the mocks and hooked everything up in the real system. Or deployed into the field. (Actually, by now they’re not unit tests; they’re just unit checks, since it’s a while since this part of the code was last looked at and we’ve been seeing green bars for the last few months.)

For any one of the cases above, since it’s so easy to test and check for these things, I would think that if you or anyone else knew about this problem, your sense of professionalism and craftsmanship would tell you to do some testing, write some checks, and fix it. After all, as Uncle Bob Martin said, you guys don’t want us to find any bugs, right?

But it’s not my place to say that. All that stuff is up to you. I don’t tell you how to do your work; I tell you what I observe, in this case entirely from the outside. Plus it’s only one test. I’ll have to do a few more tests to find out if there’s a more general problem. Maybe this is an aberration.

Now, I know you’re fond of saying, “No user would ever do that.” I think what you really mean is no user that you’ve thought of, and that you like, would do that on purpose. But it might be a thought to consider users that you haven’t thought of, however unlikely they and their task might be to you. It could be a good idea to think of users that neither one of us like, such as hackers or identity thieves. It could also be important to think of users that you do like who would do things by accident. People make mistakes all the time. In fact, by accident, I pasted the text of this message into another program, just a second ago.

So far, I’ve only talked about the source of the problem and the trigger for it. I haven’t talked much about possible consequences, or risks. Let’s consider some of those.

  • A customer could lose up to 2770 bytes of data. That actually sounds like a low-risk thing, to me. It seems pretty unlikely that someone would type or paste that much data in any kind of routine way. Still, I did hear from one person that they like to paste stack traces into a chat window. You responded rather dismissively to that. It does sound like a corner case.
  • Maybe you don’t report truncated data as a matter of course, and there are tons of other problems like this in the code, in places that I’m not yet aware of or that are invisible from the black box. Not this problem, but a problem with the same kind of cause could lead to a much more serious problem than this unlikely scenario.
  • Maybe there is a consistent pattern of user interface problems where the internals of the code handle problems but don’t alert the user, even though the user might like to know about them.
  • Maybe there’s a buffer overrun. That worries me more—a lot more—than the stack trace thing above. You remember that this kind of problem used to be dismissed as a “corner case” back when we worked at Microsoft—and then how Microsoft shut down new product development and spent two months investigating these kinds of problems, back in the spring of 2002? Hundreds of worms and viruses and denial of service attacks stem from problems whose outward manifestation looked exactly as trivial as this problem. There are variations on it.
  • Maybe there’s a buffer overrun that would allow other users to view a conversation that my contact and I would like to keep between ourselves.
  • Maybe an appropriately crafted string could allow hackers to get at some of my account information.
  • Maybe an appropriately crafted string could allow hackers to get at everyone’s account information.
  • Maybe there’s a vulnerability that allows access to system files, as the Blaster worm did.
  • Maybe the product is now unstable, and there’s a crash about to happen that hasn’t yet manifested itself. We never know for sure if a test is finished.
  • Here’s something that I think is more troubling, and perhaps the biggest risk of all. Maybe, by blowing off this report, you’ll discourage testers from reporting a similarly trivial symptom of a much more serious problem. In a meeting a couple of weeks ago, the last time a tester reported something like this, you castigated her in public for the apparently trivial nature of the problem. She was embarrassed and intimidated. These days she doesn’t report anything except symptoms that she thinks you’ll consider sufficiently dramatic. In fact, just yesterday she saw something that she thought to be a pretty serious performance issue, but she’s keeping mum about it. Some time several weeks from now, when we start to do thousands or millions of transactions, you may find yourself wishing that she had felt okay about speaking up today. Or who knows; maybe you’ll just ask her why she didn’t find that bug.

NASA calls this last problem “the normalization of deviance”. In fact, this tiny little inconsistency reminds me of the Challenger problem. Remember that? There were these O-rings that were supposed to keep two chambers of highly-pressurized gases separate from each other. It turns out that on seven of the shuttle flights that preceded the Challenger, these O-rings burned through a bit and some gases leaked (they called this “erosion” and “blow-by”). Various managers managed to convince themselves that it wasn’t a problem, because it only happened on about a third of the flights, and the rings, at most, only burned a third of the way through. Because these “little” problems didn’t result in catastrophe the first seven times, NASA managers used this as evidence for safety. Every successful flight that had the problem was taken as reassurance that NASA could get away with it. In that sense, it was like Nassim Nicholas Taleb’s turkey, who increases his belief in the benevolence of the farmer every day… until some time in the week before Thanksgiving.

Richard Feynman, in his Appendix to the Rogers Commission Report on the Space Shuttle Challenger Accident, nailed the issue:

The phenomenon of accepting for flight, seals that had shown erosion and blow-by in previous flights, is very clear. The Challenger flight is an excellent example. There are several references to flights that had gone before. The acceptance and success of these flights is taken as evidence of safety. But erosion and blow-by are not what the design expected. They are warnings that something is wrong. The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way. The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. When playing Russian roulette the fact that the first shot got off safely is little comfort for the next.

That’s the problem with any evidence of any bug, at first observation; we only know about a symptom, not the cause, and not the consequences. When the system is in an unpredicted state, it’s in an unpredictable state.

Software is wonderfully deterministic, in that it does exactly what we tell it to do. But, as you know, there’s sometimes a big difference between what we tell it to do and what we meant to tell it to do. When software does what we tell it to do instead of what we meant, we find ourselves off the map that we drew for ourselves. And once we’re off the map, we don’t know where we are.

According to Wikipedia,

Feynman’s investigations also revealed that there had been many serious doubts raised about the O-ring seals by engineers at Morton Thiokol, which made the solid fuel boosters, but communication failures had led to their concerns being ignored by NASA management. He found similar failures in procedure in many other areas at NASA, but singled out its software development for praise due to its rigorous and highly effective quality control procedures – then under threat from NASA management, which wished to reduce testing to save money given that the tests had always been passed.

At NASA, back then, the software people realized that just because their checks were passing, it didn’t mean that they should relax their diligence. They realized that what really reduced risk on the project was appropriate testing, lots of tests, and paying attention to seemingly inconsequential failures.

I know we’re not sending people to the moon here. Even though we don’t know the consequences of this inconsistency, it’s hard to conceive of anyone dying because of it. So let’s make it clear: I’m not saying that the sky is falling, and I’m not making a value judgment as to whether we should fix it. That stuff is for you and the project managers to decide upon. It’s simply my role to observe it, to investigate it, and to report it.

I think it might be important, though, for us to understand why the problem is there in the first place. That’s because I don’t know whether the problem that I’m seeing is a big deal. And the thing is, until you’ve looked at the code, neither do you.

As always, it’s your call. And as usual, I’m happy to assist you in running whatever tests you’d like me to run on your behalf. I’ll also poke around and see if I can find any other surprises.

Your friend,

The Tester

P.S. I did run a second test. This time, I used PerlClip to craft a string of 100000 instances of :). That pair of characters, in normal circumstances, results in a smiley-face emoticon. It seemed as though the input field accepted the characters literally, and then converted them to the graphical smiley face. It took a long, long time for the input field to render this. I thought that my chat window had crashed, but it hadn’t. Eventually it finished processing, and displayed what it had parsed from this odd input. I didn’t see 32768 smileys, nor 29996, nor 16384, nor 14998. I saw exactly two dots. Weird, huh?