What Exploratory Testing Is Not (Part 1): Touring

Thursday, December 15th, 2011

Touring is one way of structuring exploratory testing, but exploratory testing is not necessarily touring, and touring is not necessarily exploratory.

At one extreme, a tourist might parachute into a territory for which there is no detailed knowledge of the landscape, flora and fauna, or human culture, with the goal of identifying what’s there to be learned. Except that in such cases, we wouldn’t call her a tourist; we’d call her an anthropologist, or a field botanist, or a field geologist, or an archaeologist. The activity in this case is interactive with the territory. At the other extreme, a tourist might visit a travel agent, get on a plane to Orlando, meet a chartered bus at the airport, and sit through the rides at Disney World. The activity there is largely passive.

Touring a program can be done in a more scripted or more exploratory way, just as touring a city can be done in a more scripted or more exploratory way. A tourist has many options. Before going on a trip, a tourist might study what is already known about a particular destination. To prepare, she might supply herself with maps and travel guides, and some ideas about destinations of interest. Upon arrival, she might choose a set of walking tours from a guidebook and follow the routes closely, eating only at the restaurants identified in the guidebook, noting buildings and artifacts and other objects of interest by matching them with the descriptions and illustrations. At a given site, she might listen to a prepared audio guide that directs her observations very specifically. She might spend all of her time in the presence of a tour guide who tells her what to observe and how to interpret it. She might choose to accept everything the tour guide told her as the complete story, and refrain from asking questions. Even though the experience would be new to her, and she might learn something from it, she would not likely add much to what is already known. We call that activity touring, but it isn’t very exploratory, and a report on such a tour would largely recapitulate the guidebook. Is your testing like that?

On the other hand, rather than touring like a tourist, she might cover a territory as a historian, or a social scientist, or a travel writer. In that kind of role, she would have a research goal based on the idea of obtaining new knowledge. Learning something new and imparting it to other people requires a more open agenda than sitting on the bus while someone or something else directs your attention. Our researcher might make her way directly to particular destinations or landmarks and begin her research there, or she might amble through neighbourhoods or historical sites to discover new things about them. She could choose to focus on specific aspects of what’s there to observe, or she could choose to let the observations come to her—and, of course, she might do both. She might work with descriptions that she had been given with the intention of adding to them, or she might work from a set of questions that haven’t been asked before. Depending on her mission, she might choose to look for specific patterns or problems, or she might seek deeper understanding that would help her to identify or refine what kind of patterns or problems to look for. Even though the mission to discover new information might come from someone else, she remains in control of the specifics of the itinerary and of each activity from one moment to the next. Is your testing more like that?

One of the hallmarks of exploratory activity is the extent to which it is guided and structured by the person performing that activity. Another hallmark is the extent to which new knowledge feeds into the choice of which action to perform next. Touring is not equivalent to exploration; touring can be done in a scripted way or an exploratory way.

Shapes of Actions

Monday, December 5th, 2011

In the spring of 2010, I was privileged to have a conversation with Simon Schaffer, who pointed me to the work of a sociologist and philosopher of science named Harry Collins. This year, I discovered and read Collins’ new book, Tacit and Explicit Knowledge, and a somewhat older book, The Shape of Actions (co-authored with Martin Kusch). My colleague James Bach and I believe that these books have great significance in terms of the way we understand, practice, learn, and teach the craft of testing. Three ideas in particular stand out: a distinction between two kinds of actions; a distinction between three types of tacit knowledge; and the notion of repair, whereby we fix up interactions with each other and with our media—and particularly with our machines. Today, I’ll talk about shapes of actions.

In The Shape of Actions, Collins and Kusch describe key differences between two kinds of intentional human actions that they call mimeomorphic and polimorphic. In both words, “morph” refers to shape, or form. “Mimeo-” refers to copying. (The grey-haired among us may remember that stencil printing machines used to be called mimeographs.) Mimeomorphic actions are actions that we want to do the same way every time, almost as though we were machines. Collins and Kusch use the example of a golf swing, a kind of action in which we want to eliminate variation and emphasize precision, regularity, and smoothness. “Poli-” is a pun, referring to two similar-sounding Greek roots. The Greek word polys refers to many, much, or several. The Greek polis—a different word entirely—literally means the city, so Collins and Kusch use “poli-” to emphasize the collective and diversified nature of human actions. Polimorphic actions are naturally and appropriately variable, and are rooted in social and human interactions and goals. Conversation is a canonical example of polimorphic action. Filling out a form is an example of mimeomorphic action.

Most human life and human value is centred around polimorphic actions. Still, in many actions, there’s an interplay between the mimeomorphic and the polimorphic. Shifting gears, when performed by a human driver, is something that we almost always want to do smoothly, regularly, mechanically, and (most of the time) below the level of consciousness. Indeed, the majority of North American drivers delegate the mimeomorphic action of gear shifting to mechanisms in the car itself.

Making shifting into a mimeomorphic action provides support for the parts of driving that are decidedly not mimeomorphic: merging into traffic, negotiating a left turn, and knowing when to break the letter of traffic laws while maintaining their spirit. Polimorphic actions are handled differently in different places, based on different social paradigms and performed for different purposes. Collins and Kusch note that in some parts of the world (Britain and North America, for example), responsibility for safety is governed by the idea of people following the rules, “violations from the rules of orderly flow being met with expressions of rage”. In Tacit and Explicit Knowledge, Collins points out that in other parts of the world (China, India), responsibility for safety is rooted in the collective, and is governed by the idea of drivers expecting the unexpected. To those of us from the West, drivers in these parts of the world seem to drive in ways that we would consider suicidal or sociopathic. Equally surprisingly to us, people in China and India deal with this style of driving without getting upset or even remarking on it.

All this reminds me of the passage, written by Cem Kaner, in the preface of Testing Computer Software:

Some books say that if our projects are not “properly” controlled, if our written specifications are not always complete and up to date, if our code is not properly organized according to whatever methodology is fashionable, then, well, they should be. These books talk about testing when everyone else plays “by the rules.”

This book is about doing testing when your coworkers don’t, won’t and don’t have to follow the rules.

Consumer software projects are often characterized by a budget that is too small, a staff that is too small, a deadline that is too soon and which can’t be postponed for as long as it should be, and by a shared vision and a shared commitment among the developers.

The quality of a great product lies in the hands of the individuals designing, programming, testing, and documenting it, each of whom counts. Standards, specifications, committees, and change controls will not assure quality, nor do software houses rely on them to play that role. It is the commitment of the individuals to excellence, their mastery of the tools of their crafts, and their ability to work together that makes the product, not the rules.

That is: software development is a polimorphic activity, and if that’s true, testing needs to respond accordingly.

Software development involves mostly polimorphic actions, but some mimeomorphic actions help it along. Compiling a program is so much of a mimeomorphic action that these days we delegate it entirely to machines. Typing is mimeomorphic; we learn to touch-type mimeomorphically so that we can develop programs without the mechanics of typing getting in the way. Programming coaches and programming groups often require programmers to adopt a specific style of indentation and punctuation to reduce overhead in reading and parsing code, and they mandate exercises or policies to make the regularity automatic. Even though code is designed to be run mimeomorphically, developing it, maintaining it, and interpreting it when things go wrong are all polimorphic actions.

Mimeomorphic activities tend to be easy to observe, so they tend to be easy to identify and to explicate. As a result, conversation, writing, and training in testing have tended to focus on artifacts, on documents, on procedures, and on things that can be automated—the mimeomorphic actions. Those conversations, writings, and training programs almost entirely ignore aspects of testing that are much less visible yet are far more important. This, I believe, is why so many people in our craft talk about writing test cases that are easily described as mimeomorphic actions. Those same people seem to spend little time discussing how to test, which is composed mostly of polimorphic actions. The challenges of understanding polimorphic actions—combined with the ease of observing and describing mimeomorphic actions—explain why so many people confuse testing with checking. Those challenges explain why people credit Cucumber and its given/when/then formulas much more quickly than they credit the conversations that surround it. Those challenges explain why lowering cost by outsourcing checking work dominates the idea of increasing value by developing local testing skill. And those challenges explain why automation is often seen as some kind of silver bullet for testing problems.

Polimorphic actions are often based on tacit knowledge, different ways of valuing things, and social contexts. Collins notes that polimorphic actions

“can only be executed successfully by a person who understands the social context. Copying the visible behaviour that is the counterpart of an observed action is unlikely to reproduce the action unless it is a mimeomorphic action, because in the case of polimorphic actions, the right behavioural instantiations will change with context. Here (that is, in the book Tacit and Explicit Knowledge –MB) it will be concluded that, for now and the foreseeable future, polimorphic actions—and only polimorphic actions—remain outside the domain of the explicable, whichever of the four possible ways ‘explicable’ is defined. This has significance for the success of different kinds of machines and for the way we teach.”

Watch for a lot more discussion of polimorphic and mimeomorphic actions in the next few blog posts. Watch also for such discussion to work its way into the ways that James and I teach rapid testing.

You’ve Got Issues

Monday, January 31st, 2011

What’s our job as testers? Reporting bugs, right?

When I first started reading about Session-Based Test Management, I was intrigued by the session sheet. At the top, there’s a bunch of metadata about the session—the charter, the coverage areas, who did the testing, when they started, how long it took, and how much time was spent on testing versus interruptions to testing. Then there’s the meat of the session sheet, a more-or-less free-form section of test notes, which include activities, observations, questions, musings, ideas for new coverage, newly recognized risks, and so forth. Following that, there’s a list of bugs. The very last section, at the bottom of the sheet, sets out issues.

“What’s an issue?” I asked James. “That’s all the stuff that’s worth reporting that isn’t a bug,” he replied. Hmmm. “For example,” he went on, “if you’re not sure that something is a bug, and you don’t want to commit it to the bug-tracking system as a bug, you can report it as an issue. Say you need more information to understand something better; that’s an issue. Or you realize that while you’ve been testing on one operating system, there might be other supported operating systems that you should be testing on. That’s an issue, too.”

That was good enough as far as it went, but I still didn’t quite get the idea in a comprehensive way. The information in the “test notes” section of the session sheet is worth reporting too. What distinguishes issues from all that other stuff, such that “Issues” has its own section, parallel to “Bugs”?

Parallelism saves the day. In the Rapid Software Testing class, we teach that a bug is anything that threatens the value of the product. (Less formally, we also say that a bug is something that bugs somebody… who matters.) At one point, a definition came to me: if a bug is anything that threatens the value of the product, an issue is anything that threatens the value of our testing. In our usual way, I transpected on this with James, and we now say that an issue is anything that threatens the value of the project, and in particular the test effort. Less formally and more focused on testing, an issue is anything that slows testing down or makes it harder. If testing is about making invisible problems visible, then an issue is anything that gives problems more time or more opportunities to hide.

When we believe that we see a bug, it’s because there’s an oracle at work. Oracles—those principles or mechanisms by which we recognize a problem—are heuristic. A heuristic helps us to solve a problem or make a decision quickly and inexpensively, but heuristics aren’t guaranteed to work. As such, oracles are fallible. Sometimes it’s pretty clear to us that we’re seeing a bug in a product: the program seems to crash in the middle of doing something. Yet even that could be wrong; maybe something else running on the same machine crashed, and took our program down with it. A little investigation shows that the product crashes in the same place twice more. At that point, we should have no compunction about reporting what we’ve seen as a bug in the product.

An issue may be clear, or it may be something more general and less specific. A few examples of issues:

  • As you’re testing, you see behaviour in the new version of the product that’s inconsistent with the old version. The Consistency with History oracle tells you that you might be seeing a problem here, yet one could make the case that either behaviour is reasonable. The specification that you’re working from is silent or ambiguous on the subject. So maybe you’ve got a bug, but for sure you have an issue.
  • While reviewing the architecture for the system, you realize that there’s a load balancer in the production environment, but not in the test environment. You’ve never heard anyone talk about that, and you’re not aware of any plans to set one up. It’s time to identify that as an issue.
  • You sit with a programmer for a few minutes while she sketches out the structure of a particular module to help identify test ideas. You copy the diagram, and take notes. At the end of the meeting, you ask her to look the diagram over, and she agrees that that’s exactly what she meant. You reflect on it for a while, and add some more test ideas. You take the diagram to another programmer, one who works for her, and he points at part of the diagram and says, “Wait a second—that’s not a persistent link; that’s stateless.” You’ve found disagreement between two people making a claim. Since the code hasn’t been built for that feature yet, you can’t log it as a bug in the product, but you can identify it as an issue.
  • As you’re testing the application, a message dialog appears. There’s no text in the dialog; just a red X. You dismiss the dialog, and everything seems fine. It seems not to happen again that day. The next day, it happens once more, in a different place. Try as you might, you can’t replicate it. You can’t report it as a bug, but you can record it as an issue.
  • A steady pattern of broken builds means that you wait from 10:00am until the problem is fixed—typically at least an hour, and often three or four hours. Before you’re asked, “Why is testing taking so long?” or “Why didn’t you find that bug?” report an issue.
  • You’ve been testing a new feature, and there are lots of bugs. 80% of your session time is being spent on investigating the bugs and logging them. This has a big impact on your test coverage; you only got through a small subset of the test ideas that were suggested by the session’s charter. The bugs that you’ve logged are important and you can expect to be thanked for that, but you’re concerned that managers might not recognize the impact they’ve had on test coverage. Raise an issue.
  • You’re a tester in an outsourced test lab in India. Your manager, under a good deal of pressure himself, instructs you to run through the list of 200 test cases that has been provided to him by the clueless North American telecom company, and to get everything done within three days. With practically every test you perform, you see risk. All the tests pass, if you follow them to the letter, but the least little experimentation shows that the application becomes frighteningly unstable if you deviate from the test steps. Still, your boss insists that your mission is to finish the test cases. He’s made it clear that, for the next three days, he doesn’t want to hear anything from you except the number of tests that you’ve run per day. Do your best to finish them on schedule, but sneak a moment here and there to identify risks (consider a Moleskine notebook or an ASCII text file). When you’re done, hand him your list of bugs—and in email, send him your list of issues.
  • You’re a tester in a small development shop that provides customizable software for big banks. You have concerns about security, and you quickly read up on the subject. What you read is enough to convince you that you’re not going to get up to speed soon enough to test effectively for security problems. That’s an issue.
  • As a new tester in a company, you’ve noticed that the team is organized such that small groups of people tend to work in their own little silos. You can point to a list of a dozen high-severity bugs that appear to have been the result of this lack of communication. You can see the cost of these twelve bugs and the risk that there are more lurking. You recognize that you’re not responsible for managing the project, yet it might be a good idea to raise the issue to those who do.

Those are just a few examples. I’m sure you can come up with many, many more without breaking a sweat.

Teams might handle issues in different ways. You might like to collect an issues list, and put it on a Big Visible Chart somewhere. Someone might become responsible for collecting and managing the issues submitted on index cards. Some issues might end up as a separate category in the bug tracking system (but watch out for that; out of sight, out of mind). Still others might get put onto the project’s risk list.

Some issues might get handled by management action. Some issues might get addressed by a straightforward conversation just after tomorrow morning’s daily standup. Someone might take personal responsibility for sorting out the issue; other issues might require input and effort from several people. And, alas, some issues might linger and fester.

When issues linger, it’s important not to let them linger unnoticed. After all, an issue may have a terrible bug hiding behind it, or it may slow you down just enough to prevent you from finding a problem as soon as you can. Issues don’t merely present risk; they have a nasty habit of amplifying risks that are already there.

So, as testers, it’s our responsibility to report bugs. Even more importantly, it’s our responsibility to raise awareness of risk, by reporting those things that delay or interfere with our capacity to find bugs as quickly as possible: issues.

Jerry Weinberg Interview (from 2008)

Wednesday, January 19th, 2011

In the spring of 2008, I was privileged to chat with Jerry Weinberg on why he was favouring CAST with his only conference appearance of that year, other than the Amplifying Your Effectiveness conference, of which he’s a co-founder and host. CAST that year saw the launch of Jerry’s book Perfect Software and Other Illusions About Testing. It’s now available as an e-book, too.

Jerry will not, so far as I know, be at CAST 2011. Nonetheless, his advice about going to conferences where smart people hang out remains sound.

Michael: You’ve been involved with computers for 50 years, and with giving people advice for almost that long. What do you suggest my first question should be, and how would you answer it?

Jerry: Ask me why I chose this conference as my one of the year. And other things.

Michael: Sounds good. So: why did you choose this conference as your one of the year?

Jerry: Errors have been the principal issue in computing right from the beginning, as John von Neumann pointed out even before I got into the field (and that’s really a long time ago). I wrote about testing as the opening topic in my first book, “Computer Programming Fundamentals” way back in 1960—and way back then, I already took flak from some reviewers who didn’t think errors was a suitable topic for politically correct people. You’d think I had written about human excrement.

And you’d also think that as our field matured, we would have outgrown that prudish attitude about error—but we haven’t. Back then, we had no professional testers. Testing was every developer’s job (though they weren’t called “developers” back then, or even “programmers”). We fought hard to have testing recognized as a profession of its own, and though we have people called “testers” today, we still have the prudes. In many organizations, testers are, sadly, considered lower-class citizens.

Testing holds a special place in my vision of the future of the computing profession as a whole. Why? Because testing is the first place where we generally get an independent and realistic view of what we are doing right and what we are doing wrong when we build new systems. We do get this view from Support (another area that’s considered low-class), but by the time information arrives from Support, the people who put the errors in a product are often long gone and immune to learning from their mistakes.

Quite simply, if we don’t learn to learn from our mistakes, we won’t improve as a profession. And if we don’t improve, we limit whatever good this amazing new (still) technology offers to humanity.

That’s why I’ve made the status of testing and testers my first priority for some years, and why I’m debuting my book on testing fallacies and myths (Perfect Software, and Other Illusions About Testing) at CAST, the one conference that I feel is a creation of testers, by testers, and for testers.

Michael: Recently you launched a new Web site, and your banner is “Helping smart people to be happy.” Why did you choose that?

Jerry: Most of the people in the computing professions are pretty smart, at least as measured by tests and the kind of technical work they accomplish. But so many of them haven’t learned how to use their smarts on themselves. They can create wonderful systems, but when they use their brains to think about themselves, they often think themselves into depression.

I was like that, for a long time, until I began to figure out what I was doing to myself. I set myself the task of learning how to be happy, and as I began to succeed, I realized that one of the things that makes me happy is working with other happy people. So, selfishly, I decided I would devote myself to helping my colleagues and students learn to share my happiness. Like most things I do, it’s completely selfish—but has side effects that others may enjoy.

Michael: Why not “Helping happy people be smart?”

Jerry: If you’re happy, you don’t need to be smart. Smart isn’t the only road to happiness. It’s not that I mind helping people be smart, or smarter, but it’s just not my primary goal. Nevertheless, I guess there are thousands of people out there who would say I’ve helped them grow smarter in some way. I think that’s true of you, Michael, at least from what you tell me. I hope I’ve helped you be happier, too.

Michael: Happier for sure, and smarter I hope. I’ve learned about both from conversations that I’ve had with you and other smart people. I remember once that Joshua Kerievsky asked you about why and how you tested in the old days—and I remember you telling Josh that you were compelled to test because the equipment was so unreliable. Computers don’t break down as they used to, so what’s the motivation for unit testing and test-first programming today?

Jerry: We didn’t call those things by those names back then, but if you look at my first book (Computer Programming Fundamentals, Leeds & Weinberg, first edition 1961 —MB) and many others since, you’ll see that was always the way we thought was the only logical way to do things. I learned it from Bernie Dimsdale, who learned it from von Neumann.

When I started in computing, I had nobody to teach me programming, so I read the manuals and taught myself. I thought I was pretty good, then I ran into Bernie (in 1957), who showed me how the really smart people did things. My ego was a bit shocked at first, but then I figured out that if von Neumann did things this way, I should.

John von Neumann was a lot smarter than I’ll ever be, or than most people will ever be, but all that means is that we should learn from him. And that’s why I go to a select number of conferences, like CAST and AYE, because there are lots of smart people there to learn from. I recommend my tactic to any smart person who wants to be happy.

Jerry’s Web Site is at http://www.geraldmweinberg.com. Want to help to make at least one smart person happy? I’d recommend buying—and reading—one of his fiction books, and letting him know that you did.

Exploratory Testing or Scripted Testing: Which Comes First?

Tuesday, January 4th, 2011

The PDF file linked here is a transcript of a conversation over Skype, New Year’s Eve (December 31), 2010.

The conversation was prompted by a Twitter exchange on exploratory testing (ET) started by Andy Glover, who observed that “When developing scripts you need to explore. But this tends to be exploring with out the s/w so I would say it’s not ET.” I disagree; developing scripts is test design, and test design is certainly part of testing. Since the process of developing test scripts is an exploratory (unscripted) process, I would contend that script development is both exploratory and testing, and therefore exploratory testing. To get around Twitter’s limitations, I proposed an impromptu online chat. Anna Baik, Ajay Balamurugudas, Tony Bruce, Anne-Marie Charrett, Albert Gareev, Mohinder Kholsa, Michel Kraaij, and Erkan Yilmaz joined the conversation. Alas, Andy had other commitments and couldn’t be with us.

Enjoy!

EuroSTAR Trip Report, Part 3

Sunday, December 12th, 2010

In the last posting, I remarked on some of the people with whom I chatted at EuroSTAR, and whom I’m seeing as emerging leaders in a community of skilled testers. Here are a few more.

Lynn McKee (@lynn_mckee on Twitter) gave an inspiring and very well-attended talk on how to instill passion in testers—and how to respect and defend the passion that’s there. Lynn walks her talk; her own passion is contagious. She’s on the board of the Association for Software Testing, she’s one of the organizers of POST (a peer conference in Calgary), and she’s one of the organizers of the North American branch of Weekend Testing. With her colleague, Nancy Kelln (@nkelln on Twitter, also one to watch), Lynn is organizing a session of Rapid Software Testing in Calgary, Alberta that I’ll be presenting in February 2011.

Zeger Van Hese (@TestSideStory on Twitter) was another of the Vanguard’s roving reporters, tweeting up a storm wherever he went with wit and skepticism. Note that skepticism, as James Bach puts it, is not the rejection of belief, but the rejection of certainty. Zeger also has a terrific blog that I can heartily recommend. The post “Exploring Rapid Reporter” is an exemplary account of what goes through a tester’s mind in the midst of exploration. In his most recent posting, as of this writing, he’s saved me considerable time and effort by providing an excellent report of the Danish Alliance meetup. He links to Shmuel Gershon’s videos of the lightning talks, too; check them out.

It’s always good to have a local agent, and Jesper Ottosen (@jlottosen on Twitter) was Our Man in Copenhagen (along with the aforementioned Carsten Feilberg). All of us who attended the Danish Alliance meeting owe him a vote of thanks for energetically helping to promote and organize it (he said that even that was a learning experience). He also gave a lightning talk that reminded us to look for perfects as well as defects to contextualize our problem reports and to give people recognition for their good work. For so many people, compliments—supported by a visible token—matter! Jesper organized a post-Gala-dinner pub crawl and thereby helped to enable the ensuing extended conversations. Other than his sharp observations on Twitter, I’ve not been familiar with Jesper’s work, which is a problem that I’m looking forward to rectifying. One minor complication: I might have to learn Danish…

Andy Glover, the Cartoon Tester (@CartoonTester on Twitter), was one of the people that I met for the first time after admiring his work from afar. There are many ways to tell the story of testing, and cartooning can be a great way to do it; see Andy’s blog, and Rob Sabourin‘s I Am a Bug (in book and web-based versions) for wonderful examples. Andy led an interesting challenge on Tuesday night, in which he encouraged people to draw their impressions of one of James Bach‘s descriptions of testing: “the infinite art of comparing the invisible to the ambiguous to prevent the unthinkable from happening to the anonymous”. Go ahead; draw that! As a bonus, Andy sold some of his cartoons at the conference and raised a significant sum for charity.

Joris Meerts (@testingref) is someone I met only briefly. I wish we’d had more time to talk. He’s attempting to create a comprehensive historical timeline of the testing craft. I was skeptical when I first looked at Joris’ timeline; it seemed to be missing a number of touchstones. One reason lies in the fact that our craft doesn’t have a very good sense of history, and to my knowledge, no one has really attempted to capture it as Joris has. By the same token, it’s also a tricky problem to filter information, because so many important ideas about testing come from other disciplines. Since the conference, Joris and I have begun an email chat on the subject, and it’s clear to me that Joris is going about this very thoughtfully. I intend to do what I can to help him out, and I hope you will too.

Nathalie Rooseboom DeVries (Twitter: @FunTESTic) was another member of the conference committee. One of her roles at the conference, so it seemed, was to question and challenge the Vanguard. To some, that might look like defense of the Traditional way. To me, it looked more like a challenge to the Vanguard to test its own beliefs and practices, which is a very Vanguard thing to do. By posing challenging questions to what people (including me) were saying and tweeting, Nathalie evinced exemplary behaviour for our community. Good for her.

Petter Mattsson wasn’t presenting at EuroSTAR this year, although he has done so in the past. But he was present, and it was lovely to talk to him again. Petter is one of the senior members of the Vanguard, having delivered an experience report on exploratory testing at EuroSTAR several years before it was fashionable to do so. With his colleague Herman Afzelius, he has introduced structured, disciplined approaches to exploratory testing into two companies (and counting), and he’s been successful, despite some occasional middle-management pushback. When he’s managing and training testers, he focuses on minds before processes and tools. That’s before, not instead of: at one of the companies, he commissioned a reporting tool very much like Shmuel Gershon’s Rapid Reporter. A couple of years back, he showed me a wonderful little trick: instead of using hyphens or dots as the bullet points in your session notes, start the paragraph with a smiley emoticon for good news, and a frowny emoticon for bad news. Readers can scan the bullets to get a feel of what the tester is reporting. Instant, easy visualization! A blink test for reporting!

Kristoffer Nordström (@kristoffer_nord on Twitter) joined Petter and me in a couple of conversations. Kristoffer was one of the team leads working with Petter at UIQ Technologies when I visited there in 2008. In 2009, Petter, Herman, Kristoffer and I had a memorable Mongolian meal and a grand chat just across from Stockholm Central station, in which we described exploratory approaches, Vanguard-style values, and management resistance. Alas, we missed a chance for dinner this year, but Petter, Kristoffer, and I did compare notes on our Weltschmerz with respect to Traditionalist approaches and Traditionalist presentations in the EuroSTAR program, of which I suspect that they attended more than I did. I don’t think any of Petter, Kristoffer, or Herman have a blog. I wish they did, and if they do, I wish they’d tell me about it. Meanwhile, Kristoffer has started micro-blogging at least.

It was a pleasure to meet Ola Hyltén (@ola_hylten) at last. He attended my Tuesday morning tutorial on Test Framing, and contributed a number of valuable insights there. (Update to this post: I’ve just discovered, to my delight, that he’s got a blog here. And in the most recent post, he’s writing on a topic that is near and dear to me: parallels between testing and music.)

John Stevenson (@steveo1967) was also a keen contributor at the tutorial. John is a dedicated student of exploratory testing and systems thinking, and he writes a blog in which he stretches thinking about testing outside of the craft, which I argue is essential to advancing it. As examples of his wide-ranging perspective, look at his two posts inspired by EuroSTAR: The Human Element and Sorting the Chaff from the Wheat.

Rob Lambert (@Rob_Lambert) is one of the central figures in the Software Testing Club, an online community that provides some of the more articulate discussions on testing these days. That effort has spilled over into The Testing Planet, a periodical testing newspaper of the physical kind (remember newspapers?  News, printed on paper?). Rob is a very conscientious fellow, sharp at spotting flaws but also ready to see the good and the salvageable in the things that he observes. I admire that.

Anko Tijman (@agiletesternl) is a passionate advocate for agile approaches, and maintains a strong focus on the first phrase in the Agile Manifesto: individuals and interactions. He also frequently advocates something that I don’t think is always prominent in the Agile community: an emphasis on diversity in testing. He’s written a book that is, as yet, only available in Nederlands (Dutch), and he has a blog that he diligently updates, well, not very often at all. So the key, apparently, is to find him at a conference, see one of his presentations and chat with him, or to follow him on Twitter.

There are still at least two more EuroSTAR missives to go. More later!

Test Ideas for Documentation

Friday, June 4th, 2010

Most people who bother with the matter at all would admit that the English language is in a bad way, but it is generally assumed that we cannot by conscious action do anything about it. Our civilization is decadent and our language — so the argument runs — must inevitably share in the general collapse. It follows that any struggle against the abuse of language is a sentimental archaism, like preferring candles to electric light or hansom cabs to aeroplanes. Underneath this lies the half-conscious belief that language is a natural growth and not an instrument which we shape for our own purposes.

Now, it is clear that the decline of a language must ultimately have political and economic causes: it is not due simply to the bad influence of this or that individual writer. But an effect can become a cause, reinforcing the original cause and producing the same effect in an intensified form, and so on indefinitely. A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks. It is rather the same thing that is happening to the English language. It becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of our language makes it easier for us to have foolish thoughts. The point is that the process is reversible. Modern English, especially written English, is full of bad habits which spread by imitation and which can be avoided if one is willing to take the necessary trouble. If one gets rid of these habits one can think more clearly, and to think clearly is a necessary first step toward political regeneration: so that the fight against bad English is not frivolous and is not the exclusive concern of professional writers.

George Orwell, Politics and the English Language

On those increasingly rare occasions when I’m at home, I’m in the city of Toronto, the province of Ontario, Canada.  At the beginning of July, Ontario will replace its provincial sales tax with the Harmonized Sales Tax (HST). The HST combines Canada’s federal Goods and Services Tax with something that looks a lot like the current provincial tax, but which, in essence, gets applied to more goods and services.  One positive aspect of the HST is that Ontario businesses won’t have to collect and submit two different taxes to two different governments.  On the negative side, we have to learn some new rules.  Here’s one: the amount of tax charged depends in part on where the service is delivered.  There are four rules associated with this, as documented in the Government of Canada’s GST Technical Bulletin “Place of supply rules for determining whether a supply is made in a province”. I’ll show you just one of those rules, and its explanation, here:

Rule 1

Subject to the proposed place of supply rules for services that are explained in Parts II to XII of this section and in Sections 6 to 9, a supply of a service is made in a province if, in the ordinary course of business of the supplier, the supplier
(a) obtains only one address that is a home or a business address in Canada of the recipient, the home or business address in Canada of the recipient in the province,
(b) obtains more than one address described in paragraph (a), the address in the province described in that paragraph that is most closely connected with the supply, or
(c) in any other case, obtains an address in Canada of the recipient that is most closely connected with the supply.

That sounds a little confusing.  Let’s see what it means:

Rule 1 is based on the location of the recipient. The rule generally accomplishes this by determining the place of supply based on a particular address in Canada of the recipient that is obtained by the supplier in the ordinary course of business.

Okay…

The rule does not require a supplier to obtain an address of the recipient that the supplier does not already obtain in the ordinary course of its business. The determination of the relevant address in respect of a supply under Rule 1 is based on the facts taking into account the ordinary business practice of each supplier with respect to each supply.

So…

It should be noted that an address of the recipient obtained by a supplier will only be relevant for purposes of this rule if it is obtained in the ordinary course of the supplier’s business practices in connection with the supply. On the other hand, any address of the recipient obtained in the ordinary course of business of the supplier should be taken into consideration in applying the rule. The relevant address of the recipient also does not have to be an address obtained in respect of every supply made to the recipient for it to be considered a relevant address obtained in the ordinary course of business. An address of the recipient obtained by a supplier in the ordinary course of business could therefore include: an address of the recipient from which the supplier is hired in connection with a supply pursuant to an agreement for the supply (the “contracting address”); an address of the recipient that the supplier deals with in connection with a supply; or a billing address of the recipient in connection with a supply.

Oh, thank you.  That clears things up considerably. (And if you, dear reader, didn’t wade all the way through that mud, I promise I won’t be upset.)
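For what it’s worth, here is one programmer’s-eye reading of the three paragraphs of Rule 1, written as a small Python function. It is a sketch, not tax guidance: it ignores the other place-of-supply rules that Rule 1 is subject to, assumes that every address passed in is a Canadian address of the recipient obtained in the ordinary course of business, and hands the judgment “most closely connected with the supply” to a caller-supplied `closeness` function. The names `Address`, `rule1_province`, and `closeness` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Address:
    province: str  # e.g. "ON"
    kind: str      # "home", "business", or another kind such as "billing"


def rule1_province(addresses: Sequence[Address],
                   closeness: Callable[[Address], float]) -> Optional[str]:
    """One possible reading of Rule 1 (hypothetical helper).

    `addresses` holds the recipient's Canadian addresses that the supplier
    obtained in the ordinary course of business; `closeness` stands in for
    "most closely connected with the supply" (higher means closer).
    """
    home_or_business = [a for a in addresses if a.kind in ("home", "business")]
    if len(home_or_business) == 1:                            # paragraph (a)
        return home_or_business[0].province
    if home_or_business:                                      # paragraph (b)
        return max(home_or_business, key=closeness).province
    if addresses:                                             # paragraph (c)
        return max(addresses, key=closeness).province
    return None  # no address obtained: this sketch draws no conclusion
```

Notice where the difficulty ends up: the function is short only because the hard question, which address is “most closely connected”, is pushed out to whoever supplies `closeness`.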

Here’s a direct quote from the Rapid Software Testing course:  “When people say ‘that should be documented’, what they really mean is ‘that should be documented if and how and when it serves our purposes.’”  And we add, “Who will read it? Will they understand it? Is there a better way to communicate that information? What does documentation cost you?”

When people—managers, consultants (yes, like me), bureaucrats, process enthusiasts, methodologists, staff employees, executives, pundits, TV preachers—tell us to do something, it behooves us to think of the value of (not) doing something, cost of (not) doing something, the risk of (not) doing something, and the quality of the work. Yet none of those things—especially quality, value to some person—are the same for different people. To me, the most important thing to start with is to ask the question, “Who are my clients? Who are my stakeholders?” Let’s call those people your client community.

For any piece of writing, I’d like to suggest that there’s an important stakeholder who is often overlooked: you. Your writing is a medium, as Marshall McLuhan would say. That is, your writing is an extension of you and your human capabilities, in particular speech and presence. Writing extends your physical being and what you might say were you immediately, physically present. As such, your written work is a stand-in for you. It represents you, literally re-presents you at a different time, in a different place, and in a different form.

Another significant person in your client community is your direct client, the person you’re working for, the one who has commissioned the work. (That might be your employer, or it might be yourself.) Your writing is what you might say to your client, or on your client’s behalf, if you were actually in the room with your reader.

As most authorities on writing will agree, your reader is extremely important and the ongoing focus of your work. Yet media present risk: as much as a medium extends and accelerates and intensifies some human capabilities and the roles of some senses (McLuhan might have said “heats them up”), it also diminishes the roles of others. While writing extends speech and intensifies the role of vision, writing diminishes conversation and the interaction between the speaker and the listener. Your listener can’t ask questions, and as Northrop Frye remarked, a book always says the same thing. Vision becomes hot, and sound becomes cool, requiring the reader to involve himself by constructing the sound of your voice. That calls for caution: if your writing is really going to represent you, it should sound in your reader’s head as you would want it to sound. Trouble is, you can’t ever know that. But you can test your writing, and if you discover problems, you might reveal information that would inform a decision to fix it.

So one really good way to test a piece of writing is to ask questions about it.

  • How would this sound if I were to utter it out loud?
  • Does it roll off the tongue naturally?
  • Would it sound like me as I’d like to be heard?
  • Would it sound like the sort of thing I would say?
  • Are these the words that I would choose?
  • Would I sound sloppy, or inflated, or confusing if I were to speak this sentence in a conversation?
  • Is there an obvious way to re-interpret or misinterpret this sentence?
  • What are the risks associated with misunderstanding?
  • What could I do to make my meaning more plain?

And, most importantly,

  • What problem am I trying to help my reader to solve?
  • Have I helped my reader to solve that problem?

Those questions, if we’re going to answer them well, require us to think carefully about our readers. We’re often writing for people that we haven’t met in person. We’re writing for people who have never heard us. Who might such people be? How do we imagine them? Might some readers be unfamiliar with the topic? Might some be unfamiliar with English? Might some be easily confused, or easily bored? What might we do to help them understand us? Should we provide examples? Stories? Make the message longer? Shorter?

Here’s another interesting question: what does the finished piece of writing—and the effort that it took to produce it—tell us about the subject of our writing? Might that information tell us something important about the subject? Here’s a hint: the Technical Bulletin, designed to help us interpret where a service is being performed, runs to 53 pages.

I suspect that no one—neither the writer of the passage I quoted above, nor his or her clients—asked any of the questions that I’ve suggested here. Or if those questions were asked, no one cared enough about the information to do anything about it.  Still, as an optimist, I believe that the process is reversible.  If you’re feeling down, one antidote might be to read Orwell’s brilliant essay, and follow its advice.

And at last, I leave you, dear reader, with a question:  How is all this like testing and software development?

Black Box Software Testing Course in Toronto, June 23-25 2010

Thursday, May 6th, 2010

In 1996, I was working as a program manager for Quarterdeck, which at the time produced some of the best-selling utility software on the market. I took a three-day in-house training class that quite literally changed the course of my life. That class was the Black Box Software Testing course, by Cem Kaner.

Unlike anyone else that I was aware of at the time, Cem was writing and talking about a different kind of testing from what we were used to. Most of the books and testing models that I was aware of talked about things like timely, complete, and unambiguous requirements; they talked about process models; they talked about how, if you didn’t get what the books said you needed, you should refuse to test. They talked about testers as the gatekeepers of quality. Other books talked about test techniques in an abstract and largely mathematical way. All focused on some notion of functional correctness. Very few, if any, focused on value to the customer, and the idea that software testing was a very human part of software development, itself a very human thing.

Cem’s book, Testing Computer Software (written with Hung Nguyen and Jack Falk), was different. It was a book for testers who were working in environments where no one else followed “the rules”, the so-called best practices that were neither best nor practiced in real life. The BBST course took the same tack. Cem didn’t preach that we were quality gatekeepers; in fact, he demolished that myth. Instead, he offered an approach that was much more skills-oriented than process-focused, pragmatic rather than Platonic, and investigation-focused rather than confirmation-focused. In 2002, Cem released a new book (with James Bach and Brett Pettichord) called Lessons Learned in Software Testing. That book was strongly interconnected with the BBST course material (which, by then, credited James Bach with co-authorship). In that era, Cem began to release videos of the course lectures online, along with presentation slides, course notes, self-quizzes, extra material, reading lists, and references. Portions of the online BBST course are now being offered in an instructor-led form by the Association for Software Testing for its members, with more and more classes being added each year.

Now, after 15 years of continuous development on the Black Box Software Testing course, Cem is coming to Toronto to deliver a very rare live, public, version of the class, June 23 through June 25, 2010. He says, “The Black Box Software Testing course takes an explorer’s view of the core issues in software testing. We look at the primary test techniques (tests based on scenarios, risks, specifications, or attributes of the data under test), at the challenges of identifying and credibly reporting failures, and at the management challenges of adapting your practices to the project’s context (for example, regulatory or market requirements).

“Supplementing the course is a rich collection of multimedia instructional materials, available free, online. This gives us the freedom to tailor the course to the preferences of the students, leaving some topics to the videos, buying time for more activities, discussions, and exercises in class.”

The class is being sponsored by TASSQ, the Toronto Association of Systems and Software Quality. I’ll be there in a support role.

You can sign up for the class via the form at http://www.tassq.org/pdf/registration_form_black_box.pdf. If you mention the promotional code BLG, you’ll be able to register for the Early Bird Rate of $1400 through May 21.

Many thanks to the eagle-eyed testers who pointed out that the title of Lessons Learned in Software Testing was not, in fact, Testing Computer Software, as this post once erroneously claimed.

The Testers’ Christmas Present

Monday, October 19th, 2009

So the holidays are coming up, and you’re wondering what to get for your tester friends, or (if you’re a tester) for your kids.

Let me be the first this season to recommend I Am A Bug, a perfectly charming little book by Robert Sabourin, and illustrated by his daughter Catherine, who was between 11 and 12 years old as the book was being published. It’s been around for several years.

The secret is that I Am A Bug is a serious testing book, cleverly disguised as a children’s book. It’s one of the wisest books on testing I’ve ever read. Each page begins with a big message, illustrated and elaborated below. Here’s a little sample:

A bee sting may hurt a bit

The same bug can be found in different computer programs. In one program the bug may not cause much damage…

But it can kill you if you’re allergic to bees.

…but in another program it could be fatal.


The whole book is online, at Rob’s Web site, http://www.amibug.com. (The book is found under “Presentations”.) It’s fun to read there, but order the dead-tree version for a very reasonable price. You can get it from the usual online booksellers. Highly recommended.

A Letter To The Programmer

Tuesday, September 29th, 2009

This is a letter that I would not show to a programmer in a real-life situation. I’ve often thought of bits of it at a time, and those bits come up in conversation occasionally, but not all at once.

This is based on an observation of the chat window in Skype 4.0.0.226.

Dear Programmer,

I discovered a bug today. I’ll tell you how I found it. It’s pretty easy to reproduce. There’s this input field in our program. I didn’t know what the intended limit was. It was documented somewhere, but that part of the spec got deleted when the CM system went down last week. I could have asked you, but you were downstairs getting another latte.

Plus, it’s really quick and easy to find out empirically; quicker than looking it up, quicker than asking you, even if you were here. There’s this tool called PerlClip that allows me to create strings that look like this:

*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*…

As you’ll notice, the string itself tells you about its own length. The number to the left of each asterisk tells you the offset position of that asterisk in the string. (You can use whatever character you like for a delimiter, including letters and numbers, so that you can test fields that filter unwanted characters.)
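PerlClip itself is a Windows tool driven by Perl syntax, and the code below is not its code. It is a minimal Python sketch, under my own assumptions, of how a counterstring like the one above can be built: work backwards from the desired length, writing each delimiter’s 1-based position followed by the delimiter. The function names `counterstring` and `apparent_length` are hypothetical, and the reader function assumes a non-digit delimiter such as `*`.

```python
def counterstring(length: int, delim: str = "*") -> str:
    """Return a self-describing string of exactly `length` characters.

    Each number gives the 1-based position of the delimiter that follows it,
    so in "*3*5*7*9*12*" the asterisk after "12" is the 12th character.
    Positions are known at the end, so the string is built backwards.
    """
    if len(delim) != 1:
        raise ValueError("delimiter must be a single character")
    chunks = []
    pos = length  # position of the delimiter that ends the next chunk
    while pos > 0:
        chunk = f"{pos}{delim}"
        if len(chunk) > pos:  # only when pos == 1: no room for a number
            chunk = delim
        chunks.append(chunk)
        pos -= len(chunk)
    return "".join(reversed(chunks))


def apparent_length(received: str, delim: str = "*") -> int:
    """Read how many characters survived from the markers themselves:
    the number before the last delimiter is that delimiter's position,
    plus any characters after it. Assumes a non-digit delimiter."""
    last = received.rfind(delim)
    if last == -1:
        return len(received)  # no marker survived; fall back to counting
    start = last
    while start > 0 and received[start - 1].isdigit():
        start -= 1
    marker = received[start:last]
    position = int(marker) if marker else last + 1
    return position + (len(received) - last - 1)


full = counterstring(1_000_000)
kept = full[:32768]  # simulate a field that keeps only 32768 characters
print(apparent_length(kept))  # 32768
```

Because every marker is accurate wherever the string is cut, a tester only needs to look at the last asterisk that survived, not count the text.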

It takes a handful of keystrokes to generate a string of tremendous length, millions of characters. The tool automatically copies it to the Windows clipboard, whereupon you can paste it into an input field. Right away, you get to see the apparent limit of the field; find an asterisk, and you can figure out in a moment exactly how many characters it accepts. It makes it easy to produce all kinds of strings using Perl syntax, which saves you having to write a line of Perl script to do it and another few lines to get it into the clipboard. In fact, you can give PerlClip to a less-experienced tester that doesn’t know Perl syntax at all (yet), show them a few examples and the online help, and they can get plenty of bang for the buck. They get to learn something about Perl, too. This little tool is like a keychain version of a Swiss Army knife for data generation. It’s dead handy for analyzing input constraints. It allows you to create all kinds of cool patterns, or data that describes itself, and you can store the output wherever you can paste from the clipboard. Oh, and it’s free.

You can get a copy of PerlClip here, by the way. It was written by James Bach and Danny Faught. The idea started with a Perl one-liner by Danny, and they built on each other’s ideas for it. I don’t think it took them very long to write it. Once you’ve had the idea, it’s a pretty trivial program to implement. But still, kind of a cool idea, don’t you think?

So anyway, I created a string a million characters long, and I pasted it into the chat window input field. I saw that the input field apparently accepted 32768 characters before it truncated the rest of the input. So I guess your limit is 32768 characters.

Then I pressed “Send”, and the text appeared in the output field. Well, not all of it. I saw the first 29996 characters, and then two periods, and then nothing else. The rest of the text had vanished.

That’s weird. It doesn’t seem like a big deal, does it? Yet there’s this thing called representativeness bias. It’s a critical thinking error, the phenomenon that causes us to believe that a big problem always looks big from every angle, and that an observation of a problem with little manifestations always has little consequences.

Our biases are influenced by our world views. For example, last week when that tester found that crash in that critical routine, everyone else panicked, but you realized that it was only a one-byte fix and we were back in business within a few minutes. It also goes the other way, though: something that looks trivial or harmless can have dire and shocking consequences, made all the more risky because of the trivial nature of the symptom. If we think symptoms and problems and fixes are all alike in terms of significance, when we see a trivial symptom, no one bothers to investigate the problem. It’s only a little rounding error, and it only happens on one transaction in ten, and it only costs half a cent at most. When that rounding error is multiplied over hundreds of transactions a minute, tens of thousands an hour… well, you get the point.
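To put rough numbers on that rounding error: the letter gives no exact volume, so the sketch below assumes a hypothetical 500 transactions a minute, with one in ten off by at most half a cent, and computes the worst case in which every error falls in the same direction. It uses Python’s `decimal` module so that the arithmetic does not add floating-point rounding of its own.

```python
from decimal import Decimal

# Hypothetical volume: the letter says only "hundreds of transactions a minute".
TRANSACTIONS_PER_MINUTE = 500
AFFECTED_FRACTION = Decimal("0.1")  # "one transaction in ten"
MAX_ERROR = Decimal("0.005")        # "half a cent at most", in dollars


def worst_case_loss(minutes: int) -> Decimal:
    """Upper bound on accumulated rounding loss over `minutes`,
    assuming every error goes the same way."""
    affected = TRANSACTIONS_PER_MINUTE * minutes * AFFECTED_FRACTION
    return affected * MAX_ERROR


for label, minutes in [("hour", 60), ("day", 60 * 24), ("year", 60 * 24 * 365)]:
    print(f"per {label}: ${worst_case_loss(minutes):,.2f}")
```

Under those assumptions the ceiling is $15.00 an hour, $360.00 a day, and $131,400.00 a year: trivial per event, not trivial in aggregate.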

I’m well aware that, as a test, this is a toy. It’s like a security check where you rattle the doorknob. It’s like testing a car by kicking the tires. And the result that I’m seeing is like the doorknob falling off, or the door opening, or a tire suddenly hissing. For a tester, this is a mere bagatelle. It’s a trivial test. Yet when a trivial test reveals something that we can’t explain immediately, it might be a good idea to seek an explanation.

A few things occurred to me as possibilities.

  • The first one is that someone, somewhere, is missing some kind of internal check in the code. Maybe it’s you; maybe it’s the guy who wrote the parser downstream, maybe it’s the guy that’s writing the display engine. But it seems to me as though you figured that you could send 32768 bytes, but someone else has a limit of 29998 bytes. Or 29996, probably. Well, maybe.
  • Maybe one of you isn’t aware of the published limits of the third-party toolkits you’re using. That wouldn’t be the first time. It wouldn’t necessarily be negligence on your part, either—the docs for those toolkits are terrible, I know.
  • Maybe the published limit is available, but there’s simply a bug in one of those toolkits. In that case, maybe there isn’t a big problem here, but there’s a much bigger problem that the toolkit causes elsewhere in the code.
  • Maybe you’re not using third-party toolkits. Maybe they’re toolkits that we developed here. Mind you, that’s exactly the same as the last problem; if you’re not aware of the limits, or if there’s a bug, who produced the code has no bearing on the behaviour of the code.
  • Maybe you’re not using toolkits at all, for any given function. Mind you, that doesn’t change the nature of the problems above either.
  • Maybe some downstream guy is truncating everything over 29996 bytes, placing those two dots at the end, and ignoring everything else, and he’s not sending a return value to you to let you know that he’s doing it.
  • Maybe he is sending you a return value, but the wrong one.
  • Maybe he’s sending you a return value, and you’re ignoring it.
  • Maybe he’s sending you a return value, and you are paying attention to it, but there’s some confusion about what it means and how it should be handled.
  • Maybe you’re truncating the last two and a half kilobytes or so of data before you send it on, and we’re not telling the user about it. Maybe that’s your intention. Seems a little rude to me to do that, but to you, it works as designed. To some user, it doesn’t work—as designed.
  • Maybe there’s no one else involved, and it’s just you working on all those bits of the code, but the program has now become sufficiently complex that you’re unable to keep everything in your head. That stands to reason; it is a complicated program, with lots of bits and pieces.
  • Maybe you’re depending on unit tests to tell you if anything is wrong with the individual functions or objects. But maybe nothing is wrong with any particular one of them in isolation; maybe it’s the interaction between them that’s problematic.
  • Maybe you don’t have any unit tests at all.
  • Maybe you do have unit tests for this stuff. From right here, I can’t tell. If you do have them, I can’t tell whether your checks are really great and you just missed one this time, or if you missed a few, or if you missed a bunch of them, or whether there’s a ton of them and they’re all really lousy.
  • Any of the above explanations could be in play, many of them simultaneously. No matter what, though, all your unit tests could pass, and you’d never know about the problem until we took out all the mocks and hooked everything up in the real system. Or deployed into the field. (Actually, by now they’re not unit tests; they’re just unit checks, since it’s been a while since this part of the code was last looked at and we’ve been seeing green bars for the last few months.)
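Several of the explanations above turn on whether the component that truncates tells anyone that it did. Here is a minimal sketch that models the behaviour observed from outside (29996 characters kept, then two periods) in two ways: silently, and with a count of dropped characters returned to the caller. The two limits come from the observation; the function names and everything else are hypothetical.

```python
from typing import Tuple

INPUT_LIMIT = 32768    # characters the input field appeared to accept
DISPLAY_LIMIT = 29996  # characters that appeared before the ".."


def render_silently(text: str) -> str:
    """Hypothetical: truncate for display and tell no one."""
    if len(text) > DISPLAY_LIMIT:
        return text[:DISPLAY_LIMIT] + ".."
    return text


def render_reporting(text: str) -> Tuple[str, int]:
    """Hypothetical: same truncation, but also return how many characters
    were dropped, so the caller can decide how to tell the user."""
    if len(text) > DISPLAY_LIMIT:
        return text[:DISPLAY_LIMIT] + "..", len(text) - DISPLAY_LIMIT
    return text, 0


message = "x" * INPUT_LIMIT
shown, dropped = render_reporting(message)
if dropped:  # a caller that actually pays attention to the return value
    print(f"{dropped} of {len(message)} characters were not displayed.")
```

Returning the count does not fix anything by itself; as the bullets point out, a caller can still ignore it or misread it. It only makes the loss visible to code that bothers to look.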

For any one of the cases above, since it’s so easy to test and check for these things, I would think that if you or anyone else knew about this problem, your sense of professionalism and craftsmanship would tell you to do some testing, write some checks, and fix it. After all, as Uncle Bob Martin said, you guys don’t want us to find any bugs, right?
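A check of the kind described here can be quite small. The sketch below feeds strings of lengths at and around the observed limits through a pipeline function and reports every length for which the output differs from the input. `observed_pipeline` is a hypothetical stand-in that reproduces what was seen from outside; in a real check it would be replaced by the code under test, and a real check might also count an explicit, user-visible truncation notice as a pass.

```python
from typing import Callable, List


def find_silent_losses(pipeline: Callable[[str], str],
                       sizes: List[int]) -> List[int]:
    """Return the input sizes for which the pipeline's output does not
    match its input. This sketch treats any mismatch as a failure."""
    failures = []
    for n in sizes:
        text = "x" * n
        if pipeline(text) != text:
            failures.append(n)
    return failures


def observed_pipeline(text: str) -> str:
    # Hypothetical model of the observation: the input field keeps 32768
    # characters; the display keeps 29996 and appends "..".
    text = text[:32768]
    return text[:29996] + ".." if len(text) > 29996 else text


# Lengths on either side of each observed limit, up to the field's maximum.
BOUNDARIES = [0, 1, 29995, 29996, 29997, 32767, 32768]
print(find_silent_losses(observed_pipeline, BOUNDARIES))
```

Against the modelled behaviour, the check flags 29997, 32767, and 32768: every accepted length above the display limit.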

But it’s not my place to say that. All that stuff is up to you. I don’t tell you how to do your work; I tell you what I observe, in this case entirely from the outside. Plus it’s only one test. I’ll have to do a few more tests to find out if there’s a more general problem. Maybe this is an aberration.

Now, I know you’re fond of saying, “No user would ever do that.” I think what you really mean is no user that you’ve thought of, and that you like, would do that on purpose. But it might be a thought to consider users that you haven’t thought of, however unlikely they and their task might be to you. It could be a good idea to think of users that neither one of us like, such as hackers or identity thieves. It could also be important to think of users that you do like who would do things by accident. People make mistakes all the time. In fact, by accident, I pasted the text of this message into another program, just a second ago.

So far, I’ve only talked about the source of the problem and the trigger for it. I haven’t talked much about possible consequences, or risks. Let’s consider some of those.

  • A customer could lose up to 2772 bytes of data (the 32768 accepted, less the 29996 displayed). That actually sounds like a low-risk thing, to me. It seems pretty unlikely that someone would type or paste that much data in any kind of routine way. Still, I did hear from one person that they like to paste stack traces into a chat window. You responded rather dismissively to that. It does sound like a corner case.
  • Maybe you don’t report truncated data as a matter of course, and there are tons of other problems like this in the code, in places that I’m not yet aware of or that are invisible from the black box. Not this problem, but a problem with the same kind of cause could lead to a much more serious problem than this unlikely scenario.
  • Maybe there is a consistent pattern of user interface problems where the internals of the code handle problems but don’t alert the user, even though the user might like to know about them.
  • Maybe there’s a buffer overrun. That worries me more—a lot more—than the stack trace thing above. You remember that this kind of problem used to be dismissed as a “corner case” back when we worked at Microsoft—and then how Microsoft shut down new product development and spent two months investigating these kinds of problems, back in the spring of 2002? Hundreds of worms and viruses and denial of service attacks stem from problems whose outward manifestation looked exactly as trivial as this problem. There are variations on it.
  • Maybe there’s a buffer overrun that would allow other users to view a conversation that my contact and I would like to keep between ourselves.
  • Maybe an appropriately crafted string could allow hackers to get at some of my account information.
  • Maybe an appropriately crafted string could allow hackers to get at everyone’s account information.
  • Maybe there’s a vulnerability that allows access to system files, as the Blaster worm did.
  • Maybe the product is now unstable, and there’s a crash about to happen that hasn’t yet manifested itself. We never know for sure if a test is finished.
  • Here’s something that I think is more troubling, and perhaps the biggest risk of all. Maybe, by blowing off this report, you’ll discourage testers from reporting a similarly trivial symptom of a much more serious problem. In a meeting a couple of weeks ago, the last time a tester reported something like this, you castigated her in public for the apparently trivial nature of the problem. She was embarrassed and intimidated. These days she doesn’t report anything except symptoms that she thinks you’ll consider sufficiently dramatic. In fact, just yesterday she saw something that she thought to be a pretty serious performance issue, but she’s keeping mum about it. Some time several weeks from now, when we start to do thousands or millions of transactions, you may find yourself wishing that she had felt okay about speaking up today. Or who knows; maybe you’ll just ask her why she didn’t find that bug.

NASA calls this last problem “the normalization of deviance”. In fact, this tiny little inconsistency reminds me of the Challenger problem. Remember that? There were these O-rings that were supposed to seal the joints of the solid rocket boosters, keeping hot, highly pressurized gases inside. It turns out that on seven of the shuttle flights that preceded the Challenger, these O-rings burned through a bit and some gases leaked (they called this “erosion” and “blow-by”). Various managers convinced themselves that it wasn’t a problem, because it only happened on about a third of the flights, and the rings, at most, only burned a third of the way through. Because these “little” problems didn’t result in catastrophe the first seven times, NASA managers took this as evidence of safety. Every successful flight that had the problem was taken as reassurance that NASA could get away with it. In that sense, it was like Nassim Nicholas Taleb’s turkey, who increases his belief in the benevolence of the farmer every day… until some time in the week before Thanksgiving.

Richard Feynman, in his Appendix to the Rogers Commission Report on the Space Shuttle Challenger Accident, nailed the issue:

The phenomenon of accepting for flight, seals that had shown erosion and blow-by in previous flights, is very clear. The Challenger flight is an excellent example. There are several references to flights that had gone before. The acceptance and success of these flights is taken as evidence of safety. But erosion and blow-by are not what the design expected. They are warnings that something is wrong. The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way. The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. When playing Russian roulette the fact that the first shot got off safely is little comfort for the next.

That’s the problem with any evidence of any bug at first observation: we know only about a symptom, not the cause, and not the consequences. When the system is in an unpredicted state, it’s in an unpredictable state.

Software is wonderfully deterministic, in that it does exactly what we tell it to do. But, as you know, there’s sometimes a big difference between what we tell it to do and what we meant to tell it to do. When software does what we tell it to do instead of what we meant, we find ourselves off the map that we drew for ourselves. And once we’re off the map, we don’t know where we are.

According to Wikipedia, Feynman’s investigations also revealed that there had been many serious doubts raised about the O-ring seals by engineers at Morton Thiokol, which made the solid fuel boosters, but communication failures had led to their concerns being ignored by NASA management. He found similar failures in procedure in many other areas at NASA, but singled out its software development for praise due to its rigorous and highly effective quality control procedures – then under threat from NASA management, which wished to reduce testing to save money given that the tests had always been passed.

At NASA, back then, the software people realized that passing checks didn’t mean they could relax their diligence. They realized that what really reduced risk on the project was appropriate testing, lots of tests, and paying attention to seemingly inconsequential failures.

I know we’re not sending people to the moon here. Even though we don’t know the consequences of this inconsistency, it’s hard to conceive of anyone dying because of it. So let me be clear: I’m not saying that the sky is falling, and I’m not making a value judgment as to whether we should fix it. That’s for you and the project managers to decide. It’s simply my role to observe it and report it.

I think it might be important, though, for us to understand why the problem is there in the first place, because I don’t know whether the problem that I’m seeing is a big deal. And the thing is, until you’ve looked at the code, neither do you.

As always, it’s your call. And as usual, I’m happy to assist you in running whatever tests you’d like me to run on your behalf. I’ll also poke around and see if I can find any other surprises.

Your friend,

The Tester

P.S. I did run a second test. This time, I used PerlClip to craft a string of 100000 instances of “:)”. That pair of characters, in normal circumstances, results in a smiley-face emoticon. It seemed as though the input field accepted the characters literally, and then converted them to the graphical smiley face. It took a long, long time for the input field to render this. I thought that my chat window had crashed, but it hadn’t. Eventually it finished processing, and displayed what it had parsed from this odd input. I didn’t see 32768 smileys, nor 29996, nor 16384, nor 14998. I saw exactly two dots. Weird, huh?
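If you want to rebuild that input without PerlClip, here’s a small sketch in Python (my choice of tool, not anything the chat client uses). It also shows where the four counts I was half-expecting come from. The assumption, and it is only an assumption, is that the field might cap its input at either 32768 (2^15) or 29996, counted either in characters or in emoticons. The function names are hypothetical, made up for this illustration.

```python
# Rebuild the P.S. test input: 100000 copies of ":)" (what PerlClip produced).
SMILEY = ":)"
REPEATS = 100_000
test_input = SMILEY * REPEATS  # 200000 characters in total

# Hypothetical caps: 32768 and 29996 are the figures mentioned in the P.S.;
# whether a cap applies to characters or to emoticons is an assumption.
LIMITS = (32768, 29996)


def smileys_if_capped_by_chars(limit: int) -> int:
    """Complete ':)' pairs that survive if the field keeps only `limit` characters."""
    return test_input[:limit].count(SMILEY)


def smileys_if_capped_by_items(limit: int) -> int:
    """Smileys shown if the field instead caps the number of emoticons."""
    return min(REPEATS, limit)


if __name__ == "__main__":
    print(len(test_input))
    for limit in LIMITS:
        # One line per hypothetical cap: the cap, then smileys under each interpretation.
        print(limit, smileys_if_capped_by_items(limit), smileys_if_capped_by_chars(limit))
```

Each smiley takes two characters, so a cap on characters halves the count: 32768 characters would leave 16384 smileys, and 29996 would leave 14998. None of those interpretations predicts two dots.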