Blog Posts for the ‘Words and Semantics’ Category

Very Short Blog Posts (25): Testers Don’t Break the Software

Tuesday, February 17th, 2015

Plenty of testers claim that they break the software. They don’t really do that, of course. Software doesn’t break; it simply does what it has been designed and coded to do, for better or for worse. Testers investigate systems, looking at what the system does; discovering and reporting on where and how the software is broken; identifying when the system will fail under load or stress.

It might be a good idea to consider the psychological and public relations problems associated with claiming that you break the software. Programmers and managers might subconsciously harbour the idea that the software was fine until the testers broke it. The product would have shipped on time, except the testers broke it. Normal customers wouldn’t have problems with the software; it’s just that the testers broke it. There are no systemic problems in the project that lead to problems in the product; nuh-uh, the testers broke it.

As an alternative, you could simply say that you investigate the software and report on what it actually does—instead of what people hope or wish that it does. Or as my colleague James Bach puts it, “We don’t break the software. We break illusions about the software.”

Give Us Back Our Testing

Saturday, February 14th, 2015

“Program testing involves the execution of a program over sample test data followed by analysis of the output. Different kinds of test output can be generated. It may consist of final values of program output variables or of intermediate traces of selected variables. It may also consist of timing information, as in real time systems.

“The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle. Test oracles can take on different forms. They can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions.”

—William E. Howden, A Survey of Dynamic Analysis Methods, in Software Validation and Testing Techniques, IEEE Computer Society, 1981

Once upon a time, computers were used solely for computation. Humans did most of the work that preceded or followed the computation, so the scope of a computer program was limited. In the earliest days, testing a program mostly involved checking to see if the computations were being performed correctly, and that the hardware was working properly before and after the computation.

Over time, designers and programmers became more ambitious and computers became more powerful, enabling more complex and less purely numerical tasks to be encoded and delegated to the machinery. Enormous memory and blinding speed largely replaced the physical work associated with storing, retrieving, revising, and transmitting records. Computers got smaller and became more powerful and protean, used not only by mathematicians but also by scientists, business people, specialists, consumers, and kids.

Software is now used for everything from productivity to communications, control systems, games, audio playback, video displays, thermostats… Yet many of the software development community’s ideas about testing haven’t kept up. In fact, in many ways, they’ve gone backwards.

Ask people in the software business to describe what testing means to them, and many will begin to talk about test cases, and about comparing a program’s output to some predicted or expected result. Yet outside of software development, “testing” has retained its many more expansive meanings.

A teenager tests his parents’ patience. When confronted with a mysterious ailment, doctors perform diagnostic tests (often using very sophisticated tools) with open expectations and results that must be interpreted. Writers in Cook’s Illustrated magazine test techniques for roasting a turkey, and report on the different outcomes that they obtain by varying factors—flavours, colours, moisture, textures, cooking methods, cooking times… The Mythbusters, says Wikipedia, “use elements of the scientific method to test the validity of rumors, myths, movie scenes, adages, Internet videos, and news stories.”

Notice that all of these things called “testing” are focused on exploration, investigation, discovery, and learning. Yet over the last several decades, Howden’s notions of testing as checking for correctness, and of an oracle as a mechanism (or an artifact), became accepted by many people in the development and testing communities at large. Whether or not people were explicitly aware of those notions, they certainly seem tacitly to have subscribed to the idea that testing should be focused on analysis of the output, displacing those broader and deeper meanings of testing.

That idea might have been more reasonable when computers did nothing but compute. Today, computers and their software are richly intertwined with daily social life and things that we value. Yet for many in software development, “testing” has this narrow, impoverished meaning, limited to what James Bach and I call checking. Checking is a tactic of testing; the part of testing that can be encoded as algorithms and that therefore can be performed entirely by machinery. It is analogous to compiling, the part of programming that can be performed algorithmically.

Oddly, since we started distinguishing between testing and checking, some people have claimed that we’re “redefining” testing. We disagree. We believe that we are recovering testing’s meaning, restoring it to its original, rich, investigative sense. Testing’s meaning was stolen; we’re stealing it back.

The Rapid Software Testing Namespace

Monday, February 2nd, 2015

Just as no one has the right to tell you what language to speak at home, nobody outside of your project has the authority to tell you how to speak inside your project. Every project develops its own namespace, so to speak, and its own formal or informal criteria for naming things inside it.

Rapid Software Testing is, among other things, a project in that sense. For years, James Bach and I have been developing labels for ideas and activities that we talk about in our work and in our classes. While we’re happy to adopt useful ideas and terms from other places, we have the sole authority (for now) to set the vocabulary formally within Rapid Software Testing (RST).

We don’t have the right to impose our vocabulary on anyone else. So what do we do when other people use a word to mean something different from what we mean by the same word?

We invoke “the RST namespace” when we talk about testing and checking, for example, so that we can speak clearly and efficiently about ideas that we bring up in our classes and in the practice of Rapid Software Testing. From time to time, we also try to make it clear why we use words in a specific way.

For example, we make a big deal about testing and checking. We define checking as “the process of making evaluations by applying algorithmic decision rules to specific observations of a product” (and a check is an instance of checking). We define testing as “the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.” (and a test is an instance of testing).

This is in contrast with the ISTQB, which in its Glossary defines “test” as “a set of test cases”—along with “test case” as “a set of input values, execution preconditions, expected results and execution postconditions, developed for a particular objective or test condition, such as to exercise a particular program path or to verify compliance with a specific requirement.”

Interesting, isn’t it: the ISTQB’s definition of test looks a lot like our definition of check. In Rapid Software Testing, we prefer to put learning and experimentation (rather than satisfying requirements and demonstrating fitness for purpose) at the centre of testing. We prefer to think of a test as something that people do as an act of investigation; as a performance, not as an artifact.
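To make the contrast concrete, here is a minimal sketch, in Python, of a check under the RST definition: an algorithmic decision rule applied to a specific observation of a product. The product, the function names, and the expected value are all invented for illustration; nothing here is prescribed by RST itself.

  # A minimal, invented sketch of a "check" in the RST sense: an
  # algorithmic decision rule applied to a specific observation.

  def observe_total(cart):
      # Observation: obtain one specific output from a hypothetical product.
      return sum(item["price"] * item["qty"] for item in cart)

  def check_total(cart, expected):
      # Decision rule: an algorithmic comparison that yields pass or fail.
      return observe_total(cart) == expected

  cart = [{"price": 2.50, "qty": 2}, {"price": 1.00, "qty": 3}]
  print(check_total(cart, 8.00))  # True; the check passes

The check itself produces only a bit. Choosing which observations are worth making, deciding whether 8.00 was a sensible expectation in the first place, and working out what a failure would mean to people who matter: all of that is testing, and it stays with the human.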

Because words convey meaning, we converse (and occasionally argue, sometimes passionately) about the value we see in the words we choose and the ways we think of them. Our goal is to describe things that people haven’t noticed, or to make certain distinctions clear, with the aim of reducing the risk that someone will misunderstand—or miss—something important.

Nonetheless, we freely acknowledge that we have no authority outside of Rapid Software Testing. There’s nothing to stop people from using the words we use in a different way; there are no language police in software development. So we’re also willing to agree to use other people’s labels for things when we’ve had the conversation about what those labels mean, and have come to agreement.

People who tout a “common language” often mean “my common language”, or “my namespace”. They also have the option to certify you as being able to pass a vocabulary test, if anyone thinks that’s important. We don’t.

We think that it’s important for people to notice when words are being used in different ways. We think it’s important for people to become polyglots—and that often means working out which namespace we might be using from one moment to the next.

In our future writing, conversation, classes, and other work, you might wonder what we’re talking about when we refer to “the RST namespace”. This post provides your answer.

Taking Severity Seriously

Wednesday, January 14th, 2015

There’s a flaw in the way most organizations classify the severity of a bug. Here’s an example from the Elementool Web site (as of 14 January, 2015); I’m sure you’ve seen something like it:

Critical: The bug causes a failure of the complete software system, subsystem or a program within the system.
High: The bug does not cause a failure, but causes the system to produce incorrect, incomplete, inconsistent results or impairs the system usability.
Medium: The bug does not cause a failure, does not impair usability, and does not interfere in the fluent work of the system and programs.
Low: The bug is an aesthetic (sic —MB), is an enhancement (ditto) or is a result of non-conformance to a standard.

These are serious problems, to be sure—and there are problems with the categorizations, too. (For example, non-conformance to a medical device standard can get you publicly reprimanded by the FDA; how is that low severity?) But there’s a more serious problem with models of severity like this: they’re all about the system as though no person used that system. There’s no empathy or emotion here; there’s no impact on people. The descriptions don’t mention the victims of the problem, and they certainly don’t identify consequences for the business. What would happen if we thought of those categories a little differently?

Critical: The bug will cause so much harm or loss that customers will sue us, regulators will launch a probe of our management, newspapers will run a front-page story about us, and comedians will talk about us on late night talk shows. Our company will spend buckets of money on lawyers, public relations, and technical support to try to keep the company afloat. Many capable people will leave voluntarily without even looking for a new job. Lots of people will get laid off. Or, the bug blocks testing such that we could miss problems of this magnitude; go back to the beginning of this paragraph.

High: The bug will cause loss, harm, or deep annoyance and inconvenience to our customers, prompting them to flood the technical support phones, overwhelm the online chat team, return the product demanding their money back, and buy the competitor’s product. And they’ll complain loudly on Twitter. The newspaper story will make it to the front page of the business section, and our product will be used for a gag in Dilbert. Sales will take a hit and revenue will fall. The Technical Support department will hold a grudge against Development and Product Management for years. And our best workers won’t leave right away, but they’ll be sufficiently demoralized to start shopping their résumés around.

Medium: The bug will cause our customers to be frustrated or impatient, and to lose faith in our product such that they won’t necessarily call or write, but they won’t be back for the next version. Most won’t initiate a tweet about us, but they’ll eagerly retweet someone else’s. Or, the bug will annoy the CEO’s daughter, whereupon the CEO will pay an uncomfortable visit to the development group. People won’t leave the company, but they’ll be demotivated and call in sick more often. Tech support will handle an increased number of calls. Meanwhile, the testers will have—with the best of intentions—taken time to investigate and report the bug, such that other, more serious bugs will be missed (see “High” and “Critical” above). And a few months later, some middle manager will ask, uncomprehendingly, “Why didn’t you find that bug?”

Low: The bug is visible; it makes our customers laugh at us because it makes our managers, programmers, and testers look incompetent and sloppy—and it causes our customers to suspect deeper problems. Even people inside the company will tease others about the problem via graffiti in the stalls in the washroom (written with a non-washable Sharpie). Again, the testers will have spent some time on investigation and reporting, and again test coverage will suffer.

Of course, one really great way to avoid many of these kinds of problems is to focus on diligent craftsmanship supported by scrupulous testing. But when it comes to that discussion in that triage meeting, let’s consider the impact on real customers, on the real people in our company, and on our own reputations.

Testing is…

Tuesday, October 28th, 2014

Every now and again, someone makes some statement about testing that I find highly questionable or indefensible, whereupon I might ask them what testing means to them. All too often, they’re at a loss to reply because they haven’t really thought deeply about the matter; or because they haven’t internalized what they’ve thought about; or because they’re unwilling to commit to any statement about testing. And then they say something vague or non-committal like “it depends” or “different things to different people” or “that’s a matter of context”, without suggesting relevant dependencies, people, or context factors.

So, for those people, I offer a set of answers from which they can choose one; or they can adopt the entire list wholesale; or they can use one or more items as a point of departure for something of their own invention. You don’t have to agree with any of these things; in that case, invent your own ideas about testing from whole cloth. But please: if you claim to be a tester, or if you are making some claim about testing, please prepare yourself and have some answer ready when someone asks you “what is testing?”. Please.

Here are some possible replies; I believe everything is Tweetable, or pretty close.

  • Testing is—among other things—reviewing the product and ideas and descriptions of it, looking for significant and relevant inconsistencies.
  • Testing is—among other things—experimenting with the product to find out how it may be having problems—which is not “breaking the product”, by the way.
  • Testing is—among other things—something that informs quality assurance, but is not in and of itself quality assurance.
  • Testing is—among other things—helping our clients to make empirically informed decisions about the product, project, or business.
  • Testing is—among other things—a process by which we systematically examine any aspect of the product with the goal of preventing surprises.
  • Testing is—among other things—a process of interacting with the product and its systems in many ways that challenge unwarranted optimism.
  • Testing is—among other things—observing and evaluating the product, to see where all those defect prevention ideas might have failed.
  • Testing is—among other things—a special part of the development process focused on discovering what could go badly (or what is going badly).
  • Testing is—among other things—exploring, discovering, investigating, learning, and reporting about the product to reveal new information.
  • Testing is—among other things—gathering information about the product, its users, and conditions of its use, to help defend value.
  • Testing is—among other things—raising questions to help teams to develop products that more quickly and easily reveal their own problems.
  • Testing is—among other things—helping programmers and the team to learn about unanticipated aspects of the product we’re developing.
  • Testing is—among other things—helping our clients to understand the product they’ve got so they can decide if it’s the product they want.
  • Testing is—among other things—using both tools and direct interaction with the product to question and evaluate its behaviours and states.
  • Testing is—among other things—exploring products deeply, imaginatively, and suspiciously, to help find problems that threaten value.
  • Testing is—among other things—performing actual and thought experiments on products and ideas to identify problems and risks.
  • Testing is—among other things—thinking critically and skeptically about products and ideas around them, with the goal of not being fooled.
  • Testing is—among other things—evaluating a product by learning about it through exploration, experimentation, observation and inference.

You’re welcome.

I’ve Had It With Defects

Wednesday, April 2nd, 2014

The longer I stay in the testing business and reflect on the matter, the more I believe the concept of “defects” to be unclear and unhelpful.

A program may have a coding error that is clearly inconsistent with the program’s specification, whereupon I might claim that I’ve found a defect. The other day, an automatic product update failed in the middle of the process, rendering the product unusable. Apparently a defect. Yet let’s look at some other scenarios.

  • I perform a bunch of testing without seeing anything that looks like a bug, but upon reviewing the code, I see that it’s so confusing and unmaintainable in its current state that future changes will be risky. Have I found a defect? And how many have I found?
  • I observe that a program seems to be perfectly coded, but to a terrible specification. Is the product defective?
  • A program may be perfectly coded to a wonderfully written specification—even though the writer of the specification may have done a great job at specifying implementation for a set of poorly conceived requirements. Should I call the product defective?
  • Our development project is nearing release, but I discover a competitive product with this totally compelling feature that makes our product look like an also-ran. Is our product defective?
  • Half the users I interview say that our product should behave this way, saying that it’s ugly and should be easier to learn; the other half say it should behave that way, pointing out that looks don’t matter, and once you’ve used the product for a while, you can use it quickly and efficiently. Have I identified a defect?
  • The product doesn’t produce a log file. If there were a log file, my testing might be faster, easier, or more reliable. If the product is less testable than it could be, is it defective?
  • I notice that the Web service that supports our chain of pizza stores slows down noticeably around dinner time, when more people are logging in to order. I see a risk that if business gets much better, the site may bog down sufficiently that we may lose some customers. But at the moment, everything is working within the parameters. Is this a defect? If it’s not a defect now, will it magically change to a defect later?

On top of all this, the construct “defect” is at the centre of a bunch of unhelpful ideas about how to measure the quality of software or of testing: “defect count”; “defect detection rate”; “defect removal efficiency”. But what is a defect? If you visit LinkedIn, you can often read some school-marmish clucking about defects. People who talk about defects seem to refer to things that are absolutely and indisputably wrong with the product. Yet in my experience, matters are rarely so clear. If it’s not clear what is and is not a defect, then counting them makes no sense.

That’s why, as a tester, I find it much more helpful to think in terms of problems. A problem is “a difference between what is perceived and what is desired” or “an undesirable situation that is significant to and maybe solvable by some agent, though probably with some difficulty”. (I’ve written more about that here.) A problem is not something that exists in the software as such; a problem is relative, a relationship between the software and some person(s). A problem may take the form of a bug—something that threatens the value of the product—or an issue—something that threatens the value of the testing, or of the project, or of the business.

As a tester, I do not break the software. As a reminder of my actual role, I often use a joke that I heard attributed to Alan Jorgenson, but which may well have originated with my colleague James Bach: “I didn’t break the software; it was broken when I got it.” That is, rather than breaking the software, I find out how and where it’s broken. But even that doesn’t feel quite right. I often find that I can’t describe the product as “broken” per se; yet the relationship between the product and some person might be broken. I identify and illuminate problematic relationships by using and describing oracles, the means by which we recognize problems as we’re testing.

Oracles are not perfect and testers are not judges, so it would seem presumptuous of me to label something a defect. As James points out, “If I tell my wife that she has a defect, that is not likely to go over well. But I might safely say that she is doing something that bugs me.” Or as Cem Kaner has suggested, shipping a product with known defects means shipping “defective software”, which could have contractual or other legal implications (see here and here, for examples).

On the one hand, I find that “searching for defects” seems too narrow, too absolute, too presumptuous, and politically risky for me. On the other, if you look at the list above, all those things that were questionable as defects could be described more easily and less controversially as problems that potentially threaten the value of the product. So “looking for problems” provides me with wider scope, recognizes ambiguity, encourages epistemic humility, and acknowledges subjectivity. That in turn means that I have to up my game, using many different ways to model the product, considering lots of different quality criteria, and looking not only for functional problems but anything that might cause loss, harm, or annoyance to people who matter.

Moreover, rejecting the concept of defects ought to help discourage us from counting them. Given the open-ended and uncertain nature of “problem”, the idea of counting problems would sound silly to most people—but we can talk about problems. That would be a good first step towards solving them—addressing some part of the difference between what is perceived and what is desired by some person or persons who matter.

That’s why I prefer looking for problems—and those are my problems with “defects”.

Very Short Blog Posts (14): “It works!”

Monday, March 31st, 2014

“It works” is one of Jerry Weinberg’s nominees for the most ambiguous sentence in the English language.

To me, when people say “it works”, they really mean

Some aspect
of some feature
or some function
appeared
to meet some requirement
to some degree
based on some theory
and based on some observation
that some agent made
under some conditions
once
or maybe more.

One of the most important tasks for a tester is to question the statement “it works”, to investigate the claim, and to elaborate on it such that important people on the project know what it really means.

Related posts:

A Little Blog Post on a Big Idea: Does the Software Work? (Pete Walen)
Behavior-Driven Development vs. Testing (James Bach)

and…

Perfect Software and Other Illusions about Testing (Jerry Weinberg)

Very Short Blog Posts (13): When Will Testing Be Done?

Friday, March 21st, 2014

When a decision maker asks “When will testing be done?”, in my experience, what she really means is “When will I have enough information about the state of the product and the project, such that I can decide to release or deploy the product?”

There are a couple of problems with the latter question. First, as Cem Kaner puts it, “testing is an empirical, technical investigation of the product, done on behalf of stakeholders, that provides quality-related information of the kind that they seek”. Yet the decision to ship is a business decision, and not purely a technical one; factors other than testing inform the shipping decision. Second, only the decision-maker can decide how much information is enough for her purposes.

So how should a tester answer the question “When will testing be done?” My answer would go like this:

“Testing will be done when you decide to ship the product. That will probably be when you feel that you have enough information about the product, its value, and real and potential risks—and about what I’ve covered and how well I’ve covered it to find those things out. So I will learn everything I can about the product, as quickly as possible, and I’ll continuously communicate what I’ve learned to you. I’ll also help you to identify things that you might consider important influences on your decision. If you’d like me to keep testing after deployment (for example, to help technical support), I’ll do that too. Testing will be done when you decide that you’re satisfied that you need no more information from testing.”

That’s your very (or at least pretty) short blog post. For more, see:

Test Estimation is Really Negotiation

Test Project Estimation, The Rapid Way

Project Estimation and Black Swans (Part 5): Test Estimation: Is there really such a thing as a test project, or is it mostly inseparable from some other activities?

Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

Testing, Checking, and Convincing the Boss to Explore: You might want to take a more exploratory approach to the testing of your product or service, yet you might face some difficulty in persuading people who are locked into an idea of testing the product as “checking to make sure that it works”. So, some colleagues came up with ideas that might help.

Harry Collins and The Motive for Distinctions

Monday, March 3rd, 2014

“Computers and their software are two things. As collections of interacting cogs they must be ‘checked’ to make sure there are no missing teeth and the wheels spin together nicely. Machines are also ‘social prostheses’, fitting into social life where a human once fitted. It is a characteristic of medical prostheses, like replacement hearts, that they do not do exactly the same job as the thing they replace; the surrounding body compensates.

“Contemporary computers cannot do just the same thing as humans because they do not fit into society as humans do, so the surrounding society must compensate for the way the computer fails to reproduce what it replaces. This means that a complex judgment is needed to test whether software fits well enough for the surrounding humans to happily ‘repair’ the differences between humans and machines. This is much more than a matter of deciding whether the cogs spin right.”

—Harry Collins

Harry Collins—sociologist of science, author, professor at Cardiff University, a researcher in the fields of the public understanding of science, the nature of expertise, and artificial intelligence—was slated to give a keynote speech at EuroSTAR 2013. Due to illness, he was unable to do so. The quote above is the abstract from the talk that Harry never gave. (The EuroSTAR community was very lucky and grateful to have his colleague, Rob Evans, step in at the last minute with his own terrific presentation.)

Since I was directed to Harry’s work in 2010 (thank you, Simon Schaffer), James Bach and I have been galvanized by it. As we’ve been trying to remind people for years, software testing is a complex, cognitive, social task that requires skill, tacit knowledge, and many kinds of expertise if we want people to do it well. Yet explaining testing is tricky, precisely because so much of what skilled testers do is tacit, and not explicit; learned by practice and by immersion in a culture, not from documents or other artifacts; not only mechanical and algorithmic, but heuristic and social.

Harry helps us by taking a scalpel to concepts and ideas that many people consider obvious or unimportant, and dissecting those ideas to reveal the subtle and crucial details under the surface.

As an example, in Tacit and Explicit Knowledge, he takes the idea of tacit knowledge—formerly, any kind of knowledge that was not told—and divides it into three kinds: relational, the kind of knowledge that resides in an individual human mind, and that in general could be told; somatic, resident in the system of a human body and a human mind; and collective, residing in society and in the ever-changing relationships between people in a culture.

How does that matter? Consider the Google car. On the surface, operating a car looks like a straightforward activity, easily made explicit in terms of the laws of physics and the rules of the road. Look deeper, and you’ll realize that driving is a social activity, and that interaction between drivers, cyclists, and pedestrians is negotiated in real time, in different ways, all over the world.

So we’ve got Google cars on the road experimentally in California and Washington; how will they do in Beijing, in Bangalore, or in Rome? How will they interact with human drivers in each society? How will they know, as human drivers do, the extent to which it is socially acceptable to bend the rules—and socially unacceptable not to bend them?

In many respects, machinery can do far better than humans in the mechanical aspects of driving. Yet testing the Google car will require far more than unit checks or a Cucumber suite—it will require complex evaluation and judgement by human testers to see whether the machinery—with no awareness or understanding of social interactions, for the foreseeable future—can be accommodated by the surrounding culture.

That will require a shift from the way testing is done at Google, at least according to some popular stories. If you want to find problems that matter to people before inflicting your product on them, you must test the product not only in isolation, but also in its relationships with other people.

In Rapid Software Testing, our goal all the way along has been to probe into the nature of testing and the way we talk about it, with the intention of empowering people to do it well. Part of this task involves taking relational tacit knowledge and making it explicit. Another part involves realizing that certain skills cannot be transferred by books or diagrams or video tutorials, but must be learned through experience and immersion in the task. Rather than hand-waving about “intuition” and “error guessing”, we’d prefer to talk about and study specific, observable, trainable, and manageable skills.

We could talk about “test automation” as though it were a single subject, but it’s more helpful to distinguish the many ways that we could use tools to support and amplify our testing—for checking specific facts or states, for generating data, for visualization, for modeling, for coverage analysis… Instead of talking about “automated testing” as though machines and people were capable of the same things, we’d rather distinguish between checking (something that machines can do, an activity embedded in testing) and testing (which requires humans), so as to make both our checking and our testing more powerful.
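As one small illustration, here is a sketch, in Python, of a tool amplifying testing without replacing it. The data generator, its character pool, and its parameters are all invented for the example; the point is that the machine can produce varied, mildly hostile input far faster than a person could type it, while a person still decides what the data should probe and what the product’s behaviour means.

  import random
  import string

  # An invented sketch of tool-supported test data generation: names
  # that probe common weak spots (length, quotes, accents, spaces).

  def generate_name(rng):
      pool = string.ascii_letters + " '\"-éüñ"
      return "".join(rng.choice(pool) for _ in range(rng.randint(0, 40)))

  rng = random.Random(42)  # seeded, so a surprising result can be reproduced
  for _ in range(5):
      print(repr(generate_name(rng)))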

The abstract for Prof. Collins’ talk, quoted above, is an astute, concise description of why skilled testing matters. It’s also why the distinction between testing and checking matters. For that, we are grateful.

There will be much more to come in these pages relating Harry’s work to our craft of testing; stay tuned. Meanwhile, I give his books my highest recommendation.

Tacit and Explicit Knowledge
Rethinking Expertise (co-authored with Rob Evans)
The Shape of Actions: What Humans and Machines Can Do (co-authored with Martin Kusch)
The Golem: What You Should Know About Science (co-authored with Trevor Pinch)
The Golem at Large: What You Should Know About Technology (co-authored with Trevor Pinch)
Changing Order: Replication and Induction in Scientific Practice
Artificial Experts: Social Knowledge and Intelligent Machines

Can You Hear The Alarm Bells?

Monday, November 25th, 2013

Many people seem certain about what happened to cause the healthcare.gov fiasco. Stories are starting to trickle out, and eventually there’ll be an ocean of them. To anyone familiar with software development, especially in large organizations, these stories include familiar elements of character and plot. From those, it’s easy to extrapolate and fill in the details based on imagination and experience. We all know what happened.

Well, we don’t. In a project of that size, no one knows what happened. No one can know what happened. Imagine Rashomon scaled up to hundreds of people, each making his own observations and decisions along the way.

As time goes by, I anticipate some people saying that the project will represent a turning point in software development and project management. “Surely,” they will say, “after a project failure of this size and scope, people will finally learn.” Alas, I’m less optimistic. As the first three premises of rapid software testing describe it, software development is a human activity that is surrounded by 1) confusion, 2) complexity, 3) volatility, 4) urgency and… 5) ambition. Increasing ambition causes increases in the other four items too. In our societies, we could help to defend ourselves against future fiascos by restraining our ambitions, but I fear that people will put blindfolds on each other, pass around the keys, and scramble to get back into the driver’s seat of the school bus. How will they do this?

One form of the blindfold is to say “That’s not going to be a problem here because…”

…failure is not an option.
…we have our best people on it.
…we can’t disappoint the client.
…it doesn’t have to be perfect. (thanks, Joe Miller, @lilshieste)
…we’ll fix it in production.
…no user would ever do that.
…the users will figure it out.
…the users will never notice that.
…THAT bug is in someone else’s code.
…we don’t have to fix that; that’s a new feature request.
…it’s working exactly as designed.
…if there’s no test case for it, it’s not a bug.
…the clients will come to their senses before the ship date.
…we have thousands of automated tests that we run on every build.
…this time it will be different.
…we have budget to fix that before we deploy.
…at least the back end is working right.
…if there are performance problems, we’ll just add another few servers.
…we’ve done lots of projects just like this one.
…foreign-language support is something we could cut.
…that list there says that this is a level three threat, not a level one threat.
…the support people can handle whatever problems come up.
…this graph shows that the load will never get that high.
…now is too soon; we’ll tell the clients about the problems after we’ve fixed them.
…we’re thinking positively—that can-do spirit will see us through.
…we still have plenty of time left to fix that.
…the spec didn’t say anything about having to handle special characters. How are single quotes a big deal?
…the client should have thought of that before.
…seriously, that’s just a cosmetic problem.
…it’s important not to complicate things.
…everybody WILL put in some overtime and we WILL get this thing done.
…well, at least the front end looks good, and people will be happy with that.
…everyone here is committed to making sure this ships on time.
…we’ll just shorten the test cycle.
…if there’s a problem, the other/upstream/downstream team will let us know.
…they can take care of that in training.
…we’ve planned to make sure that nothing unexpected happens.
…we’ve got this fantastic new framework that’ll make things go faster.
…we’ll pull a bunch of people off other projects to work on this one.

I wonder whether these things were said, at one time or another, during the healthcare.gov project. I don’t know if they were. I don’t know what happened. I didn’t work on it. But I’ve heard these things on projects before, I know that I can listen for them, and I know that they’re a sign of trouble ahead. Are they being said on your project?