
The Tyranny of Always

I just spent $3,000 to get my nose fixed, and then I found out it was my tie that was crooked.

—Steve Shrott

There’s a piece of software development mythodology that suggests that it’s always more expensive to fix a problem late in the development process than early. Usually the ratios quoted are fantastic: a hundred to one, a thousand to one, ten thousand to one. Let’s put that idea under the microscope for a moment.

The original idea comes from work that Barry Boehm did at TRW, a defense contractor. That claim includes “often”, not “always”, and a figure of 100 to one.

Here are some stories to contrast with the always/10,000 to one trope.

  • This has happened dozens of times for me, and I rather suspect that this has happened at least once for large numbers of testers: just before release, you find a bug, you show it to a developer, and she says, “Oh. Cool. Just a second. [type, type, type]. Fixed.” That didn’t cost 10,000 to one.
  • This has also happened for me, and I suspect that it has happened for practically everyone else as well: you find a bug, you show it to a program manager, and she says, “Oh. Well, it is a problem, I suppose, but we’ve got really important problems to fix and this one isn’t such a big deal. Let’s not bother with fixing it.” After release, no one complains. That didn’t cost 10,000 to one either.
  • Similarly, we can spend a lot of time and money trying to work around the persistent and pernicious problems with a certain platform, or we could simply decide to drop support for it and wait for the problem to go away. The recent movement to drop support for Internet Explorer 6.0 is a case in point; IE 6’s market share has been dropping consistently month by month since September 2005 (weirdly, there was an uptick in June 2009), and it’s down to 14.5 per cent now. Dropping support would save Web developers everywhere a good deal of grief, maybe to the temporary displeasure of some customers—who will eventually upgrade anyway. Will that cost 10,000 to one?
  • This has happened for some of us: “Gak! Another bug? This feature really isn’t working. We should back this whole feature out.” That might save way more than it costs.
  • Another one from my own experience: a programmer (let’s call him Phil) struggled for weeks and weeks with a problem to no avail. He decided to put off fixing it. Shipping time approached, and Phil’s problem list was too long for him to handle on his own. Another programmer (let’s call him Ron), free after having shipped his own product, was suddenly available. With the luxury of a clear mind and an absence of preconceptions, he was able to fix the problem within an hour. Had Phil been forced to try to fix the problem earlier, he might have wasted enormous amounts of time. He saved time by delaying.
  • We choose to delay shipping the product to fix a problem. In so doing, we miss our shipping schedule, and our company declares a zero-revenue quarter. The drop in the value of our stock puts us out of business. Fixing the problem in that case may well have been unwise.
  • Testing reveals that the problem is a bug in someone else’s code. They fix the bug. Our cost to fix the problem is not 10,000 to one; it’s free.
  • Many organizations prepare prototypes. There are problems in the prototype. We don’t fix those problems right away; we save them for the Real Thing.
  • After deferring the problem for a while, some third party comes up with an update or a library or a toolkit that addresses the problem. Jonathan Bach tells a wonderful story about a bug that was found during testing of a popular commercial software product that he worked on. The developer resisted fixing the problem. The bug languished in development for many months, but was consistently deferred because the development effort (and therefore the cost) for the fix was intolerably high. The team eventually decided that the problem was significant enough to fix. The developer looked into solving the problem, and discovered that a third-party library had been released one week earlier; that library made the effort associated with the fix trivial. Had the developer tried to fix the problem earlier, that effort would have been spent at the cost of not being able to fix other bugs.
  • We might decline to fix a trivial bug now because fixing that bug might unblock a bunch of much more serious bugs.
  • After wrestling with the problem for a while, and then dropping it, someone comes up with a flash of insight. We develop a new approach to solving it. This takes much less time than our original approach would have taken.
  • We choose to release software, having fixed some of the problems and having ignored others, based on our theories of what the customers will like and dislike. We discover that people care very deeply about some of the problems that we didn’t fix, and that they wouldn’t have cared about the ones that we did.

Now: there are plenty of cases in which it does cost vastly less to fix a problem earlier than later. One of the more dramatic cases of this is the 1994 Pentium problem, in which a handful of missing entries from a table caused a number of floating point calculations to be handled incorrectly. That one probably could have been caught with more careful work, and it cost a billion dollars or so to manage the PR debacle. Yet how much would it have cost Intel to make sure that its processors were perfect? To this day, Intel doesn’t fix everything that it finds in testing. No one would pay for processors created with that level of effort.
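
As an aside: the best-known reproduction of that Pentium flaw was a single division, 4195835 / 3145727, which came out wrong in roughly the fifth significant digit on affected chips. Here is a minimal Python sketch of the widely circulated check; the figures in the comments are the historically reported values, not something you can reproduce on a modern processor.

    # Classic Pentium FDIV check, widely circulated in 1994/95.
    # On a correct processor, x - (x / y) * y is approximately zero;
    # on a flawed Pentium, x / y was off by roughly 8e-5, so the
    # expression came out near 256 instead.
    x = 4195835.0
    y = 3145727.0
    print(x / y)            # ~1.333820449 (a flawed Pentium returned ~1.333739069)
    print(x - (x / y) * y)  # ~0.0 here; ~256 on a flawed chip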

It’s probably the case that it’s often or even usually more expensive to fix a problem later than earlier, based on certain assumptions. Yet I believe that it’s important to make sure that those assumptions aren’t buried under slogans. Software development isn’t assembling Tinkertoys according to a canned plan; it is suffused with questions of context and cost vs. value. It’s not merely assembly; it’s design and learning and discovery and investigation and branching and backtracking and breaking and fixing and learning some more. Yes, it’s important to be a good craftsman, and a good craftsman methodically fixes problems as he finds them… usually. But it’s also important to recognize that craftsmen make their own decisions about what’s most important right now. As Victor Basili and Forrest Shull say (in a book edited by Boehm):

“It is clear that there are many sources of variation between one development context and another, and it is not clear a priori what specific variables influence the effectiveness of a process in a given context.”

So: Always question simplistic slogans that start with “always”.

16 replies to “The Tyranny of Always”

  1. I have always been suspicious of the statement “it is always more expensive to fix a bug later”… 😉

    This statement is sometimes even presented with a fabricated exponential curve that suggests that it always becomes dramatically more expensive the closer you get to the end of the project. It is interesting that, at the same time, the curve has a limit at the end… 🙂

    People like simplified models because they are easy to understand and easy to retell. It is like “urban legends”: people might know or suspect that it isn’t true, but it feels good to believe in it.

  2. Thanks for that great article, Michael. Always great to hear you say things like this, because I've had an underlying belief that it's not always more expensive to fix later.

    Particularly in an agile environment, the high cost of fixing is managed by the customer. As in one of your examples, the fix to the bug might be unnecessary because the customer isn't that bothered.

    Sometimes it is expensive, sometimes not. Balance.

    Rob..

  3. Hi Michael,

    Great post. You really analyzed the saying that it costs 10 times more to fix bugs later in the development cycle, and the cases you presented are great. Your post has given me new directions to think in.

    nice post 🙂

  4. Great post. Fully agree.

    I think that the "an early fix is always cheaper" statement taps to some simplistic (imaginary) view of programming, as follows:

    The programmer is sitting in front of his computer. He types in his code. At some point in time, he enters code that causes a bug. If he had an immediate indication that he had just entered a bug, he could fix it in like 2 seconds.

    On the other hand, if he gets such an indication at the end of the day/week/month, the amount of code that needs to be scanned in order to detect the bug is considerably larger, and hence the increased fixing costs.

    I don't buy this view. It is too simplistic. For one, sometimes the new code merely triggers the bug, which actually resides in another module. In such cases there is still a lot of code to scan…

  5. Interesting, and you have made valid points. This goes to show that in some situations, or contexts as you're prone to state, the premise/statement can be proven false. That is true with anything. That is one of the founding principles of testing: prove or disprove a theory or point.

    But I think you need to account for factors other than talking to the developer (which at times could be a simple thing such as a 5-minute code change, or they might delay the fix) or users (who decide they don't need to fix the problem right now).

    There are other "rework" cost factors that can, and do, support the work of Boehm. Your example of the Intel Pentium Floating Point calculation problem is a good one. These "other" tangible factors like money spent on PR, additional tech support time, money to replacing "defective" CPU's, loss of revenue to rivals (AMD did pretty well for a while there), and loss of renewal revenue (people with machines with older Intel CPU's either not upgrading or going to rival/competitor) add to rework impact on the Cost-to-Fix curve. In this way it does support Boehm's findings.

    Another example is Ashton Tate with dBase IV in 1989. One of the defects (improper closure of a Memo field, causing records in the database to be corrupted) was known before release, but was not fixed. This was an executive management decision to ship the product and fix it later. Needless to say, this one, along with a few other killer defects, caused Ashton Tate to lose enough market share and revenue (lack of new sales or upgrades, and loss of sales to competitors) that it eventually killed them. Ashton Tate was eventually bought out by its main competitor, Borland. Now the funny thing is that Ashton Tate had the patch ready very shortly after the initial release, but the damage was already done.

    Now, as you pointed out, with today's Agile methods/processes you can deflate the curve, which is good. So you can lower the cost ratios between the phases (based on the procedural lifecycle model Boehm was using), and have more of a cumulative effect. So the question now becomes how much, and where, you can deflate the curve.

    Regarding Boehm's original work/premise, I don't think you can totally discredit it or ignore it. There will "always" be instances where complications (costs) caused by rework will rear their ugly head, and their impact will prove to be quite large.

  6. Thank you all for your comments.

    Calkelpdiver, yes: Ashton Tate was another example. I considered including it. I remember the dBase IV debacle. I was working for Quarterdeck at the time, and I was moved to produce a variation on a joke that was going around.

    Two venture capitalists at a party in Silicon Valley. One of them says, "Gotta tell you about this new startup we're funding. It's gonna be hot. You should see the first product they're working on. We've got Jobs doing the vision for it…"

    "You've got Steve Jobs doing product development?"

    "Uh… no. Ron Jobs. He's a guy who used to do marketing for a toothpaste company. But we're not worried about it. The product's going to be a total hit. We've got Joy to code it…"

    "Bill Joy is writing the code for your startup?"

    "Well… no. Stan Joy. He's a new kid, just graduated from some college near Encino some place. But it's not worried about it. We've got Esber as the CEO…"

    "Ed Esber is the CEO of your startup?!"

    "Yeah!"

    As for totally discrediting Boehm's work, you're right, it can't be done… and that's my point. It's not my intention to discredit the assertion that problems could often cost up to 100 times more to fix later than to fix now. Instead, I'm trying to encourage questioning and reflection about the tropes that we often hear in testing and other realms of development.

    Cheers,

    —Michael B.

  7. Michael,

    You were at Quarterdeck in Santa Monica?? When, around 89-90? I was at Peter Norton in Santa Monica around that time.

    Jim Hazen

  8. Michael,

    Great post, and an important subject to raise. I particularly like the last point that you mention. The “mythodology” is built on the invalid assumption that we are planning to fix all of the bugs. In most cases this is impractical if not impossible, and so we need to focus our limited resources on fixing the ones that are going to affect our business. In many cases deferring commitment can furnish us with more information on what the customer wants and what issues they are likely to encounter. This allows us to apply more targeted effort to addressing the right actions (whether bugs or feature development) which actually saves money. This can be through prototyping, as you’ve mentioned, beta release or simply building a relationship and finding out more about their needs.

    A specific example of scenario 6 (missing a shipping deadline) that is particularly relevant to startups is not just missing the deadline but missing the market entirely. Delaying too long on releasing a product into a fast moving market can make the difference between dominating that field or being an also ran. In this case the cost of fixing the bugs needs to be compared relative to the resulting value of the company, which could be many orders of magnitude of difference.

    Adam

    —————————————-
    http://a-sisyphean-task.com/
    twitter: adampknight
    —————————————-

  9. There is also another point: the longer your cycle takes, the more expensive it might be to fix bugs. Many times we look at this curve and we try to fix the bugs as soon as possible. Make your release cycle shorter! That might help reduce the cost of fixing bugs.

    Michael replies: Like the original heuristic that “bugs found earlier tend to be less expensive than bugs found later”, I think that’s a good general heuristic, but it warrants some explanation, clarification, refinement.

    “Short cycle times” generally means that less work gets done before it gets subjected to some scrutiny, which means that fewer problems persist without our knowing about them. I think, in general, that’s a good thing. At the same time, it’s worth remembering that the amount of programming work and the amount of testing work for a given burst of development is not symmetrical. Something that may be easy and quick to code may be challenging and time-consuming to test; something that took ages to program might take relatively little time to test. I would say it’s generally a good idea to tinker, trying lots of small experiments where problems are relatively inconsequential. But making your release cycle shorter may also mean that you spend more time on setup for the new builds, or more time investigating and reporting problems in builds that don’t yet warrant that degree of effort, at the expense of time spent in direct interaction with the product.

  10. Although this video has some good introductory content (as does the series), it still errs on the side of misconceptions, one being that defects found early are less expensive than defects found on release, or anywhere in between.

    Here's the link: https://www.youtube.com/watch?v=An7HC1LolDM&list=PLDC2A0C8D2EC934C7&index=2#t=143.03194

    45k subs. I think these people have great presentation skills, plus the basics of what testing is, with a little (uneducated) hyperbole.

