Blog Posts for the ‘Bugs’ Category

Bug of the Day: AI Sees Bits, Not Things

Monday, January 4th, 2021

An article that I was reading this morning was accompanied by a stock photo with an intriguing building in the background.

Students throwing their graduation caps in the air

I wanted to know where the building was, and what it was. I thought that maybe Chrome’s “Search Google for image” feature could help to locate an instance of the photo where the building was identified. That didn’t happen, but I got something else instead.

An assortment of images of migrating geese

Google Images provided me with a reminder that “machine learning” doesn’t see things and make sense of them; it matches patterns of bits to other patterns of bits. A bunch of blobby things in a variegated field? Birds in the sky, then—and the fact that there are students in their graduation gowns just below doesn’t influence that interpretation.

That reminded me of this talk by Martin Krafft:

“The MIT network’s concept of a tree (called a symbol) does not extend beyond its visual features. This network has never climbed a tree or heard a branch break. It has never seen a tree sway in the wind. It doesn’t know that a tree has roots, nor that it converts carbon dioxide into oxygen. It doesn’t know that trees can’t move, and that when the leaves have fallen off in winter, it won’t recognize the tree as the same one because it cannot conclude that the tree is still in the same position and therefore must be the same tree.”

Martin Krafft, The Robots Won’t Take Away Our Jobs: Let’s Reframe the Debate on Artificial Intelligence, 14:30

Then I had another idea: what if I fed a URL for the image above to Google Images? This is what I got:

Results from a Google Image search, given a link to an image

Software and machinery assist us in many ways as we’re organizing and sifting and sorting and processing data. That’s cool. When it comes to making sense of the world, drawing inferences, and making decisions that matter to people, we must continue to regard the machinery as cognitively and socially oblivious. Whether we’re processing loan applications, driving cars, or testing software, machinery can help us, but responsible, socially aware humans must remain in charge.

(A couple of friendly correspondents on Twitter have noted that the building is the Marina Bay Sands resort in Singapore.)

Lessons Learned from a Little Bug

Saturday, September 5th, 2020

Almost 10 years ago, I wrote a series of blog posts on project estimation and black swans.

And, almost 10 years after that, Chris NeJame reported an observation about the following passage towards the end of Part 4 of the series:

As Jerry (Weinberg) has frequently pointed out, plenty of organizations fall victim to back luck, but much of the time, it’s not the bad luck that does them in; it’s how they react to the bad luck.

Did you notice the problem?

Chris did. He courteously reported “possible typo on part 4: ‘back luck’?” Whereupon I fixed the bug.

What are the lessons to be learned here? Lots, I think.

  • Bugs can exist and persist without the author noticing them. As usual with all of the posts I write, I pored over that one as I was writing it. (You wouldn’t believe how long it takes me to write a blog post.) I read it over and over again; I found tons of errors and fixed them. And yet I still didn’t see the “back luck” error. Everyone, everyone, is prone to oblivion to problems in their own work to some degree. When we’ve been looking at something for a long time, our capacity to notice specific bugs diminishes.
  • Bugs can exist without users noticing them, too. Human beings repair problems in communication, often with no conscious effort. When some people’s eyes gather a string of text (“fall victim to ba-something luck”), the sensemaking faculty in their brains may repair the problem and it won’t come to their attention at all. For others, the flow of reading might be interrupted momentarily. They’ll make sense of “back luck” as they read the following two instances of “bad luck”, repair the problem in their minds, and move on.
  • Fresh eyes find failure. That’s one of the most concise and memorable lines from Lessons Learned in Software Testing. This was the first time that Chris had read the post. He had fresh eyes and critical distance from the author’s perspective, making it easier for him to see the problem than it was for me.
  • When testing, it helps to look at things at different times. For several minutes, “One of the most concise…” in the paragraph read “On of the most concise…” I tend to write blog text at the same time as I’m marking it up, and the “list item” tags affect the way I read, so that problem persisted for a while.
  • When testing, it helps to look at things in different ways. As I was composing this post, I was focusing on the words of the text, as usual. I wasn’t focusing much on the presentation. At best, I was imagining it. When I switched to Preview Mode, I began to realize that a single sentence in bold at the beginning of each lesson would help the lessons to stand out. That wasn’t as obvious when writing in text and markup. One antidote to this is to look at the post in preview mode, where such errors are easier to see.
  • The developer’s experience of something is profoundly different from the customer’s experience. When I’m writing something in text, my ideas about the experience of reading it are both imaginary and vague. There’s no replacement for experiencing the product and interacting with it the way its users do. This is why experiential testing is so important. (Please don’t call it “manual testing”.)
  • Bugs can persist for a long, long time without being reported. Chris isn’t alone; I once found a similar problem in a book by Jerry Weinberg. When I reported it to Jerry, he told me that the error had been around for 30 years or so.
  • The idea that “the users will report the bugs” is bogus. People who dismiss the value of testing, or of testers, often use this argument. It’s silly. Lots of users won’t notice the bug. Lots of users will notice it, and won’t report it. Lots of users will notice it, won’t report it, and simply won’t use (or buy) your product. And you won’t hear anything from them. Some users will notice it and report it, but sometimes your crazy-busy support people won’t report it to you. Although it’s possible, Chris was almost certainly not the first person to notice the problem. But he is the first to have bothered to report it.
  • A bug that is not important to your users might be important to you, and vice versa. My readers didn’t notice the problem, or were sufficiently unconcerned about it not to mention it, probably thinking that it didn’t matter. I care about not looking sloppy in blog posts, so it mattered to me.
  • It takes time and energy to report a bug. Therefore, it might be a really, really good idea to eliminate any friction in reporting a bug, both for users and for testers.
  • Checking tools may help us to find checkable problems. Spelling checkers may help us to find spelling errors. Grammar checkers may help us to find grammatical errors (although few of them, in my experience, are much good). The spelling checker built into the browser has flagged several typos which I was able to notice and fix in the course of writing this post. Hurrah.
  • Checking tools can be unreliable. As I write, I’m noticing that after WordPress refreshes a page that is being edited, the spelling checker built into the browser doesn’t flag all of the spelling errors in the text editing window. It only flags an error if the insertion point (that is, the text editing cursor) has been placed in the paragraph with the error in it. I almost missed a bunch of errors because of that. Meanwhile, the spelling checker is flagging “checkable” in the paragraph above, but that’s exactly the word I want, even though it’s not in the browser’s dictionary. And the point is…
  • To find problems, machinery can help, but there’s no replacing human observation and judgment. Checking tools don’t understand our intentions. In the “back luck” case, there was nothing wrong with the spelling of the words, nor was anything wrong with the syntax of the sentence. It was the meaning of the sentence, the semantics of it, that was wrong. In the “checkable” case, I’m using a neologism that humans can interpret just fine. The spelling checker won’t alert me to a missing word, either, and no tools can tell me that the blog post I’ve written is the blog post I want.
  • Critics are important. Some people (like me, and like Chris) have a capacity and a predilection for spotting problems in other people’s work. We have the critic’s mindset, even though we may be oblivious to certain problems in our own work. It’s a very good thing for testers to have that mindset, and to engage testers who have it. But…
  • It’s socially risky to be a critic. It’s good to know about errors, in the long run, but not many people love being confronted with errors. It helps for testers to remember that. So…
  • Excellent testers manage social risk. Although he said “possible typo”, I’ll bet that Chris was pretty much certain that there was a bug, and that I would agree. Yet by saying “possible typo”, he left me in charge of the decision about whether there was a bug. This is an important move for a tester. It helps to acknowledge that authors (programmers, designers, managers…) get to decide whether the product they’ve got is the product they want, and that they are responsible for the quality of the work. This can help soften the blow of confronting yet another damned error.
  • We can learn a lot from a small problem. Here’s a case where the problem is two letters where there should have been one, and of those two, one was wrong. This is no big deal in a blog post, but in a software product, two bytes can make a difference between a working product and a devastating problem. Such problems can remain invisible for years until suddenly, one day, they’re not. Yet even though this is a relatively trivial problem, look at what we can learn from it, if we choose! And look at how reflecting on the problem leads to experiences that lead to even more learning!
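
The point about checking tools can be made concrete with a minimal sketch. It assumes a toy dictionary-based spelling checker; the word list and the function name `misspelled_words` are invented for illustration and do not describe how any browser's checker actually works. It shows why a tool of this kind passes “back luck” yet flags “checkable”.

```python
import re

# Hypothetical, deliberately tiny dictionary; a real checker ships a far larger one.
DICTIONARY = {
    "a", "plenty", "of", "organizations", "fall", "victim", "to",
    "back", "bad", "luck", "problem",
}


def misspelled_words(text, dictionary=DICTIONARY):
    """Return the words in text that are missing from the dictionary, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in dictionary]


buggy = "plenty of organizations fall victim to back luck"
intended = "plenty of organizations fall victim to bad luck"

print(misspelled_words(buggy))                   # [] : every word is spelled correctly
print(misspelled_words(intended))                # [] : the tool cannot tell the two apart
print(misspelled_words("a checkable problem"))   # ['checkable'] : flagged, though intended
```

The checker answers only the question it was built to answer: is each word in the list? Whether the sentence says what its author meant is outside its reach.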

Thanks to Chris for reporting the bug, but also for triggering the opportunity to explore these lessons.

What lessons would you add?

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me, is set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

Testers Don’t Prevent Problems

Wednesday, May 4th, 2016

Testers don’t prevent errors, and errors aren’t necessarily waste.

Testing, in and of itself, does not prevent bugs. Platform testing that reveals a compatibility bug provides a developer with information. That information prompts him to correct an error in the product, which prevents that already-existing error from reaching and bugging a customer.

Stress testing that reveals a bug in a function provides a developer with information. That information helps her to rewrite the code and remove an error, which prevents that already-existing error from turning into a bug in an integrated build.

Review (a form of testing) that reveals an error in a specification provides a product team with information. That information helps the team in rewriting the spec correctly, which prevents that already-existing error from turning into a bug in the code.

Transpection (a form of testing) reveals an error in a designer’s idea. The conversation helps the designer to change his idea to prevent the error from turning into a design flaw.

You see? In each case, there is an error, and nothing prevented it. Just as smoke detectors don’t prevent fires, testing on its own doesn’t prevent problems. Smoke detectors direct our attention to something that’s already burning, so we can do something about it and prevent the situation from getting worse. Testing directs our attention to existing errors. Those errors will persist—presumably with consequences—unless someone makes some change that fixes them.

Some people say that errors, bugs, and problems are waste, but they are not in themselves wasteful unless no one learns from them and does something about them. On the other hand, every error that someone discovers represents an opportunity to take action that prevents the error from becoming a more serious problem. As a tester, I’m fascinated by errors. I study errors: how people commit errors (bug stories; the history of engineering), why they make errors (fallible heuristics; cognitive biases), where we might find errors (coverage), how we might recognize errors (oracles). I love errors. Every error that is discovered represents an opportunity to learn something, and that learning can help people to change things in order to prevent future errors.

So, as a tester, I don’t prevent problems. I play a role in preventing problems by helping people to detect errors. That allows those people to prevent those errors from turning into problems that bug people.

Still squeamish about errors? Read Jerry Weinberg’s e-book, Errors: Bugs, Boo-boos, Blunders.

As Expected

Tuesday, April 12th, 2016

This morning, I started a local backup. Moments later, I started an online backup. I was greeted with this dialog:

Looks a little sparse. Unhelpful. But there is that “More details” drop-down to click on. Let’s do that.

Ah. Well, that’s more information. It’s confusing and unhelpful, but I suppose it holds the promise of something more helpful to come. I notice that there’s a URL, but that it’s not a clickable link. I notice that if the dialog means what it says, I should copy those error codes and be ready to paste them into the page that comes up. I can also infer that there’s no local help for these error codes. Well, let’s click on the Knowledge Base button.

Oh. The issue is that another backup is running, and starting a second one is not allowed.

As a tester, I wonder how this was tested.

Was an automated check programmed to start a backup, start a second backup, and then query to see if a dialog appeared with the words “Failed to run now: task not executed” in it? If so, the behaviour is as expected, and the check passed.

Was an automated check programmed to start a backup, start a second backup, and then check for any old dialog to appear? If so, the behaviour is as expected, and the check passed.

Was a test script given to a tester that included the instruction to start a backup, start a second backup, and then check for a dialog to appear, including the words “Failed to run now: task not executed”? Or any old dialog that hinted at something? If so, the behaviour is as expected, and the “manual” test passed.

Here’s what that first dialog could have said: “A backup is in progress. Please wait for that backup to complete before starting another.”
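
The scenarios above can be sketched in code. This is a minimal illustration under stated assumptions: `FakeBackupProduct` is a hypothetical stand-in invented here, not the real product or its API, and the three functions are only guesses at what such checks might look like.

```python
class FakeBackupProduct:
    """Hypothetical stand-in for the backup product; not its real API."""

    def __init__(self):
        self.backup_running = False
        self.dialog_text = None

    def start_backup(self, kind):
        if self.backup_running:
            # The behaviour described in the post: a terse message, no explanation.
            self.dialog_text = "Failed to run now: task not executed"
            return False
        self.backup_running = True
        return True


def start_two_backups(product):
    """Start a local backup, then try to start an online one."""
    product.start_backup("local")
    product.start_backup("online")
    return product.dialog_text


def check_exact_message(product):
    """Look for the exact words the check's author expected."""
    text = start_two_backups(product)
    return text is not None and "Failed to run now: task not executed" in text


def check_any_dialog(product):
    """Accept any old dialog at all."""
    return start_two_backups(product) is not None


def dialog_mentions_cause(product):
    """Ask a different question: does the dialog say a backup is already running?"""
    text = (start_two_backups(product) or "").lower()
    return "in progress" in text or "another backup" in text


print(check_exact_message(FakeBackupProduct()))    # True: "as expected"
print(check_any_dialog(FakeBackupProduct()))       # True: "as expected"
print(dialog_mentions_cause(FakeBackupProduct()))  # False: the user is told nothing useful
```

The first two checks pass because they encode the expectation, not the user's need. The third is itself only a crude proxy; deciding whether the wording actually helps someone still takes a person looking at the dialog.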

At this company, what is the basic premise for testing? When testing is designed, and when results are interpreted, is the focus on confirming that the product “works as expected”? If so, and if the expectations above are met, no bug will be noticed. To me, this illustrates the basic bankruptcy of testing to confirm expectations; to “make sure the tests all pass”; to show that the product “meets requirements”. “Meets requirements”, in practice, is typically taken to mean “is consistent with statements in a requirements document, however misbegotten those statements might be”.

Instead of confirmation, “pass or fail”, “meets the requirements (documents)” or “as expected”, let’s test from the perspective of two questions: “Is there a problem here?” and “Are we okay with this?” As we do so, let’s look at some of the observations that we might make and questions we might ask. (Notice that I’m doing this without reference to a specification or requirements document.)

Upon starting a local backup and then attempting to start an online backup, I observe this dialog.

I am surprised by the dialog. My surprise is an oracle, a means by which I might recognize a problem. Why am I surprised? Is there a problem here?

I had a desire to create a local backup and an online backup at the same time. On a multi-tasking, multi-threaded operating system, that desire seems reasonable to me, and I’m surprised that it didn’t happen.

Inconsistency with reasonable user desire is an oracle principle, linked to quality criteria that might include capability, usability, performance, and charisma. The product apparently fails to fulfill quality criteria that, in my opinion, a reasonable user might have. Of course, as a tester, I don’t run the project. So I must ask the designer, or the developer, or the product manager: Are we okay with this?

This might be exactly the dialog that has been programmed to appear under this condition—whatever the condition is. I don’t know that condition, though, because the dialog doesn’t tell me anything specific about the problem that the software is having with fulfilling my desire. So I’m somewhat frustrated, and confused. Is there a problem here?

I can’t explain or even understand what’s going on, other than the fact that my desire has been thwarted. My oracle—pointing to a problem—is inconsistency with explainability, in addition to inconsistency with my desires. So I’m seeing a potential problem not only with the product’s behaviour, but also in the dialog. Are we okay with this?

Maybe more information will clear that up.

Still nothing more useful here. All I see is a bunch of error codes; no further explanation of why the product won’t do what I want. I remain frustrated, and even more confused than before. In fact, I’m getting annoyed. Is there a problem here?

One key purpose of a dialog is to provide a user with useful information, and the product seems inconsistent with that (the inconsistency-with-purpose oracle). Are these codes correct? Maybe these error codes are wildly wrong. If they are, that would be a problem too. If that’s the case, I don’t have a spec available, so that’s a problem I’m simply going to miss. Are we okay with that?

I have to accept that, as a human being, there are some problems I’m going to miss—although, if I were testing this in-house, there are things I could do to address the gaps in my knowledge and awareness. I could note the codes and ask the developer about them; or I could ask for a table of the available codes. (Oh… no one has collected a comprehensive listing of the error codes; they’re just scattered through the product’s source code. Are we okay with this?)
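
As a rough sketch of what a collected table of error codes might look like, consider the following. The codes `E1001` to `E1003` and their messages are hypothetical, invented for illustration; the real product's codes are not reproduced here.

```python
# Hypothetical codes and messages, invented for illustration only.
ERROR_CATALOG = {
    "E1001": "A backup is in progress. Please wait for that backup to "
             "complete before starting another.",
    "E1002": "The backup destination could not be reached. Check the "
             "connection and try again.",
}


def describe(code, catalog=ERROR_CATALOG):
    """Return the user-facing explanation for a code, or a clear fallback."""
    return catalog.get(code, f"Unknown error code {code}. Please report it to support.")


def undocumented(codes_in_use, catalog=ERROR_CATALOG):
    """List codes that the product can raise but the catalog does not explain."""
    return sorted(set(codes_in_use) - set(catalog))


print(describe("E1001"))
print(undocumented(["E1001", "E1003", "E1002", "E1003"]))  # ['E1003']
```

With one table like this, a tester could compare the codes found in the source against the documented ones, and the dialog could show the explanation directly instead of sending the user elsewhere.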

Back to the dialog. Maybe those error codes are precisely correct, but they’re not helping me. Are we okay with this?

All right, so there’s that Knowledge Base button. Let’s try it. When I click on the button, this appears:

Let’s look at this in detail. I observe the title: 32493: Acronis True Image: “Failed to run now: task not executed.” That’s consistent with the message that was in the dialog. I notice the dates; something like this has appeared in the knowledge base for a while. In that sense, it seems that the product is consistent with its history, but is that a desirable consistency? Is there a problem here?

The error codes being displayed on this Web page seem consistent with the error codes in the dialog, so if there’s a problem with that, I don’t see it. Then I notice the line that says “You cannot run two tasks simultaneously.” Reading down over a long list of products, and through the symptoms, I observe that the product is not intended to perform two tasks simultaneously. The workaround is to wait until the first task is done; then start the second one. In that sense, the product indeed “works as expected”. And yet…are we okay with this?

Once again, it seems to me that attempting to start a second task could be a reasonable user desire. The product doesn’t support that, but maybe we’re okay with that. Yet is there a problem here?

The product displays a terse, cryptic error message that confuses and annoys the user without fulfilling its apparent intended purpose to inform the user of something. The product sends the user to the Web (not even to a local Help file!) to find that the issue is an ordinary, easily anticipated limitation of the program. It does look kind of amateurish to deal with this situation in this convoluted way, instead of simply putting the relevant information in the initial dialog. Is there a problem here?

I believe that this behaviour is inconsistent with an image that the company might reasonably want to project. The behaviour is also inconsistent with the quality criteria we call usability and charisma. A usable product is one that behaves in a way that allows the user to accomplish a task (including dealing with the product’s limitations) quickly and smoothly. A charismatic product is one that does its thing in an elegant way; that engages the user instead of irritating the user; that doesn’t make the development group look silly; that doesn’t prompt a blog post from a customer highlighting the silliness.

So here’s my bug report. Note that I don’t mention expectations, but I do talk about desires, and I cite two oracles. The title is “Unhelpful dialog inconsistent with purpose.” The body would say “Upon attempting to start a second backup while one is in progress, a dialog appears saying ‘Failed to run now: task not executed.’ While technically correct, this message seems inconsistent with the purpose of informing the user that we can’t perform two backup tasks at once. The user is then sent to the (online) knowledge base to find this out. This also seems inconsistent with the product’s image of giving the user a seamless, reliable experience. Is all this desired behaviour?”

Finally: it could be that the testers discovered all of these problems, and laid them out for the product’s designers, developers, and managers, just as I’ve done here. And maybe the reports were dismissed because the product works “as expected”. But “as expected” doesn’t mean “no problem”. If I can’t trust a backup product to post a simple, helpful dialog, can I really trust it to back up my data?

A Bad Couple of Days

Friday, June 5th, 2015

I’m home in Toronto for a day after several weeks of helping people learn to test software, and as far as I can see, the whole Web is screwed up. Here are some of the things that have happened in the last 48 hours or so.

  • A fellow on Twitter told me about an interesting Skype bug: send the string “http://:” (no quotes), and Skype hangs. For me, it did more than hang; I was unable to restart Skype. I tried to update my Skype client; this was blocked by an error 1603. I tried uninstalling Skype; 1603. I tried using the Microsoft Fixit tool, which repairs corrupted Windows Registry entries (apparently some aspect of updating, uninstalling, or trying to reinstall Skype leads to corrupted registry entries); still an error 1603. I went through the Registry myself, removing all the references to Skype I could find; still no joy. Eventually I was able to download a complete installation package, which finally worked; I can only speculate as to why.
  • I tweeted about the bug, copying Skype Support on the tweet. In reply, Skype support claimed that the initial problem, the http://: problem, had been fixed. (This claim appears to be true.) “We have already addressed the issue 😀 … Just update your Skype and you are good to go :)” I replied, “1603” to this. Skype Support responded, “Take a look at this Community post – maybe it will be of help”. (It wasn’t, really.) I eventually replied, “Yes, I’m reinstalled. Question, though: why a community post rather than an official, researched one from your organization?” The answer was “The Skype Community is quite adept at offering Skype Support, particularly for errors such as this that are not very common.” My reply was, “Fair enough… but why would the official support channel not be so adept?” Or at least, that’s what my reply would have been, but…
  • Twitter has blocked my account, apparently due to suspicious activity. That’s not what the iPad or iPhone clients say, though. They simply say, “Sorry, we weren’t able to send your Tweet. Would you like to retry or save your Tweet to drafts?” From the mobile Web client, I was able to enter a Tweet and apparently have the client accept the input, but no Tweet appeared, and no error message appeared either.

    Eventually, with a little detective work, I was able to determine that my account was blocked due to suspicious activity. The Web site advised me to reset my password. There are two ways to authenticate yourself: one is to have an email message sent to your registered address. I chose this option twice; the email never appeared. I opted to have a code sent to my phone via SMS. The code appeared almost immediately. I entered it into the Password Reset page. I was startled to see the message “We couldn’t find your account with that information.” I repeated the process twice more with the same result.

    On the fourth time, of course, I was told that I had tried to reset my password too many times, and that I would have to wait an hour. I waited an hour and ten minutes, and tried again. My account remained locked. A button offered the opportunity to “contact support”. This isn’t exactly a means of contacting any human support person, but a set of pages offering suggestions and some options for troubleshooting.

    Eventually I found a form that (apparently) affords a means of contacting people. The confirmation page noted “We are usually able to respond within a few days, but some issues may take longer. Please check your email inbox for an email from Twitter Support.” But remember… if Twitter had attempted to contact me via email, that didn’t work.

  • Someone contacted me for help via LinkedIn. I went to LinkedIn’s internal email client, and started a reply to him (I’ll anonymize his name here). “Hi, Jules…”, I typed, and pressed the Enter key. The Enter key had no effect. End of attempt to reply. Some software change has been made since the last time I answered a personal message on LinkedIn. Did anyone try this after the change was made?
  • Amnesty International sent me some mail on a campaign they’re running, which I agree with. I clicked on the “Take Action” button, which took me to a form soliciting my name and address, which Chrome’s auto-fill settings supplied automatically. Upon pressing Send, an error message appeared: “PAF violation: Insufficient address data.” My address data was entirely correct; I’ve used this page and that address dozens of times before. I left the browser window open. About half an hour later, I clicked on the button again, without having changed the data. The site accepted the data this time. I don’t know why it failed, and I don’t know why it started working again.

At the end of the day, these things aren’t life-or-death problems. What worries me more is that software being developed for life-or-death contexts may be developed in something like the same way. That software—because it is subject to a value-destroying problem based on a single misplaced bit—often may be just as fragile, just as unreliable, just as vulnerable. It frightens me that the company developing the Google Car is the same company that developed Google Buzz. It annoys and frustrates me that organizations that supposedly provide a service have chipped away at the idea of live, real-time customer support.

So, I’ll see you on Twitter. Eventually. Maybe.

Taking Severity Seriously

Wednesday, January 14th, 2015

There’s a flaw in the way most organizations classify the severity of a bug. Here’s an example from the Elementool Web site (as of 14 January, 2015); I’m sure you’ve seen something like it:

Critical: The bug causes a failure of the complete software system, subsystem or a program within the system.
High: The bug does not cause a failure, but causes the system to produce incorrect, incomplete, inconsistent results or impairs the system usability.
Medium: The bug does not cause a failure, does not impair usability, and does not interfere in the fluent work of the system and programs.
Low: The bug is an aesthetic (sic —MB), is an enhancement (ditto) or is a result of non-conformance to a standard.

These are serious problems, to be sure—and there are problems with the categorizations, too. (For example, non-conformance to a medical device standard can get you publicly reprimanded by the FDA; how is that low severity?) But there’s a more serious problem with models of severity like this: they’re all about the system as though no person used that system. There’s no empathy or emotion here; there’s no impact on people. The descriptions don’t mention the victims of the problem, and they certainly don’t identify consequences for the business. What would happen if we thought of those categories a little differently?

Critical: The bug will cause so much harm or loss that customers will sue us, regulators will launch a probe of our management, newspapers will run a front-page story about us, and comedians will talk about us on late night talk shows. Our company will spend buckets of money on lawyers, public relations, and technical support to try to keep the company afloat. Many capable people will leave voluntarily without even looking for a new job. Lots of people will get laid off. Or, the bug blocks testing such that we could miss problems of this magnitude; go back to the beginning of this paragraph.

High: The bug will cause loss, harm, or deep annoyance and inconvenience to our customers, prompting them to flood the technical support phones, overwhelm the online chat team, return the product demanding their money back, and buy the competitor’s product. And they’ll complain loudly on Twitter. The newspaper story will make it to the front page of the business section, and our product will be used for a gag in Dilbert. Sales will take a hit and revenue will fall. The Technical Support department will hold a grudge against Development and Product Management for years. And our best workers won’t leave right away, but they’ll be sufficiently demoralized to start shopping their résumés around.

Medium: The bug will cause our customers to be frustrated or impatient, and to lose faith in our product such that they won’t necessarily call or write, but they won’t be back for the next version. Most won’t initiate a tweet about us, but they’ll eagerly retweet someone else’s. Or, the bug will annoy the CEO’s daughter, whereupon the CEO will pay an uncomfortable visit to the development group. People won’t leave the company, but they’ll be demotivated and call in sick more often. Tech support will handle an increased number of calls. Meanwhile, the testers will have—with the best of intentions—taken time to investigate and report the bug, such that other, more serious bugs will be missed (see “High” and “Critical” above). And a few months later, some middle manager will ask, uncomprehendingly, “Why didn’t you find that bug?”

Low: The bug is visible; it makes our customers laugh at us because it makes our managers, programmers, and testers look incompetent and sloppy, and it causes our customers to suspect deeper problems. Even people inside the company will tease others about the problem via graffiti in the stalls in the washroom (written with a non-washable Sharpie). Again, the testers will have spent some time on investigation and reporting, and again test coverage will suffer.

Of course, one really great way to avoid many of these kinds of problems is to focus on diligent craftsmanship supported by scrupulous testing. But when it comes to that discussion in that triage meeting, let’s consider the impact on real customers, on the real people in our company, and on our own reputations.

I’ve Had It With Defects

Wednesday, April 2nd, 2014

The longer I stay in the testing business and reflect on the matter, the more I believe the concept of “defects” to be unclear and unhelpful.

A program may have a coding error that is clearly inconsistent with the program’s specification, whereupon I might claim that I’ve found a defect. The other day, an automatic product update failed in the middle of the process, rendering the product unusable. Apparently a defect. Yet let’s look at some other scenarios.

  • I perform a bunch of testing without seeing anything that looks like a bug, but upon reviewing the code, I see that it’s so confusing and unmaintainable in its current state that future changes will be risky. Have I found a defect? And how many have I found?
  • I observe that a program seems to be perfectly coded, but to a terrible specification. Is the product defective?
  • A program may be perfectly coded to a wonderfully written specification, even though the writer of the specification may have done a great job of specifying an implementation for a set of poorly conceived requirements. Should I call the product defective?
  • Our development project is nearing release, but I discover a competitive product with this totally compelling feature that makes our product look like an also-ran. Is our product defective?
  • Half the users I interview say that our product should behave this way, saying that it’s ugly and should be easier to learn; the other half say it should behave that way, pointing out that looks don’t matter, and once you’ve used the product for a while, you can use it quickly and efficiently. Have I identified a defect?
  • The product doesn’t produce a log file. If there were a log file, my testing might be faster, easier, or more reliable. If the product is less testable than it could be, is it defective?
  • I notice that the Web service that supports our chain of pizza stores slows down noticeably at dinner time, when more people are logging in to order. I see a risk that if business gets much better, the site may bog down sufficiently that we may lose some customers. But at the moment, everything is working within the parameters. Is this a defect? If it’s not a defect now, will it magically change to a defect later?

On top of all this, the construct “defect” is at the centre of a bunch of unhelpful ideas about how to measure the quality of software or of testing: “defect count”; “defect detection rate”; “defect removal efficiency”. But what is a defect? If you visit LinkedIn, you can often read some school-marmish clucking about defects. People who talk about defects seem to refer to things that are absolutely and indisputably wrong with the product. Yet in my experience, matters are rarely so clear. If it’s not clear what is and is not a defect, then counting them makes no sense.

That’s why, as a tester, I find it much more helpful to think in terms of problems. A problem is “a difference between what is perceived and what is desired” or “an undesirable situation that is significant to and maybe solvable by some agent, though probably with some difficulty”. (I’ve written more about that here.) A problem is not something that exists in the software as such; a problem is relative, a relationship between the software and some person(s). A problem may take the form of a bug—something that threatens the value of the product—or an issue—something that threatens the value of the testing, or of the project, or of the business.

As a tester, I do not break the software. As a reminder of my actual role, I often use a joke that I heard attributed to Alan Jorgenson, but which may well have originated with my colleague James Bach: “I didn’t break the software; it was broken when I got it.” That is, rather than breaking the software, I find out how and where it’s broken. But even that doesn’t feel quite right. I often find that I can’t describe the product as “broken” per se; yet the relationship between the product and some person might be broken. I identify and illuminate problematic relationships by using and describing oracles, the means by which we recognize problems as we’re testing.

Oracles are not perfect and testers are not judges, so it would seem presumptuous of me to label something a defect. As James points out, “If I tell my wife that she has a defect, that is not likely to go over well. But I might safely say that she is doing something that bugs me.” Or as Cem Kaner has suggested, shipping a product with known defects means shipping “defective software”, which could have contractual or other legal implications (see here and here, for examples).

On the one hand, I find that “searching for defects” seems too narrow, too absolute, too presumptuous, and politically risky for me. On the other, if you look at the list above, all those things that were questionable as defects could be described more easily and less controversially as problems that potentially threaten the value of the product. So “looking for problems” provides me with wider scope, recognizes ambiguity, encourages epistemic humility, and acknowledges subjectivity. That in turn means that I have to up my game, using many different ways to model the product, considering lots of different quality criteria, and looking not only for functional problems but anything that might cause loss, harm, or annoyance to people who matter.

Moreover, rejecting the concept of defects ought to help discourage us from counting them. Given the open-ended and uncertain nature of “problem”, the idea of counting problems would sound silly to most people—but we can talk about problems. That would be a good first step towards solving them—addressing some part of the difference between what is perceived and what is desired by some person or persons who matter.

That’s why I prefer looking for problems—and those are my problems with “defects”.


Monday, July 23rd, 2012

Several years ago, I wrote an article for Better Software Magazine called Testing Without a Map. The article was about identifying and applying oracles, and it listed several dimensions of consistency by which we might find or describe problems in the product. The original list came from James Bach.

Testers often say that they recognize a problem when the product doesn’t “meet expectations”. But that seems empty to me; a tautology. Testers can be a lot more credible when they can describe where their expectations come from. Perhaps surprisingly, many testers struggle with this, so let’s work through it.

Expectations about a product revolve around desirable consistencies between related things.

  • History. We expect the present version of the system to be consistent with past versions of it.
  • Image. We expect the system to be consistent with an image that the organization wants to project, with its brand, or with its reputation.
  • Comparable Products. We expect the system to be consistent with systems that are in some way comparable. This includes other products in the same product line; competitive products, services, or systems; or products that are not in the same category but which process the same data; or alternative processes or algorithms.
  • Claims. We expect the system to be consistent with things important people say about it, whether in writing (reference specifications, design documents, manuals, whiteboard sketches…) or in conversation (meetings, public announcements, lunchroom conversations…).
  • Users’ Desires. We believe that the system should be consistent with ideas about what reasonable users might want. (Update, 2014-12-05: We used to call this “user expectations”, but those expectations are typically based on the other oracles listed here, or on quality criteria that are rooted in desires; so, “user desires” it is. More on that here.)
  • Product. We expect each element of the system (or product) to be consistent with comparable elements in the same system.
  • Purpose. We expect the system to be consistent with the explicit and implicit uses to which people might put it.
  • Statutes. We expect a system to be consistent with laws or regulations that are relevant to the product or its use.

I noted that, in general, we recognize a problem when we observe that the product or system is inconsistent with one or more of these principles; we expect this from the product, and when we get that, we have reason to suspect a problem.

(If I were writing that article today, I would change expect to desire, for reasons outlined here.)

“In general” is important. Each of these principles is heuristic. Oracle principles are, like all heuristics, fallible and context-dependent; to be applied, not followed. An inconsistency with one of the principles above doesn’t guarantee that there’s a problem; people make the determination of “problem” or “no problem” by applying a variety of oracle principles and notions of value. Our oracles can also mislead us, causing us to see a problem that isn’t there, or to miss a problem that is there.

Since an oracle is a way of recognizing a problem, it’s a wonderful thing to be able to keep a list like this in your head, so that you’re primed to recognize problems. Part of the reason that people have found the article helpful, perhaps, is that the list is memorable: the initial letters of the principles form the word HICCUPPS. History, Image, Claims, Comparable products, User expectations (since then, changed to “user desires”), Product, Purpose, and Statutes. 

With a little bit of memorization and practice and repetition, you can rattle off the list, keep it in your head, and consult it at a moment’s notice. You can use the list to anticipate problems or to frame problems that you perceive.

Another reason to internalize the list is to be able to move quickly from a feeling of a problem to an explicit recognition and description of a problem. You can improve a vague problem report by referring to a specific oracle principle. A tester’s report is more credible when decision-makers (program managers, programmers) can understand clearly why the tester believes an observation points to a problem.
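As a rough illustration of tying a report to a named oracle principle, here is a minimal Python sketch. Everything in it is hypothetical: the `Oracle` enum, the `Observation` record, and the `describe` function are invented for this example and are not part of the original article or of any bug-tracking tool. The one-line descriptions are loose paraphrases of the HICCUPPS list above.

```python
from dataclasses import dataclass
from enum import Enum


class Oracle(Enum):
    """The HICCUPPS principles, each paraphrased as 'consistent with ...'."""
    HISTORY = "past versions of the product"
    IMAGE = "the image the organization wants to project"
    COMPARABLE_PRODUCTS = "comparable products or systems"
    CLAIMS = "what important people say about the product"
    USERS_DESIRES = "what reasonable users might want"
    PRODUCT = "comparable elements within the same product"
    PURPOSE = "the explicit and implicit uses of the product"
    STATUTES = "relevant laws or regulations"


@dataclass
class Observation:
    summary: str    # what the tester saw
    oracle: Oracle  # the principle that makes it look like a problem
    evidence: str   # where the tester's expectation comes from


def describe(obs: Observation) -> str:
    """Build a report line that names the oracle behind the observation."""
    label = obs.oracle.name.replace("_", " ").title()
    return (
        f"{obs.summary} This seems inconsistent with {obs.oracle.value} "
        f"({label} oracle): {obs.evidence}"
    )
```

A tester might then write `describe(Observation("Export drops the header row.", Oracle.HISTORY, "version 2.3 kept it."))`. The point of the sketch is only that the report carries its own justification; the oracle remains a fallible heuristic, and a person still decides whether the inconsistency is a problem.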

I’ve been delighted with the degree to which the article has been cited, and even happier when people tell me that it’s helped them. However, it’s been a long time since the article was published, and since then, James Bach and I have observed testers using other oracle principles, both to anticipate problems and to describe the problems they’ve found. To my knowledge, this is the first time since 2005 that either one of us has published a consolidated list of our oracle principles outside of our classes, conference presentations, or informal conversations. Our catalog of oracle principles now includes:

  • Statutes and Standards. We expect a system to be consistent with relevant statutes, acts, laws, regulations, or standards. Statutes, laws and regulations are mandated mostly by outside authority (though there is a meaning of “statute” that refers to acts of corporations or their founders). Standards might be mandated or voluntary, explicit or implicit, external to the development group or internal to it.

    What’s the difference between Standards and Statutes versus Claims? Claims come from inside the project. For Standards and Statutes, the mandate comes from outside the project. When a development group consciously chooses to adhere to a given standard, or when a law or regulation is cited in a requirements document, there’s a claim that would allow us to recognize a problem. We added Standards when we realized that sometimes a tester recognizes a potential problem for which no explicit claim has yet been made.

    While testing, a tester familiar with a relevant standard may notice that the product doesn’t conform to published UI conventions, to a particular RFC, or to an informal, internal coding standard that is not controlled by the project itself.

    Would any of these things constitute a problem? At least each would be an issue, until those responsible for the product declare whether to follow the standard, to violate some points in it, or reject it entirely.

    A tester familiar with the protocols of an FDA audit might recognize gaps in the evidence that the auditor desires. Similarly, a tester familiar with requirements in the Americans With Disabilities Act might recognize accessibility problems that other testers might miss. Moreover, an expert tester might use her knowledge of the standard to identify extra cost associated with misunderstanding of the standard, excessive documentation, or unnecessary conformance.

  • Explainability. We expect a system to be understandable to the degree that we can articulately explain its behaviour to ourselves and others.

    If, as testers, we don’t understand a system well enough to describe it, or if it exhibits behaviour that we can’t explain, then we have reason to suspect that there might be a problem of one kind or another. On the one hand, there might be a problem in the product that threatens its value. On the other hand, we might not know about the product well enough to test it capably. This is, arguably, a bigger problem than the first. Our misunderstanding might waste time by prompting us to report non-problems. Worse, our misunderstandings might prevent us from recognizing a genuine problem when it’s in front of us.

    Aleksander Simic, in a private message, suggests that the explainability heuristic extends to more members of the team than testers. If a programmer can’t explain code that she must maintain (or worse, has written), or if a development team has started with something ill-defined and confusion is moving slowly through the product, then we have reason to suspect, investigate, or report a problem. I agree with Aleksander. Any kind of confusion in the product is an issue, and issues are petri dishes for bugs.

  • World. We expect the product to be consistent with things that we know about or can observe in the world.

    Often this kind of inconsistency leads us to recognize that the product is inconsistent with its purpose or with an expectation that we might have had, based on our models and schemas. When we’re testing, we’re not able to realize and articulate all of our expectations in advance of an observation. Sometimes we notice an inconsistency with our knowledge of the world before we apply some other principle.

    This heuristic can fail when our knowledge of the world is wrong; when we’re misinformed or mis-remembering. It can also fail when the product reveals something that we hadn’t previously known about the world.

There is one more heuristic that testers commonly apply as they’re seeking problems, especially in an unfamiliar product. Unlike the preceding ones, this one is an inconsistency heuristic:

  • Familiarity. We expect the system to be inconsistent with patterns of familiar problems.

    When we watch testers, we notice that they often start testing a product by seeking problems that they’ve seen before. This gives them some immediate traction; as they start to look for familiar kinds of bugs, they explore and interact with the product, and in doing so, they learn about it.

    Starting to test by focusing on familiar problems is quick and powerful, but it can mislead us. Problems that are significant in one product (for example, polish in the look of the user interface in a commercial product) may be less significant in another context (say, an application developed for a company’s internal users). A product developed in one context (for example, one in which programmers perform lots of unit testing) might have avoided problems familiar to us in other contexts (for example, one in which programmers are less diligent).

    Focusing on familiar problems might divert our attention away from other consistency principles that are more relevant to the task at hand. Perhaps most importantly, a premature search for bugs might distract us from a crucial task in the early stages of testing: a search for benefits and features that will help us to develop better ideas about value, risk, and coverage, and will inform deeper and more thoughtful testing.

    Note that any pattern of familiar problems must eventually reduce to one of the consistency heuristics; if it was a problem before, it was because the system was inconsistent with some oracle principle.

Standards was the first of the new heuristics that we noticed; then Familiar problems. The latter threatened our mnemonic! For a while, I folded Standards in with Statutes, suggesting that people memorize HICCUPPS(F), with that inconsistent F coming at the end. But since we’ve added Explainability and World, we can now put F at the beginning, emphasizing the reality that testers often start looking for problems by looking for familiar problems. So, the new mnemonic: (F)EW HICCUPPS. When we’re testing, actively seeking problems in a product, it’s because we desire… FEW HICCUPPS.

This isn’t an exhaustive list. Even if we were silly enough to think that we had an exhaustive list of consistency principles, we wouldn’t be able to prove it exhaustive. For that reason, we encourage testers to develop their own models of testing, including the models of consistency that inform our oracles.

This article was first published 2012-07-23. I made a few minor edits on 2016-12-18, and a few more on 2017-01-26.

When A Bug Isn’t Really Fixed

Tuesday, January 11th, 2011

On Monday, January 10, Ajay Balamurugadas tweeted, “When programmer has fixed a problem, he marks the prob as fixed. Programmer is often wrong. – #Testing computer software book Me: why?”

I intended to challenge Ajay, but I made a mistake, and sent the message out to a general audience:

“Challenge for you: think of at least ten reasons why the programmer might be wrong in marking a problem fixed. I’ll play too. #testing”

True to my word, I immediately started writing this list:

1. The problem is fixed on a narrow set of platforms, but not on all platforms. (Incomplete fix.)

2. In fixing the problem, the programmer introduced a new problem. (In this case, one could argue that the original problem has been fixed, but others could argue that there’s a higher-order problem here that encompasses both the first problem and the second.) (Unwanted side effects)

3. The tester might have provided an initial problem report that was insufficient to inform a good fix. Retesting reveals that the programmer, through no fault of his own, fixed the special case, rather than the general case. (Poor understanding, tester-inflicted in this case)

4. The problem is intermittent, and the programmer’s tests don’t reveal the problem every time. (Intermittent problem)

5. The programmer might have “fixed” the problem without testing the fix at all. Yes, this isn’t supposed to happen. Stuff happens sometimes. (Programmer oversight)

6. The programmer might be indifferent or malicious. (Avoidance, Indifference, or Pathological Problem)

7. The programmer might have marked the wrong bug fixed in the tracking system. (Tracking system problem)

8. One programmer might have fixed the problem, but another programmer (or even the same one) might have broken or backed out the fix since the problem was marked “fixed”. Version control goes a long way towards lessening this problem.  Everyone makes mistakes, and even people of good will can thwart version control sometimes.  (Version control problem)

9. The problem might be fixed on the programmer’s system, but not in any other environments. Inconsistent or missing .DLLs can be at the root of this problem. (Programmer environment not matched to test or production)

10. The programmer hasn’t seen the problem for a long time, and guesses hopefully that he might have fixed it. (Avoidance or indifference)

I had promised myself that I wouldn’t look at Twitter as I prepared the list. By the time I finished, though, the iPhone on my hip was buzzing with Twitter notifications to the point where I was getting a deep-tissue massage.

When I got up this morning, there were still more replies. Peter Haworth-Langford wanted to know, “Do we need to focus on what’s wrong? Solutions? Are we missing something else by focusing on what’s wrong?” The last response on the thread, so far as I know, was Darren McMillan asking, rather like Peter, “I’d like to know which role you are playing when you say I’ll play too? Customer/PM….. Is there more to the challenge?” Nope. And thank goodness for that; I was swamped with replies. I decided to gather them and try grouping them to see if there were patterns that emerged.

Very few problems indeed have any single, unique, and exclusive cause. Moreover, this list is based on Joel’s Law of Leaky Abstractions: “All non-trivial abstractions are to some degree leaky.” Some of the problems listed below may fit into more than one category, and, for you, may fit better into a category other than the one to which I’ve arbitrarily assigned them. Cool; we think differently. Let me know how, and why.

Note also that we’re looking at this without prejudice. The object of this game is not to question anyone’s skill or integrity, but rather to try to imagine what could happen if all the stars are misaligned. We’re not saying that these things always happen, or even that they frequently happen. Indeed, for a couple of items, they might never have happened. In a brainstorm like this, even wildly improbable ideas are welcome, because they might trigger us to think of a more plausible risk.

Erroneous Fix

Just as people might fail to implement something the first time, they might fail when they try to fix the error, even though they’re acting in good faith with all of the skill they’ve got.

Lynn McKee: He is a human being and simply made an error in fixing it.
Darren McMillan: The developer didn’t have the skills to apply fix correctly, resulting code caused later regression issues.

That kind of problem gets compounded when we add one or more of the other problems listed below.

Incomplete Fix

Sometimes fixes are incomplete. Sometimes the special case has been fixed, but not the general case. Sometimes that’s because of a poor understanding of the scope of the problem; problems are usually multi-dimensional. (We’ll get to the root of the misunderstanding later.)

Ben Kelly: ‘Fix’ hid erroneous behavior but did not resolve the underlying problem
Ben Simo: The problem existed in multiple places and requires additional fixes.
Lanette Creamer: Fix isn’t accesible
Lanette Creamer: Fix is not localized.
Lanette Creamer: Fix isn’t triggered in some paths.
Lanette Creamer: Fix doesn’t integrate w other code
Stephen Hill: The programmer might have fixed that symptom of the bug but not dealt with the root cause.
Stephen Hill: Has the fix been applied only to new installs or can it retrospectively fix pre-existing installs too?

Sometimes people will fix the letter of the problem without doing all of the related work.

Ben Kelly: Bug fix did not have accompanying automation checks added (in a culture where this is the norm)

It’s possible for people to comply maliciously with the letter of the spec, fixing a coding problem while ignoring an underlying design problem.

Erkan Yilmaz: dev knows since decision abt design it’s bad but against his belief fixs also bug(s). He cant look honestly in mirror anymore

Unwanted Side Effects

Sometimes a good-faith attempt to fix the problem introduces a new problem, or helps to expose an old problem. Much of the time such problems could easily intersect with “Incomplete Fix” above or “Poor Understanding” below.

Ben Simo: The fix created a new problem worse than the solution.
Lanette Creamer: fix breaks some browsers/platforms
Lanette Creamer: fix has memory leaks
Lanette Creamer: fix breaks laws/reqirements
Lanette Creamer: fix slows performance to a crawl.
Michel Kraaij: The bug was fixed, but “spaghetti code” increased.
Michel Kraaij: The dev did fix the bug, but introduced a dozen other bugs. (this issue fixed or not?
Michel Kraaij: The fix increases the user’s manual process to an unacceptable level.
Nancy Kelln: Was never actually a problem. By applying a ‘fix’ they now broke it.
Pete Walen: Mis-read problem description, “fixed” something that was previously working.

Intermittent Problem

In my early days as a telephone support person at Quarterdeck, customers used to ask me why we hadn’t fixed a particular problem. I observed that problems that happened in every case, on every platform, tended to get a lot of attention. Sometimes a fix will appear to solve a problem whose symptom is intermittent. The fix might apply to the first iteration through a loop, but not subsequent iterations; or for all instances except the intended maximum. Problems may reproduce easily with certain data, and not so often or not at all with other data. Timing, network traffic, and available resources can conspire to make a problem intermittent.

Michel Kraaij: The dev happened to use test data which didn’t make the bug occur.
Michel Kraaij: The dev followed “EVERY described step to reproduce the bug” and now the bug didn’t occur anymore.
Pete Walen: Fixed sql query with commands that don’t work on that DB… not that that ever happened to anything I tested…
Pradeep Soundararajan: might have thought the bug to be fixed by tryin to repeat test mentioned in the report although there are other ways to repro
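The “all instances except the intended maximum” case can be shown with a small, hypothetical Python sketch. The limit `MAX_ITEMS`, both function names, and the shopping-cart framing are invented purely for illustration; they come from no real product. The “fixed” check passes for most inputs, so a programmer who retests with small counts would see nothing wrong.

```python
MAX_ITEMS = 5  # hypothetical limit, invented for this sketch


def can_add_item_fixed(current_count: int) -> bool:
    """A 'fixed' check that still rejects the intended maximum."""
    # Flaw: '<' stops one short, so the count can never reach MAX_ITEMS.
    return current_count + 1 < MAX_ITEMS


def can_add_item_intended(current_count: int) -> bool:
    """The intended behaviour: reaching the maximum itself is allowed."""
    return current_count + 1 <= MAX_ITEMS


def disagreements() -> list[int]:
    """Return the starting counts at which the two checks disagree."""
    return [
        n for n in range(MAX_ITEMS + 2)
        if can_add_item_fixed(n) != can_add_item_intended(n)
    ]
```

Here the two checks agree for every starting count except 4, the one that would bring the total up to the maximum. Test data that never touches that boundary would leave the problem looking fixed.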

Environment Issues

We’ve all heard (or said) “Well, it works on my machine.” The programmer’s environment may not match the test environments.

Adam Yuret: The fix only works on the Dev’s non-production analogous workstation/environment.
Michel Kraaij: The fix is based on module, which has became obsolete earlier, but wasn’t removed from the dev’s env.
Pradeep Soundararajan: Works on his machine
Stephen Hill: Might be fixed in dev’s environment where he has all the DLLs already in place but not on a clean m/c.

“Works on my machine” is a special case of a more general problem: the programmer’s environment might not be representative of the test environment, but the test environment might not be representative of the production environment, either. There might not be a test environment. Patches, different browser versions, different OS versions, different libraries, different mobile platforms… all those differences can make it appear that a problem has been fixed.

Darren McMillan: Fix on production code blocking customer upgrades
Lynn McKee: No two tests are ever exactly the same, so even tho code change was made something is diff in testers “environment”.
Michel Kraaij: The fix demands a very expensive hardware upgrade for the production environment.
Pete Walen: Fixed code for 1 DB, not for the other 3, and not the one that was used in testing.
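One modest way to make environment differences visible is to record a few facts about each environment and compare them. The Python sketch below is an assumption-laden illustration, not a recommended tool: the `fingerprint` and `differences` functions are hypothetical names, and the handful of facts collected here (interpreter version, operating system, release, machine type, byte order) is far from a complete description of an environment. It says nothing about DLLs, databases, patches, or browsers.

```python
import platform
import sys


def fingerprint() -> dict[str, str]:
    """Collect a few facts about the environment this code runs in."""
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "byteorder": sys.byteorder,
    }


def differences(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return the keys whose values differ, with both values side by side."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A fingerprint saved from the programmer’s machine could be compared with one saved from the test or production machine. An empty result does not show that the environments match; it shows only that they match on the few facts that were recorded.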

Poor Understanding or Explanation

Arguably all problems include an element of this one. Sometimes there’s poor communication between the programmer and the tester, due to either or both. The tester may not have described or explained the problem well, and the programmer provided a perfect fix to the problem as (poorly) described. Sometimes the programmer doesn’t understand the problem or the implications of the fix, and provides an imperfect fix to a well-described problem. Sometimes a report might seem to refer to the same problem as another, when the report really refers to a different problem. These problems can be aided or exacerbated by the medium of communication: the bug tracking system, linguistic or cultural differences, written instead of face-to-face communication (or the converse).

Ben Kelly: Programmer can’t reproduce the problem – tester didn’t provide sufficient info.

Ben Simo: The problem wasn’t understood well enough to be satisfactorily fixed.
Michel Kraaij: The dev asked whether “the problem was solved this way?” He got back a “yes”. He just happened to ask the wrong stakeholder.
Nancy Kelln: Wasn’t clear what the problem was and they fixed something else.
Pradeep Soundararajan: Assumes it to be a duplicate of a bug he recently fixed.
Zeger Van Hese: Developer solved the wrong problem (talking to the tester would have helped).
Ben Simo: The tester was wrong and gave the programmer information leading to breaking, not fixing, the software.
Michel Kraaij: The tester classifies the bug as “incorrectly fixed”. However, it’s the tester who’s wrong. Bug IS fixed.

Zeger Van Hese: The dev doesn’t see a problem and marks it fixed (as in: functions as designed)

Another variation on poor understanding is that the “problem” might not be a problem at all.

Darren McMillan: It wasn’t a problem after all. Customer actually considered the problem a feature. On hearing about the fix customer cried.
Darren McMillan: Problem came from a tester with a tendency to create his own problems. Wasn’t actually a problem worth fixing.

Finally in this category, a “problem” can be defined as “a difference between things as perceived and things as desired” (that’s from Exploring Requirements, by Weinberg and Gause). To that, I would add the refinement suggested by the Relative Rule “…to some person, at some time.” A bug is not a thing in the program; it’s a relationship between the product and some person. One way to address a problem is to solve it technically, of course. But there are other ways to address the problem: change the perception (without changing the fact of the matter); change the desire; decide to ignore the person with the problem; or wait, such that perhaps the problem, the perception, the desire, or the person are no longer relevant.

Darren McMillan: Fix was a customer support call. Fix satisfied customer, didn’t fit product needs.
Pradeep Soundararajan: Because the bug is the perception not the code

Insufficient Programmer Testing

Partial problem fixes, intermittent problems, and poor understanding tend not to thrive in the face of programmers who think and act critically. Inadequate programmer testing is never the sole source of a problem, but it can contribute to a problem being marked “fixed” when it really isn’t.

Ben Simo: The fix wasn’t sufficiently tested.
Darren McMillan: Obvious: wasn’t properly tested, didn’t consult required parties for fix,
Lynn McKee: …and therefore must not have done any or /any/ effective testing on his end first.

Version Control Problems

Version control software was relatively new to most on the PC platform in the middle 1990s. These days, it’s implemented far more commonly, which is almost entirely to the good. Yet no tool can guarantee perfect execution for either individuals or teams, and accidents still happen. Alas, version control can’t guarantee that the customer has the same configuration on which you’re developing or testing—or that he had yesterday, or that he’ll have tomorrow.

George Dinwiddie: Forgot to check in some of the new code.
Nancy Kelln: Bad code merge overwrote the changes. Bug fix got lost.
Pete Walen: Fixed code, did not check it in, fixed it again, check in BOTH (contradictory fixes)
Pete Walen: Fixed code, forgot to check it in. Twice. #hypotheticallyofcourse
Pradeep Soundararajan: Has a fix and marks fix before he commits the code and checks in the wrong one.
Stephen Hill: Programmer might not be using the same code build as the customer so does not get the bug.

Tracking Tool Problems

Sometimes our tools introduce problems of their own.

Ben Kelly: Programmer fat-fingers the form response to a bug fix.
Nancy Kelln: Marked the wrong bug as fixed in the tracking tool.
Pradeep Soundararajan: The bug tracking system could have a bug that occasionally registers Fixed for any other option.
Pradeep Soundararajan: He might have overlooked a bug id and marked it fixed while it was the other
Zeger Van Hese: The dev wanted to re-assign, but marked it fixed instead (context: he hates that defect tracking system they made him use)

Process or Responsibility Issues

Perhaps there’s a protocol that should be followed, and perhaps it hasn’t been followed correctly—or at all.

Ben Simo: Perhaps it isn’t the responsibility of one programmer OR one tester to mark a problem fixed on their own.
Darren McMillan: Obvious: didn’t follow company procedure for fixes (fix notes, check lists, communications)
Dave Nicolette: Maybe programmer shouldn’t be the one to say the problem is fixed, but only that it’s ready for another review.
Michel Kraaij: The dev misunderstood the bug fixing process and declared it “fixed” instead of “resolved”.

Avoidance, Indifference, or Pathological Behaviour

We don’t like to think about certain kinds of behaviour, yet they can happen. As Herbert Leeds and Jerry Weinberg put it in one of the first books on computer programming, “When we have been working on a program for a long time, and if someone is pressing us for completion, we put aside our good intentions and let our judgment be swayed.”

Ben Kelly: Programmer is under duress from management to ‘fix all the bugs immediately’
Michel Kraaij: The dev is just lazy
Michel Kraaij: The dev is out of “fixing time”. To make the fake deadline, he declares it fixed.
Pradeep Soundararajan: Is forced by a manager or a stakeholder to do so
Pradeep Soundararajan: That way he is buying more time because it goes through a big cycle before it comes back.

Maybe the pressure comes from outside of work.

Erkan Yilmaz: Dev has import. date w. fiancé’s parents, but boss wants dev 2 work overtime suddenly, family crisis happens

There may be questions of competence and trust between the parties. A report from an unskilled tester who cries “Wolf!” too often might not be taken seriously. A bad reputation may influence a programmer to reject a bug with minimal (and insufficient) investigation.

Ben Kelly: Bug was dismissed as the tester reporting had a track record of reporting false positives.

Perhaps someone else was up to some mischief.

Erkan Yilmaz: dev went 2 other person’s pc who was away + used that pc 4 fix/marking – during that he found info that invaded privacy

Sometimes we can persuade ourselves that there isn’t a problem any more, even though we haven’t checked.

Michel Kraaij: The dev has a huge ego and declares EVERY bug he touches as “fixed”.
Pradeep Soundararajan: He did some code change that made the bug unreproducible and hence considers it to be fixed.
Pradeep Soundararajan: Considers it fixed thinking it was logged on latest version & prior versions have it altho code base in production is old
Pradeep Soundararajan: Thinks his colleague already fixed that.
Stephen Hill: Has the person for whom the problem was ‘a problem’ re-tested under the same circumstances as previously?

Sometimes management or measurement exacerbates undesirable behaviours.

Ben Kelly: Programmer’s incentive scheme counts the number of bugs fixed (& today is the deadline)
Ben Simo: Perhaps programmer is evaluated by number of things mark fixed; not producing working software.
Lynn McKee: He is being measured on how many bugs he fixes and hopes no one will notice no actual coding was done.
Pradeep Soundararajan: Yielding to SLA demands to fix a bug within a specific time.

It is remarkable how a handful of experienced people can come up with a list of this length and scope. Thank you all for that.

Another Silly Quantitative Model

Wednesday, July 14th, 2010

John D. Cook recently issued a blog post, How many errors are left to find?, in which he introduces yet another silly quantitative model for estimating the number of bugs left in a program.

The Lincoln Index, as Mr. Cook refers to it here, was used as a model for evaluating typographical errors, and was based on a method for estimating the population of a given species of animal. There are several terrible problems with this analysis.
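
For readers who haven’t met it, the estimate under discussion is simple arithmetic. Here is a minimal sketch, assuming two reviewers who work independently; the function names and the counts are hypothetical and are not taken from Mr. Cook’s post.

```python
def lincoln_index(found_by_a: int, found_by_b: int, found_by_both: int) -> float:
    """Estimate the total population as (A * B) / overlap."""
    if found_by_both <= 0:
        # With no shared findings the ratio is undefined.
        raise ValueError("estimate is undefined when the reviewers share no findings")
    return found_by_a * found_by_b / found_by_both


def estimated_remaining(found_by_a: int, found_by_b: int, found_by_both: int) -> float:
    """Estimated total minus the distinct items already found."""
    distinct_found = found_by_a + found_by_b - found_by_both
    return lincoln_index(found_by_a, found_by_b, found_by_both) - distinct_found


# Hypothetical counts: A finds 20, B finds 15, and 10 are common to both.
print(lincoln_index(20, 15, 10))        # 30.0
print(estimated_remaining(20, 15, 10))  # 5.0
```

Note that the arithmetic treats every finding as an interchangeable unit drawn from a single population.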

First, reification error. Bugs are relationships, not things in the world. A bug is a perception of a problem in the product; a problem is a difference between what is perceived and what is desired by some person. There are at least four ways to make a problem into a non-problem: 1) Change the perception. 2) Change the product. 3) Change the desire. 4) Ignore the person who perceives the problem. Any time a product owner can say, “That? Nah, that’s not a bug,” the basic unit of the system of measurement is invalidated.

Second, even if we suspended the reification problem, the model is inappropriate. Bugs cannot be usefully modelled as a single kind of problem or a single population. Typographical errors are not the only problems in writing; a perfectly spelled and syntactically correct piece of writing is not necessarily a good piece of writing. Nor are plaice the only species of fish in the fjords, nor are fish the only form of life in the sea, nor do we consider all life forms as equivalently meaningful, significant, benign, or threatening. Bugs have many different manifestations, from capability problems to reliability problems to compatibility problems to performance problems and so forth. Some of those problems don’t have anything to do with coding errors (which themselves could be like typos or grammatical errors or statements that can be interpreted ambiguously). Problems in the product may include misunderstood requirements, design problems, valid but misunderstood implementation of the design, and so forth. If you want to compare estimating bugs in a program to a population estimate, it would be more appropriate to compare it to estimating the number of all biological organisms in a given place. Imagine some of the problems in doing that, and you may get some insight into the problem of estimating bugs.

Third, there’s Dijkstra’s notion that testing can show the presence of problems, but not their absence. That’s a way of noting that testing is subject to the Halting Problem. Since you can’t tell if you’ve found the last problem in the product, you can’t estimate how many are left in it.

Fourth, the Ludic Fallacy (part one). Discovery and analysis of problems in a product is not a probabilistic game, but a non-linear, organic system of exploration, discovery, investigation, and learning. Problems are discovered at neither a steady nor a random rate. Indeed, discoveries often happen in clusters as the tester learns about the program and things that might threaten its value. The Lincoln Index was focused on typos, a highly decidable and easily understood kind of problem whose detection could largely be accomplished by checking. It doesn’t fit software testing.

Fifth, the Ludic Fallacy (part two). Mr. Cook’s analysis implies that all problems are of equal value. Those of us who have done testing and studied it for a long time know that, from one time to another, some testers find a bunch of problems, and others find relatively few. Yet those few problems might be of critical significance, and the many of lesser significance. It’s an error to think in terms of a probabilistic model without thinking in terms of the payoff. Related to that is the idea that the number of bugs remaining in the product may not be that big a deal. All the little problems might pale in significance next to the one terrible problem; the one terrible problem might be easily fixable while the little problems grind down the users’ will to live.

Sixth, measurement-induced distortion. Whenever you measure a self-aware system, you are likely to introduce distortion (at best) and dysfunction (at worst), as the system adapts itself to optimize the thing that’s being measured. Count bugs, and your testers will report more bugs—but finding more bugs can get in the way of finding more important bugs. That’s at least in part because of…

Seventh, the Lumping Problem (or more formally, Assimilation Bias). Testing is not a single activity; it’s a collection of activities that includes (at least) setup, investigation and reporting, and design and execution. Setup and investigation and reporting take time away from test coverage. When a tester finds a problem, she investigates and reports it. That time is time that she can’t spend finding other problems. The irony here is that the more problems you find, the fewer problems you have time to find. The quality of testing work also involves the quality of the report. Reporting time, since it isn’t taken into account in the model, will distort the perception of the number of bugs remaining.
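
The irony in that paragraph can be put into rough numbers. Here is a minimal sketch, assuming a fixed-length test session and a flat investigation-and-reporting cost per bug; the function name, the 90-minute session, and every figure are hypothetical, chosen only for illustration.

```python
def coverage_minutes(session_minutes: int, setup_minutes: int,
                     bugs_found: int, minutes_per_bug: int) -> int:
    """Time left for designing and running tests once setup and
    bug investigation/reporting have been paid for."""
    spent = setup_minutes + bugs_found * minutes_per_bug
    return max(0, session_minutes - spent)


# Hypothetical 90-minute session, 10 minutes of setup, 15 minutes per bug.
for bugs in (0, 2, 4):
    print(bugs, coverage_minutes(90, 10, bugs, 15))
# 0 80
# 2 50
# 4 20
```

In this toy example, two sessions of the same length deliver very different amounts of test coverage, and a raw bug count records none of that difference.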

Eighth, estimating the number of problems remaining in the product takes time away from sensible, productive activities. Considering that the number of problems remaining is subjective, open-ended, and unprovable, one might be inclined to think that counting how many problems are left is a waste of time better spent on searching for other bad ones.

I don’t think I’ve found the last remaining problem with this model.

But it does remind me that when people see bugs as units and testing as piecework, rather than the complex, non-linear, cognitive process that it is, they start inventing all these weird, baseless, silly quantitative models that are at best unhelpful and that, more likely, threaten the quality of testing on the project.