
Why Checking Is Not Enough

Here is a specific, real-world example of testing where the focus doesn’t include explicit checking, and does not result in yes-or-no answers to predetermined questions.

This morning, I acted on a piece of email I received several days ago, offering a free upgrade to a PDF conversion package which I’ll call “PDFThing”. I’ll walk you through what happened, and parts of my thought process as it happened.

Since the email is addressed to me, and since it notes that I had purchased upgrade insurance, I presume that the company has all of the data needed to know which product was associated with that email address.

The mail message includes this text: “It’ll only take a few minutes, but we’ll need your serial number (also known as a license key) to deliver your upgade. [How do I find my serial number?]” (The text in square brackets was a link.)

I note that “upgrade” is misspelled. Spelling can be checked, and a spelling checker would have found that problem, but checks can’t guarantee correct spelling. Do you doubt this? What if I had said that Czechs can’t guarantee correct spelling? Or that cheques can’t guarantee correct spelling? You would have noticed right away, but a spelling checker would not have.
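To make the point concrete, here is a minimal Python sketch of the kind of dictionary lookup a simple spelling checker performs. The word list is a tiny hypothetical stand-in for a real dictionary, and `misspelled_words` is an illustrative name, not any product's API. The check flags "upgade", but it passes all three versions of the sentence above, because every word in them is a real word.

```python
import re

# A tiny, hypothetical word list standing in for a real spelling dictionary.
DICTIONARY = {
    "we'll", "need", "your", "serial", "number", "to", "deliver", "upgrade",
    "checks", "czechs", "cheques", "can't", "guarantee", "correct", "spelling",
}


def misspelled_words(text):
    """The check: return every word in the text that is not in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in DICTIONARY]


print(misspelled_words("we'll need your serial number to deliver your upgade"))
# ['upgade']
print(misspelled_words("Czechs can't guarantee correct spelling"))
# []
```

The check answers only the question it was built to ask ("is each word in the list?"); whether the word is the *right* word is a question it cannot pose.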

I see a more serious potential problem, though. If the company has data about me, why not provide helpful serial number information directly and immediately? Options include "Your serial number is…" or, if they're worried about someone intercepting the mail, "The serial number can be found in your version of PDFThing by…", in a way specific to the version that is associated with that email address.

I click on the link. It takes me to an FAQ page that has a list of questions. Conveniently, the question titled “Where do I find my serial number for PDFThing?” already shows an answer:

“It depends on what version of PDFThing you have and where you bought it. If you bought PDFThing 7 from our website, your serial number (in alpha-numeric characters) can be found under the Help tab in the About section.”

I have PDFThing 6, though. And I purchased it at a store. So I apply an oracle: consistency with an implicit purpose. An implicit purpose of this answer is to convey information to users of *any* version of PDFThing, purchased *anywhere*. The answer doesn’t do that. Is this a problem? I don’t know if the product owner will consider this a problem at all, or a problem worth fixing, so I can’t provide a yes-or-no answer. What I can do as a tester is to note a possible problem, and move on.

I decide to open my existing version of PDFThing, and I apply another set of consistency heuristics: consistency with history; consistency within the product; consistency with an implicit claim. Maybe the serial number for version 6 is located in the same place as the serial number for version 7. I click on Help and then About. I find that the serial number is not in the place referred to by the FAQ text, so with respect to the product I own, that text is misleading and incorrect. I also apply the "consistency with comparable products" heuristic; many products put the serial number in the Help/About box. All in all, this looks more like a problem. Will the product owner see it that way?

I dimly remember that I received a copy of the serial number in the email that I got when I first registered the product. I go on a hunt for that email. It takes me a few minutes to find it, but eventually I do. I copy the serial number to the clipboard, return to the original email, and click on the upgrade link to download the product. My own impatience and exasperation suggest to me that there's a problem here. Note that although you can test for an emotional reaction, you can't check for it. At best, you can anticipate things like delays of a certain duration as the program is executing. Measurement theorists call that a surrogate measure: using one kind of measurement as a substitute or stand-in for the thing that we'd actually like to measure.
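A surrogate measure of this kind can be sketched as a timing check. This is a minimal Python illustration under stated assumptions: `timed_check`, `find_serial_number`, and the two-second threshold are all hypothetical, and the threshold is exactly the sort of decision rule that someone would have to choose in advance. The check can report that something took too long; it cannot observe impatience or exasperation.

```python
import time

# Hypothetical threshold; a check needs its decision rule fixed in advance.
MAX_SECONDS = 2.0


def timed_check(operation, max_seconds=MAX_SECONDS):
    """Run an operation and return (passed, elapsed_seconds).

    Elapsed time is only a surrogate: it stands in for the user's
    experience, which the check itself has no way to observe.
    """
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    return elapsed <= max_seconds, elapsed


def find_serial_number():
    # Hypothetical stand-in for the task being timed.
    time.sleep(0.1)


passed, elapsed = timed_check(find_serial_number)
print(passed, round(elapsed, 2))
```

Note that a pass here says nothing about whether hunting through old email for a serial number is a reasonable thing to ask of a customer in the first place.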

When the product finishes downloading, I begin the installation process. I’m prompted for the serial number for my older version, which I provide. The installer accepts the serial number and prompts me for a directory into which the new version of PDFThing should be installed. I notice that the product is being installed into a folder that is specific to version 7. I suspect that the prior version of PDFThing is not being uninstalled, so rather than accepting the default directory, I browse upward. I find that indeed the new product is not replacing the old product. Problem or no problem? For a check to determine this, the decision rule would have to have been decided and programmed in advance.

I go to Add/Remove Programs and begin the uninstallation process for PDFThing 6. The uninstaller posts a dialog saying, "The following applications should be closed before continuing the install", and in the window beneath, I see a reference to the title of an email message I'm drafting in Outlook. That makes sense; PDFThing 6 installs a toolbar in Outlook so that I can print PDFs directly from the program. Still, there's no reference to Outlook itself. Is that the message that the designer of the program wants me to see?

I close the offending message window and save the message as a draft. I return to the uninstallation dialog, press Retry, and the uninstallation proceeds. It appears to make some progress. I switch to another window and continue working while uninstallation continues in the background. After a brief interval, it posts the same dialog as before, but this time tells me that Microsoft Word should be closed before continuing the install. Is this the behaviour that the designer wanted?

I now wonder what would have happened had I not chosen to uninstall PDFThing 6. Would Outlook and Word have acquired a second set of toolbars for PDFThing 7? Would they be separate? Would the new one have replaced the old one? I could perhaps have programmed checks for that, but would it have been worthwhile to do that? Wouldn't eyeballing it be cheaper and faster? Maybe not; maybe there are bunches of registry entries and files and configuration settings and stuff connected to Outlook (and Word, and Explorer, and PowerPoint, and Excel) such that we'd really need a program to help us probe that. Would a check have wondered and raised that issue to programmers or designers?

The uninstallation process continues. In the middle, a browser window appears, asking me why I'm uninstalling PDFThing 6. The options are "I don't want to purchase or continue with the trial"; "I purchased the product and am uninstalling the trial"; "I'm upgrading to the latest version"; "I'm about to move my PDFThing 6 license to another computer." It seems to me that the third option would be unnecessary if PDFThing 7's installation program automatically removed PDFThing 6. So is this the uninstallation process that the product owner wants?

I answer the question (I’m upgrading), and the Web page offers a thank you for answering the question. In the interim, the uninstallation process seems to have terminated. Was it successful? I don’t know. Did the designers intend that uninstallation should end immediately? And what if I hadn’t had an active Internet connection; what would have happened then? Would checks raise these questions? Perhaps the development of checks might have, but the checks themselves would not have.

I return to the installer for PDFThing 7, and start it up again. Oddly, I’m not asked for my serial number this time. Has the product retained it from the last attempt? I don’t know. How would I find out?

The installation process carries on for a while, and at the end, I’m presented with a dialog that asks whether I want to buy the product or activate it. I choose the latter; I’ve already bought it. The activation window asks for the serial number. I provide it, and immediately I’m presented with this error dialog (which I haven’t altered):

Note that the dialog is titled “Information”; the name of the product isn’t provided. Look as well at the formatting of the message; it looks sloppy and unprofessional to me. Oh, and dammit, it IS the right serial number (it was accepted last time). Is this what the product owner wants?

I dismiss the dialog, and the activation dialog shows a moving graphic indicating that the product is waiting for something. Otherwise, the product seems hung. Just in case, though, I click on the Activate button again. The "Information" dialog above appears again. There's no apparent choice except to dismiss the two dialogs and get on the line to technical support, whereupon the costs will really begin to rack up.

Now you could say that, if I were a tester working for the PDFThing people, I should have received all of this information before beginning test execution, whereupon I should have prepared checks to be applied against the product. It's a fine idea. But even when we're working on the best imaginable teams in the best-managed projects, as soon as we begin to test, we immediately begin to discover things that no one (neither testers, designers, programmers, nor product owner) had anticipated or considered before testing revealed them. It's simply fatuous to suggest that everyone involved in the development of the product knows exactly what they will want or need from the outset. It's even more fatuous to suggest that they should know such a thing. Software development is not simply construction according to prescribed plans. It is development. Like testing itself, it is a process of exploration, discovery, investigation, and learning.

It’s important not to confuse checks with oracles. An oracle is a principle or mechanism by which we recognize a problem. A check is a mechanism, an observation linked to a decision rule. That rule is based on a single application of a single principle. A check provides a signal, a bit, when the product’s behaviour or state is inconsistent with that principle. A check follows a rule; it does not apply a heuristic. Testing, which may include many checks, is not so restricted. Testing may produce a yes-or-no answer, but it may also produce an observation, a question, a concern, a dilemma, a new test idea, or a new check idea. Testing is not governed by rules; it is governed by heuristics that, to be applied appropriately, require sapient awareness and judgement.
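The definition of a check (an observation linked to a decision rule, yielding a bit) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a real testing framework: the `Check` class, the product state, and both rules are invented, loosely modelled on the PDFThing story. Each rule had to be decided and programmed before the check ran; neither check would have noticed the Outlook dialog or the uninstall survey, and neither can say whether a failure matters to the product owner.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Check:
    """A check: an observation linked to a decision rule that yields one bit."""
    name: str
    observe: Callable[[], Any]    # how we look at the product
    rule: Callable[[Any], bool]   # decided in advance; True means "pass"

    def run(self) -> bool:
        return self.rule(self.observe())


# Hypothetical product state after installing "PDFThing 7".
product_state = {
    "install_dir": r"C:\Program Files\PDFThing 7",
    "old_version_present": True,
}

checks = [
    Check("installs under a version-7 folder",
          observe=lambda: product_state["install_dir"],
          rule=lambda d: d.endswith("PDFThing 7")),
    Check("old version removed",
          observe=lambda: product_state["old_version_present"],
          rule=lambda present: not present),
]

for check in checks:
    print(f"{check.name}: {'pass' if check.run() else 'fail'}")
# installs under a version-7 folder: pass
# old version removed: fail
```

The second result is only a bit. Whether leaving the old version in place is a problem, and for whom, is still a question for testing and for the product owner.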

Checking is an approach to making sure that we get the right answers, for questions and desired answers that we’ve already determined in advance. A passing check doesn’t tell us that the product is acceptable. At best, a check that doesn’t pass suggests that there is a problem in the product that might make it unacceptable.

Testing incorporates checking, but is a far richer set of activities: exposing ourselves to the unexpected, making new observations, spotting unanticipated problems, and raising new questions. Yet not even testing is about telling people that the product is acceptable. On the one hand, testing may have a different purpose. Cem Kaner, in the BBST course, lists these purposes:

  • Finding defects
  • Maximizing bug count
  • Blocking premature product releases
  • Helping managers make ship / no-ship decisions
  • Minimizing technical support costs
  • Assessing conformance to specification
  • Assessing conformance to regulations
  • Minimizing safety-related lawsuit risk
  • Finding safe scenarios for use of the product (workarounds that make the product potentially tolerable, in spite of the bugs)
  • Assessing quality
  • Verifying the correctness of the product

To which I would add:

  • Assessing compatibility with other products or systems
  • Assessing readiness for internal deployment
  • Ensuring that that which used to work still works
  • Design-oriented testing, such as review or test-driven development
  • Understanding the workings of a poorly-documented product or library
  • Evaluating the usefulness of a new tool or service
  • Refining notions of risk

On the other hand, much of the time, we're testing to help determine whether a product is acceptable for release. But decisions about acceptability are in the hands of managers, programmers, and designers, those who build the product (and ultimately, acceptability is the decision of the product owner). Testing is about investigating the product to reveal knowledge that informs the acceptability decision. Sometimes that information comes in the form of binary answers to known questions: checks. Sometimes that information comes in the form of discoveries that pose new ideas, new risks, and new questions for those who are responsible for building and releasing the product.

11 replies to “Why Checking Is Not Enough”

  1. “Checking is an approach to making sure that we get the right answers, for questions and desired answers that we’ve already determined in advance.”

    Nice.

    Michael replies: Thanks!

  2. Hi Michael, nice post reinforcing the importance of thinking of the users in software development.

    I have one question, though, about Cem Kaner's testing activities, namely the blocking of premature product releases.

    As a tester, I'm under the impression I shouldn't be the one blocking releases; I'm not a gatekeeper, more the messenger.

    I have the BBST materials to hand – could you point me to the relevant section &/or elaborate that point please?

    Michael replies: I'd agree with you in almost every case, but the context-driven approach requires us to consider exceptions. Here's one scenario: You're working for a company that makes medical gear. The product manager for your group is under serious management pressure to ship to a tight deadline. You have grave doubts about the safety of the device, but you don't have a showstopper bug in hand. When you ask the programmers about the state of the product, their feet begin to shuffle and their eyes turn away. That lack of confidence is one thing, and you might let it go if it were an online dating service where you can push out a fix in the next continuous build. The knowledge that there's a good chance that people will die if you don't probe into the unknown unknowns means that you have to develop a strategy to get the programmers to voice their concerns, to find a killer bug in a big hurry, to buy a little more time, to help the product manager recognize the significance of the programmers' emotional states, or to escalate to a higher management level. In other words, blocking the release could become your mission when there are lives on the line and no one else is willing to speak up. Cem tells a story something like this, where for an extended period, a tester found a showstopper about once a week. The subsequent slips gave the rest of the team enough time to find hundreds of less significant but still important bugs. Testing is not just about shipping a product on time and on budget; sometimes it's about shipping a good enough product a little later.

    The PDFs that come with BBST should have the string "block premature" somewhere in there. It's certainly littered throughout the presentations on Cem's Web site, http://www.kaner.com.

  3. Thanks for the prompt & full reply Michael, greatly appreciated.

    I’ve not worked in such an environment – probably why I didn’t consider it. Certainly in the industries I have worked in there’s no way I could have blocked a release!

    Cheers for the search string – I’m booked into the BBST Foundations next year so I’ll be wading through the course notes as well as the required & recommended reading soon!

    Duncan

  4. Fantastic post! I have one curiosity question – did you report your findings to the company providing the PDFThing?

    Michael replies: I haven’t got around to that yet. I will.

    In the past I have attempted to submit findings similar to this, but many companies make it difficult to submit.

    Yes. This is one way that companies poke out their own eyes and cover their ears. Mark Federman has written a wonderful paper on the subject.

    Thanks for the continued inspiration.

    And thank you for the thanks.

  5. Many years ago I was encouraged to stay on to take charge of software testing – those were the early days of testing BTW – One of my conditions of taking on the position was that I would have a ‘go / no go’ decision on shipping.

    Both the company and I (I think) were naive in thinking I would have all the facts in hand, but it worked for several years.

    Then along came our new product. It really wasn't ready for release, but the company needed to start selling; at the same time, they needed someone to 'demo' it to a well-known 'fruit' company in California, so I was shipped off for several weeks.

    To my horror, the company actually started to sell the product whilst I was out of the country (UK); while I was there, I had advised the 'fruit' company that it wouldn't be ready for at least a year.

    The moral of this story is to supply good empirical evidence to support your position, but never to stand in the way of a sound business decision unless you really believe it would harm people or the company itself, and then you need to be damn sure the evidence is there.

    Michael replies: I see another message here: you’re either a tester (in which case you don’t have a go/no-go decision) or you’re the product owner (project manager, program manager, product manager). My message to the world is this: if you want to be a product owner, be a product owner. If you want to be a tester, don’t harbour any illusions that your role is more than an informational one; other people own the product and the business.

  6. Hi Michael,

    “Checking is an approach to making sure that we get the right answers, for questions and desired answers that we’ve already determined in advance.”

    Can I conclude that for testing activities we should forget the good old "Expected results" phrase, and use "Applied oracles" instead?

    Michael replies: Forget it? Or write it off as alarmingly trivial? There’s nothing wrong with the concept of an expected result, but there’s not a whole lot that’s right with it either, in any kind of deep sense. It’s an oracle (a principle or mechanism by which we recognize a problem) that provides us with the expectation of a result; that oracle is heuristic, and therefore fallible. For many testing approaches that I see, the focus is on the “expected result” to the degree that other important things could be missed easily. To get around that problem, we need to look at the product from a variety of perspectives, and to apply a lot more oracles. We also need to apply certain verbal heuristics to help reduce the chance that we’ll miss an important problem. Try adding “unless…” to the end of the statements that you hear, or that you see in a specification document. Try substituting “a” when people say “the”, as in “the killer bug here is…”. Try considering that whatever is going on, something else is also going on.

    So I'd say that your idea of thinking in terms of applying oracles thoughtfully leads to a much more precise take on what testing is all about. If you haven't seen it already, you might like a look at this conversation on inputs and expected results.

    “ensuring that that which used to work still works”

    Since BBST is mentioned, I find this one a bit controversial unless the verb "works" has the BBST meaning: "appears to meet some requirement to some degree". Moreover, I think most people reading this sentence would think of some test-retry activity, and that is definitely the area of checking, isn't it?

    Yes, exactly: “it works” means “something appears to meet some requirement to some degree”, and reconfirming that in and of itself is usually what I would call checking, unless there’s a motivation for repeating the observation. James talks about that here.

    Thanks for the questions and comments.

  7. Hi Michael (and happy new year).

    Good post. I commented on checking/testing in a LinkedIn thread about ET after listening to someone talking about diagnosis of bipolar patients on Radio 4; see http://tinyurl.com/6ukhft3.

    Now off to read your “What ET is not…” series!

  8. “A passing check doesn’t tell us that the product is acceptable. At best, a check that doesn’t pass suggests that there is a problem in the product that might make it unacceptable.”
    As a product manager for 10 years, I would say that what you summarize is my rule of thumb.

  9. Great post! I just forwarded it to the whole office because I thought it was worthwhile reading for everyone. From the real-life example to the contrast between testing and checking to the discussion about the various purposes of testing it’s all good stuff that I couldn’t have said better myself.
