
Counting the Wagons

A member of LinkedIn asks if “a test case can have multiple scenarios”. The question and the comments (now unreachable via the original link) reinforce, for me, just how unhelpful the notion of the “test case” is.

Since I was a tiny kid, I’ve watched trains go by, whether waiting at level crossings, dashing to the window of my Grade Three classroom, or being dragged by my mother’s grandchildren to the balcony of her apartment, perched above a major train line that goes right through the centre of Toronto. I’ve always counted the cars (or wagons, to save us some confusion later on). As a kid, it was fun to see how long the train was (were there more than a hundred wagons?!). As a parent, it was a way to get the kids to practice counting while waiting for the train to pass and the crossing gates to lift.


Often the wagons are flatbeds, loaded with shipping containers or the trailers from trucks. Others are enclosed, but when I look through the screening, they seem to be carrying other vehicles—automobiles or pickup trucks. Some of the wagons are traditional boxcars. Other wagons are designed to carry liquids or gases, or grain, or gravel. Sometimes I imagine that I could learn something about the economy or the transportation business if I knew what the trains were actually carrying. But in reality, after I’ve counted them, I don’t know anything significant about the contents or their value. I know a number, but I don’t know the story. That’s important when a single car could have explosive implications, as in another memory from my youth.

A test case is like a railway wagon. It’s a container for other things, some of which have important implications and some of which don’t, some of which may be valuable, and some of which may be other containers. Like railway wagons, the contents—the cargo, and not the containers—are the really interesting and important parts. And like railway wagons, you can’t tell much about the contents without more information. Indeed, most of the time, you can’t tell from the outside whether you’re looking at something full, empty, or in between; something valuable or nothing at all; something ordinary and mundane, or something complex, expensive, or explosive. You can surely count the wagons—a kid can do that—but what do you know about the train and what it’s carrying?

To me, a test case is “a question that someone would like to ask (and presumably answer) about a program”. There’s nothing wrong with using “test case” as shorthand for the expression in quotes. We risk trouble, though, when we start to forget some important things.

  • Apparently simple questions may contain or imply multiple, complex, context-dependent questions.
  • Questions may have more outcomes than binary, yes-or-no, pass-or-fail, green-or-red answers. Simple questions can lead to complex answers with complex implications—not just a bit, but a story.
  • Both questions and answers can have multiple interpretations.
  • Different people will value different questions and answers in different ways.
  • For any given question, there may be many different ways to obtain an answer.
  • Answers can have multiple nuances and explanations.
  • Given a set of possible answers, many people will choose to provide a pleasant answer over an unpleasant one, especially when someone is under pressure.
  • The number of questions (or answers) we have tells us nothing about their relevance or value.
  • Most importantly: excellent testing of a product means asking questions that prompt discovery, rather than answering questions that confirm what we believe or hope.

Testing is an investigation in which we learn about the product we’ve got, so that our clients can make decisions about whether it’s the product they want. Other investigative disciplines don’t model things in terms of “cases”. Newspaper reporters don’t frame their questions in terms of “story cases”. Historians don’t write “history cases”. Even the most reductionist scientists talk about experiments, not “experiment cases”.

Why the fascination with modeling testing in terms of test cases? I suspect it’s because people have a hard time describing testing work qualitatively, as the complex cognitive activity that it is. These are often people whose minds are blown when we try to establish a distinction between testing and checking. Treating testing in terms of test cases, piecework, units of production, simplifies things for those who are disinclined to confront the complexity, and who prefer to think of testing as checking at the end of an assembly line, rather than as an ongoing, adaptive investigation. Test cases are easy to count, which in turn makes it easy to express testing work in a quantitative way. But as with trains, fixating on the containers doesn’t tell you anything about what’s in them, or about anything else that might be going on.
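To make the point about counting concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Finding` record, the made-up 0 to 10 `risk` rating, and the two sample suites are hypothetical and do not come from any real tool or project. It shows only that two sets of questions can have the same count while carrying very different cargo.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One question asked of a program and the story that came back.

    Hypothetical structure, for illustration only.
    """
    question: str  # what someone wanted to learn about the program
    answer: str    # what was observed, in words rather than pass/fail
    risk: int      # made-up 0-10 rating of how much the answer matters


def count(findings):
    """The only thing a test-case count reports."""
    return len(findings)


def highest_risk(findings):
    """Look inside the containers: return the finding that matters most."""
    return max(findings, key=lambda f: f.risk)


# Two hypothetical suites with the same number of "wagons".
suite_a = [
    Finding("Does the login page load?", "Yes", 0),
    Finding("Does the logo render?", "Yes", 0),
    Finding("Is the footer present?", "Yes", 0),
]
suite_b = [
    Finding("Does the login page load?", "Yes", 0),
    Finding("What happens if the connection drops mid-payment?",
            "The customer is charged twice", 10),
    Finding("Can two users edit the same record at once?",
            "The last write silently wins", 7),
]

print(count(suite_a), count(suite_b))  # the counts are identical
print(highest_risk(suite_b).answer)    # the stories are not
```

Even this sketch oversimplifies: a single risk number is itself a count-like summary, and the story behind each answer is what a client would actually need to hear.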


As an alternative to thinking in terms of test cases, try thinking in terms of coverage. Here are links to some further reading:

  • Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.
  • Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.
  • A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.
  • “What Counts”, an article that I wrote for Better Software magazine, on problems with counting things.
  • “Braiding the Stories” and “Delivering the News”, two blog posts on describing testing qualitatively.
  • My colleague James Bach has a presentation on the case against test cases.
  • Apropos of the reference to “scenarios” in the original thread, Cem Kaner has at least two valuable discussions of scenario testing, as tutorial notes and as an article.

12 replies to “Counting the Wagons”

  1. The boxcar analogy does not translate well to what you were trying to say. Even though you – as a detached observer – might think that doing so is a waste, train engineers do need to count boxcars because it is one of the ways they know how much more load they can put on the train, how much energy the train needs to get from A to B to C to … to Z, how long it might take to unload sections of the train, and even for possible collision avoidance with other trains. You can call this “train management.”

    Michael replies: So counting boxcars is “train management”? Would it matter if the boxcars were empty or full? Full of toxic chemicals or household appliances or beach balls? Where which cars were going? How long it would take to load or unload them? The weight of each one? Please do not confuse counting boxcars with train management, just as I recommend not confusing counting test cases with test management.

    Similarly, counting test cases is oftentimes useful for test management, particularly when exploratory testing is involved. In fact, the concept of “test case” includes exploratory testing and is well-defined. In other words, an exploratory testing session is indeed a group of test cases done in sequence, and there is nothing in the concept that requires it to be scripted (except, of course, your own bias against scripted testing).

    The construct (which is what you mean when you say concept) of “test case” is not well-defined, unless you mean “I, Mario, have oversimplified models of both test cases and exploratory testing that I, Mario, declare to be well-defined.” An exploratory testing session is not a group of test cases done in sequence.

    Your model removes the role of human cognition, of discovery of risk, of tacit knowledge, of the very role of humans who invent and prepare the model, and who contextualize and analyze the results.

    See here: http://ortask.com/test-case-equivalence-part-4/

    I looked at it. It contains the kinds of oversimplifications that would cause me to send any tester who worked for me off for retraining. If he kept missing the point of testing, I’d fire him. Let me quote Nassim Taleb here: “What I call Platonicity, after the ideas (and personality) of the philosopher Plato, is our tendency to mistake the map for the territory, to focus on pure and well-defined “forms,” whether objects, like triangles, or social notions, like utopias (societies built according to some blueprint of what “makes sense”), even nationalities. When these ideas and crisp constructs inhabit our minds, we privilege them over other less elegant objects, those with messier and less tractable structures (an idea that I will elaborate progressively throughout this book). Platonicity is what makes us think that we understand more than we actually do. But this does not happen everywhere. I am not saying that Platonic forms don’t exist. Models and constructions, these intellectual maps of reality, are not always wrong; they are wrong only in some specific applications. The difficulty is that a) you do not know beforehand (only after the fact) where the map will be wrong, and b) the mistakes can lead to severe consequences. These models are like potentially helpful medicines that carry random but very severe side effects. The Platonic fold is the explosive boundary where the Platonic mind-set enters in contact with messy reality, where the gap between what you know and what you think you know becomes dangerously wide. It is here that the Black Swan is produced.”

    Taleb, Nassim Nicholas (2010-05-04). The Black Swan: Second Edition: The Impact of the Highly Improbable Fragility” (Kindle Locations 440-450). Random House, Inc.. Kindle Edition.

    I can also quote Harry Collins: “Computers and their software, are two things. As collections of interacting cogs they must be ‘checked’ to make sure there are no missing teeth and the wheels spin together nicely. Machines are also ‘social prostheses’, fitting into social life where a human once fitted. It is a characteristic of medical prostheses, like replacement hearts, that they do not do exactly the same job as the thing they replace; the surrounding body compensates. Contemporary computers cannot do just the same thing as humans either because they do not fit into society like humans do… Therefore, the surrounding society must compensate for the way the computer fails to reproduce what it replaces. To know whether a machine is working satisfactorily, then, is a complex judgment concerning whether it fits well enough for the surrounding humans to happily ‘repair’ the differences; this is much more than a matter of deciding whether the cogs spin right.” (Personal correspondence that formed the basis of the talk that Prof. Collins prepared for EuroSTAR 2013.)

    Neither of these is appeal to authority, by the way. Both of these men have written more succinctly than I could, but my world view and my experience are in accordance with what I’ve quoted here.

    But you can do this to prove me wrong: write a program that does all the testing deemed necessary by a client, and then prepares a complete test report. Until you can do that, I don’t think we should be corresponding. Our world views are too separate to bridge the gap without merely flinging rhetoric at each other.

    Cheers,
    Mario Gonzalez

  2. Good post. Totally agree…

    Meanwhile…although I don’t think it was the main point, this post got me thinking about “counting”.

    Usually, a thing can be described in different ways. As I see it, quantity (the amount of things derived by counting) is just another way of describing things. Therefore, I don’t think “counting things” is bad. “Counting things” is just one way of describing things. But, I do think that only “counting things” is bad. Similarly, I think that describing things in only one way is bad.

    Describing things in only one way is bad. Describing things in multiple ways is better.

    Describing things only by counting them is bad. Describing things by counting them and other ways is better.

    Describing wagons only by counting them is bad. Describing wagons by counting them and (for example) their cargo is better.

    Describing “test cases” only by counting them is bad. Describing “test cases” by counting them and other ways is better.

    Michael replies: I’d say that believing that you understand something solely because somebody counted something is bad. Fostering such beliefs is very bad.

  3. >> So counting boxcars is “train management”?

    Nope, and I think that is where your misunderstanding begins. Counting boxcars is part of train management, just like counting test cases is part of test management, just like Damian Synadinos above beautifully explains.

    >> It contains the kinds of oversimplifications that would cause me to send any tester who worked for me off for retraining.

    Well, yes. Mathematical models are indeed meant to simplify difficult concepts which is why they are used successfully time and again in many areas of science: physics, biology, chemistry, astronomy, et cetera.

    Michael replies: Yes; and they can oversimplify too. Our ongoing financial problems stem from precisely the kind of oversimplification I’m talking about: the trust in mathematical financial models that flew in the face of common sense. Specifically: lend too much money to people who can’t afford the payments, and you’ll hose not only yourself but your customers too—no matter how swell your mathematical models.

    How interesting that the Wikipedia entry on Kripke Structures is sub-titled model-checking. It’s nice to see that apparently some members of your community understand the distinction between testing and checking.

    Unfortunately, the natural reaction of people who do not understand the models (or mathematics, for that matter) is to think they are gibberish. This is not unexpected.

    There is a whole community of well-informed scientists who know that test cases are well-defined and subsume exploratory testing. Look up Kripke structures and Abstract Interpretation and you will find a good source of knowledge there. However, it is heavy in math, which you might not understand.

    I understand the mathematics of state transition diagrams. I also understand some things that you appear to have missed, “[A Kripke Structure] is a simple abstract machine (a mathematical object) to capture the idea of a computing machine, without adding unnecessary complexities.” (emphasis mine) I agree that those complexities are largely unnecessary when you’re preparing a mathematical proof for a program; when you’re checking it. But as Knuth himself said, “Beware of bugs in the above code; I have only proved it correct, not tried it.” (Interestingly, when I did a fast lookup of the quote (on a site that seems no longer to exist, alas —MAB 2017-12-06) the version I found was tagged with the words “mistakes”, “caution”, and “ethics”.)

    A program is not merely a set of instructions for a computer; that’s the code part. As Cem Kaner puts it, the program is also a communication between various people. Those people include the programmers, the users, the designers, the business, the business’ customers, and so on. Part of the expression of that communication is in code. But the code is not all there is to it, nor is functional correctness the only criterion that people will use to evaluate the quality of the program, nor will something that’s functionally incorrect necessarily pose a serious threat to value. People will also value or discount programs based on reliability, usability, scalability, performance, installability, compatibility, supportability, maintainability, localization and localizability, and design. That is, a program can be functionally correct, and yet people may still hate it when some other quality criteria are threatened.

    Also, look up Model-Based testing and you will find what you want to believe should never exist. Hint: Yuri Gurevich is the central figure here.

    I don’t believe that model-based testing should never exist. I believe that we should do it intensively in certain contexts and to find certain problems. I’d suggest you look up Harry Robinson, who is a wonderful proponent and expert practitioner of model-based testing, and a man that I greatly respect. One reason that I respect him is that he seems to have the epistemic modesty for MBT that you appear to be missing.

    Cheers,
    Mario Gonzalez Macedo
    Ortask founder and chief engineer
    http://www.ortask.com

    So at this point, I have to ask: have you even read my previous replies to you? Just up above? The quote from Taleb? The quote from Collins? One last time: there is power in mathematical models. There is utility in checking programs against those models. We should reject neither those models nor the checking of software against them out of hand. We should, however, reject the idea that software is merely a set of instructions for a computer, and subject only to formal proofs or checks against a model. We should reject the idea that a well-checked program is necessarily a well-tested program.

    So, have you released your product yet? Ah, no… I see it’s in beta. So now there’s something I don’t understand: if it works as you seem to claim that it will, and a program could be tested entirely by products like the one you’ve written, why would you need other people involved in testing it? Why not just run the program on itself, and release it today? You’re not really going to depend on feedback from people, are you?

  4. People use this term, “test case”. They use the term “test case” to refer to an idea. One major problem is that many people have little agreement about the idea, itself. Person A may think the idea means “this”, while person B thinks the idea means “that”. If so, then who cares if person A and B happen to use the same term to describe their different ideas?

    Anyway, to proceed, I’ll assume that there *is* a common idea. To proceed, I’ll describe the idea using a phrase from the blog post. The idea is “a question that someone would like to ask (and presumably answer) about a program”. Let’s refer to this idea using the term “test case”. Now, every time I hear “test case”, I’ll mentally replace it with “a question that someone would like to ask (and presumably answer) about a program”. Ok.

    Unfortunately, now we run into another problem: Not all “test cases” (whoops!)…not all “questions that someone would like to ask (and presumably answer) about a program” are the same. Some are big, some are small. Some are complex, some are simple. Some are bumpy, some are smooth.

    So, how much information can I actually get by simply (and ONLY – my original comment) describing these dissimilar “questions that someone would like to ask (and presumably answer) about a program” by their texture? Some, but not much. It isn’t very helpful to ONLY know that “some are bumpy and some are smooth”.

    Similarly, how much information can I actually get by simply (and ONLY) describing these dissimilar “questions that someone would like to ask (and presumably answer) about a program” by their quantity? Some, but not much. It isn’t very helpful to ONLY know that “there are 37”.

    However, I could describe the “questions that someone would like to ask (and presumably answer) about a program” by texture AND quantity to get more info. It is more helpful to know that “there are 37 – 35 are bumpy and 2 are smooth”.

    And, if I keep adding ways to describe these “questions that someone would like to ask (and presumably answer) about a program”, I keep getting more info.

    Of course, as Michael points out, “believing that you understand something solely because somebody counted something is bad”. While I think we’re basically saying the same thing, I agree. Believing that simply (and ONLY) describing these dissimilar “questions that someone would like to ask (and presumably answer) about a program” by their quantity will help you understand something about them is bad.

    For example…be careful trying to understand “how much more load they can put on the train” or “how much energy the train needs” or “how long it might take” of boxcars simply (and ONLY) by counting the boxcars.

  5. Whoops…I forgot this…

    I began my post by making an assumption. I needed to make the assumption so I could proceed with the rest of the comment. The assumption was “there is a common idea about test cases.”

    Of course, this is not true.

    So, while I still believe that “Describing things in only one way is bad. Describing things in multiple ways is better”, until there is agreement about what exactly a “test case” is, it doesn’t really matter how you describe them (whatever “them” is). I don’t want to “foster [such] bad beliefs”.

  6. >> I see it’s in beta. So now there’s something I don’t understand

    Feel free to join the beta to understand what it’s about. I have reserved a spot just for you.

    Michael replies: Boy, it would sure be swell if you provided answers to my questions. I’ve spent an awful lot of time and patience answering yours.

  7. Hello,

    Sub: Demerits of Model-Based Testing.

    I am Shravan, a reader of your free and valuable writing.
    It is amazing to be able to read this material without paying a dime. I would like to comment on a few demerits of model-based testing. Models are costly and time-consuming to build to represent the application under test. Test results might vary between the integration-level and system-level tests of the same app. It is hard to express complex use case dependencies where bugs can be.

    Of course, some expressive capabilities can be understood with model-based testing, like concurrency, deadlock, etc. State charts can be used to represent the application under test for certain specific situations.

    Thanks and Regards,
    Shravan.

  8. Hello,

    Sub: Demerits of Model-Based Testing.

    I am Shravan. It is amazing to be able to read this material without paying a dime. I would like to comment on a few demerits of model-based testing. Models are costly and time-consuming to build to represent the application under test. Test results might vary between the integration-level and system-level tests of the same app. It is hard to express complex use case dependencies where bugs might be present.

    Of course, some expressive capabilities can be understood with model-based testing, like concurrency, deadlock, etc. State charts and extended Petri nets can be used to represent the application under test for certain specific components.

    Thanks and Regards,
    Shravan.

