Pairwise Testing

I wrote a paper on pairwise testing in 2004 (and earlier), and now, in 2007, it’s time for an update. This post is an edited version of an appendix that I’ve recently added to that paper.

First, there appears to be great confusion in the world between orthogonal arrays and pairwise testing. People use the terms interchangeably, but there is a clear and significant difference. I’m fairly proud of the fact that I note that difference in my article albeit in some painful and not-very-interesting-to-most-people detail, and I think I get it right. If we’re going to talk about these things we might as well get them right, so if I’m wrong, I urge you to disabuse me.
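For readers who want that difference in concrete terms, here is a minimal Python sketch. It assumes the usual definitions: a pairwise (all-pairs) test set contains every pair of values for every pair of variables at least once, while a strength-2 orthogonal array contains every such pair the same number of times. The function names and the three-variable example are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter
from itertools import combinations, product


def pair_counts(rows, i, j):
    """Count how often each (value, value) pair appears in columns i and j."""
    return Counter((row[i], row[j]) for row in rows)


def covers_all_pairs(rows, domains):
    """Pairwise criterion: every value pair appears AT LEAST once."""
    for i, j in combinations(range(len(domains)), 2):
        counts = pair_counts(rows, i, j)
        if any(counts[p] == 0 for p in product(domains[i], domains[j])):
            return False
    return True


def is_orthogonal_array(rows, domains):
    """Strength-2 orthogonal array: within each pair of columns,
    every value pair appears the SAME (nonzero) number of times."""
    for i, j in combinations(range(len(domains)), 2):
        counts = pair_counts(rows, i, j)
        occurrences = {counts[p] for p in product(domains[i], domains[j])}
        if len(occurrences) != 1 or 0 in occurrences:
            return False
    return True


# Hypothetical example: three two-valued variables.
domains = [(0, 1), (0, 1), (0, 1)]

# Four rows in which each pair of columns shows 00, 01, 10, 11 exactly once.
balanced = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# One extra row keeps every pair covered but breaks the balance.
padded = balanced + [(1, 1, 1)]
```

Under these assumptions, `balanced` satisfies both checks, while `padded` still satisfies `covers_all_pairs` but fails `is_orthogonal_array`. Every orthogonal array of this kind is a pairwise test set, but not every pairwise test set is an orthogonal array.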

Second, I’m no longer convinced of the virtues of either orthogonal arrays or pairwise testing, at least not in the pat and surfacey way that I talked about them in the original version of the article.

An on-the-job experience provided a tremor. The project was already a year behind schedule (for an 18-month project), and in dire shape. Pretty much everyone knew it, so the goal became plausible deniability or, less formally, ass-covering. One of the senior project managers looked over my carefully constructed pairwise table, and said “Hey… this looks really good. This will impress the auditors.” He didn’t have other questions, and he seemed not to be concerned about the state of the project. Impressing the auditors was all that mattered.

This gave me pause, because it suddenly felt as though my work was being used to help fool people. I wondered if I was fooling myself too. Until that moment, I had taken some pride in the work that I had been doing. Figuring out the variables to be tested had taken a long time, and preparing the tables had taken quite a while too. Was the value of my work matching the cost? I suddenly realized that I hadn’t interacted with the product at all. When I finally got around to it, I discovered that the number, scope, and severity of problems in the product were such that the pairwise tests were, for that time, not only unnecessary but a serious waste of time. The product simply wasn’t stable enough to use them. Perhaps much later, after those problems had been fixed, and after I had learned a lot more about the product, I could have done a far better job of creating pairwise tables, but by then I might have found that pairwise tables wouldn’t have shed light on the things that mattered to the project. At that point I should have been operating and observing the product, rather than planning to test a product that desperately needed testing right away.

My test manager, for whom I have great respect, disappeared from that project due to differences with the project managers, and I was encouraged to disappear a week or two later. The project had been scheduled to deploy about six weeks after that. It didn’t. It eventually got released four months later, was pulled from production, and then re-released about six months after that.

A year or so later, there was an earthquake in the form of this paper by James Bach and Pat Schroeder. If you want to understand a much more nuanced and coherent story about pairwise testing than the one that I prepared in 2004, look there.

Pairwise testing is very seductive. It provides us with a plausible story to tell about one form of test coverage, it’s dressed up in fancy mathematical clothing, and it looks like it might reduce our workload. Does it provide the kind of coverage that’s most important to the project? Is reducing the number of tests we run a goal, or is it a form of goal displacement? Might we be fooling ourselves? Maybe, or maybe not, but I think we should ask. I should have.

8 replies to “Pairwise Testing”

  1. I think the real damage here was the manager’s attitude, not your work.

    It reminds me of my days as a session musician. I’d cut a track for the artist and never see them again. They could have edited all my parts, re-cut them with another drummer, or released the album with stickers proclaiming my awesome work (never happened). You can only be so responsible for how people decide to spin your stuff.

    It appears that an epiphany occurred that caused you to re-analyze the whole shebang. It struck a chord with your internal value system literally. Remember your post on “Emotions in Testing”? Maybe you should add this experience to your presentation. It would be interesting (and beneficial) for us to know how one deals with circumstances where we feel culpable even if we’re not.

  2. Hi, Zack…

    Lately I’ve been noticing a serious vulnerability that tends to happen when we use “the”. “The” can lead us to logical fallacies involving sole explanations and single paths of causation and so on.

    There’s no question that project management’s approach in this case was pretty pathological, and I know that managers are generally responsible for the quality of work that I’m directed to do. But I’m partially responsible too. One pattern on the job is for me to fall into routine, ponderous, and heavyweight behaviour when I would likely be much more effective working more rapidly.

    Thanks for the suggestion on using this with the emotions and oracles. I’m not so sure that this post fits with that, but your response suggests that it fits with something. 🙂

    —Michael B.

  3. Michael,

    Certainly not wanting to clog up your blog ( hey that rhymes! ).

    You may recall earlier this year that I fired off the e-flare under the heading “Dealing with Tester Regret”. I believe it was during this time you sent me the link to your presentation for “Emotions in Testing”. You were utilizing them as a heuristic, an oracle, etc. (Pardon the dismissive tone of that last sentence – it is a good work and worthy of reflection.)

    Your circumstance sounds painfully familiar to the problem I was having. While my catalyst was internally motivated – as opposed to externally/managerially – something south of my emotions was engaged. It was my spirit, or what some would call ‘conscience’.

    You were used with others to help me navigate those murky waters which had until then involved much penitence and self-mutilation. I responded to this tumult with post #3934 on the Context Driven Software Testing Forum. I was hoping to return the favor.

    As far as the “the” is concerned: consider “the” in this context as being used by an INFP. It is a soft “the” meant to soothe and support; not blame or diagnose.

    Zach…

  4. All-pairs is a coverage criterion, like all-triples, orthogonal arrays, all branches, all-fixed-bugs-retested, etc.

    Set aside questions of whether it’s good for impressing the auditors. I think it’s good for at least three other purposes:

    (1) Sometimes, there is a genuine (or perceived) risk of interaction among variables. An all-pairs tool gives you a simply-structured vehicle for designing tests that explore that risk. My primary concern about using all-pairs in this case (or any form of combination testing that isn’t carefully thought out) is that people often go through the motions of entering the variables’ values into the input fields but they don’t then continue with the program to see how it uses those values, checking the decisions or calculations that might actually be affected by the combination. (Similarly for combination tests used for configuration testing, you have to use the configured system enough to find out if there is a problem.) I think all-pairs is at least as good a start for the design as any other combination heuristic. Note that I am suggesting that all-pairs is a vehicle for EXPLORING the risk. Once you have a better idea of what variables interact and how, other tests become more interesting.

    (2) Sometimes, there is no reason to know in advance whether there are interactions, but you have to do some budget negotiations. The all-pairs criterion is a way of stating a scope of preliminary combination testing. If you find bugs with these tests, you continue with more tests that are better tailored to the risks you now understand. On the other hand, if you don’t find bugs with these tests, you stop at the agreed stopping point, having spent the expected cost.

    (3) All-pairs provides a structure for considering what variables are worth combining and what values of those variables are worth combining. Not everyone needs that structure, and this structure doesn’t work for every test designer, but it’s a tool in the belt.

    One way to think about this is to distinguish between three types of combination test design: (a) mechanical combination test design (all-pairs is one example; random combinations put together algorithmically with a random number generator are another); (b) scenario-based design, in which you consider how the product is likely to be used or installed; and (c) risk-based design, in which you consider specifically how these combinations might create a failure. Mechanically-oriented designs are not optimized for real-life emulation or risk. They are just mechanical.

    A second distinction is between using combination testing in an exploratory way or a scripted-regression way. I’ve described using all-pairs in an exploratory way at the system level.

    At the subsystem level, I’d be more likely to use all-pairs in a regression suite. Consider protocol testing: testing communications between two applications (probably two that are authored and maintained separately). The protocol specifies how they communicate. Over time, one program might be revised in accordance with an upgraded protocol (or just be revised incorrectly), causing bad interoperation with the other application. A regression suite of test messages fits here, because the risk over time is very general: a well-formed message (which combines values of many variables) will be misunderstood, or a badly-formed-under-this-protocol message will be accepted. As with automated unit tests, I would expect these types of tests to be cheap to create and run and useful over time as refactoring aids, more than as system test aids.

  5. Interesting. When I read your excellent presentation on Emotions in Testing, I thought of Pairwise Testing, and that it probably feels a lot better than it actually is.
    It feels as if you’re coping with the problem that all tests can’t be performed.
    My feeling is that educated guessing often works better, but is harder to sell…

    /Rikard

  6. Are you referring to the pair-wise test design technique (a black box domain technique involving more than one variable) or “pair” testing involving two testers?

    I feel that you might be mixing “test design technique” with “testing” – a common thing observed as in “Keyword driven testing”, “model based testing”, “action based testing” etc.

    We need to be careful about claims like “xxx technique will systematically reduce the number of test cases while providing full test coverage”.

    No technique can reduce the number of test cases; it is the tester’s model or assertion that some test cases need not be executed because they would provide the same information as the others.

