Blog Posts from December, 2012

What’s Comparable (Part 2)

Tuesday, December 4th, 2012

In the previous post, Lynn McKee recognized that, with respect to the Comparable Product oracle heuristic, “comparable” can be have several expansive interpretations, and not just one narrow one. I’ll emphasize: “comparable product”, in the context of the FEW HICCUPPS oracle heuristics, can mean any software product, any attribute of a software product, or even attributes of non-software products that we could use as a basis for comparison. (Since “comparable product” is a heuristic, it can fail us by not helping us to recognize a problem, or by fooling us into believing that there is a problem where there really isn’t one. For now, at least, I leave the failure modes for each example below as an exercise for the reader. That said…) Here are some examples of comparable products that we could use when applying this heuristic.

An alternative product. Our product is intended to help us accomplish a particular task or set of tasks. We compare the overall operation of our product to the alternative product and its behaviour, look and feel, output, workflow, and so forth. If our product is inconsistent with a product that helps people do the same thing, then we might suspect a problem in our product. This is the “Microsoft Word vs. OpenOffice” sense of “comparable product”.

A commercially competitive product. This is a special case of “alternative product”. People often hold commercial products to a higher standard than they hold freeware. If our product is inconsistent with another commercial product that is in the same market category (think “Microsoft Word vs. WordPerfect”), then we might suspect a problem in our product.

A product that’s a member of the same suite of products. Imagine being a tester on the enormous team that produces Microsoft Office. In places, Microsoft Outlook’s behaviour is inconsistent with the behaviour of Microsoft Word. We might recognize that a user could be frustrated or annoyed by inconsistencies between those products, because those products could reasonably be expected to have a consistent look and feel. I use both Word and Outlook. Sometimes I want to find a particular word or phrase in a long Outlook message that someone sent me. I press Ctrl-F. Instead of popping open the Find dialog, Outlook opens a copy of the message to be Forwarded. The appropriate key to launch a search for something in Outlook is F4, which by default is assigned to “Redo or Repeat” in Word. (Note that Joel Spolsky’s Law of Leaky Abstractions starts to take effect here. This flavour of the comparable product heuristic starts to leak into territory covered by the “user expectations” heuristic. That’s okay; some overlap between oracle heuristics helps to reduce to chance that we’ll miss a problems if one heuristic misfires. Moreover, weighing information from a variety of oracles helps us to evaluate the signficance of a given problem. There’s another leaky abstraction here too: what is a product? Given that Word is a product and Outlook is a product, is Office a product?)

Two products that are subcomponents within the same larger product. As in the Office/Outlook/Word example just above, Outlook isn’t even consistent within itself. In the (incoming) message reading window, Ctrl-F triggers the Forward Message function. In the (outgoing) message editing window, Ctrl-F does bring up the Find dialog. That’s because I have Outlook configured to use Word’s editor as Outlook’s. (There’s a leaky abstraction here too: the “consistency within the product” heuristic, where similar behaviours and states within the product should be consistent with one another. It’s good when oracles overlap!)

An existing product whose sole purpose is comparable to a specific feature in our product. A very simple product might have a purpose that is directly comparable to a purpose, feature or function in our product. A command-line tool like wc (Unix’ command-line word-count program) isn’t comparable with Microsoft Word in the large, but it can be used as a point of comparison for a specific aspect of Word’s behaviour.

An existing product that is different, yet shares some comparable feature, function, or concept. Many non-testers (and, apparently, many testers too) would consider Halo IV and Microsoft Word to be in completely different categories, yet there are similarities. Both are pieces of computer software; both process data; both exhibit behaviour; both save and restore state; both may change their appearance depending on the display settings. If either one were to crash, respond slowly, or misrepresent something on the screen, we might recognize a problem, and recognizing or conceiving of a problem in one might trigger us to consider a problem in the other.

A chain of events in some product. We might choose to build simple test automation to aid us in comparing the output of comparable functions or algorithms in two products. (For example, if we were testing OpenOffice, we might use scripting to compare OpenOffice’s result of a sin(x) function with Microsoft Excel’s API result, or we could use a call to the Web to obtain comparable output from the sin(x) function in Wolfram Alpha.) Those comparisons may become much more interesting when we chain a number of functions together. Note that if we’re not modeling accurately, coding carefully, and logging diligently, comparisons of chains of events may be harder to analyze.

A product that we develop specifically to implement a comparable algorithm. While working at a bank, I developed an Excel spreadsheet and VBA code to model the business logic for the teller workstation application I was testing. I used the use cases for the application as a specification, which allowed me to predict and test the ways in which which general ledger accounts would be affected by each supported transaction. This was a superb way to learn about the application, the business rules, and the power of Excel.

A reference output or artifact. Those who use FIT or FitNesse develop tables of functions, inputs, and outputs that the tool compares to output from integration-level functions; those tables are comparable products. If our testing mission were to examine the font output of Word, the display from a font management tool could be comparable to Word’s output. The comparable product may not even be instantiated in software or in electronic form. For example, we could compare the fonts in the output of our presentation software to the fonts in a Letraset catalog; we could compare the output from a pocket calculator to the output of our program; we could compare aggregated output from our program to a graph sketched on paper by a statistician; we could compare the data in our mailing list to the data in the postal code book. (Well, we used to do that; now it’s much easier to do it with some scripting that gets the data from the postal service.) More than once I’ve found a bug by comparing the number posted on the “Contact Us” page to the number printed on our business cards or in our marketing material. We could also compare output produced by our program today (to output produced by our program yesterday (an idea that leaks into the “consistency with history” heuristic).

A product that we don’t like. I remember this joke from somewhere in Isaac Asimov’s work: “People compare my violin playing to Jascha Heifetz. They say, ‘A Heifetz he ain’t!'” A comparable product is not always comparable in a desirable way. If someone touts a music management product to me saying “it’s just like iTunes!”, I’m not likely to use it. If people have been known to complain about a product, and our product provides the same basis for a complaint, we suspect a problem with our product. (The Law of Leaky Abstractions applies here too, leaking into the “familiar problems” heuristic, in which a product should be inconsistent with patterns of problems that we’ve seen before.)

Patterns of behaviour in a range or sphere of products. We can compare our product against our experience with technology and with entire classes of relevant or interesting products, without immediately referring to a specific product. “It’s one thing from freeware, but I didn’t expect crashes this often from a professional product.” “Well, this would be passable on a Windows system, but you know about those finicky Mac users.” “Yikes! I didn’t expect this product to make my password visible on-screen!” “Aw geez; the on-screen controls here are just as confusing as they are on a real VCR—and there are no tooltips, either.” “The success code is 1? Non-zero return codes on command-line utilities usually represent errors, don’t they?”

All of these things point to a few overarching points.

  • “Similar” and “comparable” can be interpreted narrowly or broadly. Even when products are dissimilar in important respects, even one point of similarity may be useful.

  • Products can be compared by similarity or by contrast.

  • We can make or prepare comparable products, in addition to referring to existing ones.

  • A comparable product may or may not be computer software.

  • Especially in reference to the few categories above, there is great value for a tester in knowing not only about technologies and functional aspects of products in the same product space, but also about user interface conventions, business or workplace domains, sources of background information, cultural and aesthetic characteristics, design heuristics, and all kinds of other things because…

  • If the object of the exercise is to find problems in the product quickly, it’s a good idea to have access to a requisite variety of ideas about what we might use as bases for comparison. (I describe “requisite variety” here, and Jurgen Appello describes it even better here.)

  • Bugs thrive on overly narrow or overly broad interpretations of “comparable”. Know what you’re comparing, and why the comparison matters to your testing and to your clients.

The comparable product heuristic is an oracle principle, but in describing it here, I haven’t paid much attention to mechanisms by which we might make comparisons. We’ll get to that issue next.

What’s Comparable (Part 1)

Monday, December 3rd, 2012

People interpret requirements and specifications in different ways, based on their models, and their past experiences, and their current context. When they hear or read something, many people tend to choose an interpretation that is familiar to them, which may close off their thinking about other possible interpretations. That’s not a big problem in simple, stable systems. It’s a bigger problem in software development. The problems we’re trying to solve are neither simple nor stable, and the same is true with the software that we’re developing.

The interpretation problem applies not only to software development and testing, but to the teaching of testing too. For example, in Rapid Software Testing, James Bach and I teach that an oracle is a way to recognize a problem, and one of the most important and powerful a broader set of oracle heuristics.

Here’s how the typical experiment went. We started by asking “We’re thinking of applying the comparable product oracle heuristic to a test of Microsoft Word. What product could we use for that?” Almost everyone suggested OpenOffice Writer, which seems to be the last remaining well-known full-featured word processing alternative to Microsoft Word. Some suggested WordPad, or Notepad, although almost everyone who did so suggested that WordPad (much less Notepad) wouldn’t be much use as comparable products. “Why not?” we asked. In general, the answer was that WordPad and Notepad were too simple, and didn’t reflect the complexity of Word.

Then we asked some follow-up questions. Is Word comparable with Unix’s command-line program wc? Most people said No (for some, we had to explain what wc is; it counts the words in a file that you provide as input). It was only when we asked, “What if we were testing the word count feature in Microsoft Word?” that the light began to dawn. When we asked if Word was comparable with Halo (the game), most people still said No. When we encouraged them to think more broadly about specific features of Word that we might compare with Halo, they started to get unstuck, and began to realize that while Word and Halo were dramatically different products in important respects, they were nonetheless comparable on some levels.

By contrast, here’s a conversation with Lynn McKee. The chat has been edited to de-Skypeify it (I’ve removed some typos, fixed some punctuation, and removed a couple of digressions not consequential to the conversation).

Michael: If you were asked, “We’re thinking of applying the comparable product oracle heuristic to a test of Microsoft Word. What product could we use for that?”, how would you answer?

Lynn: Hmmm. Certainly, we could use products such as “Open Office”, “Notepad” and others. Could you tell me more about what “we” are hoping to learn about the product under test to better assess which comparable products to use? Is this a brand new product? A version release? If so, what changed and what functions are we interested in comparing?

Michael: That’s a pretty good answer. A followup question: do you know the wc program, typically available under Unix?

Lynn: Sorry, I am not familiar with that product. Can you tell me more about it? How does it relate to your product? Is your product running on Unix?

Michael: Yes. wc is a command-line program. Its purpose is to count the words in a document. You supply the document as input; it returns the number of words in that document.

Lynn: While you were typing, I used my handy Google search to tell me a bit about Unix WC. Oh interesting, so are you looking to gather information about how capable, performant, etc the word count functionality is within MS Word? Can you tell me more about what functions of MS Word interest you the most? And why?

Michael: One more question: Halo IV — the game. Is that a comparable product to MS Word?

Lynn: Sheesh, I’ve only ever seen ads. Lemme think. It blows people’s brains out…sometimes I want to do that with MS Word. 😉 It would depend on what type of comparison we are hoping to draw. For example, Halo is a game and does require interaction with a user. From a UI perspective, there are menus and other forms of cause-and-effect type of interaction—that is, when I do X, I expect Y. There are also state comparisons I could draw. When I start a new game, save a game, reopen a game I have expectations about the state the game should be in. This is similar to how I may expect a document to behave with states. I may also expect certain behavior with pausing or crashing the game in terms of recovery that could be compared to MS Word. Conversely… if I am looking to compare the product’s ability to display fonts, images, format tables, etc. then I may find very low value in comparing the products. I think that you could compare any two products but you may find very different value in the comparison exercise, depending on what you hope to learn.

This is an answer that I would consider exemplary. I have related it here because it was outstanding in two ways: it was an extremely good answer, but it was also exceptional, in that most people didn’t consider wc or Halo to be even remotely comparable to Microsoft Word without a good deal of prompting. Lynn, on the other hand, recognized that “comparable” doesn’t necessarily mean “highly similar”; it can also mean “anything or any aspect of something that you might use as a basis for comparison“. She immediately questioned the question, to make sure that she understood the task at hand. She also did a bit of research on her own while I was answering the question, and asked some highly relevant questions about risks and particular concerns that I might have. Note that she’s doing important informal work—understanding the testing mission—before making too firm a commitment to what might or might not be considered “comparable” for the purposes of a particular question that we might have about the product.

I’ll have more to say about the Comparable Product heuristic tomorrow.