
Oracles from the Inside Out, Part 8: Successful Stumbling

Thursday, November 26th, 2015

When we’re building a product, despite everyone’s good intentions, we’re never really clear about what we’re building until we try to build some of it, and then study what we’ve built. Even after that, we’re never sure, so to reduce risk, we must keep studying. For economy, let’s group the processes associated with that study—review, exploration, experimentation, modelling, checking, evaluating, among many others—and call them testing. Whether we’re testing running code or testing ideas about it, testing at every step reveals problems in what we’ve built so far, and in our ideas about what we’ve built.

Clever people have the capacity to detect some problems and address them before they become bigger problems. A smart business analyst is aware of unusual exceptions in a workflow, recognizes an omission in the requirements document, and gets it corrected. An experienced designer goes over her design in her head, notices a gap in her model, and refines it. A sharp programmer, pairing with another, realizes that a function is using a data type that will overflow, and points out the problem such that it gets fixed right away.

Notice that in each one of these cases, it’s not quite right to say that the business analyst, the designer, or the programmer prevented a problem. It’s more accurate to say that a person detected a little problem and prevented it from becoming a bigger problem. Bug-stuff was there, but a savvy person stomped it while it was an egg or a nymph, before it could hatch or develop into a full-blown cockroach. In order to prevent bigger problems successfully, we have to become expert at detecting the small ones while they’re small.

Sometimes we can be clever and anticipate problems, and design our testing to shine light on them. We can build collaboration into our designs, review into our specifications, and pairing into our programming. We can set up static analysis tools that check code for inconsistency with explicit rules. When we’re dealing with running code, testing might take the form of specific procedures for a tester to follow; sometimes it takes the form of explicit conditions to observe; and sometimes it takes the form of automated checks. All of these approaches can help to find problems along the way.

It’s a fact that when we’re testing, we don’t always find the problems we set out to find. One reason might be, alas, that the problems have successfully evaded our risk ideas, our procedures, our coverage, and our oracles. But another reason might be that, thanks to people’s diligence, some problems were squashed before they had a chance to encounter our testing for them.

Conversely, some problems that we do find are ones that we didn’t anticipate. Instead, we stumble over them. “Stumbling” may sound unappealing until we consider the role that serendipity—accidental or incidental discovery—has played in every aspect of human achievement.

So here, I’m not talking about stumbling in terms of clumsiness. Instead, I’m speaking in terms of what we might find, against the odds, through a combination of diligent search, experimentation, openness to discovery, and alertness—as people have stumbled over diamonds, lost manuscripts, new continents, or penicillin. Chance favours the explorer and—as Pasteur pointed out—the prepared mind. If we don’t open our testing to problems where customers could stumble, customers will find those places.

Productive stumbling can be extended and amplified by tools. They don’t have to be fancy tools by any means, either.

Example: Stuck for a specific idea about risk, I created some tables of more-or-less randomized data in Excel, and used a Perl script to cover all of the possible values in a four-digit data field. One of those values returned an inappropriate result: one stumble over a gold nugget of a bug. Completely unexpectedly, though, I also stumbled over a sapphire. While scanning quickly through the log file, using a blink oracle, I noticed that every now and then a transaction took ten times longer than it should have, courtesy of a startling and completely unrelated bug.
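To make the idea concrete, here is a minimal Python sketch of the same two moves: covering every value of a four-digit field, then letting a tool do the "blinking" over the timings. It is not the original Perl script. The names `all_four_digit_values` and `flag_slow`, the `(value, seconds)` log format, and the "ten times the median" threshold are all assumptions chosen for illustration.

```python
import statistics


def all_four_digit_values():
    """Every value a four-digit numeric field can hold, zero-padded."""
    return [f"{n:04d}" for n in range(10_000)]


def flag_slow(timings, factor=10.0):
    """Return the (value, seconds) pairs that took at least `factor`
    times the median duration.

    `timings` is a hypothetical log: a list of (value, seconds) pairs.
    """
    median = statistics.median([seconds for _, seconds in timings])
    return [(value, seconds) for value, seconds in timings
            if seconds >= factor * median]
```

A filter like this only surfaces candidates; a person still has to look at the flagged transactions and decide whether they point to a problem.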

Example: At a client site, I had a suspicion that a test script contained an unreasonable amount of duplication. I opened the file in a text editor, selected the first line in a data structure, hit the Ctrl-F key, and kept hitting it. I applied a blink oracle again: most of the text didn’t change at all; tiny patches, representing a handful of variables, flickered. Within a few seconds I had discovered that the script wasn’t really doing anything significant except trying the same thing with different numbers. More importantly, I discovered that the tester needed real help in learning how to create flexible, powerful, and maintainable test code.

Example: I wrote a program as a testing exercise for our Rapid Software Testing class. A colleague used James Bach’s PerlClip tool to discover the limit on the amount of data that the program would accept. From this, he realized that he could determine precisely the maximum numeric value supported by the program, something that I, the programmer, had never considered. (When you’re in the building mindset, there’s a lot that you don’t consider.)
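One way a tool can help find an input limit is a counterstring, a kind of self-describing test string that PerlClip can generate: each number in the string gives the position of the marker character that follows it. Whether my colleague used exactly this feature is an assumption here, and the Python function below is a sketch written for this post, not PerlClip's own code.

```python
def counterstring(length, marker="*"):
    """Build a string of exactly `length` characters in which every
    number states the 1-based position of the marker that follows it."""
    parts = []
    pos = length
    while pos > 0:
        token = f"{pos}{marker}"
        if len(token) > pos:
            # Not enough room left at the start: keep only the tail.
            token = token[-pos:]
        parts.append(token)
        pos -= len(token)
    # Tokens were built from the end backwards, so reverse them.
    return "".join(reversed(parts))


print(counterstring(10))  # prints "*3*5*7*10*"
```

If a field silently truncates a pasted counterstring, the last complete number-and-marker pair still visible tells you roughly how many characters were accepted, without any counting by hand.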

Example: Another colleague, testing the same program, used Excel to generate all of the possible values for one of the input fields. From this he determined that the program was interpreting input strings in ways that, once again, I had never considered. Just this test and the previous one revealed information that exploded my five-line description of the program into fifteen far more detailed lines, laden with surprises and exceptions. One of these lines represents a dangerous and subtle gotcha in the programming language’s standard libraries. All this learning came from a program that is, at its core, only two lines of code! What might we learn about a program that’s two million lines of code?

Example: In this series of posts on oracles, I’ve already recounted the tale of how James took data from hundreds of test runs, and used Excel’s conditional formatting feature to visualize the logged results. The visualizations instantly highlighted patterns that raised questions about the behaviour of a product, questions that fed back into refinements of the requirements and design decisions.

Example: While developing a tool to simulate multi-step transactions in a banking application, I discovered that the order in which the steps were performed had a significant impact on the bank’s profit on the overall transaction. This is only one instance of a pattern I’ve seen over and over again: while developing the infrastructure to perform checking, I stumble over bug after bug in the application to be tested. Subsequently, after the bugs are fixed and the product is stabilized and carefully maintained, the checks—despite their value as change detectors—don’t reveal bugs. Most of the value of the checks gets cashed in the testing activity that produces them.
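A small Python sketch shows how a tool can expose that kind of order dependence. Everything in it is hypothetical: the three steps, the amounts in cents, and the integer rounding rule are invented for illustration and have nothing to do with the real banking application.

```python
from itertools import permutations

# Hypothetical steps; each takes a balance in cents and returns a new one.
STEPS = {
    "deposit": lambda cents: cents + 10_000,
    "fee": lambda cents: cents - 500,
    "interest": lambda cents: cents * 101 // 100,  # 1%, rounded down
}


def outcomes(start_cents):
    """Run the steps in every possible order; map each order to its result."""
    results = {}
    for order in permutations(STEPS):
        balance = start_cents
        for name in order:
            balance = STEPS[name](balance)
        results[order] = balance
    return results


for order, balance in sorted(outcomes(50_000).items(), key=lambda kv: kv[1]):
    print(" -> ".join(order), balance)
```

Even with three trivial steps, the six orderings produce four different final balances. The tool does not say which one is right; it raises the question of which one the business intended.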

Example: James performed 3000 identical queries on eBay, one query every two or three seconds. He expected random variation over time (i.e. a “drunkard’s walk”). Instead, a visualization of the results allowed him to see suspicious repeating jumps and drops that looked anything but random. Analysis determined that he was probably seeing the effects of many servers responding to his query, some of which occasionally failed to contribute results before timing out.

These examples show how we can use tools powerfully: to generate data sets and increase coverage, so that we can bring specific conditions to our attention; to amplify signals amidst the noise; to highlight subtle patterns and make them clearly visible; to afford observation of things that we never expected to see; to perturb or stress the system such that rare or hidden problems become perceptible.

The traditional view of an oracle is an ostensibly “correct” reference that we can compare to the output from the program. A common view of test automation is using a tool to act like a robotic and unimaginative user to produce output to be checked against a reference oracle. A pervasive view of testing is that it is nothing more than simple output checking, focused on getting right answers and ignoring the value of raising important new questions. In Rapid Software Testing, we think this is too narrow and limiting a view of oracles, of automation, and of testing itself. Testing is exploring a product and experimenting with it, so that we can learn about it, discover surprising things, and help our clients evaluate whether the product they’ve got is the product they want. Automated checking is only one way in which we can use tools to aid in our exploration, and to shine light on the product. Excellent automated checking, in turn, depends on exploratory work to help us decide what might be interesting to check and to help us to refine our oracles. An oracle is any means (a feeling, principle, person, mechanism, or artifact) by which we might recognize a problem that we encounter during testing. And oracles have another role to play, which I’ll talk about in the last post in this long series.