Blog Posts from December, 2011

Why Checking Is Not Enough

Tuesday, December 27th, 2011

Here is a specific, real-world example of testing where the focus doesn’t include explicit checking, and does not result in yes-or-no answers to predetermined questions.

This morning, I acted on a piece of email I received several days ago, offering a free upgrade to a PDF conversion package which I’ll call “PDFThing”. I’ll walk you through what happened, and parts of my thought process as it happened.

Since the email is addressed to me, and since it notes that I had purchased upgrade insurance, I presume that the company has all of the data needed to know which product was associated with that email address.

The mail message includes this text: “It’ll only take a few minutes, but we’ll need your serial number (also known as a license key) to deliver your upgade. [How do I find my serial number?]” (The text in square brackets was a link.)

I note that “upgrade” is misspelled. Spelling can be checked, and a spelling checker would have found that problem, but checks can’t guarantee correct spelling. Do you doubt this? What if I had said that Czechs can’t guarantee correct spelling? Or that cheques can’t guarantee correct spelling? You would have noticed right away, but a spelling checker would not have.

I see a more serious potential problem, though. If the company has data about me, why not provide helpful serial number information directly and immediately? Options include “Your serial number is…” or if they’re worried about someone intercepting the mail, “The serial number can be found in your version of PDFThing by…”, in a way specific to the version that is associated with that email address.

I click on the link. It takes me to an FAQ page that has a list of questions. Conveniently, the question titled “Where do I find my serial number for PDFThing?” already shows an answer:

“It depends on what version of PDFThing you have and where you bought it. If you bought PDFThing 7 from our website, your serial number (in alpha-numeric characters) can be found under the Help tab in the About section.”

I have PDFThing 6, though. And I purchased it at a store. So I apply an oracle: consistency with an implicit purpose. An implicit purpose of this answer is to convey information to users of *any* version of PDFThing, purchased *anywhere*. The answer doesn’t do that. Is this a problem? I don’t know if the product owner will consider this a problem at all, or a problem worth fixing, so I can’t provide a yes-or-no answer. What I can do as a tester is to note a possible problem, and move on.

I decide to open my existing version PDFThing, and I apply another set of consistency heuristics: consistency with history; consistency within the product; consistency with an implicit claim; consistency within the product. Maybe the serial number for version 6 is located in the same place as the serial number for version 7. I click on Help and then About. I find that the serial number is not in the place referred to by the FAQ text, so with respect to the product I own, that text is misleading and incorrect. Plus I apply the “consistency with comparable product” heuristic; many products put the serial number in the Help/About box. All in all, this looks more like a problem. Will the product owner see it that way?

I dimly remember that I received a copy of the serial number in the e-mail that I got when I first registered the product. I go on a hunt for that e-mail. It takes me a few minutes to find it, but eventually I do. I copy the serial number to the clipboard and I returned to the original e-mail and click on upgrade to download the product. My own impatience and exasperation suggests to me that there’s a problem here. Note that although you can test for an emotional reaction, you can’t check for it. At best, you can anticipate things like delays of a certain duration as the program is executing. Measurement theorists call that a surrogate measure—using one kind of measureument as a substitute or a stand-in for the thing that we’d actually like to measure.

When the product finishes downloading, I begin the installation process. I’m prompted for the serial number for my older version, which I provide. The installer accepts the serial number and prompts me for a directory into which the new version of PDFThing should be installed. I notice that the product is being installed into a folder that is specific to version 7. I suspect that the prior version of PDFThing is not being uninstalled, so rather than accepting the default directory, I browse upward. I find that indeed the new product is not replacing the old product. Problem or no problem? For a check to determine this, the decision rule would have to have been decided and programmed in advance.

I go to Add/Remove programs, and begin the uninstallation process for PDFThing 6. The uninstaller posts a dialog saying, “The following applications should be closed before continuing the install”, and in the window beneath, I see a reference to the title of an email message I’m drafting in Outlook. That makes sense; PDFThing 6 installs a toolbar in Outlook so that I can print PDFs directly from the program. Still, there’s no reference to Outlook itself, though. So is that the message that the designer of the program wants me to see?

I close the offending message window and save the message as a draft. I return to the uninstallation dialog, press Retry, and the uninstallation proceeds. It appears to make some progress. I switch to another window and continue working while uninstallation continues in background. After a brief interval, it posts the same dialog as before, but this time tells me that Microsoft Word should be closed before continuing the install. Is this the behaviour that the designer wanted?

I now wonder what would have happened had I not chosen to uninstall PDFThing 6. Would Outlook and Word have acquired a second set of toolbars for PDFThing 7? Would they be separate? Would the new one have replaced the old one? I could have perhaps have programmed checks for that, but would it have been worthwhile to do that? Wouldn’t eyeballing it be cheaper and faster? Maybe not; maybe there are bunches of registry entries and files and configuration settings and stuff connected to Outlook (and Word, and Explorer, and PowerPoint, and Excel) such that we’d really need a program to help us probe that. Would a check have wondered and raised that issue to programmers or designers?

The installation process continues. In the middle, a browser window appears, asking me why I’m uninstalling PDFThing 6. The options are “I don’t want to purchase or continue with the trial”; “I purchased the product and am uninstalling the trial”; “I’m upgrading to the latest version”; “I’m about to move my PDFThing 6 license to another computer.” It seems to me that the third option would be unnecessary if PDFThing 7’s installation program automatically removed PDFThing 6. So is this the uninstallation process that the product owner wants?

I answer the question (I’m upgrading), and the Web page offers a thank you for answering the question. In the interim, the uninstallation process seems to have terminated. Was it successful? I don’t know. Did the designers intend that uninstallation should end immediately? And what if I hadn’t had an active Internet connection; what would have happened then? Would checks raise these questions? Perhaps the development of checks might have, but the checks themselves would not have.

I return to the installer for PDFThing 7, and start it up again. Oddly, I’m not asked for my serial number this time. Has the product retained it from the last attempt? I don’t know. How would I find out?

The installation process carries on for a while, and at the end, I’m presented with a dialog that asks whether I want to buy the product or activate it. I choose the latter; I’ve already bought it. The activation window asks for the serial number. I provide it, and immediately I’m presented with this error dialog (which I haven’t altered):

Note that the dialog is titled “Information”; the name of the product isn’t provided. Look as well at the formatting of the message; it looks sloppy and unprofessional to me. Oh, and dammit, it IS the right serial number (it was accepted last time). Is this what the product owner wants?

I dismiss the dialog, and the activation dialog has a moving graphic indicating that the product is waiting from something. Otherwise, the product seems hung. Just in case, though, I click on the Activate button again. The “Information” dialog above appears again. There’s no choice apparent except to dismiss the two dialogs and get on the line to technical support, whereupon the costs will really begin to rack up.

Now you could say that, if I were a tester working for the PDFThing people, I should have received all of this information before beginning test execution, whereupon I should have prepared checks to be applied against the product. It’s a fine idea. But even when we’re working on the best imaginable teams in the best-managed projects, as soon as we begin to test test, we begin immediately to discover things that no one—neither testers, designers, programmers, nor product owner—had anticipated or considered before testing revealed them. It’s simply fatuous to suggest that everyone involved in the development of the product knows exactly what they will want or need from the outset. It’s even more fatuous to suggest that they should know such a thing. Software development is not simply construction according to prescribed plans. It is development. Like testing itself, it is a process of exploration, discovery, investigation, and learning.

It’s important not to confuse checks with oracles. An oracle is a principle or mechanism by which we recognize a problem. A check is a mechanism, an observation linked to a decision rule. That rule is based on a single application of a single principle. A check provides a signal, a bit, when the product’s behaviour or state is inconsistent with that principle. A check follows a rule; it does not apply a heuristic. Testing, which may include many checks, is not so restricted. Testing may produce a yes-or-no answer, but it may also produce an observation, a question, a concern, a dilemma, a new test idea, or a new check idea. Testing is not governed by rules; it is governed by heuristics that, to be applied appropriately, require sapient awareness and judgement.

Checking is an approach to making sure that we get the right answers, for questions and desired answers that we’ve already determined in advance. A passing check doesn’t tell us that the product is acceptable. At best, a check that doesn’t pass suggests that there is a problem in the product that might make it unacceptable.

Testing incorporates checking, but is a far richer set of activities: exposing ourselves to the unexpected, making new observations, spotting unanticipated problems, and raising new questions. Yet not even testing is about telling people that the product is acceptable. On the one hand, testing may have a different purpose. Cem Kaner, in the BBST course, lists

  • Finding defects
  • Maximizing bug count
  • Blocking premature product releases
  • Helping managers make ship / no-ship decisions
  • Minimizing technical support costs
  • Assessing conformance to specification
  • Assessing conformance to regulations
  • Minimizing safety-related lawsuit risk
  • Finding safe scenarios for use of the product (workarounds that make the product potentially tolerable, in spite of the bugs)
  • Assessing quality
  • Verifying the correctness of the product

To which I would add

  • assessing compatibility with other products or systems
  • assessing readiness for internal deployment
  • ensuring that that which used to work still works
  • design-oriented testing, such as review or test-driven development
  • understanding the workings of a poorly-documented product or library
  • evaluating the usefulness of a new tool or service
  • refining notions of risk

On the other hand, much of the time, we’re testing to help determine whether a product is acceptable for release. But decisions about acceptability are in the hands of managers, programmers, designers; those who build the product (and ultimately, acceptability is the decision of the product owner). Testing is about investigating the product to reveal knowledge that informs the acceptability decision. Sometimes that information comes in the form of binary answers to known questions; checks. Sometimes that information comes in the form of discoveries that pose new ideas, new risks, and new questions for those who are responsible for building and releasing the product.

Scripts or No Scripts, Managers Might Have to Manage

Wednesday, December 21st, 2011

A fellow named Oren Reshef writes in response to my post on Worthwhile Documentation.

Let me be the devil’s advocate for a post.

Not having fully detailed test steps may lead to insufficient data in bug reports.

Yup, that could be a risk (although having fully detailed steps in a test script might also lead to insufficient data in bug reports; and insufficient to whom, exactly?).

So what do you do with a problem like that? You manage it. You train the tester, reminding her of the heuristic that each problem report needs a problem description; an example of something that shows the problem; and why she thinks it’s a problem (that is, the oracle; the principle or mechanism by which the tester recognizes the problem). Problem, example, and why; PEW. You praise and reward the tester for producing reports that follow the PEW heuristic; you critique reports that don’t have them. You show the tester lots of examples of bug reports, and ask her to differentiate between the good ones and the bad ones, why each one might be consider good or bad, and in what ways. If the tester isn’t getting it, you have the tester work with and be coached by someone who does get it. The coach talks the tester through the process of identifying a problem, deciding why it’s a problem, and outlining the necessary information. Sometimes it’s steps and specific data; sometimes the steps are obvious and it’s only the data you need to specify; sometimes the problem happens with any old data, and it’s the steps that are important. And sometimes the description of the problem contains enough information that you need supply neither steps nor data. As a tester under time pressure, she needs to develop the skill to do this rapidly and well—or, if nothing works, she might have to find a job for which she is better suited.

You can argue that a good tester should include the needed information and steps in her bug report, but this raise (at least) two problems:

– The same information may be duplicated across many bugs, and even worst it will not be consistent.

As a manager, I can not only argue that a tester should include the needed information; I can require that a tester include the needed information. Come on, Mr. Advocate… this is a problem that a capable tester and a capable test manager (and presumably your client) can solve. If “the same” information is duplicated across many bugs, might that be an interesting factor worth noting? A test result, if you will? Will this actually persist for long without the test manager (or test leads, or the test team) noticing or managing it?

And in any case, would a script solve the problem that you post above? If you can solve that problem in a script, can you solve it in a (set of) bug report(s)?

Writing test steps is not as trivial as it sounds (for example due to cognitive biases, or simply by overlooking steps that seems obvious to you), and to be efficient they also need to be peer reviewed and tested. You don’t want that to happen in a bug report.

“Writing test steps is not as trivial as it sounds.” I know. It’s non-trivial in terms of time, and it’s non-trivial in terms of skill, and it’s non-trivial in terms of cost. That’s why I write about those problems. That’s why James Bach writes about them.

Again: how do you solve problems like testers providing inefficient repro steps? You solve it with training, practice, coaching, review, supervision, observation, interaction… that is, if you don’t like the results you’re getting, you steer the testers in the direction you want them to go, with leadership and management.

The tester may choose the same steps over and over, or steps that are easier for her but does not represent real customers.

Yes, I often hear things like this to justify poor testing. “Real customers” according to whom? It seems as though many organizations have a problem recognizing that hackers are real; that people under pressure are real; that people who make mistakes are real; that people who can become distracted are real. That people who get up and go away from the keyboard, such that a transaction times out are real.

Is it the role of testers to behave always like idealized “real” customers? That’s like saying that it’s the role of airport security to assume that all of the business class customers are “real” business people. I’d argue that it’s nice for testers to be able to act like customers, but it’s far more important for testers to act like testers. It’s the tester’s role to identify important vulnerabilities in the product. Sometimes that involves behaving like a typical customer, and sometimes it involves behaving like an atypical customer, or and sometimes it involves behaving like someone who is not a customer at all. But again, mostly it involves behaving like a tester.

Again you may argue that a good tester should take all that into account, but it’s not that simple to verify it especially for tests involving many short trivial steps.

Maybe it isn’t that simple. If that’s a problem, what about logging? What about screen capture tools? Such tools will track activities far more accurately than a script the tester allegedly followed. After all, a test script is just a rumour of how something should be done, and the claim that the script was followed is also a rumour. What about direct supervision and scrutiny? What about occasional pairing? What about reviewing the testers’ work? What about providing feedback to testers, while affording them both freedom and responsibility?

And would scripts solve that problem when (for example) you’re a recording bug that you’ve just discovered (probably after deviating from a script)? How, exactly? What happens when a problem identified by a script is fixed? Does the value of the script stay constant over time?

Detailed test steps (at least to some extent) might important if your test activity might be transferred to another offshore team someday (happened to me a few weeks ago, I sent them a test document with only high level details and hoped for the best), or your customer requires in-depth understanding of your tests (a multi-billion Canadian telecommunication company insisted on getting those from us during the late 90’s, we chose the least readable TestDirector export format and shipped it to them…).

Ah, yes. “I sent them a test document with only high level details and hoped for the best.” What can I say about “hope” as a management approach? Does a pile of test scripts impart in-depth understanding? Or are they (as I suspect) a way of responding to a question that you didn’t know how to answer, which was in fact a question that the telco didn’t know how to ask?

Going through some set of actions by rote is not a test. A test script is not a test. A test is what you think and what you do. It is a complex, cognitive activity that requires the presence or the development of much tacit knowledge. Raw data or raw instructions at best provide you with a miniscule fraction of what you need to know. If someone wanted in-depth understanding of how a retail store works, would you send them a pile of uncontextualized cash register receipts?

The Devil’s Advocate never seems to have a thoughtful manager for a client. I would suggest that a tester neither hire nor work for the devil.

Thank you for playing the devil’s advocate, Oren.

What Exploratory Testing Is Not (Part 5): Undocumented Testing

Wednesday, December 21st, 2011

This week I had the great misfortune of reading yet another article which makes the false and ridiculous claim that exploratory testing is “undocumented”. After years and years of plenty of people talking about and writing about and practicing excellent documentation as part of an exploratory testing approach, it’s depressing to see that there are still people shovelling fresh manure onto a pile that should have been carted off years ago.

Like the other approaches to test activities that have been discussed in this series (“touring“, “after-everything-else“, “tool-free“, and “quick testing“), “documented vs. undocumented” is in a category orthogonal to “exploratory vs. scripted”. True: usually scripted activities are performed by some agency following a set of instructions that has been written down somewhere. But we could choose to think of “scripted” in a slightly different and more expansive way, as “prescriptive”, or “mimeomorphic“. A scripted activity, in this sense, is one for which the actions to be performed have been established in advance, and the choices of the actions are not determined by the agency performing them. In that sense, a cook at McDonalds doesn’t read a script as he prepares your burger, but the preparation of a McDonald’s burger is a highly scripted activity.

Thus any kind of testing can be heavily documented or completely undocumented. A thoroughly documented test might be highly exploratory in nature, or it might be highly scripted.

In the Rapid Software Testing class, James Bach and I point out that when someone says “that should be documented”, what they’re really saying is “that should be documented if and how and when it serves our purposes.” So, let’s start by looking at the “when”.

When we question anything in order to evaluate it, there are moments in the process in which we might choose to record ideas or actions. I’ve broken these down into three basic categories that I hope you find helpful:

  • Before

  • During

  • After

There are “before”, “during”, and “after” moments with respect to any test activity, whether it’s a part of test design, test execution, result interpretation, or learning. Again, a hallmark of exploratory testing is the tester’s freedom and responsibility to optimize the value of the work as it’s happening. That means that when it’s important to record something, the tester is not only welcome but encouraged to

  • pick up a pen
  • take a screen shot
  • launch a session of Rapid Reporter
  • create or update a mind map
  • fire up a screen recorder
  • initiate logging (if it doesn’t start by default on the product you’re testing—and if logging isn’t available, you might consider identifying that as a testability problem and a related product and project risk)
  • sketch out a flowchart diagram
  • type notes into a private or shared repository
  • add to a table of data in Excel
  • fire off a note to a programmer or a product owner
and that’s an incomplete list. But they’re all forms of documentation.

Freedom to document at will should also mean that the tester is free to refrain from documenting something when the documentation doesn’t add value. At the same time, the tester is responsible and accountable for that decision. In Rapid Testing, we recommend writing down (or saving, or illustrating) only the things that are necessary or valuable to the project, and only when the value of doing so exceeds the cost. This doesn’t mean no documentation; it means the most informative yet fastest and least expensive documentation that completely fulfils the testing mission. Integrating that with testing work leads, we hold, to excellent testing—but it takes practice and skill.

For most test activities, it’s possible to relay information to other people orally, or even sometimes by allowing people to observe our behaviour. (At the beginning of the Rapid Testing class, I sometimes silently hold aloft a 5″ x 8″ index card in landscape orientation. I fold it in half along the horizontal axis, and write my first name on one side using a coloured marker. Everyone in the class mimics my actions. Without a single word of instruction being given or questions being asked, either verbally or in writing, the mission has been accomplished: each person now has a tent card in front of him.)

There’s a potential risk associated with an exploratory approach: that the tester might fail to document something important. In that case, we do what skilled people do with risk: we manage it. James Bach talks at length about managing exploratory testing sessions here. Producing appropriate documentation is partly a technical process, but the technical considerations are dominated by business imperatives: cost, value, and risk. There are also social considerations, too. The tester, the test lead, the test manager, the programmers, other managers, and the product owner determine collaboratively what’s important to document and what’s not so important with respect to the current testing mission. In an exploratory approach, we’re more likely to be emphasizing the discovery of new information. So we’re less likely to spend time on documenting what we will do, more likely to document what we are doing and what we have done. We could do a good deal of preparatory reading and writing, even in an exploratory approach—but we realize that there’s an ever-increasing risk that new discoveries will undermine the worth of what we write ahead of time.

That leads directly to “our purposes”, the task that we want to accomplish when documenting something. Just as testing itself has many possible missions, so too does test documentation. Here’s a decidedly non-exhaustive list, prepared over a couple of minutes:

  • to express testing strategy and tactics for an entire project, or for projects in general
  • to keep a set of personal notes to help structure a debriefing conversation
  • to outline testing activities for a test cycle
  • to report on activities during testing execution
  • to outline attributes of a particular quality criterion
  • to catalogue ideas about risk
  • to describe test coverage
  • to account for the work that we’ve done
  • to program a machine to perform a given set of actions
  • to alert people to potential problems in the product
  • to guide a tester’s actions over a test session
  • to identify structures in the application or service
  • to provide a description of how to use a particular test tool that we’ve crafted
  • to describe the tester’s role, skills, and qualifications
  • to explain business rules to someone else on the team
  • to outline scenarios in which the product might be used or tested
  • to identify, for a tester, a specific, explicit sequence of actions to perform, input to provide, and observations to make

That last item is the classic form of highly scripted testing, and that kind of documentation is usually absent from exploratory testing. Even so, a tester can take an exploratory approach using a script as a point of departure or as a reference, just as you might use a trail map to help guide an off-trail hike (among other things, you might want to discover shortcuts or avoid the usual pathways). So when someone says that “exploratory testing is undocumented”, I hear them saying something else. I hear them saying, “I only understand one form of test documentation, and I’ve successfully ignored every other approach to it or purpose for it.”

If you look in the appendices for the Rapid Software Testing class (you can find a .PDF at, you’ll see a large number of examples of documentation that are entirely consistent with an exploratory approach. That’s just one source. For each item in my partial list above, here’s a partial list of approaches, examples, and tools.

Testing strategy and tactics for an entire project, or for projects in general.
Look at the Satisfice Heuristic Test Strategy Model and the Context Model for Heuristic Test Planning (these also appear in the RST Appendices).

An outline of testing activities for a test cycle.
Look at the General Functionality and Stability Test Procedure for Certifed for Microsoft Windows Logo. See also the OWL Quality Plan (and the Risk and Task Correlation) in the RST Appendices.

Keeping a set of personal notes to help structure a debriefing or other conversation.
See the “Beans ‘R Us Test Report” in the RST Appendices; or see my notes on testing an in-flight entertainment system which I did for fun on a flight from India to Amsterdam.

Recording activities and ideas during test execution
A video camera or a screen recording tool can capture the specific actions of a tester for later playback and review. Well-designed log files may also provide a kind of retrospective record about what was testing. Still neither of these provide insight into the tester’s mind. Recorded narration or conversation can do that; tools like BB Test Assistant, Camtasia, or Morae can help. The classic approach, of course, is to take notes. Have a look at my presentation, “An Exploratory Tester’s Notebook“, which has examples of freestyle notes taken during an impromptu testing session, and detailed, annotated examples of Session-Based Test Management sessions. Shmuel Gerson’s Rapid Reporter and Jonathan Kohl’s Session Tester are tools oriented towards taking notes (and, in the former case, including screen captures) of testing sessions on the fly.

Outlining many attributes of a particular quality criterion
See “Heuristics of Software Testability” in the RST Appendices for one example.

Cataloguing ideas about risk
Several examples of this in the RST Appendices, most extensively in the “Deployment Planning and Risk Analysis” example. You’ll also find an “Install Risk Catalog”; “The Risk of Incompatibility”; the Risk vs. Tasks section in the “OWL Quality Plan”; the “Y2K Compliance Report”; “Round Results Risk A”, which shows a mapping of Risk Areas vs. Test Strategy and Tasks.

Describing or outlining test coverage
A mapping establishes or illustrates relationships between things. We can use any of these to help us think about test coverage. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. These can be constructed before, after, or during a given test activity, with the goal of covering the map with tests, or using testing to extend the map. I catalogued several ways of thinking about coverage and reporting on it, in three articles Got You Covered, Cover or Discover, and A Map By Any Other Name. Several examples of lightweight coverage outlines can be found in the RST Appendices (“Putt Putt Saves the Zoo”, “Table Formatting Test Notes”, There are also coverage ideas incorporated into the Apollo mission notes that we’ve titled “Guideword Heuristics for Astronauts”).

Accounting for testing work that we’ve done.
See Session-Based Test Management, and see “An Exploratory Tester’s Notebook“. Darren McMillan provides excellent examples of annotated mind maps; scroll down to the section headed “Session Reports”, and continue through “Simplifying feedback to management” and “Simplifying feedback to groups”. A forthcoming article, written by me, shows how a senior test manager tracks testing sessions at a half-day granularity level.

Programming a machine to help you to explore
See all manner of books on programming, both references and cookbooks, but for testers in particular, have a look at Brian Marick’s Everyday Scripting with Ruby. Check out Pete Houghton’s splendid examples of exploratory test automation that begin here. Cem Kaner (often in collaboration with Doug Hoffman) write extensively about automation-assisted exploratory testing; an example is here.

Alerting people to potential problems in the product
In general, bug reporting systems provide one way to handle the task of recording and reporting problems in the product. James Bach provides an example of a report that he provided to a client (along with a more informal account of the session).

Guiding a tester’s actions over a test session
Guiding a tester involves skills like chartering and checklisting. Start with the documentation on Session Based Test Management ( Selena Delesie has produced an excellent blog post on chartering exploratory testing sessions. The title of Cem Kaner’s presentation at CAST 2008, The Value of Checklists and the Danger of Scripts: What legal training suggests for testers describes the content perfectly. Michael Hunter’s You Are Not Done Yet lists can be used and adapted to your context as a set of checklists.

To identify structures in the application or service
The “Product Elements” section in the Heuristic Test Strategy Model provides a kind of framework for documenting product structures. In the RST Appendices, the test notes for “Putt Putt Saves the Zoo” and “Diskmapper”, and the “OWL Quality Plan” provide examples of identifying several different structures in the programs under test. Mind mapping provides a means of describing and illustrating structures, too; see Darren McMillan’s examples here and here. Ruud Cox and Ru Cindrea used a mind map of product elements to help win the Best Bug Report award in the Test Lab at EuroSTAR 2011. I’ve created a list of structures that support exploratory testing, and many of these are related to structures in the product.

Providing a description of how to use a particular test tool that we’ve crafted
While working at a bank, I developed (in Excel and VBA) a tool that could be used as an oracle and as a way of recording test results. (Thanks to non-disclosure agreements, I can describe these, but cannot provide examples.) When I left the project, I was obliged to document my work. I didn’t work on the assumption that anyone off the street would be reading the document. Instead, I presumed that anyone assigned to that testing job and to using that tool, would have the rapid learning skill to explore the tool, the product, and the business domain in a mutually supportive way. So I crafted documentation that was intended to tell testers just enough to get them exploring.

Explaining business rules to someone else on the team
I did include documentation for novices of one kind: within the documentation for that testing tool, I included a general description of how foreign exchange transactions worked from the bank’s perspective, and how appropriate accounts got credited and debited. I had learned this by reverse-engineering use cases and consulting with the local business analyst. I summarized it with a two-page document written in simple, direct language, referring disrectly to the simpler use cases and explaining the more confusing bits in more detail. For those whose learning style was oriented toward code, I also described the tables and array formulas that applied the business rules.

Outlining scenarios in which the product might be used or tested
I discuss some issues about scenarios here—why they’re important, and why it’s important to keep them open-ended and open to interpretation. It’s more important to record than to prescribe, since in a good scenario, you’ll observe and discover much more than you’ve articulated in advance. Cem Kaner gives ideas on how to produce scenarios; Hans Buwalda presents examples of soap opera testing.

Identifying required tester skill
People with skill don’t need prescriptive documentation for every little thing. Responsible managers identify the skills needed to test, and who commit to employing people who either have those skills or can develop them quickly. James Bach eliminated 50 pages of otiose documentation with two paragraphs. (Otiose is a marvelous word; it’s fun to look it up in a thesaurus.)

Identifying, for a tester, a particular explicit sequence of actions to perform, input to provide, and observations to make.
Again, a document that attempts to specify exactly what a tester should do is the hallmark of scripted testing. James Bach articulates a paradox that has not yet been noted clearly in our craft: in order to perform a scripted test well, you need signficant amounts of skill and tacit knowledge (and you also need to ignore the script on occasion, and you need to know when those occasions are). There’s another interesting issue here: preparing such documents usually depends on exploratory activity. There’s no script to tell you how to write a script. (You might argue there’s one exception. You can follow this script to write a test script: take each line of a requirements document, and add the words “Verify that” to the beginning of each line.)

Now, just as you can perform testing badly using any approach, you can perform exploratory testing and document it inappropriately, either by under-documenting it OR over-documenting it using any of the kinds of documentation above. But, as this document shows, the notion that exploratory testing is by its nature undocumented is not only ignorant, but aggressively ignorant about both testing and documentation. Whenever you see someone claim that exploratory testing is undocumented, I’d ask you to help by setting the record straight. Feel free to refer to this blog post, if you find it helpful; also, please point me to other exemplars of excellent documentation that are consistent with exploratory approaches. If we all work together, we can bury this myth, while providing excellent records and reports for our clients.

This is the end of the series “What Exploratory Testing Is Not”, for me. But James Bach has one more.

And, of course, in the face of all these instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

Worthwhile Documentation

Monday, December 19th, 2011

In the Rapid Software Testing class, we focus on ways of doing the fastest, least expensive testing that still completely fulfills the mission. That involves doing some things more quickly, and it also involves doing other things less, or less wastefully. One of the prime candidates for radical waste reduction is documentation that’s incongruent with the testing mission.

Medical device projects typically present a high degree of risk. Excellent testing helps teams and product owners to identify risks and problems in the product. The quality of testing is a function of the skill of the tester; one would not set loose an incapable tester on high-risk project. Yet some managers have told me that they commission people to write test documentation in a particular style. That style is, to me, overly elaborate and specific with respect to actions to perform and observations to make. Yet at the same time, that style is remarkably devoid of ideas about motivation or risk.

I sometimes ask managers why they use this style of instruction. They usually answer, “because we want anyone to be able to walk up to this system and test it.”

“Anyone?” I ask. “Why anyone?”

“You know how it is. If we have to test a new revision of this program a year from now, there’s a good chance that we won’t have the same testers.” (Dude. If you’re inflicting on your staff the idea of testing as writing or following instructions for an automaton, I might have an explanation for you.)

“Anyone?” I ask. “How about a cat?”

“Well, Michael, that’s silly. Cats can’t think. Cats can’t read.”

“How about my daughter? She’s seven, and she can read well enough to read that. And she could follow the steps pretty well, too.”

“We don’t hire children here!”

“Okay,” I offer. “Would you hire a completely incompetent tester who needed to be told absolutely everything, in painful detail?”

“We wouldn’t hire anyone like that.”

“Fair enough, and I’d hope not. So, why do you insist that people write instructions for them that way?

Let me be clear: when the situation calls for skilled testers, you don’t need overly specific instructions for them. On the other hand, if you don’t have skilled testers, you’ve got a problem that scripted testing won’t be able to solve.

Here’s a splendid example of a machete that we believe that managers could use to cut through jungles of waste. In a recent project that involved work with FDA-regulated medical devices, James Bach found a huge number of excruciatingly overspecified, low-value test cases aimed at “anyone”. The following two paragraphs replaced 50 pages of waste.

3.0 Test Procedures

3.1 General Testing Protocol

In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items, the tester shall at all times be alert for any unexplained or erroneous behaviour of the product. The tester shall bear in mind that, regardless of any specific requirement for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including an unacceptable risk due to reasonably foreseeable misuse.

Test personnel requirements: The tester shall be thoroughly familiar with the Generator and Workstation Function Requirement Specifications, as well as the working principles of the devices themselves. The tester shall also know the workings of the power test jig and associated software, including how to configure and calibrate it and how to recognize it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing in the event that anomalies appear that require investigation. The tester shall know how to keep test records to a credible professional standard.

To me, that’s something worth writing down. Follow those instructions, and your team will save time, save work, and put the emphasis in the right places: on risk, and on meeting and mitigating that risk with skills.

What Exploratory Testing Is Not (Part 4): Quick Tests

Sunday, December 18th, 2011

Quick testing is another approach to testing that can be done in a scripted way or an exploratory way. A tester using a highly exploratory approach is likely to perform many quick tests, and quick tests are often key elements in an exploratory approach. Nonetheless, quick testing and exploratory testing aren’t the same.

Quick tests are inexpensive tests that require little time or effort to prepare or perform. They may not even require a great deal of knowledge about the application being tested or its business domain, but they can help to develop new knowledge rapidly. Rather than emphasizing comprehensiveness or integrity, quick tests are designed to reveal information in a hurry at a minimal cost.

A quick test can be a great way to learn about the product, or to identify areas of risk, fragility, or confusion. A tester can almost always sneak a quick test or two into other testing activity. A burst of quick tests can help as the first activities during a smoke or sanity test. Cycles of relatively unplanned, informal quick tests may help to you discover or refine a more comprehensive or formal plan.

James Bach and I provide examples of many kinds of quick tests in the Rapid Software Testing class. You’ll notice that some of them are called tours. Note that not all tours are quick, and not all quick tests are tours. Here’s a catalog.

Happy Path
Perform a task, from start to finish, that an end-user might be expected to do. Use the product in the most simple, expected, straightforward way, just as the most optimistic programmer or designer might imagine users to behave. Look for anything that might confuse, delay, or irritate a reasonable person. Cem Kaner sometimes calls this “sympathetic testing”. Lean towards learning about the product, rather than finding bugs. If you do see obvious problems, it may be bad news for the product.

Variable Tour
Tour a product looking for anything that is variable and vary it. Vary it as far as possible, in every dimension possible. If you’re using quick tests for learning, seek and catalog the important variables. Look for potential relationships between them. Identifying and exploring variations is part of the basic structure of our testing when we first encounter a product.

Sample Data Tour
Employ any sample data you can, and all that you can. For one kind of quick tests, prefer simple values whose effects are easy to see or calculate. For a different kind of quick test, choose complex or extreme data sets. Observe the units or formats in which data can be entered, and try changing them. Challenge the assumption that the programmers have thought to reject or massage inappropriate data. Once you’ve got a handle on your ideas about reasonable or mildly challenging data, you might choose to try…

Input Constraint Attack
Discover sources of input and attempt to violate constraints on that input. Try some samples of pathological data: use zeroes where large numbers are expected; use negative numbers where positive numbers are expected; use huge numbers where modestly-sized ones are expected; use letters in every place that’s supposed to handle only numbers, and vice versa. Use a geometrically expanding string in a field. Keep doubling its length until the product crashes. Use characters that are in some way distinct from your idea of “normal” or “expected”. Inject noise of any kind into a system and see what happens. Use Satisfice’s PerlClip utility to create strings of arbitrary length and content; use PerlClip’s counterstring feature to create a string that tells you its own length so that you can see where an application cuts off input.

People tend to talk a lot about input constraint attacks. Perhaps that’s because input constraint attacks are used by hackers to compromise systems; perhaps it’s because input constraint attacks can be performed relatively straightforwardly; perhaps it’s because they can be described relatively easily; perhaps it’s because input constraint attacks can produce dramatic and highly unexpected results. Yet they’re by no means the only kind of quick test, and they’re certainly not the only way to test using an exploratory approach.

Documentation Tour
Look in the online help or user manual and find some instructions about how to perform some interesting activity. Do those actions explicitly. Then improvise from them and experiment. If your product has a tutorial, follow it. You may expose a problem in the product or in the documentation; either way, you’ve found an inconsistency that is potentially important. Even if you don’t expose a problem, you’ll still be learning about the product.

File Tour
Have a look at the folder where the program’s .EXE file is found. Check out the directory structure, including subs. Look for READMEs, help files, log files, installation scripts, .cfg, .ini, .rc files.
Look at the names of .DLLs, and extrapolate on the functions that they might contain or the ways in which their absence might undermine the application. Use whatever supplemental material you’ve got to guide or focus your actions. Another way to gather information for this kind of test: use tools to monitor the installation, and take the output from the tool as a point of departure.

Complexity Tour
Tour a product looking for the most complex features, the most challenging data sets, and the biggest interdepencies. Look for hidden nooks and crannies, but also look for the program’s high-traffic areas, busy markets, big office buildings, and train stations—places where there’s lots of interactions, and where bugs might be blending in with the crowd.

Menu, Window, and Dialog Tour
Tour a product looking for all the menus (main and context menus), menu items, windows, toolbars, icons, and other controls. Walk through them. Try them. Catalog them, or construct a mind map.

Keyboard and Mouse Tour
Tour a product looking for all the things you can do with a keyboard and mouse. Hit all of the keys on the keyboard. Hit all the F-keys; hit Enter, Tab, Escape, Backspace; run through the alphabet in order, and combine each key with Shift, Ctrl, Alt, the Windows key, CMD or Option, on other platforms, the AltGr key in Europe. Click (right, left, both, double, triple) on everything. Combine clicks with shifted keys.

Start activities and stop them in the middle. Stop them at awkward times. Perform stoppages using cancel buttons, O/S level interrupts (ctrl-alt-delete or task manager). Arrange for other programs to interrupt (such as screensavers or virus checkers). Also, try suspending an activity and returning later. Put your laptop into sleep or hibernation mode.

Start using a function when the system is in an appropriate state, then change the state part way through.
Delete a file while it is being edited; eject a disk; pull net cables or power cords) to get the machine an inappropriate state. This is similar to interruption, except you are expecting the function to interrupt itself by detecting that it no longer can proceed safely.

Set some parameter to a certain value, then, at any later time, reset that value to something else without resetting or recreating the containing document or data structure. Programmers often expect settings or variables to be adjusted through the GUI. Hackers and tinkerers expect to find other ways.

Dog Piling
Whatever you’re doing, do more of it and do other stuff as well while you’re at it. Get more processes going at once; try to establish more states existing concurrently. Invoke nested dialog boxes and non-modal dialogs. On multi-user systems, get more people using the system or simulate that with tools. If your test seems to trigger odd behaviour, pile on in the same place until the odd becomes crazy.

Continuous Use
While testing, do not reset the system. Leave windows and files open; let disk and memory usage mount.
You’re hoping to show that the system loses track of things or ties itself in knots over time.

Feature Interactions
Discover where individual functions interact or share data. Look for any interdependencies. Explore them, exploit them, and stress them out. Look for places where the program repeats itself or allows you to do the same thing in different places. For example, for data to be displayed in different ways and in different places, and seek inconsistencies. For example, load up all the fields in a form to their maximums and then traverse to the report generator.

Summon Help
Bring up the context-sensitive help feature during some operation or activity. Does the product’s help file explain things in a useful way, or does it offend the user’s intelligence by simply restating what’s already on the screen? Is help even available at all?

Click Frenzy
Ever notice how a cat or a kid can crash a system with ease? Testing is more than “banging on the keyboard”, but that phrase wasn’t coined for nothing. Try banging on the keyboard. Try clicking everywhere. Poke every square centimeter of every screen until you find a secret button.

Shoe Test
Use auto-repeat on the keyboard for a very cheap stress test. Look for dialog boxes constructed such that pressing a key leads to, say, another dialog box (perhaps an error message) that also has a button connected to the same key that returns to the first dialog box. Place a shoe on the keyboard and walk away. Let the test run for an hour. If there’s a resource or memory leak, this kind of test could expose it. Note that some lightweight automation can provide you with a virtual shoe.

Blink Test
Find an aspect of the product that produces huge amounts of data or does some operation very quickly. Look through a long log file or browse database records, deliberately scrolling too quickly to see in detail. Notice trends in line lengths, or the look or shape of the data. Use Excel’s conditional formatting feature to highlight interesting distinctions between cells of data. Soften your focus. If you have a test lab with banks of monitors, scan them or stroll by them; patterns of misbehaviour can be surprisingly prominent and easy to spot.

Error Message Hangover
Programmers are rewarded for implementing features. There’s a psychological problem with errors or exceptions: the label itself suggests that something has gone wrong. People often actively avoid thinking about problems or mistakes, and as a consequence, programmers sometimes handle errors poorly. Make error messages happen and test hard after they are dismissed. Watch for subtle changes in behaviour between error and normal paths. With automation, make the same error conditions appear thousands of times.

Resource Starvation
Progressively lower memory, disk space, display resolution, and other resources. Keep starving the product until it collapses, or gracefully (we hope) degrades.

Multiple Instances
Run a lot of instances of the app at the same time. Open, use, update, and save the same files. Manipulate them from different windows. Watch for competition for resources and settings.

Crazy Configs
Modify the operating system’s configuration in non-standard or non-default ways, either before or after installing the product. Turn on “high contrast” accessibility mode, or change the localization defaults. Change the letter of the system hard drive. Put things in non-default directories. Use RegEdit (for registry entries) or a text editor (for initialization files) to corrupt your program’s settings in a way that should trigger an error message, recovery, or an appropriate default behavior.

Again: quick tests tend to be highly exploratory, but they represent only one kind of exploratory testing. Don’t be fooled into believing that quick testing—or certain kinds of quick testing—is all there is to exploratory testing.

Next (and last) in the series: What Exploratory Testing Is Not (Part 5): Undocumented Testing

And, of course, in the face of all these instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

What Exploratory Testing Is Not (Part 3): Tool-Free Testing

Saturday, December 17th, 2011

People often make a distinction between “automated” and “exploratory” testing. This is like the distinction between “red” cars and “family” cars. That is, “red” (colour) and “family” (some notion of purpose) are in orthogonal categories. A car can be one colour or another irrespective of its purpose, and a car can be used for a particular purpose irrespective of its colour. Testing, whether exploratory or not, can make heavy or light use of tools. Testing, whether it entails the use of tools or not, can be highly scripted or highly exploratory.

“Exploratory” testing is not “manual” testing. “Manual” isn’t a useful word for describing software testing in any case. When you’re testing, it’s not the hands that do the testing, any more than when you’re riding a pedal bike it’s the feet that do the bike-riding. The brain does the testing; the hands, at best, provide one means of input and interaction with the thing we’re testing. And not even “manual” testing is manual in the sense of being tool- or machinery-free. You do you use a computer when you’re testing, don’t you?

(Well, mostly, but not always. If you’re reviewing requirements, specifications, code, or documentation, you might be looking at paper, but you’re still testing. A thought experiment or a conversation about a product is a kind of a test; you’re questioning something in order to evaluate it, pitting ideas against other ideas in an unscripted way. While you’re reviewing, are you using a pen to annotate the paper you’re reading? A notepad to record your observations? Sticky tabs to mark important places in the text? Then you’re using tools, low-tech as they might be.)

Some people think of test automation in terms of a robot that pounds on virtual keys more quickly, more reliably, and more deterministically than a human could. That’s certainly one potential notion of test automation, but it’s very limiting. That traditional view of test automation focuses on performing checks, but that’s not the only way in which automation can help testing.

In the Rapid Software Testing class, James Bach and I suggest a more expansive view of test automation: any use of (software- or hardware-based) tools to support testing. This helps keeps us open to the idea that machines can help us with almost any of the mimeomorphic, non-sapient aspects of testing, so that we can focus on and add power to the polimorphic, sapient aspects. Exploration is polimorphic activity, but it can include and be supported by mimeomorphic actions. Cem Kaner and Doug Hoffman take a similar tack: exploratory test automation is “computer-assisted testing that supports learning of new information about the quality of the software under test.” Learning new information is one of the hallmarks of exploratory testing, which usually points towards emphasizing variation rather than repetition.

That said, there can be a role for mechanized repetition, even when you’re using a highly exploratory approach: when repeating aspects of the test are intended to support discovery of something new or surprising. The key is not whether you’re mechanizing the activity. The key is what happens at the end of the activity. The less the results of one activity are permitted to inform the next, the more scripted the approach. If the repetition is part of a learning loop—a cycle of probing, discovering, investigating, and interpreting—that feeds back on itself immediately, then the approach is exploratory. James has also posted a number of motivations for repeating tests. Each one can (with the possible exception of “avoidance or indifference”) be entirely consistent with and supportive of exploration.

There are some actions that tools can perform better than humans, as long as the action doesn’t require human judgment or wisdom. Humanity can even get in the way of some desirable outcome. For example, when your exploration of some aspect of a product is based on statistical analysis, and randomization is part of the test design, it’s important to remember that people are downright lousy at generating randomized data. Even when people believe that they’re choosing numbers at random, there are underlying (and usually quite unconscious) patterns and biases that inform their choices. If you want random numbers, tools can help.

Tools can support exploration in plenty of other ways: data generation, system configuration; simulation; logging and video capture; probes that examine the internal state of the system; oracles that detect certain kinds of error conditions in a product or generate plausible results for comparison; visualization of data sets, key elements to observe, relationships, or timing; recording and reporting of test activity.

A few years back, I was doing testing of a teller workstation application at a bank (I’ve written about this in How to Reduce the Cost of Software Testing). The other testers, working on domestic transactions, were working from scripts that contained painfully detailed and explicit steps and observations. (Part of the pain came from the fact that the scripts were supplemented with screen shots, and the text and the images didn’t always agree.) My testing assignment involved foreign exchange, and the testing tasks I had been given were unscripted and, to a large degree, self-determined. In order to learn the application quickly, I had to explore, but this in no way meant that I didn’t use tools. On the contrary, in fact. In that context, Excel was the most readily available and powerful tool on hand. I used it (and its embedded Visual Basic for Applications) to:

  • maintain and update (at a key stroke) enormous tables of currencies, rates, and transaction types
  • access appropriate entries from the table via regular expression parsing
  • model the business rules of the application under test
  • display the intended flow of money through a transaction
  • add visual emphasis to the salient outcomes of tests and test scenarios
  • provide, using a comparable algorithm, clear results to which the product’s results could be compared
  • help in performing extremely rapid evaluation of a test idea
  • create tables of customer data so that I could perform a test using a variety of personas
  • accelerate my understanding of the product and the test space
  • enhance my learning about Boolean algebra and how it could be used in algorithms
  • record my work and illustrate outcomes for my clients
  • perform quick calculations when necessary
  • help me find more actual problems than the other four testers combined

All of this activity happened in a highly exploratory way; each of the activities interacted with the others. I used very rapid cycles of looking at what I needed to learn next about the application, experimenting with and performing tests, programming, asking questions of subject matter experts and programmers and managers, reporting, reading reference documentation, debugging, and learning. Tight loops of activities happening in parallel are what characterize exploratory processes. Yet this was not tool-free work; tools were absolutely central to my exploration of the product, to my learning about it, and to the mission of finding bugs. Indeed, without the tools, I would have had much more limited ideas about what could be tested, and how it could be tested.

The explorers of old used tools: compasses and astrolabes, maps and charts, ropes and pulleys, ships and wagons. These days, software testers explore applications by using mind-mapping software and text editors; spreadsheets and calculators; data generation tools and search engines; scripting tools and automation frameworks. The concept that characterizes exploratory testing is not the input mechanism, which can be fingers on a keyboard, tables of data pumped into the program via API calls, bits delivered through the network, signals from a variable voltage controller. Exploratory testing is about the way you work, and the extent to which test design, test execution, and learning support and reinforce each other. Tools are often a critical part of that process.

Next in the series: What Exploratory Testing Is Not (Part 4): Quick Tests

And, of course, in the face of all these instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

What Exploratory Testing Is Not (Part 2): After-Everything-Else Testing

Friday, December 16th, 2011

Exploratory testing is not “after-everything-else-is-done” testing. Exploratory testing can (and does) take place at any stage of testing or development.

Indeed, TDD (test-driven development) is a form of exploratory development. TDD happens in loops, in which the programmer develops a check, then develops the code to make the check pass (along with all of the previous checks), then fixes any problems that she has discovered, and then loops back to implementing a new bit of behaviour and inventing a new check. The information obtained from each loop feeds into the next; and the activity is guided and structured by the person or people involved in the moment, rather than in advance. The checks themselves are scripted, but the activity required to produce them and to analyze the results is not. Compared to the complex cognitive activity—exploratory, iterative—that’s going on as code is being developed, the checks themselves—scripted, linear—are trivial.

Requirement review is an exploratory activity too. Review of requirements (or specifications, or user stories, or examples) tends happens early on in a development cycle, whether it’s a long or a short cycle. While review might be guided by checklists, the people involved in the activity are making decisions on the fly as they go through loops of design, investigation, discovery, and learning. The outcome of each loop feeds back into the next activity, often immediately.

Code review can also be done in a scripted way or an exploratory way. When humans analyze the code, it’s an unscripted, self-directed activity that happens in loops; so it is exploratory. We call it review, but it’s gathering information with the intention of informing a decision; so it is testing. There is a way to review code that involves the application of scripted processes, via a tools that people generally call “static testing tools. When a machine parses code and produces a report, by definition it’s a form of checking, and it’s scripted. Yet using those tools productively requires a great deal of exploratory activity. Parsing and interpreting the report and responding to it is polimorphic, human action—unscripted, open-ended, iterative, and therefore exploratory.

Learning about a new product or a new feature is an exploratory activity if you want to do it or foster it well. Some suggest that test scripts provide a useful means of training testers. Research into learning shows that people tend to learn more quickly and more deeply when their learning is based on interaction and feedback; guided, perhaps, but not controlled. If you really want to learn about a product, try creating a mind map, documenting some aspect of the program’s behaviour, or creating plausible scenarios in which people might use—or misuse—the product. All of these activities promote learning, and they’re all exploratory activities. There’s far more information that you can use, apply, and discover than a script can tell you about. Come to think of it… where does the script come from?

Developing a test procedure—even developing a test script, whether for a machine or a human to follow, or developing the kind of “test” that skilled testers would call a demonstration—is an exploratory activity. There is no script that specifies how to write a new script for a particular purpose. Heard about a new feature and pondering how you might test it? You’ve already begun testing; you’re doing test design and you’re probably learning as you go. To the extent that you use the product or interact with it, bounce ideas off other people, or think critically about your design, you’re testing, and you’re doing it in an unscripted way. Some might suggest that certain tools create scripts that can perform automatic checks. Yet reviewing those checks for appropriateness, interpreting the results, and troubleshooting unexpected outcomes are all exploratory activities.

Supposing that a programmer, midway through a sprint, decides that she’d like some feedback on the work that she’s done so far on a new module. She hands you a bit of code to look at. You might interact with the code directly through a test tool that she provided, or (say) via the Ruby interpreter, or you might write some script code to exercise some of the functions in the module. In any event, you find some problems in it. In order to investigate a problem that you’ve discovered, you must explore. You must explore whether your recognition of the problem was triggered by your own interaction with the program or by a mechanically executed script. You’re in control of the activity; each new test around the problem feeds back into your choice of the next activity, and into the story that you’re going to tell about the product.

All of the larger activities that I’ve described above are exploratory, and they all happen before you have a completed function or story or sprint. Exploratory testing is not a stage or phase of testing to be performed after you’ve performed your other test techniques. Exploratory testing is not an “other” test technique, because it’s not a technique at all. Exploratory testing is not a thing that you do, but rather a way that you work (and think, and act), the hallmarks being who (or what) is in control, and the extent to which your activity is part of a loop, rather than a straight line. Any test technique can be applied in a scripted way or in an exploratory way. To those who say “we do exploratory testing after our acceptance tests are all running green”, I would suggest looking carefully and observing the extent to which you’re doing exploratory testing all the way along.

Next in the series: What Exploratory Testing Is Not (Part 3): Tool-Free Testing

And, of course, in the face of all these instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

What Exploratory Testing Is Not (Part 1): Touring

Thursday, December 15th, 2011

Touring is one way of structuring exploratory testing, but exploratory testing is not necessarily touring, and touring is not necessarily exploratory.

At one extreme, a tourist might parachute into a territory for which there is no detailed knowledge of the landscape, flora and fauna, or human culture, with the goal of identifying what’s there to be learned. Except in such cases, we wouldn’t call her a tourist; we’d call her an anthropologist, or a field botanist, or a field geologist, or an archaeologist. The activity is in this case is interactive with the territory. At the other extreme, a tourist might visit a travel agent, get on a plane to Orlando, meet a chartered bus at the airport, and sit through the rides at Disney World. The touring activity there is largely passive. and we would call someone engaged in it a “tourist”—although “engagement” is something of an exaggeration here. For this kind of person, serious explorers and locals would probably use the word “tourist” in a somewhat deprecating kind of way.

Touring a program can be done in a more scripted or more exploratory way, just as touring a city can be done in a more scripted or more exploratory way. A tourist has many options. Before going on a trip, a tourist might study what is already known about a particular destination. To prepare, she might supply herself with maps and travel guides, and some ideas about destinations of interest. Upon arrival, she might choose a set of walking tours from a guidebook and follow the routes closely, eating only at the restaurants identified in the guidebook, noting buildings and artifacts and other objects of interest by matching them with the descriptions and illustrations. At a given site, she might listen to a prepared audio guide that directs her observations very specifically. She might spend all of her time in the presence of a tour guide who tells her what to observe and how to interpret it. She might choose to accept everything the tour guide told her as the complete story, and refrain from asking questions. Even though the experience would be new to her, and she might learn something from it, she would not likely add much to what is already known. We call that activity touring, but it isn’t very exploratory, and a report on such a tour would largely recapitulate the guidebook. Is your testing like that?

On the other hand, rather than touring like a tourist, she might cover a territory as a historian, or a social scientist, or a travel writer. In that kind of role, she would have a research goal based on the idea of obtaining new knowledge. Learning something new and imparting it to other people requires a more open agenda than sitting on the bus while someone or something else directs your attention. Our researcher might make her way directly to particular destinations or landmarks and begin her research there, or she might amble through neighbourhoods or historical sites to discover new things about them. She could choose to focus on specific aspects of what’s there to observe, or she could choose to let the observations come to her—and, of course, she might do both. She might work with descriptions that she had been given with the intention of adding to them, or she might work from a set of questions that haven’t been asked before. Depending on her mission, she might choose to look for specific patterns or problems, or she might seek deeper understanding that would help her to identify or refine what kind of patterns or problems to look for. Even though the mission to discover new information might come from someone else, she remains in control of the specifics of the itinerary and of each activity from one moment to the next. Is your testing more like that?

One of the hallmarks of exploratory activity is the extent to which it is guided and structured by the person performing that activity. Another hallmark is the extent to which new knowledge feeds into choice of which action to perform next. Touring is not equivalent to exploration; touring can be done is a scripted way or an exploratory way.

This series continues with What Exploratory Testing Is Not (Part 2): After-Everything-Else Testing.

And, of course, in the face of all the forthcoming instances of what exploratory testing is not, you might want to know our current take on what exploratory testing is.

Shapes of Actions

Monday, December 5th, 2011

In the spring of 2010, I was privileged to have a conversation with Simon Schaffer, who pointed me to the work of a sociologist and philosopher of science named Harry Collins. This year, I discovered and read Collins’ new book, Tacit and Explicit Knowledge, and a somewhat older book, The Shape of Actions (co-authored with Martin Kusch). My colleague James Bach and I believe that these books have great significance in terms of the way we understand, practice, learn, and teach the craft of testing. Three ideas in particular stand out: a distinction between two kinds of actions; a distinction between three types of tacit knowledge; and the notion of repair, whereby people fix up interactions with each other and with our media—and particularly with our machines. Today, I’ll talk about shapes of actions.

In The Shape of Actions, Collins and Kusch describe key differences between two kinds of intentional human actions that they call mimeomorphic and polimorphic. In both words, “morph” refers to shape, or form. “Mimeo-” refers to copying. (The grey-haired among us may remember that stencil printing machines used to be called mimeographs.) Mimeomorphic actions are actions that we want to do the same way every time, almost as though we were machines. Collins and Kusch use the example of a golf swing, a kind of action in which we want to eliminate variation and emphasize precision, regularity, and smoothness. “Poli-” is a pun, referring to two similar-sounding Greek roots. The Greek word polys refers to many, much, or several. The Greek polis—a different word entirely— literally means the city, so Collins and Kusch use “poli-” to emphasize the collective and diversified nature of human actions. Polimorphic actions are naturally and appropriately variable, and are rooted in social and human interactions and goals. Conversation is a canonical example of polimorphic action. Filling out a form is an example of mimeomorphic action.

Most human life and human value is centred around polimorphic actions. Still, in many actions, there’s an interplay between the mimeomorphic and the polimorphic. Shifting gears, when performed by a human driver, is something that we almost always want to do smoothly, regularly, mechanically, and (most of the time) below the level of consciousness. Indeed, the majority of North American drivers delegate the mimeomorphic action of gear shifting to mechanisms in the car itself.

Making gear shifting into a mimeomorphic action provides support for the parts of driving that are decidedly not mimeomorphic: merging into traffic, negotiating a left turn, and knowing when to break the letter of traffic laws while maintaining their spirit. Polimorphic actions are handled differently in different places, based on different social paradigms and performed for different purposes. Collins and Kusch note that in some parts of the world (Britain and North America, for example), responsibility for safety is governed by the idea of people following the rules, “violations from the rules of orderly flow being met with expressions of rage”. In Tacit and Explicit Knowledge, Collins point out that in other parts of the world (China, India), responsibility for safety is rooted in the collective, and is governed by the idea of drivers expecting the unexpected. To drivers from the West, drivers from these parts of the world drive in ways that we would consider suicidal or sociopathic. Equally surprisingly to us, people in China and India deal with this style of driving without getting upset or even remarking on it.

All this reminds me of the passage, written by Cem Kaner, in the preface of Testing Computer Software:

Some books say that if our projects are not “properly” controlled, if our written specifications are not always complete and up to date, if our code is not properly organized according to whatever methodology is fashionable, then, well, they should be. These books talk about testing when everyone else plays “by the rules.”

This book is about doing testing when your coworkers don’t, won’t and don’t have to follow the rules.

Consumer software projects are often characterized by a budget that is too small, a staff that is too small, a deadline that is too soon and which can’t be postponed for as long as it should be, and by a shared vision and a shared commitment among the developers.

The quality of a great product lies in the hands of the individuals designing, programming, testing, and documenting it, each of whom counts. Standards, specifications, committees, and change controls will not assure quality, nor do software houses rely on them to play that role. It is the commitment of the individuals to excellence, their mastery of the tools of their crafts, and their ability to work together that makes the product, not the rules.

That is: software development is a polimorphic activity, and if that’s true, testing needs to respond accordingly.

Software development involves mostly polimorphic actions, but some mimeomorphic actions help it along. Compiling a program is so much of a mimeomorphic action that these days we delegate it entirely to machines. Typing is mimeomorphic; we learn to touch-type mimeomorphically so that we can develop programs without the mechanics of typing getting in the way. Programming coaches and programming groups often mandate programmers to develop a specific style of indentation and punctuation to reduce overhead in reading and parsing code, and they mandate exercises or policies to make the regularity automatic. Even though code is designed to be run mimeomorphically, developing it, maintaining it, and interpreting it when things go wrong are all polimorphic actions.

Mimeomorphic activities tend to be easy to observe, so they tend to be easy to identify and to explicate. As a result, conversation, writing, and training in testing has tended to focus on artifacts, on documents, on procedures, and on things that can be automated—the mimeomorphic actions. Those conversations, writings, and training programs almost entirely ignore aspects of testing that are much less visible yet are far more important. This, I believe, is why so many people in our craft talk about writing test cases that are easily described as mimeomorphic actions. Those same people seem to spend little time in discussing how to test, which is composed mostly of polimorphic actions. The challenges of understanding polimorphic actions—combined with the ease of observing and describing mimeomorphic actions—explains why so many people confuse testing with checking. Those challenges explain why people credit Cucumber and its given/when/then formulas much more quickly than they credit the conversations that surround it. Those challenges explain why lowering cost by outsourcing checking work dominates the idea of increasing value by developing local testing skill. And those challenges explain why automation is often seen as some kind of silver bullet for testing problems.

Polimorphic actions are often based on tacit knowledge, different ways of valuing things, and social contexts. Collins notes that polimorphic actions

“can only be executed successfully by a person who understands the social context. Copying the visible behaviour that is the counterpart of an observed action is unlikely to reproduce the action unless it is a mimeomorphic action, because in the case of polimorphic actions, the right behavioural instantiations will change with context. Here (that is, in the book Tacit and Explicit Knowledge –MB) it will be concluded that, for now and the foreseeable future, polimorphic actions—and only polimorphic actions—remain outside the domain of the explicable, whichever of the four possible ways ‘explicable’ is defined. This has significance for the success of different kinds of machines and for the way we teach.”

Watch for a lot more discussion of polimorphic and mimeomorphic actions in the next few blog posts. Watch also for such discussion to work its way into the ways that James and I teach rapid testing.