Blog Posts from December, 2009

Handling an Overstructured Mission

Saturday, December 26th, 2009

Excellent testers recognize that excellent testing is not merely a process of confirmation, verification, and validation. Excellent testing is a process of exploration, discovery, investigation, and learning.

A correspondent whom I consider an excellent tester (let’s call him Al) works in an environment where he is obliged by his managers to execute overly structured, highly confirmatory scripted tests. Al wrote to me recently, saying that he now realizes why that’s frustrating for him:  every time he runs through a scripted test, he gets five new ideas that he wants to act upon. I think that’s a wonderful thing, but when he acts on those ideas and fulfills his implicit mission (finding important problems in the product), it diverts him from his explicit mission (to complete some number of scripted tests per day), and he gets heat from his manager about that. At the end of a couple of days, the manager wants to know why Al is behind schedule—even if Al has revealed important problems along the way—because the manager is focused on test effort in terms of test cases completed, rather than test ideas explored.

I suggested to Al (as I suggest to you, if you’re in that kind of situation) a workaround: don’t act on the new test ideas, but do note them. Jot them down in handwritten notes or a text file, and especially note your motivation for them—ideas about risk, coverage, oracles, strategies, and the like. Tell your test manager or test lead that you didn’t run tests associated with those ideas, and then ask, “Are you okay with us NOT running them?”

In addition, check in with your manager more often than once every two days. Deliver a report, including new ideas, at one- to two-hour intervals.  If direct personal contact isn’t available, try instant messages or email. If those don’t work, batch them, but note the time at which you started and/or stopped a burst of testing activity.
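
As a side note of my own (this wasn’t part of what I sent Al), here’s a minimal sketch, in Python, of one way to capture those ideas as they occur: a timestamp, the idea, and the motivation behind it. The file name and the helper function are hypothetical; handwritten notes or a plain text file work just as well.

```python
# A minimal, hypothetical sketch of jotting down test ideas as they occur,
# with a timestamp and the motivation (risk, coverage, oracle, strategy, ...)
# so they can be reported at the next check-in with the manager or lead.
from datetime import datetime

NOTES_FILE = "session-notes.txt"  # hypothetical file name

def note_idea(idea, motivation):
    """Append a timestamped test idea and its motivation to the notes file."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with open(NOTES_FILE, "a", encoding="utf-8") as notes:
        notes.write(f"{stamp}  [{motivation}]  {idea}\n")

# Example: an idea prompted while stepping through a scripted test.
note_idea("What happens to a half-edited record if the session times out?", "risk")
```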

Al was excited about that suggestion. “Wow!” he said. “That also means defects arising from the new ideas are noted down. Currently, my management is under the impression that test cases are the things that reveal problems, but it’s my acting on my test ideas that really reveals the problems.” He also noted, “There’s another bad thing that comes from that. If the test cases don’t reveal problems, we take the problems that we’ve found and create a test case for them so that those problems aren’t missed next time.” I’ve seen that happen a lot, too.  On the face of it, it doesn’t sound like a bad idea—except that specific problems that are fixed and verified tend to remain fixed. Repeating those tests is an opportunity cost against new tests that would reveal previously undiscovered problems.

So: the idea here is to make certain aspects of our work visible. Scripted test cases often reveal problems as those cases are developed. When those problems get fixed, the script loses much of its power. Thus variation on the script, rather than following the script rigorously, tends to reveal the actual problems. However, unless we’re clear that this is happening, managers will mistakenly give credit to the wrong thing—that is, the script—rather than to the mindset and the skill set of the tester.

Selena Delesie on Exploratory Test Chartering

Friday, December 18th, 2009

A little while ago, I mentioned that I’d be writing more about session-based test management (SBTM). For me, one thing that’s great about having a community of students and colleagues is that they can save me lots of time and work.

Selena Delesie took the Rapid Software Testing course from me a few years back (that is, she was a student). Since then, she has taken Rapid Testing and its practices, including SBTM, and made them her own. This is exactly what James Bach and I aim for.  We want to help testers, test leads, and managers realize that the most important factor in excellent testing, bar none, is the mindset and the skill set of the individual tester.  This means taking the ideas in the course and internalizing them, adopting them, developing them, experimenting with them, altering them to fit your context.  We get people started by making them feel powerful, mostly by helping them to recognize the power and skills that they already have. Then, after the class, they can feel confident in doing the heavy lifting on their own. Selena is by no means our only student who has done that, but she’s a paradigmatic example of what’s possible.

This post from her blog is a nice account of her appreciation of exploratory testing and of her career growth. That on its own would be good enough, but she’s now blogged a post on chartering sessions, and it’s excellent.  It identifies some of the common traps and misconceptions about chartering, and provides some sharp advice on how to avoid them. It talks not merely about how to charter, but how to do it in a way that affords the tester the freedom and responsibility to do his or her best work. Highest recommendation.

Structures of Exploratory Testing: Resources

Monday, December 14th, 2009

Update, 2012-09-11: The list below has been updated over time, and now exists as part of the resources page on my Web site. Feel free to share; comments and suggestions for additions are welcome.

In a Webinar that he did for uTest on December 10, James Whittaker mused aloud about what a great idea it would be to structure exploratory testing and capture ideas about it in a repository for sharing with others. It seems to me that one ideal version of that would take the form of a bibliography in a book about exploratory testing, but apparently that’s not available yet. But I digress.

The fact is, people have been doing exactly that for years. And I do like the idea of having a repository and sharing, so here’s a survey of some exploratory testing structures and some writing about them that I hope people will find helpful. There are some excellent books out there, but for now, these are all online and free. Expect updates.

  • Evolving Work Products, Skills and Tactics, ET Polarities, and Test Strategy. James Bach, Jon Bach, and I authored the latest version of the Exploratory Skills and Dynamics list. This is a kind of evolving master list of exploratory testing structures. James describes it here.
  • Oracles. The HICCUPPS consistency heuristics, which James Bach initiated and which I wrote about in this article for Better Software in 2005. (Actually, at the time it was only HICCUPP—History, Image, Comparable Products, Claims, User Expectations, Purpose, Product—but since then we’ve also added S, for Standards and Statutes.) Mike Kelly also talks about HICCUPP here.
  • Test Strategy. James Bach’s Heuristic Test Strategy Model isn’t restricted to exploratory approaches, but certainly helps to guide and structure them.
  • Data Type Attacks, Web Tests, Testing Wisdom, Heuristics, and Frameworks. Elisabeth Hendrickson’s Test Heuristics Cheat Sheet is a rich set of guideword heuristics and helpful reference information.
  • Context Factors, Information Objectives. Cem Kaner most recently delivered his Tutorial on Exploratory Testing for the QAI Quest Conference in Chicago, 2008. There’s a similar, but not identical talk here.
  • Quick Tests. In our Rapid Software Testing course, James Bach and I talk about quick tests. The course notes are available for free. Fire up Acrobat and search for “Quick Tests”.
  • Coverage (specific). Michael Hunter’s You Are Not Done Yet is a detailed set of coverage ideas to help prompt further exploration when you think you’re done.
  • Coverage (general). James Bach wrote this article in 2001, in which he summarizes test coverage ideas under the mnemonic “San Francisco Depot”—Structure, Function, Data, Platform, and Operations. Several years later, I convinced him to add an element to the list, so now it’s “San Francisco DepoT”. The last T is for…
  • Time. I realized a few years ago that some guideword heuristics might help us to pay attention to the ways in which products relate to time, and vice versa. That turned into a Better Software article called “Time for New Test Ideas”.
  • Tours. Mike Kelly’s FCC CUTS VIDS Touring Heuristics (note the date) provides a set of structured approaches for touring the application. 
  • Stopping Heuristics. There are structures for deciding when to stop a given test, a line of investigation, or a test cycle. I catalogued them here, and Cem Kaner made a vital addition here.
  • Accountability, Reporting Progress. James and Jon Bach’s description of Session-Based Test Management is a set of structures for making exploratory testing sessions more accountable.
  • Procedure. The General Functionality and Stability Test Procedure was designed for Microsoft in the late 1990s by James Bach, and may be the first documented procedure to guide exploratory test execution and investigation.
  • Emotions. I gave a talk on emotions as powerful pointers to test oracles at STAR West in 2007. That helped to inspire some ideas about…
  • Noticing, Observation. At STAR East 2009, I did a keynote talk on noticing, which can be important for exploratory test execution. The talk introduces a number of areas in which we might notice, and some patterns to sharpen noticing.
  • Leadership. For the 2009 QAI Conference in Bangalore, India, I did a plenary talk in which I noted several important structural similarities between exploratory testing and leadership.

So, there it is: a repository. Feel free to share; comments and suggestions for additions are welcome.

Best Bug… or Bugs?

Wednesday, December 9th, 2009

And now for the immodest part of the EuroSTAR 2009 Test Lab report:  I won the Best Bug award, although it’s not clear to me which bug got the nod, since I reported several fairly major problems. 

I tested OpenEMR.  For me, one candidate for the most serious problem would have been a consistent pattern of inconsistency in input handling and error checking.  I observed over a dozen instances of some kind of sloppiness.
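
To make “input constraint attack” a little more concrete, here is a minimal sketch of the kind of probes I mean. This is my own illustration, not anything from the Test Lab; the submit function is a hypothetical stand-in for whatever actually sends a value to the field under test.

```python
# A hypothetical sketch of cheap input-constraint probes: each value checks
# whether a field constrains length, character set, type, or date range.
PROBES = [
    ("overlong text",       "A" * 5000),
    ("control characters",  "name\x00\x07\x1b"),
    ("SQL metacharacters",  "O'Brien; --"),
    ("script injection",    "<script>alert(1)</script>"),
    ("out-of-range date",   "1899-02-31"),
    ("wrong type",          "forty-two"),
]

def probe_field(submit):
    """Run every probe through a field's submit function (a stand-in here)
    and report any value that is accepted rather than rejected or constrained."""
    for label, value in PROBES:
        accepted = submit(value)
        print(f"{label:20s} -> {'ACCEPTED (suspicious)' if accepted else 'rejected'}")

# Example: a deliberately sloppy validator that accepts anything non-empty,
# standing in for a field with no real input checking.
probe_field(lambda value: bool(value.strip()))
```

The interesting result isn’t a count of passes and failures; it’s noticing which fields shrug and accept everything.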

That pattern of sloppiness reminded me of a problem that we testers often see in project work: measuring by counting things—counting bugs, counting bug reports, counting requirements.  When the requirement is to defend the application against overflowing text fields and vulnerability to input constraint attacks by hackers, how should we count?  How many mentions of that should there be?  One, in a statement of general principles at the beginning of a requirements document?  Hundreds, in a statement of specific purpose for each input field in a functional specification?  How many requirements are there to make sure that fields don’t overflow?  How many requirements that they support only the characters, numbers, or date ranges that they’re supposed to?  What about traceability?  If this is a genuine problem, and the requirements documents don’t mention a particular requirement explicitly, should we refrain from reporting on a problem with that implicit requirement?

When I report an issue—for example, that practically all of the input fields in OpenEMR have some kind of problem with them—should that count as one bug report?  Since it applies to hundreds of fields, should it count as hundreds of bug reports?  When such a pervasive overall problem exists, should the tester make a report for each and every field in which he observes a problem?  And if you want to answer Yes, to that question:  is it worth the opportunity cost to do that when there are plenty of other problems in the product?

So again, there were so many instances of unconstrained and unchecked input that I stopped recording specifics and instead reported a general pattern in the bug tracking system.  My decision to do this was an instance of the Dead Horse stopping heuristic; reporting yet another instance of the same class of problem would be like flogging a dead horse.  I could have wasted a lot of time and energy reporting each instance of each problem I observed, along with specific symptoms and possible ramifications of each one.  Yet I’m very skeptical that this would serve the project well.  In my experience as a program manager for a product whose code was being developed outside our company, I found that there was steadily diminishing return in value for many reports of the same ilk.  When, in testing, we identified a general pattern of failure, we stopped looking for more instances.  We sent the product back to the development shop, and required the programmers and their testers to review the product through-and-through for that kind of problem.

If I were to be evaluated on the number of bugs that I found, I’d find it hard to resist the easy pickings of yet another input constraint attack bug report.  Yet when I’m testing, every moment of bug investigation and reporting is, by some reckoning, another moment that I can’t spend on obtaining more test coverage (more about that here).  By focusing on investigating and reporting on input problems (and thereby increasing my bug count), am I missing opportunities to design and perform tests on scheduling conflict-resolution algorithms, workflows, database integrity,…?

There were two other fairly serious problems that I observed.  One was that the Chinese version of the product showed a remarkable number of English words, presumably untranslated, interspersed among the ideograms; I expected to see no English at all.  I treated that problem in the same way as the input constraint problem:  with a single report of a general problem.

The second serious problem was that searches of various kinds would place a link in the address bar.  The link represented a command to a CGI script of some kind, which evidently constructed and forwarded a query to an underlying SQL database.  Backspacing over the last digit in the address bar and replacing it with a slash caused a lovely SQL error message to appear on the screen, unhandled by any of OpenEMR’s code.  The message could have been used, said our local product owner, to expose the structure of the database to snoops or hackers.  I found that problem by a defocusing heuristic—looking at the browser, rather than the browser window.
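
For those who like to see the shape of such a problem, here is a minimal sketch of the failure pattern described above. It is not OpenEMR’s actual code (which I haven’t read): a value taken straight from the URL is spliced into a SQL statement, and the raw database error, query text and all, is sent back to the browser.

```python
# A hypothetical sketch of the pattern: unvalidated URL input spliced into SQL,
# with the raw database error echoed to the client. Uses SQLite for brevity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")

def search(patient_id_from_url):
    query = f"SELECT name FROM patients WHERE id = {patient_id_from_url}"
    try:
        return conn.execute(query).fetchall()
    except sqlite3.Error as err:
        # Echoing the error verbatim is the leak: it reveals the query (and,
        # in richer schemas, table and column names) to anyone editing the URL.
        return f"<h1>Database error</h1><pre>{query}\n{err}</pre>"

print(search("7"))    # a well-formed id, as the CGI script expects
print(search("7/"))   # the edited URL: the trailing slash breaks the SQL
```

Parameterized queries and a generic error page would address both halves of the problem: the query manipulation and the information leak.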

I don’t know which of these problems took Best Bug honours.  I’m not sure that the presenters specified which bug they were crediting with Best Bug.  That makes a certain kind of sense, since I can’t tell which of these problems is the most serious either.  After all, a problem isn’t its own thing; it’s a relationship between a person and a product or a situation.  There are plenty of ways to address a problem.  You could fix the product or the situation.  You could change the perspective or the perception of the person observing the problem, say by keeping the problem as it is but providing a workaround.  You could choose to ignore the problem yourself, which underscores the fact that a problem for some person might not be a problem for you.  That’s why it’s not helpful to count problems.

Managers:  do you see how evaluating testers based on test cases or bug counts, rather than the value of reporting, will lead to distortion at best, and more likely to dysfunction?  Do you see how providing overstructured test scripts or test cases could reduce the diversity—and therefore the quality—of testing?  Do you see how the notion of “one test per requirement” or “one positive and one negative test per requirement” is misleading?

Testers:  do you see how being evaluated on bug counts could lead to inattentional blindness with respect to problems more serious than those the low-hanging fruit affords?  Do you see how focusing on bugs, rather than on test coverage, could reduce the value of your testing?

Instead of counting things, let’s consider evaluating testing work in a different way.  Let’s consider the overall testing story and its many dimensions.  Let’s think about the story around each  bug, and each bug report—not just the number of reports, but the meaning and significance of each one.  Let’s look at the value of the information to stakeholders, primarily to programmers and to product owners.  Let’s think about the extent to which the tester makes things easier for others on the team, including other testers.  Let’s look at the diversity of problems discovered, the diversity of approaches used, and the diversity of tools and techniques applied.  And rather than using this information to reward or punish testers, let’s use it to guide coaching, mentoring, and training such that the focus is on developing skill for everyone.

The dimensions above are qualitative, rather than quantitative.  Yet if our mission is to provide information to inform decisions about quality, we of all people should recognize that expressing value in terms of numbers often removes important information rather than adding it.

Additional reading: 

Measuring and Managing Performance in Organizations (Robert D. Austin)
Software Engineering Metrics:  What Do They Measure and How Do We Know? (Kaner and Bond)
Quality Software Management, Vol. 2:  First Order Measurement (Weinberg)
Perfect Software (and Other Illusions About Testing) (Weinberg)

EuroSTAR’s Test Lab: Bravo!

Wednesday, December 9th, 2009

One of the coolest things about EuroSTAR 2009 was the test lab set up by James Lyndsay and Bart Knaack.

James and Bart (who self-identified as Test Lab Rats) provided testers with the opportunity to have a go at two applications, FreeMind (an open-source mind-mapping program) and OpenEMR (an open-source product for tracking medical records). The Lab Rats did a splendid job of setting things up and providing the services and information that participants needed to get up and running quickly.

Sponsorship in the form of five laptop computers was provided through the good graces of Steve Green at Test Partners, Stuart Noakes at Transition Consulting Ltd., and Bart Knaack at Logica. James Lyndsay also lent a server and a router to the event.

Sponsorship was also provided by tool vendors (here in alphabetical order) Andagon, MicroFocus, Microsoft, Neotys, and Testing Technologies. These sponsors had their tools installed on the laptops, and presented their demos by applying them to OpenEMR and FreeMind as installed in the Test Lab. On a loose schedule, some of the presenters did talks and demonstrations of how they tested.

The aforementioned Stuart Noakes and Mieke Gievers gave advice and assistance to the Lab Rats.

Well, that’s all very nice, but what was it like?

As someone who spent a couple of hours in the lab, exploring the applications and listening in on the presentations, I’d say it was terrific (although the prospect that OpenEMR is being used in actual medical practices seemed faintly alarming). Both applications were sophisticated enough for some reasonably serious testing, and had interesting problems to discover and report.

Interestingly, none of the certificationists or the standardization folks sat in the lab and tested, to my knowledge.

Bravo to James and Bart, to the sponsors, to the conference organizers and to the program committee for putting this together.  Let’s see more actual testing at testing conferences!