Blog Posts from September, 2006

Get to know Mike Kelly

Saturday, September 9th, 2006

I have no good explanation for why, until today, I hadn’t added Mike Kelly to the list of people that I respect.

Fixed.

Mike is President of the Association for Software Testing, a terrific, articulate, and (*ahem!*) regular blogger. He’s one of the small-but-growing group of passionate advocates for real tester skill. He’s engaged with the creation of Open Certification for Software Testers, which even I might be able to endorse. He also contributes to the community via the Indianapolis Workshops on Software Testing and the AST. For me, he and his wife Amy were a couple of highlights of Consultants’ Camp this year.

If you’re interested in some really solid thinking and writing about testing, I’d recommend that you check out his blog at http://www.testingreflections.com/blog/55.

Regression Testing, part 2

Friday, September 8th, 2006

Continuing Grig Gheorghiu’s questions from the Agile Testing mailing list…

I was just curious to know how you proceed in this case. I guess you teach your team to apply the rapid testing principles and techniques. Have you found that these principles/techniques are easily understood and applied? Are you using session-based testing? Have you still noticed regressions escaping out in the field? How many people do you usually have on your teams? I’d be interested in stuff like this…

Sometimes I am hired as a tester on short-term contracts. In the last couple of years, I’ve worked in teams varying in size from three to eight. In such cases, rapid and exploratory techniques got a lot more results, more quickly, than the existing scripted tests. The existing tests were usually dopey: stale, full of errors, and incapable of finding bugs. In some cases, the tests should have been automated; in most cases, they shouldn’t have been run at all, so low was their information value. The bugs that I found with exploration had never been covered by existing regression tests, and my experience was that bugs, once fixed, stayed fixed.

In one organization, despite management’s initial skepticism, I did lots of exploratory testing. At first, management was concerned that I was running far fewer tests than the scripted testers. However, I was finding far more bugs than the rest of the team, because they were wrestling with outdated and irrelevant scripted tests that narrowed the testers’ focus such that they weren’t spotting real problems. I was spending more time on bug investigation and reporting than on test design and execution, which cut into my test coverage. Since I was testing in a way that was (I believe) far more diversified and harsh than the scripted tests the other testers were running, it wasn’t uncommon for me to find several bugs for each test idea.

Most of the tests that management required me to run had explicit data values associated with them. I found that the test data was all stale, but in updating it, I greatly expanded the range of values being used, and I found a lot more bugs. In the next test cycle, management was far more interested in an exploratory approach than they had been before. However, I persuaded them that, for that cycle, we needed to develop an automated oracle that would allow us to do more exploratory tests, more quickly. This turned out to be quite a powerful tool. However, most of the testers were contractors, my mandate at that organization didn’t involve training, and the organizational structure inhibited disruption, even when it was positive.
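
The post doesn’t describe that oracle, so here is only a minimal sketch of the general pattern, assuming a system under test reachable as a function and an independent reference calculation; every name, value, and formula below is invented for illustration.

```python
import random
from decimal import Decimal, ROUND_HALF_UP

def system_under_test(amount, rate):
    # Hypothetical stand-in: in real life this would call the product or its API.
    return round(amount * (1 + rate), 2)

def reference_result(amount, rate):
    # Independent reference calculation. If both implementations share the same
    # mistake, the oracle is blind to it; that is a known limitation of the approach.
    value = Decimal(str(amount)) * (Decimal("1") + Decimal(str(rate)))
    return float(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

def explore(trials=10_000, seed=None):
    """Generate a wide range of inputs, compare the two answers, and collect
    mismatches for a human to investigate; the tool extends the tester's reach,
    it doesn't replace the tester's judgment."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        amount = rng.choice([0.0, 0.01, 999_999.99, rng.uniform(0, 10_000)])
        rate = rng.choice([0.0, -0.05, 0.05, rng.uniform(-1.0, 1.0)])
        actual = system_under_test(amount, rate)
        expected = reference_result(amount, rate)
        if actual != expected:
            mismatches.append((amount, rate, actual, expected))
    return mismatches

if __name__ == "__main__":
    for case in explore(seed=1)[:5]:
        print("investigate:", case)
```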

A colleague is working at that organization, and reports that the default automation development strategy is to automate all existing tests. These are bad manual tests and would be even worse automated tests, so that’s an approach he is trying hard to change. That organization is big, slow, not agile, and not likely to change quickly.

In another organization for which I did active, day-to-day testing work, I was substituting for a woman who was on maternity leave. I used mostly an exploratory approach, aided and abetted by Perl and Microsoft Excel. I used Fitnesse too, but found it of questionable value as a test tool (though I thought it had a lot of power as a tool for expressing requirements with runnable examples). In particular, small numbers of tests were being run against the GUI using HTMLFixture, which for me was a high-maintenance, low-value approach to GUI testing. At that level, eyeballing stuff tended to be faster for most testing purposes.

When she returned, I stayed for a few more weeks, and in the last few weeks gave a class in rapid testing. There were three testers on the team: Tester A had lots of experience in mostly scripted approaches; Tester B had lots of experience in mostly exploratory approaches; and Tester C had little testing experience but a fair amount of training in programming. One of the tenets of rapid testing is the diversified team, so this balance suited us well.

The team took the rapid testing practices and ran with them, instituting session-based testing and the testing dashboard (the Big Visible Chart approach to test cycle reporting). The project was a major rewrite of an in-production legacy application for which there were very few unit tests; developers added those as they went.

Tester C worked mostly with Ruby and WATIR; Tester A specialized in making sure that the important, risky test cases were followed; Tester B tended to build updated Fitnesse stories and work on test design for upcoming development. All the testers used exploratory sessions to help improve overall test design. Due to other duties, the test manager found it difficult to keep up with the session debriefing protocol, so the testers debriefed each other: A would debrief B, B would debrief C, and C would debrief A. They found lots of bugs.

During the last iteration before release, they swapped session sheets from earlier sessions, so that each tester would re-perform a session that had been performed and debriefed by the others. The session sheets are detailed enough to guide regression tests, but open enough to stimulate creative diversity and new test ideas. Because of the effectiveness of the earlier testing, the new unit tests, and the robustness of the fixes, very few bugs were found at regression testing time, and I’m not aware of anything serious that went into release.
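
For readers unfamiliar with that debriefing rotation, here is a toy sketch of the idea. The session-sheet fields and the three-tester cycle are simplified illustrations based on the description above, not the team’s actual tooling.

```python
from dataclasses import dataclass, field

# A stripped-down session sheet; real sheets in session-based test management
# also carry a task breakdown, bug and issue lists, and time accounting.
@dataclass
class SessionSheet:
    charter: str                  # the mission, open enough to invite new test ideas
    tester: str                   # who performed the session
    notes: list = field(default_factory=list)
    bugs: list = field(default_factory=list)

TESTERS = ["A", "B", "C"]

def debriefer(tester):
    """A debriefs B, B debriefs C, and C debriefs A, as described above."""
    i = TESTERS.index(tester)
    return TESTERS[(i - 1) % len(TESTERS)]

def regression_performer(tester):
    """At regression time, hand the sheet to a tester other than its original
    performer, so that fresh eyes revisit the same charter."""
    i = TESTERS.index(tester)
    return TESTERS[(i + 1) % len(TESTERS)]

if __name__ == "__main__":
    sheet = SessionSheet(charter="Explore order entry with extreme quantities", tester="B")
    print("debriefed by:", debriefer(sheet.tester))                       # A
    print("re-run at regression time by:", regression_performer(sheet.tester))
```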

One of the key themes of the rapid testing philosophy is diversity–diversity of people, diversity of models, diversity of coverage, diversity of quality criteria, diversity of test approaches. So rather than depending on the uniform behaviour of a machine and of the program that runs on it, we use diversity to find lots of bugs.

Both James Bach and I teach the Rapid Software Testing course. In my own experience teaching the course, I’ve found that testers tend to grasp the skills easily and can begin to apply them immediately. Testers who were not motivated before the class become motivated; testers who were already motivated become more motivated still. By the end of the class, if it’s a corporate client, we often turn the testers loose on their own products, and they very excitedly tend to find lots of new stuff. Many of our clients have adopted session-based test management, and they report that it’s powerful and effective, and that the testers love it. Once they are encouraged to use their skills and intelligence instead of being controlled by a script, testers tend to find more bugs with less effort, and they feel good about it. When management recognizes the value of this and encourages it, a positive feedback loop gets set up.

James has been teaching rapid testing to a large client for the last six years. In 2001, he and his brother Jon were hired to start an exploratory testing team in one group; that team exists to this day. The team (not Jon and James) used session-based test management and its metrics to demonstrate to management that, using exploratory testing, they found large numbers of bugs compared to previous approaches, which were mostly automated.

A blend of approaches worked better. A direct quote from the project manager: “We re-architected our firmware component, modifying nearly 80% of the code. This redesign involved no change in product features, just restructuring of the code. We thought this would be a situation in which our regression tests would really show their worth. They already covered the functionality to be tested so testing could start early with no delay for test development. But we chose to test this large code change by applying our scripted regression tests and exploratory testing in parallel.

“There were approximately 100 defects found. Exploratory tests found about 80% of them.”

—Michael B.

Regression Testing, part 1

Wednesday, September 6th, 2006

More traffic from the Agile Testing mailing list; Grig Gheorghiu is a programmer in Los Angeles who has some thoughtful observations and questions.

I’m well aware of the flame wars that are going on between the ‘automate everything’ camp and the ‘rapid testing’ camp. I was hoping you can give some practical, concrete, non-sweeping-generalization-based examples of how your testing strategy looks like for a medium to large project that needs to ensure that every release goes out with no regressions. I agree that not all tests can be automated. For those tests that are not automated, my experience is that you need a lot of manual testers to make sure things don’t slip through the cracks.

That sounds like a sweeping generalization too. 🙂

I can’t provide you with a strategy that ensures that every release goes out with no regressions. Neither can anyone else.

Manual tests are typically slower to run than automated tests. However, they take almost no development time, and they bring diversity and high cognitive value, so they tend to have a higher chance of revealing new bugs. Manual tests can also reveal regression bugs, especially when they’re targeted for that purpose. Human testers are able to change their tests and their strategies based on observations and choices, and they can do so in an instant.

Automated tests can’t do that. At the end-user, system, and integration level, they tend to have a high development cost. At the unit level, the development cost is typically much lower. Unit tests (and even higher-level tests) are typically super-fast to run and ensure that all the tests that used to pass still pass.

When the discussion about manual tests vs. automated tests gets pathological, it’s because some people seem to miss the point that testing is about making choices. We have an infinite number of tests that we could run. Whatever subset of those tests we choose, and no matter how many we think we’re running, we’re still running a divided-by-infinity fraction of all the possibilities. That means that we’re rejecting huge numbers of tests with every test cycle. One of the points of the rapid testing philosophy is that by making intelligent choices about risks and coverage, we change our approach from compulsively rejecting huge numbers of tests to consciously rejecting huge numbers of tests.
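
To put rough numbers on that, here is a back-of-the-envelope illustration; the field counts and the cycle size are invented, and the point survives any reasonable substitution.

```python
# An invented, deliberately tiny model: one name field, one date field, one
# quantity field, and a handful of orderings in which they might be exercised.
# Even crude equivalence classes multiply quickly; the raw value space is
# effectively unbounded, which is the point of the paragraph above.
name_classes = 12   # empty, whitespace, very long, accented, markup-ish, ...
date_classes = 20   # leap days, epoch boundaries, far past and future, bad formats, ...
qty_classes = 15    # 0, 1, -1, max int, overflow, non-numeric, ...
orderings = 6       # sequences in which the fields are filled in or revisited

combinations = name_classes * date_classes * qty_classes * orderings
print(f"{combinations:,} combinations of coarse classes")   # 21,600

# At, say, 400 tests per cycle we consciously reject about 98% of even this
# simplified model, before considering timing, configuration, existing data,
# and every individual value hiding inside each class.
tests_per_cycle = 400
print(f"fraction of the simplified model exercised: {tests_per_cycle / combinations:.1%}")
```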

So the overall strategy for regression testing is to make choices that address the regression risk. All that any strategy can do–automated, manual, or some combination of the two–is to provide you with some confidence that there has been minimal regression (where minimal might be equal to zero, but no form of testing can prove that).

Other factors can contribute to that confidence. Those factors could include small changes to the code; well-understood changes, based on excellent investigation of the original bug; smart, thoughtful developers; strolls through the code with the debugger; unit testing at the developer level; automated unit testing; paired developers; community ownership of the code; developers experienced with the code base; readable, maintainable, modular code; technical review; manual and automated developer testing of the fix at some higher level than the unit level; good configuration management. People called “testers” may or may not be involved in any of these activities, and I’m probably missing a few. All of these are filters against regression problems, and many of them are testing activities to some degree. Automated tests of some description might be part of the picture too.

When it comes to system-level regression testing, here are the specific parts of one kind of strategy. As soon as some code, any code, is available, rapid testers will test the change immediately after the repair is done. In addition, they’ll do significant manual testing around the fix. This need not require a whole bunch of testers, but investigative skills are important. Generally speaking, a fix for a well-investigated and well-reproduced bug can be tested pretty quickly, and feedback about success or failure of the fix can also be provided rapidly. We could automate tests at this higher level, but in my experience, at this point it is typically far more valuable to take advantage of the human tester’s cognitive skills to make sure that nothing else is broken. (Naturally, if a tool extends our capabilities in some form, we’ll use it if the cost is low and the value high.)

In an ideal scenario, it’s the unit tests that help to ensure that this particular problem doesn’t happen again. But even in a non-ideal scenario, things tend to be okay. Other filters heuristically capture the regression problems–and when regressions make it as far as human testers, thoughtful people will usually recognize that it’s a serious problem. If that happens, developers and development management tend to make changes farther up the line.

It was ever thus. In 1999, at the first Los Altos Workshop on Software Testing, a non-scientific survey of a roundtable of very experienced testers indicated that bugs found by regression tests represented about 15% of the total bugs found over the course of the project, and that the majority of those were found in development or on the first run of the automated tests. Cem Kaner, in a 1996 class that I was attending, mentioned that an empirical survey had noted a regression rate of about 6%. Both he and I are highly skeptical about empirical studies of software engineering, but those results could be translated, with a suitable margin of error, into “doesn’t happen very much”.

With agile methods–including lots of tests at the level of functional units–that turns into “hardly happens at all”, especially since the protocol is to write a new test, at the lowest level possible, for every previous failure determined by other means. This extra check is both necessary and welcome in agile models, because the explicit intention is to move at high velocity. Agile models mitigate the risk of high velocity, I perceive, by putting in low-level automated unit tests, where it’s cheap to automate–low cost and high value. Automated regression at the system level is relatively high cost and low value–that is, it’s expensive, if all we’re addressing is the risk of developers making silly mistakes.
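
As a hypothetical illustration of that protocol (the module, the function, and the bug number below are all invented), a failure found by exploration gets pinned, once fixed, by a cheap check at the lowest level possible:

```python
# test_discount_regression.py: an invented example of writing a new low-level
# test for a failure that was originally found by other means (here, exploration).
import pytest

from pricing import apply_discount   # hypothetical module under test

def test_full_discount_yields_zero_not_negative():
    # Bug 1234 (invented): a 100% discount once produced a small negative price
    # because of floating-point subtraction. The fix lives in the product code;
    # this check exists so that the mistake can't quietly come back.
    assert apply_discount(price=19.99, percent=100) == 0

@pytest.mark.parametrize("percent", [0, 1, 50, 99, 100])
def test_discount_never_goes_below_zero(percent):
    # A slightly broader check around the same risk, still cheap to run.
    assert apply_discount(price=19.99, percent=percent) >= 0
```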

I’ll have more to say about some specific circumstances in a future post. Meanwhile, I hope this helps.