Blog Posts from April, 2014

Very Short Blog Posts (17): Regression Obsession

Thursday, April 24th, 2014

Regression testing is focused on the risk that something that used to work in some way no longer works that way. A lot of organizations (Agile ones in particular) seem fascinated by regression testing (or checking) above all other testing activities. It’s a good idea to check for the risk of regression, but it’s also a good idea to test for it. Moreover, it’s a good idea to make sure that, in your testing strategy, a focus on regression problems doesn’t overwhelm a search for problems generally—problems rooted in the innumerable risks that may beset products and projects—that may remain undetected by the current suite of regression checks.

One thing for sure: if your regression checks are detecting a large number of regression problems, there’s likely a significant risk of other problems that those checks aren’t detecting. In that case, a tester’s first responsibility may not be to report any particular problem, but to report a much bigger one: regression-friendly environments ratchet up not only product risk, but also project risk, by giving bugs more time and more opportunity to hide. Lots of regression problems suggest a project is not currently maintaining a sustainable pace.

And after all, if a bug clobbers your customer’s data, is the customer’s first question “Is that a regression bug, or is that a new bug?” And if the answer is “That wasn’t a regression; that was a new bug,” do you expect the customer to feel any better?

Related material:

Regression Testing (a presentation from STAR East 2013)
Questions from Listeners (2a): Handling Regression Testing
Testing Problems Are Test Results
You’ve Got Issues

A Tale of Four Projects

Wednesday, April 23rd, 2014

Once upon a time, in a high-tech business park far, far away, there were four companies, each working on a development project.

In Project Blue, the testers created a suite of 250 test cases, based on 50 use cases, before development started. These cases remained static throughout the project. Each week saw incremental improvement in the product, although things got a little stuck towards the end. Project Blue kept a table of passing vs. failing test cases, which they updated each week.

Date Passed Failed Total
01-Feb 25 225 250
08-Feb 125 125 250
15-Feb 175 75 250
22-Feb 200 50 250
29-Feb 225 25 250
07-Mar 225 25 250

In Project Red, testers constructed a suite of 10 comprehensive scenarios. The testers refined these scenarios as development progressed. In the last week of the project, a change in one of the modules broke several elements in a scenario that had worked in the first two weeks. One of Project Red’s KPIs was a weekly increase in the Passing Scenarios Ratio.

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 5 5 10
15-Feb 7 3 10
22-Feb 8 2 10
29-Feb 9 1 10
07-Mar 9 1 10

Project Green used an incremental strategy to design and refine a suite of test cases. Management added more testers to the project each week. As the project went on, the testers also recruited end users to assist with test design and execution. At the end of four weeks, the team’s Quality Progress Table looked like this:

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 25 25 50
15-Feb 70 30 100
22-Feb 160 40 200

In Week 5 of Project Green, the managers called a monster triage session that led to the deferral of dozens of Severity 2, 3, and 4 bugs. Nine showstopper bugs remained. In order to focus on the most important problems, management decreed that only the showstoppers would be fixed and tested in the last week. And so, in Week 6 of Project Green, the programmers worked on only the showstopper bugs. The fixes were tested using 30 test cases. Testing revealed that six showstoppers were gone, and three persisted. All the deferred Severity 2, 3, and 4 bugs remained in the product, but to avoid confusion, they no longer appeared on the Quality Progress Table.

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 25 25 50
15-Feb 70 30 100
22-Feb 160 40 200
29-Feb 450 50 500
07-Mar 27 3 30

In the first few weeks of Project Purple, testers worked interactively with the product to test the business rules, while a team of automation specialists attempted to create a framework that would exercise the product under load and stress conditions. At the end of Week 4, the Pass Rate Dashboard looked like this:

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 25 25 50
15-Feb 70 30 100
22-Feb 80 20 100

In Week 5 of Project Purple, the automation framework was finally ready. 900 performance scenario tests were run that revealed 80 new bugs, all related to scalability problems. In addition, none of the bugs opened in Week 4 were fixed; two key programmers were sick. So at the end of Week 5, this was the picture from the Pass Rate Dashboard:

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 25 25 50
15-Feb 70 30 100
22-Feb 80 20 100
29-Feb 900 100 1000

In Week 6 of Project Purple, the programmers heroically fixed 40 bugs. But that week, a tester discovered a bug in the automation framework. When that bug was fixed, the framework revealed 40 entirely new bugs. And they’re bad; the programmers report most of them will take at least three weeks to fix. Here’s the Pass Rate Dashboard at the end of Week 6:

Date Passed Failed Total
01-Feb 1 9 10
08-Feb 25 25 50
15-Feb 70 30 100
22-Feb 80 20 100
29-Feb 900 100 1000
07-Mar 900 100 1000

Here’s the chart that plots the percentage of passing test cases, per week, for all four projects.
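The percentages behind that chart are easy to recompute from the tables. Here is a minimal Python sketch, with the weekly (passed, total) counts transcribed from the tables above (reading Project Red's Week 3 row as 7 passed and 3 failed, so that the row sums to 10); it shows that all four projects produce exactly the same weekly pass-rate curve:

```python
# Weekly (passed, total) counts transcribed from the four project tables.
projects = {
    "Blue":   [(25, 250), (125, 250), (175, 250), (200, 250), (225, 250), (225, 250)],
    "Red":    [(1, 10),   (5, 10),    (7, 10),    (8, 10),    (9, 10),    (9, 10)],
    "Green":  [(1, 10),   (25, 50),   (70, 100),  (160, 200), (450, 500), (27, 30)],
    "Purple": [(1, 10),   (25, 50),   (70, 100),  (80, 100),  (900, 1000), (900, 1000)],
}

# Percentage of passing test cases per week, per project.
pass_rates = {
    name: [round(100 * passed / total) for passed, total in weeks]
    for name, weeks in projects.items()
}

for name, rates in pass_rates.items():
    print(f"{name:7s} {rates}")
# Every project prints the same curve: [10, 50, 70, 80, 90, 90]
```

Which is the point: a pass-rate chart flattens four wildly different stories into one indistinguishable line.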

Four entirely different projects.

As usual, James Bach contributed to this article.

“In The Real World”

Monday, April 21st, 2014

In Rapid Software Testing, James Bach, our colleagues, and I advocate an approach that puts the skill set and the mindset of the individual tester—rather than some document or tool or test case or process model—at the centre of testing. We advocate an exploratory approach to testing so that we find not only the problems that people have anticipated, but also the problems they didn’t anticipate. We challenge the value of elaborately detailed test scripts that are expensive to create and maintain. We note that inappropriate formality can drive testers into an overly focused mode that undermines their ability to spot important problems in the product. We don’t talk about “automated testing”, because testing requires study, learning, critical thinking, and qualitative evaluation that cannot be automated. We talk instead about automated checking, and we also talk about using tools (especially inexpensive, lightweight tools) to extend our human capabilities.

We advocate stripping documentation back to the leanest possible form that still completely supports and fulfills the mission of testing. We advocate that people take measurement seriously by studying measurement theory and statistics, and by resisting or rejecting metrics that are based on invalid models. We’re here to help our development teams and their managers, not mislead them.

All this appeals to thinking, curious, engaged testers who are serious about helping our clients to identify, evaluate, and stamp out risk. But every now and then, someone objects. “Michael, I’d love to adopt all this stuff. Really I would. But my bosses would never let me apply it. We have to do stuff like counting test cases and defect escape ratios, because in the real world…” And then he doesn’t finish the sentence.

I have at least one explanation for why the sentence dangles: it’s because the speaker is going through cognitive dissonance. The speaker realizes that what he is referring to is not his sense of the real world, but a fantasy world that some people try to construct, often with the goal of avoiding the panic they feel when they confront complex, messy, unstable, human reality.

Maybe my friend shares my view of the world.

That’s what I’m hoping. My world is one in which I have to find important problems quickly for my clients, without wasting my client’s time or money. In my world, I have to develop an understanding of what I’m testing, and I have to do it quickly.

I’ve learned that specifications are rarely reliable, consistent, or up to date, and that the development of the product has often raced ahead of people’s capacity to document it. In my world, I learn most rapidly about products and tools by interacting with them.

I might learn something about the product by reading about it, but I learn more about the product by talking with people about it, asking lots of questions about it, sketching maps of it, trying to describe it. I learn even more about it by testing it, and by describing what I’ve found and how I’ve tested. I don’t structure my testing around test cases, since my focus is on discovering important problems in the product—problems that may not be reflected in scripted procedures that are expensive to prepare and maintain. At best, test cases and documents might help, maybe, but in my world, I find problems mostly because of what I think and what I do with the product.

In my world, it would be fantasy to think that a process model or a document—rather than the tester—is central to excellent testing (just as only in my fantasy world could a document—rather than the manager—be central to excellent management). In my world, people are at the centre of the things that people do.

In my version of a fantasy world, one could believe that conservative confirmatory testing is enough to show that the product fulfills its goals without significant problems for the end user. In my world, we must explore and investigate to discover unanticipated problems and risks.

If I wanted to do fake testing, I would foster the appearance of productivity by creating elaborate, highly-polished documentation that doesn’t help us to do work more effectively and more efficiently. But I don’t want to do that. In my world, doing excellent testing takes precedence over writing about how we intend—or pretend—to do testing.

In my version of a dystopic fantasy world, it would be okay to accept numbers without question or challenge, even if the number had little or no relationship to what supposedly was being measured. In my world, quantitative models allow us to see some things more clearly while concealing other things that might be important. So in my world, it’s a good idea to evaluate our numbers and our models—and our feelings about them—critically to reduce the chance that we’ll mislead ourselves and others.

I could fantasize about a world in which it would be obvious that numbers should drive decisions. In what looks like the real world to me, it’s safer to use numbers to help question our feelings and our reasoning.

Are you studying testing and building your skills? Are you learning about and using approaches that qualitative researchers use to construct and describe their work? If you’re using statistics to describe the product or the project, are you considering the validity of your constructs—and threats to validity?

Are you considering how you know what you know? Are you building and refining a story of the product, the work you’re doing, and the quality of that work? If you’re creating test tools, are you studying programming and using the approaches that expert programmers use? Are you considering the cost and value of your activities and the risks that might affect them? Are you looking only at the functional aspects of the product, or are you learning about how people actually use the product to get real work done?

Real-world people doing real-world jobs—research scientists, statisticians, journalists, philosophers, programmers, managers, subject matter experts—do these things. I believe I can learn to do them, and I’m betting you could learn them too. It’s a big job to learn all this stuff, but learning—for ourselves and for others—is the business that I think we testers are in, in my real world. Really.

Is your testing bringing people closer to what you would consider a real understanding of the real product—especially real problems and real risks in the product—so that they can make informed decisions about it? Or is it helping people to sustain ideas that you would consider fantasies?

Very Short Blog Posts (16): Usability Problems Are Probably Testability Problems Too

Wednesday, April 16th, 2014

Want to add oomph to your reports of usability problems in your product? Consider that usability problems also tend to be testability problems, and vice versa.

The design of the product may make it frustrating, inconsistent, slow, or difficult to learn. Poor affordances may conceal useful features and shortcuts. Missing help files could fail to address confusion; self-contradictory or misleading help files could add to it. All of these things may threaten the value of the product for the intended users.

Bad as they might be, problems like this may also represent issues for testing. A product with a slick and speedy user interface is more likely to be a pleasure to test. Clumsy or demotivating user interfaces present issues that may make testing harder or slower—and issues give bugs more time and more opportunity to hide.

Now: consider that the opposite applies too. If you’re having a hard time testing your product, what does that say about people’s experiences when they try to use it?

Related post: You’ve Got Issues

(This post was updated 2020-12-10.)

I’ve Had It With Defects

Wednesday, April 2nd, 2014

The longer I stay in the testing business and reflect on the matter, the more I believe the concept of “defects” to be unclear and unhelpful.

A program may have a coding error that is clearly inconsistent with the program’s specification, whereupon I might claim that I’ve found a defect. The other day, an automatic product update failed in the middle of the process, rendering the product unusable. Apparently a defect. Yet let’s look at some other scenarios.

  • I perform a bunch of testing without seeing anything that looks like a bug, but upon reviewing the code, I see that it’s so confusing and unmaintainable in its current state that future changes will be risky. Have I found a defect? And how many have I found?
  • I observe that a program seems to be perfectly coded, but to a terrible specification. Is the product defective?
  • A program may be perfectly coded to a wonderfully written specification—even though the writer of the specification may have done a great job at specifying implementation for a set of poorly conceived requirements. Should I call the product defective?
  • Our development project is nearing release, but I discover a competitive product with this totally compelling feature that makes our product look like an also-ran. Is our product defective?
  • Half the users I interview say that our product should behave this way, saying that it’s ugly and should be easier to learn; the other half say it should behave that way, pointing out that looks don’t matter, and once you’ve used the product for a while, you can use it quickly and efficiently. Have I identified a defect?
  • The product doesn’t produce a log file. If there were a log file, my testing might be faster, easier, or more reliable. If the product is less testable than it could be, is it defective?
  • I notice that the Web service that supports our chain of pizza stores slows down noticeably at dinner time, when more people are logging in to order. I see a risk that if business gets much better, the site may bog down sufficiently that we may lose some customers. But at the moment, everything is working within the parameters. Is this a defect? If it’s not a defect now, will it magically change to a defect later?

On top of all this, the construct “defect” is at the centre of a bunch of unhelpful ideas about how to measure the quality of software or of testing: “defect count”; “defect detection rate”; “defect removal efficiency”. But what is a defect? If you visit LinkedIn, you can often read some school-marmish clucking about defects. People who talk about defects seem to refer to things that are absolutely and indisputably wrong with the product. Yet in my experience, matters are rarely so clear. If it’s not clear what is and is not a defect, then counting them makes no sense.

That’s why, as a tester, I find it much more helpful to think in terms of problems. A problem is “a difference between what is perceived and what is desired” or “an undesirable situation that is significant to and maybe solvable by some agent, though probably with some difficulty”. (I’ve written more about that here.) A problem is not something that exists in the software as such; a problem is relative, a relationship between the software and some person(s). A problem may take the form of a bug—something that threatens the value of the product—or an issue—something that threatens the value of the testing, or of the project, or of the business.

As a tester, I do not break the software. As a reminder of my actual role, I often use a joke that I heard attributed to Alan Jorgenson, but which may well have originated with my colleague James Bach: “I didn’t break the software; it was broken when I got it.” That is, rather than breaking the software, I find out how and where it’s broken. But even that doesn’t feel quite right. I often find that I can’t describe the product as “broken” per se; yet the relationship between the product and some person might be broken. I identify and illuminate problematic relationships by using and describing oracles, the means by which we recognize problems as we’re testing.

Oracles are not perfect and testers are not judges, so to me it would seem presumptuous of me to label something a defect. As James points out, “If I tell my wife that she has a defect, that is not likely to go over well. But I might safely say that she is doing something that bugs me.” Or as Cem Kaner has suggested, shipping a product with known defects means shipping “defective software”, which could have contractual or other legal implications (see here and here, for examples).

On the one hand, I find that “searching for defects” seems too narrow, too absolute, too presumptuous, and politically risky for me. On the other, if you look at the list above, all those things that were questionable as defects could be described more easily and less controversially as problems that potentially threaten the value of the product. So “looking for problems” provides me with wider scope, recognizes ambiguity, encourages epistemic humility, and acknowledges subjectivity. That in turn means that I have to up my game, using many different ways to model the product, considering lots of different quality criteria, and looking not only for functional problems but anything that might cause loss, harm, or annoyance to people who matter.

Moreover, rejecting the concept of defects ought to help discourage us from counting them. Given the open-ended and uncertain nature of “problem”, the idea of counting problems would sound silly to most people—but we can talk about problems. That would be a good first step towards solving them—addressing some part of the difference between what is perceived and what is desired by some person or persons who matter.

That’s why I prefer looking for problems—and those are my problems with “defects”.

Very Short Blog Posts (15): “Manual” and “Automated” Testers

Tuesday, April 1st, 2014

“Help Wanted. Established scientific research lab seeks Intermediate Level Manual Scientist. Role is intended to complement our team of Automated and Semi-Automated Scientists. The successful candidate will perform research and scientific experiments without any use of tools (including computer hardware or software). Requires good communication skills and knowledge of the Hypothesis Development Life Cycle. Bachelor’s degree or five years of experience in manual science preferred.”

Sounds ridiculous, doesn’t it? It should.

Related post:

“Manual” and “Automated” Testing