Blog Posts from September, 2011

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, a pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL http://www.pmhut.com) answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask if checking is being done automatically by the programmers at lower levels where it tends to be fast, cheap, easy, useful and timely—or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, and less timely—and tedious?
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, pages 97-99), in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail”. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.

The Cooking Detector

Friday, September 23rd, 2011

A heuristic is a fallible method for solving a problem or making a decision. “Heuristic” as an adjective means “something that helps us to learn”. In testing, an oracle is a heuristic principle or mechanism by which we recognize a problem.

Some years ago, during a lunch break from the Rapid Software Testing class, a tester remarked that he was having a good time, but that he wanted to know how to get over the boredom that he experienced whenever he was testing. I suggested to him that if he found that testing was boring, something was wrong, and that he could consider the boredom as something we call a trigger heuristic. A trigger heuristic is like an alarm clock for a slumbering mind. Emotions are natural trigger heuristics, nature’s way of stirring us to wake up and pay attention. What was the boredom signalling? Maybe he was covering ground that he had covered before. Maybe the risks that he had in mind weren’t terribly significant, and other, more important risks were looming. Maybe the work he was doing was repetitive and mechanical, better left to a machine.

Somewhat later, I realized that every time I had seen a bug in a piece of software, an emotion had been involved in the discovery. Surprise naturally suggested some kind of unexpected outcome. Amusement followed an observation of something that looked silly and that posed a threat to someone’s image. Frustration typically meant that I had been stymied in something that I wanted to accomplish.

There is a catch with emotions, though: they don’t tell you explicitly what they’re about. In that, they’re like this device we have in our home. It’s mounted in a hallway, and it’s designed to alert us to danger. It does that heuristically: it emits a terrible, piercing noise whenever I’m baking bread or broiling a steak. And that’s why, in our house, we call it the cooking detector. The cooking detector, as you may have guessed, came in a clear plastic package labelled “smoke detector”.

When the cooking detector goes off, it startles us and definitely gets our attention. When that happens, we make more careful observations (look around; look at the oven; check for a fire; observe the air around us). We determine the meaning of our observations (typically “there’s enough smoke to set off the cooking detector, and it’s because we’re cooking”); and we evaluate the significance of them (typically, “no big deal, but the noise is enough to make us want to do something”). Whereupon we perform some appropriate control action: turn on the fan over the stove, open a door or a window, turn down the oven temperature, mop up any oil that has spilled inside the oven, check to make sure that the steak hasn’t caught fire. Oh, and reset the damned cooking detector.

Notice that the package says “smoke detector”, not “fire detector”. The cooking detector apparently can’t detect fires. Indeed, on the two occasions that we’ve had an actual fire in the kitchen (once in the oven and once in the toaster oven), the cooking detector remained resolutely and ironically silent. We were already in the kitchen, and noticed the fires and put them out before the cooking detector detected the smoke. Had one of the fires got bad enough, I’m reasonably certain the cooking detector would have squawked eventually. That’s a good thing. Even though our wiring is in good shape, we don’t smoke, and the kids are fire-aware, one never knows what could happen. The alarm could give us a chance to extinguish a fire early, to help to reduce damage, or to escape life-threatening danger.

The cooking detector is like a programmer’s unit test—an automated check. It makes a low-level, one-dimensional, one-bit observation: smoke, or no smoke. It’s oblivious to any threat that doesn’t manifest itself as smoke, such as the arrival of a burglar or a structural weakness in the building. The maximum value of the cooking detector is unlikely to be realized. It occasionally raises a ruckus, and when it does, it doesn’t tell us what the ruckus is about. Usually it’s for something that we can understand, explain, and deal with quickly and easily. Smoke doesn’t automatically mean fire. The cooking detector is an oracle, a device that provides a heuristic trigger, and heuristic devices are fallible. The cooking detector doesn’t tell us that there is a problem; only that there might be a problem. We have to figure out whether there’s a problem, and if so, what the problem is.
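
To make the analogy a little more concrete, here’s a minimal sketch, in Python, of what such a one-bit check might look like. Everything in it (the names, the data, the threshold) is hypothetical, invented purely for illustration:

    # A one-bit observation, in the spirit of the cooking detector: the check
    # reports "alarm" or "no alarm", and nothing more. It cannot tell us what
    # the alarm is about, and it is oblivious to any problem that doesn't
    # manifest itself as "smoke". All names and values here are hypothetical.

    def smoke_level(room):
        # A stand-in for some low-level, one-dimensional observation.
        return room.get("smoke_particles", 0)

    def smoke_check(room, threshold=100):
        # True means "there MIGHT be a problem"; it does not mean "fire".
        return smoke_level(room) > threshold

    kitchen = {"smoke_particles": 250}    # broiling a steak, perhaps
    if smoke_check(kitchen):
        print("Alarm! Someone should investigate and decide what it means.")

A silent check is just as uninformative, of course: “no smoke” doesn’t mean “no burglar”.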

Yet the cooking detector comes at low cost. It didn’t cost much to buy, it takes one battery a year, and it’s easy to reset. More importantly, the problem to which it alerts us is a potentially terrible one. Although it doesn’t tell us what the problem is, it tells us to pay attention so that we can investigate and decide on what to do, before a problem gets serious without our notice. Smoke doesn’t automatically mean fire, but it does mean smoke. Where there’s smoke, maybe there’s fire, or maybe there’s something else that’s unpleasant or dangerous. The cooking detector reminds us to check the steak, open the windows, clean the oven every once in a while, evaluate what’s going on. I don’t believe that the cooking detector will ever detect a real, serious problem that we don’t know about already—but I’m not prepared to bet my family’s life on that.

At Least Three Good Reasons for Testers to Learn to Program

Tuesday, September 20th, 2011

There is a common claim, especially in the Agile community, that all testers should be able to write programs. I don’t think that’s the case. In the Rapid Software Testing class, James Bach and I say that testing is “questioning a product in order to evaluate it”. That’s the short form of my personal definition of testing, “investigation of people, software, computers, and related goods and services, and the relationships between them”. Most people who use computer programs are not computer programmers, and there are many ways in which a tester can question and investigate a product without programming.

Yet there are at least three very good reasons why it might be a good idea for a tester to learn to program.

Tooling. Computer programs extend our capacity to sense what’s happening, to make decisions, and to perform actions. There are many wonderful packaged tools, in dozens of categories, available to us testers. Every tool offers some level of control through a variety of affordances. Some provide restrictive controls, like text fields or drop-down lists or check boxes. Other tools provide macros so that we can string together sequences of actions. Some tools come with full-blown scripting languages that provide the capacity for the tester to sense, decide, and act through the tool in very flexible and specific ways. There are also, of course, general-purpose programming and scripting languages in their own right. When we can use programming concepts and programming languages, we have a more powerful and adaptable tool set for our investigations.
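
As a sketch of what that flexibility can look like, here are a few lines of Python that sense, decide, and act. The URL and the thresholds are assumptions, invented for illustration; a real investigation would point at a real product and at risks someone actually cares about:

    # Sense, decide, act with a general-purpose language. The URL and the
    # thresholds below are hypothetical, chosen only to illustrate the pattern.
    import time
    import urllib.request

    URL = "http://example.com/"    # a stand-in for the product under test

    start = time.time()
    with urllib.request.urlopen(URL) as response:    # sense
        status = response.status
        body = response.read()
    elapsed = time.time() - start

    problems = []                                    # decide
    if status != 200:
        problems.append("unexpected status %s" % status)
    if elapsed > 2.0:
        problems.append("response took %.1f seconds" % elapsed)
    if not body:
        problems.append("empty response body")

    for problem in problems:                         # act
        print("Possibly worth investigating:", problem)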

Insight. When we learn to program, we develop understanding about the elements and construction of programs and the computers on which they run. We learn how data is represented inside the computer, and how bits can be interpreted and misinterpreted. We learn about flow control, decision points, looping, and branching—and how mistakes can be made. We might even be able to read the source code of the programs that we’re testing, which can be valuable in review, troubleshooting, or debugging. Even if we never see our programs’ source code, when we learn about how programs work, we gain insight into how they might not work.
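
Here’s a tiny example of that kind of insight, using nothing but Python’s standard struct module: the very same four bytes yield wildly different values depending on the interpretation a program applies to them.

    # The same four bytes, interpreted three different ways.
    import struct

    bits = struct.pack("<i", 1000000000)   # pack a little-endian 32-bit integer

    print(struct.unpack("<i", bits)[0])    # 1000000000: the intended reading
    print(struct.unpack("<f", bits)[0])    # ~0.0047: the same bits read as a float
    print(struct.unpack(">i", bits)[0])    # 13277755: the byte order misread

Misinterpretations like the second and third are the stuff of real bugs: type confusion, endianness mix-ups, character encoding errors.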

Humility. Dan Spear, a great programmer at Quarterdeck and a good friend, once pointed out to me that programming a computer is one of the most humbling experiences available to us. When you program a computer to perform some task, it will reflect—often immediately and often dramatically—any imprecision in your instructions and your underlying ideas. When we learn to program, we get insight not only into how a program works, but also into how difficult programming can be. This should trigger respect for programmers, but it should also trigger something else: empathy, which—as Jerry Weinberg says—is the first and most important aspect of the emotional set and setting for the tester.

Testing Problems Are Test Results

Tuesday, September 6th, 2011

I often do an exercise in the Rapid Software Testing class in which I ask people to catalog things that, for them, make testing harder or slower. Their lists fit a pattern I hear over and over from testers (you can see an example of the pattern in this recent question on Stack Exchange). Typical points include:

  • I’m a tester working alone with several programmers (or one of a handful of testers working with many programmers).
  • I’m under enormous time pressure. Builds are coming in continuously, and we’re organized on one- or two-week development cycles.
  • The product(s) I’m testing is (are) very complex.
  • There are many interdependencies between modules within the product, or between products.
  • I’m seeing a consistent pattern of failures specifically related to those interdependencies; the tiniest change here can have devastating impact there—or anywhere.
  • I believe that I have to run a complete regression test on every build to try to detect those failures.
  • I’m trying to cope by using automated checks, but the complexity makes the automation difficult, the program’s testing hooks are minimal at best, and frequent product changes make the whole relationship brittle.
  • The maintenance effort for the test automation is significant, at a cost to other testing I’d like to do.
  • I’m feeling overwhelmed by all this, but I’m trying to cope.

On top of that,

  • The organization in which I’m working calls itself Agile.
  • Other than the two-week iterations, we’re actually using at most two other practices associated with Agile development (typically daily scrums or Kanban boards).

Oh, and for extra points,

  • The builds that I’m getting are very unstable. The system falls over under the most basic of smoke tests. I have to do a lot of waiting or reconfiguring or both before I can even get started on the other stuff.

How might we consider these observations?

We could choose to interpret them as problems for testing, but we could think of them differently: as test results.

Test results don’t tell us whether something is good or bad, but they may inform a decision or an evaluation or more questions. People observe test results and decide whether there are problems and what the problems are, what further questions are warranted, and what decisions should be made. Doing that requires human judgement and wisdom, consideration of lots of factors, and a number of possible interpretations.

Just as for automated checks and other test results, it’s important to consider a variety of explanations and interpretations for testing meta-results—observations about testing. If we don’t do that, we risk missing important problems that threaten the quality of the testing effort, and the quality of the product, too.

As Jerry Weinberg points out in Perfect Software and Other Illusions About Testing, whatever else something might be, it’s information. If testing is, as Jerry says, gathering information with the intention of informing a decision, it seems a mistake to leave potentially valuable observations lying around on the floor.

We often run into problems when we test. But instead of thinking of them as problems for testing, we could also choose to think of them as symptoms of product or project problems—problems that testing can help to solve.

For example, when a tester feels outnumbered by programmers, or when a tester feels under time pressure, that’s a test result. The feeling often comes from the programmers generating more work and more complexity than the tester can handle without help. Yet complexity, like quality, is a relationship between some person and something else. Complexity on its own isn’t necessarily a problem, but the way people react to it might be. When we observe the ways in which people react to perceived complexity and risk, we might learn a lot.

  • Do we, as testers, help people to become conscious of the risks—especially the Black Swans—that typically accompany complexity?
  • If people are conscious of risk, are they paying attention to it? Are they panicking over it? Or are they ignoring it and whistling past the graveyard? Or…
  • Are people reacting calmly and pragmatically? Are they acknowledging and dealing with the complexity of the product?
  • If they can’t make the product or the process that it models less complex, are they at least taking steps to make that product or process easier to understand?
  • Might the programmers be generating or modifying code so quickly that they’re not taking the time to understand what’s really going on with it?
  • If someone feels that more testers are needed, what’s behind that feeling? (I took a stab at an answer to that question a few years back.)

How might we figure out answers to those questions? One way might be to look at more of the test results and test meta-results.

  • Does someone perceive testing to be difficult or time-consuming? Who?
  • What’s the basis for that perception? What assumptions underlie it?
  • Does the need to investigate and report bugs overwhelm the testers’ capacity to obtain good test coverage? (I wrote about that problem here.)
  • Does testing reveal consistent patterns of failure?
  • Are programmers consistently surprised by such failures and patterns?
  • Do small changes in the code cause problems that are disproportionately large or hard to find?
  • Do the programmers understand the product’s interdependencies clearly? Are those interdependencies necessary, or could they be eliminated?
  • Are programmers taking steps to anticipate or prevent problems related to interfaces and interactions?
  • If automated checks are difficult to develop and maintain, does that say something about the skill of the tester, the quality of the automation interfaces, or the scope of checks? Or about something else?
  • Do unstable builds get in the way of deeper testing?
  • Could we interpret “unstable builds” as a sign that the product has problems so numerous and serious that even shallow testing reveals them?
  • When a “stable” build appears after a long series of unstable builds, how stable is it really?

Perhaps, with the answers to those questions, we could raise even more questions.

  • What risks do those problems present for the success of the product, whether in the short term or the longer term?
  • When testing consistently reveals patterns of failures and attendant risk, what does the product team do with that information?
  • Are the programmers mandated to deliver code? Or are the programmers mandated to deliver code with a warrant that the code does what it should (and doesn’t do what it shouldn’t), to the best of their knowledge? Do the programmers adamantly prefer the latter mandate?
  • Is someone pressuring the programmers to make schedule or scope commitments that they can’t really fulfill?
  • Are the programmers and the testers empowered to push back on scope or schedule pressure when it adds to product or project risk?
  • Do the business people listen to the development team’s concerns? Are they aware of the risks that testers and programmers bring to their attention? When the development team points out risks, do managers and business people deal with them congruently?
  • Is the team working at a sustainable pace? Or are the product and the project being overwhelmed by complexity, interdependencies, fragility, and problems that lurk just beyond the reach of our development and testing effort?
  • Is the development team really Agile, in the sense of the precepts of the Agile Manifesto? Or is “agility” being used in a cargo-cult way, using practices or artifacts to mask over an incoherent project?

Testers often feel that their role is to find, investigate, and report on bugs in a running software product. That’s usually true, but it’s also a pretty limited view of what testers could test. A product can be anything that someone has produced: a program, a requirements document, a diagram, a specification, a flowchart, a prototype, a development process model, a development process, an idea. Testing can reveal information about all of those things, if we pay attention.

When seen one way, the problems that appear at the top of this article look like serious problems for testing. They may be, but they’re more than that too. When we remember Jerry’s definition of testing as “gathering information with the intention of informing a decision”, then everything that we notice or discover during testing is a test result.

(See also this discussion for an example of looking beyond the test result for possible product and project risks.)

This post was edited in small ways, for clarity, on 2017-03-11.

Can You Test a Clock in a Sealed Box?

Friday, September 2nd, 2011

A while ago, James Bach and I did a transpection session. The object of the conversation was to think critically about the common trope that every test consists of at least an input and an expected result. We wanted to go deeper than that, and in the process we discovered a number of useful ideas. A test can be informed by an expectation, but oracles can also be developed on the fly. Oracles can also be applied retrospectively, after the test has been “completed”, such that you never know when a test ends. James has a wonderful example of that here. We also came up with the notion of implicit and explicit inputs, and symbolic and non-symbolic inputs.

As the basis of our chat, James presented the thought experiment of testing a clock that you can’t get at. Just recently my friend Adam White pointed me to this little gem. Enjoy!