Blog Posts for the ‘Systems Thinking’ Category

Very Short Blog Posts (21): You Had It Last!

Tuesday, November 4th, 2014

Sometimes testers say to me “My development team (or the support people, or the managers) keeping saying that any bugs in the product are the testers’ fault. ‘It’s obvious that any bug in the product is the tester’s responsibility,’ they say, ‘since the tester had the product last.’ How do I answer them?”

Well, you could say that the product’s problems are the responsibility of the tester because the tester had the product last—and that a successful product was successful because the programmers and the business people did such a good job at preventing bugs. But that would be to explain any failures in the product in one way, and to explain any successes in the product in a completely different way.

The idea that testers are responsible for bugs because “the testers had it last” is something Jerry Weinberg once called the “Hot Potato Theory of Software Development”. This suggests a way around the problem—or at least to highlight how silly the theory is. If you’re a tester, send a message to the developers and the product managers noting that you’ve finished testing, and ask them to take on final look at the product before they decide to ship it. Then they had it last.

Instead, let’s be consistent. Testers don’t put the bugs in, and testers miss some of the bugs because bugs are, by their nature, hidden. Moreover, the bugs are hidden so well that not even the people who put them in could find them. The bugs are hidden by people, and by the consequences of how we choose to do software development.

So let’s all work to prevent the bugs, and to find them more quickly. Let’s talk about problems in development that allow bugs to hide. Let’s all work on testability, so that we can find bugs earlier, and more easily, before the bugs have a chance to hide deeply. And let’s all share responsibility for our failures and our successes.

This post, original published in 2014. got a little update in December 2021.

Testing is…

Tuesday, October 28th, 2014

Every now and again, someone makes some statement about testing that I find highly questionable or indefensible, whereupon I might ask them what testing means to them. All too often, they’re at a loss to reply because they haven’t really thought deeply about the matter; or because they haven’t internalized what they’ve thought about; or because they’re unwilling to commit to any statement about testing. And then they say something vague or non-committal like “it depends” or “different things to different people” or “that’s a matter of context”, without suggesting relevant dependencies, people, or context factors.

So, for those people, I offer a set of answers from which they can choose one; or they can adopt the entire list wholesale; or they use one or more items as a point of departure for something of their own invention. You don’t have to agree with any of these things; in that case, invent your own ideas about testing from whole cloth. But please: if you claim to be a tester, or if you are making some claim about testing, please prepare yourself and have some answer ready when someone asks you “what is testing?”. Please.

Here are some possible replies; I believe everything is Tweetable, or pretty close.

  • Testing is—among other things—reviewing the product and ideas and descriptions of it, looking for significant and relevant inconsistencies.
  • Testing is—among other things—experimenting with the product to find out how it may be having problems—which is not “breaking the product”, by the way.
  • Testing is—among other things—something that informs quality assurance, but is not in and of itself quality assurance.
  • Testing is—among other things—helping our clients to make empirically informed decisions about the product, project, or business.
  • Testing is—among other things—a process by which we systematically examine any aspect of the product with the goal of preventing surprises.
  • Testing is—among other things—a process of interacting with the product and its systems in many ways that challenge unwarranted optimism.
  • Testing is—among other things—observing and evaluating the product, to see where all those defect prevention ideas might have failed.
  • Testing is—among other things—a special part of the development process focused on discovering what could go badly (or what is going badly).
  • Testing is—among other things—exploring, discovering, investigating, learning, and reporting about the product to reveal new information.
  • Testing is—among other things—gathering information about the product, its users, and conditions of its use, to help defend value.
  • Testing is—among other things—raising questions to help teams to develop products that more quickly and easily reveal their own problems.
  • Testing is—among other things—helping programmers and the team to learn about unanticipated aspects of the product we’re developing.
  • Testing is—among other things—helping our clients to understand the product they’ve got so they can decide if it’s the product they want.
  • Testing is—among other things—using both tools and direct interaction with the product to question and evaluate its behaviours and states.
  • Testing is—among other things—exploring products deeply, imaginatively, and suspiciously, to help find problems that threaten value.
  • Testing is—among other things—performing actual and thought experiments on products and ideas to identify problems and risks.
  • Testing is—among other things—thinking critically and skeptically about products and ideas around them, with the goal of not being fooled.
  • Testing is—among other things—evaluating a product by learning about it through exploration, experimentation, observation and inference.

You’re welcome.

Harry Collins and The Motive for Distinctions

Monday, March 3rd, 2014

“Computers and their software are two things. As collections of interacting cogs they must be ‘checked’ to make sure there are no missing teeth and the wheels spin together nicely. Machines are also ‘social prostheses’, fitting into social life where a human once fitted. It is a characteristic of medical prostheses, like replacement hearts, that they do not do exactly the same job as the thing they replace; the surrounding body compensates.

“Contemporary computers cannot do just the same thing as humans because they do not fit into society as humans do, so the surrounding society must compensate for the way the computer fails to reproduce what it replaces. This means that a complex judgment is needed to test whether software fits well enough for the surrounding humans to happily ‘repair’ the differences between humans and machines. This is much more than a matter of deciding whether the cogs spin right.”

—Harry Collins

Harry Collins—sociologist of science, author, professor at Cardiff University, a researcher in the fields of the public understanding of science, the nature of expertise, and artificial intelligence—was slated to give a keynote speech at EuroSTAR 2013. Due to illness, he was unable to do so. The quote above is the abstract from the talk that Harry never gave. (The EuroSTAR community was very lucky and grateful to have his colleague, Rob Evans, step in at the last minute with his own terrific presentation.)

Since I was directed to Harry’s work in 2010 (thank you, Simon Schaffer), James Bach and I have been galvanized by it. As we’ve been trying to remind people for years, software testing is a complex, cognitive, social task that requires skill, tacit knowledge, and many kinds of expertise if we want people to do it well. Yet explaining testing is tricky, precisely because so much of what skilled testers do is tacit, and not explicit; learned by practice and by immersion in a culture, not from documents or other artifacts; not only mechanical and algorithmic, but heuristic and social.

Harry helps us by taking a scalpel to concepts and ideas that many people consider obvious or unimportant, and dissecting those ideas to reveal the subtle and crucial details under the surface.

As an example, in Tacit and Explicit Knowledge, he takes the idea of tacit knowledge—formerly, any kind of knowledge that was not told—and divides it into three kinds: relational, the kind of knowledge that resides in an individual human mind, and that in general could be told; somatic, resident in the system of a human body and a human mind; and collective, residing in society and in the ever-changing relationships between people in a culture.

How does that matter? Consider the Google car. On the surface, operating a car looks like a straightforward activity, easily made explicit in terms of the laws of physics and the rules of the road. Look deeper, and you’ll realize that driving is a social activity, and that interaction between drivers, cyclists, and other pedestrians is negotiated in real time, in different ways, all over the world.

So we’ve got Google cars on the road experimentally in California and Washington; how will they do in Beijing, in Bangalore, or in Rome? How will they interact with human drivers in each society? How will they know, as human drivers do, the extent to which it is socially acceptable to bend the rules—and socially unacceptable not to bend them?

In many respects, machinery can do far better than humans in the mechanical aspects of driving. Yet testing the Google car will require far more than unit checks or a Cucumber suite—it will require complex evaluation and judgement by human testers to see whether the machinery—with no awareness or understanding of social interactions, for the foreseeable future—can be accommodated by the surrounding culture.

That will require a shift from the way testing is done at Google according to some popular stories. If you want to find problems that matter to people before inflicting your product on them, you must test—not only the product in isolation, but in its relationships with other people.

In Rapid Software Testing, our goal all the way along has been to probe into the nature of testing and the way we talk about it, with the intention of empowering people to do it well. Part of this task involves taking relational tacit knowledge and making it explicit. Another part involves realizing that certain skills cannot be transferred by books or diagrams or video tutorials, but must be learned through experience and immersion in the task. Rather than hand-waving about “intuition” and “error guessing”, we’d prefer to talk about and study specific, observable, trainable, and manageable skills.

We could talk about “test automation” as though it were a single subject, but it’s more helpful to distinguish the many ways that we could use tools to support and amplify our testing—for checking specific facts or states, for generating data, for visualization, for modeling, for coverage analysis… Instead of talking about “automated testing” as though machines and people were capable of the same things, we’d rather distinguish between checking (something that machines can do, an activity embedded in testing) and testing (which requires humans), so as to make both our checking and our testing more powerful.

The abstract for Prof. Collins’ talk, quoted above, is an astute, concise description of why skilled testing matters. It’s also why the distinction between testing and checking matters, too. For that, we are grateful.

There will be much more to come in these pages relating Harry’s work to our craft of testing; stay tuned. Meanwhile, I give his books my highest recommendation.

Tacit and Explicit Knowledge
Rethinking Expertise (co-authored with Rob Evans)
The Shape of Actions: What Humans and Machines Can Do (co-authored with Martin Kusch)
The Golem: What You Should Know About Science (co-authored with Trevor Pinch)
The Golem at Large: What You Should Know About Technology (co-authored with Trevor Pinch)
Changing Order: Replication and Induction in Scientific Practice
Artificial Experts: Social Knowledge and Intelligent Machines


Friday, October 28th, 2011

Several years ago, I worked for a few weeks as a tester on a big retail project. The project was spectacularly mismanaged, already a year behind schedule by the time I arrived. Just before I left, the oft-revised target date slipped by another three months. Three months later, the project was deployed, then pulled out of production for another six months to be fixed. Project managers and a CIO, among many others, lost their jobs. The company pinned an eight-figure loss on the project.

The software infrastructure was supplied by a big database company, and the software to glue everything together was supplied by development organization in another country. That software was an embarrassment—bloated, incoherent, hard to use, and buggy. Fixes were rarely complete and often introduced new bugs. At one point during my short tenure, all effective worked stopped for five days because the development organization’s servers crashed and no backups were available. All this despite the fact that the software development company claimed CMMI Level 5.

This morning, I was greeted by a Tweet that said

“Deloittes show how a level 5 CMMi company has bad test process at #TMMi conf in Korea! So CMMi needs TMMi – good.”

The TMMi is the Testing Maturity Model Integration. Here’s what the TMMi Foundation says about it:

“The Test Maturity Model Integration has been developed to complement the existing CMMI framework. It provides a structured presentation of maturity levels, allowing for standard TMMi assessments and certification, enabling a consistent deployment of the standards and the collection of industry metrics.”

Here’s what the SEI—the CMMi’s co-ordinator and sponsor—says about it:

“CMMI (Capability Maturity Model Integration) is a process improvement approach that provides organizations with the essential elements of effective processes, which will improve their performance. CMMI-based process improvement includes identifying your organization’s process strengths and weaknesses and making process changes to turn weaknesses into strengths.”

What conclusions could we draw from these three statements?

If a company has achieved CMMI Level 5, yet has a bad test process, then there’s a logical problem here. Either testing isn’t an essential element of effective processes (in which case the TMMI should be unnecessary) or it is (in which case the SEI’s claim of providing the essential processes is unsupportable).

One clear solution to the problem would be to adjudicate all this by way of a Maturity Model Maturity Model (Integrated), the MMMMI, whereby your organization can determine (in a mature fashion, of course) what essential processes are in the first place. Mind you, that could be flawed too. You’d need a set of essential processes to determine how to determine essential processes, so you’ll also need a Maturity Model Maturity Model Maturity Model (Integrated), an MMMMMMI. And in fairly short order, your organization will disappear up its own ass.

Jerry Weinberg points in a different direction, using very strong language. This is from Quality Software Management, Volume 1: Systems Thinking, p. 21:

…cultural patterns are not more or less mature, they are just more or less fitting. Of course, some people have an emotional need for perfection, and they will impose this emotional need on everything they do. Their comparisons have nothing to do with the organization’s problems, but with their own.

“The quest for unjustified perfection is not mature, but infantile.

“Hitler was quite clear on who was the ‘master race’. His definition of Aryan race was supposed to represent the mature end product of all human history, and that allowed Hitler and the Nazis to justify atrocities on “less mature” cultures such as Gypsies, Catholics, Jews, Poles, Czechs, and anyone else who got in their way. Many would-be reformers of software engineering require their ‘targets’ to confess to their previous inferiority. These little Hitlers have not been very successful.

“Very few healthy people will make such a confession voluntarily, and even concentration camps didn’t cause many people to change their minds. This is not ‘just a matter of words’. Words are essential to any change project because they give us models of the world as it was and as we hope it to be. So if your goal is changing an organization, start by dropping the comparisons such as those implied in the loaded term ‘maturity.'”

It’s time for us, the worldwide testing community, to urge Deloitte, the SEI, the TMMI, and the unfortunate testers in Korea who are presently being exposed to the nonsense to recognize what many of us have known for years: maturity models have it backwards.

Testing: Difficult or Time-Consuming?

Thursday, September 29th, 2011

In my recent blog post, Testing Problems Are Test Results, I noted a question that we might ask about people’s perceptions of testing itself:

Does someone perceive testing to be difficult or time-consuming? Who? What’s the basis for that perception? What assumptions underlie it?

The answer to that question may provide important clues to the way people think about testing, which in turn influences the cost and value of testing.

As an example, an pseudonymous person (“PM Hut”) who is evidently associated with project management in some sense (s/he provides the URL answered my questions above.

Just to answer your question “Does someone perceive testing to be difficult or time-consuming?” Yes, everyone, I can’t think of a single team member I have managed who doesn’t think that testing is time consuming, and they’d rather do something else.

This, alas, isn’t an unusual response. To someone like me who offers help in increasing the value and reducing the cost of testing, it triggers some questions that might prompt reframes or further questions.

  • What do the team members think testing is? Do they think that it’s something ancillary to the project, rather than an essential and integrated aspect of software development? To me, testing is about gathering information and raising awareness that’s essential for identifying product risks and steering the project. That’s incredibly important and valuable.

    So when the team members are driving a car, do they perceive looking out the windshield to be difficult or time-consuming? Do they perceive looking at the dashboard to be difficult or time-consuming? If so, why? What are the differences between the way they obtain awareness when they’re driving a car, versus the way they obtain awareness when they’re contributing to the development of a product or service?

  • Do the team members think testing is the mindless repetition of actions and observation of specific outputs, as prescribed by someone else? If so, I’d agree with them that testing is an unpalatable activity—except I don’t call that testing. I call it checking, and I’d rather let a machine do it. I’d also ask if checking is being done automatically by the programmers at lower levels where it tends to be fast, cheap, easy, useful and timely—or manually at higher levels, where it tends to be slower, more expensive, more difficult, less useful, and less timely—and tedious?
  • Is testing focused mostly on confirmation of things that we already know or hope to be true? Is it mostly focused on the functional aspects of the program (which are amenable to checking)? People tend to find this dull and tedious, and rightly so. Or is testing an active search for new information, problems, and risks? Does it include focus on parafunctional aspects of the product—the things that provide important perceptions of real value to real people? Are the testers given the freedom and responsibility to manage a good deal of their own investigation? Testers tend to find this kind of approach a lot more engaging and a lot more interesting, and the results are typically more wide-ranging, informative, and valuable to programmers and managers.
  • Is testing overburdened by meaningless and valueless paperwork, bureaucracy, and administrivia? How did that come to pass? Are team members aware that there are simple, lightweight, rapid, and highly effective ways of planning, recording, and reporting testing work and project status?
  • Are there political issues? Are testers (or people acting temporarily in a testing role) routinely blown off (as in this example)? Are the nuggets of information revealed by testing habitually dismissed? Is that because testing is revealing trivial information? If so, is there a problem with specific testing skills like modeling the test space, determining coverage, determining oracles, recording, or reporting?
  • Have people been trained on the basis of testing as a skilled, sophisticated thinking art? Or is testing something for which capability can be assessed by a trivial, 40-question multiple choice exam?
  • If testing is being done well (which given people’s attitudes expressed above would be a surprise), are programmers or managers afraid of having to deal with the information that testing reveals? Does that lead to recrimination and conflict?
  • If there’s a perception that testing is by its nature dull and slow, are the testers aware of the quick testing approaches in our Rapid Software Testing class (PDF, page 97-99) , in the Black Box Software Testing course offered by the Association for Software Testing, or in James Whittaker’s How to Break Software? Has anyone read and absorbed Lessons Learned in Software Testing?
  • If there’s a perception that technical reviews are slow, have the testers, programmers, or managers read Perfect Software and Other Illusions About Testing? Do they recognize the ways in which careful observation provides us with “instant reviews” (see Perfect Software, page 143)? Has anyone on the team read any other of Jerry Weinberg’s books on software management and measurement?
  • Have the testers, programmers, and managers recognized the extent to which exploratory testing is going on all the time? Do they recognize that issues revealed by testing might be even more important than bugs? Do they understand that every test result and every testing problem points to meta-information that can be extremely valuable in managing the project?

On PM Hut’s own Web site, there’s an article entitled “Why Project Managers Fail“. The author, Jim Benson, lists five common problems, each of which could be quickly revealed by looking at testing as a source of information, rather than by simply going through the motions. Take it from the former program manager of a product that, in its day, was the best-selling piece of commercial software in the world: testers, testing, and the information they reveal are a project manager’s best friends and most valuable assets—when you have the awareness to recognize them.

Testing need not be difficult, tedious or time-consuming. A perception that it is so, or that it must be so, suggests a problem with testing as practised or testing as perceived. Astute managers and teams will investigate that important and largely mistaken perception.

Testing Problems Are Test Results

Tuesday, September 6th, 2011

I often do an exercise in the Rapid Software Testing class in which I ask people to catalog things that, for them, make testing harder or slower. Their lists fit a pattern I hear over and over from testers (you can see an example of the pattern in this recent question on Stack Exchange). Typical points include:

  • I’m a tester working alone with several programmers (or one of a handful of testers working with many programmers).
  • I’m under enormous time pressure. Builds are coming in continuously, and we’re organized on one- or two-week development cycles.
  • The product(s) I’m testing is (are) very complex.
  • There are many interdependencies between modules within the product, or between products.
  • I’m seeing a consistent pattern of failures specifically related to those interdependencies; the tiniest change here can have devastating impact there—or anywhere.
  • I believe that I have to run a complete regression test on every build to try to detect those failures.
  • I’m trying to cope by using automated checks, but the complexity makes the automation difficult, the program’s testing hooks are minimal at best, and frequent product changes make the whole relationship brittle.
  • The maintenance effort for the test automation is significant, at a cost to other testing I’d like to do.
  • I’m feeling overwhelmed by all this, but I’m trying to cope.

On top of that,

  • The organization in which I’m working calls itself Agile.
  • Other than the two-week iterations, we’re actually using at most two other practices associated with Agile development, (typically) daily scrums or Kanban boards.

Oh, and for extra points,

  • The builds that I’m getting are very unstable. The system falls over under the most basic of smoke tests. I have to do a lot of waiting or reconfiguring or both before I can even get started on the other stuff.

How might we consider these observations?

We could choose to interpret them as problems for testing, but we could think of them differently: as test results.

Test results don’t tell us whether something is good or bad, but they may inform a decision, or an evaluation, or more questions. People observe test results and decide whether there are problems, what the problems are, what further questions are warranted, and what decisions should be made. Doing that requires human judgement and wisdom, consideration of lots of factors, and a number of possible interpretations.

Just as for automated checks and other test results, it’s important to consider a variety of explanations and interpretations for testing meta-results—observations about testing. If we don’t do that, we risk missing important problems that threaten the quality of testing effort, and the quality of the product, too.

As Jerry Weinberg points out in Perfect Software and Other Illusions About Testing, whatever else something might be, it’s information. If testing is, as Jerry says, gathering information with the intention of informing a decision, it seems a mistake to leave potentially valuable observations lying around on the floor.

We often run into problems when we test. But instead of thinking of them as problems for testing, we could also choose to think of them as symptoms of product or project problems—problems that testing can help to solve.

For example, when a tester feels outnumbered by programmers, or when a tester feels under time pressure, that’s a test result. The feeling often comes from the programmers generating more work and more complexity than the tester can handle without help.

Complexity, like quality, is a relationship between some person and something else. Complexity on its own isn’t necessarily a problem, but the way people react to it might be. When we observe the ways in which people react to perceived complexity and risk, we might learn a lot.

  • Do we, as testers, help people to become conscious of the risks—especially the Black Swans—that typically accompany complexity?
  • If people are conscious of risk, are they paying attention to it? Are they panicking over it? Or are they ignoring it and whistling past the graveyard? Or…
  • Are people reacting calmly and pragmatically? Are they acknowledging and dealing with the complexity of the product?
  • If they can’t make the product or the process that it models less complex, are they at least taking steps to make that product or process easier to understand?
  • Might the programmers be generating or modifying code so quickly that they’re not taking the time to understand what’s really going on with it?
  • If someone feels that more testers are needed, what’s behind that feeling? (I took a stab at an answer to that question a few years back.)

How might we figure that out answers to those questions? One way might be to look at more of the test results and test meta-results.

  • Does someone perceive testing to be difficult or time-consuming? Who?
  • What’s the basis for that perception? What assumptions underlie it?
  • Does the need to investigate and report bugs overwhelm the testers’ capacity to obtain good test coverage? (I wrote about that problem here.)
  • Does testing consistently reveal consistent patterns of failure?
  • Are programmers consistently surprised by such failures and patterns?
  • Do small changes in the code cause problems that are disproportionately large or hard to find?
  • Do the programmers understand the product’s interdependencies clearly? Are those interdependencies necessary, or could they be eliminated?
  • Are programmers taking steps to anticipate or prevent problems related to interfaces and interactions?
  • If automated checks are difficult to develop and maintain, does that say something about the skill of the tester, the quality of the automation interfaces, or the scope of checks? Or about something else?
  • Do unstable builds get in the way of deeper testing?
  • Could we interpret “unstable builds” as a sign that the product has problems so numerous and serious that even shallow testing reveals them?
  • When a “stable” build appears after a long series of unstable builds, how stable is it really?

Perhaps, with the answers to those questions, we could raise even more questions.

  • What risks do those problems present for the success of the product, whether in the short term or the longer term?
  • When testing consistently reveals patterns of failures and attendant risk, what does the product team do with that information?
  • Are the programmers mandated to deliver code? Or are the programmers mandated to deliver code with a warrant that the code does what it should (and doesn’t do what it shouldn’t), to the best of their knowledge? Do the programmers adamantly prefer the latter mandate?
  • Is someone pressuring the programmers to make schedule or scope commitments that they can’t really fulfill?
  • Are the programmers and the testers empowered to push back on scope or schedule pressure when it adds to product or project risk?
  • Do the business people listen to the development team’s concerns? Are they aware of the risks that testers and programmers bring to their attention? When the development team points out risks, do managers and business people deal with them congruently?
  • Is the team working at a sustainable pace? Or is the product and the project being overwhelmed by complexity, interdependencies, fragility, and problems that lurk just beyond the reach of our development and testing effort?
  • Is the development team really Agile, in the sense of the precepts of the Agile Manifesto? Or is “agility” being used in a cargo-cult way, using practices or artifacts to mask over an incoherent project?

Testers often feel that their role is to find, investigate, and report on bugs in a running software product. That’s usually true, but it’s also a pretty limited view of what testers could test. A product can be anything that someone has produced: a program, a requirements document, a diagram, a specification, a flowchart, a prototype, a development process mode, a development process, an idea. Testing can reveal information about all of those things, if we pay attention.

When seen one way, the problems that appear at the top of this article look like serious problems for testing. They may be, but they’re more than that too. When we remember Jerry’s definition of testing as “gathering information with the intention of informing a decision”, then everything that we notice or discover during testing is a test result.

Here’s a follow-up to this post. (See also this discussion for an example of looking beyond the test result for possible product and project risks.)

This post was edited in small ways, for clarity, on 2017-03-11.

Can You Test a Clock in a Sealed Box?

Friday, September 2nd, 2011

A while ago, James Bach and I did a transpection session. The object of the conversation was to think critically about the common trope that every test consists of at least an input and an expected result. We wanted to go deeper than that, and in the process we discovered a number of useful ideas. A test can be informed by an expectation, but oracles can also be developed on the fly. Oracles can also be applied retrospectively, after the test has been “completed”, such that you never know when a test ends. James has a wonderful example of that here. We also came up with the notion of implicit and explicit inputs, and symbolic and non-symbolic inputs.

As the basis of our chat, James presented the thought experiment of testing a clock that you can’t get at. Just recently my friend Adam White pointed me to this little gem. Enjoy!

A Few Observations on Structure in Testing

Tuesday, April 12th, 2011

On Twitter, Johan Jonasson reported today that he was about to attend a presentation called “Structured Testing vs Exploratory Testing”. This led to a few observations and comments that I’d like to collect here.

Over the years, it’s been common for people in our community to mention exploratory testing, only to have someone reply, “Oh, so that’s like unstructured testing, right?” That’s a little like someone refer to a cadenza or a musical solo as “unstructured music”; to improv theatre as “unstructured theatre”; to hiking without a map or a precise schedule as “unstructured walking”; to all forms of education outside of a school as “unstructured learning”. When someone says “Exploratory testing is unstructured,” I immediately hear, “I have a very limited view of what ‘structure’ means.”

Typically, by “structured testing”, such people appear to mean “scripted testing”. To me, a script is a set of detailed, ordered set of steps of specific actions and specific observations. Some people call that a procedure, but I think a procedure is characterized by a set of activities that might be but are not necessarily highly specific, and that might be but are not necessarily strictly ordered. So what is structure?

Structure, to me, is any aspect of something (like an object or system) that is required to maintain the thing as it is. You can change some element of a system, add something to it or remove something from it. If the system explodes, or collapses, or changes such that we would consider it a different system, the element or the change was structural. If the system retains its identity, the element or the change wasn’t structural. Someone once described structure to me as “that which remains”.

A system, in James Bach’s useful definition, is a set of things in meaningful relationship. As such, structure is the specific set of things and factors and relationships that make the system a system. Note that the observer is central here: you may see a system where I see only a mess. That is, you see structure, but I don’t. When a system changes, you may call it a different system, where I might see the original system, but modified. In this case, I’m seeing structure that you don’t see or acknowledge as such. We see these things differently because systems, structures, elements, changes, meaningfulness, and relationships are all subject to the Relative Rule: “For any abstract X, X is X to some person.”

One of the principles of general systems thinking is that the extent to which you are able to observe and control a system is limited by the Law of Requisite Variety, coined by W. Ross Ashby: “only variety can destroy variety.” Jurgen Appelo recently produced a marvelous blog post on the subject, which you can read here.

Variety, says Ashby, is the number of distinct states of a system. In order to observe variety, “The observer and his powers of discrimination may have to be specified if the variety is to be well defined.” The observer is limited by the number of states that he (or she or it) can see. This also implies that however many states there are to a system, another system that controls it has to be able to observe that many states and have at least one state in which it can exert control. Wow, that’s pretty heady. What does it mean? The practical upshot is that if you intend to control something, you have to be able to observe it and to change something about it, and that means that you need to have more states available to you than the object has. Or, as Karl Weick put it in Sensemaking in Organizations, “if you want to understand something complicated, you have to complicate yourself.”

This pattern repeats in lots of places. Glenford Myers suggested that testing a program requires more creativity than writing one; more things can go wrong than go right in writing a program. Machines can do a lot of the work of keeping an aircraft aloft, but managing those machines and the work that they do requires more intelligent and more adaptable systems—people. The Law of Requisite Variety also suggests that we can never understand or control ourselves completely, people are insufficient to understand people completely.

So: some people might see exploratory testing as unstructured. I suspect they’re thinking of a particular kind of structure: that scripted procedure that I mentioned above. For me, that suggests an impoverished view of both exploratory testing and of structure. Exploratory testing has tons of structures.

Over the years, many colleagues have been collecting and cataloguing the structures of exploratory testing. James and Jon Bach have led the way, but there have been lots of other contributions from people like Cem Kaner, Mike Kelly, Elisabeth Hendrickson, Michael Hunter, Karen Johnston, Jonathan Kohl, and me (if there’s someone else who should appear in this list, remind me). I’ve collected some of these in this list of structures of exploratory testing. For those who are just joining us, and for the old hands who might need a refresher, you can find my list of exploratory testing structures here.

Programs had structures long before “structured programming” came along. Programming that isn’t structured programing is not “unstructured programming.” The structure in “structured programming” is a particular kind of structure. To many people, there is only one kind of testing structure: scripted procedures. I will guarantee that an incapacity to see alternative kinds of structure will limit your testing. And to paraphrase Weick, if you want to understand testing, you must test yourself.

More of What Testers Find, Part II

Friday, April 1st, 2011

As a followup to “More of What Testers Find“, here are some more ideas inspired by James Bach’s blog post, What Testers Find. Today we’ll talk about risk. James noted that…

Testers also find risks. We notice situations that seem likely to produce bugs. We notice behaviors of the product that look likely to go wrong in important ways, even if we haven’t yet seen that happen. Example: A web form is using a deprecated HTML tag, which works fine in current browsers, but may stop working in future browsers. This suggests that we ought to do a validation scan. Maybe there are more things like that on the site.

A long time ago, James developed The Four-Part Risk Story, which we teach in the Rapid Software Testing class that we co-author. The Four-Part Risk Story is a general pattern for describing and considering risk. It goes like this:

  1. Some victim
  2. will suffer loss or harm
  3. due to a vulnerability in the product
  4. triggered by some threat.

A legitimate risk requires all four elements. A problem is only a problem with respect to some person, so if a person isn’t affected, there’s no problem. Even if there’s a flaw in a product, there’s no problem unless some person becomes a victim, suffering loss or harm. If there’s no trigger to make a particular vulnerability manifest, there’s no problem. If there’s no flaw to be triggered, a trigger is irrelevant. Testers find risk stories, and the victims, harm, vulnerabilities, and threats around which they are built.

In this analysis, though, a meta-risk lurks: failure of imagination, something at which humans appear to be expert. People often have a hard time imagining potentional threats, and discount the possibility or severity of threats they have imagined. People fail to notice vulnerabilities in a product, or having noticed them, fail to recognize their potential to become problems for other people. People often have trouble making the connection between inanimate objects (like nuclear reactor vessels), the commons (like the atmosphere or sea water), or intangible things (like trust) on the one hand, and people who are affected by damage to those things on the other. Excellent testers recognize that a ten-cent problem multiplied by a hundred thousand instances is a ten-thousand dollar problem (see Chapter 10 of Jerry Weinberg’s Quality Software Management, Volume 2: First Order Measurement). Testers find connections and extrapolations for risks.

In order to do all that, we have to construct and narrate and edit and justify coherent risk stories. To to that well, we must (as Jerry Weinberg put it in Computer Programming Fundamentals in 1961) develop a suspicious nature and a lively imagination. We must ask the basic questions about our products and how they will be used: who? what? when? where? why? how? and how much? We must anticipate and forestall future Five Whys by asking Five What Ifs. Testers find questions to ask about risks.

When James introduced me to his risk model, I realized that there people held at least three different but intersecting notions of risk.

  1. A Bad Thing might happen. A programmer might make a coding error. A programming team might design a data structure poorly. A business analyst might mischaracterize some required feature. A tester might fail to investigate some part of the product. These are, essentially, technical risks.
  2. A Bad Thing might have consequences. The coding error could result in miscalculation that misrepresents the amount of money that a business should collect. The poorly designed data structure might lead to someone without authorization getting access to privileged information. The mischaracterized feature might lead to weeks of wasted work until the misunderstanding is detected. The failure to investigate might lead to an important problem being released into production. These are, in essence, business risks that follow from technical risks.
  3. A risk might not be a Bad Thing, but an Uncertain Thing on which the business is willing to take a chance. Businesses are always evaluating and acting on this kind of risk. Businesses never know for sure whether the Good Things about the product are sufficiently compelling for the business to produce it or for people to buy it. Correspondingly, the business might consider Bad Things (or the absence of Good Things) and dismiss them as Not Bad Enough to prevent shipment of the product.

So: Testers find not only risks, but links between technical risk and business risk. Establishing and articulating those links are depend on the related skills of test framing and bug advocacy. Test framing is the set of logical connections that structure and inform a test. Bug advocacy is the skill of determining the meaning and significance of a bug, and reporting the bug in terms of potential risks and consequences that other people might have overlooked. Bug advocacy doesn’t mean jumping up and down and screaming until every bug—or even one particular bug—is fixed. It means providing context for your bug report, helping managers to understand and decide why they might to choose to fix a problem, right now, later, or never.

In my travels around the world and around the Web, I observe that some people in our craft have some fuzzy notions about risk. There are at least three serious problems that I see with that.

Tests are focused on (documented) requirements. That is, test strategies are centred around making sure that requirements are checked, or (in Agile contexts) that acceptance tests derived from user stories pass. The result is that tests are focused on showing that a product can meet some requirement, typically in a controlled circumstance in which certain stated conditions assumed necessary have been met. That’s not a bad thing on its own. Risk, however, lives in places where where necessary conditions haven’t been stated, where stated conditions haven’t been met, or where assumptions have been buried, unfulfilled, or inaccurate. Testing is not only about demonstrating that some instance of a requirement has been satisfied. It’s also about identifying things that threaten the successful fulfillment of that requirement. Testers find alternative ideas about risk.

Tests don’t get framed in terms of important risks. Many organizations and many testers focus on functional correctness. That can often lead to noisy testing—lots of problems reported, where those problems might not be the most important problems. Testers find ways to help prioritize risks.

Important risks aren’t addressed by tests. A focus on stated requirements and functional correctness can leave parafunctional aspects of the product in (at best) peripheral vision. To address that problem, instead of starting with the requirements, start with an idea of a Bad Thing happening. Think of a quality criterion (see this post) and test for its presence or its absences, or for problems that might threaten it. Want to go farther? My colleague Fiona Charles likes to mention “story on the front page of the Wall Street Journal” or “question raised in Parliament” as triggers for risk stories. Testers find ways of developing risk ideas.

James’ post will doubtless trigger more ideas about what testers find. Stay tuned!

P.S. I’ll be at the London Testing Gathering, Wednesday, April 6, 2011 starting at around 6:00pm. It’s at The Shooting Star pub (near Liverpool St. Station), 129 Middlesex St., London, UK. All welcome!

More of What Testers Find

Wednesday, March 30th, 2011

Damn that James Bach, for publishing his ideas before I had a chance to publish his ideas! Now I’ll have to do even more work!

A couple of weeks back, James introduced a few ideas to me about things that testers find in addition to bugs.  He enumerated issues, artifacts, and curios.  The other day I was delighted to find an elaboration of these ideas (to which he added risks and testability issues) in his blog post called What Testers Find.  Delighted, because it notes so many important things that testers learn and report beyond bugs.  Delighted, because it gives me an opportunity and an incentive to dive into James’ ideas more deeply. Delighted, because it gives us all a chance to explore and identify a much richer view of testing than the simplistic notion that “testers find bugs”.

Despite the fact that testers find much more than bugs, let’s start with bugs.  James begins his list of what testers find by saying

Testers find bugs. In other words, we look for anything that threatens the value of the product.

How do we know that something threatens the value of the product?  The fact is, we don’t know for sure.  Quality is value to some person, and different people will have different perceptions of value.  Since we don’t own the product, the project, or the business, we can’t make absolute declarations of whether something is a bug or whether it’s worth fixing.  The programmers, the managers, and the project owner will make those determinations, and often they’re running in different directions.  Some will see a problem as a bug; some won’t.  Some won’t even see a problem. It seems like the only certain thing here is uncertainty.  So what can we testers do?

We find problems that might threaten the value of the product to some person who matters. How do we do that? We identify quality criteria–aspects of the product that provide some kind of value to customers or users that we like, or that help to defend the product from users that we don’t like, such as unethical hackers or fraudsters or thieves.  If we’re doing a great job, we also to account for the fact that users we do like will make mistakes from time to time.  So defending value also means making the product robust to human ineptitude and imperfection.  In the Heuristic Test Strategy Model (which we teach as part of the Rapid Software Testing course), we identify these quality criteria:

  • Capability (or functionality)
  • Reliability
  • Usability
  • Security
  • Scalability
  • Performance
  • Installability
  • Compatibility
  • Supportability
  • Testability
  • Maintainability
  • Portability
  • Localizability

In order to identify threats to the quality of the product, we use oracles.  Oracles are heuristic (useful, fast, inexpensive, and fallible) principles or mechanisms by which we recognize problems.  Most oracles are based on the notion of consistency.  We expect a product to be consistent with

  • History (the product’s own history, prior results from earlier test runs, our experience with the product or other products like it…)
  • Image (a reputation our development organization wants to project, our brand identity,…)
  • Comparable products (products like this one that we develop, competitors’ products, test programs or algorithms,…)
  • Claims (things that important people say about the product, requirements, specifications, user documentation, marketing material,…)
  • User expections (what reasonable people might anticipate the product could or should do, new features, fixed bugs,…)
  • Product (behaviour of the interface and UI elements, values that should be the same in different views,…)
  • Purpose (explicitly stated uses of the product, uses that might be implicit or inferred from the product’s design, no excessive bells and whistles,…)
  • Standards (relevant published guidelines, conventions for use or appearance for products of this class or in this domain, behaviour appropriate to the local market,…)
  • Statutes (relevant laws, relevant regulations,…)

In addition to these consistency heuristics, there’s an inconsistency heuristic too:  we’d like the product to be inconsistent with patterns of problems that we’ve seen before.  Typically those problems are founded in one of the consistency heuristics listed above. Yet it’s perfectly reasonable to observe a problem and recognize it first by its familiarity. We’ve seen lots of testers do that over the years.

We encourage people do come up with their own lists, or modifications to ours. You don’t have to use Heuristic Test Strategy Model if it doesn’t work for you.  You can create your own models for testing, and we actively encourage people who want to become great testers to do that.  Testers find models, ways of looking at the product, the project, and testing itself, in the effort to wrestle down the complexity of the systems we’re testing and the approaches that we need to test them.

In your context, do you see a useful distinction between compatibility (playing nice with other programs that happen to co-exist on the system) and  interoperability (working well with programs with which your application specifically interacts)?  Put interoperability on your quality criteria list.  Is accessibility for disabled users so important for your product that you want to highlight it in a separate quality criterion?  Put it on your list. Recently, James noticed that explicablility is a consistency heuristic that can act as an oracle too:  when we see behaviour we can’t explain or make sense of, we have reason to suspect that there might be a problem.  Testers find factors, relevant and material aspects of our models, products, projects, businesses, and test strategies.

When testers see some inconsistency in the product that threatens one or more of the quality criteria, we report.  For the report to be relevant and meaningful, it must link quality criteria, oracles, and risk in ways that are clear, meaningful, and important to our clients. Rather than simply noticing an inconsistency, we must show why the inconsistency threatens some quality criterion for some person who matters.  Establishing and describing those links in a chain of logic from the test mission to the test result is an activity that James and I call test framing.  So:  Testers find frames, the logical relationships between the test mission, our observations of the product, potential problems, and why we think they might be problems. James gave an example of a bug (“a list of countries in a form is missing ‘France'”). That might mean a minor usabilty problem based on one quality criterion, with a simple workaround (the customer trying to choose a time zone from a list of countries presented as examples; so pick Spain, which is in the same time zone). Based on another criterion like localizability, we’d perceive a more devastating problem (the customer is trying to choose a language, so despite the fact that the Web site has been translated, it won’t be presented in French, cutting our service off from a nation of 65 million people).

In finding bugs, testers find many other things too.  Excellent testing depends on our being able to identify and articulate what we find, how we find it, and how we contextualize it. That’s an ongoing process.  Testers find testing itself.

And there’s more, if you follow the link.