Blog Posts from September, 2015

Oracles from the Inside Out, Part 6: Oracles as Extensions of Testers

Monday, September 21st, 2015

The previous post in this series was something of a diversion from the main thread, but we’ll get back on track in this long-ish post. To review: Marshall McLuhan famously said that “the medium is the message”. He used this snappy slogan to point out that media, tools, and technologies are not only about what they contain; they’re also about themselves, changing the scale, pace, or pattern of human affairs and activities. As he somewhat less famously said, “We shape our tools; thereafter, our tools shape us.”

In the Rapid Software Testing namespace, we recognize the traditional interpretation of an oracle in software testing as “an external mechanism which can be used to check test output for correctness”. W.E. Howden, who introduced the term, said that oracles “can consist of tables, hand calculated values, simulated results, or informal design and requirements descriptions”. But after recognizing that interpretation, we offer a more general and, to us, a more useful notion of “oracle”: a means by which we recognize a problem when we encounter one during testing.

We also make a distinction between testing and checking. Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc. Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product. A check produces a bit—true or false, green or red, pass or fail. A check is a kind of formal testing; testing that is done is a specific way, or with the goal of determining specific facts. Checking is a tactic of testing, but it’s certainly not all there is to testing.

In that light, Howden’s examples of oracles—references—form bases for checks. Add McLuhan’s insights, and we can recognize that checks are media that extend and accelerate our capacity to observe inconsistency with a specific claim (“the product, given these inputs, should produce output consistent with the values in this table”), or with a comparable product (“the product, given this inputs, should produce output consistent with the values produced by this simulator”). So: automated checks are media that extend accelerate our capacity to perform checks; to observe and evaluate the product in accordance with explicit, algorithmic decision rules.

Why is all this a big deal?

One: because media don’t recognize problems; media extend, enhance, enable, accelerate, intensify, or amplify our capability to recognize problems.

My friend Pradeep Soundararajan attracted my attention and respect several years ago by this astute observation: “It is not a test that finds a bug; it is a human that finds a bug and a test plays a role in helping the human find it.” To paraphrase Pradeep, the tool doesn’t recognize the problem; the tool plays a role in recongnizing the problem. The microscope does not see; the microscope helps us to see. The burglar alarm doesn’t detect a burglary; the burglar alarm extends our senses over distance to recognize movement, whereby we can infer that a burglary might be happening. The Wikipedia entry Exploration of Mars errs by saying “The exploration of Mars is the study of Mars by spacecraft.” In fact, the exploration of Mars is the study of Mars by humans, enabled by spacecraft. Tools amplify whatever we are. Tools can extend people’s competence and capabilities to help them focus on what matters, allowing tehm to become aware of important new things. Tools can just as easily extend people’s incompetence and incapabilities to overfocus their attention on the known or the trivial, allowing to be oblivious to important things.

Two: because tools can do so much more for testers than automated checking.

Testing is not simply checking to determine whether we’re getting the right answers. Testing is also about making sure we’re asking important questions—and discovering important questions, and discovering things about how we might develop and apply checks.

For instance: my colleague James Bach was working on testing a medical device a few years back. I’ve done some work with this company too, so in order to respect non-disclosure agreements, let’s just say that the medical device is a Zapper Box, intended to deliver Healing Energy to a patient’s body over an Appropriate Period of Time. The Zapper Box is controlled by a Control Box and its software. Too much of a Good Thing can be a Very Bad Thing so, crucially, the Control Box is also intended to stop the Zapper Box from delivering that energy when the Appropriate Period of Time becomes an Inappropriate Period of Time, whereupon the Healing Energy becomes Killing Energy.

The Control Box has a display that shows the operator the amount of Healing Energy that is supposedly being delivered. James and one of the other testers set up a test rig, a meter to monitor the amount of Healing Energy that was actually being delivered. In one of the tests, they monitored and logged the amount of Healing Energy delivered after the device’s operator turned off the Healing Energy Tap via the Control Box. The log recorded measurements at intervals of one-tenth of a second. The testers did several hundred instances of this test. Then James took the logs, and used Microsoft Excel’s conditional formatting feature to highlight various levels of Healing Energy. Red indicated that the device was delivering Energy that could be described here as Very Hot; yellow represented Somewhat Hot; grey represented Cooling Down; green represented Cool. Then, James took over a hundred lines of data, and shrank the lines down until the numbers were too small to see, such that only two digits are readable: the numbers 1 and 2 labelling the vertical lines, which represent the one- and two-second marks after the Healing Energy Tap had been turned off. In Rapid Software Testing we call this a Blink Test, or using a Blink Oracle. Here’s what the testers saw.

What do you observe? Here’s what I observe:

  • Over about 150 test runs, the level of Energy appears to remain Very Hot for .3 seconds to over a second after the Healing Energy Tap was turned off.
  • Towards the end of the observed period, the device seems to remain in a Very Hot state for longer, and more often. (“Longer” here means tending closer one second than to .3 seconds.)
  • The variance in the Energy during the cooling period seems to be greater than at the beginning.
  • Over time, the level of Energy appears to remain in a Cooling Down state for longer and longer.
  • In at least one instance, the Zapper goes from a Cooling Down state back up to a Somewhat Hot state before returning to a Cooling Down state.
  • Starting from about the middle of the observed period, the Zapper appears many times not to reach the Cool state at all.

As I’m observing these results, two questions are looping in my mind. The first is “Is there a problem here?” The second is “Is my client okay with all this?”

When I see the inconsistency of the results, I experience a vague feeling of unease and confusion, followed by feelings of curiosity. The curiosity is about the product’s behaviour, but it’s also about why I feel uneasy and confused. Notice that my feelings and mental models are internal, tacit, and private, at the moment I have them. If I want to make sense of the relationships between my observations and my feelings, I have to do some detective work; feelings don’t come with return addresses on them.

I pause for a moment, and quickly consider: the cooldown period is not the same every time. I make an inference that that could be a problem, since we typically desire a product’s behaviour to be consistent with itself. Is there a problem here? I wonder if the product has shown a consistent, stable cooldown pattern in the past. If it has, it’s not doing that now. I make an inference that product might be inconsistent with its history. Is there a problem here? There’s a related inference: all other things being equal, we desire a product to be consistent with its history, so an inconsistency with its history would point to a possible problem. Another inference: there may be a document that makes a claim about the behaviour required to fulfill the desire of some important person. Inconsistency with that document (a reference; a medium that extends communication and expression of desire) could represent a problem, since (by inference) inconsistency with the client’s desire would probably be a problem too. I don’t have a copy of such a document. Is my client okay with that? Where is the document? Who wrote it? Whose desires are being represented? Can I confer with those people; that is, can I have conference with them? If I can’t, the quality and relevance of my analysis will be compromised. Is my client okay with that? Since this is a medical device, is there a standards document (a reference) to which it should conform? I don’t know; in order to sort that out, I might have engage in conference with a domain expert who is aware of such things. The expert is also a medium, extending my capacity the domain, and the desires of people in that domain. I might have to examine a reference and engage in conference to find a relevant reference, or to understand it.

If I pay attention, I notice that Excel (and its conditional formatting feature) are media too, extending my capacity to observe patterns in the behaviour of the Zapper Box. The image I’m looking at is a reference. The test rig hooked up to the Zapper Box is also a medium, a reference, enabling testers’ capacity to observe the amount of Energy at the electrode. The test rig’s log is a medium to which I can refer, providing a record of precise observations that I can analyse over the longer term. The Control Box also includes testability features that produce a log (yet another reference). Those features extend the testers’ capacity to record the actions of the control box, and the log enables the comparison between the test rig’s log and the operator’s intended actions. There is a whole network of references here. I could apply consistency oracles to all of them, singly and in combination.

My curiosity is also aroused by questions about the test. Did the testers follow the same protocol on each test run? Was the Zapper Box powered on for the same amount of time before the tester flipped the switch at the Control Box? Did the measurement start exactly when the Zapper Box was told to turn off? Was the connection between the Control Box and the Zapper Box reliable? How would we know? Did the testers allow the Zapper Box to cool down between test runs? For how long? For the same amount of time each time? Is the test rig measuring the Energy properly? How would we know?

Notice: although there has been extensive use of tools, no checking has occurred here! James and the testers tested the product, gathered data, and presented that data in a form that allowed them (and others) to see interesting patterns. I have been analyzing those patterns (and I hope you have too).

Some oracles just provide us with “pass or fail”, “true or false”, “green or red” results. Other oracles point us to possible problems upon which we may shine light. In this case, we would need more information—more references, more tools, more conference—before we could determine more clearly whether there is a problem. We might or might not need any of these things before the client could decide that there is a problem. More significantly, we would need more information to determine how we might be able to check for a problem. We must apply informal oracles before we can learn how to apply formal oracles well.

In other words: the process of recognizing a problem sometimes requires us to travel all over the oracle quadrants. That’s far more than “trying the product and comparing it to the specification”.

All this sets us up for the next few posts: if we considered the oracle principles (themselves media for recognizing problems that threaten people’s desires!), how could we imagine applying tools to enable, extend, enhance, accelerate, or intensify our capabilities as testers? What other roles might oracles play in the process? And if we are to use oracles and tools wisely, what effects—good and bad—could we anticipate from applying them? As we shape our tools, how might our tools shape us?

The first part of an answer is in the next post in this series.

Oracles from the Inside Out, Part 5: Oracles as References as Media

Tuesday, September 15th, 2015

Try asking testers how they recognize problems. Many will respond that they compare the product to its specification, and when they see an inconsistency between the product and its specification, they report a bug. Others will talk about creating and running automated checks, using tools to compare output from the product to specific, pre-determined, expected results; when the product produces a result inconsistent with expectations, the check identifies a bug which the tester then reports to the developer or manager. It might be tempting to think of this as moving from the bottom right quadrant on this table to the bottom left.

Traditional talk about oracles refers almost exclusively to references. W.E. Howden, who introduced “oracle” as a term of testing art, said that an oracle as “an external mechanism which can be used to check test output for correctness”. Yet thinking of oracles in terms of correctness leads to some pretty serious problems. (I’ve outlined some of them here).

In the Rapid Software Testing namespace, we take a different, broader view of oracles. Rather than focusing on correctness, we focus on problems: an oracle is a means by which we recognize a problem when we encounter one during testing. Checking for correctness, as Howden puts it, may severely limit our capacity to notice many kinds of problems. A product or service can be correct with respect to some principle, but have plenty of problems that aren’t identified by that principle; and a product can produce incorrect results without the incorrectness representing a problem for anyone. When testers fixate on documented requirements, there’s a risk that they will restrict their attention to looking for inconsistencies with specific claims; when testers fixate on automated checks, there’s a risk that they will restrict their focus to inconsistency with a comparable algorithm. Focus your attention too narrowly on a particular oracle—or a particular class of oracle—and you can be confident of one thing: you’ll miss lots of bugs.

Documents and tools are media. In the most general sense, “medium” is descriptive of something in between, like “small” and “large”. But “medium” as a noun, a medium, can be between lots of things. A communication medium like radio sits between performers and an audience; a psychic medium, so the claim goes, provides a bridge between a person and the spirit world; when people want to exchange things of value, they use often use money as a medium for the exchange. Marshall McLuhan, an early and influential media theorist, said that a medium is anything that humans create or use to effect change. Media are tools, technologies that people use to extend, enhance, enable, accelerate, or intensify human capabilities. Extension is the most obvious and prominent effect of media. Most people think of media in terms of communications media. A medium can certainly be printed pages or television screens that enable messages to be conveyed from one person to another. McLuhan viewed the phonetic alphabet as a technology—a medium that extended the range of speech over great distances and accelerated its transmission. But a cup of coffee is a medium too; it extends alertness and wakefulness, and when consumed socially with others, it can extend conversation and friendliness. Media, placed between a product and our observation of it, extend our capacity to recognize bugs.

McLuhan emphasized that media change things in many different ways at the same time. In addition to extending or enabling or accelerating our capabilities, McLuhan said, every new medium obsolesces one or more existing media, grabbing our attention away from old things; every new medium retrieves notions of formerly obsolescent media, making old things new again. McLuhan used heat as a metaphor for the degree to which media require the involvement of the user; a “cool” medium like radio, he said, requires the listener to participate and fill in the missing pieces of the experience; a “hot” medium like a movie, provides stimulation to the ear and especially the eye, requiring less engagement from the viewer. Every medium, when “overheated” (McLuhan’s term for a medium that has been stretched or extended beyond its original or intended capacity), reverses into the opposite of what it might have been originally intended to accomplish. Socrates (and the King of Egypt) recognized that writing could extend memory, but could reverse into forgetfulness (see Plato’s dialogue Phaedrus). Coffee extends alertness and conversation, but too much of it and people become too wired work and too addled to chat. A medium always draws attention to itself to some degree; an overheated medium may dazzle us so much that we begin to ignore what it contains or what we intended it to do for us. More importantly, a medium affects us. This is one of the implications of McLuhan’s famous but oblique statement “the medium is the message”. By “message”, he means “the change of scale or pace or pattern” that a new invention or innovation “introduces into human affairs.” (This explanation comes from Mark Federman, to whom I’m indebted for explaining McLuhan’s work to me over the years.)

When we pay attention, we can easily observe media overheating both in talk about testing and development work and in the work itself. Documents and tools frequently dominate conversations. In some organizations, a problem won’t be considered a bug unless it is inconsistent with an explicit statement in a specification or requirements document. Yet documents are only partial representations, subsets, of what people claim to have known or believed at some point in time, and times change. In some places, testing work is dominated by automated checking. Checks can be very valuable, providing great precision and fast feedback. But checks may focus on functional aspects of the product, and less on other parafunctional attributes.

McLuhan’s work emphasizes that media are essentially neutral, agnostic to our purposes. It is our engagement with media that produces good or bad outcomes—good and bad outcomes. Perhaps the most important implication of McLuhan’s work is that media amplify whatever we are. If we’re fabulous testers, our tools extend our capabilities, helping us to be even more fabulous. But if we’re incompetent, tools extend our incompetence, allowing us to do bad testing faster and worse than we’ve ever been able to do it before. To the degee that we are inclined to avoid conflict and arguments, we will use documents to help us avoid conflict and arguments; to the degree that we are inclined to welcome discussion and the refinement of ideas, then documents can help us do that. If we are disposed to be alert to a wide range of problems, automated checks will help us as we diversify our scope; if we are oblivious to certain kinds of problems in the product, automated checks will amplify our oblivion.

Reference oracles—documents, checking tools, representative data, comparable products—are unquestionably media, extending all of the other kinds of oracles: private and shared mental models, both private and shared feelings, conversations with others, and principles of consistency. How can we evaluate them? What do we use them for? And how can we use them to help us find problems without letting them overwhelm or displace all the other ways we might have of finding problems? That’s the subject of the next post.

Oracles from the Inside Out, Part 4: From Experience to Inference

Tuesday, September 8th, 2015

In the previous post, I gave an example of the happy (and, alas, rare) circumstance in which the programmer and I share models and feelings, such that the programmer becomes aware of a problem I’ve found without my being explicit about it. That is, on this table, I can go directly from top-left, where I experience the problem, to conference in the bottom left, where awareness of that problem has been successfully conveyed to someone else.

More often than not, my testing client will not recognize the meaning and significance of the problem immediately in the same way that I do. My awareness of a problem tends to start with a feeling. To impart my feeling of a problem successfully to others, most of the time I must move from my tacit, internal, and emotional reaction about it, and develop an explicit and rational description of the problem. That move often begins with framing the problem in terms of logical inferences, based on an undesirable inconsistency between my observation of the product and my understanding of some desirable principle. (An inference is conclusion that we arrive at by a line of reasoning, in which we make logical connections between facts. Inference is also an activity—the process of making those connections.) In the diagram, that’s a move from the upper left to the upper right, from a tacit feeling to an observable and explicit inconsistency; from experience to inference.

I’ve been testing a version of Adobe Acrobat. In its File / Properties dialog, there’s a metadata field called “Keywords”. I enter some data using Perlclip’s counterstring feature (a counterstring is a string that allows you to see its own length easily).

I observe that this field appears to be limited to a maximum of 1,999 characters. I try typing beyond this apparent limit, and nothing happens. When I click on the “Additional Metadata” button a new dialog appears. That dialog also has a “Keywords” field; the text that I entered into the previous dialog is observable there too. The field looks a little different, though. I experience a feeling of curiosity, and make an inference that something else might be different here. I move to the cursor into the Keywords field, and press Ctrl-End. Then I try to enter some text.

I observe that I’m able to enter more than I was able to enter into the “Keywords” field in the previous dialog; the limit appears to be broken. That’s interesting. I paste the counterstring from Perlclip, and discover the the limit is 30,000 characters.

When I dismiss the Additional Metadata dialog, the original File / Properties dialog shows the 30,000-character string. I experiment a little more, and find I can delete characters from that string and replace them with new ones; there’s no longer a 1,999-character limit. Hmmm.

I go back to the Additional Metadata dialog, and delete some text from the Keywords field. I dismiss the dialog, and discover my changes have been reflected in the File/Properties/Keyword field. As before, I can edit what’s there, but the limit is now the length of string that was in the Additional Metadata dialog; less than 30,000, but more than 1,999.

This is all a little surprising, but is this a problem? I refer to my feelings and make some inferences, based on principles related to consistencies and inconsistencies.

In one dialog, there appears to be an enforced limit of 1,999 characters for the Keywords field; in another, the limit is 30,000 characters. I’m surprised and little confused, because the product appears to be inconsistent with itself. I don’t understand why all this should be so; the product is behaving in a way that is inconsistent with my ability to explain it. I infer that wherever there’s a limit, someone had some purpose in setting it. I don’t have access to the programmer’s or designer’s intentions at the moment, but whichever limit someone intended, the product seems inconsistent with one of them—inconsistent with purpose. I’ve seen problems like this before—in some cases, data from one field overwrites data in another, corrupting the file or providing the opportunity for a buffer-overflow attack—so I can infer some risks based on the product being consistent with patterns of familiar problems.

In Rapid Testing, we’ve collected these and several other principles by which we can make inferences about problems in the product; you can find our current list of these oracles here. Armed with oracles in the form of consistency heuristics—principles and inferences about them—, I’ve got more—much more—than “huh?” or “euuugh!” available to me when I’m relating the problem to my client.

Some testers habitually report problems in terms of a “defect”, “actual result” and an “expected result”. This kind of templatized reporting often seems imprecise and even a little lame to me, for reasons that I set out here and here. It’s premature and presumptuous of me even to think of this as a defect, never mind making such a claim in a formal report. Testers confuse “expectation” with “desire”; something can be desirable or undesirable regardless of what we expected from it. Neither expectations nor desires are absolutes; they’re relative to some person(s), and based on specific models, quality criteria, and principles. So, rather than using the tired “expected result/actual result” pattern, try providing your observation of a problem and the inconsistency principle that gives warrant to your belief that it is a problem.

Some of the oracle principles can be applied fairly directly and immediately to a reasonably solid inference. For example, I know tacitly that Adobe’s business is all about rendering text and images beautifully on paper and on screens, so it seems to me that the font rendering issue (as I noted in an earlier post, and as you can see above in the Additional Metadata dialog) is inconsistent with an image that Adobe might reasonably want to project. Some problems, though, are more easily noticed or described with the help of some medium—a tool or an artifact based on and representing an explicit, shared model; something that we can point to; a reference. We’ll talk about that in the next post.

Oracles From the Inside Out, Part 3: From Experience Directly to Conference

Sunday, September 6th, 2015

Yesterday I described the moment of recognition of a problem, which tends to happen in the upper-left corner of this table:

When I perceive a problem during testing, it’s my job to let people know about what I’ve found, and to arrive at a shared set of feelings and mental models about the problem, represented by the lower left quadrant. That is, I want to move from experience—my private, internal, tacit mental models of the product—to the conclusion of successful conference, such that my clients understand what I’ve observed and why I think it’s a problem; and such that I understand their response to the problem, even though that doesn’t necessarily mean that the problem will be fixed.

I might not have to talk about the problem. Sometimes I can get the message across tacitly, without actually telling someone explicitly what the problem is. If I’ve been working with people for a while on the same project, I might be able to communicate without using words at all. In a society or culture, people develop collective tacit knowledge, an understanding of what things are and how things are done in that culture. Collective tacit knowledge is not contained in any one person’s head, but in the social group, and in the relationships between the people in it.

From the previous post, recall the cropped fonts that surprised and amused me.

A developer is walking by. I gesture or grunt something to get his attention. “Yo!”

The programmer replies, “Huh?”

I turn to my screen as he looks over my shoulder. I have the Windows Change Font Size dialog open. With a glance, I can see that he sees it too, and that he’s watching. I point to the Large Fonts radio button. “Hmmm?” I open Acrobat’s File / Properties / Additional Metadata dialog. We both see this:

I say “Euuugh!”

The programmer peers down at the screen, and I can tell that he sees the cropped font problem just as I do—just as you do. “Euuugh!” he says. “Ugh!”

“Mmmmm!”, I reply. He shrugs, sighs, rolls his eyes, and walks off towards his desk. By his manner and his expression, I know that he’s intending to fix the problem, even though we’ve only communicated in grunts. I’ve just had a conference with the programmer, with the communication structured and framed by our shared feelings and mental models; by our collective tacit knowledge. I didn’t have to describe the problem or cite an oracle. Without having exchanged a single word intelligible word, we’ve arrived at our destination. We’ve gone straight from my tacit, private feelings and mental models (experience) to shared feelings and a shared understanding about the problem (conference) and that has prompted some action. A bug is going to be fixed! Huzzah!

Of course, such simple and straightforward exchanges don’t happen very often. Far more often, there’s more work to do. We’ll talk about some of that in the next post.

Oracles from the Inside Out, Part 2: Experience, Mental Models, and Feelings

Saturday, September 5th, 2015

In the first post in this series, I introduced some of the factors in recognizing a problem when one occurs during testing. Let’s walk through some examples of recognizing and relating problems.

Imagine that I’m a tester at Adobe, a few years back, testing a version of Acrobat. As it happens, my laptop’s screen is smaller than a desktop monitor, but at 1920 by 960 pixels, the resolution is quite fine (in more ways than one). On my display, the standard Windows fonts are tiny and my eyes aren’t so good, so I use Windows’ Large Fonts setting. When large fonts are active, Adobe Acrobat displays the text in its File / Properties / Additional Metadata dialog like this:

My process of recognizing and reporting a problem typically starts with a feeling that I experience. I might be confused by some behaviour that I don’t understand; or I might be frustrated that I can’t get something done. I might be annoyed by a feature that’s obscure, hard to find, or hard to use. I’m impatient waiting for the system to finish what it’s doing so that I can move on to the next thing. Maybe I’ve seen this problem before, and I’m mildly disgusted that it has returned, or I’m worried that the programmers misunderstood a bug report or misimplemented a fix somehow. When I experience feelings like these, I begin to suspect that I’m seeing a problem.

When I see the Advanced File Properties dialog, I’m surprised and to some degree amused by the cut-off text in its labels, which seems inappropriate and somewhat silly because it’s in conflict with the way I think text should look. Formally, that’s called a schema; in this case, it’s an internal, private, tacit mental model of how things should be. My models provide framing for why I believe something is problem, and the intensity of my feelings provides clues about the seriousness of the problem. My models have been shaped by experiences of one kind or another, and in turn my models shape my perceptions and my experiences.

Sometimes I might encode some of my mental models, actions, and observations in a set of automated checks, and at some point one might be returning a red result. You could say that my recognition of a problem starts not with a feeling, but with my observation of the red bar, or even with my encoding of the check. I would reply that the observation alone doesn’t have any real meaning or significance for me—that is, I don’t interpret the observation as a problem until the feeling arrives, which it does pretty much immediately anyway: I’m surprised by the red bar and curious about what has produced it.

An experience of a problem—represented by the upper left quadrant of the diagram—is private to me, internal and tacit. Inside, I say something like “Euuugh!” or “Oh!” or “Damn!” or “Ha ha!”. I have the feeling that there’s a problem, but I haven’t yet made it explicit, not to anyone else, and maybe not even to myself. Feelings don’t come with return addresses, and we usually don’t have to justify or explain what we feel or why we feel them. When I’m testing, though, I’m acting as an agent for other people, so when I get a feeling of a problem, there’s more work to do. That’s the subject of the next post in this series.

Oracles from the Inside Out, Part 1: Introduction

Friday, September 4th, 2015

In Rapid Testing (and, we believe, in all worthwhile testing) learning about the product and finding problems that matter are central to the mission. In the role of a tester, I support the business and development team by finding and describing things I believe to be important problems that might threaten the value of the product.

A skilled tester should be able do that at any stage—as soon as there’s something to test—with any kind of product, whether it’s a compiled program, a component, a function, source code, a specification, a requirements document, a feature idea, a model, a diagram… Any of these products may have observable, describable problems associated with them.

To be effective, I must be able to provide a credible description of why I believe those problems to be problems that matter. The goal is to give people information that helps them to decide whether the product they’ve got is the product they want. Anyone who is a consumer of that information is a client of the service that I offer.

As a tester, I’m neither the developer nor the product owner. I don’t have authority over the product, the project, or the business. It is not part of the testing role to fix the problem (that rests with the developers); it is not a tester’s responsibility to decide that the problem shall be fixed, nor when, nor how (that rests with the developers and the managers); nor is it a tester’s role to make decisions about the scope of the product (that too rests with others).

I don’t even have authority to declare definitively that something is a problem. For some problems, designers and developers make that determination. A higher authority rests within a business role—the person responsible for decisions about developing and releasing the product, like a project manager or product manager.

Some testers are uncomfortable with the reality of these limitations. To me, that would be like an investigative reporter being upset that he can’t stamp out malfeasance, prosecute corrupt politicians, and restructure local government. Investigative reporters investigate to reveal problems and issues. That’s an honourable and important job on its own—and it requires a specialized set of skills.

When an investigative reporter yearns to change things to be the way he wants them, he becomes a politician. Politicians need skill sets, mindsets, and priorities that are different from those held by reporters. Politicians have certain kinds of authority that investigative reporters don’t have. Politicians can—and perhaps should—take heed of the information that investigative reporters provide, but being a politician means that you’re entitled to ignore that information.

Similarly, being a tester is an honourable and important job, requiring a specialized mindset and a specialized set of skills. Building a quality product is a goal that everyone shares, but the priorities of developers and managers may be different from yours. If you want to be the one making decisions about the product, become a designer, a programmer, or a project manager.

When I encounter something that looks like a problem, I want my clients to understand the nature of the problem as I perceive it. In Rapid Testing, we say that an oracle is a means by which we recognize a problem that we encounter during testing. Yet a couple of years ago, we began to notice another role that oracles play in the social dynamics of software testing. Oracles help us to describe and articulate the problems that we see; and oracles calibrate our feelings about those problems with the rest of the team.

A few years ago, James and I developed this chart to help testers to illustrate the relationships between things as we process our recognition of a problem.

Recognizing a problem tends to start with a feeling, and tends to end with an understanding of the problem and feelings about it that are shared between the tester and the other members of the project community. How does that play out in practice? That’s going to be the subject of some posts over the next little while. Please stay tuned.