Blog Posts from March, 2016

Oracles from the Inside Out, Part 9: Conference as Oracle and as Destination

Thursday, March 17th, 2016

Over this long series, I’ve described my process of reasoning about problems, using this table:

So far, I’ve mostly talked about the role of experience, inference, and reference. However, I’m typically testing for and with clients—product managers, developers, designers, documenters, and so forth. In doing so, I’m trying to establish a shared understanding of the product with the rest of the team. That understanding is developed through conference: conversation and interaction with those other people. So the lower left quadrant represents two things at once: a set of oracles on the one hand, and my destination on the other.

A brief recap: while testing, I experience and develop my own set of mental models of the product and feelings about it, and reason about possible problems in it. In many cases (for instance, when I get a feeling of surprise or confusion), I’m able to use the consistency principles in the upper right to make inferences that I’m seeing a problem. My inferences might be mediated by references like a document (a specification, or a diagram, or a standard) or a tool (a suite of automated checks, or something that helps me to aggregate and visualize patterns in the data). Those media afford a move from upper right to lower right, and back again to a stronger inference in the upper right.

In other cases, my experiences, inferences, and references may not be enough for me to convince myself that I’m seeing a problem or missing one. If so, one possible move is to ask another tester, a developer, an expert user, a novice user, a product owner, or subject matter expert for information or an opinion. (In Rapid Testing, we often call such a person a live oracle.) When I do that, I’m moving from inference to conference, from upper right to lower left. Occasionally that communication happens immediately and tacitly, without my having to refer to explicit inferences or references. More often, it’s a longer and more involved discussion.

I could use the expertise of a particular person as an oracle, and rely upon that person to declare that he or she is seeing a problem. However, perspectives differ, people have blind spots, everyone is capable of making a mistake, and what was true yesterday may not be true today. Thus there is a risk that a live oracle could be oblivious to certain kinds of problems, or could mislead me into believing there’s a problem where there isn’t one. No oracle—not even a live one, nor a group of them—is infallible. The expert user might not notice an ease-of-learning problem that would cause a novice to stumble. A new programmer might not see a usability problem that an experienced tester would notice right away.

Perhaps more interestingly, people might disagree about whether there’s a problem or not. Such disagreements themselves are oracles, alerting me to problems in the project as well as the product. Feelings can provide important clues about the meaning and the significance of a problem. As we work together, I can listen to people’s opinions, observe the emotional weight they carry, weigh agreements and disagreements between people who matter, and compare their feelings with my own. I move between conference and inference to recognize or refine my perception of a problem.

The ultimate goal for my testing is to end up in that lower left quadrant with one person in particular: my most important client, the person responsible for making content and release decisions about the product. (That person may have one of a number of titles or labels, including product manager, program manager, project manager, development manager… Here, let’s call that person the Client.) I want my models and feelings about the product to be consistent with the Client’s models and feelings. Experience, inference, reference, and conference help me to do that.

Here’s a fact-based but somewhat fictionalized example. A few years ago, I was working at a financial institution. One of the technical support people mentioned in passing that a surprisingly high proportion of her work was dealing with failed transactions involving two banks out of the hundreds that we interacted with. That triggered a feeling of curiosity: was there a bug in our code? That feeling prompted me to investigate.

Each record had a transaction identifier associated with it. The transaction ID was generated from various bits of data, including the customer account number, and it included a calculated check digit. When I started testing, I noticed that the two banks in question used six-digit account numbers, rather than the more common seven-digit form. I cooked up a script to perform a large number of simulated transactions with those two banks. When I examined the logs, I found that a small number of transactions had invalid account numbers. That problem should have been trapped by the check digit functions, but the transactions were allowed to pass through the system unhindered.

When I mentioned the problem in passing to the product owner, I observed that she seemed unperturbed; she didn’t seem to be taking the problem very seriously. The discrepancy between our feelings suggested that one of two things must have been true: either I hadn’t framed the problem sufficiently well for her to recognize its significance; or she had information that I didn’t, information that would have changed my perception of the problem and lessened my emotional reaction to what I was seeing.

“The problem is only with those two banks,” she said. “Six-digit account numbers, right? We have to special-case those by adding a trailing zero for the check digit function. Something about the check digit calculation fails about one time in a couple of hundred, but the transaction goes through anyway. But later, when we send the acknowledgement packet, those two banks reject it. So six-digit numbers are a pain, but we’ve always been able to deal with the occasional failure.” Here she was using the “patterns of familiar problems” and “history” oracle principles as her means of recognizing a problem. But something else was going on: she was using those two principles to calibrate the significance of the problem in terms of her own mental models, and those principles were helping to dampen her concern. Those oracles suggested to her that I was observing a problem, but not a big problem.

I did a search of the database, and discovered that there were eight other banks that used six-digit numbers. I wrote a quick script to extract all of the records for those banks. All of those transactions had completed successfully.
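A minimal sketch of what such a quick script might look like, assuming the records can be pulled out as simple dictionaries. The field names `bank`, `account`, and `ok` are hypothetical, invented for illustration; they are not the real schema.

```python
from collections import Counter


def failure_rates_by_bank(records, account_length=6):
    """Return {bank: failure rate} for banks using accounts of the given length.

    Each record is assumed (hypothetically) to be a dict with the keys
    "bank", "account" (a string of digits), and "ok" (True if the
    transaction completed successfully).
    """
    totals = Counter()
    failures = Counter()
    for rec in records:
        # Only look at banks whose account numbers have the length of interest.
        if len(rec["account"]) != account_length:
            continue
        totals[rec["bank"]] += 1
        if not rec["ok"]:
            failures[rec["bank"]] += 1
    # A Counter returns 0 for banks that never failed.
    return {bank: failures[bank] / totals[bank] for bank in totals}
```

Comparing the rates across banks is what makes the inconsistency visible: if only two of the six-digit banks show any failures at all, the account-number length alone can't be the explanation.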

“OK, but here’s what I found out,” I replied. “There are eight other banks that use six-digit numbers, and we’ve never seen a check-digit failure in those.”

“Really?” she said. “Wow. I thought those were the only two.” I could see that she was suddenly more engaged. The fact that the product was inconsistent with itself was a powerful oracle. Awareness of the inconsistency raised her emotional state.

“Yep,” I said. “Here’s the thing: for those two banks—and only for those two—we’re serving up the wrong Web page to get input, which is obviously inconsistent with our design. That page provides the customer with a seven-digit input field. I looked at the logs, and I tried a bunch of stuff myself. Here’s what I think is happening: when the customer enters in a six-digit account number, the page rejects their input because it’s too short, and tells them they need to put in a seven-digit number. It looks to me like a few of the customers are trying to work around the error message by putting in a leading zero. They do that because we show an image to illustrate example input. That image is a seven-digit number that has a leading zero in it. What’s funny is that the wrong thing to do—putting in a leading zero—actually succeeds every now and again; the hash function for the check digit generates a valid transaction ID by coincidence. Not very often, but enough for it to register.”
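To see how a wrongly padded account number could slip past a check digit by coincidence, here is a minimal sketch. It assumes a Luhn check digit purely for illustration; the actual check digit function in this story isn't described, and the helper names (`pad_leading`, `pad_trailing`, `coincidental_matches`) are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def pad_trailing(account: str) -> str:
    """The system's special case: append a zero to a six-digit number."""
    return account + "0"


def pad_leading(account: str) -> str:
    """The customer's workaround: put a zero in front instead."""
    return "0" + account


def coincidental_matches(accounts):
    """Accounts where both paddings happen to yield the same check digit."""
    return [
        a for a in accounts
        if luhn_check_digit(pad_leading(a)) == luhn_check_digit(pad_trailing(a))
    ]
```

Under this assumed scheme, `123456` gets different check digits for the two paddings, while `111111` gets the same one, so the customer's workaround would look valid for that account. How often such collisions happen depends entirely on the real function, which this sketch does not model.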

“Interesting!” she said. She smiled. “Good detective work there.”

“So, are we going to fix it?” I asked, confident that we finally had a shared understanding of the problem.

“Nope.”

I was surprised, and felt myself becoming a little agitated. “Nope?!”

“Well, probably not. We’re replacing the whole input process in six months or so. Since we can deal with the problem as it is, and since the developers are busy on the new version, we’re cool with muddling along.” She noticed from my expression that I suddenly felt deflated. “Listen, that was some really good testing,” she said. “And I really appreciate the effort, and I understand your concern. I get that it’s a real problem for a handful of customers (here, she was acknowledging the inconsistency with user desires oracle), although once they’ve called us, they’re aware of the workaround. I know it does sound like a pretty easy fix, and we could fix it. But then we’d want to test it to make sure that the whole process keeps working for all of the customers of those banks, not just the ones who have had the problems. And with the new version coming up, trust me: you’ll have more than enough to do.”

I was a little disappointed that my investigation hadn’t resulted in a fix, but I did feel that she’d been listening. I had heard enough from her to dampen my own emotional state down so that it was well calibrated with hers.

When I observe a problem, the Client might or might not agree with me that it is a problem. That’s okay. As a tester, I’m not judge or jury for the problem, but I do want to make sure that my report has been heard and understood. After that, the Client can decide what she likes.

She might decide that it’s an important and urgent problem, and that it needs to be addressed right away. She might agree that it’s a problem, but not a problem worth fixing. She might believe that the problem is worth fixing, but not right away. She might dismiss my report of an inconsistency between the product and some principle by citing other, more important principles with which the product is consistent.

Oracles give us means not only to recognize problems, but also to interpret and explain our feelings about them. When I can frame my experience—feelings and mental models—in terms of inferences about inconsistencies, I’m better prepared for a conversation—a conference—with my client about each problem, and why I believe it’s a problem.