Archive for the ‘Test Framing’ Category

Questioning Test Cases, Part 1

Monday, April 4th, 2011

Over the years, LinkedIn seems to have replaced comp.software.testing as the prime repository for wooly thinking and poorly conceived questions about testing.

Recently I was involved in a conversation with someone who, at least, seemed to be more articulate than most of the people on LinkedIn. Alas, I’ve since lost the thread, and after some searching I’ve been unable to find it. No matter: the points of the discussion are, to me, still worth addressing. She was asking a question about how much time to allocate to writing test cases before starting testing, and I questioned the usefulness of doing that. The first part of her reply went like this:

I would like to point out that I doubt anyone wants to write something that they don’t need to write.

I agree that most people probably don’t want to write things they don’t need to write. But they often feel compelled, or are compelled, to write things they don’t need to write.

I find value in writing test cases for a number of reasons. One is that I train more junior engineers in testing and it is a good method to have them execute tests that I have written so they learn how a good test plan is put together.

If that were so, wouldn’t your junior engineers learn even more from writing test cases themselves, and getting feedback on their design and their writing? There’s a feedback loop in the design of a test, the execution of a test, the interpretation of a test result, and the learning that happens between them; wouldn’t it be a good idea to keep the feedback loop—and the learning—as rapid as possible? Wouldn’t your junior engineers learn still more from actually testing—under your close supervision, at first, and then with the freedom and responsibility to act more independently as they gain skill? You might want to have a look at this article: http://www.developsense.com/articles/2008-06-KnowWhereYourWheelsAre.pdf.

There’s a common misconception that testing happens in the characters of a written test case. It doesn’t. Testing happens in the mind and actions of the tester. It happens in the design of the test, in the execution of the test, in the observation and interpretation of the outcome of the test. Testing happens in the discovery of problems, in the investigation of those problems, and the learning about those problems and the product. At most, a fraction of this can be written down.

A test is far less something you execute, and far more a line of inquiry that you follow. To me, a good test case is idea-stuff; it’s a question that we want to ask of the program, based on some motivating idea about discovery or risk. In my observation, in writing test cases, people generally write down the least important stuff. They appear to be trying to program the tester’s actions, rather than trying to prime the tester’s thinking and observation.

Moreover, a test plan is something quite different from a pile of test cases.

Secondly, [writing test cases] communicates the testing coverage with everyone involved in developing the software. If you are a contractor, this is very important since you want to leave with the client feeling like you did your job and they have the documentation to prove that they have done due diligence if they shop their company around or look for VC money.

Were you, as a contractor, given the mission to produce test scripts specifically, or is that a mission that you have inferred? Bear in mind, I’ve been witness to many takeovers, as a program manager at a company where our senior managers were ambitiously acquiring technologies, products and companies, and as a consultant to several companies that were taken over by larger companies. In no case did anyone ever ask to see any test case documentation. Those who are investigating the company to be acquired typically don’t go to that level of detail. In my experience, alas, due diligence largely doesn’t happen at all. I’m puzzled, too, by the appeal to the least likely instances in which people might interact with test documentation, rather than the everyday.

Meanwhile, there are many ways to communicate test coverage. (For example, see here, here, and here.) There are also many ways to fool yourself (and others) into believing that the more documentation the more coverage, or the more specific the documentation the more coverage—especially when that documentation is prospective, rather than retrospective. In Rapid Testing, we don’t encourage people to eliminate documentation. We encourage people to reduce documentation to what is actually necessary for the mission, and to eliminate wasteful documentation.

We focus on documenting test ideas concisely; on producing coverage outlines and risk lists that can be used to guide, rather than control, a tester and her line of inquiry; on producing records of the tester’s thought process, observations, risk ideas, motivations, and so forth (see below). The goal is to capture things far more important than the tester’s mechanical actions. If someone wants a record of that, we recommend video capture software (tools like BB Test Assistant or Camtasia). An automatic log allows the tester to focus on testing the product and recording the ideas, rather than splitting focus between operating the product and writing about operating the product.

You can find examples of the kind of test documentation I’m talking about here, in the appendices to the Rapid Software Testing class, starting at page 47. Note that each example has varying degrees of polish and formal structure. Sometimes it’s highly informal, used to aid memory, to frame a conversation, or trigger ideas. Sometimes it’s more formal and polished, when the audience is outside the test group. The overriding point is to fit the documentation to the mission and to the task at hand.

More to come…

More of What Testers Find, Part II

Friday, April 1st, 2011

As a followup to “More of What Testers Find”, here are some more ideas inspired by James Bach’s blog post, What Testers Find. Today we’ll talk about risk. James noted that…

Testers also find risks. We notice situations that seem likely to produce bugs. We notice behaviors of the product that look likely to go wrong in important ways, even if we haven’t yet seen that happen. Example: A web form is using a deprecated HTML tag, which works fine in current browsers, but may stop working in future browsers. This suggests that we ought to do a validation scan. Maybe there are more things like that on the site.

A long time ago, James developed The Four-Part Risk Story, which we teach in the Rapid Software Testing class that we co-author. The Four-Part Risk Story is a general pattern for describing and considering risk. It goes like this:

  1. Some victim
  2. will suffer loss or harm
  3. due to a vulnerability in the product
  4. triggered by some threat.

A legitimate risk requires all four elements. A problem is only a problem with respect to some person, so if a person isn’t affected, there’s no problem. Even if there’s a flaw in a product, there’s no problem unless some person becomes a victim, suffering loss or harm. If there’s no trigger to make a particular vulnerability manifest, there’s no problem. If there’s no flaw to be triggered, a trigger is irrelevant. Testers find risk stories, and the victims, harm, vulnerabilities, and threats around which they are built.
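For readers who like to see a model written down as a data structure, here is a minimal sketch of the four-part pattern in Python. It is only an illustration: the `RiskStory` class, its field names, and the sample values are hypothetical and are not part of the Rapid Software Testing material. It encodes a single idea from the paragraph above, that a story with any part missing is not yet a legitimate risk.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RiskStory:
    """Illustrative record of a four-part risk story (all names are hypothetical)."""

    victim: Optional[str] = None         # 1. some victim
    harm: Optional[str] = None           # 2. will suffer loss or harm
    vulnerability: Optional[str] = None  # 3. due to a vulnerability in the product
    threat: Optional[str] = None         # 4. triggered by some threat

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that have not been filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self) -> bool:
        """All four parts must be present for the story to count as a risk."""
        return not self.missing_parts()


# Made-up values, loosely based on the deprecated-HTML-tag example quoted earlier.
story = RiskStory(
    victim="a customer filling in the web form",
    harm="cannot submit an order",
    vulnerability="the form relies on a deprecated HTML tag",
)

print(story.is_complete())    # False: no threat has been identified yet
print(story.missing_parts())  # ['threat']

story.threat = "a future browser release drops support for the tag"
print(story.is_complete())    # True
```

The value of a sketch like this lies less in the code than in the question it forces: which of the four parts is still blank?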

In this analysis, though, a meta-risk lurks: failure of imagination, something at which humans appear to be expert. People often have a hard time imagining potential threats, and discount the possibility or severity of threats they have imagined. People fail to notice vulnerabilities in a product, or having noticed them, fail to recognize their potential to become problems for other people. People often have trouble making the connection between inanimate objects (like nuclear reactor vessels), the commons (like the atmosphere or sea water), or intangible things (like trust) on the one hand, and people who are affected by damage to those things on the other. Excellent testers recognize that a ten-cent problem multiplied by a hundred thousand instances is a ten-thousand dollar problem (see Chapter 10 of Jerry Weinberg’s Quality Software Management, Volume 2: First Order Measurement). Testers find connections and extrapolations for risks.
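The ten-cent extrapolation is easy to check. The sketch below works in whole cents to avoid floating-point rounding; the function name is made up for illustration.

```python
def total_loss_cents(cost_per_instance_cents: int, instances: int) -> int:
    """Multiply a small per-instance cost by the number of instances."""
    return cost_per_instance_cents * instances


# Ten cents per instance, one hundred thousand instances.
loss_cents = total_loss_cents(10, 100_000)
loss_dollars = loss_cents // 100

print(loss_dollars)  # 10000
```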

In order to do all that, we have to construct and narrate and edit and justify coherent risk stories. To do that well, we must (as Jerry Weinberg put it in Computer Programming Fundamentals in 1961) develop a suspicious nature and a lively imagination. We must ask the basic questions about our products and how they will be used: who? what? when? where? why? how? and how much? We must anticipate and forestall future Five Whys by asking Five What Ifs. Testers find questions to ask about risks.

When James introduced me to his risk model, I realized that people held at least three different but intersecting notions of risk.

  1. A Bad Thing might happen. A programmer might make a coding error. A programming team might design a data structure poorly. A business analyst might mischaracterize some required feature. A tester might fail to investigate some part of the product. These are, essentially, technical risks.
  2. A Bad Thing might have consequences. The coding error could result in miscalculation that misrepresents the amount of money that a business should collect. The poorly designed data structure might lead to someone without authorization getting access to privileged information. The mischaracterized feature might lead to weeks of wasted work until the misunderstanding is detected. The failure to investigate might lead to an important problem being released into production. These are, in essence, business risks that follow from technical risks.
  3. A risk might not be a Bad Thing, but an Uncertain Thing on which the business is willing to take a chance. Businesses are always evaluating and acting on this kind of risk. Businesses never know for sure whether the Good Things about the product are sufficiently compelling for the business to produce it or for people to buy it. Correspondingly, the business might consider Bad Things (or the absence of Good Things) and dismiss them as Not Bad Enough to prevent shipment of the product.

So: Testers find not only risks, but links between technical risk and business risk. Establishing and articulating those links depends on the related skills of test framing and bug advocacy. Test framing is the set of logical connections that structure and inform a test. Bug advocacy is the skill of determining the meaning and significance of a bug, and reporting the bug in terms of potential risks and consequences that other people might have overlooked. Bug advocacy doesn’t mean jumping up and down and screaming until every bug—or even one particular bug—is fixed. It means providing context for your bug report, helping managers to understand and decide why they might choose to fix a problem, right now, later, or never.

In my travels around the world and around the Web, I observe that some people in our craft have some fuzzy notions about risk. There are at least three serious problems that I see with that.

Tests are focused on (documented) requirements. That is, test strategies are centred around making sure that requirements are checked, or (in Agile contexts) that acceptance tests derived from user stories pass. The result is that tests are focused on showing that a product can meet some requirement, typically in a controlled circumstance in which certain stated conditions assumed necessary have been met. That’s not a bad thing on its own. Risk, however, lives in places where necessary conditions haven’t been stated, where stated conditions haven’t been met, or where assumptions have been buried, unfulfilled, or inaccurate. Testing is not only about demonstrating that some instance of a requirement has been satisfied. It’s also about identifying things that threaten the successful fulfillment of that requirement. Testers find alternative ideas about risk.

Tests don’t get framed in terms of important risks. Many organizations and many testers focus on functional correctness. That can often lead to noisy testing—lots of problems reported, where those problems might not be the most important problems. Testers find ways to help prioritize risks.

Important risks aren’t addressed by tests. A focus on stated requirements and functional correctness can leave parafunctional aspects of the product in (at best) peripheral vision. To address that problem, instead of starting with the requirements, start with an idea of a Bad Thing happening. Think of a quality criterion (see this post) and test for its presence or its absence, or for problems that might threaten it. Want to go farther? My colleague Fiona Charles likes to mention “story on the front page of the Wall Street Journal” or “question raised in Parliament” as triggers for risk stories. Testers find ways of developing risk ideas.

James’ post will doubtless trigger more ideas about what testers find. Stay tuned!

P.S. I’ll be at the London Testing Gathering, Wednesday, April 6, 2011 starting at around 6:00pm. It’s at The Shooting Star pub (near Liverpool St. Station), 129 Middlesex St., London, UK. All welcome!

More of What Testers Find

Wednesday, March 30th, 2011

Damn that James Bach, for publishing his ideas before I had a chance to publish his ideas! Now I’ll have to do even more work!

A couple of weeks back, James introduced a few ideas to me about things that testers find in addition to bugs.  He enumerated issues, artifacts, and curios.  The other day I was delighted to find an elaboration of these ideas (to which he added risks and testability issues) in his blog post called What Testers Find.  Delighted, because it notes so many important things that testers learn and report beyond bugs.  Delighted, because it gives me an opportunity and an incentive to dive into James’ ideas more deeply. Delighted, because it gives us all a chance to explore and identify a much richer view of testing than the simplistic notion that “testers find bugs”.

Despite the fact that testers find much more than bugs, let’s start with bugs.  James begins his list of what testers find by saying

Testers find bugs. In other words, we look for anything that threatens the value of the product.

How do we know that something threatens the value of the product?  The fact is, we don’t know for sure.  Quality is value to some person, and different people will have different perceptions of value.  Since we don’t own the product, the project, or the business, we can’t make absolute declarations of whether something is a bug or whether it’s worth fixing.  The programmers, the managers, and the project owner will make those determinations, and often they’re running in different directions.  Some will see a problem as a bug; some won’t.  Some won’t even see a problem. It seems like the only certain thing here is uncertainty.  So what can we testers do?

We find problems that might threaten the value of the product to some person who matters. How do we do that? We identify quality criteria–aspects of the product that provide some kind of value to customers or users that we like, or that help to defend the product from users that we don’t like, such as unethical hackers or fraudsters or thieves.  If we’re doing a great job, we also account for the fact that users we do like will make mistakes from time to time.  So defending value also means making the product robust to human ineptitude and imperfection.  In the Heuristic Test Strategy Model (which we teach as part of the Rapid Software Testing course), we identify these quality criteria:

  • Capability (or functionality)
  • Reliability
  • Usability
  • Security
  • Scalability
  • Performance
  • Installability
  • Compatibility
  • Supportability
  • Testability
  • Maintainability
  • Portability
  • Localizability

In order to identify threats to the quality of the product, we use oracles.  Oracles are heuristic (useful, fast, inexpensive, and fallible) principles or mechanisms by which we recognize problems.  Most oracles are based on the notion of consistency.  We expect a product to be consistent with

  • History (the product’s own history, prior results from earlier test runs, our experience with the product or other products like it…)
  • Image (a reputation our development organization wants to project, our brand identity,…)
  • Comparable products (products like this one that we develop, competitors’ products, test programs or algorithms,…)
  • Claims (things that important people say about the product, requirements, specifications, user documentation, marketing material,…)
  • User expectations (what reasonable people might anticipate the product could or should do, new features, fixed bugs,…)
  • Product (behaviour of the interface and UI elements, values that should be the same in different views,…)
  • Purpose (explicitly stated uses of the product, uses that might be implicit or inferred from the product’s design, no excessive bells and whistles,…)
  • Standards (relevant published guidelines, conventions for use or appearance for products of this class or in this domain, behaviour appropriate to the local market,…)
  • Statutes (relevant laws, relevant regulations,…)

In addition to these consistency heuristics, there’s an inconsistency heuristic too:  we’d like the product to be inconsistent with patterns of problems that we’ve seen before.  Typically those problems are founded in one of the consistency heuristics listed above. Yet it’s perfectly reasonable to observe a problem and recognize it first by its familiarity. We’ve seen lots of testers do that over the years.

We encourage people to come up with their own lists, or modifications to ours. You don’t have to use the Heuristic Test Strategy Model if it doesn’t work for you.  You can create your own models for testing, and we actively encourage people who want to become great testers to do that.  Testers find models, ways of looking at the product, the project, and testing itself, in the effort to wrestle down the complexity of the systems we’re testing and the approaches that we need to test them.

In your context, do you see a useful distinction between compatibility (playing nice with other programs that happen to co-exist on the system) and interoperability (working well with programs with which your application specifically interacts)?  Put interoperability on your quality criteria list.  Is accessibility for disabled users so important for your product that you want to highlight it in a separate quality criterion?  Put it on your list. Recently, James noticed that explicability is a consistency heuristic that can act as an oracle too:  when we see behaviour we can’t explain or make sense of, we have reason to suspect that there might be a problem.  Testers find factors, relevant and material aspects of our models, products, projects, businesses, and test strategies.

When testers see some inconsistency in the product that threatens one or more of the quality criteria, we report.  For the report to be relevant and meaningful, it must link quality criteria, oracles, and risk in ways that are clear, meaningful, and important to our clients. Rather than simply noticing an inconsistency, we must show why the inconsistency threatens some quality criterion for some person who matters.  Establishing and describing those links in a chain of logic from the test mission to the test result is an activity that James and I call test framing.  So:  Testers find frames, the logical relationships between the test mission, our observations of the product, potential problems, and why we think they might be problems. James gave an example of a bug (“a list of countries in a form is missing ‘France’”). That might mean a minor usability problem based on one quality criterion, with a simple workaround (the customer trying to choose a time zone from a list of countries presented as examples; so pick Spain, which is in the same time zone). Based on another criterion like localizability, we’d perceive a more devastating problem (the customer is trying to choose a language, so despite the fact that the Web site has been translated, it won’t be presented in French, cutting our service off from a nation of 65 million people).

In finding bugs, testers find many other things too.  Excellent testing depends on our being able to identify and articulate what we find, how we find it, and how we contextualize it. That’s an ongoing process.  Testers find testing itself.

And there’s more, if you follow the link.

Why Do Some Testers Find The Critical Problems?

Saturday, February 5th, 2011

Today, someone on Twitter pointed to an interesting blog post by Alan Page of Microsoft. He says:

“How do testers determine if a bug is a bug anyone would care about vs. a bug that directly impacts quality (or the customers perception of quality)? (or something in between?) Of course, testers should report anything that may annoy a user, but learning to differentiate between an ‘it could be better’ bug and a ‘oh-my-gosh-fix-this’ bug is a skill that some testers seem to learn slowly. … “So what is it that makes some testers zero in on critical issues, while others get lost in the weeds?”

I believe I have some answers to this. My answers are based on roughly 20 years of observation and experience in consulting, training, and working with other testers. The forms of interaction have included in-class training; online coaching via video, voice, and text; face-to-face conversation in workplaces, conferences, and workshops; direct collaboration with other working testers in mass-market commercial software, financial services, retail services, specialized mathematical applications, and several other domains.

My first answer is that testing, for a long time and in many places, has been myopically focused on functional correctness, rather than on value to people. Cem Kaner discusses this issue in his talk Software Testing as a Social Science, and later variations on it. This problem in testing is a subset of a larger problem in computer science and software engineering. Introductory texts often observe that a computer program is “a set of instructions for a computer”. Kaner’s definition of a computer program as “a communication among several humans and computers, distributed over distance and time, that contains instructions that can be executed by a computer” goes some distance towards addressing the problem; his explication that “the point of the program is to provide value to the stakeholders” goes further still. When the definition of programming is reduced to producing “a set of instructions for a computer”, it misses the point—value to people—and when testing is reduced to the checking of those instructions, the “testing” will miss the same point. I’ve suggested in recent talks that testing is “the investigation of systems composed of people, computer programs, related products and services.” Successful testers avoid a fascination with functional correctness, and focus on ways in which people might obtain value from a program—or have their value unfulfilled or threatened.

This first answer gives rise to my second: that when testing is focused on functional correctness, it becomes a confirmatory, verification-oriented task, rather than an exploratory, discovery-oriented set of processes. This is not a new problem. It’s old enough that Glenford Myers tried (more or less unsuccessfully, it seems) to argue against it in The Art of Software Testing in 1979. Myers’ point was that testing should be premised on trying to expose the program’s failures, rather than on trying to confirm that it works. Psychological research before and since Myers’ book (in particular Klayman and Ha’s paper on confirmation bias) shows that the positive test heuristic biases people towards choosing tests that demonstrate fit with a working hypothesis (showing THAT it works), rather than tests that drive towards final rule discovery (showing how it works, and more important, how it might fail). Worse yet, I’ve heard numerous reports of development and test managers urging testers to “make sure the tests pass”. The trouble with passing tests is that they don’t expose threats to value. Every function in the program code might be checked and found correct, but the product might be unusable. As in Alan’s example, the phone might make calls perfectly, but unless we model the way people actually use the product—talking for more than three minutes at a time, say—we will miss important problems. Every function might work perfectly, but we might fail to observe missing functionality. Every function might work perfectly, but we might miss terrible compatibility problems. Functional correctness is a very important thing in computer software, but it’s not the only thing. (See the “Quality Criteria” section of the Heuristic Test Strategy Model for suggestions.) Testers who “zero in on critical issues” avoid the confirmation trap.

My third answer (related to the first two) is that when testing is focused on confirming functional correctness, a lot of other information gets left lying on the table. Testing becomes a search for errors, rather than a search for issues. That is, testers become oriented towards reporting bugs, and less oriented towards the discovery of issues—things that aren’t bugs, necessarily, but that threaten the value of testing and of the project generally. I’ve written recently about issues here. Successful testers recognize issues that represent obstacles to their missions and strategies, and work around them or seek help.

My fourth answer is that many (in my unscientific sample, most) testers are poorly versed in the skills of test framing. This is understandable, at least in part because test framing itself wasn’t known by that name as recently as a year ago as I write. Test framing is the set of logical connections that structure and inform a test. It involves the capacity to follow and express a line of perhaps informal yet reasonably structured logic that directly links the testing mission to the tests and their results. In my experience, most testers are unable to trace this logical line quickly and expertly. There are many roots for this problem. The earlier answers above provide part of the explanation; the mission of value to the customer is overwhelmed by the mission of proving functional correctness. In situations where the process of test design is separated from test execution (as in environments that take a highly scripted approach to testing), the steps to perform the test and observe the results are typically listed explicitly, but the motivation for performing the test is often left out. In situations where test execution, observation of outcomes, and reporting of test results is heavily delegated to automation, motivation is even further disconnected from the mission. In such environments, focus is directed towards getting the automation to follow a script, rather than using automation to assist in probing for problems. In such environments, focus is often on the quantity of tests or the quantity of bug reports, rather than on the quality, the value, of the information revealed by testing. Testers who find problems successfully can link tests, test activities, and test results to the mission. They’re far more concerned about the quality of the information they provide than the quantity.

My fifth answer is that in many organizations there is insufficient diversity of tester skills, mindsets, and approaches for finding the great diversity of problems that might lurk in the product. This problem starts in various ways. In some organizations, testers are drawn exclusively from the business. In others, testers are required to have programming skills before they can be considered for the job. And then things get left out. Testers who need training or experience in the business domain don’t get it, and are kept separated from the business people (that’s a classic example of an issue). Testers aren’t given training in software design, programming, or related skills. They’re not given training in testing, problem reporting and bug advocacy, design of experiments. They’re not given training or education in anthropology, critical thinking, systems thinking, or philosophy and other disciplines that inform excellent testing. Successful testers tend to take on diversified skills, knowledge, and tactics, and when those skills are lacking, they collaborate with people who have them.

Note that I’m not suggesting here that anyone become a Donald Knuth-level programmer, a Pierre Bourdieu-league anthropologist, a Ross Ashby-class systems thinker, a Wittgenstein-grade philosopher. I am suggesting that testers be given sufficient training and opportunity to learn to program to the level of Brian Marick’s Everyday Scripting with Ruby, and that they be given classes, experience, and challenges in observation, the business domain, systems thinking and critical thinking. I am suggesting that people who are testing computer software do need some exposure to core ideas about logic (if we see this, can we justifiably infer that?), about ontology (what are our systems of knowledge about the way things work—especially related to computer programs and to testing), and about epistemology (how do we know what we know?).

I’ve been told by people involved in the design of testing standards that “you can’t expect regular testers to learn epistemology, for goodness’ sake”. Well, I’m saying that we can and that we must at least provide opportunities for learning, to the degree that testers can frame their mission, their ideas about risk, their testing, and their evaluation of the product in the ways that their clients value. Moreover, I’ve worked with testing organizations that have done that, and the results have been impressive. Sometimes I hear people saying “what if we train our testers and they leave?” As one wag on Twitter replied (I wish I knew who), “What if you don’t train them and they stay?”

In our classes, James Bach and I have the experience of inspiring testers to become interested in and excited by these topics. We find that it’s not hard to do that. We remain concerned about the capacity of some organizations to sustain that enthusiasm, often because some middle managers’ misconceptions about the practice and value of testing can squash both enthusiasm and value in a hurry. Testers, to be successful, must be given the freedom and responsibility to explore and to contribute what they’ve learned back to their team and to the rest of the organization.

So, what would we advise?

Read this set of ideas as a system, rather than as a linear list:

  • The purpose of testing is to identify threats to the value of the program. Functional errors are only one kind of threat to the value of the program.
  • Take on expansive ideas about what might constitute—or threaten—the quality of the product.
  • Dynamically manage your focus to exercise the product and test those ideas about value.
  • In hiring, staffing, and training, focus on the mindset and the skill set of the individual tester as a member of a highly diversified team.
  • As an individual tester, develop and diversify your skills and your strategies.
  • Immediately identify and report issues that threaten the value of the testing effort and of the project generally. Solve the ones you can; raise team and management awareness of the costs and risks of issues, in order to get attention and help.
  • Learn to frame your testing and to compose, edit, narrate and justify a compelling testing story.
  • Don’t try to control or restrain testers; grant them the freedom, along with the responsibility, to discover what they will. Given that… they will.

Exegesis Saves! (Part 1)

Friday, January 21st, 2011

This morning, I read a sentence that bugged me.

“In successful agile development teams, every team member takes responsibility for quality.”

I’ve seen sentences of that general form plenty of times before. Whether I’ve reacted or not, they’ve always bugged me, and today I decided to probe into why.

Rather than doing so on my own, I thought it would be more fun and more interesting to involve my community, so I posted a challenge on Twitter. If I want to get any other work done, I’m going to have to learn to stop doing that. I posted:

“Thinking Thursday. Test this sentence: ‘In successful agile development teams, every team member takes responsibility for quality.’”

Although I have great respect for my colleagues, I hadn’t anticipated so many interesting replies. So, today, my summary of the responses on Twitter. Soon, a brief conversation between James Bach and me, and shortly thereafter, my own assessment.

Questioning the Mission

Adam Goucher was the first to reply. He responded to my tweet by noting, “I could also check it by putting into a wordprocessor and seeing what is underlined.” I read this, laughed, and replied, “You could. But I ask for testing. :) ” Adam promptly replied that he had thus clarified the mission. Yes; absolutely. One tactic for refining your mission is to make an offer. Acceptance, rejection, or other reactions to the offer may be revealing.

Before testing the sentence, Shrini Kulkarni immediately (and wisely) questioned the testing assignment. “First question: Why are you asking this question, and how will you use my answer? Question #2: What is your motivation for asking this question and how will you evaluate my response?”

Those were splendid questions. I emphasize, especially for all those new to testing: if you feel uncertain or confused, it’s vitally important to question the mission to reduce the risk that your assumptions don’t align with your client’s assumptions.

I answered. “I’m exploring what bugs me (and others) about the sentence. I’m not sure how (or even if) I’m going to use the information just yet; that’s part of my exploration.” That’s the why, my motivation. In answer to the evaluation question, I listed intellectual or emotional stimulation; insight, epiphanies, and reminders of old epiphanies as factors. I didn’t add then (but I will now) that I didn’t think of it as a competition. (To that end, I’ve presented as many of the replies as I’ve been able, and although I wasn’t intending to evaluate them in a competitive way, I can’t help but be impressed.)

Shrini had a couple more questions. “What is your interpretation of ‘testing’ of a sentence like this?” The goal, he said, was to know how my notion of testing the sentence might be different from his. At the time, I was away from my computer, so I didn’t respond to Shrini right away. So Shrini did something else that an expert tester does; if you can’t get an answer, state your assumptions. “One way I can think of ‘testing’ a sentence is to check if it is true. So my question would be “how do you [missing word] it is true?” (I believe the missing word was a typo.) “Another interpretation of testing a sentence—a linguistic construct—is to check if it is formed as per the rules of grammar.” So here Shrini, in the face of an ambiguous assignment, gave two possible interpretations and got to work, focusing on the one that he figured was the deeper problem.

Martin Jansson also asked for clarification: “When asking me to test this, are you the only stakeholder?” I was away when that message came in too. In reply, I’ll say now that I was the only client, but in a way, lots of people might be stakeholders. I leave it as an exercise to figure out who those stakeholders might be.

Questioning the Context of the Sentence

In addition to questioning the mission, questioning the context is crucial. Martin Jansson had a bunch of context questions. “Since Twitter limits only to 140 chars, can I see the full length of what was originally intended? Is this sentence localized and is originally from another language? Can I see the original one? What other sentences are connected to the one we are testing? What will the system and its environment look like?” Griffin Jones asked, “What is the history of this sentence in this context? Is this sentence the image we want to project as a company or group? Do our peers/competition make a similar claim? Do we publicly claim to follow that sentence?”

Identifying the Problems

The very first reply that I received was from Florin Duriou, who responded simply and directly, “too generic”. As Lynn McKee said, “Many words in that sentence are subjective.” I agree with that, so let’s get specific. How?

Shrini offered two approaches. “I do little of Jerry Weinberg’s “Mary had a little lamb” exercise to see possible meanings of every word and their combinations.” That exercise, from Exploring Requirements, involves going through the simple sentence “Mary had a little lamb,” and emphasizing each word in turn to see what other interpretations might be lurking.

  • “MARY had a little lamb” (but Joe didn’t, so he was jealous)
  • “Mary HAD a little lamb” (but she doesn’t any more)
  • “Mary had A little lamb” (then she had two—and now she has a flock)
  • “Mary had a LITTLE lamb” (but too much lamb would have been bad for her diet)
  • “Mary had a little LAMB” (why did you think I said “ham”?)
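
The mechanical part of that exercise, shifting the emphasis from word to word, can be sketched in a few lines of Python. This is only an illustration: the function name `emphasize_each_word` is made up for this sketch, and the interesting work (imagining what each emphasis might imply) still has to be done by a person.

```python
def emphasize_each_word(sentence: str) -> list[str]:
    """Return one variant of the sentence per word, with that word capitalized."""
    words = sentence.split()
    variants = []
    for i, word in enumerate(words):
        # Rebuild the sentence with only the i-th word in capitals.
        emphasized = words[:i] + [word.upper()] + words[i + 1:]
        variants.append(" ".join(emphasized))
    return variants


if __name__ == "__main__":
    for variant in emphasize_each_word("Mary had a little lamb"):
        print(variant)
```

The same function could be pointed at the sentence under test, producing one prompt per word for the tester to question.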

Shrini’s second approach: “Testing a sentence, I look for ‘words’ in it—like successful, agile, development, teams, team, member, quality, responsibility.” By putting “words” in quotes, I think Shrini was emphasizing that every one of these ideas is a concept, a construct, thought-stuff. So let’s look at some of those words and constructs in more detail.

“Successful”

Success is context-dependent. Griffin, Lynn, Peter Walen, and Stephen Hill all questioned the meaning of “success”. Simon Morley wanted to know whether we might deem a team successful without knowing what the team was doing or producing, and whether its efforts were desired. Both Pete Houghton and Martin asked whether criteria for success included shipping on time. Martin also asked if quality was the only factor that makes an Agile team “successful”. “What if they hate each other and learn nothing from what they do?”

“Responsibility”, “Agile”, “Team”

What does “responsibility” mean? Áine McGovern held that programmers should take responsibility for testing at low levels before testers get their hands on the product; that everyone is responsible for a product that works well. Yet responsibility might mean “blame”, as both Peter Houghton and Peter Walen observed. What does “agile” mean? As Martin joked, “Did we mean Agile or perhaps AGILE? ‘Cause being AGILE is so much better than just lowercase agile.” Does the Agile part even matter? Anand or Komal Ramdeo (I don’t know which; they have a joint Twitter account) noted that “agile” could be removed from the sentence without loss of meaning, if we believe that teams are successful when every team member takes responsibility. Pete Houghton thought to question what we mean by “team”. Griffin even questioned the role of “in” in the sentence.

Words in Combination

Joining all these words into a sentence leads to a kind of combinatorial explosion of possible interpretations. Lynn noted that success might not equal quality, depending on the project or organizational goals. I agree—and the project and organizational goals might come into conflict from time to time too, as the organization might have many projects on the go, and those projects might compete for resources.

Simon asked what “taking responsibility for quality” might involve or exclude. Peter Walen asked if we could infer that in non-agile teams, no team members are responsible for quality—or if successful waterfall teams didn’t need to worry about quality. Tim Western had another variation: unsuccessful agile development teams don’t take responsibility for quality? Michel Kraaij asked pointedly, “So if the quality sucks, no one takes responsibility?” Adam Brown had yet another variation: wouldn’t every team member take responsibility for quality even if the project was unsuccessful?

Our sentence might be subject to the graveyard fallacy, a central theme of The Black Swan. The successful often attribute their success to some factor or practice or approach; the unsuccessful who used the same factor, practice, or approach but who were less lucky don’t survive to tell of their experience with the factor—and people seem disinclined to listen to the unsuccessful. What can you learn from losers, after all? Nassim Nicholas Taleb, in The Black Swan, retells a story from Marcus Tullius Cicero, a tale of skeptical inquiry. “One Diagoras, a nonbeliever in the gods, was shown painted tablets bearing the portraits of some worshipers who prayed to survive the subsequent shipwreck. The implication was that praying protects you from drowning. Diagoras asked, ‘Where were the pictures of those who prayed, then drowned?’ The drowned worshippers, being dead, would have a lot of trouble advertising their experiences from the bottom of the sea. This can fool the casual observer into believing in miracles.” (You can keep reading here.) That’s what I thought of when I read Peter Houghton’s reply: “What about the unsuccessful teams where everyone also takes responsibility?”

“Quality”

Griffin asked for the definition of quality in this context. Markus Deibel asked if the notion of quality was based on each team member’s definition of quality, or on a common set of quality rules. Good questions; as Anand (or Komal) Ramdeo put it, “‘quality’ might have different meaning to different people—so everyone is responsible for quality but the product can still suck.” Quite right, I would argue—with the additional observation that the product might suck to some people and not to others.

Adam Goucher suggested replacing “quality” in the troublesome sentence with “the product”, as customers likely care more about that than the “quality”. I think Adam’s suggestion is to focus on something more concrete (the product), and less abstract. A good idea, but I think it shifts the problem rather than confronting it. Whatever people have to say about the product, their evaluation is an expression of a quality judgement. Martin suggested that in a successful agile development team, everyone would know that quality is abstract—and that its meaning is therefore not to be taken for granted. I agree (yet everyone? would?), and I’d be specific about this: quality isn’t an intrinsic and objective attribute of something. Instead, it’s a relationship between something and some person.

More Probing Questions

Griffin asked “compared to what?” A little while ago I jokingly suggested to Markus Gärtner that he write a macro that would at a keystroke issue the question “compared to what?” Zeger Van Hese reported that I had triggered his macro successfully. “Compared to what? Successful to whom? Quality to whom? Every time? Really?”

Zeger and Martin both raised interesting questions about equality and responsibility. Does every member of the agile team define quality equally? Can all team members ever be equally responsible for quality? Is it even possible to take responsibility for quality in the team? (For my part, I’ll speak to this issue later.) Zeger even raised what I will now call The Animal Farm Take on Responsibility: “I can’t help but thinking ‘…but some take more responsibility than others,’” he wrote.

The explicit talk of equality and implicit reference to power reminded me of Jerry Weinberg’s observation that decisions about quality are always political, made by people who have the authority—the political power—to make them.

Deciding What and How to Observe

Griffin emphasized a question that we must ask in addition to “compared to what?” He wanted to know “according to whom”. That’s important because observations, comparisons, and measurements are never value-neutral; they always depend upon at least one person, and what and how each person observes. “What would I see/hear/feel when a ‘team member’ ‘takes responsibility’?” Simon adds, “What would a team member /not/ ‘taking responsibility for quality’ look like?” Anand (or Komal) asks, “Is there any scale to measure responsibility? How do you compare more or less responsible?”

No Problem

Some people found no problem at all with the sentence. Alan Cooper retweeted it, adding “True”. Bill Clark responded to that, saying, “If the team looks good, everyone on it looks good. A bunch of lone coyotes running here and there not so much.” Ron Jeffries said, “The sentence is perfect. By the way, what did you want it to do?” An important question, but I might have placed it before my evaluation of the sentence’s perfection.

All of these responses were interesting and valuable. The most gratifying one, though, came from Zeger Van Hese, who raced to his blog before I could get to mine. You can read his response here.

More to come!

Context-Free Questions for Testing

Wednesday, November 24th, 2010

In Jerry Weinberg and Don Gause’s Exploring Requirements, there’s a set of context-free questions to ask about a product or service. The authors call them context-free questions, but to me, many of them are more like context-revealing questions.

In the Rapid Software Testing class, the participants and the instructors make discoveries courtesy of our exercises and conversations. Here’s a list of questions that come up fairly consistently, or that we try to encourage people to ask. Whether you’re working with something new or re-evaluating your status, you might find these questions helpful to you as you probe the context of the test project, your givens, and your mission.

I leave it as an exercise for the reader to link these questions to specific points in the Heuristic Test Strategy Model and the Satisfice Context Model.


  • Is it okay if I ask you questions?
  • Who is my client?
  • Are you my only client?
  • Who is the customer of the product?
  • Who are the other stakeholders?
  • What is my mission?
  • What else might be part of my mission?
  • What problems are you aware of that would threaten the value of this product or service?
  • Do you want a quick, practical, or deep answer to the mission or question you have in mind?
  • How much time do I have?
  • How long before the next release or deployment?
  • How long before the end of this testing or development cycle?
  • When do you want reports or answers?
  • How do you want me to provide them? How often?
  • When were you thinking of shipping or deploying this product or service?
  • What else do you want me to deliver?
  • How do you want me to deliver it?
  • This thing I’m testing… could I have it myself, please?
  • Is there another one like it?
  • Are there more than that?
  • Is that all there are?
  • How is this one expected to be the same or different from the other ones?
  • Here’s what I believe I see in front of me. What else could it be?
  • Here’s what I’m thinking right now. What else might be true? What if the opposite were true?
  • Could you describe how it works?
  • Could you draw me a diagram of how it works?
  • How would I recognize a problem?
  • I think I’m seeing a problem. Why do I think it’s a problem? For whom might it be a problem?
  • What does this thing depend upon?
  • What tools or materials were used to construct it?
  • Who built this thing?
  • Can I talk to them?
  • Are they easy to talk to? Helpful?
  • Have they ever built anything like this before?
  • Is there anyone that I should actively avoid?
  • Who else knows something about this?
  • Who’s the best person to ask about this?
  • Who are the local experts in this field?
  • Who are the acknowledged experts, even if they don’t work here?
  • Has anyone else tested this?
  • Can I see their results, please?
  • Who else is on my test team?
  • What skills and competencies are expected of me?
  • What other skills and competencies can be found on the test team? Elsewhere?
  • What skills and competencies might we be lacking?
  • What information is available to me?
  • Is there more information available?
  • Where could I find more information? Is that the last source you can think of?
  • In what other forms could I find information?
  • Is that all the information there is? Is there more? Are there more rules? Requirements? Specifications?
  • If information is in some way wanting, what can I do to help you discover or develop the information you need?
  • What equipment and tools are available to help with my testing?
  • What tools would you like me to build? Expect me to build?
  • Is there some data that is being processed by this thing?
  • Can I have some of that data?
  • Can I have a description of the data’s structures?
  • What are your feelings about this thing?
  • Who might feel differently?
  • How might they feel?
  • What do customers say about it?
  • Can I talk to the technical support people?
  • (How do I feel about this thing?)
  • Who can we trust? Is there anyone that we should distrust?
  • Is there anything that you would like to prohibit me explicitly from doing?
  • Are there any other questions I should be asking you?

Test Framing

Wednesday, September 29th, 2010

A few months ago, James Bach introduced me to the idea of test framing. He identified it as a testing skill, and did some work in developing the concept by field-testing it with some of his online students. We’ve been refining it lately. I’ll be giving a brief talk on it at the Kitchener-Waterloo Software Quality Association on Thursday, September 30, 2010, and I’ll be leading a half-day workshop on it at EuroSTAR. Here’s our first public cut at a description.

The basic idea is this: in any given testing situation

  • You have a testing mission (a search for information, and your mission may change over time).
  • You have information about requirements (some of that information is explicit, some implicit; and it will likely change over time).
  • You have risks that inform the mission (and awareness of those risks will change over time).
  • You have ideas about what would provide value in the product, and what would threaten it (and you’ll refine those ideas as you go).
  • You have a context in which you’re working (and that context will change over time).
  • You have oracles that will allow you to recognize a problem (and you’ll discover other oracles as you go).
  • You have models of the product that you intend to cover (and you’ll extend those models as you go).
  • You have test techniques that you may apply (and choices about which ones you use, and how you apply them).
  • You have lab procedures that you follow (that you may wish to follow more strictly, or relax).
  • You configure, operate, and observe the product (using test techniques, as mentioned above), and you evaluate the product (by comparing it to the oracles mentioned above, in relation to the value of the product and threats to that value).
  • You have skills and heuristics that you may apply.
  • You have issues related to the cost versus the value of your activities that you must assess.
  • You have time (which may be severely limited) in which to perform your tests.
  • You have tests that you (may) perform (out of an infinite selection of possible tests that you could perform).

Test framing involves the capacity to follow and express a direct line of logic that connects the mission to the tests. Along the way, the line of logical reasoning will typically touch on elements between the top and the bottom of the list above. The goal of framing the test is to be able to answer questions like

  • Why are you running (did you run, will you run) this test (and not some other test)?
  • Why are you running that test now (did you run that test then, will you run that test later)?
  • Why are you testing (did you test, will you test) for this requirement, rather than that requirement?
  • How are you testing (did you test, will you test) for this requirement?
  • How does the configuration you used in your tests relate to the real-world configuration of the product?
  • How does your test result relate to your test design?
  • Was the mission related to risk? How does this test relate to that risk?
  • How does this test relate to other tests you might have chosen?
  • Are you qualified (were you qualified, can you become qualified) to test this?
  • Why do you think that is (was, would be) a problem?

The form of the framing is a line of propositions and logical connectives that relate the test to the mission. A proposition is a statement that expresses a concept that can be true or false. We could think of these as affirmative declarations or assumptions. Connectives are words or phrases (“and”, “not”, “if”, “therefore”, “and so”, “unless”, “because”, “since”, “on the other hand”, “but maybe”, and so forth) that link or relate propositions to each other, generating new propositions by inference. This is not a strictly formal system, but one that is heuristically and reasonably well structured. Here’s a fairly straightforward example:

    GIVEN: (The Mission:) Find problems that might threaten the value of the product, such as program misbehaviour or data loss.

    Proposition: There’s an input field here.
    Proposition: Upon the user pressing Enter, the input field sends data to a buffer.
    Proposition: Unconstrained input may overflow a buffer.
    Proposition: Buffers that overflow clobber data or program code.
    Proposition: Clobbered data can result in data loss.
    Proposition: Clobbered program code can result in observable misbehaviour.

    Connecting the propositions: IF this input field is unconstrained, AND IF it consequently overflows a buffer, THEREFORE there’s a risk of data loss OR program misbehaviour.

    Proposition: The larger the data set that is sent to this input field, the greater the chance of clobbering program code or data.

    Connection: THEREFORE, the larger the data set, the better chance of triggering an observable problem.

    Connection: IF I put an extremely long string into this field, I’ll be more likely to observe the problem.

    Conclusion: (Test:) THEREFORE I will try to paste an extremely long string in this input field AND look for signs of mischief such as garbage in records that I observed as intact before, or memory leaks, or crashes, or other odd behaviour.

    Now, to some, this might sound quite straightforward and, well, logical. However, in our experience, some testers have surprising difficulty with tracing the path from mission down to the test, or from the test back up to mission—or with expressing the line of reasoning immediately and cogently.
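
    For readers who find code easier to follow than prose, the long-string test from the example can be sketched in Python. This is a toy under stated assumptions, not our method or anyone's real product: `ToyForm`, its 16-byte field, and the neighbouring record are all hypothetical, and the missing length check is a bug planted on purpose so that the test has something to find. The comments mark where the test configures, operates, observes, and evaluates.

```python
FIELD_SIZE = 16  # hypothetical size of the buffer behind the input field


class ToyForm:
    """Hypothetical product: an input field stored next to another record."""

    def __init__(self) -> None:
        # One block of memory: the field's buffer, then a neighbouring record.
        self.memory = bytearray(FIELD_SIZE) + bytearray(b"CUSTOMER-0001")

    def neighbour_record(self) -> bytes:
        return bytes(self.memory[FIELD_SIZE:])

    def press_enter(self, text: str) -> None:
        # Deliberate bug for illustration: no check against FIELD_SIZE, so
        # long input writes past the field's buffer into the neighbour.
        data = text.encode("ascii")
        end = min(len(data), len(self.memory))
        self.memory[:end] = data[:end]


def long_string_test(form: ToyForm, length: int) -> list[str]:
    """Framed test: send a long string, then look for signs of mischief."""
    before = form.neighbour_record()   # record observed as intact beforehand
    form.press_enter("A" * length)     # configure and operate
    after = form.neighbour_record()    # observe
    problems = []
    if after != before:                # evaluate against the oracle
        problems.append(f"neighbouring record changed: {before!r} -> {after!r}")
    return problems


if __name__ == "__main__":
    print(long_string_test(ToyForm(), 8))     # short input, within the buffer
    print(long_string_test(ToyForm(), 1000))  # extremely long input
```

    The oracle here is deliberately narrow: a record that was intact before the test should still be intact afterwards. A real test would also watch for crashes, memory leaks, and other odd behaviour, as the conclusion of the example says.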

    Our approach, so far, is to give testers something to test and a mission. We might ask them to describe a test that they might choose to run; and to have them describe their reasoning. As an alternative, we might ask them why they chose to run a particular test, and to explain that choice in terms of tracing a logical path back to the mission.

    If you have an unframed test, try framing it. You should be able to do that for most of your tests, but if you can’t frame a given test right away, it might be okay. Why? Because as we test, we not only apply information; we also reveal it. Therefore, we think it’s usually a good idea to alternate between focusing and defocusing approaches. After you’ve been testing very systematically using well-framed tests, mix in some tests that you can’t immediately or completely justify. One of the possible justifications for an unframed test is that we’re always dealing with hidden frames. Revealing hidden or unknown frames is a motivation behind randomized high-volume automated tests, or stress tests, or galumphing, or any other test that might (but not certainly) reveal a startling result. The fact that you’re startled provides a reason, in retrospect, to have performed the test. So, you might justify unframed tests in terms of plausible outcomes or surprises, rather than known theories of error. You might encounter a “predictable” problem, or one more surprising to you. In that case, better that you should say “Who knew?!” than a customer.
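
    As a rough picture of a randomized high-volume test, here is a minimal Python sketch. Everything in it is assumed for illustration: `midpoint_under_test` stands in for product code that silently uses 32-bit arithmetic, `wrap_int32` simulates that arithmetic, and `midpoint_oracle` is a simple reference to compare against. The test carries no specific theory of error; it runs many random inputs and keeps whatever turns out to be startling.

```python
import random

INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Simulate 32-bit signed overflow (illustrative only)."""
    return (value + 2**31) % 2**32 - 2**31


def midpoint_under_test(a: int, b: int) -> int:
    # Hypothetical product code: the intermediate sum overflows for large inputs.
    return wrap_int32(a + b) // 2


def midpoint_oracle(a: int, b: int) -> int:
    # Reference result using Python's arbitrary-precision integers.
    return (a + b) // 2


def random_high_volume(trials: int, seed: int = 0) -> list[tuple[int, int, int, int]]:
    """Run many random inputs and collect every surprising result."""
    rng = random.Random(seed)  # seeded so that a surprise can be reproduced
    surprises = []
    for _ in range(trials):
        a = rng.randint(0, INT32_MAX)
        b = rng.randint(0, INT32_MAX)
        actual = midpoint_under_test(a, b)
        expected = midpoint_oracle(a, b)
        if actual != expected:
            surprises.append((a, b, actual, expected))
    return surprises


if __name__ == "__main__":
    found = random_high_volume(1000)
    print(f"{len(found)} surprising results out of 1000 trials")
```

    Once a surprise turns up (here, a negative midpoint for two positive numbers), the frame can be built in retrospect: the surprise suggests a risk, and the risk connects back to the mission.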

    To test is to tell two parallel stories: a story of the product, and the story of our testing. James and I believe that test framing is a key skill that helps us to compose, edit, narrate, and justify the story of our testing in a logical, coherent, and rapid way. Expect to hear more about test framing, and please join us (or, if you like, argue with us) as we develop the idea.

    See http://www.developsense.com/resources/TestFraming.pdf for current updates.

    Statistician or Journalist?

    Friday, August 27th, 2010

    Eric Jacobson has a problem, which he thoughtfully relates on his thoughtful blog in a post called “How Can I Tell Users What Testers Did?”. In this post, I’ll try to answer his question, so you might want to read his original post for context.

    I see something interesting here: Eric tells a clear story to relate to his readers some problem that he’s having with explaining his work to others who, by his account, don’t seem to understand it well. In that story, he mentions some numbers in passing. Yet the numbers that he presents are incidental to the story, not central to it. On the contrary, in fact: when he uses numbers, he’s using them as examples of how poorly numbers tell the kind of story he wants to tell. Yet he tells a fine story, don’t you think?

    In the Rapid Software Testing course, we present this idea (Note to Eric: we’ve added this since you took the class): To test is to compose, edit, narrate, and justify two parallel stories. You must tell a story about the product: how it works, how it fails, and how it might not work in ways that matter to your client (and in the context of a retrospective, you might like to talk about how the product was failing and is now working). But in order to give that story its warrant, you must tell another story: you must tell a story about your testing. In a case like Eric’s, that story would take the form of a summary report focused on two things: what you want to convey to your clients, and what they want to know from you (and, ideally, those two things should be in sync with each other).

    To do that, you might like to consider various structures to frame your story. Let’s start with the elements of what we (somewhat whimsically) call The Universal Test Procedure (you can find it in the course notes for the class). From a retrospective view, that would include

    • your model of the test space (that is, what was inside and outside the scope of your testing, and in particular the risks that you were trying to address)
    • the oracles that you used
    • the coverage that you obtained
    • the test techniques you applied
    • the ways in which you configured the product
    • the ways in which you operated the product
    • the ways in which you observed the product
    • the ways in which you evaluated the product
    • the heuristics by which you decided to stop testing; and
    • what you discovered and reported, and how you reported it

    You might also consider the structures of exploratory testing. Even if your testing isn’t highly exploratory, a lot of the structures have parallels in scripted testing.

    Jon Bach says (and I agree) that testing is journalism, so look at the way journalists structure a story: they often start with the classic pyramid lead. They might also start with a compelling anecdote as recounted in What’s Your Story, by Craig Wortmann, or Made to Stick, by Chip and Dan Heath. If you’re in the room with your clients, you can use a whiteboard talk with diagrams, as in Dan Roam’s The Back of the Napkin. At the centre of your story, you could talk about risks that you addressed with your testing; problems that you found and that got addressed; problems that you found and that didn’t get addressed; things that slowed you down as you were testing; effort that you spent in each area; coverage that you obtained. You could provide testimonials from the programmers about the most important problems you found; the assistance that you provided to them to help prevent problems; your contributions to design meetings or bug triage sessions; obstacles that you surmounted; a set of charters that you performed, and the feature areas that they covered. Again, focus on what you want to convey to your clients, and what they want to know from you.

    Incidentally, the more often and the more coherently you tell your story, the less explaining you’ll have to do about the general stuff. That means keeping as close to your clients as you can, so that they can observe the story unfolding as it happens. But when you ask “What metric or easily understood information can my test team provide users, to show our contribution to the software we release?”, ask yourself this: “Am I a statistician or a journalist?”


    Other resources for telling testing stories:

    Thread-Based Test Management: Introducing Thread-Based Test Management, by James Bach; and A New Thread, by Jon Bach (as of this writing, this is brand new stuff)

    Telling Your Exploratory Story: A presentation at Agile 2010, by Jonathan Bach (I was unable to download anything other than a damaged version of this, but maybe it’s working now; please let me know)

    Constructing the Quality Story (from Better Software, November 2009): Knowledge doesn’t just exist; we build it. Sometimes we disagree on what we’ve got, and sometimes we disagree on how to get it. Hard as it may be to imagine, the experimental approach itself was once controversial. What can we learn from the disputes of the past? How do we manage skepticism and trust and tell the testing story?

    On Metrics:

    Three Kinds of Measurement (And Two Ways to Use Them) (from Better Software, July 2009): How do we know what’s going on? We measure. Are software development and testing sciences, subject to the same kind of quantitative measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement to get maximum value with a minimum of fuss? One thing is for sure: we waste time and effort when we try to obtain six-decimal-place answers to whole-number questions. Unquantifiable doesn’t mean unmeasurable. We measure constantly WITHOUT resorting to numbers. Goldilocks did it.

    Issues About Metrics About Bugs (Better Software, May 2009): Managers often use metrics to help make decisions about the state of the product or the quality of the work done by the test group. Yet measurements derived from bug counts can be highly misleading because a “bug” isn’t a tangible, countable thing; it’s a label for some aspect of some relationship between some person and some product, and it’s influenced by when and how we count… and by who is doing the counting.

    On Coverage:

    Got You Covered (from Better Software, October 2008): Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

    Cover or Discover (from Better Software, November 2008): Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

    A Map By Any Other Name (from Better Software, December 2008): A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

    Questions from Listeners (2): Is Unit Testing Automated?

    Monday, June 28th, 2010

    On April 19, 2010, I was interviewed by Gil Broza.  In preparation for that interview, we solicited questions from the listeners, and I promised to answer them either in the interview or in my blog.  Here’s the second one.

    Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

    This question provides a great opportunity to look at a number of points—so many that I’d like to address only the first sentence in the question this time around. I’ll look at the second part of the question later on.

    Expansive Definitions

    I find the most helpful definitions and descriptions to be those that are expansive and inclusive. While testing, one big risk is that I might have narrow ideas about certain risks or threats to the value of the product. Thinking expansively helps me to avoid tunnel vision that would lead to my missing important problems. In conversations, thinking expansively helps me to remain alert to the possibility that the other person and I might be talking at cross-purposes. That can happen when one of us uses a word that means different things to each of us. It can also happen when we’re thinking of the same thing, but using different words. In fact, as Jerry Weinberg once remarked to James Bach, “A tester is someone who knows that things can be different.” Here’s an example of that. The questioner says that “unit testing is automated”. I’d argue that this refers to one part of testing, test execution, the part we can automate. Well, to me, things can be different.

    Testing Includes Many Activities

    Testing includes not only test execution, but also test design, learning, and reporting, all performed in cycles or loops. What is test design? As we say in the Rapid Software Testing course notes, test design includes

    • modeling the test space (that is, considering questions of what we could test; what’s in scope);
    • determining oracles (that is, figuring out the principles or mechanisms by which we’d recognize a problem, and considering how those principles or mechanisms might fail to help us recognize a problem)
    • determining coverage (that is, how much testing we’re going to do, given the scope)
    • determining procedures (how we’re going to perform the tests; how we’ll go about the business of test execution)

    Test execution includes

    • configuring the product (obtaining it, setting it up for the purposes of a given test)
    • operating the product (exercising the product in some way to obtain coverage)
    • observing the product (applying the oracles that we’ve determined in advance, but also recognizing behaviours that trigger us to recognize and apply new oracles)
    • evaluating the product (comparing its behaviour to our oracles)
    • applying a stopping heuristic (deciding when the test is done)

    Test execution may or may not include reporting, but reporting happens at some point. And when testing is being done well, learning is happening pretty much all the time. This isn’t a strictly linear process, by the way. Depending on your approach to testing, and depending on what you’re testing, these things may happen in the order that you see above, or they may happen all at once in an organic tangled ball, with lots of tight little loops. Sometimes all of the elements of testing are done by the same person, and the elements interact with each other very quickly. Sometimes one person designs a test and another person handles the execution, in which case the loops will be long or broken. If you separate test design and test execution (as happens in scripted testing), you separate the learning associated with each. Sometimes we’ll evaluate a result and stop a test; sometimes we’ll stop first and then interpret what we’ve seen. For a given test, some aspects may take much longer than others; some may be done more consciously or thoughtfully than others. But at some point in pretty much every test, each of the steps above happens.

    Unit Testing Includes Many Activities

    Like any other kind of testing, unit testing consists of cycles of design, execution, learning, and reporting. Like any other test, a unit test starts with some person having a test idea, a question that we want to ask about the program. A person designing a unit test typically frames that question in terms of a check—an observation linked to a decision rule such that both can be performed by a machine. The person writes program code to express that yes-or-no question, usually assisted by some kind of unit testing framework. Next, some person—or, more often, some process that a person has initiated—performs the checks. The check produces a result. Sometimes a person observes that result independently of other results; more often, some person (the author of the automation framework) has programmed a mechanism that provides a means of aggregating the results. Then some person interprets the aggregated results and figures out what needs to be done next—whether everything is okay, whether a test result suggests that the product should be revised, or whether the check is excellent or wanting or broken or irrelevant. And then the development cycle continues, in a loop that includes some development of the actual product too.
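
    To make the idea of a check concrete, here is a minimal sketch in Python using the standard unittest module. Everything in it is an assumption made for illustration: the unit under test (next_minute), the names of the checks, and the choice of framework are hypothetical, not taken from any real project. Each method pairs an observation (the call) with a decision rule (the assertion); the runner performs the checks and aggregates the results.

```python
import unittest


def next_minute(hour, minute):
    """Hypothetical unit under test: advance a 24-hour clock by one minute."""
    minute += 1
    if minute == 60:
        minute = 0
        hour = (hour + 1) % 24
    return hour, minute


class NextMinuteChecks(unittest.TestCase):
    # Each method is one check: an observation (the call) linked to a
    # decision rule (the assertion) that a machine can apply.

    def test_ordinary_minute(self):
        self.assertEqual(next_minute(10, 15), (10, 16))

    def test_hour_rollover(self):
        self.assertEqual(next_minute(10, 59), (11, 0))

    def test_midnight_rollover(self):
        self.assertEqual(next_minute(23, 59), (0, 0))


def run_checks():
    """Run the checks and return the aggregated result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NextMinuteChecks)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_checks()
    # The machine stops here; a person still has to decide what this means.
    print(result.testsRun, "checks run;", len(result.failures), "failed")
```

    Note what the sketch leaves out: choosing which values to try, deciding that rollover matters, and interpreting a failure are all done by a person, before and after the run.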

    Most Parts of Unit Testing Are Sapient, Not Mechanical

    Notice how many times the word “person” appears in the above description of unit testing. None of the steps in the process (with the exception of the running of the checks) can be automated, since each step requires a thinking person, rather than a machine, to seek information, to make decisions, and to control the overall process. Parts of unit testing can be assisted by automation, but the automation isn’t doing anything particularly on its own; it remains an extension of the person’s ability to execute and to observe.

    What form might unit test automation take? Many people think in terms of a testing framework that sets up some conditions, executes some code from the product under test, and makes some assertions about the output of some function or some aspect of the state of the system. That’s cool, and quite powerful. But for years at Quarterdeck, I watched programmers doing unit testing (and did some myself) by stepping through code under various debuggers (DEBUG, SYMDEB, WDEB386, or Soft-ICE, a software-based simulacrum of an in-circuit emulator), watching the registers and the ports for each instruction. Sometimes I’m writing some stuff in Ruby, and I want to do a quick little test of a fairly trivial function that I know I’m going to throw away. In that case, I don’t bother with the testing framework; I run the code and inspect the variables in IRB, the Ruby interpreter, and get my information that way. Sometimes I write a function, and generate some data to test it using automation. Sometimes, while unit testing, I use tools to examine the contents of a database table or a file or the Windows registry. Are all these different things unit testing? Jerry Weinberg says that testing is “gathering information with the intention of informing a decision”. I’m testing a unit, and I’m using automation to assist that testing, even though (so it seems) people tend to hold a narrower view of what unit testing is. Unit testing is testing done at the unit level.

    Is stepping through the code the way that we should always do unit testing? Of course not. For the purpose of creating easily-runnable change detectors, the unit test framework is the way to go. Yet different approaches, tools, and techniques that we employ allow us to observe in different ways, discover different problems, and learn different things about the unit under test.

    Finally, it’s important to note that the development of unit-level checks tends to reveal more problems than the running of them. Chip Groeder won a best paper award at the STAR conference in 1997, in which he claimed that 88% of the bugs that he found with automated tests were found during development of the tests (that is, the non-automated parts of the testing). (Thanks to Cem Kaner for pointing me to this.)  Anecdotally, everyone that I speak to who uses automation for the execution of tests—whether at the unit level or not—says exactly the same thing.  That’s not to say that automated checks are useless.  On the contrary; checks, as change detectors, are very useful.  Instead, my point is that unit testing is not automated; not the interesting parts. Unit checking is automated.

    In summary:

    • Unit testing is a highly exploratory process, in that the loops are short, tightly integrated, and typically performed by the same person.
    • The most important parts of unit testing are the sapient parts—the design, programming, design of reports, interpretation of results, and the evaluation of what to do next.
    • The scripted part of unit testing—the execution of the checks—is the least interesting part of unit testing. And yet…
    • Many people seem to be fascinated by the mechanical parts, dazzled by lines on the screen, blissful upon observation of the green bar. And the same people say things like “unit testing is automated”. Why is that?

    That’s a lot for now. I’ll answer the rest of the question in a future post.

    Transpection Transpected

    Tuesday, May 25th, 2010

    Part of the joy of producing this blog is in seeing what happens when other people pick up the ideas and run with them.  That happened when I posted a scenario on management mistakes a few weeks ago, and Markus Gärtner responded with far more energy and thought than I would have expected. Thanks, Markus.

    Last week I posted a transcript of a transpection session between me and James Bach.  The responses and the comments were very gratifying, but Oliver Vilson’s comment has sparked a discussion of its own. Oliver says,

    I would have to say it is not only possible to test the clock-in-the-box but actually necessary.

    I see it as an exercise when you have to test part of a system which you have no control over.

    For example I’ve had problems with integration to the third party systems that gave absolute nonsense errors about things nobody could think of at that time and it messed up the correct behavior of the primary system pretty badly. We could do nothing but to observe what happened. Almost no possible way to change input data by end user. It either happened or not. But it ended up as very useful experience about testing.

    I discussed your exercise with my colleague Rasmus and we found at least few ways to test it without giving it direct input itself

    1) Expectations – for example: What format does it show time? Is it understandable?
    2) End-values – turnover of seconds/minutes/hours where, for example, 59 -> 00
    3) Load testing – how much does it starts to lie in 10 seconds, 1 minute, 1 hour, 1 day, 1 month, 1 year etc compared to let’s say quantum clock or NIST-F1.
    4) What time zone time is it showing? Can be tricky because look at India’s time zone for example.
    5) How long does the battery last before it shuts down? or before it starts to “lie”? How rapidly does it start to lie when batteries are running lower?
    6) How are the digits shown? Are they visible via any other angle? Are they too small or too big?

    And few ways to have direct input without moving or touching the box itself
    1) Put powerful-enough magnet next to the box to see what happens.
    2) Set EMP-bomb off near the box to see what happens.

    With best regards
    Oliver V.

    I’ve had the pleasure of meeting Oliver Vilson a couple of times.  I find his thinking to be incisive and insightful, and he has provided me with a couple of excellent stories.  The first thing that Oliver has done here is to help with transfer:  the idea that our odd little thought experiment about the clock can be transferred to real-world contexts.  Oliver is right:  no matter what we test, much of the time we interact with things that are black boxes, closed to us.  Sometimes we have to take the operation of the black boxes on trust.  Other times we have to test them, and as we’re testing them, we’re nastily constrained by our inability to control or influence the factors in the experiment.  Identifying those factors, getting around those constraints (to the degree that we can), and figuring out what and how to observe are all central to testing skill.

    As I was reading, it also occurred to me that Oliver’s list of test ideas could provide a very nice example of the way to use the HICCUPPS(F) mnemonic for oracle heuristics and the CRUSSPICSTMPL mnemonic (!) for quality criteria in both a retrospective and in a generative way.

    Let’s recall:  HICCUPPS(F) is a mnemonic by which we remember consistency heuristics for oracles, the principles or mechanisms by which we might recognize a problem.  We perceive no problem when all of the following heuristics hold, and we suspect a problem when any one of the following heuristics is violated:

    History: The present version of the system is consistent with past versions of itself.
    Image: The system is consistent with an image that the organization wants to project.
    Comparable Products: The system is consistent with comparable systems.
    Claims: The system is consistent with what important people say it’s supposed to be.
    Users’ Expectations: The system is consistent with what users want.
    Product: Each element of the system is consistent with comparable elements in the same system.
    Purpose: The system is consistent with its purposes, both explicit and implicit.
    Statutes: The system is consistent with applicable laws.

    That’s the HICCUPPS part.  What’s with the (F)?  “F” stands for “Familiar problems”:

    Familiarity: The system is not consistent with the pattern of any familiar problem.

    That is, we suspect a problem in the item to be tested if we see some consistency with a problem that we’ve seen before.  We perceive “no problem” in the item to be tested when it doesn’t present a familiar problem to us while we’re testing.  I’ve written about an earlier version of this list of oracle heuristics here.

    The quality criteria for a product are those aspects of it that would tend to please favoured users—customers, or people who benefit from the efficient and accurate work of that customer.  Quality criteria can also be seen as things that would stymie disfavoured users—users that we don’t like, such as intruders, black hat hackers, snoops, denial-of-service enthusiasts, thieves, and so forth.

    In the Rapid Software Testing course, we talk about quality criteria in terms of a set of guideword heuristics—labels for groups of ideas that trigger deeper analysis.  Our quality criteria include:

    • Capability
    • Reliability
    • Usability
    • Security
    • Scalability
    • Performance
    • Installability
    • Compatibility
    • Supportability
    • Testability
    • Maintainability
    • Portability
    • Localizability

    These criteria are part of the Heuristic Test Strategy Model, first developed by James Bach.

    So let’s look at Oliver’s example in terms of the oracles that are being used and the quality criteria that are being questioned here. I’ll start by tagging each test idea with one or more oracle heuristics and one or more quality criteria.

    1) Expectations – for example: What format does it show time? Is it understandable?

    Oracles:  user expectations, (implicit) purpose.  Quality criteria:  usability, localizability.

    2) End-values – turnover of seconds/minutes/hours where, for example, 59 -> 00

    Oracles:  user expectations, relevant standards.  Quality criteria:  capability, reliability.

    3) Load testing – how much does it starts to lie in 10 seconds, 1 minute, 1 hour, 1 day, 1 month, 1 year etc compared to let’s say quantum clock or NIST-F1.

    Oracles:  History, comparable products; familiar problem (clocks gaining or losing time).  Quality criteria:  reliability, performance.

    4) What time zone time is it showing? Can be tricky because look at India’s time zone for example.

    Oracles:  User expectations; implicit purpose.  Quality criteria:  usability, localizability.

    5) How long does the battery last before it shuts down? or before it starts to “lie”? How rapidly does it start to lie when batteries are running lower?

    Oracles:  History, user expectations.  Quality criteria:  reliability, performance.

    6) How are the digits shown? Are they visible via any other angle? Are they too small or too big?

    Oracles:  User expectations, implicit purpose.  Quality criteria:  Usability, testability.

    Now, I’d like you to notice a few things.  First, the classifications that I’ve set here are my own.  They’re arbitrary.  You can agree with them or disagree.  That doesn’t matter so much.

    What matters more, I think, is the exercise in which we think about the relationships between the test ideas, the quality criteria, the oracles, and the risks.  For a product of any kind, there’s risk associated with the idea that a relevant quality criterion of some kind will not be fulfilled.  By using the oracle and quality criteria guidewords, we can become conscious of the chain of logic or “framing” of the test, which in turn helps us to compose, edit, narrate, and justify the product story and the testing story.

    After we’ve applied oracle and quality-criteria tags to each of Oliver’s test ideas, we might start to notice some things. First, he has used a number of diverse heuristics by which he might recognize a problem.  In doing that, he has also identified tests that would address a number of quality criteria.  He did that quite spontaneously, without specifications or other documentation.  That is, as we’ve emphasized so often, it’s perfectly possible to test with incomplete or insufficient or inconsistent or ambiguous or out-of-date information, because

    When information is missing, testing is a great way to generate it.

    In providing a set of test ideas as he’s done, Oliver also brings to the surface a number of ideas and assumptions about the clock.  Whether those assumptions turn out to be right or wrong isn’t so important.  What’s far more important is getting started in observing similarities and differences between the assumptions and the reality.  The process of doing this is central to generating knowledge about the product.  This is very similar to Karl Weick’s observation, responding to a story in which a platoon of soldiers had a map that didn’t match the territory, but found their way home anyway:

    “This raises the intriguing possibility that when you’re lost, any old map will do … maybe when you are confused any old strategic plan will do. Strategic plans are a lot like maps. They animate and orient people. Once people begin to act, they generate tangible outcomes in some context, and this helps them discover what is occurring, what needs to be explained, and what should be done next. Managers keep forgetting that it is what they do, not what they plan, that explains their success. They keep giving credit to the wrong thing—namely, the plan—and having made this error, they then spend more time planning and less time acting. They are astonished when more planning improves nothing.”  (Karl Weick, Sensemaking in Organizations, pp. 54-55)

    Oliver’s list (implicitly) includes test ideas that take advantage of the user expectations, comparable product, purpose, standards, and familiar problem heuristics.  We can see and justify what’s there by comparing it with the HICCUPPS(F) list, and noting that inconsistency with those items would point us to a problem.  “User expectations” seems to dominate the list of oracle heuristics.  One question we could ask is “how might we refine or expand the set of user expectations that we have?”  Another question is “are our ideas about oracles overloaded in the direction of user expectations?”

    We can use the HICCUPPS(F) list to see what’s there, but with the list we can also see what might be missing:  questions about history (is there another clock like this?  is this the first one that we’ve ever seen?); about image (who is our client here?  what are possible perceptions that the client might want to project?); claims (what do people say about this clock, anyway? how is it supposed to work?  is there any useful information, whether documented or not, on this?); product (can we learn anything about the product by observing parts of it that should be consistent with one another? does the product include any internal sanity checks?).

    Similarly, we can use the quality criteria list to help us generate ideas based on the things that might threaten the value of the product.  We can see some test ideas based on capability, reliability, usability, performance, and localizability.  What other factors might we choose to consider?  Which ones might be more important in our testing mission?  Less important?  Are there any that are crucial, or irrelevant?
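
    The retrospective use of the two lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data structure, the function name summarize, and the short idea labels are made up for the sketch, and the tags are lowercased copies of the arbitrary classifications given earlier (with “implicit purpose” recorded as “purpose” and “familiar problem” as “familiar problems”). Different tagging would give different answers, which is rather the point.

```python
from collections import Counter

# Guidewords as listed in the post.
ORACLES = [
    "history", "image", "comparable products", "claims",
    "user expectations", "product", "purpose", "statutes",
    "familiar problems",
]
CRITERIA = [
    "capability", "reliability", "usability", "security", "scalability",
    "performance", "installability", "compatibility", "supportability",
    "testability", "maintainability", "portability", "localizability",
]

# One entry per test idea: (label, oracle tags, quality-criteria tags).
# Labels are hypothetical shorthand; tags follow the post's classifications.
TAGGED_IDEAS = [
    ("1 expectations", ["user expectations", "purpose"],
     ["usability", "localizability"]),
    ("2 end-values", ["user expectations", "relevant standards"],
     ["capability", "reliability"]),
    ("3 load testing", ["history", "comparable products", "familiar problems"],
     ["reliability", "performance"]),
    ("4 time zone", ["user expectations", "purpose"],
     ["usability", "localizability"]),
    ("5 battery", ["history", "user expectations"],
     ["reliability", "performance"]),
    ("6 digits", ["user expectations", "purpose"],
     ["usability", "testability"]),
]


def summarize(ideas, guidewords, column):
    """Count tag usage and list guidewords that no idea touches.

    column is 1 for oracle tags, 2 for quality-criteria tags.
    Returns (counts, unused guidewords, tags not among the guidewords).
    """
    counts = Counter(tag for idea in ideas for tag in idea[column])
    missing = [word for word in guidewords if counts[word] == 0]
    unlisted = sorted(tag for tag in counts if tag not in guidewords)
    return counts, missing, unlisted


oracle_counts, missing_oracles, unlisted_oracles = summarize(TAGGED_IDEAS, ORACLES, 1)
criteria_counts, missing_criteria, _ = summarize(TAGGED_IDEAS, CRITERIA, 2)

print("Most used oracle:", oracle_counts.most_common(1)[0])
print("Oracles with no test idea:", missing_oracles)
print("Oracle tags outside the list:", unlisted_oracles)
print("Criteria with no test idea:", missing_criteria)
```

    The lists of unused guidewords are prompts for questions, not verdicts; deciding which gaps matter for the testing mission remains a judgment call.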

    Are there security concerns related to the clock?  Why is it in this box?  What would happen if someone were to get inside?  Could the functioning of the clock be affected by heat, cold, light, acceleration, bombardment?  What are the boundaries between the clock, its containers, and other systems?   Scalability:  is this a prototype clock, or are there going to be many like it?  Could it be used for very short-term or long-term measurements of time?  What if large numbers of people need access to the information it provides?  Installability:  How did it get there?  Can it be updated?  How would we get rid of it?  Compatibility:  does the clock interface with anything else?  How?  Supportability:  What do we do if someone has a problem with the clock? Can we get at it then?  How?  And if we can get at it then, why not now?  Testability:  You say that there is no way to provide input to the clock.  Really?  Is there some other way that you might be interpreting “input”?  What interfaces might be available?  What reference material?  What oracles?  Does the clock produce any information other than its display?  Are there any markings on it?  Guides to its internals?  Maintainability:  Supposedly I’m testing this because you want to be able to identify problems with it.  Do you want to be able to fix those problems?  Who would be responsible for doing that?  Is there source code or are there architectural drawings for the program that runs the clock?  Portability:  does that program work on other clocks?  What information can we learn about this clock that might be transferrable to other clocks?

    As tools to help us see what’s there and see what’s missing, we can use the HICCUPPS(F) list to evaluate our oracles.  We can use the quality criteria list to evaluate our requirements coverage and make decisions about it.  At some point, we’ll also talk about product elements that point to coverage ideas.  We’ll also talk about the project environment that influences our context and our choices, both of which evolve over time.  But for now, that’s for later.  Thank you to Oliver for providing an excellent example on which, in this space, we could do a little something like transpection.