Archive for the ‘Exploratory Testing’ Category

EuroSTAR Trip Report, Part 2

Thursday, December 9th, 2010

In this post, I’ll highlight a few more of the people that I met at EuroSTAR 2010. Please note that because there were so many people that I’d like to mention, there’s still more to come in subsequent posts. Also, I’ve included tons of links to these people and their work. Please use those links!

Shmuel Gershon (@sgershon on Twitter) was in the Test Lab a lot, only one of the reasons that he won the Test Lab Rats’ informal yet prestigious “Most Enthusiastic Tester” award and T-shirt. Rapid Reporter, Shmuel’s tool for taking notes in exploratory testing sessions, was prominent too. Shmuel used a number of strategies for meeting other people at the conference; he announced a pizza-and-drinks session for old and new members of the Vanguard community (also known as the Rebel Alliance or, for this occasion, Danish Alliance), and he used a cute strategy for introducing himself. On a personal note, Shmuel also helped me enormously by agreeing on the spur of the moment to act as my interviewer for an upcoming EuroSTAR “Take 10” video spot. After the conference, Shmuel and I spent a pleasant afternoon at Copenhagen’s Experimentarium, browsing the exhibits, chatting, discovering bugs in the displays, and exploring patterns of exploratory testing. I was also pleased to beat him in a virtual bicycle race. Fortunately for me, my bike was the one with the seat. And the doctors say I’m recovering well from the experience.

Teemu Vesela (@teemuvesela on Twitter) received the second award from the Lab Rats: “Most Evil Tester”. He established his reputation by asking for—and getting—the Lab’s server and router passwords from the Lab Rats. His claim was that he needed that information to see if he could exploit the applications that were installed in the Test Lab. But maybe, just maybe, he was testing to see if he could obtain the trust of the network administrators, just as one would try to do in a real security penetration test. Teemu exuberantly investigated several potential vulnerabilities, found some cool bugs, and enthusiastically told concise little stories about weaknesses in system defenses. And now I’ve got someone new to talk to when I want to learn quickly about potential security risks.

Henrik Andersson (Twitter: henkeandresson) is a long-time student and advocate of the practice of exploratory approaches to testing, especially within the Swedish testing community. His success has been all the more remarkable considering that, for many years, he worked for an organization that advocates strongly scripted approaches. Henrik gave an excellent talk on his experience introducing exploratory testing extremely rapidly at a large corporation that was, in general, resistant to the idea. His focus was on the role of champions—passionate people who will support and sustain excellent work, philosophically much like those in the Vanguard. Henrik described his approach: little experiments followed by intensive debriefs; granting people the freedom and responsibility to design and evaluate their work; emphasizing the roles of discovery, learning, and feelings. Within the constraints, he was quite successful, but once again incomprehending middle management provided only tepid support.  Thanks to the ubiquitous Markus Gärtner (of whom more quite soon), here’s a detailed account of Henrik’s presentation.

Fredrik Rydberg—someone whom I didn’t know before and (alas!) did not meet in person—gave a superb experience report titled “Can Exploratory Testing Save Lives?” on using exploratory approaches in a regulated, medical context.  His conclusion was an emphatic Yes.  There’s a lot of nonsense in our craft that suggests that you can’t or shouldn’t do exploratory testing in a mission- or safety-critical environment. In fact, as Fredrik made clear, it’s exactly the opposite: if you want to reduce risk and save lives, you must take an exploratory approach to develop tests, to incorporate new information, to continuously re-evaluate your work, and to reveal previously unrecognized risks. Fredrik aptly pointed out that curiosity, patient communication, and networking skill are crucial to a successful exploratory approach; indeed, they’re important to collaborative work of any kind. I hope to meet Fredrik and chat with him more in the future. We need more stories from him, and more stories like his.

Carsten Feilberg (@carsten_f on Twitter) blew me away at the CAST 2008 conference, where he provided a mischievous foreign element during a simulation in Jerry Weinberg’s Tester’s Communication Clinic. His impishness appealed to me, but there’s far more to Carsten than that. At EuroSTAR 2010, he gave a fabulous talk on Session Based Test Management (SBTM). One of the biggest takeaways was the simplest, yet psychologically the most powerful: he took the subtle step of renaming the practice to “Managing Testing Based on Sessions” (MTBS), in order to emphasize to managers the significance of the management aspect. This allowed him to obtain rapid buy-in from skeptical managers at his organization. That simple trick reminded me of Thomas Huxley’s wonderful observation on Charles Darwin’s On the Origin of Species: “How stupid of us all not to have thought of that.” He also provided an elegant visual metaphor for the development process. He started by showing a picture of a cartoon elephant (“the requirements”)—smooth, uniform, clear lines.  Then, over a part of the cartoon elephant, he superimposed the kind of view we’d see after testing: a photograph of the same part of a real elephant—wrinkled, lumpy, hairy. It was a great image, and a great visual explanation. He gradually revealed the bits and pieces of the elephant—and noted that the real elephant had tusks, where the cartoon elephant had none. Exploring the actual product allows us to see things that we wouldn’t see otherwise.

Carsten’s experience report underscored the fact that SBTM/MTBS makes exploratory testing more legible—more readable, more understandable—for managers who might otherwise see it as undisciplined, unstructured, or incomprehensible. I’ve written a couple of blog posts on some approaches that might help clear things up here, here, and here particularly.  Yet if you want advice on how to persuade management to recognize and adopt exploratory testing in your organization, it would also be a really good idea to contact Carsten.  Alas, his presentation slides are not yet online, but Markus Gärtner’s report on Carsten’s talk is.

Ah, Markus Gärtner, another of those fellows who was everywhere, all the time, and he has the blog posts to prove it. In the Test Lab, he was a vigorous participant, asking questions, probing for ideas, and sharing insights. At the conference presentations, Markus was like an old-fashioned on-the-scene radio reporter, “blogcasting” live and typically posting his transcription of the presentation a few seconds into the question-and-answer period. He also gave a presentation on self-education for testers, which for the Vanguard means not only study, but actively practicing testing. Apropos of that, Markus was one of the founders of the European chapter of Weekend Testing. And apropos of that…

Weekend Testing was started in Bangalore, India, by Parimala Shankaraiah (@curioustester on Twitter), Manoj Nair (@manoj_mv), Sharath Byregowda (@sharathb on Twitter), and Ajay Balamurugadas (@ajay184f on Twitter). The latter was at EuroSTAR to tell the story of how the movement began and how it has developed since then. Inspired by Pradeep Soundararajan (@testertested on Twitter), and soon assisted by Santosh Tuppad (@santhoshst on Twitter), the founders decided to take responsibility for their own education and training. On August 15, 2009, they began meeting online on Saturday afternoons to practice testing, to challenge each other, and to help each other develop skills. Sessions were structured as an hour of testing (typically in pairs) and an hour of group discussion and sharing afterwards. Side effects quickly followed: their reputations blossomed; several open source projects benefitted from their testing; and the larger community became engaged. Weekend Testing quickly sprouted chapters in Mumbai, Chennai, Europe, Australia/New Zealand, and (finally!) North America. Ambitious and eager testers have come out of the woodwork, and more senior colleagues have facilitated sessions. The great conversation of skilled testing goes on, and the Vanguard is growing! I’ll mention more of its people in the next post.

EuroSTAR Trip Report, Part 1

Tuesday, December 7th, 2010

Way way back in 2003, Bret Pettichord first published a paper on schools of software testing. The paper was controversial. Some people found it helpful to identify different schools of thought, for the purpose of understanding ways in which reasonable people might disagree reasonably.  Others found even the mention of disagreements within the field to be distasteful and divisive.  Some people identified with particular schools. Others, sometimes indignantly, refused to be pigeonholed. Yet it’s clear that in any field of endeavour, including testing, there are always communities of thought and practice. Sometimes those communities are isolated; sometimes there are trading zones between them.

No matter how one might label the communities, two broad categories were apparent to me at this year’s EuroSTAR conference. One group seems to focus on testing in terms of confirmation, verification, validation, quality assurance; getting the right answers to prescribed questions; checking. This group’s approach includes a strong focus on artifacts—requirement documents, detailed test plans, and scripted test cases. This group (let’s call it the Traditionalists) also seems to focus on processes and tools, on negotiated contracts, and on following plans—items on the right side of the Agile Manifesto. I don’t claim membership in the Agile School. Although I greatly admire the principles in the Manifesto, for me, the first thing to look at is the project’s context, and to proceed accordingly. The Traditionalistas, as I see it, emphasize the Agile Manifesto’s “things on the right”. Probably they do so with the desire to dispel variability, subjectivity, and unpredictability from testing.  I try to be empathetic towards those who advocate the things on the right, since those aren’t unreasonable things to want; it’s just unreasonable, in my view, to believe they’re the more important things in the complex, messy, human, and constantly changing world of software development.

The other, significantly smaller—and, in general, younger—group that I observed at EuroSTAR sees testing as questioning, exploration, discovery, investigation, and learning—and quality assistance. Let’s call that group the Vanguard. The Vanguard realizes that getting the right answers is important, but asking the right questions is more important—and recognizing that today’s “right questions” are probably different from yesterday’s “right questions” is more important still. In broad strokes, the Vanguard prefers

experience reports over “best practice” talks
conversation over lectures
hands-on exercises over PowerPoint presentations
tools for investigation over tools for confirmation
dialogue over monologue
sitting in a circle over classroom format
finding things out over hearing the answer

And, as in the Agile Manifesto, they recognize value in the things on the right, but they value the things on the left more.

The Vanguardistas are eager to participate in testing exercises, and to exchange testing skills by example and by dialog. The Vanguard raises some difficulties for traditional trainers and presenters, because the Vanguard tends to want to ask questions and challenge authority—and as a trainer and presenter, I think that’s great. Many of the Vanguardistas participate in or organize Weekend Testing sessions. Almost all of them are on Twitter. They want to revive and reinvent testing as a sophisticated art that requires vigorous critical thinking. They’re indefatigably curious and engaged, and they’re becoming recognized as leaders in their community and in the testing craft.

One hallmark of the Vanguard at EuroSTAR was that they gravitated towards doing testing in the Test Lab, once again run by James Lyndsay (@workroomprds on Twitter) and Bart Knaack (@Btknaack on Twitter) after their impressive success at EuroSTAR last year. This year, 180 people visited the Test Lab. Though probably a minority, that’s a significant percentage of the overall attendees, and is all the more remarkable because, for space reasons, the Test Lab was quite a distance away from most of the presentations. This year there were more applications to test, more sharply focused vendor presentations, specific guidance for those who needed it, and lots of pairing and sharing. For me, one of the more memorable events was a relatively impromptu exploratory testing management roundtable, facilitated by James, with more than 20 people attending—remarkable because the event wasn’t noted specifically as a scheduled part of the conference programme; it was set up in the Test Lab, advertised by word of mouth, and fundamentally collaborative. The roundtable was one of those things that put the confer back in conference.

Of many high points of the roundtable conversation, the big one for me was the group’s recognition that testers don’t need to be domain experts from the outset of a testing assignment. Instead, testers can partner with domain experts in review and hands-on testing sessions, and in that collaboration get some excellent testing work done immediately. An exploratory testing cycle—test design, test execution, test result interpretation, learning, debriefing—drives rapid and highly effective learning about the domain. As Rob Sabourin (more on Rob later) articulated it: “Here’s a beautiful charter for a test session: Sit with a customer/user and ask ‘What gets in the way of you doing your work?’”

James and Bart were assisted this year by the Test Lab apprentices, Henrik Emilsson (@henrikemilsson on Twitter) and Martin Jansson (@martin_jansson on Twitter). At EuroSTAR 2011, management of the Test Lab will pass to Henrik and Martin. It’s in good hands. Henrik and Martin are members of a blogging cabal called thoughts from the test eye, which has been producing incisive, thoughtful reflections on testing since February 2008. An outstanding example is a blog post announcing their own list of software quality characteristics, in which they build on one of the pillars of James Bach’s Heuristic Test Strategy Model. But that’s just one example. Read the back issues and put the new ones in your feed reader.

Another member of the test eye collaborative is Rikard Edgren. Rikard was one of the conference chairs of EuroSTAR this year. He seems to have found a way to violate some fundamental law of physics by being everywhere at the same time; whenever I turned around, he was there with an expression on his face that reflected his keen observational skill and his sly humour. I’ve been lucky to have many interesting chats with him, not only this year but in years previous.

More on EuroSTAR 2010 tomorrow.

Project Estimation and Black Swans (Part 5): Test Estimation

Sunday, October 31st, 2010

In this series of blog posts, I’ve been talking about project estimation. But I’m a tester, and if you’re reading this blog, presumably you’re a tester too, or at least you’re interested in testing. So, all this might have been interesting for project estimation in general, but what are the implications for test project estimation?

Let’s start with the tester’s approach: question the question.

Is there ever such a thing as a test project? Specifically, is there such a thing as a test project that happens outside of a development project?

“Test projects” are never completely independent of some other project. There’s always a client, and typically there are other stakeholders too. There’s always an information mission, whether general or specific. There’s always some development work that has been done, such that someone is seeking information about it. There’s always a tester, or some number of testers (let’s assume plural, even if it’s only one). There’s always some kind of time box, whether it’s the end of an agile iteration, a project milestone, a pre-set ship date, or a vague notion of when the project will end. Within that time box, there is at least one cycle of testing, and typically several of them. And there are risks that testing tries to address by seeking and providing information. From time to time, whether continuously or at the end of a cycle, testers report to the client on what they have discovered.

The project might be a product review for a periodical. The project might be a lawsuit, in which a legal team tries to show that a product doesn’t meet contracted requirements. The project might be an academic or industrial research program in which software plays a key role. More commonly, the project is some kind of software development, whether mass-market commercial software, an online service, or IT support inside a company. The project may entail customization of an existing product, or it may involve lots of new code development. But no matter what, testing isn’t the project in and of itself; testing is a part of a project, a part that informs the project. Testing doesn’t happen in isolation; it’s part of a system. Testing observes outputs and outcomes of the system of which it is a part, and feeds that information back into the system. And testing is only one of several feedback mechanisms available to the system.

Although testing may be arranged in cycles, it would be odd to think of testing as an activity that can be separated from the rest of its project, just as it would be odd to think of seeing as a separate phase of your day. People may say a lot of strange things, but you’ll rarely hear them say “I just need to get this work done, and then I’ll start seeing”; and you almost never get asked “When are you going to be done seeing?” Now, there might be part of your day when you need to pay a lot of attention to your eyes—when you’re driving a car, or cutting vegetables, or watching your child walk across a cluttered room. But, even when you’re focused (sorry) on seeing, the seeing part happens in the context of—and in the service of—some other activity.

Does it make sense to think in terms of a “testing phase”?

Many organizations (in particular, the non-agile ones) divide a project into two discrete parts: a “development phase” and a “testing phase”. My colleague James Bach notes an interesting fallacy there.

What happens during the “development phase”? The programmers are programming. Programming may include a host of activities such as research, design, experimentation, prototyping, coding, unit testing (and in TDD, a unit check is created just before the code to be checked), integration testing, debugging, or refactoring. And what are the testers doing during the “development phase”? The testers are testing. More specifically, they may be engaged in review, planning, test design, toolsmithing, data generation, environment setup, or the running of relatively low-level integration tests, or even very high-level system tests. All of those activities can be wrapped up under the rubric of “testing”.

What happens during the “testing phase”? The programmers are still programming, and the testers are still testing. The primary thing that distinguishes the two phases, though, is the focus of the programming work: the programmers have generally stopped adding new features, but are instead fixing the problems that have been found so far. In the first phase, programmers focused on developing new features; in the second, programmers are focused on fixing. By that reckoning, James reckons, the “testing phase” should be called the fixing phase. It seems to me that if we took James’ suggestion seriously, it might change the nature of some of the questions that are often asked in a development project. Replace the word “test” with the word “fix”: “How long are you going to need to fix this product?” “When is fixing going to be done?” “Can’t we just automate the fixing?” “Shouldn’t fixing get involved early in the project?” “Why was that feature broken when the customer got it? Didn’t you fix it?” And when we ask those questions, should we be asking the testers?

As James also points out, no one ever held up the release or deployment of a product because there was more testing to be done. Products are delayed because of a present concern that there might be more development work to be done. Testing can stop as soon as product owners believe that they have sufficient information to accept the risk of shipping. If that’s so, the question for the testers “When are you going to be done testing?” translates into a question for the product owner: “When am I going to believe that I have sufficient technical information to inform a risk-based business decision?” At that point, the product owner should—appropriately—be skeptical about anyone else’s determination that they are “done” testing.

Now, for a program manager, the “when do I have sufficient information” question might sound hard to answer. It is hard to answer. When I was a program manager for a commercial software company, it was impossible for me to answer before the information had been marshalled. Look at the variables involved in answering the question well: technical information, technical risk, test coverage, the quality of our models, the quality of our oracles, business information, business risk, the notion of sufficiency, decisiveness… Most of those variables must be accumulated and weighed and decided in the head of a single person—and that person isn’t the tester. That person is the product owner. The evaluation of those variables and the decision to ship are all in play from one moment to the next. The final state of the contributing variables and the final decision on when to ship are in the future. Asking the tester “When are you going to be done testing?” is like asking the eyes, “When are you going to be done seeing?” Eyes will continue to scan the surroundings, providing information in parallel with the other senses, until the brain decides upon a course of action. In a similar way, testers continue to test, generating information in parallel with the other members of the project community, until the product owner decides to ship the product. Neither the tester alone nor the eyes alone can answer the “when are you going to be done” question usefully; they’re not in charge. Until it makes a decision, the brain (optionally) takes in more data which the eyes and the other sense organs, by default, continue to supply. Those of us who have ogled the dessert table, or who have gone out on disastrous dates, know the consequences of letting our eyes make decisions for us. Moreover, if there is a problem, it’s not likely the eyes that will make the problem go away.

Some people believe that they can estimate when testing will be done by breaking down testing into measurable units, like test cases or test steps. To me, that’s like proposing “vision cases” or “vision steps”, which leads to our next question:

Can we estimate the duration of a “testing project” by counting “test cases” or “test steps”?

Recently I attended a conference presentation in which the speaker presented a method for estimating when testing would be completed. Essentially, it was a formula: break testing down into test cases, break test cases down into test steps, observe and time some test steps, average them out (or something) to find out how long a test step takes, and then multiply that time by the number of test steps. Voila! an estimate.

Only one small problem: there was no validity to the basis of the calculation. What is a test step? Is it a physical action? The speaker seemed to suggest that you can tell a tester has moved on to the next step when he performs another input action. Yet surely all input actions are not created equal. What counts as an input action? A mouse click? A mouse movement? The entry of some data into a field? Into a number of fields, followed by the press of an Enter key? Does the test step include an observation? Several observations? Evaluation? What happens when a human notices something odd and starts thinking? What happens when, in the middle of test execution, a tester recognizes a risk and decides to search for a related problem? What happens to the unit of measurement when a tester finds a problem, and begins to investigate and report it?

The speaker seemed to acknowledge the problem when she said that a step might take five seconds, or half a day. A margin of error of about 3000 to one per test step—the unit on which the estimate is based—would seem to jeopardize the validity of the estimate. Yet the margin of error, profound as it is, is orthogonal to a much bigger problem with this approach to estimation.
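To make that arithmetic concrete, here is a minimal sketch of what the step-counting formula does with such a range. The figures are assumptions for illustration only: I'm treating "half a day" as four working hours, and the 200-step plan is hypothetical, not something the speaker presented.

```python
# A sketch of the step-counting estimate and how wide its bounds are.
# Assumptions (not from the talk): "half a day" = 4 working hours,
# and a hypothetical test plan of 200 steps.
SECONDS_PER_HOUR = 3600


def step_estimate_bounds(step_count, fastest_seconds, slowest_seconds):
    """Return (best_case_hours, worst_case_hours) for a step-count estimate."""
    best = step_count * fastest_seconds / SECONDS_PER_HOUR
    worst = step_count * slowest_seconds / SECONDS_PER_HOUR
    return best, worst


fastest = 5                      # a step that takes five seconds
slowest = 4 * SECONDS_PER_HOUR   # a step that takes half a day (assumed 4 hours)
spread = slowest / fastest       # ratio between slowest and fastest step

best, worst = step_estimate_bounds(200, fastest, slowest)
print(f"spread per step: {spread:.0f} to 1")
print(f"200 steps: between {best:.2f} and {worst:.0f} hours")
```

Under those assumptions the per-step spread is 2880 to one, which is where "about 3000 to one" comes from, and the same 200 steps could take anywhere from under twenty minutes to 800 hours. Averaging timed steps narrows the number on the page without narrowing the uncertainty behind it.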

Excellent testing is not the monotonic or repetitive execution of scripted ideas. (That’s something that my community calls checking.) Instead, testing is an investigation of code, computers, people, value, risks, and the relationships between them. Investigation requires loops of exploration, experimentation, discovery, research, result interpretation, and learning. Variation and adaptation are essential to the process. Execution of a test often involves reflecting on what has just happened, backtracking over a set of steps, and then repeating or varying the steps while posing different questions or making observations. An investigation cannot follow a prescribed set of steps. Indeed, an investigation that follows a predetermined set of steps is not an investigation at all.

In an investigation, any question you ask—starting with the first—may yield an answer that completely derails your preconceptions. In an investigation, assumptions need to be surfaced, attacked, and refined. In an investigation, the answer to the most recent question may be far more relevant to the mission than anything that has gone before. If we want to investigate well, we cannot assume that the most critical risk has already been identified. If we want to investigate well, we can’t do it by rote. (If there are rote questions, let’s put them into low-level automated checks. And let’s do it skillfully.)

If we can’t estimate by counting test cases, how can we estimate how much time we’ll need for testing?

There are plenty of activities that don’t yield to piecework models because they are inseparable from the project in which they happen. In another of James Bach’s analogies, no one estimates the looking-out-the-window phase of driving an automobile journey. You can estimate the length of the journey, but looking out the window happens continuously, until the travellers have reached the destination. Indeed, looking out the window informs the driver’s evaluation of whether the journey is on track, and whether the destination has been reached. No one estimates the customer service phase of a hotel stay. You can estimate the length of the stay, but customer service (when it’s good) is available continuously until the visitor has left the hotel. For management purposes, customer service people (the front desk, the room cleaners) inform the observation that the visitor has left. No one estimates the “management phase” of a software development project. You can estimate how long development will take, but management (when it’s good) happens continuously until the product owner has decided to release the product. Observations and actions from managers (the development manager, the support manager, the documentation manager, and yes, the test manager) inform the product owner’s decision as to whether the product is ready to ship.

So it goes for testing. Test estimation becomes a problem only if one makes the mistake of treating testing as a separate activity or phase, rather than as an open-ended, ongoing investigation that continues throughout the project.

My manager says that I have to provide an estimate, so what do I do?

At the beginning of the project, we know very little relative to what we’ll know later. We can’t know everything we’ll need to know. We can’t know at the beginning of the project whether the product will meet its schedule without being visited by a Black Swan or a flock of Black Cygnets. So instead of thinking in terms of test estimation, try thinking in terms of strategy, logistics, negotiation, and refinement.

Our strategy is the set of ideas that guide our test design. Those ideas are informed by the project environment, or context; by the quality criteria that might be valued by users and other stakeholders; by the test coverage that we might wish to obtain; and by the test techniques that we might choose to apply. (See the Heuristic Test Strategy Model that we use in Rapid Testing as an example of a framework for developing a strategy.) Logistics is the set of ideas that guide our application of people, equipment, tools, and other resources to fulfill our strategy. Put strategy and logistics together and we’ve got a plan.

Since we’re working with—and, more importantly, for—a client, the client’s mission, schedule, and budget are central to choices on the elements of our strategy and logistics. Some of those choices may follow history or the current state of affairs. For example, many projects happen in shops that already have a roster of programmers and testers; many projects are extensions of an existing product or service. Sometimes project strategy ideas are based on projections or guesswork or hopes; for example, the product owner already has some idea of when she wants to ship the product. So we use whatever information is available to create a preliminary test plan. Our client may like our plan—and she may not. Either way, in an effective relationship, neither party can dictate the terms of service. Instead, we negotiate. Many of our preconceptions (and the client’s) will be invalid and will change as the project evolves. But that’s okay; the project environment, excellent testing, and a continuous flow of reporting and interaction will immediately start helping to reveal unwarranted assumptions and new risk ideas. If we treat testing as something that happens continuously with development, and if we view development in cycles that provide a kind of pulse for the project, we have opportunities to review and refine our plans.

So: instead of thinking about estimation of the “testing phase”, think about negotiation and refinement of your test strategy within the context of the overall project. That’s what happens anyway, isn’t it?

But my management loves estimates! Isn’t there something we can estimate?

Although it doesn’t make sense to estimate testing effort outside the context of the overall project, we can charter and estimate testing effort within a development cycle. The basic idea comes from Session Based Test Management, James and Jon Bach’s approach to plan, estimate, manage, and measure exploratory testing in circumstances that require high levels of accountability. The key factors are:

  • time-boxed sessions of uninterrupted testing, ranging from 45 minutes to two hours and fifteen minutes, with the goal of making a normal session 90 minutes or so;
  • test coverage areas—typically functions or features of the product to which we would like to dedicate some testing time;
  • activities such as research, review, test design, data generation, toolsmithing, or retesting, to which we might also like to dedicate testing time;
  • charters, in the form of a one- to three-sentence mission statement that guides the session to focus on specific coverage areas and/or activities;
  • debriefings, in which a tester and a test lead or manager discuss the outcome of a session;
  • reviewable results, in the form of a session sheet that provides structure for the debrief, and that can be scanned and parsed by a Perl script;
  • optionally, a screen-capture recording of the session when detailed retrospective investigation or analysis might be needed; and
  • metrics whose purposes are to determine how much time is spent on test design and execution (activities that yield test coverage) vs. bug investigation and reporting, and setup (activities that interrupt the generation of test coverage).

The timebox provides a structure intended to make estimation and accounting for time fairly imprecise, but reasonably accurate. (What’s the difference? As I write, the time and date is 9:43:02.1872 in the morning, January 23, 1953. That’s a very precise reckoning of the time and date, but it’s completely inaccurate.)

Let’s also assume that a development cycle is two weeks, or ten working days—the length of a typical agile iteration. Let’s assume that we have four testers on the team, and that each tester can accomplish three sessions of work per day (meetings, e-mail, breaks, conversations, and other non-session activities take up the rest of the time).

ten days * four testers * three sessions = 120 sessions

Let’s assume further that sessions cannot be completely effective, in that test design and execution will be interrupted by setup and bug investigation. Suppose that we reckon 10% of the time spent on setup, and 25% of the time spent on investigating and reporting bugs. That’s 35% in total; for convenience, let’s call it 1/3 of the time.

120 sessions – 120 * 1/3 interruption time = 80 sessions

Thus in our two-week iteration we estimate that we have time for 80 focused, targeted, effective, idealized sessions of test coverage, embedded in 120 actual sessions of testing. Again, this is not a precise figure; it couldn’t possibly be. If our designers and programmers have done very well in a particular area, we won’t find lots of bugs and our effective coverage per session will go up. If setup is in some way lacking, we may find that interruptions account for more than one-third of the time, which means that our effective coverage will be reduced, or that we have to allocate more sessions to obtain the same coverage. So as soon as we start obtaining information about what actually went on in the sessions, we feed that information back into the estimation. I wrote extensively about that here.
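The arithmetic in this estimate, and the way it shifts when interruption time changes, can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions, not part of Session Based Test Management; the first set of figures comes from the text above, and the 45% figure is a hypothetical revision of the kind we might make after seeing real session data.

```python
from fractions import Fraction

def session_estimate(days, testers, sessions_per_day, interruption_fraction):
    """Return (actual sessions, idealized sessions of test coverage)."""
    actual = days * testers * sessions_per_day
    # Setup and bug investigation interrupt the generation of test coverage.
    effective = actual * (1 - interruption_fraction)
    return actual, effective

# Figures from the text: ten days, four testers, three sessions a day,
# and roughly one-third of the time lost to interruptions.
actual, effective = session_estimate(10, 4, 3, Fraction(1, 3))
print(actual, effective)  # 120 80

# Hypothetical revision: suppose the sessions so far show 45% interruption time.
_, revised = session_estimate(10, 4, 3, Fraction(45, 100))
print(revised)  # 66
```

Exact fractions are used here only to keep the rounding out of the way; the estimate itself remains imprecise by design.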

By themselves, the metrics on interruptions could be fascinating and actionable information for managers. But note that the metrics on their own are not conclusive. They can’t be. Instead, they inform questions. Why has there been more bug investigation than we expected? Are there more problems than we anticipated, or are testers spending too much time investigating before consulting with the programmers? Is setup taking longer than it should, such that customers will have setup problems too? Even if the setup problems will be experienced only in testing, are there ways to make setup more rapid so that we can spend more time on test coverage? The real value of any metrics is in the questions they raise, rather than in the answers they give.

There’s an alternative approach, for those who want to estimate the duration or staffing for a test cycle: set the desired amount of coverage, and apply the fixed variables and calculate for the free ones. Break the product down into areas, and assign some desired number of sessions to each based on risk, scope, complexity, or any combination of factors you choose. Based on prior experience or even on a guess, adjust for interruptions and effectiveness. If you know the number of testers, you can figure the amount of time required; if you want to set the amount of time, you can calculate for the number of testers required. This provides you with a quick estimate.

Which, of course, you should immediately distrust. What influence do tester experience and skill have on your estimate? On the eventual reality? If you’re thinking of adding testers, can you avoid banging into Brooks’ Law? Are your notions of risk static? Are they valid? And so forth. Estimation done well should provoke a large number of questions. Not to worry; actual testing will inform the answers to those questions.

Wait a second. We paid a lot of money for an expensive test management tool, and we sent all of our people to a one-week course on test estimation, and we now spend several weeks preparing our estimates. And since we started with all that, our estimates have come out really accurate.

If experience tells us anything, it should tell us that we should be suspicious of any person or process that claims to predict the future reliably. Such claims tend to be fulfilled via the Ludic Fallacy and the narrative bias, central pillars of the philosophy of The Black Swan. Since we already have an answer to the question “When are we going to be done?”, we have the opportunity (and often the mandate) to turn an estimate into a self-fulfilling prophecy. Jerry Weinberg’s Zeroth Law of Quality (“If you ignore quality, you can meet any other requirement”) is a special case of my own, more general Zeroth Law of Wish Fulfillment: “If you ignore some factors, you can achieve anything you like.” If your estimates always match reality, what assumptions and observations have you jettisoned in order to make reality fit the estimate? And if you’re spending weeks on estimation, might that time be better spent on testing?

Why Exploratory? Isn’t It All Just Testing?

Friday, September 24th, 2010

The post “Exploratory Testing and Review” continues to prompt comments whose responses, I think, are worthy of their own posts. Thank you to Parthi, who provides some thoughtful comments and questions.

I always wondered and in attempted to see the difference between the Exploratory testing that you are talking about and the testing that I am doing. Unlike the rest of the commenter’s, this post made this question all the more valid and haunting.

From what you have written, as long as there is a loop between the test design and execution, its exploratory testing? And the shorter the loop, exploratory nature goes up?

Yes, that’s right. A completely linear process would be entirely scripted, with no exploratory element to it. The existence of a loop suggests that the testing is to some degree exploratory. This suggests (to me, at least) a link to one of the points of Jerry Weinberg’s Perfect Software and Other Illusions About Testing. Testing, he suggests, is gathering information with the intention of informing a decision, and he also says that if you’re not going to use that information, you might as well not test. I’ll go a little further and suggest that if you “test” with no intention of using the information in any way, you might be doing something, but you’re not really testing.

As we’ve said before, some people seem to have interpreted the fact that there’s a distinction between exploratory testing and scripted testing as meaning that you can only be doing one or the other. That’s a misconception. It’s like saying that there are only two kinds of liquid water: hot or cold. Yet there are varying gradations of water: almost freezing, extremely cold, chilly, cool, room temperature, tepid, warm, hot, scalding, boiling. To stretch the metaphor, a test that’s being done by a machine (that is, a check) is like ice. It’s frozen and it’s not going anywhere. An investigation of a piece of software done by a tester with no purpose other than to assuage his curiosity is like steam; it’s invisible and vaporous. But testing in most cases is to some extent scripted and to some extent exploratory. No matter how exploratory, a test is to some degree informed by a mission that typically came from someone else, at some point in the past; that is, the test is to some degree scripted. No matter how scripted, a test is to some degree informed by decisions and actions that come from the individual tester in the moment—otherwise the tester would freeze and stop working, just like a machine, as soon as he or she was unable to perform some step specified in the script. That is, all testing is to some degree exploratory.

In addition to the existence of loops, there are other elements too. Very generally,

  • the extent to which the tester has freedom to make his or her own choices about which step to take next, which tests to perform, which tools to use, which oracles to apply, and which coverage to obtain (more freedom means more exploratory and less scripted; more control means less exploratory and more scripted);
  • the extent to which the tester is held responsible for the choices being made and the quality of his or her work. More responsibility on the tester means more exploratory and less scripted; more responsibility on some other agency means less exploratory and more scripted.
  • the extent to which all available information (including the most recent information) informs the design and execution of the next test. The broader the scope of the information that informs the test, the more exploratory; the narrower the scope of information that informs the test, the more scripted.
  • the extent to which the mission—the search for information—is open-ended and new information is welcomed. The more new information will be embraced, the more exploratory the mission; the more new information will be ignored or rejected, the less exploratory the mission.
  • again, very generally, the length of the loops that include designing, performing, and interpreting an activity and learning from it, and then feeding that information back into the next cycle of design, performance, interpretation, and learning. I’m not talking here about timing and sequences of actions so much as the cognitive engagement. Timing is a factor; that’s one reason that we now favour “parallel” over “simultaneous”. But more importantly, the more difficult it is to unsnarl the tangle of your interactions and your ideas, the more exploratory a process you’re in. The more rapidly you are able to shift from one heavy focus (say on executing the test) to another heavy focus (pondering the implications of what you’ve just seen) to another (running a thought experiment in your head) to yet another (revising your design), very generally, the more exploratory the process. Another way to put it: the more organic the process, the more exploratory it is; the more linear the process, the more scripted it is.

Is this what you are saying? If yes, there is hardly any difference in what I do at my work and what you preach and this is true with most of my team (am talking about 600+ testers in my organization) and we simply call this Testing.

I’d smilingly suggest that you can “simply” call it whatever you like. The more important issue is whether you want to simply call it something, or whether you want to achieve a deeper understanding of it. The risk associated with “simply” calling it something is that you’ll end up doing it simply, and that may fail to serve your clients when they are producing and using very complex products and services and systems. Which is, these days, mostly what’s happening.

For example, is there really a difference between what I’m talking about and what your 600+ testers are doing? Can you describe what they’re doing? How would you describe it? How would you frame their actions in terms of risk, cost, value, skill, diversity, heuristics, oracles, coverage, procedures, context, quality criteria, product elements, recording, reporting? Is all that stuff “simply” testing? For any one of those elements of testing, where are your testers in control of their own process, and when are they being controlled? Are all 600+ at equivalent stages of development and experience? Are they all simply testing simply, or are some testing in more complex ways?

Watch out for the magic words “simply” or “just”. Those are magic words. They cast a spell, blinding and deafening people to complexity. Yet the blindness and deafness don’t make the complexity go away. Even though these words have all the weight of snowflakes, their cumulative effect is to cover up complexity like a heavy snowfall covers up a garden.

May be these posts should be titled “Testing” than “Exploratory Testing”?

There is already good number of groups/people taking advantage of the (confused state of the larger) testing community (like certification boards). Why to add fuel to this instead of simplifying things?

There’s a set of important answers to that, in my view.

  • Testing is a complex cognitive activity comprising many other complex cognitive activities. If we want to understand testing and learn how to do it well, we need to confront and embrace that complexity, instead of trying to simplify it away.
  • If we want our clients to understand the value, the costs, the extents, and the limitations of the services we can provide for them, we need to be able to explain what we’re doing, how we’re doing it, and why we’re doing it. That’s important so that both we and they can collaborate in making better informed choices about the information that we’re all seeking and the ways we go about obtaining that information.
  • One way to “simplify” matters is to pretend that testing is “simply” the preparation and then following of a script, or that exploratory testing is “simply” fooling around with the computer. If you’re upset at all about the certification boards that trivialize testing (as I am), it’s important to articulate and demonstrate the fact that testing is not at all a simple activity, and to challenge the claim that comprehension of it can be assessed with any validity via a 40-question multiple choice test. Such a claim, in my opinion, is false, and charging money for such a test while making such a claim is, in my opinion, morally equivalent to theft. The whole scheme is founded in the premise that testing a tester is “simply” a matter of putting the tester through 40 checks. If we really wanted to evaluate and qualify a tester, we’d use an exploratory process: interviews, auditions, field testing, long sequence tests, compatibility tests, and so on. And we wouldn’t weed people out on the basis of them failing to take a bogus exam, any more than we’d reject a program for not being run against a set of automated checks that were irrelevant to what the program was actually supposed to do.
  • Just as software development is done in many contexts, so testing is done in many contexts. As we say in the Rapid Testing class, in excellent testing, your context informs your choices and vice versa. And in excellent testing, both your context and your choices evolve over time. I would argue that a heavily scripted process is more resistant to this evolution. That might be a good thing for certain purposes and certain contexts, and a not-at-all good thing for other purposes and other contexts.

Many people say, for example, that to test medical devices, you must do scripted testing. There is indeed much in medical device testing that must be checked. Problems of a certain class yield very nicely to scripted tests (checks), such that a scripted approach is warranted. The trouble comes with the implicit suggestion that if you must do scripted testing, you must not do exploratory testing. Yet if we agree that problems in a product don’t follow scripts; if we agree that there will be problems in requirements as well as in code; if we agree that we can’t recognize incompleteness or ambiguity in advance of encountering their consequences; if we agree that although we can address the unexpected we can’t eliminate it; and if we agree that people’s lives may be at stake: isn’t it the case that we must do exploratory testing in addition to any scripted testing that we might or might not do?

The answer is, to my mind, certainly Yes. So, to what extent, from moment to moment, are we emphasising one approach or the other? That’s not a question that we can answer by saying that we’re “just” testing.

Thanks again, Parthi, for prompting this post.

Can Exploratory Testing Be Automated?

Wednesday, September 22nd, 2010

In a comment on the previous post, Rahul asks,

One doubt which is lingering in my mind for quite sometime now, “Can exploratory testing be automated?”

There are (at least) two ways to interpret and answer that question. Let’s look first at answering the literal version of the question, by looking at Cem Kaner’s definition of exploratory testing:

Exploratory software testing is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

If we take this definition of exploratory testing, we see that it’s not a thing that a person does, so much as a way that a person does it. An exploratory approach emphasizes the individual tester, and his/her freedom and responsibility. The definition identifies design, interpretation, and learning as key elements of an exploratory approach. None of these are things that we associate with machines or automation, except in terms of automation as a medium in the McLuhan sense: an extension (or enablement, or enhancement, or acceleration, or intensification) of human capabilities. The machine to a great degree handles the execution part, but the work in getting the machine to do it is governed by exploratory—not scripted—work.

Which brings us to the second way of looking at the question: can an exploratory approach include automation? The answer there is absolutely Yes.

Some people might have a problem with the idea, because of a parsimonious view of what test automation is, or does. To some, test automation is “getting the machine to perform the test”. I call that checking. I prefer to think of test automation in terms of what we say in the Rapid Software Testing course: test automation is any use of tools to support testing.

If yes then up to what extent? While I do exploration (investigation) on a product, I do whatever comes to my mind by thinking in reverse direction as how this piece of functionality would break? I am not sure if my approach is correct but so far it’s been working for me.

That’s certainly one way of applying the idea. Note that when you think in a reverse direction, you’re not following a script. “Thinking backwards” isn’t an algorithm; it’s a heuristic approach that you apply and that you interact with. Yet there’s more to test automation than breaking. I like your use of “investigation”, which to me suggests that you can use automation in any way to assist learning something about the program.

I read somewhere on Shrini Kulkarni’s blog that automating exploratory testing is an oxymoron, is it so?

In the first sense of the question, Yes, it is an oxymoron. Machines can do checking, but they can’t do testing, because they’re missing the ability to evaluate. Here, I don’t mean “evaluation” in the sense of performing a calculation and setting a bit. I mean evaluation in the sense of making a determination about what people value; what they might choose or prefer.

In the second way of interpreting the question, automating exploratory testing is impossible—but using automation as part of an exploratory process is entirely possible. Moreover, it can be exceedingly powerful, about which more below.

I see a general perception among junior testers (even among ignorant seniors) that in exploratory testing, there are no scripts (read test cases) to follow but first version of the definition i.e. “simultaneous test design, test execution, and learning” talks about test design also, which I have been following by writing basic test cases, building my understanding and then observing the application’s behavior once it is done, I move back to update the test cases and this continues till stakeholders agree with state of the application.

Please guide if it is what you call exploratory testing or my understanding of exploratory testing needs modifications.

That is an exploratory process, isn’t it? Let’s use the rubric of Kaner’s definition: it’s a style of working; it emphasizes your freedom and responsibility; it’s focused on optimizing the quality of your work; it treats design, execution, interpretation, and learning in a mutually supportive way; and it continues throughout the project. Yet it seems that the focus of what you’re trying to get to is a set of checks. Automation-assisted exploration can be very good for that, but it can be good for so much more besides.

So, modification? No, probably not much, so it seems. Expansion, maybe. Let me give you an example.

A while ago, I developed a program to be used in our testing classes. I developed that program test-first, creating some examples of input that it should accept and process, and input that it should reject. That was an exploratory process, in that I designed, executed, and interpreted unit checks, and I learned. It was also an automated process, to the degree that the execution of the checks and the aggregating and reporting of results was handled by the test framework. I used the result of each test, each set of checks, to inform both my design of the next check and the design of the program. So let me state this clearly:

Test-driven development is an exploratory process.

The running of the checks is not an exploratory process; that’s entirely scripted. But the design of the checks, the interpretation of the checks, the learning derived from the checks, the looping back into more design or coding of either program code or test code, or of interactive tests that don’t rely on automation so much: that’s all exploratory stuff.
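To make the split between scripted checks and exploratory design concrete, here is a minimal sketch of test-first checks in Python. The puzzle program isn’t shown in this post, so `parse_count` and its rules (whole numbers from 0 to 255) are a hypothetical stand-in, as are the example inputs. Running `run_checks` is the scripted part; choosing the examples, reading the failures, and deciding what to add next is the exploratory part.

```python
def parse_count(text):
    """Hypothetical stand-in: accept a whole number from 0 to 255, reject anything else."""
    stripped = text.strip()
    if not stripped.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(stripped)
    if value > 255:
        raise ValueError(f"out of range: {value}")
    return value

# Examples written before the code: input to accept (with the expected result) and input to reject.
ACCEPT = [("0", 0), (" 42 ", 42), ("255", 255)]
REJECT = ["", "-1", "256", "3.5", "abc"]

def run_checks():
    """Run every check and return a list of failure descriptions (empty means all passed)."""
    failures = []
    for text, expected in ACCEPT:
        try:
            actual = parse_count(text)
        except ValueError as error:
            failures.append(f"{text!r} was rejected: {error}")
            continue
        if actual != expected:
            failures.append(f"{text!r} gave {actual}, expected {expected}")
    for text in REJECT:
        try:
            parse_count(text)
        except ValueError:
            continue
        failures.append(f"{text!r} should have been rejected")
    return failures

print(run_checks() or "all checks passed")
```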

The program that I wrote is a kind of puzzle that requires class participants to test and reverse-engineer what the program does. That’s an exploratory process; there aren’t scripted approaches to reverse engineering something, because the first unexpected piece of information derails the script. In workshopping this program with colleagues, one in particular—James Lyndsay—got curious about something that he saw. Curiosity can’t be automated. He decided to generate some test values to refine what he had discovered in earlier exploration. Sapient decisions can’t be automated. He used Excel, which is a powerful test automation tool, when you use it to support testing. He invented a couple of formulas. Invention can’t be automated. The formulas allowed Excel to generate a great big table. The actual generation of the data can be automated. He took that data from Excel, and used the Windows clipboard to throw the data against the input mechanism of the puzzle. Sending the output of one program to the input of another can be automated. The puzzle, as I wrote it, generates a log file automatically. Output logging can be automated. James noticed the logs without me telling him about them. Noticing can’t be automated. Since the program had just put out 256 lines of output, James scanned it with his eyes, looking for patterns in the output. Looking for specific patterns and noticing them can’t be automated unless and until you know what to look for. BUT automation can help to reveal hitherto unnoticed patterns by changing the context of your observation. James decided that the output he was observing was very interesting. Deciding whether something is interesting can’t be automated. James could have filtered the output by grepping for other instances of that pattern. Searching for a pattern, using regular expressions, is something that can be automated. James instead decided that a visual scan was fast enough and valuable enough for the task at hand. Evaluation of cost and value, and making decisions about them, can’t be automated. He discovered the answer to the puzzle that I had expressed in the program… and he identified results that blew my mind—ways in which the program was interpreting data in a way that was entirely correct, but far beyond my model of what I thought the program did.
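The parts of that story that can be automated are easy to sketch. The Python below is an illustration under stated assumptions, not a reconstruction of what James did: the generated values, the `fake_puzzle` function, and the log format are all hypothetical stand-ins for the spreadsheet, the puzzle, and its log file.

```python
import re

# Generating a table of test values can be automated (these values are hypothetical).
inputs = [format(n, "08b") for n in range(256)]

def fake_puzzle(value):
    """Hypothetical stand-in for the program under test; it emits one log line per input."""
    return f"input={value} result={int(value, 2) % 7}"

# Sending generated data to a program and logging its output can be automated.
log_lines = [fake_puzzle(value) for value in inputs]

# Once a person has decided which pattern is interesting, searching for it can be automated.
interesting = re.compile(r"result=0$")
matches = [line for line in log_lines if interesting.search(line)]

print(len(log_lines), len(matches))  # 256 37
```

What the sketch leaves out is the point of the story: nothing in it decides that `result=0` is worth looking for, or that 37 matching lines mean anything.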

Learning can’t be automated. Yet there is no way that we would have learned this so quickly without automation. The automation didn’t do the exploration on its own; instead, it super-charged our exploration. There were no automated checks in the testing that we did, so no automation in the record-and-playback sense, no automation in the expected/predicted result sense. Since then, I’ve done much more investigation of that seemingly simple puzzle, in which I’ve fed back what I’ve learned into more testing, using variations on James’ technique to explore the input and output space a lot more. And I’ve discovered that the program is far more complex than I could have imagined.

So: is that automating exploratory testing? I don’t think so. Is that using automation to assist an exploratory process? Absolutely.

For a more thorough treatment of exploratory approaches to automation, see

Investment Modeling as an Exemplar of Exploratory Test Automation (Cem Kaner)

Boost Your Testing Superpowers (James Bach)

Man and Machine: Combining the Power of the Human Mind with Automation Tools (Jonathan Kohl)

“Agile Automation” an Oxymoron? Resolved and Testing as a Creative Endeavor (Karen Wysopal)

…and those are just a few.

Thank you, Rahul, for the question.

Exploratory Testing and Review

Wednesday, September 22nd, 2010

The following is a lightly-edited version of something that I wrote on the software-testing mailing list, based on a misapprehension that we who advocate exploratory testing suggest that review or other forms of testing should be dropped.

Exploratory testing was, for many years, described as “simultaneous test design, test execution, and learning”. In 2006, a few of us who have been practising and studying exploratory testing got together to exchange some of what we had learned over the years, and to see if we could work on refining the definition. I did a presentation that described some of my experience at those meetings. Cem Kaner wrote a synthesis of our ideas, and several of us who were there (and many who weren’t) have since explicitly agreed with it.

Exploratory software testing is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

I use that definition when I want to be explicit. Much of the time, though, I keep things shorter. I still use something close to the older definition, with one minor change.

There’s a problem with the word “simultaneous”. When we say “simultaneous”, people seem to think that means that everything is happening at the same time and to the same degree; that is, all three of design, execution, and learning are turned up to 10, all the time. Some people believe that exploratory testing is something that happens only as direct, hands-on interaction with a working product, only after code has been written and compiled and linked. But that’s only one of the times at which something can be explored. In fact, exploratory approaches can be applied to any idea or artifact, at any stage of development of the product or service. That means you can be emphasizing test execution and learning, and relaxing emphasis on test design for a while; you can be designing a test and learning from that, while not executing the test immediately. You can be designing and executing in very short cycles, but learning less now than you might learn later. So, for that reason, we’ve started to say “parallel”, rather than “simultaneous”.

One of the things that I get from Cem’s synthesis is the notion of mutually supporting activities. The traditional, more linear approaches suggest that excellent test execution depends on excellent test design. It’s hard to disagree with that. But excellent test design—and improving test design—depends on feedback from test execution, too. In general, when the loops of design, execution, and learning are shorter, the feedback from one can inform the others more quickly. But that’s not to say that you can’t design a test and then wait to act on it, if that’s the most appropriate thing to do for the moment. However, when there are very long loops (or no loops), then you’re working in a scripted way, rather than an exploratory way. Shorter loops mean that testing is more exploratory.

In addition, something is more exploratory when an individual or a group of people (rather than a process or a script) is in charge. You can do test design and test planning in a less exploratory way by mixing it with only a little test execution. You can do test design and test planning in a more exploratory way by mixing it with a lot of test execution. (Even in a heavily scripted process, that exploratory activity happens a lot without people noticing it, so it seems.)

For example, review is a testing activity—questioning a product in order to evaluate it. There are scripted and exploratory forms of review. Consider code review. A completely scripted form of code review is a static analysis tool that looks for problems that it has been programmed to identify. A more exploratory form of code review is a bunch of people looking over a couple of pages of code, looking for specific problems that have been outlined on a checklist. A still more exploratory form of code review is a bunch of people looking for problems from a checklist, but also looking for any other problems that they might see. Perhaps the most exploratory form of code review is pair programming—people looking over code that is sort-of working, creating unit checks, revising the code, running the checks, and iterating right then and there.

Other forms of technical review can take the same arc. In the most scripted form, people receive (say) a functional design document, run it through a spelling and grammar checker, and sign off on it—and that’s the only review of the document that ever happens. In a less scripted form, people receive the design document and review it, comparing it to a list of specified requirements and quality criteria. In a more exploratory form, people look at examples or a prototype of various functions, and discuss what they’ve seen; at the end of the conversation, the designer takes the notes away and goes back to build a new prototype. In an extremely exploratory form of design, people sit around a projector and work on a FitNesse page, raising ideas and concerns, discussing them, resolving them, and updating the examples and notes on the prototype in real time.

No one who talks seriously about exploratory testing, so far as I know, talks about getting rid of review. What we do talk about is getting rid of things that waste time and mental power by introducing interruptions, needless documentation, and processes or tools that over-mediate interaction between the tester and the product. Don’t get rid of documentation; get rid of excessive amounts of documentation, or unhelpful documentation. Don’t test thoughtlessly, and don’t get rid of thinking; get rid of overthinking or freezing in the headlights. Don’t get rid of test design; shorten the feedback loops between getting an idea and acting on an idea, and then feed what you’ve learned through action back into the design. Don’t control testers’ activities through a script; guide a tester with concise documentation—charters, checklists, coverage outlines, or risk lists—that help the tester to keep focused, but that allow them to defocus and investigate using their own mindsets and skill sets when it’s appropriate to do so.

Testing is investigation of a product. Investigation can be applied at any time, to any idea or artifact. That investigation is ongoing, and it comprises design, execution, and learning. From one moment to the next, one might take precedence over the others, but which one is at the fore can flip at any instant. What distinguishes the exploratory mindset from the scripted mindset is the degree to which the tester, rather than some other agency, has the freedom and responsibility to make decisions about what he or she will do next.

All Testing is (not) Confirmatory

Tuesday, August 24th, 2010

In a recent blog post, Rahul Verma suggests that all testing is confirmatory.

First, I applaud his writing of an exploratory essay. I also welcome and appreciate critique of the testing vs. checking idea. I don’t agree with his conclusions, but maybe in the long run we can work something out.

In mythology, there was a fellow called Procrustes, an ironmonger. He had an iron bed which he claimed fit anyone perfectly. He accomplished a perfect fit by violently lengthening or shortening the guest. I think that, to some degree, Rahul is putting the idea of confirmation into Procrustes’ bed.

He cites the Oxford Online Dictionary definition of confirm: (verb) establish the truth or correctness of (something previously believed or suspected to be the case). (Rahul doesn’t cite the usage notes, which show some different senses of the word.)

When I describe a certain approach to testing as “confirmatory” in my discussion of testing vs. checking, I’m not trying to introduce another term. Instead, I’m using an ordinary English adjective to identify an approach or a mindset to testing. My emphasis is twofold: 1) not on the role of confirmation in test results, but rather on the role of confirmation in test design; and 2) on a key word in the definition Rahul cites, “previously”.

A confirmatory mindset would steer the tester towards designing a test based on a particular and specific hypothesis. A tester working in a confirmatory way would be oriented towards saying, “Someone or something has told me that the product should be able to do X. My test will demonstrate that it can do X.” Upon the execution of the (passing) test, the tester would say “See? The product can do X.” Such tests are aimed in the direction of showing that the product can work.

Someone working from an exploratory or investigative mindset would have a different, broader, more open-ended mission. “Someone or something has told me that the product does X. What are the extents and limitations of what we think of as X? What are the implications of doing X? What essential component of X might we have missed in our thinking about previous tests? What else happens when I ask the product to do X? Can I make the product do P, by asking it to do X in a slightly different way? What haven’t I noticed? What could I learn from the test that I’ve just executed?” Upon performing the test, the tester would report on whatever interesting information she might have discovered, which might include a pass or fail component, but might not. Exploratory tests are aimed at learning something about the product, how it can work, how it might work, and how it might not work; or if you like, on “if it will work”, rather than “that it can work”. To those who would reasonably object: yes, yes, no test ever shows that a product will work in all circumstances. But the focus here is on learning something novel, often focusing on robustness and adaptability. In this mindset, we’re typically seeking to find out how the program deals with whatever we throw at it, rather than on demonstrating that it can hit a pitch in the centre of the strike zone.

I believe that, in his post, Rahul is focused on the evaluation of the test, rather than on test design. That’s different from what I’m on about. He puts confirmation squarely into result interpretation, defining the confirmation step as “a decision (on) whether the test passed or failed or needs further investigation, based on observations made on the system as a result of the interaction. The observations are compared against the assumption(s).” I don’t think of that as confirmation (“establishing the truth or correctness of something previously believed or suspected to be the case”). I think of that as application of an oracle; as a comparison of the observed behaviour with a principle or mechanism that would allow us to recognize a problem. In the absence of any countervailing reason for it to be otherwise, we expect a product to be consistent with its history; with an image that someone wants to project; with comparable products; with specific claims; with reasonable user expectations; with the explicit or implicit purpose of the product; with itself in any set of observable aspects; and with relevant standards, statutes, regulations, or laws. (These heuristics, with an example of how they can be applied in an exploratory way, are listed as the HICCUPP heuristics here. It’s now “HICCUPPS”; we recognized the “Standards and Statutes” oracle after the article was written.)

At best, your starting hypothesis determines whether applying an oracle suggests confirmation. If your hypothesis is that the product works—that is, that the product behaves in a manner consistent with the oracle heuristics—then your approach might be described as confirmatory. Yet the confirmatory mindset has been identified in both general psychological literature and testing literature as highly problematic. Klayman and Ha point out in their 1987 paper Confirmation, Disconfirmation, and Information in Hypothesis Testing that “In rule discovery, the positive test strategy leads to the predominant use of positive hypothesis tests, in other words, a tendency to test cases you think will have the target property.” For software testing, this tendency (a form of confirmation bias) is dangerous because of the influence it has on your selection of tests. If you want to find problems, it’s important to take a disconfirmatory strategy—one that includes tests of conditions outside the space of the hypothesis that the program works. “For example, when dealing with a major communicable disease (or software bugs —MB), it is more serious to allow a true case to go undiagnosed and untreated than it is to mistakenly treat someone.” Here, Klayman and Ha point out, if we want to prevent disease, the emphasis should be on tests that are outside of those that would exemplify a desired attribute (like good health). In the medical case, they say that would involve “examining people who test negative for the disease, to find any missed cases, because they reveal potential false negatives.” In testing, the object would be to run tests that challenge the idea that the test should pass. This is consistent with Myers’ analysis in The Art of Software Testing (which, interestingly, as it was written in 1979, predates Klayman and Ha’s paper).
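To make the difference in test selection concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `parse_quantity` is a hypothetical unit, and the two input lists are invented. The first list follows the positive test strategy (inputs expected to have the target property); the second is chosen to challenge the hypothesis that the function works.

```python
def parse_quantity(text):
    """Hypothetical unit under test: turn user input into an order quantity."""
    return int(text)


def observe(text):
    """Record whatever happens, rather than reducing it to pass or fail."""
    try:
        return parse_quantity(text)
    except ValueError:
        return "ValueError"


# Positive test strategy: cases we already think will have the target property.
confirming = ["1", "12", "250"]

# Disconfirmatory strategy: cases outside the space of "the program works".
# "\u0663" is the Arabic-Indic digit three.
challenging = ["", "-3", "1e3", " 7 ", "\u0663"]

confirming_results = {text: observe(text) for text in confirming}
challenging_results = {text: observe(text) for text in challenging}

print("confirming: ", confirming_results)
print("challenging:", challenging_results)
```

The confirming inputs only tell us what we already believed. The challenging inputs raise questions a person has to evaluate: should a negative quantity be accepted, and is it intended that padded input and non-Latin digits are quietly converted?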

As I see it, if we’re testing the product (rather than, say, demonstrating it), we’re not looking for confirmation of the idea that it works; we’re seeking to disconfirm the idea that it works. Or, as James Bach might put it, we’re in the illusion demolition business.

One other point: Rahul suggests “Testing should be considered complete for a given interaction only when the result of confirmation in terms of pass or fail is available.” To me, that’s checking. A test should reveal information, but it does not have to pass or fail. For example, I might test a competitive product to discover the features that it offers; such tests don’t have a pass or fail component to them. A tester might be asked to compare a current product with a past version to look for differences between the two. A tester might be asked to use a product and describe her experience with it, without any evaluation against explicit, atomic pass or fail criteria. “Pass and fail” are highly limiting in terms of our view of the product: I’m sure that the arrival of yet another damned security message on Windows Vista was deemed as a pass in the suite of automated checks that got run on the system every night. But in terms of my happiness with the product, it’s a grinding and repeated failure. I think Rahul’s notion that a test must pass or fail is confused with the idea that a test should involve the application of a stopping heuristic. For a check, “pass or fail” is essential, since a check relies on the non-sapient application of a decision rule. For a test, pass-vs.-fail might be an example of the “mission accomplished” stopping heuristic, but there are plenty of other conditions that we might use to trigger the end of a test.

Since Rahul appears to be a performance tester, perhaps he’ll relate to this example (the framing of which I owe to the work of Cem Kaner). Imagine a system that has an explicit requirement to handle 100,000 transactions per minute. We have two performance testing questions that we’d like to address. One is the load testing question: “Can this system in fact handle 100,000 transactions per minute?” To me, that kind of question often gets addressed with a confirmatory mindset. The tester forms a hypothesis that the system does handle 100,000 transactions per minute; he sets up some automation to pump 100,000 transactions per minute through the system; and if the system stays up and exhibits no other problems, he asserts that the test passes.

The other performance question is a stress testing question: “In what circumstances will the system be unable to handle a given load, and fail?” For that we design a different kind of experiment. We have a hypothesis that the system will fail eventually as we ramp up the number of transactions. But we don’t know how many transactions will trigger the failure, nor do we know the part of the system in which the failure will occur, nor do we know the way in which the failure will manifest itself. We want to know those things, so we have a different information objective here than for the load test, and we have a mission that can’t be handled by a check.
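The two questions can be sketched side by side. This is only an illustration under stated assumptions: `fake_system` is a stand-in with invented failure thresholds, and `load_check`, `stress_probe`, and `Outcome` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    load: int                      # transactions per minute attempted
    ok: bool                       # did the system stay up?
    symptom: Optional[str] = None  # how the failure showed itself, if it did


def fake_system(load: int) -> Outcome:
    """Stand-in for the real system; the thresholds are invented."""
    if load > 180_000:
        return Outcome(load, False, "process crashed")
    if load > 140_000:
        return Outcome(load, False, "queue backlog grew without bound")
    return Outcome(load, True)


def load_check(system, required: int = 100_000) -> bool:
    """Confirmatory: one question, one yes-or-no answer."""
    return system(required).ok


def stress_probe(system, start: int, step: int, ceiling: int) -> List[Outcome]:
    """Investigative: ramp the load and record what happens until something breaks."""
    observations = []
    for load in range(start, ceiling + 1, step):
        outcome = system(load)
        observations.append(outcome)
        if not outcome.ok:
            break  # first failure seen; a person decides where to dig next
    return observations


passed = load_check(fake_system)
observations = stress_probe(fake_system, start=100_000, step=20_000, ceiling=300_000)
first_failure = observations[-1]

print("load check passed:", passed)
print("first failure at", first_failure.load, "->", first_failure.symptom)
```

The load check returns a single bit. The stress probe returns a record of where the failure began and how it showed up, which is raw material for more questions rather than a verdict.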

In the latter test, there is a confirmatory dimension if you’re willing to look hard enough for it. We “confirm” our hypothesis that, given heavy enough stress, the system will exhibit some problem. When we apply an oracle that exposes a failure like a crash, maybe one could say that we “confirm” that the crash is a problem, or that behaviour we consider to be bad is bad. Even in the former test, we could flip the hypothesis, and suggest that we’re seeking to confirm the hypothesis that the program doesn’t support a load of 100,000 transactions per minute. If Rahul wants to do that, he’s welcome to do so. To me, though, labelling all that stuff as “confirmatory” testing reminds me of Procrustes.

Questions from Listeners (2): Is Unit Testing Automated?

Monday, June 28th, 2010

On April 19, 2010, I was interviewed by Gil Broza.  In preparation for that interview, we solicited questions from the listeners, and I promised to answer them either in the interview or in my blog.  Here’s the second one.

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

This question provides a great opportunity to look at a number of points—so many that I’d like to address only the first sentence in the question this time around. I’ll look at the second part of the question later on.

Expansive Definitions

I find the most helpful definitions and descriptions to be those that are expansive and inclusive. While testing, one big risk is that I might have narrow ideas about certain risks or threats to the value of the product. Thinking expansively helps me to avoid tunnel vision that would lead to my missing important problems. In conversations, thinking expansively helps me to remain alert to the possibility that the other person and I might be talking at cross-purposes. That can happen when one of us uses a word that means different things to each of us. It can also happen when we’re thinking of the same thing, but using different words. In fact, as Jerry Weinberg once remarked to James Bach, “A tester is someone who knows that things can be different.” Here’s an example of that. The questioner says that “unit testing is automated”. I’d argue that this refers to one part of testing, test execution, the part we can automate. Well, to me, things can be different.

Testing Includes Many Activities

Testing includes not only test execution, but also test design, learning, and reporting, all performed in cycles or loops. What is test design? As we say in the Rapid Software Testing course notes, test design includes

  • modeling the test space (that is, considering questions of what we could test; what’s in scope);
  • determining oracles (that is, figuring out the principles or mechanisms by which we’d recognize a problem, and considering how those principles or mechanisms might fail to help us recognize a problem)
  • determining coverage (that is, how much testing we’re going to do, given the scope)
  • determining procedures (how we’re going to perform the tests; how we’ll go about the business of test execution)

Test execution includes

  • configuring the product (obtaining it, setting it up for the purposes of a given test)
  • operating the product (exercising the product in some way to obtain coverage)
  • observing the product (applying the oracles that we’ve determined in advance, but also recognizing behaviours that trigger us to recognize and apply new oracles)
  • evaluating the product (comparing its behaviour to our oracles)
  • applying a stopping heuristic (deciding when the test is done)

Test execution may or may not include reporting, but reporting happens at some point. And when testing is being done well, learning is happening pretty much all the time. This isn’t a strictly linear process, by the way. Depending on your approach to testing, and depending on what you’re testing, these things may happen in the order that you see above, or they may happen all at once in an organic tangled ball, with lots of tight little loops. Sometimes all of the elements of testing are done by the same person, and the elements interact with each other very quickly. Sometimes one person designs a test and another person handles the execution, in which case the loops will be long or broken. If you separate test design and test execution (as happens in scripted testing), you separate the learning associated with each. Sometimes we’ll evaluate a result and stop a test; sometimes we’ll stop first and then interpret what we’ve seen. For a given test, some aspects may take much longer than others; some may be done more consciously or thoughtfully than others. But at some point in pretty much every test, each of the steps above happens.

    Unit Testing Includes Many Activities

    Like any other kind of testing, unit testing consists of cycles of design, execution, learning, and reporting. Like any other test, a unit test starts with some person having a test idea, a question that we want to ask about the program. A person designing a unit test typically frames that question in terms of a check—an observation linked to a decision rule such that both can be performed by a machine. The person writes program code to express that yes-or-no question, usually assisted by some kind of unit testing framework. Next, some person—or, more often, some process that a person has initiated—performs the checks. The check produces a result. Sometimes a person observes that result independently of other results; more often, some person (the author of the automation framework) has programmed a mechanism that provides a means of aggregating the results. Then some person interprets the aggregated results and figures out what needs to be done next—whether everything is okay, whether a test result suggests that the product should be revised, or whether the check is excellent or wanting or broken or irrelevant. And then the development cycle continues, in a loop that includes some development of the actual product too.
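As a rough illustration of the mechanical part of that cycle, here is a minimal sketch in Python. The unit (`apply_discount`), the check names, and the tiny runner are all hypothetical and invented for this example; a real project would more likely lean on an existing unit testing framework.

```python
def apply_discount(price, percent):
    """Hypothetical unit under test."""
    return round(price * (1 - percent / 100), 2)


# Each check pairs an observation (call the function) with a decision rule
# (compare the result to an expected value). A person chose every one of these.
CHECKS = [
    ("ten percent off 200", lambda: apply_discount(200.0, 10) == 180.0),
    ("zero percent changes nothing", lambda: apply_discount(59.99, 0) == 59.99),
    ("full discount is free", lambda: apply_discount(42.0, 100) == 0.0),
]


def run_checks(checks):
    """The automated part: run each check and aggregate the yes-or-no answers."""
    return {name: bool(rule()) for name, rule in checks}


results = run_checks(CHECKS)
for name, ok in results.items():
    print("PASS" if ok else "FAIL", name)
```

Only `run_checks` is automated here. Deciding which questions to ask, what the expected values should be, and what to do about the output remains a person's work.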

    Most Parts of Unit Testing Are Sapient, Not Mechanical

    Notice how many times the word “person” appears in the above description of unit testing. None of the steps in the process (with the exception of the running of the checks) can be automated, since each step requires a thinking person, rather than a machine, to seek information, to make decisions, and to control the overall process. Parts of unit testing can be assisted by automation, but the automation isn’t doing anything particularly on its own; it remains an extension of the person’s ability to execute and to observe.

    What form might unit test automation take? Many people think in terms of a testing framework that sets up some conditions, executes some code from the product under test, and makes some assertions about the output of some function or some aspect of the state of the system. That’s cool, and quite powerful. But for years at Quarterdeck, I watched programmers doing unit testing (and did some myself) by stepping through code under various debuggers (DEBUG, SYMDEB, WDEB386, or Soft-ICE, a software-based simulacrum of an in-circuit emulator), watching the registers and the ports for each instruction. Sometimes I’m writing some stuff in Ruby, and I want to do a quick little test of a fairly trivial function that I know I’m going to throw away. In that case, I don’t bother with the testing framework; I run the code and inspect the variables in IRB, the Ruby interpreter, and get my information that way. Sometimes I write a function, and generate some data to test it using automation. Sometimes, while unit testing, I use tools to examine the contents of a database table or a file or the Windows registry. Are all these different things unit testing? Jerry Weinberg says that testing is “gathering information with the intention of informing a decision”. I’m testing a unit, and I’m using automation to assist that testing, even though (so it seems) people tend to hold a more narrow view of what unit testing is. Unit testing is testing done at the unit level.

    Is stepping through the code the way that we should always do unit testing? Of course not. For the purpose of creating easily-runnable change detectors, the unit test framework is the way to go. Yet different approaches, tools, and techniques that we employ allow us to observe in different ways, discover different problems, and learn different things about the unit under test.

    Finally, it’s important to note that the development of unit-level checks tends to reveal more problems than the running of them. Chip Groeder won a best paper award at the STAR conference in 1997, in which he claimed that 88% of the bugs that he found with automated tests were found during development of the tests (that is, the non-automated parts of the testing). (Thanks to Cem Kaner for pointing me to this.)  Anecdotally, everyone that I speak to who uses automation for the execution of tests—whether at the unit level or not—says exactly the same thing.  That’s not to say that automated checks are useless.  On the contrary; checks, as change detectors, are very useful.  Instead, my point is that unit testing is not automated; not the interesting parts. Unit checking is automated.

    In summary:

    • Unit testing is a highly exploratory process, in that the loops are short, tightly integrated, and typically performed by the same person.
    • The most important parts of unit testing are the sapient parts—the design, programming, design of reports, interpretation of results, and the evaluation of what to do next.
    • The scripted part of unit testing—the execution of the checks—is the least interesting part of unit testing. And yet…
    • Many people seem to be fascinated by the mechanical parts, dazzled by lines on the screen, blissful upon observation of the green bar. And the same people say things like “unit testing is automated”. Why is that?

    That’s a lot for now. I’ll answer the rest of the question in a future post.

    Coding QA Podcast on Exploratory Testing (Part 2)

    Sunday, April 11th, 2010

    Here’s Part 2 of my notes on the CodingQA podcast, in which Matthew Osborn and Federico Silva Armas chat with James Bach about exploratory testing and session-based test management.

    Skills of Exploratory Testing

    • If you want to develop a list of testing skills, you might find it helpful to start with the Exploratory Skills and Dynamics sheet, by James Bach, Jon Bach, and Michael Bolton.
    • One of the core skills of excellent exploratory testing is self-awareness and self-management; managing your attention and your focus.
    • Another skill of excellent testing is using your emotions as a test tool. Frustration or confusion about something may affect the user in the same kind of way that it affects you. Treat your emotional reaction as a trigger; start looking into why you might be experiencing that emotion.

    Test Cases

    • Except in well-defined contexts, counting test cases means absolutely nothing. When someone gives you a number, telling you that they have N test cases, they’re telling you they have N files.
    • Test cases are like briefcases; if you want to know something about them, you have to look inside them to know what’s going on.
    • What people do when they test is different from what’s in the test case. The test is what the tester does (and thinks –MB), not what the test case says.
    • In an automated test, the problem is that there might not be an oracle to observe a bug that’s outside the scope of that test, and there’s no human to say, “That’s odd. I wonder what’s going on there.”
    • When managers ask you to count your test cases, they’re asking you to count checks, and that doesn’t get to the essence of what testing is.

    Managing Exploratory Testing

    • If you’re going to manage and train exploratory testers, it would probably be a good idea to break out a list of constituent skills of excellent testing (see above). Then ask, for each skill: How can we observe it? And how can we teach it?
    • If you’re going to manage any highly cognitive work, you’re going to have to participate in the work on a regular basis. If managers aren’t involved in the work themselves, they don’t really know what their people are doing; they know only the story that their people are telling them. The less management is involved in testing, the more likely they are to come up with weird astrology-like methods for managing testing.
    • How do you manage testing on a large scale? Like the military, develop a corps of sergeants. Test leads—skilled, trusted testers—are trained on the job, and train other testers and other test leads. This is also consistent with the way that doctors, pilots, scientists, and other cognitively skilled professionals are trained.
    • Good doctors, welders, and pilots are not trained by filling out templates.
    • The other component to managing exploratory testing is session-based test management: uninterrupted blocks of tester time.

    Session-Based Test Management

    • A block of uninterrupted time (a session, typically 90 minutes) is accounted in terms of three activities: test design and execution (T-time); bug investigation and reporting (B-time); and setup (S-time). You account for your time as test design and execution, unless it’s interrupted by B-time or S-time.
    • T-time has the highest precedence, then B-time, then S-time. If you’re doing more than one thing at once (e.g. investigating a bug while you’re setting up), report it as the higher-precedence activity.
    • Accounting for time in this way allows you to identify ways in which more testing can get done. (Here’s the first part of an example, and here’s the second part. Note that the metrics don’t tell you what’s good; they suggest where to look. The purpose of this kind of measurement is not to provide answers, but to provide more focused questions on what we should do next and how we should allocate our time. –MB)
    • In a session format, after you’ve gathered a bit of data, a picture emerges of how much you can get done. A middle-level test strategy language evolves: “That part of the product looks like it will take two recon sessions, three analysis sessions, ten deep coverage sessions, and two closure sessions.” (The CodingQA guys have developed their own language—discovery sessions and targeted sessions.) By this kind of language, testing sessions turn into calendar time.
    • In exploratory testing, testing resources develop: test data, test platforms, test ideas in documents.
    • Test leads handle the debriefing of sessions. Debriefings and formal acceptance of the test notes are essential in developing discipline in note-taking and reporting.
    • The SBTM protocol was initially to debrief at the end of every session, but this wasn’t always possible. More recently, the protocol is to debrief in groups; everyone reporting to a given test lead meets at the end of the day for a half-hour debriefing session.
    • The overall philosophy should be to tinker with the process to get the bureaucracy out of the way of the thinking tester.
    • After two years of success with SBTM, at the place where it was developed, management stopped asking for metrics, but the team kept using the session concept.
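The time accounting described in the list above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the SBTM tooling itself: the slice format, the `classify` and `account` helpers, and the sample session are all made up, and the precedence order is simply the one stated in these notes.

```python
from collections import Counter

# Precedence as stated in the notes above: T-time, then B-time, then S-time.
PRECEDENCE = ["T", "B", "S"]


def classify(activities):
    """Report a slice of time as its highest-precedence activity."""
    for label in PRECEDENCE:
        if label in activities:
            return label
    raise ValueError(f"no known activity in {activities!r}")


def account(session):
    """Total the minutes per category for a list of (minutes, activities) slices."""
    totals = Counter({label: 0 for label in PRECEDENCE})
    for minutes, activities in session:
        totals[classify(activities)] += minutes
    return totals


# A made-up 90-minute session.
session = [
    (15, {"S"}),       # setting up the test environment
    (40, {"T"}),       # test design and execution
    (10, {"B", "S"}),  # investigating a bug during setup: reported as B-time
    (25, {"T"}),       # more test design and execution
]

totals = account(session)
session_length = sum(totals.values())
for label in PRECEDENCE:
    share = 100 * totals[label] / session_length
    print(f"{label}-time: {totals[label]} min ({share:.0f}%)")
```

The numbers that come out are not a grade. As noted above, they suggest where to look, for instance at why setup is eating a sixth of the session.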

    For more information on session-based test management, see http://www.satisfice.com/sbtm.

    Recording Exploratory Testing

    • Every novice’s first set of test notes looks like this: a single page that says, “I DID THE TESTING”. The second set of test notes contains every keystroke and every mouse click. Novices need training and instruction in how to create good notes that strike the right balance between detail and conciseness.
    • The fastest way, in the long run, to train novices is to provide them with pairing and personal supervision, showing them how to do the notes.
    • Testers are not often challenged—and therefore have not developed the skill—to give expert reports on what they actually did as they were testing. For a while, senior testers who are being trained in this kind of reporting can get annoyed and frustrated.
    • The more practice you get in note-taking, the more skill you develop, and the less it interferes with your thought process.
    • Watch the episode “Galileo Was Right” from the HBO series “From the Earth to the Moon”, and look for the segment in which astronauts are trained to become geological observers and reporters. It’s all about rapid testing!
    • Federico reports an irony: that on his team, exploratory testing (so often mistaken for “testing without writing anything down”) became subject to the opposite problem: it became all about the test notes, which disrupted testing and learning.
    • Test notes in SBTM should be focused not on the literal steps of execution, but on questions, wondering, models, test ideas, coverage ideas, evidence of learning.
    • Correct the problem by sitting with testers and testing with them.
    • In excellent exploratory testing, the testing is paramount. If the paperwork gets in the way, then the paperwork has to change until there’s a reasonable balance between paper and testing.
    • Combine the notes with recordings of the session—a videotape; a screen recording system; application and server log files; tools like Burp Proxy or Fiddler. Lean on automation and tools to keep records of the minutiae of what’s going on.
    • Beware of systems that have encoded or encrypted communication between server and client. The transactions might be logged, but might be hard to decode or decrypt.
    • You don’t look at the data unless you experience a problem; you normally file the video and not worry about it unless you need hard evidence or review.
    • With a screen recorder, it might be hard to make sense of what happened. Therefore: keep notes electronically on the same machine that you’re testing on. This causes the notes to pop up on the screen concurrent with the test, which provides insight into what the tester was thinking.
    • The notes should be just enough to figure out the highlights of what you were doing.
    • Timestamps help to link the test idea with what actually happened and when it happened. This allows you to move quickly, taking notes that are more easily reviewable.

    Exploratory Testing and Interviews

    Wednesday, April 7th, 2010

    I’m going to be interviewed on April 19, 2010 by Gil Broza, an expert Agile coach who is a colleague and friend here in Toronto.

    Gil’s request for an interview reminded me of an experience I had a few weeks ago. I received an email from a couple of researchers in Sweden who are studying exploratory testing.  I was honoured to be asked for my point of view on the subject. However, I was a little startled when the gentlemen provided me with a list of 27 questions about the problems of exploratory testing.  And 23 about the benefits of exploratory testing. And 16 about the problems of test-case-based testing. And 17 about the benefits of test-case-based testing.

    That seemed like a lot of questions to answer. To answer some of them sufficiently would have required a straight Yes or No. To answer others well would have involved sprawling issues of ontology and epistemology. Some questions asked for more than one answer (“Please list down at least five problems related to ET.”) Some questions were about experiences, and asked for stories. The only sure thing was that a thorough reply would have been hours of writing (and even though it’s about a subject I love, fairly tedious writing). I was traveling, and figured I could only give them an hour, so I got them to phone me instead, one morning while I was in Trondheim, Norway.

    It was a great experience. We had a grand chat (we went a few minutes over the hour, we were having so much fun) and what’s more, it provided a wonderful set of metaphors for testing.

    • Excellent exploratory testing is like interviewing a program. Imagine that you work at a placement agency, linking candidate workers to your clients.  One of your key tasks is to qualify your candidates before you send them out for jobs or for interviews.  To make sure they’ll be ready for whatever your clients might throw at them, you test them through an interview of your own.  You can plan for that interview by all means, but what happens during the interview is to some degree unpredictable, because for each question, the answer that you get informs decisions about the next question. One way to test (a great way, and an important way, I believe) is to treat your program like a prospective employee for your customers.  You’re not merely going to test that the candidate can answer some questions correctly (that is, is the candidate capable?); you’re going to look at the whole package.  Does the candidate deal appropriately with surprising or malformed situations (that is, is the candidate reliable)?  When he gets stuck, does the candidate know how to ask for help politely and attempt to move forward, or does he just sit there stupidly (in the software world, we’d ask questions about user-friendliness, usability, and error messages)?  Can the candidate deal with being stressed out or overwhelmed, or does the candidate just collapse in a heap (performance?  scalability?  reliability?)?

      Scripted testing is like sending someone a list of 83 written questions and expecting 83 written answers. The answer to one question will not inform the next question, unless you design a mechanism for feedback and course correction, such as only submitting a few questions or answers at a time.  Notice also that like “83 test cases”, the quantity “83 questions” doesn’t really mean very much to you.  Until you’ve seen the questions, you can’t really know anything about them.  You can only know what I’ve told you about them.  Were they good questions?  Bad?  Multiple choice?  Worthy of a quick, a deep, or a practical answer?  Did they reflect what the personnel agency’s clients really wanted to know about the candidate?  What they really needed to know?

      Exploratory testing takes into account the possibility that you might get a different answer to your question from one day to the next, and that this might lead you to ask new and different questions. Thus exploratory testing focuses on adaptability. Scripted testing emphasizes the answers that you want to hear from a program, the same way every time you ask, without variation. Therefore, scripted testing focuses on invariance. Note that invariance can be checked, but adaptability must be tested. It makes sense to delegate the most extreme kind of scripted testing (checking) to a machine.  Humans are, as Jerry Weinberg put it, at best variable when it comes to showing that the machine can do the same thing with different numbers.  Automation is great at that stuff.  If humans are indeed unpredictable and variable, it makes sense to treat those tendencies as assets when it comes to testing, and exploit the natural human capacity for variation and adaptation to test the system’s capacity to adapt.

    • When I received the questions, I felt overwhelmed and, frankly, irritated.  During the interview I felt comfortable and relaxed. I noted that the dialogue, the conversation, felt very natural and immediate. The list of questions, although thoughtfully conceived, had felt stilted and disjointed.  Rather than interacting with a static piece of paper, I could hear human voices at the other end of the line. The different feeling offered by the two modes of communication is natural. We enter the world as listeners and speakers, not as readers and writers. Our minds and our sensory systems are biased by our heritage to prefer immediacy, dealing with people as directly as possible. Computers and documents are media, technologies. They extend our capabilities and our senses in some ways and diminish them in others. A paper document turns a conversation from a set of loops into a linear sequence. A computer allows us to converse in real time over great distances, sending video, audio, text, and documents, but it still lacks the immediacy of actual presence and the rich sensory environment in which people evolved.
    • Since we only had an hour, the researchers and I realized that we had to begin by focusing on the questions that were most important to them. Except that we started the interview with different ideas about what might be important, so we discovered what was really important along the way. At the end of the session, we agreed that we had covered almost all of the initial questions anyway.  It turned out that a little over an hour of conversation was enough to give them plenty of material to consider. That’s like testing too. We often find that with the right approach, discovering what we want to know might take a lot less time and a lot less effort than we think. Rapid feedback loops, like those available in an exploratory approach, can help us to work out quickly what is important to test and what might not be so important.  (That’s why, when a test team is under time pressure, there’s a powerful and natural desire to explore rather than to stick with an overly scripted process.  The trick is to have the skill to do the exploratory testing well.)

    So, interviews are like exploratory testing.  They’re fun, and that’s why I’m looking forward to Monday, April 19th, 2010, when Gil Broza will be interviewing me on the subject “Is There a Problem Here?”  Join us by signing up here.