Blog Posts for the ‘Scenario Testing’ Category

The Honest Manual Writer Heuristic

Monday, May 30th, 2016

Want a quick idea for a burst of activity that will reveal both bugs and opportunities for further exploration? Play “Honest Manual Writer”.

Here’s how it works: imagine you’re the world’s most organized, most thorough, and—above all—most honest documentation writer. Your client has assigned you to write a user manual, including both reference and tutorial material, that describes the product or a particular feature of it. The catch is that, unlike other documentation writers, you won’t base your manual on what the product should do, but on what it does do.

You’re also highly skeptical. If other people have helpfully provided you with requirements documents, specifications, process diagrams or the like, you’re grateful for them, but you treat them as rumours to be mistrusted and challenged. Maybe someone has told you some things about the product. You treat those as rumours too. You know that even with the best of intentions, there’s a risk that even the most skillful people will make mistakes from time to time, so the product may not perform exactly as they have intended or declared. If you’ve got use cases in hand, you recognize that they were written by optimists. You know that in real life, there’s a risk that people will inadvertently blunder or actively misuse the product in ways that its designers and builders never imagined. You’ll definitely keep that possibility in mind as you do the research for the manual.

You’re skeptical about your own understanding of the product, too. You realize that when the product appears to be doing something appropriately, it might be fooling you, or it might be doing something inappropriate at the same time. To reduce the risk of being fooled, you model the product and look at it from lots of perspectives (for example, consider its structure, functions, data, interfaces, platform, operations, and its relationship to time; and business risk, and technical risk). You’re also humble enough to realize that you can be fooled in another way: even when you think you see a problem, the product might be working just fine.

Your diligence and your ethics require you to envision multiple kinds of users and to consider their needs and desires for the product (capability, reliability, usability, charisma, security, scalability, performance, installability, supportability…). Your tutorial will be based on plausible stories about how people would use the product in ways that bring value to them.

You aspire to provide a full accounting of how the product works, how it doesn’t work, and how it might not work—warts and all. To do that well, you’ll have to study the product carefully, exploring it and experimenting with it so that your description of it is as complete and as accurate as it can be.

There’s a risk that problems could happen, and if they do, you certainly don’t want either your client or the reader of your manual to be surprised. So you’ll develop a diversified set of ways to recognize problems that might cause loss, harm, annoyance, or diminished value. Armed with those, you’ll try out the product’s functions, using a wide variety of data. You’ll try to stress out the product, doing one thing after another, just like people do in real life. You’ll involve other people and apply lots of tools to assist you as you go.

For the next 90 minutes, your job is to prepare to write this manual (not to write it, but to do the research you would need to write it well) by interacting with the product or feature. To reduce the risk that you’ll lose track of something important, you’ll probably find it a good idea to map out the product, take notes, make sketches, and so forth. At the end of 90 minutes, check in with your client. Present your findings so far and discuss them. If you have reason to believe that there’s still work to be done, identify what it is, and describe it to your client. If you didn’t do as thorough a job as you could have done, report that forthrightly (remember, you’re super-honest). If anything that got in the way of your research or made it more difficult, highlight that; tell your client what you need or recommend. Then have a discussion with your client to agree on what you’ll do next.

Did you notice that I’ve just described testing without using the word “testing”?

As Expected

Tuesday, April 12th, 2016

This morning, I started a local backup. Moments later, I started an online backup. I was greeted with this dialog:

Looks a little sparse. Unhelpful. But there is that “More details” drop-down to click on. Let’s do that.

Ah. Well, that’s more information. But it’s confusing and unhelpful, but I suppose it holds the promise of something more helpful to come. I notice that there’s a URL, but that it’s not a clickable link. I notice that if the dialog means what it says, I should copy those error codes and be ready to paste them into the page that comes up. I can also infer that there’s not local help for these error codes. Well, let’s click on the Knowledge Base button.

Oh. The issue is that another backup is running, and starting a second one is not allowed.

As a tester, I wonder how this was tested.

Was an automated check programmed to start a backup, start a second backup, and then query to see if a dialog appeared with the words “Failed to run now: task not executed” in it? If so, the behaviour is as expected, and the check passed.

Was an automated check programmed to start a backup, start a second backup, and then check for any old dialog to appear? If so, the behaviour is as expected, and the check passed.

Was a test script given to a tester that included the instruction to start a backup, start a second backup, and then check for a dialog to appear, including the words “Failed to run now: task not executed”? Or any old dialog that hinted at something? If so, the behaviour is as expected, and the “manual” test passed.

Here’s what that first dialog could have said: “A backup is in progress. Please wait for that backup to complete before starting another.”

At this company, what is the basic premise for testing? When testing is designed, and when results are interpreted, is the focus on confirming that the product “works as expected”? If so, and if the expectations above are met, no bug will be noticed. To me, this illustrates the basic bankruptcy of testing to confirm expectations; to “make sure the tests all pass”; to show that the product “meets requirements”. “Meets requirements”, in practice, is typically taken to mean “is consistent with statements in a requirements document, however misbegotten those statements might be”.

Instead of confirmation, “pass or fail”, “meets the requirements (documents)” or “as expected”, let’s test from the perspective of two questions: “Is there a problem here?” and “Are we okay with this?” As we do so, let’s look at some of the observations that we might make were and questions we might ask. (Notice that I’m doing this without reference to a specification or requirements document.)

Upon starting a local backup and then attempting to start an online backup, I observe this dialog.

I am surprised by the dialog. My surprise is an oracle, a means by which I might recognize a problem. Why am I surprised? Is there a problem here?

I had a desire to create a local backup and an online backup at the same time. On a multi-tasking, multi-threaded operating system, that desire seems reasonable to me, and I’m surprised that it didn’t happen.

Inconsistency with reasonable user desire is an oracle principle, linked to quality criteria that might include capability, usability, performance, and charisma. The product apparently fails to fulfill quality criteria that, in my opinion, a reasonable user might have. Of course, as a tester, I don’t run the project. So I must ask the designer, or the developer, or the product manager: Are we okay with this?

This might be exactly the dialog that has been programmed to appear under this condition—whatever the condition is. I don’t know that condition, though, because the dialog doesn’t tell me anything specific about the problem that the software is having with fulfilling my desire. So I’m somewhat frustrated, and confused. Is there a problem here?

I can’t explain or even understand what’s going on, other than the fact that my desire has been thwarted. My oracle—pointing to a problem—is inconsistency with explainability, in addition to inconsistency with my desires. So I’m seeing a potential problem not only with the product’s behaviour, but also in the dialog. Are we okay with this?

Maybe more information will clear that up.

Still nothing more useful here. All I see is a bunch of error codes; no further explanation of why the product won’t do what I want. I remain frustrated, and even more confused than before. In fact, I’m getting annoyed. Is there a problem here?

One key purpose of a dialog is to provide a user with useful information, and the product seems inconsistent with that (the inconsistency-with-purpose oracle). Are these codes correct? Maybe these error codes are wildly wrong. If they are, that would be a problem too. If that’s the case, I don’t have a spec available, so that’s a problem I’m simply going to miss. Are we okay with that?

I have to accept that, as a human being, there are some problems I’m going to miss—although, if I were testing this in-house, there are things I could do to address the gaps in my knowledge and awareness. I could note the codes and ask the developer about them; or I could ask for a table of the available codes. (Oh… no one has collected a comprehensive listing of the error codes; they’re just scattered through the product’s source code. Are we okay with this?)

Back to the dialog. Maybe those error codes are precisely correct, but they’re not helping me. Are we okay with this?

All right, so there’s that Knowledge Base button. Let’s try it. When I click on the button, this appears:

Let’s look at this in detail. I observe the title: 32493: Acronis True Image: “Failed to run now: task not executed.” That’s consistent with the message that was in the dialog. I notice the dates; something like this has been appeared in the knowledgebase for a while. In that sense, it seems that the product is consistent with its history, but is that a desirable consistency? Is there a problem here?

The error codes being displayed on this Web page seem consistent with the error codes in the dialog, so if there’s a problem with that, I don’t see it. Then I notice the line that says “You cannot run two tasks simultaneously.” Reading down over a long list of products, and through the symptoms, I observe that the product is not intended to perform two tasks simultaneously. The workaround is to wait until the first task is done; then start the second one. In that sense, the product indeed “works as expected”. And yet…are we okay with this?

Once again, it seems to me that attempting to start a second task could be a reasonable user desire. The product doesn’t support that, but maybe we’re okay with that. Yet is there a problem here?

The product displays a terse, cryptic error message that confuses and annoys the user without fulfilling its apparent intended purpose to inform the user of something. The product sends the user to the Web (not even to a local Help file!) to find that the issue is an ordinary, easily anticipated limitation of the program. It does look kind of amateurish to deal with this situation in this convoluted way, instead of simply putting the relevant information in the initial dialog. Is there a problem here?

I believe that this behaviour is inconsistent with an image that the company might reasonably want to project. The behaviour is also inconsistent with the quality criteria we call usability and charisma. A usable product is one that behaves in a way that allows the user to accomplish a task (including dealing with the product’s limitations) quickly and smoothly. A charismatic product is one that does its thing in an elegant way; that engages the user instead of irritating the user; that doesn’t make the development group look silly; that doesn’t prompt a blog post from a customer highlighting the silliness.

So here’s my bug report. Note that I don’t mention expectations, but I do talk about desires, and I cite two oracles. The title is “Unhelpful dialog inconsistent with purpose.” The body would say “Upon attempting to start a second backup while one is in progress, a dialog appears saying ‘Failed to run now: task not executed.’ While technically correct, this message seems inconsistent with the purpose of informing the user that we can’t perform two backup tasks at once. The user is then sent to the (online) knowledge base to find this out. This also seems inconsistent with the product’s image of giving the user a seamless, reliable experience. Is all this desired behaviour?”

Finally: it could be that the testers discovered all of these problems, and laid them out for the the product’s designers, developers, and managers, just as I’ve done here. And maybe the reports were dismissed because the product works “as expected”. But “as expected” doesn’t mean “no problem”. If I can’t trust a backup product to post a simple, helpful dialog, can I really trust it to back up my data?

Very Short Blog Posts (28): Users vs. Use Cases

Thursday, May 7th, 2015

As a tester, you’ve probably seen use cases, and they’ve probably informed some of the choices you make about how to test your product or service. (Maybe you’ve based test cases on use cases. I don’t find test cases a very helpful way of framing testing work, but that’s a topic for another post—or for further reading; see page 31. But I digress.)

Have you ever noticed that people represented in use cases are kind of… unusual?

They’re very careful, methodical, and well trained, so they don’t seem to make mistakes, get confused, change their minds, or backtrack in the middle of a task. They never seem to be under pressure from the boss, so they don’t rush through a task. They’re never working on two things at once, and they’re never interrupted. Their phones don’t ring, they don’t check Twitter, and they don’t answer instant messages, so they don’t get distracted, forget important details, or do things out of order. They don’t run into problems in the middle of a task, so they don’t take novel or surprising approaches to get around the problems. They don’t get impatient or frustrated. In other words: they don’t behave like real people in real situations.

So, in addition to use cases, you might also want to imagine and consider misuse cases, abuse cases, obtuse cases, abstruse cases, diffuse cases, confuse cases, and loose cases; and then act on them, as real people would. You can do that—and helpful and powerful as they might be, your automated checks won’t.

Scenarios Ain’t Just Use Cases

Thursday, May 15th, 2014

How do people use a software product? Some development groups model use through use cases. Typically use cases are expressed in terms of the user performing a set of step-by-step behaviours: 1, then 2, then 3, then 4, then 5. In those groups, testers may create test cases that map directly onto the use cases. Sometimes, that gets called a scenario, and the testing of it is called a scenario test.

According to Cem Kaner, a scenario is a “hypothetical story, used to help a person think through a complex problem or system.” He also says that a scenario test has several characteristics: it is motivating, in that stakeholders would push to fix problems that the test revealed; credible, in that it not only could happen, but that things like it could probably happen; that it involves complexity in terms of use, environments, or data. (Read his paper on scenario testing here.)

Taking the steps directly from a use case and then calling it a scenario limits your view of what a scenario is, which in turn limits your testing. People do not do 1, 2, 3, 4, and 5 in real life. Instead, they

  • do 1
  • start 2
  • respond to one email, and delete a bunch of get-rich-quick offers
  • resume 2
  • take a phone call from the dog grooming studio; Fluffy will be ready at 4:30
  • realize they’ve lost track of what they were doing in 2
  • go back to 1
  • restart 2
  • look up some figures in Excel
  • place a pizza order for the lunchtime meeting
  • finish 2
  • go to 3
  • accidentally paste the pizza order into some field in 3
  • dismiss the error message, after a fruitless attempt to decipher what it means it
  • finish 3
  • forget to save their work; thank heaven for the auto-save feature
  • get called to an all-hands meeting
  • return to find that the machine has entered sleep mode
  • wake up the machine
  • dismiss a dialog saying that it’s now safe to remove the device, even though they didn’t want to remove the device; the message is due to an operating-system bug related to sleep mode
  • discuss rumours raised from the all-hands meeting on Instant Messaging
  • start 4
  • realize they got something wrong in step 2
  • go back through 3 to 2
  • go to lunch
  • wake up the damned machine again
  • dismiss the damned dialog again
  • correct 2
  • go forward to 3
  • accept the values that were left there from (auto-)saving the first time through (but which due to the changes in 2 are now invalid)
  • go into 4
  • get confused about an element of the user interface in 4
  • realize something looks wrong because of the inappropriately saved value from 3
  • go back to 3
  • fix the values and save the page
  • go to 4
  • look away from the computer, notice there’s a new plant in the corner, under the picture—when did that get there, anyway?
  • complete 4
  • start 5
  • get invited for coffee
  • come back
  • wake up the damned machine again
  • dismiss the damned dialog again
  • worry irrationally that they didn’t complete 4
  • open the app in a second window to confirm that they have in fact completed 4, inadvertently jostling 4’s state
  • restart 5
  • take a phone call in the middle of trying to do 5; “Fluffy appears to be sick and could you show up half an hour earlier?”
  • change their minds about something in 4
  • go back and fix it
  • get tapped on the shoulder by the boss
  • start 5
  • almost finish 5
  • forget to save their work
  • program crashes; thank heaven for the auto-save feature
  • find out that auto-save mode doesn’t actually save every time.

If you want to show that the system can work, by all means check the system by following the procedure that the use case prescribes. That’s something we call sympathetic testing, and it’s a reasonable place to start testing; to learn about the feature; to find how people might derive value from the feature; to begin building your models of the product, and how there might be problems in it.

But if you want to discover problems that matter to people before those people find them, test the system by introducing lots of variation, pauses, distractions, concurrent actions, and galumphing.

Related post: Why We Do Scenario Testing

Why We Do Scenario Testing

Saturday, May 1st, 2010

Last night I booked a hotel room using a Web-based discount travel service. The service’s particular shtick is that, in exchange for a heavy discount, you don’t get to know the name of the airline, hotel, or car company until you pay for the reservation. (Apparently the vendors are loath to admit that they’re offering these huge discounts—until they’ve received the cash; then they’re okay with the secret getting out.) When you’re booking a hotel, the service reveals the general location and the amenities. I made a choice that looked reasonable to me, and charged it to my credit card.

I had screwed up. When I got the confirmation, I noticed that I had booked for one night, when I should have booked for two. I wanted to extend my stay, but when I went back to the Web page, I couldn’t be sure that I was booking the same hotel. The names of the hotels are hidden, and I knew that the rates might change from night to night. One can obtain clues by looking at the amenities and the general location of the hotels, but I wanted to be sure. So instead of booking online, I called the travel service’s 1-800 number.

Jim answered the phone sympathetically. It turns out that not even the employees of the service can see the hotel name before a booking is made. However, this was a familiar problem to him, so it seemed, and he told me that he’d match the hotel by location and amenities, back out the first credit-card transaction for one night, and charge me for a new transaction of two nights. He managed to book the same hotel. So far so good.

I went to the hotel and checked in. The woman behind the counter asked for identification and a credit card for extras, and then she asked me, “How many keys will you be needing tonight, sir?” “Just one”, I said. She put a single key card into the electronic key programming machine, and handed the card to me.

I took the elevator to room 761, which had a comfortable bed and desk with a window behind it, including a nice view. I went up to my room, unpacked some of my things, and decided to go for a dip in the hot tub. When I came back upstairs, I changed into dry clothes, took out my laptop, plugged it in, and sat down at the desk.

The floor was shaking. I mean, it was really vibrating. Some big motor—an air-conditioning compressor? a water pump?—had turned the office chair into a massager. I stood up, and it seemed that half of the room, including the bed was shaking. I tried to do a little work, but the vibration was enormously distracting. I called down to the front desk.

Peter answered the phone sympathetically. “I’ll send someone right up to check it out,” he said. Fair enough, but this problem was unlikely to go away any time soon, and until it did, I wanted another room. “No worries,” said Peter. “I’ll start the process now, and send someone up to check out the problem. Then you can come downstairs to exchange your key.” (“Why not send the new key up with the person coming upstairs?” I thought, but I didn’t say anything.)

“I’ll need a few minutes to tidy up,” I said.

“Very well, sir,” said Peter.

I repacked my bags. A few minutes later, the phone rang, and Peter asked if I was ready for the staff member to arrive.  Yes.

After a short time, someone knocked on the door. He had a pair of new keys (two, not one), which he passed to me. He appeared skeptical about the problem at first, but I sat him down in the desk chair. “Oh, now I feel it,” he said. “Stand over here, next to the bed” I said. He got up, moved over, and felt the shaking. “Wow,” he said. We chatted for a few more moments, speculating on where the shaking was coming from. He left to investigate, and I decamped to my new room, 1021, on another floor on the other side of the building. So far so good.

This morning on my way to the shower, I noticed that a piece of paper had been slipped under the door. It was the checkout statement for my stay, noting my arrival and departure date and the various charges had been made to my credit card, including state sales tax, county tax, and a service fee for Internet use. I noticed that the checkout date and time was this morning, but I’m not supposed to be leaving until tomorrow morning. I called the front desk.

Zhong-li answered the phone sympathetically. I explained the situation, noting that I had booked through a travel service twice, once for one night and then later for two, and that the first booking should have been backed out (but maybe the service hadn’t done that), plus I had changed rooms the night before, so maybe it was an issue with the service but maybe it was an issue with the hotel’s own system too. Or maybe it was only the hotel.

“No problem,” Zhong-li said. “We can extend your stay for another night. But you’ll have to come downstairs at some point today so that we can re-author your room keys.”

So here’s the thing: how many variables can you see here? How many interconnected systems? How many different hardware platforms are involved? What protocols do they use to communicate?  To create, read, update, and delete? What are the overall transactions here?  What are the atomic elements of each one?

How does each transaction influence others?  How is each influenced by others?  What are the chances that everything is going to work right, and that I will neither under nor overpay?  What are the chances that the travel service will overpay (or underpay) the hotel for my stay, even if my credit card shows the appropriate entries and reversals?

It’s not even a terribly complicated story, but look at how many subtleties there are to the scenario. Have you ever seen a user story that has the richness and complexity of even this relatively simple little story? And yet, if we pay attention, aren’t there lots of stories like mine every day? Does my story, long as it is, include everything that we’d need to program or test the scenario? Does the card below include everything?

Index Card

Next question: if you want to create automated acceptance tests, do you want a scenario like this to be static, using record and playback to lock in on checking specific values in specific fields? Are we really going to get value from the story if we use the same data and the same outputs over and over again? This approach will not only be tricky to program, but it will tend to be very brittle, resistant to variation, and vulnerable to changes in the product or business rules. It will tend to miss details in the scenario that we would only learn about through human interaction and experience with the product. Experience depends on things being repeated in some senses yet varied in others.

Or would you prefer to have a flexible framework that allows you to explore and vary the scenario, designing and acting upon new test ideas, and observing the flow of each piece of data through each interconnected system? Might you be able to do this by exploiting testing tools that you’ve developed for the lower levels of the system and assembling them into progressively more powerful suites?

This second approach will likely be even harder to program, but inducing variation allows you to cover more conditions.  You might be able to take advantage of lower-level test APIs, probes, and data generators that you and the programmers have developed as you’ve gone along. This approach will tend to be far more powerful and robust to change, to learning, and to incorporating new and varied test ideas. Think well, and choose wisely.

In either case, unless you have people exploring and interacting with the product and the story directly, I guarantee you will miss important points in the story and you’ll miss important problems in the product.  Your tools, as helpful as they are, won’t ever pause and say, “What if…?” or “I wonder…” or “That’s funny…” You’ll need people to exercise skill, judgment, imagination, and interaction with the system, not in a linear set of prescribed steps but in a thoughtful, inventive, risk-focused, and variable set of interactions.

In either case, you’ll also have a choice as to how to account for what you’re doing.  It’s one scenario, but is it only one test?  Is it dozens of tests?  Thousands?  If you use the second framework and induce variation, what does that do for your test count?  Or would it be better to report your work in an entirely different way? How about reporting on risks and test ideas and test activities and coverage, rather than try to quantify a complex intellectual interaction by using meaningless, quantitatively invalid units like “test cases” or “test steps”?

It’s been a while since I’ve posted this, but it’s time to do it again. This passage comes from a book on programming and on testing, written by Herbert Leeds and Gerald M. Weinberg (Jerry wrote this passage, he says). It’s understandable that people haven’t got the point yet, since the book is relatively new: it came out only 49 years ago (in 1961).  The emphases are mine.

“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.