Blog Posts for the ‘Uncategorized’ Category

A Naïve Request from Management

Wednesday, October 21st, 2020

A tester recently asked “If you’re asked to write a ‘test plan’ for a new feature before development starts, what type of thing do you produce?”

I answered that I would produce a reply: “I’d be happy to do that. What would you like to see in this test plan?”

The manager’s reply was, apparently, “test cases covering all edge cases we’ll need to test”.

That’s a pretty naïve request. Here’s my answer:

“Making sure the product handles edge cases properly is definitely an important task. If I were to take your request literally—test cases covering all edge cases we’d need to test—it could take a lot of time for me to prepare, and a long time for you to review and figure out all the things I might have left out.

“And there’s another issue: I don’t know in advance what all the edge cases are, or even what they might be—and neither do the developers, and neither do you. No one does. But that’s okay! We can start right now by learning about possible edge cases through testing. We can’t perform testing on a running product yet, obviously, but we can perform some thought experiments and test people’s ideas about the product.

“So how about I give you a short summary—a list or a mind map—of some of the broad risk areas we can start considering right away? We can share the list with the developers to help them anticipate problems, defend against them, and check their work. That will greatly reduce the need to test edge cases later, when the product has been built and the problems are harder to find.

“We can add to that risk list as we develop the product—and we can take things off it as we address those risks. That will help focus the testing work. When we start working with builds of the product, I’ll explore it with an eye to finding edge cases that we didn’t anticipate. And I’ll keep the quick summaries coming whenever you like. You can review those and give me feedback, so that we’re both on top of things all the way along.”

The software business, alas, still runs on folklore and mythodology about testing. Too few managers understand testing. Many managers—and alas, many testers—don’t realize that testing isn’t about test cases, but are nonetheless addicted to test cases. When we provide responsible answers to naïve questions, we can help to address that problem.

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

Regression Testing and Discipline

Friday, October 9th, 2020

Another tester on an “Agile” team complains of being overwhelmed by the volume of regression testing he says he must do at the end of each sprint.

Why are some development organizations fixated on regression testing? Not why do they do it (that can be quite reasonable), but why are they fixated on it? I have a theory.

It goes without saying that every change to the product or system holds the risk of problems that could cause quality to backslide in some sense. That’s regression, slipping backwards to some presumably less advanced state. Regress is the opposite of progress.

With change, there’s a risk of regression, so it seems sensible to focus some testing on that risk. But is testing a sure-fire, reliable way to deal with the risk of regression?

Sure-fire? No. Testing can certainly help to find bugs, so that bugs can be recognized and dealt with. But no matter how thorough testing is, or how early it starts, testing can miss bugs too. So let’s remember that the easiest bug to deal with is the one that is never hatched in the first place; the next easiest is the one that gets squashed before it can bury itself in a mass of code.

No matter how skillful or powerful the testing, to some degree, finding a bug remains a matter of luck. In the face of regression risk, we’d prefer not to leave things at that; better to start with fewer bugs to reduce our dependence on luck. Thus, it would seem like a good idea for the people making the changes to avoid bugs by working in a careful and disciplined way.

Discipline, says Chambers, is “1. training designed to engender self-control and an ordered way of life; 2. The state of self-control achieved by such training.” The idea of self-control suggests the idea of agency, which is essential to exploratory work, which is in turn essential to engineering work.

Depending on the product, the project, and the preferences of the individual programmer and the programming team, what might we see and hear as they did disciplined work? Try pausing for a moment to remember the scene when you noticed people doing work you considered “disciplined”.

How’s your list? Here are a few things I’ve seen and heard from time to time in work I’d call “disciplined”:

  • When a change or a new feature was on the table, groups of people reviewed and discussed ideas to understand the change and the motivation for it. Talk was focused on making the system better, and on the problems that the changes were intended to solve. But that focus softened and sharpened, zoomed in and zoomed out, and moved around to help people see everything they could see—including problems. People often disagreed, but they were willing to try little experiments to sort out the disagreements.
  • I’ve seen people consulting with colleagues and with users to get a variety of ideas about design, implementation, and risk. Conversations happen at desks and in conference rooms, but also outside the office, in restaurants, eating, drinking, joking, walking, playing games, shopping… Discipline gets relaxed sometimes. Social life can foster trust and responsibility that helps people aspire to discipline.
  • I’ve seen people using talk, text, tables, sketches, diagrams, stories, mind maps, toys, and props to help describe things in lots of different ways for analysis and for memory. Disciplined work often seems associated with careful note-taking, too.
  • In disciplined shops, order doesn’t necessarily come right away; sometimes it has to be bootstrapped. Stuff tends to start messy and get more tidy if it needs to; when things get too formal too soon, ideas get lost. Development work is one way of life, and a self-controlled, ordered way of life often starts with being uncontrolled and disordered when we’re starting to build something new. Order emerges.
  • Some disciplined places were quiet and focused, but in others I heard lots of regular background chatter, too. Highlights were stories about how people solved problems—and created new ones on the way. Storytelling of this kind helped people to think about risk in a vivid way, which prompted thinking about discipline.
  • I’ve heard open and honest disagreement when there were things worth disagreeing about. I’ve seen people getting upset… and taking responsibility for working things out. Discipline isn’t always smooth.
  • I saw builders paying attention to testability—which includes simplicity, cleanliness of code, modularity, visibility, and controllability—to make deeper testing later on easier and less expensive.
  • In the disciplined shops, the developers were resolved not to take on too much change all at once. They would make patient, careful, reflective, unhurried changes, and try them out themselves. When they felt the work was ready for other people, they’d make it easily accessible, asking for and getting feedback right away.
  • While designing, building and trying things, developers would try to anticipate potential exceptions and error conditions, and they’d generally be quite successful. Then they would give the product to someone else to test, whereupon they would learn something about what they had missed.
  • Developers who were really good at debugging carefully tried out specific little changes as they worked on solving a problem.
  • The disciplined builders would tend to have a sober preference for reliable, widely-used, field-tested components over a mad rush to implement new stuff developed from scratch. As a consequence, there tended to be fewer surprising bugs.
  • I’ve seen programmers whose style was test-first or test-driven development—and who were given the time to apply it. And I’ve worked with disciplined programmers who don’t bother with TDD, exercising discipline in other ways.
  • I’ve seen code that contained inline assertions in debug builds. I’ve seen exception handling built into the product and logs to report on its status. (Every now and again, I see well-thought-out, helpful error messages.)
  • I’ve seen developers checking their own work with configuration checks, unwanted-change detectors, and unit testing, including programmed output checks. (There’s a small sketch of that sort of thing after this list.)
  • I’ve watched people spending hours and days in each other’s offices or cubicles, doing pair programming for immediate, real-time review.
  • I’ve seen formalized review sessions throughout—wherein new developers learned from more senior developers and, interestingly, vice-versa.
  • I’ve seen developers using lots of appropriate tools to see hidden things, or to see unhidden things in different ways (e.g. IDE syntax checking while writing code; attention to compiler warnings; database schema diagramming; dependency checking; profiling for performance; and so on).
  • I’ve seen consistent refactoring for readability, maintainability, and portability; paying down technical debt, as they say.
  • I’ve listened in on discussions about the development of shared coding styles, which also helped with readability.
  • I’ve observed developers keeping careful notes about setup procedures and configuration settings.
  • I’ve watched the entire team working collaboratively throughout so that there are lots of eyes and minds to notice things that could go wrong.
  • I’ve seen teams cultivate good relations with technical support.
  • I’ve noticed disciplined people who went home consistently on time. Also, disciplined people who stayed late from time to time.
  • In disciplined shops, I’ve seen shared skepticism about the completeness, accuracy, or relevance of requirement statements, acceptance criteria, or a “definition of done”. Amidst optimism, I’ve noticed a suspended certainty about whether things were really done.
  • Disciplined shops often do frequent bursts of shallow, non-invasive interactive testing near the coal face, to help confirm that what the programmers were doing was reasonably close to what they intended to do.
  • I’ve seen project managers provide support staff, including people to set up test systems, to help keep track of the backlog, and a group administrator to help the manager in acquiring resources.
  • I’ve seen frequent building, to make builds for deep testing and bug fixes available at the drop of a hat. But I’ve also seen relatively infrequent yet still reliable building, too.
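A couple of the items above (inline assertions in debug builds, and programmed output checks) are easier to picture with a small example. Here is a minimal sketch in Python; the bill-splitting function and its rules are invented for illustration, not taken from any particular product. Python discards assert statements when run with the -O flag, which loosely parallels assertions that are active only in debug builds.

```python
# A hedged sketch: inline assertions plus a small programmed output check.
# split_bill() is a hypothetical function; the assertions record and check the
# programmer's assumptions, and vanish when Python runs in optimized mode (-O).

def split_bill(total_cents, diners):
    """Split a restaurant bill as evenly as possible, in cents."""
    assert total_cents >= 0, "bill total should never be negative"
    assert diners > 0, "need at least one diner"
    base, remainder = divmod(total_cents, diners)
    shares = [base + 1] * remainder + [base] * (diners - remainder)
    assert sum(shares) == total_cents, "shares must add up to the total"
    return shares

# A programmed output check the developer can run after every change.
def test_split_bill_preserves_total():
    for total in (0, 1, 999, 10_000):
        for diners in (1, 2, 3, 7):
            assert sum(split_bill(total, diners)) == total

if __name__ == "__main__":
    test_split_bill_preserves_total()
    print("checks passed")
```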

These are ideas and practices I’ve seen people applying to help them keep on track while building products. Most or all of these things would be done by the developers in collaboration with people working reasonably close to them (some of those people might be testers, and others might not be).

Each item on the list lends a kind of discipline to a development process. Each one represents something people might mean when they murmur something vague about “building quality in”. They’re heuristics, not rules. No one did all of them. I’ll bet you’ve got a ton of stuff on your list that’s missing from this list. Notice, too, how each item above could represent disciplined action in one context and a lapse of discipline in some other context.

Discipline doesn’t have to be burdensome, bureaucratic, or otherwise slow. Informal actions can support discipline, and help people find out where they might need to apply discipline. Remember, according to Chambers, discipline means “self-control to obtain an ordered way of life”; the self-control part suggests that discipline comes from within, rather than being imposed from outside.

Some forms of discipline might feel slow to some people at first, but prudent driving feels slow to people who are used to driving recklessly. When we’re driving, we almost always drive more slowly than we possibly could; driving as fast as possible increases the risk that we’ll arrive late—or not at all.

Some of the discipline-related activities above represent some form of testing; others don’t. However, the processes of building a product are very different from the processes of experiencing a product. Bugs, especially those that reveal themselves only when the product is being experienced, can elude even a disciplined development process. Accordingly, it makes sense for there to be different kinds of testing: testing that examines a product as it’s being built; and testing that obtains experience with the built product.

So when builds are available, it’s probably wise to do some periodic deeper testing, some of it focused on potential, reasonably foreseeable, undesirable effects and side effects of a change—the risk of regression. That regression testing can be far better targeted when the product has been carefully built and already tested to some degree.

Deep testing doesn’t have to happen on every build; indeed, it probably shouldn’t. In lots of places, it can’t. Testing for hidden, rare, subtle, intermittent, emergent bugs tends to take time—the kind of time that can interrupt or slow down development. It can take a while to set up data and tools for deep testing. When systems have complex interactions, problems emerge at the interfaces between things that worked fine on their own. Working out those interactions and studying them in a search for problems can take time. That time might be worthwhile when safety or health or money are on the line. If there’s discipline in the building, the rewards of testing a build deeply tend to dominate the risk of skipping a few well-controlled builds.

Deep testing can be aided by critical distance: testing done by people at some critical, and even social, distance from the people who are changing the product. Risk is a big deciding factor on that score—including the risk of regression.

And there’s the rub. In many organizations, people don’t mandate, or foster, or do well-disciplined work; or they exercise discipline in a very shallow way, cherry-picking one or two items from the list above, and ignoring the others. In such organizations, it seems as though the object is for the developers to write code, rather than to write code that works.

But perhaps, triggered by subconscious recognition of the risk of regression, managers (and, often, testers) feel compelled to do an overwhelming amount of expensive work: sitting at the keyboard and repeating every scripted test procedure that has been performed before, as quickly as possible. When you ask them why, they often reply, “because the developers have no idea of what might be affected by this change.” Then some of them proceed to convert those scripted procedures into automated scripted procedures, whereupon they gain a second undisciplined development project and a new maintenance nightmare. And they feel even more overwhelmed.

If someone feels overwhelmed, that’s a sign that there’s probably something overwhelming going on.

If the developers really do have no idea about what might be affected by change, then that’s a problem—one that the organization should definitely address. It’s like the principle that you shouldn’t try to automate a process that you don’t understand; when you’re working with something important, you shouldn’t rush to change it unless and until you’ve got a reasonably good idea of the extents and effects and risks of the change, and how to manage them.

Now: there’s a problem here for testers. Testers don’t design, write, or fix the code. Many testers don’t have significant programming experience; of the few who do, few have experience with writing production code. Testers don’t manage the project, and very few testers indeed have been project managers. Testers don’t manage the developers. In light of that, it’s inappropriate, in my view, for testers to tell programmers and managers how to do their jobs. Testers cannot and should not try to force, or enforce, discipline.

It’s quite reasonable, though, for testers to report on problems with the product. It’s reasonable for testers to identify patterns of problems related to particular coverage areas or quality criteria. It’s reasonable for testers to report on patterns of regression-related problems.

It’s also reasonable for testers to report on where testing time is going. If investigating and reporting shallow bugs is dominating testing work, testers will obtain less thorough coverage of the product. Developers and managers need to be aware of that. If troubleshooting and maintenance of automated checks is swamping the testers’ ability to gain critical experience with the product, that’s noteworthy; that work will displace the testers’ opportunities to learn about the product deeply, and perform new experiments on it. Things that slow down testing and make it harder allow deeper and possibly more dangerous bugs to hide and survive.

That’s why it’s important for testers to learn the skills of analyzing and describing the state of the product, the state of the testing, and the quality of the testing—including problems that threaten any of these things. It seems that managers and developers are often unaware of problems of lapsed discipline. Testers shouldn’t be trying to manage the project, but they can shine light on the problems.

Obsession with regression testing is a hint that something else might be amiss in the process that leads to it. Sure, it’s a good idea to do some testing after a change. But it’s a lot less expensive to test after a change when people have been testing during the change.

Discipline is a heuristic for reducing the risk of regression and the need for regression testing. When people apply discipline, the effects of change tend to be better known, the code tends to be cleaner, the feedback loops get faster, and the risks tend to be lower—and deep testing can become targeted on the risk, faster, cheaper, and deeper—helping to find hidden problems that matter.

====================

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

To Avoid Trouble Successfully, We Must Look For It

Monday, September 28th, 2020

Software testing can be socially difficult because of people’s natural desire to avoid trouble. This prompts them to avoid thinking about trouble, which means that they don’t look for it. But if you don’t try to find the trouble that’s in your product, that trouble will eventually find you.

Some might say we do think about trouble, and we try to avoid it by getting clear on our intentions in design work, and by checking our work as we go. Those are fine things to do, but they come with their own problems. In design and planning, we are often unaware of problems that may emerge as we combine elements in a system. Developers are rationally and justifiably resistant to slowing down the pace of their work. Even when we do our best, some problems will elude us.

So when value is at risk, when risk is significant, and when that risk can manifest as real problems that hurt people, deep testing done efficiently is a responsible thing to do—and not doing it means we did not do our best.

A correspondent on LinkedIn, Aaron Emery, asks:

How do you suggest dealing with management that want to ‘shoot the messenger’ in instances like these?

It depends on the management, the message, and the messenger.

Some social awkwardness can come from the message itself and the way it’s framed. “This feature sucks” is probably not as easily digestible as “this behaviour in the product is inconsistent with this requirement noted in the spec” or “…inconsistent with this other part of the product” or “…with what we’ve seen in previous versions of this product” or “…with reasonable desires of this until-now-forgotten user”. Point out the inconsistency dispassionately, and let the receiver of the message come to his or her own feelings about it. In other words: know your oracles.

Another approach is to point out that the message, although momentarily bad news, is offered in order to help make everyone look their best. “Yes; fixing this might take some work, but at least we won’t be inflicting it on customers” — or even “Yes, even though we’re not going to fix this, at least tech support will be prepared for it and can offer a workaround.”

It’s critical for testers to know that the product doesn’t have to look or behave the way we want it to. We don’t design the product, we don’t code it, we don’t sell it, and we don’t run the business. We’re trying to help our testing clients understand the product they’ve got, so that they can decide whether it’s the product they want. So if the client hears us and understands the nature of the product but doesn’t want to fix it, that’s fine—and that’s not shooting the messenger, either. That’s business.

If management says “why are you only telling us about this NOW?”, the reply is “because I only found out about it now. It’s a pity our planning and our coding discipline didn’t prevent this problem, but at least now we can fix it while there’s still time, or learn from this experience.”

If management is truly reckless and wants to suppress awareness of problems, driving the school bus blindfolded, then they probably don’t want your services as a tester. That’s okay too; testing is always optional — and so is your choice of testing clients. You might want to avoid that company’s products in the future, though.

====================

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

Testing Doesn’t Add Value to the Product

Saturday, August 29th, 2020

Testers consistently ask how to show (or demonstrate, or prove, or calculate) that testing adds value.

Programmers, designers, and other builders create and add value by creating and building and improving the product. Testing does not add value to the product. And that’s fine.

Managers assure quality by helping programmers, designers, and others to obtain the resources they need, and by removing (or at least reducing) obstacles to their work. Testing does not assure quality. That’s fine too.

Testing does not add value to the product. You can test all you like and the product won’t get any better, nor will it get any worse. Similarly, weighing yourself will neither increase nor reduce your weight.

Based on what you read off the scale when you weigh yourself, you might choose some action to increase or decrease your weight. Based on the observations that we make and the problems that we find in testing, people might choose to improve the product in some way—or to live with the product they have. Testing itself adds no value to the product, though, and that’s fine.

Testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, investigation, critical thinking, risk analysis, etc. Testing helps us to learn the actual status of the product. Significantly, testing provides people with a means of determining whether there are problems in the product that threaten its value. By revealing problems in the product, and analysing those problems, testing can also help to cast light on problems in the project that can contribute to product problems.

In other words: testing doesn’t add value; it provides value. Testing helps people to understand the product they’ve got, to help them decide whether it’s the product they want. That can be valuable, since deep, accurate knowledge about the actual product, its actual status, and problems in it can have considerable value for the people who are building the product and managing the project. Without testing—experiments on the product—our theories about the goodness of the product are not grounded in experience of the product. They’re only theories; or beliefs, or hopes, or wishes.

So don’t worry about whether testing is adding value. It isn’t, and that’s not a problem. Consider instead whether testing is providing value to people who need to know deeply about the product (and especially about problems and risks that threaten its value), and who need to make decisions about it. Consider critically (and self-critically) whether testing is providing valuable knowledge at reasonable speed and reasonable cost—and whether your clients would agree with your assessment. If it isn’t, or they wouldn’t, that’s a problem. Fix it.

Further reading:

There Is No ROI in Social Media Marketing (read this article, replacing “social media marketing” with “testing”)
How is the Testing Going?
Testers, Get Out of the Quality Assurance Business

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

It’s Not About The Typing

Thursday, August 6th, 2020

Garbage truckloads of marketing bumph are being dumped into the testing space about “codeless” testing tools. For the companies producing these tools, to “test” seems to mean “performing a sequence of keystrokes or mouse clicks or button presses on an app”. (You can see the same pattern in many tutorials on “test automation”; write a script that executes a sequence of actions, and that’s a “test”.) But the marketing material is mute on how the tool aids the tester in recognizing problems in the product. The marketing focus is on how quickly the product can repeat some sequence of keystrokes and clicks and pushes.

Automated data entry can be a useful part of a test, but automated data entry is not a test. Alas, the marketing approach works really well on managers (and, sadly, some testers) who can perceive only the visible activities of testing; not the cognitive aspects of it, and not the essential mission of testing: revealing the status of the product, and finding problems before it’s too late to do anything about them.

In Rapid Software Testing, testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, critical thinking, risk analysis, etc. Above all, testing requires us to focus on the risk that there are problems in the product, to anticipate problems, and to recognize problems that are present. That requires oracles; an oracle is a means by which we recognize a problem when we encounter one in testing.

The “no-code automation” tools supply weak oracles at best; typically checking for the presence of a particular element on the screen, or a particular value in some output field. If that element is there, or if that value matches some specified and presumably desirable result, the “test” “passes”. But that doesn’t mean that there is no problem; a product can have plenty of problems even when it arrives at a correct calculation, or drops the user on the requested page. You know this from your own experience. You’ve used iTunes, right? You’ve been on LinkedIn. You’ve tried to fix an indentation issue in Microsoft Word. Maybe not those things specifically, but I bet you’ve felt annoyed, frustrated, impatient, or baffled when trying to use software to get something done. Quite possibly today.
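To make the weak-oracle point concrete, here is a minimal sketch, written as plain code rather than in any particular vendor’s tool, of the kind of check being described. The page driver, the element ids, and the expected value are all invented for illustration.

```python
# A hedged sketch of the kind of check a "codeless" tool typically records,
# written out as plain code. FakePage stands in for a driven UI under test;
# the element ids and expected value are invented for illustration.

class FakePage:
    """A stand-in for the product's UI: it always shows the expected total."""
    def fill(self, selector, value): pass
    def click(self, selector): pass
    def find(self, selector):
        if selector == "#grand-total":
            return type("Element", (), {"text": "59.97"})()
        return None

def check_grand_total(page):
    page.fill("#quantity", "3")
    page.fill("#unit-price", "19.99")
    page.click("#calculate")
    total = page.find("#grand-total")            # is the element there at all?
    if total is None or total.text != "59.97":   # does it show the expected value?
        return "FAIL"
    return "PASS"

print(check_grand_total(FakePage()))  # PASS

# Note what this check cannot notice: a garbled layout, a ten-second delay,
# a missing currency symbol, text too faint to read. The product could have
# all of those problems, and the check would still "pass".
```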

And there’s the rub: rather than a means to gain experience with the product, most of these tools represent a means to check the product for specific conditions that can be specified easily. There’s a seductive story to be told about that: you can run those checks over and over, really quickly, and find a few shallow bugs when something changes in a bad way. Yet the tools are fussy; a change in the product can throw the script off even when it’s a desirable change. Addressing that requires investigation, repair, and continuous maintenance, which takes time.

Then something even worse happens: testers, deliberately or not, don’t report the time it takes to deal with problems around the tool. Why wouldn’t they do that? One reason could be that management has spent a wad of money on the tool, and the vendor says it’s supposed to be simple. As a tester, to suggest that there are difficulties with the tool is to risk your reputation. Come on. It’s codeless. It’s supposed to be simple.

While all that is going on, the tool misses problems that would be easily apparent if testers were gaining real, human experience with the product. People find problems that tools miss because humans have a wonderful capacity to recognize problems that they have not been told about in advance. Humans bring rich sets of oracles to testing. Testers use their feelings, their social awareness, their memories, their tacit knowledge, their experience of the world, their familiarity with comparable or competitive products or features—all of these things, and more—to generate and apply oracles on the fly. But there’s less time available for gaining experience with the product and identifying unanticipated problems whenever the tester is repairing and maintaining the scripts.

Whether or not your testing tool is “codeless”, and whether your input is delivered by a script or entered directly via keyboard and mouse, the means of entering data is usually one of the least significant aspects of a test.

What matters is not typing quickly, but the capacity of the tester to recognize problems that matter. If there’s no oracle, there’s no test. If there are weak oracles, there’s weak testing.

Further reading:

A Context-Driven Approach to Automation in Testing
Oracles from the Inside Out

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

A Testopsy: Learning from Performance

Monday, July 27th, 2020

What’s the difference between Rapid Software Testing (RST) and other forms of testing? In RST, the process model is not the centre of testing; neither is formal documentation; nor are tools. All of those things play a role in testing, of course, but they’re not at the centre.

In RST, the centre of testing is the skill set and the mindset of the individual tester, and heuristics that testers apply.

A heuristic is a fallible means of solving a problem. That is, a heuristic might work, or it might fail. A heuristic will fail when it is applied to the wrong kind of problem; or when it is applied with insufficient judgement, wisdom, skill, or care; or when some context factor or another derails it. All of the models that we apply to the product and to the test space are heuristic. All test techniques are heuristic. All of the ways in which we could apply tools are heuristic. All the ways we have of deciding that there’s a problem (that is, all of our oracles) are heuristic. And this doesn’t apply only to testing; everything in software development, and in the broader field of engineering itself, is heuristic.

So, in order to get good at testing, we must learn about heuristics that we can apply powerfully in our work. We must also consider how our heuristics can fail. One of the better ways to do that is to review and evaluate our work periodically in a very detailed way. In Rapid Software Testing, we call that a testopsy.

Earlier this year, James Bach and I did a testopsy on a session of testing that we had performed together about six months earlier, in preparation for the Rapid Software Testing Applied class that we teach. By examining our performance, we were able to notice and name heuristics and patterns that help us to think about testing, to describe it, and to understand how testing can go right—and sometimes not so right.

Here are just a few things we learned—or learned more deeply—from that session and the testopsy we performed:

  • When we’re doing pair testing, a lot of tacit knowledge emerges into the explicit. Each person’s performance is visible to the other, raising observations and questions about things that have not been shared up to that point. Through that, knowledge gets shared, discussed, and refined.
  • Products often give us lists of their own features in odd places, in interesting ways, that afford some efficiency for identifying coverage ideas.
  • There’s a phenomenon that happens in testing that we’re calling a “bug cascade”—periods where we are stressed or even overwhelmed by overlapping and competing investigations of complex and confusing behaviour.
  • During a bug cascade, we often recognize that we don’t know enough about the product to perform good analysis and troubleshooting.
  • Bugs get noticed and then lost, or missed altogether, during a bug cascade…
  • …but having a video and reviewing it can help us to recover what we’ve lost.
  • Analyzing the product (which had been our original mission for the session) can be severely disrupted by a cascade of bugs.
  • We coined a term, “mutually disruptive processes”, to describe one of the consequences of the bug cascade—which, when you’re working alone, is self-disruptive.
  • We coined another term, “the money booth effect”, to account for the collapse of productivity that is the consequence of mutually- or self-disruptive processes.
  • It is a good idea to be forgiving of ourselves for these problems. Although we can try to manage them to some degree, they are intrinsic to the process of learning and testing a product.

There’s lots more to the testopsy, which you can see here.

Why is this all important? Because in order to do something well, we must understand it, and testing is often terribly misunderstood—by managers, by developers, and, sadly, by testers themselves. By doing deep study of our work from time to time, we can begin the process of framing it, describing it, discussing it, and developing expertise in it.

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

Breaking the Test Case Addiction (Part 12)

Saturday, July 25th, 2020

In previous posts in this series, I made a claim about the audience for a test report:

They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

It’s true that managers frequently ask testers when the testing will be done. That’s a hard question to answer, but maybe not for reasons that you—or they—might have considered.

By definition, testers who are working for clients do not work independently. We are providing services to our clients. We gain experience with the product, explore it and experiment with it so that our clients can determine the status of the product. Knowledge of the status of the product allows our clients to decide whether the product is ready to ship, or whether there is more development work to do.

Whatever testing we may have performed, we could always perform more; but once the client decides more development work won’t be worthwhile, development stops, and testing stops along with it. (At least, pre-release testing stops. Live-site monitoring and other forms of information gathering begin when the product is released, presenting an opportunity for learning about the quality of the product and about the quality of the testing that’s been done on it. Sometimes that learning comes with a big price tag.) The real question on the table, then, is not when testing work will be done, but when the development work will be done.

So, brace yourself: the fact is that no one really cares when testing will be done, because testing is never done; it only stops. Testing stops when the client determines that there is no more development work worth doing. The client—not the tester—decides when development is done. And how does the client decide that?

The client decides based on economics, reasoning, politics, and emotion. This is a complex decision, and here comes a long sentence that illustrates just how complex the decision is.

The client will decide to ship the product when she believes that

  • she knows enough about the product, the actual known problems about it, and the potential for unknown problems about it, such that…
  • the product provides sufficient benefits—that is, the product will help its users to accomplish a task, or some set of tasks; and
  • the product has a sufficiently small number of known bad problems about it; and
  • the product is sufficiently unlikely to have unknown bad problems; and
  • more development work—adding new features and fixing problems—will not be worthwhile, because
  • the benefits from the product outweigh the known problems to a sufficient degree that customers will obtain the value they want; and
  • the business can deal with known problems about the product, sufficiently inexpensively for the business to sustain the product and the business; and
  • the business can deal with whatever unknown problems may still exist; and
  • the client will not be in political trouble with her social group (including the team, management, and society at large) if she turns out to be wrong about any of all this; and
  • she feels okay about all of these things.

So when will testing be done? The client can declare testing to be done at any moment when the client is satisfied that all of these conditions have been fulfilled. So when the client asks “When will testing be done?”, that question amounts to “When will I be satisfied that development work is done?” And how can you, the tester, predict when someone else will be satisfied by work being done by other people?

You can’t. So I would recommend that you don’t, and that you don’t try. Instead, I’d suggest that you negotiate your role and your commitments. At first, this may look like a long conversation.

Try something like this:

“I understand that you want to know when testing will be done, because you want to know when development will be done; that is, when you will be satisfied that the product is ready to ship. I don’t know how to make a reliable prediction about when you will be satisfied, but here’s something that I can propose in return.

“I will start testing right now; that is, I will start obtaining experience with the product, exploring it, performing experiments on it, analyzing it. I’ll learn rapidly about the technology, the clients for the product, and the contexts in which the product will be used. As a tester, my special focus will be on evaluating it like a good critic; finding problems that threaten the value of the product to people who matter—especially you.

“Things will tend to go better if I’m able to help find problems early on—in the design of the product, or in our understanding of how its users might get value from it, or in the context that surrounds it. I don’t presume to be the manager or designer of the product, but I may have some suggestions for it—especially in terms of how to make the product more practically testable.

“As the product is being built, I’ll work closely with you and with the developers to help everyone make sure that the product we’re building is reasonably close to the product we think we’re building. The testing we need for that tends to be relatively shallow, focusing on quick feedback that doesn’t slow down or interrupt the pace of development. I’d recommend that you give the developers time and support to do their work in a disciplined way, as good craftspeople do. That discipline includes review, testing, and checking their work as they go, so that easy-to-find problems don’t get buried and cause trouble for everyone later. I can offer help with that, to the degree that the developers welcome it.

“The more that the developers can cover that quick, shallower testing, the more I’ll be able to focus on deep testing to find rare, hidden, subtle, intermittent, platform-dependent, emergent, elusive problems that matter. Deep testing requires a different mindset from the builder’s mindset, and changing mental gears to do deep testing can really disrupt the developers’ flow. So I’ll try to do deep testing as much as I can in parallel with the shallower testing that the developers are doing all the way along.

“At every step, I’ll let you know about any problems that I see in the product. I’ll be giving you bug reports, of course. I’ll also let you know about how the testing is going—what has been covered and what hasn’t. I’ll use coverage outlines in some form to help illustrate that, and I’m happy to offer you a variety of formats for them so you can choose one that works for you.

“If I notice a lot of bugs that seem like they should have been easy to find, I’ll let you know right away. For one thing, when there are lots of shallow bugs, deep testing becomes harder and slower, because I’m obliged to pause to investigate and report those bugs. More significantly, though, lots of shallow bugs might indicate that the developers are working too fast, or are under too much pressure. When people are pressed, they tend to have a hard time maintaining discipline and mental control over their work. In software, that’s a Severity 0 project risk; it leads to bugs, and some of those bugs may be deep enough that they’ll get past us—especially if we’re investigating and reporting the shallower bugs.

“I’m prepared to test or review anything you give me at any time; I’ll let you know how that influences the pace of other work that you’ve asked me to do.

“If there is testing that must be done formally—that is, in a specific way, or to check specific facts—I can certainly do that. I’ll provide you (and the auditors, if necessary) with evidence to support claims about all of the testing that has been done, both formal and informal. I’ll also let you know about extra costs associated with formal work—the time and effort it takes—and how it might affect our ability to find problems that matter.

“Apropos of that, I’ll keep track of anything that might threaten the on-time, successful completion of whatever work we’re doing. If you like, I’ll help to maintain product and project risk lists. (I’d recommend that the project manager be responsible for those, though.)

“I’ll keep track of where my own time is going, so that I’ll be able to produce a credible account of anything that is slowing down my work or making it harder. I’ll let you know what I need or recommend to make testing go as quickly and as easily as possible, and I invite you to ask for anything that helps make the product status or the testing work more legible—visible, readable, or understandable—to you.

“My goal is to help you to be immediately aware of everything you need to know to anticipate and inform a shipping decision.

“I know that this doesn’t directly answer the question of when testing will be done; but testing ends when we know the development work is done. So perhaps the best thing is for us to go together to the designers and developers. You can ask them when they anticipate that the development work will be done, and when the problems we encounter along the way will be fixed. I will help them to identify problems and risks, and to remember to include time and resources for testability as they give their estimate. As we’re working together to build and test the product, we can develop and refine our understanding about it, and we can be continually aware of its status. When that’s the case, you’ll be able to decide quickly whether there’s more development work to do, or whether you believe the product is ready for release.”

That’s a fairly thorough description of testing work. It’s a pretty long statement, isn’t it? Reading it aloud takes me just over five minutes. In real life, it would probably be interrupted by questions from time to time, too. So let’s imagine that the whole conversation might take 15 minutes, or even half an hour. But let me leave this post—and this series of posts—with these questions:

In a project that can take weeks or months, wouldn’t one relatively short conversation describing the testing role and affirming the tester’s commitments be worthwhile?

In that thorough description of testing work, did you notice that the expression “test cases” didn’t come up?

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? See the full schedule, with links to register here.

Breaking the Test Case Addiction (Part 11)

Friday, July 24th, 2020

In the previous post in this series, I made these claims about the audience for test reports:

  • They almost certainly don’t want to know about test case counts (although they might think they do).
  • They almost certainly don’t want to know about pass-fail ratios (although they might think they do).
  • They almost certainly don’t want to know about when the testing is going to be done (although they might think they do).

It’s far more likely that they want an answer to these questions:

What is the actual status of the product? Are there problems that threaten the value of the product? How do you—the tester—know? Do these problems threaten the on-time, successful completion of our work?

In this post, I’ll address the first two claims; I’ll leave the latter claim for next time.

They almost certainly don’t want to know about test case counts (although they might think they do).

Imagine asking a tester to test a cheap pocket calculator for you. We will call him “Eccles” (in honour of The Goon Show). You tell him your intentions for it: you would like to use it mostly to help you to divide the bill for a group of friends at a restaurant, and for other everyday tasks. Eccles disappears, and returns a few minutes later. You ask him if he has found any problems. He says No. You ask to see his results, and he shows you his two test cases:

Input: 1 + 1 Result: 2 (Pass)
Input: 2 + 2 Result: 4 (Pass)

You quite reasonably believe that Eccles’ testing is inadequate. You tell him that you want more test cases. He listens, appears to understand the problem, and nods. He disappears again, and considerably later he returns, telling you that he has run 100 test cases—50 times more than the first time! And he has carefully documented the results:

Input: 1 + 1 Result: 2 (Pass)
Input: 2 + 2 Result: 4 (Pass)
Input: 3 + 3 Result: 6 (Pass)
Input: 4 + 4 Result: 8 (Pass)
Input: 5 + 5 Result: 10 (Pass)
Input: 6 + 6 Result: 12 (Pass)
Input: 7 + 7 Result: 14 (Pass)
Input: 8 + 8 Result: 16 (Pass)
Input: 9 + 9 Result: 18 (Pass)
Input: 10 + 10 Result: 20 (Pass)
Input: 11 + 11 Result: 22 (Pass)

Input: 99 + 99 Result: 198 (Pass)
Input: 100 + 100 Result: 200 (Pass)

To the degree that more is better here, it’s not very much better.
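For what it’s worth, Eccles’ hundred cases amount to one idea repeated with different numbers. A minimal sketch makes that visible; the add() function here is a stand-in, since we don’t have the calculator’s internals.

```python
# A hedged sketch: Eccles' hundred "test cases" expressed as one loop.
# add() is a stand-in for the calculator under test; the point is that the
# count of cases reflects repetition, not broader coverage.

def add(a, b):
    return a + b

for n in range(1, 101):
    result = add(n, n)
    assert result == 2 * n, f"{n} + {n} gave {result}"

print("100 cases passed, and told us little that the first one didn't")
```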

The trouble, of course, is that the count doesn’t mean anything without context. What aspects of the product are being tested? Has the testing been limited to only mathematical functions within the product? If so, has the tester at least given some coverage to all of them—and if not, which ones has the tester not covered—and why not? Has the tester considered other things that could diminish, damage, or destroy the value of the product? Has the tester considered performance and reliability? Has the tester considered the different people who might use the product, and the ways in which they might use the product in the real world?

Testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, sensemaking, risk analysis, critical thinking—and many other things too. A test is an instance of testing. Not all tests are equal in terms of effort, time, skill, scope, risk focus,…

Test cases tend to represent things that are easy to describe about a test: directly observable behaviour that can be described or encoded explicitly; and observable and describable outputs. Test cases both assume and ignore tacit knowledge.

But neither tests nor test cases are commensurate—that is, they cannot be counted as though they were equivalent units—so “test case” is not a valid unit of measurement.

  • From one case to another, test cases vary widely in scope, in coverage, in cost, in risk focus, and in value.
  • The design of a test case is subjective, based at least to some degree on the mental models and mindset of individual testers.
  • Test cases involve different test techniques.
  • Test cases are not independent; the outcome of one might influence the outcome of another.
  • Test cases are not interchangeable. They’re different, depending on the feature, function, data, and product in front of us.
  • Test cases do not—and cannot—capture all the testing work that occurs, such as learning, conjecture, discoveries, bug investigation, and so forth.
  • Test cases don’t even capture the work of designing the test cases, nor of analyzing the results!
  • And finally… testers often don’t follow the test cases anyway—and certainly not in the same way every time! A test is a performance, and a test case is like a script and stage directions for that performance. As with actors working from a script, the performance will vary from tester to tester, and from time to time.

Note that none of these things is necessarily a problem. Indeed, in testing, there’s considerable value in variation and variability. Bugs aren’t all the same, and they’re not always in the same place. There is a big problem in trying to treat test cases as equivalent for the purposes of counting them. (I’ve talked about that many times before, including here, and here.)

Now, there is at least one argument in favour of test cases:

Perhaps someone wants to verify that a specific procedure can be followed, with specific preconditions and specific inputs, in order to show that the procedure and inputs will produce a specific result. And, in fact, perhaps that procedure, or some part of it at least, can be automated.

That’s okay, although there are at least two problems to consider. First, all that specification tends to take time and effort which can be costly, and which can swamp the value of what we might learn from following the procedure. Second, demonstrating that something can work based on specific procedures and inputs doesn’t mean that it will work. A variation in the procedure, or the conditions, or the inputs will result in different output. And even when we hold the conditions and the procedure steady and obtain the correct output, the outcome might still be terribly wrong in some sense.

Perhaps someone wants certain conditions to be identified and covered. If that’s true, identify those conditions and cover them. There are plenty of ways to do that without over-formalizing or over-proceduralizing the testing work.

Consider

  • noting those conditions in guidance for human interaction with the product;
  • reviewing existing logs or records to see if those conditions have been covered, and if not, cover them (a small sketch of this appears after this list); or
  • creating automated low- or middle-level checks for those conditions.
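As one small illustration of the second item in that list, here is a minimal sketch. The conditions and the session notes are invented for the example, and a simple text match is only a rough proxy for genuine coverage.

```python
# A hedged sketch of checking existing records to see whether identified
# conditions appear to have been covered. The conditions and the notes are
# invented; real notes would come from session files or test logs.

conditions = ["divide by zero", "negative total", "more than 12 diners"]

session_notes = """
2020-10-08 charter: tip and bill-splitting functions
tried divide by zero; calculator shows 'E' and drops the pending entry
split a 0.00 bill among 4 diners; all shares 0.00
""".lower()

for condition in conditions:
    if condition in session_notes:
        print(f"{condition}: mentioned in the notes")
    else:
        print(f"{condition}: no mention; consider covering it")
```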

Over 50 years ago, Jerry Weinberg wrote this passage:

One of the lessons to be learned from such experiences is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test really does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.

Leeds and Weinberg, Computer Programming Fundamentals: Based on the IBM System/360, 1970

So, consider thinking in terms of testing, rather than test cases. And if you are applying test cases, please don’t count them. And if you count them, please don’t believe that the count means anything.

They almost certainly don’t want to know about pass-fail ratios (although they might think they do).

If a test case count is not a valid measure of test coverage, then a ratio derived from that count is invalid too, whether used to evaluate the quality of the product or the quality of the testing. I’ve heard tell of organizations that have a policy that says “when 97% of the test cases pass, the product is ready for shipping”. It shouldn’t take long to see the foolishness of this policy; it’s like a doctor saying that when 97% of the data points in your medical checkup indicate no problem, you’re healthy.

Just as “the sheer number of tests performed is of little significance in itself”, the ratio of passing tests to failing ones is both insignificant and easy to game. Insignificant, because a product can be passing all of the tests that we’ve performed so far and still have terrible problems. Also insignificant, because a product can fail to pass hundreds of tests—but if those tests are outdated, inconsequential, overly precise, or otherwise irrelevant, there’s no problem. Easy to game, because if you want to make the product look better than it is, it’s a simple matter to perform more passing tests.

The point of testing is not to provide a pat on the head for the product; the point is to evaluate its true status, and to identify problems that threaten the value of the product to people who matter—to the users or customers of the software, or to anyone affected by it; to the support organization; to the operations people; and, ultimately, to the business.

Several years ago, a participant in one of my Rapid Software Testing classes approached me after I had mentioned this 97% pass rate business (which I’ll call 97PR henceforth). He said, “It’s funny you should mention it. I’ve worked at two companies where they used that measure to decide when to ship.”

“Really?” I replied. “Do you mind me asking—which ones?”

“Well,” he said. “One was Nortel.” I winced; Nortel was a huge Canadian success story until all of a sudden it wasn’t. “The other,” he said, “was RIM—Research in Motion. The Blackberry people.” I winced again.

Was 97PR responsible for the demise of these two companies? Probably not—certainly not directly. But to me, the 97PR suggests a company where engineering has been reduced to scorekeeping. If you want to fool people about something, providing numbers without context is a great way to do it. And if you want other people to fool you, ask for numbers without context.

For the calculator example above, what would a better test report look like? Here’s what I might offer:

“I’ve tested the calculator for basic math operations that seem likely to be important in calculating restaurant cheques: addition, multiplication, subtraction, and division. I imagined that you would be wanting to do this for groups of up to a dozen people. I did a handful of variations of each math operation, up to the limits of what the calculator’s display supports, including stuff like dividing by zero. Beware, because if you do that by accident, you’ll lose what you’ve entered so far. (Aside: Windows Calculator loses the operations before a divide-by-zero too.) I took notes, if you want to see them.”

The client, of course, could stop me at any time. What if she didn’t? What would a deeper test report look like? Given some time, I might offer this:

“I tested the memory-store and memory-recall functions, too, and didn’t observe any problems. Even though they’re present as buttons on the calculator, I didn’t bother to test the higher-order math functions like squares, square roots, and trigonometric functions, since I reckoned you wouldn’t need those for restaurant bills and I didn’t want to waste your time by testing them. But if you want me to, I can.

“The buttons provide haptic feedback, so it’s easy to tell when they’ve been pressed, and there’s no key-repeat function, so it’s easier to avoid accidental double keypresses on this calculator than it is on others. I looked at it in low-light conditions; its LCD screen may be a little hard to see in a dark restaurant. It’s solar-powered, and there’s a feature that turns it off after five minutes. In that case, it forgets whatever data you’ve entered.

“I dumped some water on the keypad, and it continued to perform without any problems. After I immersed it in a glass of water, though, I had to let it dry for a couple of days before it started working again, but it now seems to be working just fine.”

Yes; all that takes quite a bit longer to say—or to write—than “We’ve run 5163 tests, and of those, 118 are failing, for a pass rate of 97.7 per cent.” It’s also more informative—by a country mile—about the quality of the product and the quality of the testing.

So what do you do when a manager asks for test case counts or pass-fail ratios? Here’s a reply from James Bach: “I’m sorry, but misleading you is not a service that I offer.” Consider offering a three-part testing story instead.

We’ll get to that last claim about a test report’s audience, that they almost certainly don’t want to know when the testing is going to be done (although they might think they do), in the next and final post in this all-too-long series.

Breaking the Test Case Addiction (Part 9)

Saturday, February 15th, 2020

Last time, Frieda and I had been looking at visualizations of time spent on various testing activities, including work that fosters test coverage of the product (T time), bug investigation and reporting (B time), and setup work to get ready to test, or tidying up afterwards (S time).

“So…,” Frieda mused, “I could track T-time, and B-time, and S-time. But I’d be a little worried about watching the clock all the time, instead of concentrating on my testing. It’d be like micro-managing myself.”

“That is worth worrying about,” I replied. “The last thing we want to be is obsessive-compulsive clock watchers. So here’s a secret: to some degree, we misrepresent our accounting of session time.”

“Oh, great,” said Frieda. “I thought this whole discussion has been about establishing trust.”

“It is. But it’s also about accounting for what we do in a way that helps everybody make sense of what’s happening. And although we care about accuracy, precision isn’t too big a deal. In session-based test management, we’re trying to account for the effort that we’ve put in, but we’re also trying to make things easy enough for our clients to comprehend. So we don’t watch the clock all the time. A reasonable estimate of how much time we spent on T, B, and S is good enough. Precision to the nearest five or ten per cent will do. We’re not lying, but we are simplifying; smoothing out the details so that people don’t get overwhelmed or obsessed or fooled by the numbers. Remember, the point of all this isn’t score-keeping. It’s to prompt us to ask questions. Mostly: are we okay with how we’re spending time?”

“Here’s an example,” I continued. “One day, after the morning standup, I start working a charter that covers some area of the product. Things go smoothly for the first 20 minutes or so, and then a developer comes up and asks me to help him with reproducing a problem that someone else reported. That goes on for 15 minutes.

“Then I get back to work on the charter. There are quick little interruptions along the way—a phone call here, and an instant message there—but by and large I can handle them quickly and keep the flow going for an hour and a half. I run into some bugs, and I run into some problems with a test tool that amount to setup time.

“Then it’s lunch. When I come back, I’m still looking at the same area. I work at it for 25 minutes, and the development manager wanders by for a chat. That takes 20 minutes. I get back into testing for 45 minutes, and then it’s Paula’s birthday, so I go to the lunchroom and eat cake and chat for 15 minutes.

“I get back and do testing work for 40 minutes, and then another tester asks me to look over a coverage outline they’ve done. That takes 10 minutes. Then I get back to the charter, and work it for another 25 minutes, and wrap it up.

“Now: if we add all that up, that’s just over five hours of clock time, of which an hour in total was interruptions. 245 minutes were spent on actual testing. If we think of a session as 90 minutes, that’s pretty close to three sessions worth of work.

“So when I’m reporting, I’ll probably submit that as two session sheets, one to describe what I did in the morning and the other for the afternoon. I’ll account for the work as three sessions worth of time. I’ll make a reasonable guess as to how much time I spent on T-time, B-time, and S-time for each one. Again, precision to the nearest five or ten per cent is good enough. With the TBS numbers, we’re trying to identify approximately how badly our coverage has been interrupted. If we’re not okay with what the approximation suggests, we’ll look into the specifics.”
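
If it helps to see that tally worked out, here’s a rough sketch of the day’s arithmetic. It’s mine, not part of any SBTM tooling; the labels paraphrase the story above, and the split into session work versus interruption is my reading of it:

    # A back-of-the-envelope tally of the day described above.
    day = [
        ("charter work before the repro request",        20, "session"),
        ("helping a developer reproduce a problem",      15, "interruption"),
        ("charter work (some of it B and S time)",       90, "session"),
        ("charter work after lunch",                     25, "session"),
        ("chat with the development manager",            20, "interruption"),
        ("charter work",                                 45, "session"),
        ("birthday cake in the lunchroom",               15, "interruption"),
        ("charter work",                                 40, "session"),
        ("reviewing another tester's coverage outline",  10, "interruption"),
        ("charter work, wrapping up",                    25, "session"),
    ]

    total_minutes   = sum(minutes for _, minutes, _ in day)                   # 305
    interruptions   = sum(m for _, m, kind in day if kind == "interruption")  # 60
    session_minutes = total_minutes - interruptions                           # 245

    print(f"{total_minutes} minutes of clock time, {interruptions} of interruptions;")
    print(f"{session_minutes} minutes of session work, "
          f"about {session_minutes / 90:.1f} ninety-minute sessions")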

“But won’t managers get upset if we don’t report the numbers precisely?” Frieda asked.

“Trust me,” I said, laughing. “They’re not watching that closely. They never are. They can’t watch that closely; it’s not possible. They don’t have time to scrutinize everyone’s work every minute of every day. There’d be no point to it. Plus, supervising people’s every move would undermine the social nature of work. People need to be unsupervised to some degree in order to feel trusted, and to be responsible for what they’re doing.

“Plus,” I noted, “if managers were watching closely, they would be horrified at how much time was being wasted on the care and feeding of test cases, and how little time was being spent on actual testing and collaborative work.”

“Heh,” said Frieda. “That’s true.”

“On the other hand,” I continued, “it would be quite reasonable and important for them to know if your session time is being swamped by bug investigation and reporting, or by setup or followup work, or if interruptions from outside the session are preventing you from performing at least a couple of sessions worth of coverage a day.”

“Doesn’t that vary a lot?” Frieda asked. “I mean… some groups do a lot of stuff in meetings. You know, like design meetings and grooming meetings and project planning meetings. Should we track those?”

“Sure,” I said, “if you like. The key is this: if everyone is completely happy with a situation, don’t bother trying to measure anything in particular. But if someone is unhappy, or if someone has a feeling that there might be something to be unhappy about, then pay some attention to it. For instance, someone might say that testing is taking too long…”

“I’ve heard that before,” said Frieda.

“Uh huh. Too long compared to what? What part, or parts, specifically are taking too long? Get some data. After you’ve collected the data, ask questions about it. Analyze it. Are testers spending a lot of time in bug investigation? Why is that? Is it because they’re being overly detailed in preparing their reports? Are they investigating bugs for longer than necessary? Is it because the bugs are subtle and hard to reproduce? Or is it because there are so many bugs that it’s overwhelming the testing time, and any opportunity for test coverage is destroyed?

“Each of those things should prompt a different management action, or a different change in behaviour. Maybe the problem is not really that the testing is taking too long, but that the developers are under too much pressure. They’re producing code so quickly that they don’t have a good handle on what they’re building, and they don’t have time to check their work. Or maybe the problem is that the testers are spending tons of time writing up bug reports—and maybe a solution to that would be to have the testers work right next to the developers. Then, instead of doing unnecessary paperwork, the testers could simply demo some bugs to the developers right away.

“The point of activity-based test management is to avoid turning testing work into production of artifacts. To prevent testers from being turned into test case machines.”

“What happens when somebody wants artifacts?” asked Frieda. “That’s a big reason managers say they want test cases… so they can know for sure that the work got done.”

“You know, there’s a term for that in our lingo: test integrity. Test integrity is about making sure the testing we say we did matches up with the testing we actually did. Are test cases the only way that managers could know that work got done?” I asked.

“Well…,” she replied. “I guess there’s debriefing, as we were talking about. But they want… evidence. You know, something in writing.”

“How about the tester’s notes?”

“Hmmm…” Frieda paused. “Most testers aren’t that great at taking notes.”

“I agree,” I said. “I’ve seen that too, and it can be a real problem. People doing good investigative work—journalists, lab researchers, detectives—need to keep good notes. Testers do too. I like to tell testers that it’s okay not to keep good notes… as long as you want to forget lots of important stuff.”

“Why aren’t testers good at taking notes?” Frieda asked.

“I think there’s a feedback loop at work,” I replied. “People don’t do good investigative work when they’re following formally scripted test cases — and they don’t tend to take good notes either. Why should they? They just do what the script tells them to do, and the mission turns from ‘test the product’ into ‘follow the script’. That makes testing rote, and boring, and it derails the task of looking for problems. Why even bother to take notes in that case? And then, since people don’t practice taking notes, their note-taking skills decline. And then when they’re given a chance to work in a less scripted way, they don’t take good notes. They forget important details of what they were up to, and even if they remember, they might not have evidence.”

“So,” I continued, “one way to get people to learn to keep good notes is to set them free from writing and executing test cases. But the deal is that, in return, they have to produce some kind of evidence of what they were thinking and doing. They can show me that stuff to supplement the debriefing, and we can review it together. Tidy notes, taken every couple of minutes or so, tend to be helpful. I’d like to see what their test ideas were, or what risks they considered as they went. If they’ve used specific test data and examined specific behaviours, they can show me lists or tables or mind maps. If they’ve written some code to help them test, they can show me the code and the output from it. Their notes don’t have to be ponderous or bureaucratic, but I want to see something that helps me to follow their thought process and develop trust.”

“Some managers are really worried about that integrity stuff,” said Frieda.

“That’s reasonable,” I replied. “If I were managing a project for which integrity were an issue, like in a medical hardware or software context, I still wouldn’t make people follow test cases most of the time. If stuff needs to be checked, automate it. For high integrity, I’d require formal session reports as part of the deliverables, and I’d give the testers constant feedback on them. In session-based test management, for instance, there’s this concept of the session sheet that combines test notes, data about the session, and references to artifacts that were generated during the session. Things like test results, snippets of test code, or even screen shots or videos if they’re helpful.

“Before the session, I might identify specific factors to examine, or output values to check. I might charter them to use tables of existing data. More often I’d get the tester to develop those things independently, and then show them to me along with the session sheet during the debriefing. Then we can discuss the tester’s choices and actions, and figure out how well we’re covering the product and what needs to be done next. And after that we can summarize session sheets into reports for managers, auditors, regulators, or anyone else who’s looking for something formal.”
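
What might such a session sheet contain? Here’s a rough, hypothetical sketch, loosely modelled on session-based test management sheets as described above. None of the field names or values below are prescribed; the charter, percentages, file names, and bug identifiers are invented, and your team’s sheet would almost certainly look different:

    # A hypothetical session sheet, represented as plain data.
    session_sheet = {
        "charter":  "Explore discount calculation with boundary quantities",
        "tester":   "Frieda",
        "start":    "2020-02-10 09:30",
        "duration": "long (roughly two 90-minute sessions)",
        "task_breakdown": {                          # rough, to the nearest 5-10%
            "test_design_and_execution": 65,         # T time
            "bug_investigation_and_reporting": 25,   # B time
            "setup_and_teardown": 10,                # S time
        },
        "charter_vs_opportunity": "85/15",
        "data_files": ["boundary-quantities.csv", "discount-oracle.py"],
        "test_notes": "notes/2020-02-10-discounts.md",
        "bugs":   ["DISC-412: 100% discount applied when quantity is 0"],
        "issues": ["No realistic data available for legacy customer accounts"],
    }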

For more on note-taking and session sheets, see https://www.developsense.com/presentations/etnotebook.pdf

A Moment of Jerry Weinberg Zen

Thursday, January 10th, 2019

The year was 2006. James Bach and I were running a workshop at the Amplifying Your Effectiveness conference (AYE). We were in one of those large-ish, high-ceiling conference rooms with about 15 programmers and software consultants.

We were showing them one of James Lyndsay’s wonderful testing machines. (You can find it here, but you’ll need Flash active to run it.) It looked like this:

James Lyndsay's Machine 1

At first, it’s all very confusing. When you press the buttons on the left, the red and blue balls on the right move in some way. The slider in the middle influences the range of motion somehow. In general, the mission of the exercise is to describe the behaviour of the machine.

Test cases are not testing. To illustrate this important fact, we give class participants the machines to investigate for a few moments, and then ask a question that James (Bach) asked in our AYE session.

“How many test cases,” he asked, “would you need to be able to understand and describe this product completely?”

Brows immediately furrowed. Clicking sounds from the buttons and murmured conversation between pairs of participants filled the room. “Two states to the power of five buttons with how many stops on that slider…?” “Wait, that button is just momentary…” “Seven hundred and sixt… no, that’s wrong.”

Whereupon, in a moment of perfect timing, a door opened, and Jerry Weinberg walked into the room. His walking stick and his bearing reminded me of Yoda and Gandalf and other sages and wizards.

“Hey, here’s Jerry Weinberg!” said James. “The world’s greatest living software tester! Jerry, how many test cases would you need to understand and describe this product completely?”

The room fell silent. Everyone wanted to know the answer. Jerry observed the laptop that James was holding. He didn’t touch the laptop, press a key, or move the mouse. He just looked for a few moments.

Then he said, “Three.” There was a pause.

Having worked with Jerry over a decade or so, James understood. “Three, Jerry?” he asked dramatically, in mock astonishment.

“Hm.” (pause) “Yeah. Three,” replied Jerry. Another pause.

Then he peered at James. “Why? Were you expecting some other number?”