First Aid for the Mission Statement

January 23rd, 2021

A while back, a tester brought a patient in for treatment. It wasn’t a human patient; it was a sentence about building and testing in an organization. The tester asked me for help.

“Could you provide me with a first aid kit for this statement that came from my management?”

“We have to move on to DevOps to be able to release code more often but we also have to increase testautomation in any way we can and minimize manual time consuming testing.”

This is the sort of statement that needs more than first aid; it needs emergency room treatment. We’ll start with handling some critical problems right away to get the patient stabilized. Then we need to prepare the way for longer-term recovery, so that the patient can be restored to good health and become an asset to society.

I’ll suggest a number of quick treatments. As I do that, I’ll identify why I believe the patient needs them.

  • Replace “have to” with “choose to” in each case.

Unless someone is about to run afoul laws of nature, of government, or of ethics, no one has to do anything. People and organizations choose to do things. It’s important to preserve your agency. When you have to do things, you don’t have control over them. When you choose to do things, you remain in charge.

  • Replace “move on to DevOps” with “apply DevOps principles and practices”.

DevOps is not something to do; it’s a set of ideas and approaches for getting important things done. The central principle of DevOps—that development and operations people work together to support the needs of the business—is what matters most. If that principle is absent in the organization, there are practically infinite ways in which things will go wrong.

There will be more to say about DevOps-related practices and principles as we proceed.

  • Replace “to be able to” with “to be better able to”.

Not being a thing in and of itself, DevOps is not necessary to be able to do anything in particular. There were plenty of organizations building and releasing software successfully before DevOps, and there will be plenty of successful organizations long after DevOps is forgotten. Nonetheless, many of the ideas currently associated with DevOps could be very helpful.

DevOps doesn’t guarantee success, but applying its core principle might improve the odds.

  • Replace with “release code” with “release valuable products”.

The product is not the code, and the code is not the product. The product is the sum of code, software platforms, machinery, data, documentation, and customer support. The product is all of that together, delivering whatever experience, value, and problems that the customer encounters, good and bad. The code is part of the product, and the code enables the product. The code controls the mechanical parts of the product. That’s not to diminish the importance of the code. The code makes the product possible. If it’s a software product and there’s no code, there’s no product. If there is code, and it’s bad code, it’s less likely that we have a good product.

It’s a good thing to release valuable products. Therefore it’s a good thing to understand the code that we’re releasing—but not just the code, and not just its behaviour. That’s not trivial, but it’s the easy part of testing. The harder part of testing is understanding the relationships between the code, its behaviour and the people who will be using the product or otherwise interacting with it—including developers, operations people, and testers.

  • Replace “more often” with “at an efficient and sustainable pace”.

Delivering working software frequently is one of the principles behind the Manifesto for Agile Software Development. DevOps principles and practices are intended to support agility. Building efficiently and frequently—certainly a DevOps practice, but not a practice exclusive to DevOps—affords the opportunity to discover whether there are problems that threaten value before we inflict them on customers.”More often” isn’t the point, though, because “more often” might be good, bad, or irrelevant.

Consider the extremes.

When we build a product very rarely, problems get buried under layers of increasing complexity in the product and our inexperience with it. Testers become overwhelmed with the volume of learning and investigation (and, very probably, bug reporting) to be done.

When we build a product frenetically, we also get infrequent experience with it, because each new build comes along before we can gain experience with last one. In this case, testers get overwhelmed by the pace of the builds, and shallow testing, at best, is all that’s possible.

There is probably business value in being able to deploy software promptly, but that’s not the same as deploying software constantly. There might be contexts in which frequent deployment might provide real business value. In other contexts, a firehose of deployment can disrupt your customers, such that they don’t care about the deployment, but they do care about the disruption. The key is to recognize what context you’re in, and to minimize costly disruption.

One reasonable compromise in most cases is to set up systems to build the product easily and therefore frequently. In the build and in the processes leading up to it, include a smattering of automated checks to provide a quick alert to problems close to the surface. From the stream of builds, choose one periodically, and spend some time testing each one deeply to find rare, hidden, or subtle problems that can escape even a disciplined development process.

  • Replace “but we also” with “Thus”.

The first clause in the statement sets up the second clause. The first clause doesn’t undermine the second clause; the first clause puts legs under the second.

In our emergency treatment, let’s use “thus” to reattach the legs to the body. Also, we’ll add a sentence break, giving the patient a little more room to breathe.

  • Replace “we have to increase” with “we choose to apply”.

We’ve already covered the “we have to” part of this replacement.

Increasing something is not necessarily a good thing. In engineering work, everything is a tradeoff between desirable factors. Every activity that we might consider valuable comes with some degree of opportunity cost, reducing our capacity to do other things that we might also consider valuable. When we choose to apply something, we can choose to apply it more, or less, or just as we’re currently doing, to obtain the greatest overall benefit.

  • Replace “testautomation” with “powerful tools”.

I’m sincerely hoping that testautomation is a typo, rather than a new term. We’ll put a space in there.

There are many wonderful ways to apply tools in testing. Automated checking is one of them; only one of them.

Tools can help us to build the product, to prepare for testing, and to reconfigure our systems efficiently for better coverage. Tools can help us to probe the internals of the system to see things that would otherwise be invisible. Tools can help us to collect and represent data for analysis; to see patterns in output. Tools can help us to record and review our work.

Replacing “test automation” with “powerful tools” could help reduce the risk that “test automation” will be interpreted only as “output checking”.

  • Replace “in any way we can” with “in ways that help us”.

There are some things that we can do that we probably don’t want to do. We can serve steak with turpentine sauce, but no one should eat it and it’s a waste of good turpentine.

Tools can definitely help us with building the product quickly, and with identifying specific functional problems that might threaten its value. Tools can also be overapplied, reducing our engagement and human interaction with the product.

Fixation on tooling to exercise functions in the product can be a real problem if we forget that people use software. Even if our product is a software service, its API is used by people directly—programmers putting the API to work—and indirectly—end users who interact with the product through an interface designed for non-programmers. Functional correctness is important; so are parafunctional elements of the product: usability, performance, supportability, testability…

Tools can help us to focus our attention on important observations. Tools can also dazzle or distract us, diverting our focus from other important observations. We choose to apply tools not in any we can, but in ways that help—and our choices can include changing or dropping tools when they’re not helping.

Consequently, it might be a good idea to remember what we’re using the tools for. So, in addition to the replacement, let’s add “to build the product, to understand it, and to identify problems efficiently”.

  • Replace “and minimize manual time consuming testing” with… something else.

This particular wound has become infected, and there’s a lot of debris in it. It requires a fair amount of cleanup, emergency surgery, and some stitches.

One principle of DevOps is the idea that teams use “practices to automate processes that historically have been manual and slow”. That’s a good idea for tasks that can be mechanized, and that benefit from being mechanized. It’s not such a great idea to forget that many tasks—and parts of tasks—are non-routine, rely on expertise and tacit knowledge, and can’t be made explicit or mechanized.

Programming involves strategizing, interpreting, designing, speculating, reflecting, analysing… Testing involves all of those things too, and more. Organizations reasonably want programmers work quickly, but no one suggests “minimizing manual time-consuming programming”. This is because no one considers programming a manual process. Programming is an intellectual process; a cognitive process; a social process. So is testing.

Programmers type; that’s the manual part of programming. Just as for programming, the central work of testing is not the typing. No one refers to the typing part as “manual programming”; and when a programmer sets a build process in motion or takes advantage of plugins in an integrated development environment, no one refers to “automated programming”.

Just as there is no manual programming, and no automated programming, there is no manual testing, and there is no automated testing. There is testing.

“Manual” and “Automated” Testing

The End of Manual Testing

Anything worth doing requires some time and effort, and we usually want to apply our limited time and effort to things that are worth doing. It does make sense to minimize or eliminate the amount of time that we spend on unimportant things. It also makes sense to apply appropriate effort to things that are worth doing; to maintain their value; and to increase that value where possible.

Just as some tasks in programming can be carried out by time-saving tools like compilers, some of the tasks in testing can be carried out by time-saving tools like automated checks.

Compiling, though, is not the central task of programming. The central task of programming is modeling and expressing things in the human world in a way that machinery can deal with them. We can then use that programmed machinery to extend, enhance, accelerate, intensify, or enable human capabilities. All that requires a significant degree of preparation, technical savvy and social judgement—and time for cycles of design, experimentation, learning, and refinement. No one should complain about this taking the time it needs.

Checking is not the central task of testing. The central task of testing is the search for problems that matter—ways in which the software fails to meet the needs or desires of its users, or introduces new problems of its own. That requires not only checks for problems that we can anticipate, but a search for problems that we didn’t anticipate.  Like development work, testing work also requires a signficant degree of preparation, technical savvy and social judgement—and time for experiencing, exploring, discovery, and investigation. No one should complain about this taking the time it needs.

At least one class of testing tasks is even faster than running automated checks: the tasks that we choose not to do, because cost, value, and risk don’t warrant them. It’s worth the investment to pause every now and then to assess the relative value of unattended automated checks; instrumented tool-supported testing; and direct, unmediated experience with the product.

The trick here is to set ourselves up to do the fastest, least expensive testing that fulfills the mission of finding problems that matter before it’s too late. One way to get there is to apply fast, easy, non-interruptive checks that don’t slow down development. Then, periodically, do deep testing to find rare, subtle, hidden, intermittent, emergent bugs that might elude even highly capable and well-disciplined programmers.

So, replace “and minimize manual time consuming testing” with with “We want to minimize distracting, unhelpful, or unnecessary work. We want to and maximize our ability to evaluate and learn about the product both efficiently AND sufficiently deeply.”

With all those replacements, the text might be longer, but it’s more accurate and more precise; bigger and stronger.

So:

We choose to move to DevOps to be better able to release valuable products at an efficient and sustainable pace; thus we choose to apply valuable tools in ways that help us to build the product, to understand it, and to identify problems efficiently. We want to minimize distracting, unhelpful or unnecessary work. We want to maximize our ability to evaluate and learn about the product both efficiently AND sufficiently deeply.

Every recovering patient can use motivation and support, so we’ll set our patient up with a motivating and supporting statement, and send them on their way together.

As developers, testers, and operations people working togther, our goal is to enable and support the business by delivering, testing, building, and deploying valuable, problem-free products. As testers, our special focus is to help people to become aware of any important problems that would threaten the value of the product to people that matter; to help our clients determine whether the product that they’ve got is the product they want.

Rapid Software Testing Explored Online for North American time zones runs March 1-4, 2021. Learn more about the class, and then register!

Bug of the Day: AI Sees Bits, Not Things

January 4th, 2021

An article that I was reading this morning was accompanied by a stock photo with an intriguing building in the background.

Students throwing their graduation caps in the air

I wanted to know where the building was, and what it was. I thought that maybe Chrome’s “Search Google for image” feature could help to locate an instance of the photo where the building was identified. That didn’t happen, but I got something else instead.

An assortment of images of migrating geese

Google Images provided me with a reminder that “machine learning” doesn’t see things and make sense of them; it matches patterns of bits to other patterns of bits. A bunch of blobby things in a variegated field? Birds in the sky, then—and the fact that there are students in their graduation gowns just below doesn’t influence that interpretation.

That reminded me of this talk by Martin Krafft:

The MIT network’s concept of a tree (called a symbol) does not extend beyond its visual features. This network has never climbed a tree or heard a branch break. It has never seen a tree sway in the wind. It doesn’t know that a tree has roots, nor that it converts carbon dioxide into oxygen. It doesn’t know that trees can’t move, and that when the leaves have fallen off in winter, it won’t recognize the tree as the same one because it cannot conclude that the tree is still in the same position and therefore must be the same tree.”

Martin Krafft, The Robots Won’t Take Away Our Jobs: Let’s Reframe the Debate on Artificial Intelligence, 14:30</p>

Then I had another idea: what if I fed a URL to the image above to Google Images? This is what I got:

Results from a Google Image search, given a link to an image

Software and machinery assist us in many ways as we’re organizing and sifting and sorting and processing data. That’s cool. When it comes to making sense of the world, drawing inferences, and making decisions that matter to people, we must continue to regard the machinery as cognitively and socially oblivious. Whether we’re processing loan applications, driving cars, or testing software, machinery can help us, but responsible, socially aware humans must remain in charge.

(A couple of friendly correspondents on Twitter have noted that the building is the Marina Bay Sands resort in Singapore.)

Bug of the Day: What Time Are the Class Sessions?

December 17th, 2020

One problem that we face in software development and testing is that data and information aren’t the same. Here’s an example, prompted by email from a correspondent.

There’s a Rapid Software Testing Explored class running January 11-14, 2021. It’s set to run at times that work for people in Europe and the UK, mostly. The service I use for managing registrations, Eventbrite, offers the opportunity to list the starting and ending times for the class. So far, so good.

The class starts 12h00 Central European time, on January 11, 12h00. The class lasts for four days. Each day, there are three webinars of 90 minutes, with a half-hour break between each one. Thus the class ends at 17h30 Central European time, on January 14. How should this be displayed on the landing page for the event?

Eventbrite offered a form for me to fill in the starting and ending date and time for my event. I filled it in. Then Eventbrite provides an option to display the start time and ending time of the class on the landing page for the event. When I accept both options, the page duly presents the class as starting on the start time (2021/01/11 06h00 EST), and ending on the ending time (2021/01/14 11h30). Those times are entirely, factually correct as data. That correctness is pretty easy to check, too.

A person who wrote from Europe wanted to register for the class wrote to ask if he should assume that the class ran from 12h00 to 17h30 on the first day and from 8h30 to 17h30 on the second, third, and fourth days. If you’re like me, and you already know the timetable for the class (you do; I just told you), the writer’s assumption might seem strange—but that’s from the perspective of people with insider knowledge, like you and I. There’s no particularly good reason to label that evaluation as strange from an outsider’s perspective.

The issue here is that, in its template for displaying an upcoming class, Eventbrite allows me to check a box to show the start date and time, and another box to show the end date and time. There isn’t an option to display the dates alone, without the time, nor is there an option to display the date range with starting and ending times for each day of the class.

Is that a bug? Hard to say. From the perspective of someone writing code to gather the data and display the page, it’s almost certainly not a bug—not a coding bug. If the requirement is to “display the starting and ending date and time of the event“, the code gathers that data from me and displays it correctly to my customers. But correctly doesn’t mean informatively.

Is it a bug in that the expressed requirement is wrong, then? Also hard to say. First, I haven’t seen the requirements document. I suspect that Eventbrite’s business is mostly single-day events, so the issue probably doesn’t come up that often, relative to the majority of cases. But it does come up for some people, and for some events. It did for me, and for my customer, this time.

Should Eventbrite be able to display the start and end times for each day of a multi-day event? Maybe. But that would be more complicated to code and harder to test. Maybe it’s not worth the trouble and the risk of trying it.

Should the start and end times be displayed with a time zone beside them? They are. Should those time zones be chosen relative to where the event is happening, or relative to the time zone for the person who is looking at the site? Eventbrite seems to provides the latter, but maybe it doesn’t; maybe it shows Eastern Time worldwide.

It doesn’t take long to enter the rabbit hole of possibilities: if the time is displayed relative to the viewer’s start and end time, what if that user is connecting to the page via a VPN in a time zone different from hers? I tried this, and it seems either Eventbrite figures out the time on my local system OR it displays its times in Eastern time worldwide. How can I be sure what gets displayed in Europe? Will European users be confused if they see the start and end times rendered as North American Eastern time?

What if the user will be traveling, and wants to know the time of the event where it’s being held? (This sure isn’t a problem in December 2020, but what happens when we’re travelling again?)

Should Eventbrite offer an option to display the date alone, and let those running the event identify the daily schedule some other way? Probably, but who’s to say?

And imagine that you’re working at Eventbrite: what should a tester’s role be in all of this?

Here’s what we say in RST: it’s the role of designers, programmers, and managers to develop requirements, designs, and programs that transform the complex, messy, social world of people and their needs into the simpler, cleaner, world of machines and their very stilted languages. It is the tester’s role to look for and to find problems in those transformations, so that the designers, programmers, and managers can recognize those problems and make decisions on how to deal with them.

To fulfill our role, we must experience, explore, and experiment with the product and its requirements. We must develop an understanding of how people might use the product, and how they might be perplexed or surprised or annoyed by it.

When the product is being put in front of people who haven’t seen it, we must struggle to maintain the perspective of the first-timer. When the product is placed in a domain in which it will be used by experts, we must develop expertise in that domain, as quickly and as deeply as we can.

The tester can participate in the development of requirements, design, and code, and can make suggestions about them. But anyone else can do that too—documentation people, customer support people, customers,…

What makes testers special in all this is the testers’ focus on problems. It’s our abiding faith that there are problems, and that those problems might matter to people who might be forgotten by the builders. It’s the tester’s special job to consider how the insider’s perspective might be different from the outsider’s perspective. Some people on the team might consider those things. No one else on the team is focused on them.

It’s the tester’s job to raise questions about the product, its requirements, and its design, and ask “Is there a problem here? Might there be a problem here? Is everyone okay with the product we’re developing? Is everyone willing to live with the problems that we’re aware of?” This is often socially awkward, because people who are focused on solving problems (like developers and designers and managers) often find it distracting and to some degree irritating to hear about new ones. Don’t you?

And, in this case, here’s the rub: the data and the display can be correct, but still fail to solve a problem for the someone who wants to know “What are the danged class times for each day?” Some people guess (and guess correctly); others are willing to wait for an answer (those people find out on a page that gets displayed after they register); and some people write to ask me. That underscores another point: a bug is not a property of a product; it’s a relationship between the product and some person.

It turns out that daily start and end times are hard to express in machine-friendly data structures, but easy to express in the free-form text description of the class that Eventbrite also affords on the class’ landing page. So upon recognizing the problem for one of my customers, and that the problem mattered to him, that’s how I addressed it.

If you’re interested in all this, you might be interested in the Rapid Software Testing Explored class, where we examine the nature of problems and how to look for them and report them skilfully to your clients.

Again: the class runs on four consecutive days, starting at noon CET. Each day, there are three webinars of 90 minutes, with a half-hour break between each one. Just so you know.

Bug of The Day: Bad Data Means Search for Book Title Fails

December 14th, 2020

This is your periodic reminder that data has problems, just like code does.

A correspondent on LinkedIn pointed me towards a book by George Lakoff, an author I admire. For some reason, I had not been aware of the book. So I looked it up. I wanted to go straight to it, so I put the title in quotes:

Where Mathematics Comes From

Hmmm. That’s a little strange. Nothing? Let’s try without the quotes.

Where Mathematics Come From

Do you see the problem? Do you see why the quoted search string didn’t work? It looks to me like there’s a bad entry in a database somewhere.

Data is messy. Data is often wrong. Data can trip up functions that might otherwise appear to be working fine.

Data needs to be checked and examined critically, just like program code does; and so do the interactions of good and bad data with program code. Otherwise, you might lose a sale, mess up a payment, or open the door to a security breach without noticing. That’s why, in Rapid Software Testing, we use a variety of ideas for covering the product and the things around it with testing.

Sure, you might have automated checks set up for certain functions and workflows through your product. That’s fine, and a good thing. Are you using the power of automation to help find problems with your data?

A Naïve Request from Management

October 21st, 2020

A tester recently asked “If you’re asked to write a ‘test plan’ for a new feature before development starts, what type of thing do you produce?”

I answered that I would produce a reply: “I’d be happy to do that. What would you like to see in this test plan?”

The manager’s reply was, apparently, “test cases covering all edge cases we’ll need to test”.

That’s a pretty naïve request. Here’s my answer:

“Making sure the product handles edge cases properly is definitely an important task. If I were to take your request literally—test cases covering all edge cases we’d need to test—it could take a lot of time for me to prepare, and a long time for you to review and figure out all the things I might have left out.

“And there’s another issue: I don’t know in advance what all the edge cases are, or even what they might be—and neither do the developers, and neither do you. No one does. But that’s okay! We can start right now by learning about possible edge cases through testing. We can’t perform testing on a running product yet, obviously, but we can perform some thought experiments and test people’s ideas about the product.

“So how about I give you a short summary—a list or a mind map—of some of the broad risk areas we can start considering right away? We can share the list with the developers to help them anticipate problems, defend against them, and check their work. That will greatly reduce the need to test edge cases later, when the product has been built and the problems are harder to find.

“We can add to that risk list as we develop the product—and we can take things off it as we address those risks. That will help focus the testing work. When we start working with builds of the product, I’ll explore it with an eye to finding edge cases that we didn’t anticipate. And I’ll keep the quick summaries coming whenever you like. You can review those and give me feedback, so that we’re both on top of things all the way along.”

The software business, alas, still runs on folklore and mythodology about testing. Too few managers understand testing. Many managers—and alas, many testers—don’t realize that testing isn’t about test cases, but are nonetheless addicted to test cases. When we provide responsible answers to naïve questions, we can help to address that problem.

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

Regression Testing and Discipline

October 9th, 2020

Another tester on an “Agile” team complains of being overwhelmed by the volume of regression testing he says he must do at the end of each sprint.

Why are some development organizations fixated on regression testing? Not why do they do it (that can be quite reasonable), but why are they fixated on it? I have a theory.

It goes without saying that every change to the product or system holds the risk of problems that could cause quality to backslide in some sense. That’s regression, slipping backwards to some presumably less advanced state. Regress is the opposite of progress.

With change, there’s a risk of regression, so it seems sensible to focus some testing on that risk. But is testing a sure-fire, reliable way to deal with the risk of regression?

Sure-fire? No. Testing can certainly help to find bugs, so that bugs can be recognized and dealt with. But no matter how thorough testing is, or how early it starts, testing can miss bugs too. So let’s remember that the easiest bug to deal with is the one that is never hatched in the first place; the next easiest is the one that gets squashed before it can bury itself in a mass of code.

No matter how skillful or powerful the testing, to some degree, finding a bug remains a matter of luck. In the face of regression risk, we’d prefer not to leave things at that; better to start with fewer bugs to reduce our dependence on luck. Thus, it would seem like a good idea for the people making the changes to avoid bugs by working in a careful and disciplined way.

Discipline, says Chambers, is “1. training designed to engender self-control and an ordered way of life; 2. The state of self-control achieved by such training.” The idea of self-control suggests the idea of agency, which is essential to exploratory work, which is in turn essential to engineering work.

Depending on the product, the project, and the preferences of the individual programmer and the programming team, what might we see and hear as they did disciplined work? Try pausing for a moment to remember the scene when you noticed people doing work you considered “disciplined”.

How’s your list? Here are a few things I’ve seen and heard from time to time in work I’d call “disciplined”:

  • When a change or a new feature was on the table, groups of people reviewed and discussed ideas to understand the change and the motivation for it. Talk was focused on making the system better, and on the problems that the changes were intended to solve. But that focus softened and sharpened, zoomed in and zoomed out, and moved around to help people see everything they could see—including problems. People often disagreed, but they were willing to try little experiments to sort out the disagreements.
  • I’ve seen people consulting with colleagues and with users to get a variety of ideas about design, implementation, and risk. Conversations happen at desks and in conference rooms, but also outside the office, in restaurants, eating, drinking, joking, walking, playing games, shopping… Discipline gets relaxed sometimes. Social life can foster trust and responsibility that helps people aspire to discipline.
  • I’ve seen people using talk, text, tables, sketches, diagrams, stories, mind maps, toys, and props to help describe things in lots of different ways for analysis and for memory. Disciplined work often seems associated with careful note-taking, too.
  • In disciplined shops, order doesn’t necessarily come right away; sometimes it has to be bootstrapped. Stuff tends to start messy and get more tidy if it needs to; when things get too formal too soon, ideas get lost. Development work is one way of life, and a self-controlled, ordered way of life often starts with being uncontrolled and disordered when we’re starting to build something new. Order emerges.
  • Some disciplined places were quiet and focused, but in others I heard lots of regular background chatter, too. Highlights were stories about how people solved problems—and created new ones on the way. Storytelling of this kind helped people to think about risk in a vivid way, which prompted thinking about discipline.
  • I’ve heard open and honest disagreement when there were things worth disagreeing about. I’ve people getting upset… and taking responsibility for working things out. Discipline isn’t always smooth.
  • I saw builders paying attention to testability—which includes simplicity, cleanliness of code, modularity, visibility, and controllability—to make it easy to do less expensive deeper testing later on.
  • In the disciplined shops, the developers were resolved not to take on too much change all at once. They would make patient, careful, reflective, unhurried changes, and try them out themselves. When they felt the work was ready for other people, they’d make it easily accessible, asking for and getting feedback right away.
  • While designing, building and trying things, developers would try to anticipate potential exceptions and error conditions, and they’d generally be quite successful. Then they would give the product to someone else to test, whereupon they would learn something about what they had missed.
  • Developers who were really good at debugging carefully tried out specific little changes as they worked on solving a problem.
  • The disciplined builders would tend to have a sober preference for reliable, widely-used, field-tested components over a mad rush to implement new stuff developed from scratch. As a consequence, there tended to be fewer surprising bugs.
  • I’ve seen programmers whose style was test-first or test-driven development—and who were given the time to apply it. And I’ve worked with disciplined programmers who don’t bother with TDD, exercising discipline in other ways.
  • I’ve seen code that contained inline assertions in debug builds. I’ve seen exception handling built into the product and logs to report on its status. (Every now and again, I see well-thought-out, helpful error messages.)
  • I’ve seen see developers checking their own work with configuration checks, unwanted-change detectors, and unit testing, including programmed output checks.
  • I’ve watched people spending hours and days in each other’s offices or cubicles, doing pair programming for immediate, real-time review.
  • I’ve seen formalized review sessions throughout—wherein new developers learned from more senior developers and, interestingly, vice-versa.
  • I’ve seen developers using lots of appropriate tools to see hidden things, or to see unhidden things in different ways (e.g. IDE syntax checking while writing code; attention to compiler warnings; database schema diagramming; dependency checking; profiling for performance; etc….);
  • I’ve seen consistent refactoring for readability, maintainability, and portability; paying down technical debt, as they say.
  • I’ve listened in on discussions about the development of shared coding styles, which also helped with readability.
  • I’ve observed developers keeping careful notes about setup procedures and configuration settings.
  • I’ve watched the entire team working collaboratively throughout so that there are lots of eyes and minds to notice things that could go wrong.
  • I’ve seen teams cultivate good relations with technical support.
  • I’ve noticed disciplined people who went home consistently on time. Also, disciplined people who stayed late from time to time.
  • In disciplined shops, I’ve seen shared skepticism about the completeness, accuracy, or relevance, of requirement statements, acceptance criteria, or a “definition of done”. Amidst optimism, I’ve noticed a suspended certainty about whether things were really done.
  • Disciplined shops often do frequent bursts of shallow, non-invasive interactive testing near the coal face, to help confirm that what the programmers were doing is reasonably close to what they intended to do.
  • I’ve seen project managers provide support staff, including people to set up test systems, to help keep track of the backlog, and a group administrator to help the manager in acquiring resources.
  • I’ve seen frequent building, to make builds for deep testing and bug fixes available at the drop of a hat. But I’ve also seen relatively infrequent yet still reliable building, too.

These are ideas and practices I’ve seen people applying to help them keep on track while building products. Most or all of these things would be done by the developers in collaboration with people working reasonably close to them (some of those people might be testers, and others might not be).

Each item on the list lends a kind of discipline to a development process. Each one represents something people might mean when they murmur something vague about “building quality in”. They’re heuristics, not rules. No one did all of them. I’ll bet you’ve got a ton of stuff on your list that’s missing from this list. Notice, too, how each item above could represent disciplined action in one context and a lapse of discipline in some other context.

Discipline doesn’t have to be burdensome, bureaucratic, or otherwise slow. Informal actions can support discipline, and help people find out where they might need to apply discipline. Remember, according to Chambers, discipline means “self-control to obtain an ordered way of life”; the self-control part suggests that discipline comes from within, rather than being imposed from outside.

Some forms of discipline might feel slow to some, at first, but prudent driving feels slow to people who are used to driving recklessly. When we’re driving, we almost always drive more slowly than we could possibly drive. Driving faster than that increases the risk that we’ll arrive late—or not at all.

Some of the discipline-related activities above represent some form of testing; others don’t. However, the processes of building a product are very different from the processes of experiencing a product. Bugs, especially of the latter kind, can elude even a disciplined development process. Accordingly, it makes sense for there to be different kinds of testing: testing for examining a product as it’s being built; and testing for obtaining experience with the built product.

So when builds are available, it’s probably wise to do some periodic deeper testing, some of it focused on potential, reasonably foreseeable, undesirable effects and side effects of a change—the risk of regression. That regression testing can be far better targeted when the product has been carefully built and already tested to some degree.

Deep testing doesn’t have to happen on every build; indeed, it probably shouldn’t. In lots of places, it can’t. Testing for hidden, rare, subtle, intermittent, emergent bugs tends to take time—the kind of time that can interrupt or slow down development. It can take a while to set up data and tools for deep testing. When systems have complex interactions, problems emerge at the interfaces between things that worked fine on their own. Working out those interactions and studying them in a search for problems can take time. That time might be worthwhile when safety or health or money are on the line. If there’s discipline in the building, the rewards of testing a build deeply tend to dominate the risk of skipping a few well-controlled builds.

Critical distance can aid deep testing to be done by people at some critical and even social distance from the people who are changing the product. Risk is a big deciding factor on that score—including the risk of regression.

And there’s the rub. In many organizations, people don’t mandate, or foster, or do well-disciplined work; or they exercise discipline in a very shallow way, cherry-picking one or two items from the list above, and ignoring the others. In such organizations, it seems as though the object is for the developers to write code, rather than to write code that works.

But perhaps, triggered by subconscious recognition of the risk of regression, managers (and, often, testers) feel compelled to do an overwhelming amount of expensive work: sitting at the keyboard and repeating every scripted test procedure that has been performed before, as quickly as possible. When you ask them why, they often reply, “because the developers have no idea of what might be affected by this change.” Then some of them proceed to convert those scripted procedures into automated scripted procedures, whereupon they gain a second undisciplined development project and a new maintenance nightmare. And they feel even more overwhelmed.

If someone feels overwhelmed, that’s a sign that there’s something probably something overwhelming going on.

If the developers really do have no idea about what might be affected by change, then that’s a problem—one that the organization should definitely address. It’s like the principle that you shouldn’t try to automate a process that you don’t understand; when you’re working with something important, you shouldn’t rush to change it unless and until you’ve got a reasonably good idea of the extents and effects and risks of the change, and how to manage them.

Now: there’s a problem here for testers. Testers don’t design, write, or fix the code. Many testers don’t have significant programming experience; of the few who do, few have experience with writing production code. Testers don’t manage the project, and very few testers indeed have been project managers. Testers don’t manage the developers. In light of that, it’s inappropriate, in my view, for testers to tell programmers and managers how to do their jobs. Testers cannot and should not try to force, or enforce, discipline.

It’s quite reasonable, though, for testers to report on problems with the product. It’s reasonable for testers to identify patterns of problems related to particular coverage areas or quality criteria. It’s reasonable for testers to report on patterns of regression-related problems.

It’s also reasonable for testers to report on where testing time is going. If investigating and reporting shallow bugs is dominating testing work, testers will obtain less thorough coverage of the product. Developers and managers need to be aware of that. If troubleshooting and maintenance of automated checks is swamping the testers’ ability to gain critical experience with the product, that’s noteworthy; that work will displace the testers’ opportunities to learn about the product deeply, and perform new experiments on it. Things that slow down testing and make it harder allow deeper and possibly more dangerous bugs to hide and survive.

That’s why it’s important for testers to learn the skills of analyzing and describing the state of the product, the state of the testing, and the quality of the testing—including problems that threaten any of these things. It seems that managers and developers are often unaware of problems of lapsed discipline. Testers shouldn’t be trying manage the project, but they can shine light on the problems.

Obsession with regression testing is a hint that something else might be amiss in the process that leads to it. Sure, it’s a good idea to do some testing after a change. But it’s a lot less expensive to test after a change when people have been testing during the change.

Discipline is a heuristic for reducing the risk of regression and the need for regression testing. When people apply discipline, the effects of change tend to be better known, the code tends to be cleaner, the feedback loops get faster, and the risks tend to be lower—and deep testing can become targeted on the risk, faster, cheaper, and deeper—helping to find hidden problems that matter.

====================

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

To Avoid Trouble Successfully, We Must Look For It

September 28th, 2020

Software testing can be socially difficult because of people’s natural desire to avoid trouble. This prompts them to avoid thinking about trouble, which means that they don’t look for it. But if you don’t try to find the trouble that’s in your product, that trouble will eventually find you.

Some might say we do think about trouble, and we try to avoid it by getting clear on our intentions in design work, and by checking our work as we go. Those are fine things to do, but they come with their own problems. In design and planning, we are often unaware of problems that may emerge as we combine elements in a system. Developers are rationally and justifiably resistant to slowing down the pace of their work. Even when we do our best, some problems will elude us.

So when value is at risk, when risk is significant, and when that risk can manifest as real problems that hurt people, deep testing done efficiently is a responsible thing to do—and not doing it means we did not do our best.

A correspondent on LinkedIn, Aaron Emery, asks:

How do you suggest dealing with management that want to ‘shoot the messenger’ in instances like these?

It depends on the management, the message, and the messenger.

Some social awkwardness can come from the message itself and the way it’s frame. “This feature sucks” is probably not as easily digestible as “this behaviour in the product is inconsistent with this requirement noted in the spec” or “…inconsistent with this other part of the product” or “…with what we’ve seen in previous versions of this product” or “…with reasonable desires of this until-now-forgotten user”. Point out the inconsistency dispassionately, and let the receiver of the message come to his or her own feelings about it. In other words: know your oracles.

Another approach is to point out that the message, although momentarily bad news, is offered in order to help make everyone look their best. “Yes; fixing this might take some work, but at least we won’t be inflicting it on customers” — or even “Yes, even though we’re not going to fix this, at least tech support will be prepared for it and can offer a workaround.”

It’s critical for testers to know that the product doesn’t have to look or behave the way we want it to. We don’t design the product, we don’t code it, we don’t sell it, and we don’t run the business. We’re trying to help our testing clients understand the product they’ve got, so that they can decide whether it’s the product they want. So if the client hears us and understands the nature of the product but doesn’t want to fix it, that’s fine—and that’s not shooting the messenger, either. That’s business.

If management says “why are you only telling us about this NOW?”, the reply is “because I only found out about it now. It’s a pity our planning and our coding discipline didn’t prevent this problem, but at least now we can fix it while there’s still time, or learn from this experience.”

If management is truly reckless and wants to suppress awareness of problems, driving the school bus blindfolded, then they probably don’t want your services as a tester. That’s okay too; testing is always optional — and so is your choice of testing clients. You might want to avoid that company’s products in the future, though.

====================

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.

Lessons Learned from a Little Bug

September 5th, 2020

Almost 10 years ago, I wrote a series of blog posts on project estimation and black swans.

And, almost 10 years after that, Chris NeJame reported an observation about the following passage towards the end of Part 4 of the series:

As Jerry (Weinberg) has frequently pointed out, plenty of organizations fall victim to back luck, but much of the time, it’s not the bad luck that does them in; it’s how they react to the bad luck.

Did you notice the problem?

Chris did. He courteously reported “possible typo on part 4: ‘back luck’?” Whereupon I fixed the bug.

What are the lessons to be learned here? Lots, I think.

  • Bugs can exist and persist without the author noticing them. As usual with all of the posts I write, I pored over that one as I was writing it. (You wouldn’t believe how long it takes me to write a blog post.) I read it over and over again; I found tons of errors and fixed them. And yet I still didn’t see the “back luck” error. Everyone, everyone, is prone to oblivion to problems in their own work to some degree. When we’ve been looking at something for a long time, our capacity to notice specific bugs diminishes.
  • Bugs can exist without users noticing them, either. Human beings repair problems in communication, often with no conscious effort. When some people’s eyes gather a string of text (“fall victim to ba-something luck”), the sensemaking faculty in their brains may repair the problem and it won’t come to their attention at all. For others, the flow of reading might be interrupted momentarily. They’ll make sense of “back luck” as they read the following two instances of “bad luck”, repair the problem in their minds, and move on.
  • Fresh eyes find failure. That’s one of the most concise and memorable lines from Lessons Learned in Software Testing. This was the first time that Chris had read the post. He had fresh eyes and critical distance from the author’s perspective, making it easier for him to see the problem than it was for me.
  • When testing, it helps to look at things at different times. For several minutes, “One of the most concise…” in the paragraph read “On of the most concise…” I tend to write blog text at the same time as I’m marking it up, and the “list item” tags affect the way I read, so that problem persisted for a while.
  • When testing, it helps to look at things in different ways. As I was composing this post, I was focusing on the words of the text, as usual. I wasn’t focusing much on the presentation. At best, I was imagining it. When I switched to Preview Mode, I began to realize that a single sentence in bold at the beginning of each lesson would help the lessons to stand out. That wasn’t as obvious when writing in text and markup. One antidote to this is to look at the post in preview mode, where such errors are easier to see.
  • The developer’s experience of something is profoundly different from the customer’s experience. When I’m writing something in text, my ideas about the experience of reading it are both imaginary and vague. There’s no replacement for experiencing the product and interacting with it the way its users do. This is why experiential testing is so important. (Please don’t call it “manual testing”.)
  • Bugs can persist for a long, long time without being reported. Chris isn’t alone; I once found a similar problem in a book by Jerry Weinberg. When I reported it to Jerry, he told me that the error had been around for 30 years or so.
  • The idea that “the users will report the bugs” is bogus. People who dismiss the value of testing, or of testers, often use this argument. It’s silly. Lots of users won’t notice the bug. Lots of users will notice it, and won’t report it. Lots of users will notice it, won’t report it, and simply won’t use (or buy) your product. And you won’t hear anything from them. Some users will notice it and report it, but sometimes your crazy-busy support people won’t report it to you. Although it’s possible, Chris was almost certainly not the first person to notice the problem. But he is the first to have bothered to report it.
  • A bug that is not important to your users might be important to you, and vice versa. My readers didn’t notice the problem, or were sufficiently unconcerned about it not to mention it, probably thinking that it didn’t matter. I care about not looking sloppy in blog posts, so it mattered to me.
  • It takes time and energy to report a bug. Therefore, it might be a really, really good idea to eliminate any friction in reporting a bug, both for users and for testers.
  • Checking tools may help us to find checkable problems. Spelling checkers may help us to find spelling errors. Grammar checkers may help us to find grammatical errors (although few of them, in my experience, are much good). The spelling checker built into the browser has flagged several typos which I was able to notice and fix in the course of writing this post. Hurrah.
  • Checking tools can be unreliable. As I write, I’m noticing that after WordPress refreshes a page that is being edited, the spelling checker built into the browser doesn’t flag all of the spelling errors in the text editing window. It only flags an error if the insertion point (that is, the text editing cursor) has been placed in the paragraph with the error in it. I almost missed a bunch of errors because of that. Meanwhile, the spelling checker is flagging “checkable” in the paragraph above, but that’s exactly the word I want, even though it’s not in the browser’s dictionary. And the point is…
  • To find problems, machinery can help, but there’s no replacing human observation and judgment. Checking tools don’t understand our intentions. In the “back luck” case, there was nothing wrong with the spelling of the words, nor was anything wrong with the syntax of the sentence. It was the meaning of the sentence, the semantics of it, that was wrong. In the “checkable” case, I’m using a neologism that humans can interpret just fine. The spelling checker won’t alert me to a missing word, either, and no tools can tell me that the blog post I’ve written is the blog post I want.
  • Critics are important. Some people (like me, and like Chris) have a capacity and a predilection for spotting problems in other people’s work. We have the critic’s mindset, even though we may be oblivious to certain problems in our own work. It’s a very a good thing for testers to have that mindset, and to engage testers who have it. But…
  • It’s socially risky to be a critic. It’s good to know about errors, in the long run, but not many people always love being confronted with errors. It helps for testers to remember that. So…
  • Excellent testers manage social risk. Although he said “possible bug”, I’ll bet that Chris was pretty much certain that there was a bug, and that I would agree. Yet by saying “possible bug”, he left me in charge of the decision about whether there was a bug. This is an important move for a tester. It helps to acknowledge that authors (programmers, designers, managers…) get to decide whether the product they’ve got is the product they want, and that they are responsible for the quality of the work. This can help soften the blow of confronting yet another damned error.
  • We can learn a lot from a small problem. Here’s a case where the problem is two letters where there should have been one, and of those two, one was wrong. This is no big deal in a blog post, but in a software products, two bytes can make a difference between a working product and a devastating problem. Such problems can remain invisible for years until suddenly, one day, they’re not. Yet even though this is a relatively trivial problem, look at what we can learn from it, if we choose! And look at how reflecting on the problem leads to experiences that lead to even more learning!

Thanks to Chris for reporting the bug, but also for triggering the opportunity to explore these lessons.

What lessons would you add?

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

Testing Doesn’t Add Value to the Product

August 29th, 2020

Testers consistently ask how to show (or demonstrate, or prove, or calculate) that testing adds value.

Programmers, designers, and other builders create and add value by creating and building and improving the product. Testing does not add value to the product. And that’s fine.

Managers assure quality by helping programmers, designers, and others to obtain the resources they need, and by removing (or at least reducing) obstacles to their work. Testing does not assure quality. That’s fine too.

Testing does not add value to the product. You can test all you like and the product won’t get any better, nor will it get any worse. Similarly, weighing yourself will neither increase nor reduce your weight.

Based on what you read off the scale when you weigh yourself, you might choose some action to increase or decrease your weight. Based on the observations that we make and the problems that we find in testing, people might choose improve to the product in some way—or to live with the product they have. Testing itself adds no value to the product, though, and that’s fine.

Testing is the process of evaluating a product by learning about through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, investigation, critical thinking, risk analysis, etc. Testing helps us to learn the actual status of the product. Significantly, testing provides people with a means of determining whether there are problems in the product that threaten its value. By revealing problems in the product, and analysing those problems, testing can also help to cast light on problems in the project that can contribute to product problems.

In other words: testing doesn’t add value; it provides value. Testing helps people to understand the product they’ve got, to help them decide whether it’s the product they want. That can be valuable, since deep, accurate knowledge about the actual product, its actual status, and problems in it can have considerable value for the people who are building product and managing the project. Without testing—experiments on the product —our theories about the goodness of the product are not grounded in experience of the product. They’re only theories; or beliefs, or hopes, or wishes.

So don’t worry about whether testing is adding value. It isn’t, and that’s not a problem. Consider instead whether testing is providing value to people who need to know deeply about the product (and especially about problems and risks that threaten its value), and who need to make decisions about it. Consider critically (and self-critically) whether testing is providing valuable knowledge at reasonable speed and reasonable cost—and whether your clients would agree with your assessment. If it isn’t, or they wouldn’t, that’s a problem. Fix it.

Further reading:

There Is No ROI in Social Media Marketing (read this article, replacing “social media marketing” with “testing”)
How is the Testing Going?
Testers, Get Out of the Quality Assurance Business

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

Expected Results

August 23rd, 2020

Klára Jánová is a dedicated tester who studies and practices and advocates Rapid Software Testing. Recently, on LinkedIn, she said:

I might EXPECT something to happen. But that doesn’t necessarily mean that I WANT IT/DESIRE for IT to happen. I even may want it to happen, but it not happening doesn’t have to automatically mean that there’s a problem.

The point of this post: no more “expected results” in the bug reports, please!

In reply, Derek Charles asked:

Then how else would you communicate to the developer or the team what is SUPPOSED to happen? I think that expected results are very necessary especially when regressions are found during testing.

Klara replied:

I suggest to describe the behavior that the tester recognizes as problematic and explain WHY it might be a problem for someone—the reasoning why the behavior is perceived as a bug—that’s what really matters.

Exactly so. Klára is referring here to problems and oracles—means by which we recognize problems when we encounter them in testing.

There’s an issue with the “what is supposed to happen” stuff: in development work, what is supposed to happen is not always entirely clear. Moreover, and more importantly, since testers don’t run the project or the business, we don’t mandate what is supposed to happen.

For instance, while testing, I may observe something in the product that I find confusing, or surprising, or wrong. When I look up the intended behaviour in the specification, it says one thing; the developer, claiming that the spec is out of date, contradicts it; and the product owner confirms that the spec is outdated. But she also says that the developer’s interpretation of what should happen is not what she wants him to implement. And then, when I consult an RFC, the product owner’s interpretation is inconsistent with what the RFC says should be the appropriate behaviour.

Fortunately, I don’t have to decide, and I don’t have to say what should happen. My job as a tester is to report on an apparent inconsistency between the product and presumably desirable things, or between the product and someone’s expressed desire or requirement. In the case above, I let the product owner know about the inconsistency between her interpretation and the standard, and she makes the call on what she and the business want from the product.

That is, even though I have certain expectations, I might be wrong about them and about what I think should be. For instance, she might decide that our product is not going to support that standard. She might point out that the standard I’m considering has been superseded by a later one. In any case, what is supposed to happen gets decided not by me, but by the people who run things. That’s what they’re paid for. This is a good thing, not a bad thing.

But still, I’d like to honour Derek’s question: as testers, how should we report a problem without referring to “expected results”?

  • Instead of saying “expected result” and leaving it that, we could say “inconsistent with the specification”.

    Inconsistency with the specification is a special case of a more general way of recognizing and describing a problem: inconsistency with claims. “Inconsistency with claims” is an oracle heuristic. (A heuristic is a fallible means for solving a problem; an oracle is a special kind of heuristic which, fallibly, helps you to solve the problem of identifying and describing a bug.) When a product is inconsistent with a claim that someone important makes about it, there’s likely a problem, either with the product or the claim. As a tester, I don’t have to decide which.

    The specification is a particular form of a claim that someone is making about what the product is like, or what it should be like. Claims can be made in design sessions, planning meetings, pair programming, hallway conversations, training workshops… Claims can be represented in help files, marketing materials, workflow diagrams, lookup tables, user manuals, whiteboard sketches, UML diagrams… Claims can also be represented in the code of an automated check, where someone has written code to compare the output of the product with an anticipated and presumably desirable result. Recognizing many sources of claims and inconsistencies with them makes us more powerful testers.

    Whatever relevant claim you’re referring to, having said “inconsistent with a claim” (and having identified the nature of the claim, and where or whom it comes from), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it that, you could say “inconsistent with how the product used to work”.

    Inconsistency with history is an oracle heuristic. After a change, the product might have a new bug in it. On the other hand, the product might have been wrong all along, and now it’s right. (This is an example of how oracles can mislead us or conflict with each other, which is why it’s a good idea to identify the oracles we’re applying in problem reports.) If you (or others) aren’t aware of why the desirable change was made, that’s a different kind of problem, but a problem nonetheless.

    Either way, having said “inconsistent with how the product used to work” (and having described that in terms of a problem), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it that, you could say “inconsistent with respect to the product itself”.

    Inconsistency within the product is an oracle heuristic. This can takes a number of forms: the product might return inconsistent results from one run to the next; the product could afford a tidy, smooth interface in one place, and a frustrating, confusing interface in another; the product could present output very precisely in one part of the product, and imprecisely in another; one component in the product could log output using one format, while another component’s log output is in a different format, which makes analysis more difficult…

    The inconsistency might be undesirable (because of a reliability problem), or it might be completely desirable (a Web page for a newspaper should change from day to day), or it might desirable or undesirable in ways that you’re not aware of (since, like me, you probably don’t know everything).

    In general, people tend to prefer things that present themselves in a consistent way. Here’s a trivial example from Microsoft Office (Office 365, these days): to search for text in Word, the keyboard command is Ctrl-F. In Outlook, part of the same product suite, Ctrl-F triggers the Forward Message action instead; F4 triggers a search. Had Outlook and Word been designed by the same teams at the same time, this probably would have been identified as a bug, and addressed. In the end, the Office suite’s program managers decided that consistency with history dominated inconsistency within the product, and now we all have to live with that. Oh well.

    In any case, having said “inconsistent with respect to some aspect of the same product” (and having identified the specifics of the inconsistency), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it that, you could say “inconsistency with a comparable product” (and identify the product, and the nature of the inconsistency).

    Inconsistency with a comparable product is an oracle heuristic. Any product (something that someone has produced) that provides a relevant point of comparison is, by defintion, a comparable product. That includes competitive products, of course; Microsoft Word and Google Docs are comparable products, in that sense. Microsoft Word and WordPad are comparable products too; they have many features in common. If Word can’t open an .RTF file generated by WordPad, we have reason to suspect a problem in one product or the other. If WordPad prints an RTF file properly, and Word does not, we have reason to suspect a problem in Word.

    Is the Unix program wc (wc stands for “word count”) a comparable product to Microsoft Word? All wc does is count words in text files, so no, except… Word has a word-counting feature. If Word’s calculation for the number of words in a text file is inexplicably different from wc‘s count, we have reason to suspect a problem in one product or the other.

    Test tools and suites of automated output checks represent comparable products too. If the output from your product is inconsistent with the specified and desired results provided by your test tool, or with some data that it processes to produce such results, you have reason to suspect a problem somewhere.

    In any case, having said “inconsistent with a comparable product”, and having identified the product and the basis for comparison, you don’t need to say “expected result”.

Those are just a few examples. When we teach Rapid Software Testing, we offer a set of oracle heuristics that identify principles of desirable (and undesirable) consistency (and inconsistency) for identifying bugs; you can read more about those here.

James Bach has recently identified another principle that might apply to bugs but that, in my view, more powerfully applies to enhancement requests: we desire the product to be consistent with acceptable quality: that is, not only good, but every bit as good as it can be.

Why is all this a big deal? Several reasons, I think.

First, “expected result” begs the question of where the expectation comes from. It’s just a middleman for something we could say more specifically. Why not get to the point and say it while at the same time sounding like a pro? Because…

Second, being specific about where the expectation comes from saves time and focuses conversation on the (un)desirable (in)consistencies that matter when developers and product owners are deciding whether something is a bug worth fixing. It also helps to focus repair in the appropriate claim (for example, if the product is right and the spec is wrong, it’s a prompt to repair the spec).

Third, it helps for us to remember that our job as testers is not to confirm that the product works “as expected”, but to ask “is there a problem here?” A product can fulfill an expectation and nonetheless have terrible problems about it. It’s our job to seek and find and describe inconsistencies and problems that matter before it’s too late.

And finally…

Fourth, speaking in terms of an oracle instead of an “expected result” can help to avoid patronizing, condescending, time-wasting, and obvious elements of bug reports that cause developers to feel insulted or to roll their eyes.

Actual result: Product crashes.

Expected result: Product does not crash.

Don’t be that tester.

Further reading:

Not-So-Great Expectations
Oracles From the Inside Out
FEW HICCUPPS

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.