Blog Posts from August, 2020

Testing Doesn’t Add Value to the Product

Saturday, August 29th, 2020

Testers consistently ask how to show (or demonstrate, or prove, or calculate) that testing adds value.

Programmers, designers, and other builders create and add value by building and improving the product. Testing does not add value to the product. And that’s fine.

Managers assure quality by helping programmers, designers, and others to obtain the resources they need, and by removing (or at least reducing) obstacles to their work. Testing does not assure quality. That’s fine too.

Testing does not add value to the product. You can test all you like and the product won’t get any better, nor will it get any worse. Similarly, weighing yourself will neither increase nor reduce your weight.

Based on what you read off the scale when you weigh yourself, you might choose some action to increase or decrease your weight. Based on the observations that we make and the problems that we find in testing, people might choose to improve the product in some way—or to live with the product they have. Testing itself adds no value to the product, though, and that’s fine.

Testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, investigation, critical thinking, risk analysis, etc. Testing helps us to learn the actual status of the product. Significantly, testing provides people with a means of determining whether there are problems in the product that threaten its value. By revealing problems in the product, and analysing those problems, testing can also help to cast light on problems in the project that can contribute to product problems.

In other words: testing doesn’t add value; it provides value. Testing helps people to understand the product they’ve got, to help them decide whether it’s the product they want. That can be valuable, since deep, accurate knowledge about the actual product, its actual status, and problems in it can have considerable value for the people who are building the product and managing the project. Without testing—experiments on the product—our theories about the goodness of the product are not grounded in experience of the product. They’re only theories; or beliefs, or hopes, or wishes.

So don’t worry about whether testing is adding value. It isn’t, and that’s not a problem. Consider instead whether testing is providing value to people who need to know deeply about the product (and especially about problems and risks that threaten its value), and who need to make decisions about it. Consider critically (and self-critically) whether testing is providing valuable knowledge at reasonable speed and reasonable cost—and whether your clients would agree with your assessment. If it isn’t, or they wouldn’t, that’s a problem. Fix it.

Further reading:

There Is No ROI in Social Media Marketing (read this article, replacing “social media marketing” with “testing”)
How is the Testing Going?
Testers, Get Out of the Quality Assurance Business

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

Expected Results

Sunday, August 23rd, 2020

Klára Jánová is a dedicated tester who studies and practices and advocates Rapid Software Testing. Recently, on LinkedIn, she said:

I might EXPECT something to happen. But that doesn’t necessarily mean that I WANT IT/DESIRE for IT to happen. I even may want it to happen, but it not happening doesn’t have to automatically mean that there’s a problem.

The point of this post: no more “expected results” in the bug reports, please!

In reply, Derek Charles asked:

Then how else would you communicate to the developer or the team what is SUPPOSED to happen? I think that expected results are very necessary especially when regressions are found during testing.

Klára replied:

I suggest to describe the behavior that the tester recognizes as problematic and explain WHY it might be a problem for someone—the reasoning why the behavior is perceived as a bug—that’s what really matters.

Exactly so. Klára is referring here to problems and oracles—means by which we recognize problems when we encounter them in testing.

There’s an issue with the “what is supposed to happen” stuff: in development work, what is supposed to happen is not always entirely clear. Moreover, and more importantly, since testers don’t run the project or the business, we don’t mandate what is supposed to happen.

For instance, while testing, I may observe something in the product that I find confusing, or surprising, or wrong. When I look up the intended behaviour in the specification, it says one thing; the developer, claiming that the spec is out of date, contradicts it; and the product owner confirms that the spec is outdated. But she also says that the developer’s interpretation of what should happen is not what she wants him to implement. And then, when I consult an RFC, the product owner’s interpretation is inconsistent with what the RFC says should be the appropriate behaviour.

Fortunately, I don’t have to decide, and I don’t have to say what should happen. My job as a tester is to report on an apparent inconsistency between the product and presumably desirable things, or between the product and someone’s expressed desire or requirement. In the case above, I let the product owner know about the inconsistency between her interpretation and the standard, and she makes the call on what she and the business want from the product.

That is, even though I have certain expectations, I might be wrong about them and about what I think should be. For instance, she might decide that our product is not going to support that standard. She might point out that the standard I’m considering has been superseded by a later one. In any case, what is supposed to happen gets decided not by me, but by the people who run things. That’s what they’re paid for. This is a good thing, not a bad thing.

But still, I’d like to honour Derek’s question: as testers, how should we report a problem without referring to “expected results”?

  • Instead of saying “expected result” and leaving it at that, we could say “inconsistent with the specification”.

    Inconsistency with the specification is a special case of a more general way of recognizing and describing a problem: inconsistency with claims. “Inconsistency with claims” is an oracle heuristic. (A heuristic is a fallible means for solving a problem; an oracle is a special kind of heuristic which, fallibly, helps you to solve the problem of identifying and describing a bug.) When a product is inconsistent with a claim that someone important makes about it, there’s likely a problem, either with the product or the claim. As a tester, I don’t have to decide which.

    The specification is a particular form of a claim that someone is making about what the product is like, or what it should be like. Claims can be made in design sessions, planning meetings, pair programming, hallway conversations, training workshops… Claims can be represented in help files, marketing materials, workflow diagrams, lookup tables, user manuals, whiteboard sketches, UML diagrams… Claims can also be represented in the code of an automated check, where someone has written code to compare the output of the product with an anticipated and presumably desirable result. Recognizing many sources of claims and inconsistencies with them makes us more powerful testers.

    Whatever relevant claim you’re referring to, having said “inconsistent with a claim” (and having identified the nature of the claim, and where or from whom it comes), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it at that, you could say “inconsistent with how the product used to work”.

    Inconsistency with history is an oracle heuristic. After a change, the product might have a new bug in it. On the other hand, the product might have been wrong all along, and now it’s right. (This is an example of how oracles can mislead us or conflict with each other, which is why it’s a good idea to identify the oracles we’re applying in problem reports.) If you (or others) aren’t aware of why the desirable change was made, that’s a different kind of problem, but a problem nonetheless.

    Either way, having said “inconsistent with how the product used to work” (and having described that in terms of a problem), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it at that, you could say “inconsistent with respect to the product itself”.

    Inconsistency within the product is an oracle heuristic. This can take a number of forms: the product might return inconsistent results from one run to the next; the product could afford a tidy, smooth interface in one place, and a frustrating, confusing interface in another; the product could present output very precisely in one place, and imprecisely in another; one component in the product could log output using one format, while another component’s log output is in a different format, which makes analysis more difficult…

    The inconsistency might be undesirable (because of a reliability problem), or it might be completely desirable (a Web page for a newspaper should change from day to day), or it might be desirable or undesirable in ways that you’re not aware of (since, like me, you probably don’t know everything).

    In general, people tend to prefer things that present themselves in a consistent way. Here’s a trivial example from Microsoft Office (Office 365, these days): to search for text in Word, the keyboard command is Ctrl-F. In Outlook, part of the same product suite, Ctrl-F triggers the Forward Message action instead; F4 triggers a search. Had Outlook and Word been designed by the same teams at the same time, this probably would have been identified as a bug, and addressed. In the end, the Office suite’s program managers decided that consistency with history dominated inconsistency within the product, and now we all have to live with that. Oh well.

    In any case, having said “inconsistent with respect to some aspect of the same product” (and having identified the specifics of the inconsistency), you don’t need to say “expected result”.

  • Instead of saying “expected result” and leaving it at that, you could say “inconsistent with a comparable product” (and identify the product, and the nature of the inconsistency).

    Inconsistency with a comparable product is an oracle heuristic. Any product (something that someone has produced) that provides a relevant point of comparison is, by definition, a comparable product. That includes competitive products, of course; Microsoft Word and Google Docs are comparable products, in that sense. Microsoft Word and WordPad are comparable products too; they have many features in common. If Word can’t open an .RTF file generated by WordPad, we have reason to suspect a problem in one product or the other. If WordPad prints an RTF file properly, and Word does not, we have reason to suspect a problem in Word.

    Is the Unix program wc (wc stands for “word count”) a comparable product to Microsoft Word? All wc does is count lines, words, and characters in text files, so no, except… Word has a word-counting feature. If Word’s calculation for the number of words in a text file is inexplicably different from wc’s count, we have reason to suspect a problem in one product or the other.

    Test tools and suites of automated output checks represent comparable products too. If the output from your product is inconsistent with the specified and desired results provided by your test tool, or with some data that it processes to produce such results, you have reason to suspect a problem somewhere.

    In any case, having said “inconsistent with a comparable product”, and having identified the product and the basis for comparison, you don’t need to say “expected result”.
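
    To make that last idea concrete, here’s a minimal sketch of an automated output check that treats a comparable product (the Unix wc program) as its oracle. It’s written in Python; the product function is a made-up stand-in, not anyone’s real code. Notice that the check doesn’t decide what’s right; it only flags an inconsistency for a human to investigate.

        import subprocess

        def product_word_count(text: str) -> int:
            # Stand-in for the feature under test; imagine this drives the real product.
            return len(text.replace("-", " ").split())

        def wc_word_count(text: str) -> int:
            # The comparable product: ask wc -w to count words on standard input.
            result = subprocess.run(["wc", "-w"], input=text, capture_output=True, text=True)
            return int(result.stdout.split()[0])

        text = "forty-two words, give or take"
        ours = product_word_count(text)
        theirs = wc_word_count(text)
        if ours != theirs:
            # Inconsistency with a comparable product: reason to suspect a problem
            # in the product, in wc, or in this check. A human decides which.
            print(f"Possible problem: product counts {ours} words; wc counts {theirs}.")

    Whether a hyphenated compound counts as one word or two isn’t the check’s call to make; the inconsistency is simply something to investigate and report.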

Those are just a few examples. When we teach Rapid Software Testing, we offer a set of oracle heuristics based on principles of desirable (and undesirable) consistency (and inconsistency) for identifying bugs; you can read more about those here.

James Bach has recently identified another principle that might apply to bugs but that, in my view, more powerfully applies to enhancement requests: we desire the product to be consistent with acceptable quality: that is, not only good, but every bit as good as it can be.

Why is all this a big deal? Several reasons, I think.

First, “expected result” leaves open the question of where the expectation comes from. It’s just a middleman for something we could say more specifically. Why not get to the point and say it, while at the same time sounding like a pro? Because…

Second, being specific about where the expectation comes from saves time and focuses conversation on the (un)desirable (in)consistencies that matter when developers and product owners are deciding whether something is a bug worth fixing. It also helps to focus repair on the appropriate claim (for example, if the product is right and the spec is wrong, it’s a prompt to repair the spec).

Third, it helps for us to remember that our job as testers is not to confirm that the product works “as expected”, but to ask “is there a problem here?” A product can fulfill an expectation and nonetheless have terrible problems. It’s our job to seek and find and describe inconsistencies and problems that matter before it’s too late.

And finally…

Fourth, speaking in terms of an oracle instead of an “expected result” can help to avoid patronizing, condescending, time-wasting, and obvious elements of bug reports that cause developers to feel insulted or to roll their eyes.

Actual result: Product crashes.

Expected result: Product does not crash.

Don’t be that tester.

Further reading:

Not-So-Great Expectations
Oracles From the Inside Out
FEW HICCUPPS

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

“Why Didn’t We Catch This in QA?”

Thursday, August 13th, 2020

My good friend Keith Klain recently posted this on LinkedIn:

“Why didn’t we catch this in QA” might possibly be the most psychologically terrorizing and dysfunctional software testing culture an organization can have. I’ve seen it literally destroy good people and careers. It flies in the face of systems thinking, complexity of failure, risk management, and just about everything we know about the psychology involved in testing, but the bully and blame culture in IT refuses to let it die…

There’s a lot to unpack here. Let’s start with this: what is “QA”?

If “QA” is quality assurance, then it’s important to figure out who, or what, assures quality—value to some person(s) who matter(s).

Confusion abounds when “QA” is used as a misnomer for testing. Testing is not quality assurance, though it can inform quality assurance. Testing does not assure quality, any more than diagnosis assures good health.

In terms of health, there’s no question that we want good diagnoses so that we can become aware of particular pathologies or diseases. If we’re in poor health, and we’re not aware of it, and diagnosis doesn’t catch it, it’s reasonable to ask why not, so that we can improve the quality of diagnosis. The unreasonableness starts when someone foolishly believes that diagnosis is infallible, or that it assures good health, or that it prevents disease—like believing that lab technicians and epidemiologists are responsible for COVID-19, or for its spread.

Once again, it is high time that we dropped the idea that testing is quality assurance. Who perpetuates this? Everyone, so it seems, and it’s not a new problem. At the very least, it would be a great idea if testers stopped using the label to describe themselves. As long as testers persist in calling themselves “QA”, the pandemic of ignorance and blame will continue.

What, or who, does assure quality, then?

In one sense, everyone who performs work has agency or authority over it, which includes an implicit responsibility to assure its quality, just as everyone is responsible for maintaining the health of his or her mind and body. Assuring the quality of our work is a matter of craft; self-awareness; diligence; discipline; professionalism; and duty of care towards ourselves, our clients and our social groups. If we’re adults, no one else is responsible for washing our hands.

In everyday life, we make choices about lifestyle, diet, and hygiene that influence our health and safety. As adults, those choices, whether wise or reckless, are our responsibility. At work, our agency affords freedom and responsibility to push back or ask for help when we’re pressed to do work in a way that might compromise our own sense of quality. And our agency enables us to leave any situation in which we are required to behave in ways that we consider unprofessional or unethical.

Part of maintaining personal health is maintaining awareness of it. That means asking ourselves how we feel, and soliciting the help of others who can sometimes help us become aware of things that we don’t see, like personal trainers, doctors, or counsellors. Similarly, assuring quality in our work involves evaluating it—often with the help of other people—to become aware of its state, and in particular, its limitations and problems.

Other people might help us, but as authors of our own work, we are responsible for making those evaluations, and we are responsible for what we do based on those evaluations. Choices that bear on our health, or on the quality of our work, are ours to make.

So, in this sense, “why didn’t we catch this in QA?” would mean “why did we not assure the quality of our own work?” And at the centre of that “we” is “I”.

In another sense, responsibility for the quality of work and workplace resides in the management role. While we’re responsible for washing our hands, management is responsible for providing an environment where handwashing is possible—and for ensuring that people aren’t pushed into conditions where they’re endangering themselves, each other, or the business.

Insofar as management engages people to do work and make products, management is responsible for determining what constitutes quality work, and deciding whether the product has met its goals. Management decides whether the product it’s got is the product it wants—and the product it wants to ship. Management can ask testers to learn about the product on management’s behalf, but management is ultimately responsible for assuming the risk of unknown problems in the product.

Management is responsible for setting the course; for co-ordinating people; for marshaling resources; for setting policy; for providing help when it’s needed; for listening and responding and acting appropriately when people are pushing back. While testers help management to become aware of the status of the product, management is responsible for evaluating the quality of the work and the workplace, and for deciding (based on information from everyone, not only testers) whether the work is ready for the outside world.

Management assures quality by creating the conditions that make it possible for people to assure the quality of their own work. And management fails to assure quality when it sets up conditions that make quality assurance impossible, or that undermine it. In that case, “why didn’t we catch this in QA?” would mean “why didn’t management assure the quality of the work for which it is responsible?”

When people get sick, it’s reasonable to ask how people got sick. It’s reasonable to ask what they might need and what they might do to take better care of themselves. It’s also reasonable to ask if government is providing sufficient support for individual health, public health, and public health workers. It’s even reasonable to ask how better epidemiology and diagnosis could help to sound the alarm when people and populations aren’t healthy. It’s not reasonable to put responsibility for personal or public health on the epidemiologists and diagnosticians and lab techs.

So “Why didn’t we catch this in QA?” is a fine question to ask when it means “Why did we not assure the quality of our own work?” or “Why didn’t management assure the quality of the work for which it is responsible?” But don’t mistake testing for quality assurance, and don’t mistake the question for “Why didn’t testers assure the quality of the product?” And if you’re a tester, and being asked the latter question, reframe it to refer to the previous two.

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.

It’s Not About The Typing

Thursday, August 6th, 2020

Garbage truckloads of marketing bumph are being dumped into the testing space about “codeless” testing tools. For the companies producing these tools, to “test” seems to mean “performing a sequence of keystrokes or mouse clicks or button presses on an app”. (You can see the same pattern in many tutorials on “test automation”; write a script that executes a sequence of actions, and that’s a “test”.) But the marketing material is mute on how the tool aids the tester in recognizing problems in the product. The marketing focus is on how quickly the tool can repeat some sequence of keystrokes and clicks and pushes.

Automated data entry can be a useful part of a test, but automated data entry is not a test. Alas, the marketing approach works really well on managers (and, sadly, some testers) who can perceive only the visible activities of testing; not the cognitive aspects of it, and not the essential mission of testing: revealing the status of the product, and finding problems before it’s too late to do anything about them.

In Rapid Software Testing, testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, critical thinking, risk analysis, etc. Above all, testing requires us to focus on the risk that there are problems in the product, to anticipate problems, and to recognize problems that are present. That requires oracles; an oracle is a means by which we recognize a problem when we encounter one in testing.

The “no-code automation” tools supply weak oracles at best, typically checking for the presence of a particular element on the screen, or a particular value in some output field. If that element is there, or if that value matches some specified and presumably desirable result, the “test” “passes”. But that doesn’t mean that there is no problem; a product can have plenty of problems even when it arrives at a correct calculation, or drops the user on the requested page. You know this from your own experience. You’ve used iTunes, right? You’ve been on LinkedIn. You’ve tried to fix an indentation issue in Microsoft Word. Maybe not those things specifically, but I bet you’ve felt annoyed, frustrated, impatient, or baffled when trying to use software to get something done. Quite possibly today.
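
To picture just how weak that kind of oracle is, here’s a minimal sketch of the sort of check such tools typically record or generate, written here with Python and Selenium; the URL and element ID are invented for illustration, and the details will vary from tool to tool.

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/checkout")          # hypothetical page

    # The entire "oracle": one element exists and holds one anticipated value.
    total = driver.find_element(By.ID, "order-total")   # hypothetical element id
    assert total.text == "$42.00"

    driver.quit()

If that element shows up bearing that value, the check “passes”; everything else about the user’s experience is invisible to it. The page could take thirty seconds to render, the layout could be scrambled on a phone, or the total could be calculated wrongly for other inputs, and this check would be none the wiser.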

And there’s the rub: rather than a means to gain experience with the product, most of these tools represent a means to check the product for specific conditions that can be specified easily. There’s a seductive story to be told about that: you can run those checks over and over, really quickly, and find a few shallow bugs when something changes in a bad way. Yet the tools are fussy; a change in the product can throw the script off even when it’s a desirable change. Addressing that requires investigation, repair, and continuous maintenance, which takes time.

Then something even worse happens: testers, deliberately or not, don’t report the time it takes to deal with problems around the tool. Why wouldn’t they do that? One reason could be that management has spent a wad of money on the tool, and the vendor says it’s supposed to be simple. As a tester, to suggest that there are difficulties with the tool is to risk your reputation. Come on. It’s codeless. It’s supposed to be simple.

While all that is going on, the tool misses problems that would be easily apparent if testers were gaining real, human experience with the product. People find problems that tools miss because humans have a wonderful capacity to recognize problems that they have not been told about in advance. Humans bring rich sets of oracles to testing. Testers use their feelings, their social awareness, their memories, their tacit knowledge, their experience of the world, their familiarity with comparable or competitive products or features—all of these things, and more—to generate and apply oracles on the fly. But there’s less time available for gaining experience with the product and identifying unanticipated problems whenever the tester is repairing and maintaining the scripts.

Whether your testing tool is “codeless” or not, and whether your input is delivered by a script or entered directly via keyboard and mouse, the means of entering data is usually one of the least significant aspects of a test.

What matters is not typing quickly, but the tester’s capacity to recognize problems that matter. If there’s no oracle, there’s no test. If there are weak oracles, there’s weak testing.

Further reading:

A Context-Driven Approach to Automation in Testing
Oracles from the Inside Out

Want to learn how to observe, analyze, and investigate software? Want to learn how to talk more clearly about testing with your clients and colleagues? Rapid Software Testing Explored, presented by me and set up for the daytime in North America and evenings in Europe and the UK, November 9-12. James Bach will be teaching Rapid Software Testing Managed November 17-20, and a flight of Rapid Software Testing Explored from December 8-11. There are also classes of Rapid Software Testing Applied coming up. See the full schedule, with links to register here.