“Manual Testing”: What’s the Problem?

April 27th, 2021

I used to speak at conferences. For the HUSTEF 2020 conference, I had intended to present a talk called “What’s Wrong with Manual Testing?” In the age of COVID, we’ve all had to turn into movie makers, so rather than delivering a speech, I delivered a video.

After I had proposed the talk, and it was accepted, I went through a lot of reflection on what the big deal really was. People have been talking about “manual testing” and “automated testing” for years. What’s the problem? What’s the point? I mulled this over, and the video contains some explanations of why I think it’s an important issue. I got some people — a talented musician, an important sociologist, a perceptive journalist and systems thinker, a respected editor and poet, and some testers — to help me out.

In the video, I offer some positive alternatives to “manual testing” that are much less ambiguous, more precise, and more descriptive of what people might be talking about: experiential testing (which we could contrast with “instrumented testing”); exploratory testing (which we have already contrasted with “scripted testing”); attended testing (which we could contrast with “unattended testing”); and there are some others. More about all that in a future post.

I also propose how it came to be that important parts of testing — the rich, cognitive, intellectual, social process of evaluating a product by learning about it through experiencing, exploring and experimenting — came to be diminished and pushed aside by obsessive, compulsive fascination with automated checking.

But there’s a much bigger problem that I didn’t discuss in the video.

You see, a few days before I had to deliver the video, I was visiting an online testing forum. I read a question from a test manager who wanted to interview and qualify “manual testers”. I wanted to provide a helpful reply, and as part of that, I asked him what he meant by “manual testing”. (As I do. A lot of people take this as being fussy.)

His reply was that he was wanting to identify candidates who don’t use “automated testing” as part of their tool set, but who were to be given the job of creating and executing manually scripted human-language tests and performing all the critical thinking skills that both approaches require.

(Never mind the fact that testing can’t be automated. Never mind that scripting a test is not what testing is all about. Never mind that no one even considers the idea of scripting programmers, or management. Never mind all that. Wait for what comes next.)

Then he said that “the position does not pay as much as the positions that primarily target automated test creation and execution, but it does require deeper engagement with product owners”. He went on to say that he didn’t want to get into the debate about “manual and automated testing”; he said that he didn’t like “holy wars”.

And there we have it, ladies and gentlemen; that’s the problem. Money talks. And here, the money—the fact that these testers are going to be paid less—is implicitly suggesting that talking to machines is more valuable, more important, than deeper engagement with people.

The money is further suggesting that skills stereotypically associated with men (who are over-represented in the ranks of programmers) are worth more than skills stereotypically associated with women (who are not only under-represented but also underpaid and also pushed out of the ranks of programmers by chauvinism and technochauvinism). (Notice, by the way, that I said “stereotypically” and not “justifiably”; there’s no justification available for this.)

Of course, money doesn’t really talk. It’s not the money that’s doing the talking. It’s our society, and people within it, who are saying these things. As so often happens, people are using money to say things they dare not speak out loud.

This isn’t a “holy war” about some abstract, obscure point of religious dogma. This is a class struggle that affects very real people and their very real salaries. It’s a struggle about what we value. It’s a humanist struggle. And the test manager’s statement shows that the struggle is very, very real.

Suggestions for the (New) Testers

April 23rd, 2021

A friend that I’m just getting to know runs a training and skills development program for new testers. Today he said, “My students are now starting a project which includes test design, test techniques, and execution of testing. Do you have any input or advice for them?” Here’s my reply.

Test design, test techniques, and execution of testing are all good things. I’d prefer performing tests to “test execution”. In that preference, I’m trying to emphasize that a test is a performance, by an engaged person who adapts to what he or she is experiencing. “Test execution” sounds more like following a recipe, or a programmed set of instructions.

Of these things, my advice is to perform testing first. But that advice can be a little confusing to people who believe that testing is only operating some (nearly) finished product in a search for coding errors. In Rapid Software Testing, we take a much more expansive view: testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, etc.

Testing includes analysis of the product, its domain, the people using it, and risk related to all of those. Testing includes critical thinking and scientific thinking. Testing includes performing experiments—that is, tests—all the way along. But I emphasized the learning part just back there, because testing starts with learning, ends with reporting what we’ve learned, feeds back into more learning, and is about learning every step of the way.

We learn most powerfully from experiencing, exploring, and experimenting; performing experiments; performing tests. So, my advice to the new tester is to start with performing tests to study the product, without focusing too much on test design and test techniques, at first.

Side note: the “product” that you’ve been asked to test may not be a full, working, running piece of software. It may be a feature or component or function that is a part of a product. It may be a document, or a design drawing, a diagram, or even an idea for a product or feature that you’re being asked to review. In the latter cases, “performing a test” might mean the performance of a thought experiment. That’s not the same as the real-world experience of the running product, hence the quotes around “performing a test”. A thought experiment can be a great and useful thing to help nip bugs in the bud, before bugs in an idea turn into bugs in a product. But if we want to determine the real status of the real product, we’ll need to perform real testing on the real product.

So: learn the product (or feature, or design, or document, or idea), and identify how people might get value from it. Survey the product to identify its functions, features, and interfaces. Explore the product, and gain experience with it by engaging in a kind of purposeful play. Don’t look for bugs, particularly—not right away. Look for benefits. Look for how the product is intended to help people get their work done, to help them to communicate with other people, to help them to get something they want or need, to help them to have fun. Try doing things with the product—accomplishing a task, having a conversation, playing the game.

Record your thoughts and ideas and feelings reasonably thoroughly. Pay attention to things that surprise you, or that trigger your interest, or that prompt curiosity. Note things that you find confusing, and notice when the confusion lifts. If you have been learning the product for a while, and that confusion hasn’t gone away, that’s significant; it means there’s some confusing going on. If you get ideas about potential problems (that is, risks), note those. If you get ideas for designing tests, or applying tools, note those too.

Capture what you’re learning in point form, or in mind maps, or in narratives of what you’re doing. Sketches and diagrams can help too. Don’t make your notes too formal; formality tends to be expensive, and it’s premature at this stage. It might be a good idea to test with someone else, with one person focusing on interacting with the product, and the other minding the task of taking notes and observations. Or you might choose to narrate and record your survey of the product on video to review later on; or to use like the black boxes on airplanes to figure out what led to problems or crashes.

You’ll probably see some bugs right away. If you do, note them quickly, but don’t investigate them. If you spotted a bug this easily, this early, and you take a quick note about it, you’ll almost certainly be able to see the bug again later. Investigating shallow bugs is not the job at the moment. The job right now is to develop your mental model of the product, so that you become prepared to find bugs that are more subtle, more deeply hidden, and potentially much more important or damaging.

Identify the people who might use the product… and then consider other groups of people you might have forgotten. That would include novice users of the product; expert users of the product; experts in the product domain who are novice users of the product; impatient users; plodding users; users under pressure; disabled users… Consider the product in terms of things that people value: capability, reliability, usability, charisma, security, scalability, compatibility, performance, installability… (As a new tester, or a tester in training, you might know these as quality criteria.)

You might also want to survey the product from the perspective of people who are not users as such, but who are definitely affected by the product: customer support people; infrastructure and operations people; other testers (like testing toolsmiths, or accessibility specialists); future testers; current developers; future developers… Think in terms of what they might value from the product: supportability, testability, maintainability, portability, localizability. (These are quality criteria too, but they’re focused on the internal organization more than on their direct benefit to the end user.)

Refine your notes. Create lists, mind maps, tables, sketches, diagrams, flowcharts, stories… whatever helps you to reflect on your experience.

Share your findings with other people in the test or development (or in this case, study) group. That’s very important. It’s a really good way both to share knowledge and to de-bias ourselves and to reveal things that we might have forgotten, ignored, or dismissed too quickly.

Have these questions in mind as you go: What is this that we’re building? Who are we building it for? How would they get value from it? As time goes by, you’ll start to raise other questions: What could go wrong? How would we know? How might people’s value be threatened or compromised? How could we test this? How should we test this? Then you’ll be ready to make better choices about test design, and applying test techniques.

Of course, this isn’t just advice for the new tester. It applies to anyone who wants to do serious testing. Testing that starts by reading a document and leaps immediately to creating formal, procedurally scripted test cases will almost certainly be weak testing, uninformed by knowledge of the product and how people will engage with it. Testing that starts with being handed some API documentation and leaps to the creation of automated checks for correct results will miss lots of problems that programmers will encounter—problems that we could discover if we try to experience it the way programmers—especially outside programmers—will.

As we’re developing the product, we’re learning about it. As we’re learning the product, we’re developing ideas about what it is, what it does, how people might use it, and how they might get value from it, and that learning feeds back into more development. As we develop our understanding of the product more deeply, we can be much better prepared to consider how people might try to use it unsuccessfully, how they might misuse it, and how their value might be threatened. That’s why it’s important, I believe, to perform testing first—to prepare ourselves better for test design and for identifying and applying test techniques—so we can find better bugs.

This post has been greatly influenced by ideas on sympathetic testing that came to me—over a couple of decades—from Jon Bach, James Bach, and Cem Kaner.

Evaluating Test Cases, Checks, and Tools

April 11th, 2021

For testers who are being asked to focus on test cases and testing tools, remember this: a test case never finds a bug. The tester finds a bug, and the test case may play a role in finding the bug. (Credit to Pradeep Soundararajan for putting this so succinctly, all those years ago.)

Similarly, an automated check never finds a bug. The tester finds a bug, and the check may play a role in finding the bug.

A testing tool never finds a bug. The tester finds a bug, and the tool may play a role in finding the bug.

If you suspect that managers are putting too much emphasis on test cases, or automated checks, or testing tools—artifacts—try this:

Start a list.

Whenever you find a bug, make a quick note about the bug and how you found it. Next to that, put a score on the value of the artifact. Write another quick note to describe and explain why you gave the artifact a particular score.

Score 3 when you notice that an artifact was essential in finding the bug; there’s no way you could have found the bug without the artifact.

Score 2 if the artifact was significant in finding the bug; you could have found the bug, but the artifact was reasonably helpful.

Score 1 if the artifact helped, but not very much.

Score 0 if the artifact played no role either way.

Score -1 whenever you notice the artifact costing you some small amount of time, or distracting you somewhat.

Score -2 whenever you notice the artifact costing you significant time or disruption from the task of finding problems that matter.

Score -3 whenever you notice that the artifact is actively preventing you from finding problems—when your attention has been completely diverted from the product, learning about it, and discovering possible problems in it, and has been directed towards the care and feeding of the artifact.

Notice that you don’t need to find a bug to offer a score. Pause your work periodically to evaluate your status and take a note. If you haven’t found a bug in the last little while, note that. In any case, every now and then, identify how long you’ve been on a particular thread of investigation using a test case, or a set of checks, or a tool. Evaluate your interaction with the artifact.

Periodically review the list with your manager and your team. The current total score might be interesting; if it’s high, that might suggest that your tools or test cases or other artifacts are helping you. If it’s low or negative, that might suggest that the tools or test cases or other artifacts are getting in your way.

Don’t take too long on the aggregate score; practically no time at all. It’s far more important to go through the list in detail. The more extreme numbers might be the most interesting. You might want to pay the greatest or earliest attention to the things that score the lowest and highest first, but maybe not. You might prefer to go through the list in order.
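One lightweight way to keep such a list is sketched below in Python. The names (`ArtifactNote`, `total_score`, `review_order`) and the sample entries are invented for illustration and do not come from any tool; a text file or a spreadsheet would serve just as well.

```python
from dataclasses import dataclass

# Scores run from -3 (the artifact actively prevented finding problems)
# to 3 (the artifact was essential in finding the bug).
MIN_SCORE, MAX_SCORE = -3, 3


@dataclass
class ArtifactNote:
    """One line in the list: what happened, which artifact, and why the score."""
    artifact: str      # test case, automated check, or tool
    observation: str   # the bug found, or a note that no bug was found lately
    score: int         # -3..3, per the scale above
    explanation: str   # why this score; this matters more than the number

    def __post_init__(self):
        if not MIN_SCORE <= self.score <= MAX_SCORE:
            raise ValueError(f"score must be between {MIN_SCORE} and {MAX_SCORE}")


def total_score(notes):
    """Aggregate score: worth a glance, not much more."""
    return sum(note.score for note in notes)


def review_order(notes):
    """Most extreme scores first (lowest or highest); ties keep list order."""
    return sorted(notes, key=lambda note: -abs(note.score))


# Hypothetical entries, for illustration only.
notes = [
    ArtifactNote("login test case 12", "session not cleared on logout", 1,
                 "case pointed me at the logout page; I found the bug by exploring"),
    ArtifactNote("GUI check suite", "no bug found in the last two hours", -2,
                 "spent the time updating selectors after a layout change"),
    ArtifactNote("log viewer tool", "intermittent timeout in payment call", 3,
                 "the timing pattern was invisible without the tool"),
]

print(total_score(notes))                          # 2
print([n.artifact for n in review_order(notes)])
```

The explanation field is the part that carries the conversation; the number only helps decide where to start reading.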

In any case, as soon as you begin your review of a particular item, throw away the score, because the score doesn’t really mean anything. It’s arbitrary. You could call it data, but it’s probably not valid data, and it’s almost certainly not reliable data. If people start using the data to control the decisions, eventually the data will be used to control you. Throw the score away.

What matters is your experience, and what you and the rest of the team can learn from it. Turn your attention to your notes and your experience. Then start having a real conversation with your manager and team about the bug, about the artifact or tool, and about your testing. If the artifact was helpful, identify how it helped, and how it might help next time, and how it could fool you if you became over-reliant on it. If the artifact wasn’t helpful, consider how it interfered with your testing, how you might improve or adjust it or whether you should put it to bed for a while or throw it away.

Learn from every discovery. Learn from every bug.

Related reading:

Assess Quality, Don’t Measure It

Flaky Testing

February 22nd, 2021

The expression “flaky tests” is evidence of flaky testing. No scientist refers to “flaky experimental results”. Scientists who observe inconsistency don’t dismiss it. They pay close attention to it, and probe it. They redesign their experiments or put better controls on them.

When someone refers to an automated check (or a suite of them) as a “flaky test”, the suggestion is that it represents an unreliable experiment. That assumption is misplaced. In fact, the experiment reliably shows that someone’s models of the product, check code, test environment, outcomes, theory, and the relationships between them are misaligned.

That’s not a “flaky experiment”. It’s an excellent experiment. The experiment is telling you something crucial: there’s something you don’t know. In science, a surprising, perplexing, or inconsistent result prompts scientists to begin an investigation. By contrast, in software, an inconsistent result prompts some people to shrug and ignore what the experiment is trying to tell them. Then they do weird stuff like calculating a “flakiness score”.

Of course, it’s very tempting psychologically to dismiss results that you can’t explain as “noise”, annoying pieces of red junk on your otherwise lovely all-green lawn. But a green lawn is not the goal. Understanding what the junk is, where it is, and how it gets there is the goal. It might be litter—it might be a leaking container of toxic waste.

It’s not a great idea to perform a test that you don’t understand, unless your goal is to understand it and its relationship to the product. But it’s an even worse idea to dismiss carelessly a test outcome that you don’t understand. For a tester, that’s the epitome of “flaky”.
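As a rough sketch of what putting better controls on the experiment could look like: rather than rerunning an inconsistent check until it goes green, repeat it deliberately, record the conditions around each run, and look for what differs between the passes and the failures. Everything named here (`run_with_observations`, `differing_factors`, and the simulated check) is hypothetical, and the factors worth recording depend entirely on your product and environment.

```python
import random


def run_with_observations(check, observe, runs=20):
    """Run a check repeatedly, recording the conditions around each run."""
    records = []
    for _ in range(runs):
        conditions = observe()   # e.g. build, host, data set, timing
        try:
            check(conditions)
            outcome = "pass"
        except AssertionError as failure:
            outcome = f"fail: {failure}"
        records.append({"outcome": outcome, **conditions})
    return records


def differing_factors(records):
    """For each recorded factor, list the values seen only in failing runs."""
    if not records:
        return {}
    passed = [r for r in records if r["outcome"] == "pass"]
    failed = [r for r in records if r["outcome"] != "pass"]
    suspects = {}
    for factor in records[0].keys() - {"outcome"}:
        only_in_failures = ({r[factor] for r in failed}
                            - {r[factor] for r in passed})
        if only_in_failures:
            suspects[factor] = sorted(only_in_failures)
    return suspects


# Simulated stand-ins, for illustration only.
rng = random.Random(7)


def observe():
    return {"cache": rng.choice(["warm", "cold"]),
            "locale": rng.choice(["en", "fr"])}


def check(conditions):
    assert conditions["cache"] == "warm", "stale price shown"


records = run_with_observations(check, observe)
print(differing_factors(records))
```

A factor that shows up here is a lead to investigate, not a verdict. An empty result suggests that the factor that matters is not yet among the things being observed.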

Now, on top of all that, there’s something even worse. Suppose you and your team have a suite of 100,000 automated checks that you proudly run on every build. Suppose that, of these, 100 run red. So you troubleshoot. It turns out that your product has problems indicated by 90 of the checks, but ten of the red results represent errors in the check code. No problem. You can fix those, now that you’re aware of the problems in them.

Thanks to the scrutiny that red checks receive, you have become aware that 10% of the outcomes you’re examining are falsely signalling failure when they are in reality successes. That’s only 10 “flaky” checks out of 100,000. Hurrah! But remember: there are 99,900 checks that you haven’t scrutinized. And you probably haven’t looked at them for a while.

Suppose you’re on a team of 10 people, responsible for 100,000 checks. To review those annually requires each person working solo to review 10,000 checks a year. That’s 50 per person (or 100 per pair) every working day of the year. Does your working day include that?

Here’s a question worth asking, then: if 10% of 100 red checks are misleadingly signalling a problem, what percentage of 99,900 green checks are misleadingly signalling “no problem”? They’re running green, so no one looks at them. They’re probably okay. But even if your unreviewed green checks are ten times more reliable than the red checks that got your attention (because they’re red), that’s 1%. That’s 999 misleadingly green checks.
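The arithmetic in the last few paragraphs can be laid out as a short Python sketch. The figures come from the example above, with one exception: the 200 working days per year is an assumption implied by the "50 per person" figure rather than a number stated outright.

```python
total_checks = 100_000
red_checks = 100
red_with_check_errors = 10   # red results traced to errors in the check code

green_checks = total_checks - red_checks
false_red_rate = red_with_check_errors / red_checks

# Assumption from the text: unreviewed green checks are ten times
# more reliable than the red ones that received scrutiny.
assumed_false_green_rate = false_red_rate / 10
misleading_green = green_checks * assumed_false_green_rate

# Review workload. The working-day count is an assumption (see above).
team_size = 10
working_days = 200
per_person_per_day = total_checks / team_size / working_days

print(green_checks)              # 99900
print(f"{false_red_rate:.0%}")   # 10%
print(round(misleading_green))   # 999
print(per_person_per_day)        # 50.0
```

Changing the assumed reliability factor is a quick way to see how sensitive the estimate is. Even at a hundred times more reliable, roughly a hundred green checks would still be misleading.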

Real testing requires intention and attention. It’s okay for a suite of checks to run unattended most of the time. But to be worth anything, they require periodic attention and review—or else they’re like smoke detectors, scattered throughout enormous buildings, whose batteries and states of repair are uncertain. And as Jerry Weinberg said, “most of the time, a nonfunctioning smoke alarm is behaviorally indistinguishable from one that works. Sadly, the most common reminder to replace the batteries is a fire.”

And after all this, it’s important to remember that most checks, as typically conceived, are about confirming the programmers’ intentions. In general, they represent an attempt to detect coding problems and thereby reduce programmers committing (pun intended) easily avoidable errors. This is a fine and good thing—mostly when the effort is targeted towards lower-level, machine-friendly interfaces.

Typical GUI checks, instrumented with machinery, are touted as “simulating the user”. They don’t really do any such thing. They simulate behaviours, physical keypresses and mouse clicks, which are only the visible aspects of using the product—and of testing. GUI checks do not represent users’ actions, which in the parlance of Harry Collins and Martin Kusch are behaviours plus intentions. Significantly, no one reduces programming or management to scripted and unmotivated keystrokes, yet people call automated GUI checks “simulating the user” or “automated testing”.

Such automated checks tell us almost nothing about how people will experience the product directly. They won’t tell us how the product supports the user’s goals and tasks—or where people might have problems getting what they want from the product. Automated checks will not tell us about people’s confusion or frustration or irritation with the product. And automated checks will not question themselves to raise concern about deeper, hidden risk.

More worrisome still: people who are sufficiently overfocused, fixated, on writing and troubleshooting and maintaining automated checks won’t raise those concerns either. That’s because programming automated GUI checks is hard, like all programming is hard. But programming a machine to simulate human behaviours via complex, ever-changing interfaces designed for humans instead of machines is especially hard. The effort easily displaces risk analysis, studying the business domain, learning about users’ problems, and critical thinking about all of that.

Testers: how much time and effort are you spending on care and feeding of scripts that represents distraction from interacting with the product and searching for problems that matter? How much more valuable would your coding be if it helped you examine, explore, and experiment with the product and its data? If you’re a manager, how much “testing” time is actually coding and fixing time, in which your testers are being asked to fuss with making the checks run green, and adapting them to ongoing changes in the product?

So the issue is not flaky tests, but flaky testing talk, and flaky test strategy. It’s amplified by referring to “flaky understanding” and “flaky explanation” and “flaky investigation” as “flaky tests”.

Some will object. “But that’s what people say! We can’t just change the language!” I agree. But if we don’t change the way we speak—and the way we think along with it—we won’t address the real flakiness, which is the flakiness in our systems, and the flakiness in our understanding and explanations of those systems. With determination and skill and perseverance, we can change this. We can help our clients to understand the systems they’ve got, so that they can decide whether those are the systems they want.

Learn how to focus on fast, inexpensive, powerful testing strategies to find problems that matter. Register for classes here.

Necessary Confusion and the Bootstrap Heuristic

February 11th, 2021

I’m testing a test tool at the moment. I’m investigating it for a talk. The producers of the tool claim to have hundreds of thousands of users. A few positive remarks appear in a scrolling widget on the product’s web site from people purported to be users.

Me, I can’t make head or tail of the product. It doesn’t seem to do what it’s supposed to do. It looks like a chaotic mess. It’s baffling; it’s exasperating. I don’t know where to start in analysing it and preparing a report. I’m confused. But I’m okay with that.

Any worthwhile testing starts with some degree of necessary confusion.

Why? Because worthwhile testing is primarily about learning something about a product and learning about how to test it in a complex and uncertain space. That’s by nature confusing, and that’s normal.

If the test space is neither complex nor uncertain, and if there’s little risk, you may not need to test at all, and a simple demonstration might do the trick. Knowing that the product can work might be enough, for the moment. That’s why, for developers, performing checks and automating them at the unit level can make a lot of sense. Those checks tend to address specific, atomic conditions; they’re simple to develop and perform and encode; and they provide quick feedback without slowing down development.

A product gets built from small, discrete components. Through small, gradual changes, it turns into something much bigger and more complex, with interacting components and emergent behaviours that are non-trivial.

An encounter with anything non-trivial that you’re not familiar with tends to be messy and confusing at first. At the same time, as a working tester, you’re probably under pressure to “get things right the first time” or “get everything sorted from the beginning”. But having everything sorted really means that we’re at the end of something that was unsorted, and we’re at the beginning of the next unsorted thing!

In Rapid Software Testing, we refer to the Bootstrap Conjecture:

Any process we care about that is done both well and efficiently began by being done poorly and inefficiently.

Therefore, having “done something right the first time” probably means that it wasn’t really right, or it wasn’t really the first time, or that it was trivial, or that you got lucky.

In learning about something complex and in learning how to test it, there are frequent periods of confusion. In fact, if we’re dealing with something complex and we feel we’re sure about how to test it, that should prompt us to pause and reflect: why are we so sure?

Necessary confusion is confusion for which we do not have an algorithmic resolution. To resolve necessary confusion, we must explore a complex solution space using heuristics (that is, means of solving problems that could work but that might fail) and bounded rationality (that is, reasoning in a space where there are limits on what we know and what we can know). To overcome confusion, we have to play, puzzle, make conjectures, perform experiments, miss stuff, ask questions, make mistakes, and be patient. Necessary confusion always occurs during deep learning and innovation.

We’re often trained in our cultures, in our social groups, and in our schooling to deny that we’re confused. That gets ramped up as soon as we get into the software business: appearing not to know something is socially awkward; almost seen as a sin in some circles of knowledge work. Confusion can make us uncomfortable.

As a tester, you could just write (or worse, run) a bunch of automated scripts that check a new product or feature for specific, anticipated errors. If you do that without exploring the product and preparing your mind, your testing will be blind to important bugs that could be there.

No set of instructions can teach you everything you need to learn about a product, and about the ways in which diverse people will try to use it. No formal procedure can anticipate how you or other people will experience the product. No testing framework will handle surprising behaviour without you learning how to deal with that framework. No tool, no “AI”, can determine whether the product is operating correctly, or whether a product manager will regard a red bar as something that amounts to an important bug. Complete and correct knowledge about those things isn’t available in advance.

You can learn how to test in advance. That will avoid some unnecessary confusion during testing. You can learn about the technology and domain of your product in advance, and that will avoid more unnecessary confusion during testing. You can learn to use particular tools in advance, and that might spare you some unnecessary confusion during testing too.

But you can’t deeply learn a new product or feature before encountering and interacting with it. The confusion you experience when learning a product is necessary, temporary, and healthy.

The key is to accept the confusion; to recognize that it’s okay to be confused. As we interact with the product and the people around it; as we gain experience; as we practice new skills and apply new tools, some of the confusion lifts.

Start with a survey of the product. Take a tour of the interfaces — the GUI, the command line, the API. Play with it. List out its key features. Create an outline of what is there to be tested. Consider who might use it, and for what. Build on your ideas of how they might value it, and how their value might be threatened. Think about data that gets taken in, processed, stored, retrieved, rendered, displayed, and deleted. How could any of that get messed up? How could the data be mishandled, misrepresented, excessively constrained, insufficiently constrained, or Just Plain Wrong?

And then iterate. Go through the same process with each function and feature, getting progressively deeper as you go. Maybe write little snippets of code to generate some data, or to analyze the output. (Have you been working with a product for a long time? This cycle is fractal; it applies to new functions or features, or to repairs in a product you know well.)
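For instance, a few lines of Python can produce a small batch of awkward inputs to try by hand or to feed through an interface. The target (a free-text name field) and the particular values are made up for illustration; the interesting cases for your product will come from what you have learned about it.

```python
import random
import string


def awkward_names(count=5, seed=2021):
    """Return a fixed set of edge-case strings plus some random ones."""
    rng = random.Random(seed)   # seeded so a surprising input can be reproduced
    fixed = [
        "",                     # empty
        " ",                    # whitespace only
        "O'Brien",              # apostrophe
        "Zoë Müller-Łukasz",    # non-ASCII letters and a hyphen
        "a" * 256,              # long
        "  padded  ",           # leading and trailing spaces
    ]
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    randoms = [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 40)))
        for _ in range(count)
    ]
    return fixed + randoms


for name in awkward_names():
    print(repr(name))
```

Printing with `repr` keeps leading spaces and empty strings visible in the output, which makes it easier to match an odd result back to the input that produced it.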

As we learn about the product domain; as we go about the business of sensemaking; as we develop our mental models; as we talk about the product and the problems we observe… more of the confusion dissipates. This can all happen remarkably quickly if we allow ourselves just a little time for experiencing, exploring, and experimenting with the product. Ironically, we must deliberately require and allow ourselves room for spontaneity. We need to be brave and open enough to help our managers understand how necessary that kind of work is — and how powerful it can be.

When we embrace the confusion and lean in, things begin to get clearer, our code and maps and lists get tidier, our notions of risk get sharper, and we’re better prepared to search for problems. And then we’re more likely to find the deep, dangerous problems that matter—the ones that everyone has missed so far. At the beginning, though, that process starts as we pull ourselves up by our own bootstraps.

The Bootstrap Heuristic is: begin in confusion; end in precision.

Oh… and that test tool that I’m testing? There’s a reason that I’m confused: I’ve got a confusing product in front of me. The product is inconsistent with claims that its producers make about it. The product’s behaviour is inconsistent with its purpose. It seems incapable of keeping track of its state. It provides misleading results. For outsiders, it seems designed to provide the impression that testing is happening, without any real testing going on. From the inside perspective of a tester, it’s baffling, and that’s largely because it doesn’t work.

So there’s another heuristic: persistent confusion about a product—confusion that doesn’t go away—is often a pointer to serious problems in it. If you, as a tester, can’t make sense of a product, how will the product’s customers make sense of it?

After working with this product for a little more than an hour, much of the confusion I referred to above has evaporated, and I can prepare a report with confidence.

I’m only left with one thing that I find confusing:

How can anybody be fooled by a tool like this?

First Aid for the Mission Statement

January 23rd, 2021

A while back, a tester brought a patient in for treatment. It wasn’t a human patient; it was a sentence about building and testing in an organization. The tester asked me for help.

“Could you provide me with a first aid kit for this statement that came from my management?”

“We have to move on to DevOps to be able to release code more often but we also have to increase testautomation in any way we can and minimize manual time consuming testing.”

This is the sort of statement that needs more than first aid; it needs emergency room treatment. We’ll start with handling some critical problems right away to get the patient stabilized. Then we need to prepare the way for longer-term recovery, so that the patient can be restored to good health and become an asset to society.

I’ll suggest a number of quick treatments. As I do that, I’ll identify why I believe the patient needs them.

  • Replace “have to” with “choose to” in each case.

Unless someone is about to run afoul of the laws of nature, of government, or of ethics, no one has to do anything. People and organizations choose to do things. It’s important to preserve your agency. When you have to do things, you don’t have control over them. When you choose to do things, you remain in charge.

  • Replace “move on to DevOps” with “apply DevOps principles and practices”.

DevOps is not something to do; it’s a set of ideas and approaches for getting important things done. The central principle of DevOps—that development and operations people work together to support the needs of the business—is what matters most. If that principle is absent in the organization, there are practically infinite ways in which things will go wrong.

There will be more to say about DevOps-related practices and principles as we proceed.

  • Replace “to be able to” with “to be better able to”.

Not being a thing in and of itself, DevOps is not necessary to be able to do anything in particular. There were plenty of organizations building and releasing software successfully before DevOps, and there will be plenty of successful organizations long after DevOps is forgotten. Nonetheless, many of the ideas currently associated with DevOps could be very helpful.

DevOps doesn’t guarantee success, but applying its core principle might improve the odds.

  • Replace “release code” with “release valuable products”.

The product is not the code, and the code is not the product. The product is the sum of code, software platforms, machinery, data, documentation, and customer support. The product is all of that together, delivering whatever experience, value, and problems that the customer encounters, good and bad. The code is part of the product, and the code enables the product. The code controls the mechanical parts of the product. That’s not to diminish the importance of the code. The code makes the product possible. If it’s a software product and there’s no code, there’s no product. If there is code, and it’s bad code, it’s less likely that we have a good product.

It’s a good thing to release valuable products. Therefore it’s a good thing to understand the code that we’re releasing—but not just the code, and not just its behaviour. That’s not trivial, but it’s the easy part of testing. The harder part of testing is understanding the relationships between the code, its behaviour and the people who will be using the product or otherwise interacting with it—including developers, operations people, and testers.

  • Replace “more often” with “at an efficient and sustainable pace”.

Delivering working software frequently is one of the principles behind the Manifesto for Agile Software Development. DevOps principles and practices are intended to support agility. Building efficiently and frequently (certainly a DevOps practice, but not a practice exclusive to DevOps) affords the opportunity to discover whether there are problems that threaten value before we inflict them on customers. “More often” isn’t the point, though, because “more often” might be good, bad, or irrelevant.

Consider the extremes.

When we build a product very rarely, problems get buried under layers of increasing complexity in the product and our inexperience with it. Testers become overwhelmed with the volume of learning and investigation (and, very probably, bug reporting) to be done.

When we build a product frenetically, we also get infrequent experience with it, because each new build comes along before we can gain experience with the last one. In this case, testers get overwhelmed by the pace of the builds, and shallow testing, at best, is all that’s possible.

There is probably business value in being able to deploy software promptly, but that’s not the same as deploying software constantly. There might be contexts in which frequent deployment might provide real business value. In other contexts, a firehose of deployment can disrupt your customers, such that they don’t care about the deployment, but they do care about the disruption. The key is to recognize what context you’re in, and to minimize costly disruption.

One reasonable compromise in most cases is to set up systems to build the product easily and therefore frequently. In the build and in the processes leading up to it, include a smattering of automated checks to provide a quick alert to problems close to the surface. From the stream of builds, choose one periodically, and spend some time testing each one deeply to find rare, hidden, or subtle problems that can escape even a disciplined development process.

  • Replace “but we also” with “Thus”.

The first clause in the statement sets up the second clause. The first clause doesn’t undermine the second clause; the first clause puts legs under the second.

In our emergency treatment, let’s use “thus” to reattach the legs to the body. Also, we’ll add a sentence break, giving the patient a little more room to breathe.

  • Replace “we have to increase” with “we choose to apply”.

We’ve already covered the “we have to” part of this replacement.

Increasing something is not necessarily a good thing. In engineering work, everything is a tradeoff between desirable factors. Every activity that we might consider valuable comes with some degree of opportunity cost, reducing our capacity to do other things that we might also consider valuable. When we choose to apply something, we can choose to apply it more, or less, or just as we’re currently doing, to obtain the greatest overall benefit.

  • Replace “testautomation” with “powerful tools”.

I’m sincerely hoping that testautomation is a typo, rather than a new term. We’ll put a space in there.

There are many wonderful ways to apply tools in testing. Automated checking is one of them; only one of them.

Tools can help us to build the product, to prepare for testing, and to reconfigure our systems efficiently for better coverage. Tools can help us to probe the internals of the system to see things that would otherwise be invisible. Tools can help us to collect and represent data for analysis; to see patterns in output. Tools can help us to record and review our work.

Replacing “test automation” with “powerful tools” could help reduce the risk that “test automation” will be interpreted only as “output checking”.

  • Replace “in any way we can” with “in ways that help us”.

There are some things that we can do that we probably don’t want to do. We can serve steak with turpentine sauce, but no one should eat it and it’s a waste of good turpentine.

Tools can definitely help us with building the product quickly, and with identifying specific functional problems that might threaten its value. Tools can also be overapplied, reducing our engagement and human interaction with the product.

Fixation on tooling to exercise functions in the product can be a real problem if we forget that people use software. Even if our product is a software service, its API is used by people directly—programmers putting the API to work—and indirectly—end users who interact with the product through an interface designed for non-programmers. Functional correctness is important; so are parafunctional elements of the product: usability, performance, supportability, testability…

Tools can help us to focus our attention on important observations. Tools can also dazzle or distract us, diverting our focus from other important observations. We choose to apply tools not in any way we can, but in ways that help, and our choices can include changing or dropping tools when they’re not helping.

Consequently, it might be a good idea to remember what we’re using the tools for. So, in addition to the replacement, let’s add “to build the product, to understand it, and to identify problems efficiently”.

  • Replace “and minimize manual time consuming testing” with… something else.

This particular wound has become infected, and there’s a lot of debris in it. It requires a fair amount of cleanup, emergency surgery, and some stitches.

One principle of DevOps is the idea that teams use “practices to automate processes that historically have been manual and slow”. That’s a good idea for tasks that can be mechanized, and that benefit from being mechanized. It’s not such a great idea to forget that many tasks—and parts of tasks—are non-routine, rely on expertise and tacit knowledge, and can’t be made explicit or mechanized.

Programming involves strategizing, interpreting, designing, speculating, reflecting, analysing… Testing involves all of those things too, and more. Organizations reasonably want programmers to work quickly, but no one suggests “minimizing manual time-consuming programming”. This is because no one considers programming a manual process. Programming is an intellectual process; a cognitive process; a social process. So is testing.

Programmers type; that’s the manual part of programming. Just as for programming, the central work of testing is not the typing. No one refers to the typing part as “manual programming”; and when a programmer sets a build process in motion or takes advantage of plugins in an integrated development environment, no one refers to “automated programming”.

Just as there is no manual programming, and no automated programming, there is no manual testing, and there is no automated testing. There is testing.

“Manual” and “Automated” Testing

The End of Manual Testing

Anything worth doing requires some time and effort, and we usually want to apply our limited time and effort to things that are worth doing. It does make sense to minimize or eliminate the amount of time that we spend on unimportant things. It also makes sense to apply appropriate effort to things that are worth doing; to maintain their value; and to increase that value where possible.

Just as some tasks in programming can be carried out by time-saving tools like compilers, some of the tasks in testing can be carried out by time-saving tools like automated checks.

Compiling, though, is not the central task of programming. The central task of programming is modeling and expressing things in the human world in a way that machinery can deal with them. We can then use that programmed machinery to extend, enhance, accelerate, intensify, or enable human capabilities. All that requires a significant degree of preparation, technical savvy and social judgement—and time for cycles of design, experimentation, learning, and refinement. No one should complain about this taking the time it needs.

Checking is not the central task of testing. The central task of testing is the search for problems that matter: ways in which the software fails to meet the needs or desires of its users, or introduces new problems of its own. That requires not only checks for problems that we can anticipate, but a search for problems that we didn’t anticipate. Like development work, testing work also requires a significant degree of preparation, technical savvy and social judgement, and time for experiencing, exploring, discovery, and investigation. No one should complain about this taking the time it needs.

At least one class of testing tasks is even faster than running automated checks: the tasks that we choose not to do, because cost, value, and risk don’t warrant them. It’s worth the investment to pause every now and then to assess the relative value of unattended automated checks; instrumented tool-supported testing; and direct, unmediated experience with the product.

The trick here is to set ourselves up to do the fastest, least expensive testing that fulfills the mission of finding problems that matter before it’s too late. One way to get there is to apply fast, easy, non-interruptive checks that don’t slow down development. Then, periodically, do deep testing to find rare, subtle, hidden, intermittent, emergent bugs that might elude even highly capable and well-disciplined programmers.

So, replace “and minimize manual time consuming testing” with “We want to minimize distracting, unhelpful, or unnecessary work. We want to maximize our ability to evaluate and learn about the product both efficiently AND sufficiently deeply.”

With all those replacements, the text might be longer, but it’s more accurate and more precise; bigger and stronger.

So:

We choose to apply DevOps principles and practices to be better able to release valuable products at an efficient and sustainable pace. Thus we choose to apply powerful tools in ways that help us to build the product, to understand it, and to identify problems efficiently. We want to minimize distracting, unhelpful, or unnecessary work. We want to maximize our ability to evaluate and learn about the product both efficiently AND sufficiently deeply.

Every recovering patient can use motivation and support, so we’ll set our patient up with a motivating and supporting statement, and send them on their way together.

As developers, testers, and operations people working together, our goal is to enable and support the business by delivering, testing, building, and deploying valuable, problem-free products. As testers, our special focus is to help people to become aware of any important problems that would threaten the value of the product to people that matter; to help our clients determine whether the product that they’ve got is the product they want.

Bug of the Day: AI Sees Bits, Not Things

January 4th, 2021

An article that I was reading this morning was accompanied by a stock photo with an intriguing building in the background.

Students throwing their graduation caps in the air

I wanted to know where the building was, and what it was. I thought that maybe Chrome’s “Search Google for image” feature could help to locate an instance of the photo where the building was identified. That didn’t happen, but I got something else instead.

An assortment of images of migrating geese

Google Images provided me with a reminder that “machine learning” doesn’t see things and make sense of them; it matches patterns of bits to other patterns of bits. A bunch of blobby things in a variegated field? Birds in the sky, then—and the fact that there are students in their graduation gowns just below doesn’t influence that interpretation.

That reminded me of this talk by Martin Krafft:

“The MIT network’s concept of a tree (called a symbol) does not extend beyond its visual features. This network has never climbed a tree or heard a branch break. It has never seen a tree sway in the wind. It doesn’t know that a tree has roots, nor that it converts carbon dioxide into oxygen. It doesn’t know that trees can’t move, and that when the leaves have fallen off in winter, it won’t recognize the tree as the same one because it cannot conclude that the tree is still in the same position and therefore must be the same tree.”

Martin Krafft, The Robots Won’t Take Away Our Jobs: Let’s Reframe the Debate on Artificial Intelligence, 14:30

Then I had another idea: what if I fed a URL to the image above to Google Images? This is what I got:

Results from a Google Image search, given a link to an image

Software and machinery assist us in many ways as we’re organizing and sifting and sorting and processing data. That’s cool. When it comes to making sense of the world, drawing inferences, and making decisions that matter to people, we must continue to regard the machinery as cognitively and socially oblivious. Whether we’re processing loan applications, driving cars, or testing software, machinery can help us, but responsible, socially aware humans must remain in charge.

(A couple of friendly correspondents on Twitter have noted that the building is the Marina Bay Sands resort in Singapore.)

Bug of the Day: What Time Are the Class Sessions?

December 17th, 2020

One problem that we face in software development and testing is that data and information aren’t the same. Here’s an example, prompted by email from a correspondent.

There’s a Rapid Software Testing Explored class running January 11-14, 2021. It’s set to run at times that work for people in Europe and the UK, mostly. The service I use for managing registrations, Eventbrite, offers the opportunity to list the starting and ending times for the class. So far, so good.

The class starts at 12h00 Central European time on January 11. The class lasts for four days. Each day, there are three webinars of 90 minutes, with a half-hour break between each one. Thus the class ends at 17h30 Central European time, on January 14. How should this be displayed on the landing page for the event?

Eventbrite offered a form for me to fill in the starting and ending date and time for my event. I filled it in. Then Eventbrite provides an option to display the start time and ending time of the class on the landing page for the event. When I accept both options, the page duly presents the class as starting on the start time (2021/01/11 06h00 EST), and ending on the ending time (2021/01/14 11h30). Those times are entirely, factually correct as data. That correctness is pretty easy to check, too.

A person from Europe who wanted to register for the class wrote to ask if he should assume that the class ran from 12h00 to 17h30 on the first day and from 8h30 to 17h30 on the second, third, and fourth days. If you’re like me, and you already know the timetable for the class (you do; I just told you), the writer’s assumption might seem strange, but that’s from the perspective of people with insider knowledge, like you and me. There’s no particularly good reason to label that evaluation as strange from an outsider’s perspective.

The issue here is that, in its template for displaying an upcoming class, Eventbrite allows me to check a box to show the start date and time, and another box to show the end date and time. There isn’t an option to display the dates alone, without the time, nor is there an option to display the date range with starting and ending times for each day of the class.
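To make the gap between data and information concrete, here is a minimal sketch in Python. It is not Eventbrite’s code, and the names in it (daily_sessions, CET, EST) are mine, invented for illustration. It assumes fixed January offsets of UTC+1 for Central European time and UTC-5 for Eastern time. The same schedule produces both the two correct endpoints and the per-day times that a prospective student actually wants.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets, valid in January (no daylight saving in effect).
CET = timezone(timedelta(hours=1), "CET")
EST = timezone(timedelta(hours=-5), "EST")

WEBINARS_PER_DAY = 3
WEBINAR = timedelta(minutes=90)
BREAK = timedelta(minutes=30)


def daily_sessions(first_start, days):
    """Return one (start, end) pair for each day of the class."""
    # Three webinars with a break between each pair: 3 x 90 + 2 x 30 minutes.
    length = WEBINARS_PER_DAY * WEBINAR + (WEBINARS_PER_DAY - 1) * BREAK
    sessions = []
    for day in range(days):
        start = first_start + timedelta(days=day)
        sessions.append((start, start + length))
    return sessions


sessions = daily_sessions(datetime(2021, 1, 11, 12, 0, tzinfo=CET), days=4)

# The data on the landing page: the first start and the last end only.
event_start = sessions[0][0].astimezone(EST)
event_end = sessions[-1][1].astimezone(EST)
print(event_start.strftime("%Y/%m/%d %Hh%M %Z"), "to",
      event_end.strftime("%Y/%m/%d %Hh%M %Z"))
# prints: 2021/01/11 06h00 EST to 2021/01/14 11h30 EST

# The information the customer wanted: the times for each day.
for start, end in sessions:
    print(start.strftime("%Y/%m/%d"), start.strftime("%Hh%M"), "to",
          end.strftime("%Hh%M %Z"))
```

Both printouts come from the same correct data. Only the second one answers the question my correspondent was asking.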

Is that a bug? Hard to say. From the perspective of someone writing code to gather the data and display the page, it’s almost certainly not a bug—not a coding bug. If the requirement is to “display the starting and ending date and time of the event“, the code gathers that data from me and displays it correctly to my customers. But correctly doesn’t mean informatively.

Is it a bug in that the expressed requirement is wrong, then? Also hard to say. First, I haven’t seen the requirements document. I suspect that Eventbrite’s business is mostly single-day events, so the issue probably doesn’t come up that often, relative to the majority of cases. But it does come up for some people, and for some events. It did for me, and for my customer, this time.

Should Eventbrite be able to display the start and end times for each day of a multi-day event? Maybe. But that would be more complicated to code and harder to test. Maybe it’s not worth the trouble and the risk of trying it.

Should the start and end times be displayed with a time zone beside them? They are. Should those time zones be chosen relative to where the event is happening, or relative to the time zone for the person who is looking at the site? Eventbrite seems to provide the latter, but maybe it doesn’t; maybe it shows Eastern Time worldwide.

It doesn’t take long to enter the rabbit hole of possibilities: if the time is displayed relative to the viewer’s time zone, what if that user is connecting to the page via a VPN in a time zone different from hers? I tried this, and it seems either Eventbrite figures out the time on my local system OR it displays its times in Eastern time worldwide. How can I be sure what gets displayed in Europe? Will European users be confused if they see the start and end times rendered as North American Eastern time?

What if the user will be traveling, and wants to know the time of the event where it’s being held? (This sure isn’t a problem in December 2020, but what happens when we’re travelling again?)

Should Eventbrite offer an option to display the date alone, and let those running the event identify the daily schedule some other way? Probably, but who’s to say?

And imagine that you’re working at Eventbrite: what should a tester’s role be in all of this?

Here’s what we say in RST: it’s the role of designers, programmers, and managers to develop requirements, designs, and programs that transform the complex, messy, social world of people and their needs into the simpler, cleaner, world of machines and their very stilted languages. It is the tester’s role to look for and to find problems in those transformations, so that the designers, programmers, and managers can recognize those problems and make decisions on how to deal with them.

To fulfill our role, we must experience, explore, and experiment with the product and its requirements. We must develop an understanding of how people might use the product, and how they might be perplexed or surprised or annoyed by it.

When the product is being put in front of people who haven’t seen it, we must struggle to maintain the perspective of the first-timer. When the product is placed in a domain in which it will be used by experts, we must develop expertise in that domain, as quickly and as deeply as we can.

The tester can participate in the development of requirements, design, and code, and can make suggestions about them. But anyone else can do that too: documentation people, customer support people, customers…

What makes testers special in all this is the testers’ focus on problems. It’s our abiding faith that there are problems, and that those problems might matter to people who might be forgotten by the builders. It’s the tester’s special job to consider how the insider’s perspective might be different from the outsider’s perspective. Some people on the team might consider those things. No one else on the team is focused on them.

It’s the tester’s job to raise questions about the product, its requirements, and its design, and ask “Is there a problem here? Might there be a problem here? Is everyone okay with the product we’re developing? Is everyone willing to live with the problems that we’re aware of?” This is often socially awkward, because people who are focused on solving problems (like developers and designers and managers) often find it distracting and to some degree irritating to hear about new ones. Don’t you?

And, in this case, here’s the rub: the data and the display can be correct, but still fail to solve a problem for someone who wants to know “What are the danged class times for each day?” Some people guess (and guess correctly); others are willing to wait for an answer (those people find out on a page that gets displayed after they register); and some people write to ask me. That underscores another point: a bug is not a property of a product; it’s a relationship between the product and some person.

It turns out that daily start and end times are hard to express in machine-friendly data structures, but easy to express in the free-form text description of the class that Eventbrite also affords on the class’ landing page. So upon recognizing the problem for one of my customers, and that the problem mattered to him, that’s how I addressed it.

If you’re interested in all this, you might be interested in the Rapid Software Testing Explored class, where we examine the nature of problems and how to look for them and report them skilfully to your clients.

Again: the class runs on four consecutive days, starting at noon CET. Each day, there are three webinars of 90 minutes, with a half-hour break between each one. Just so you know.

Bug of the Day: Bad Data Means Search for Book Title Fails

December 14th, 2020

This is your periodic reminder that data has problems, just like code does.

A correspondent on LinkedIn pointed me towards a book by George Lakoff, an author I admire. For some reason, I had not been aware of the book. So I looked it up. I wanted to go straight to it, so I put the title in quotes:

Where Mathematics Comes From

Hmmm. That’s a little strange. Nothing? Let’s try without the quotes.

Where Mathematics Come From

Do you see the problem? Do you see why the quoted search string didn’t work? It looks to me like there’s a bad entry in a database somewhere.

Data is messy. Data is often wrong. Data can trip up functions that might otherwise appear to be working fine.

Data needs to be checked and examined critically, just like program code does; and so do the interactions of good and bad data with program code. Otherwise, you might lose a sale, mess up a payment, or open the door to a security breach without noticing. That’s why, in Rapid Software Testing, we use a variety of ideas for covering the product and the things around it with testing.

Sure, you might have automated checks set up for certain functions and workflows through your product. That’s fine, and a good thing. Are you using the power of automation to help find problems with your data?
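As one small illustration of that idea, here is a sketch in Python that compares catalogue titles against a list of titles believed to be correct, and flags entries that almost match but don’t quite. Everything in it is an assumption made for the example: the near_misses function, the 0.9 threshold, and the tiny reference and catalogue lists are hypothetical, and this is not how any real bookseller’s system works. A tool like this doesn’t decide whether an entry is wrong; it points a human towards entries worth a closer look.

```python
from difflib import SequenceMatcher


def near_misses(reference_titles, catalogue_titles, threshold=0.9):
    """Find catalogue titles that nearly, but not exactly, match a reference."""
    suspects = []
    for entry in catalogue_titles:
        for reference in reference_titles:
            a, b = entry.casefold(), reference.casefold()
            if a == b:
                continue  # an exact match is not suspicious
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                suspects.append((entry, reference))
    return suspects


# Hypothetical data: titles we trust, and titles as stored in a catalogue.
reference = ["Where Mathematics Comes From", "Metaphors We Live By"]
catalogue = ["Where Mathematics Come From", "Metaphors We Live By"]

for entry, ref in near_misses(reference, catalogue):
    print(f"Suspicious entry: {entry!r} (close to {ref!r})")
```

The threshold is a judgement call. Set it too high and real problems slip through; set it too low and the report fills up with noise that someone has to sift.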

A Naïve Request from Management

October 21st, 2020

A tester recently asked “If you’re asked to write a ‘test plan’ for a new feature before development starts, what type of thing do you produce?”

I answered that I would produce a reply: “I’d be happy to do that. What would you like to see in this test plan?”

The manager’s reply was, apparently, “test cases covering all edge cases we’ll need to test”.

That’s a pretty naïve request. Here’s my answer:

“Making sure the product handles edge cases properly is definitely an important task. If I were to take your request literally—test cases covering all edge cases we’d need to test—it could take a lot of time for me to prepare, and a long time for you to review and figure out all the things I might have left out.

“And there’s another issue: I don’t know in advance what all the edge cases are, or even what they might be—and neither do the developers, and neither do you. No one does. But that’s okay! We can start right now by learning about possible edge cases through testing. We can’t perform testing on a running product yet, obviously, but we can perform some thought experiments and test people’s ideas about the product.

“So how about I give you a short summary—a list or a mind map—of some of the broad risk areas we can start considering right away? We can share the list with the developers to help them anticipate problems, defend against them, and check their work. That will greatly reduce the need to test edge cases later, when the product has been built and the problems are harder to find.

“We can add to that risk list as we develop the product—and we can take things off it as we address those risks. That will help focus the testing work. When we start working with builds of the product, I’ll explore it with an eye to finding edge cases that we didn’t anticipate. And I’ll keep the quick summaries coming whenever you like. You can review those and give me feedback, so that we’re both on top of things all the way along.”

The software business, alas, still runs on folklore and mythodology about testing. Too few managers understand testing. Many managers—and alas, many testers—don’t realize that testing isn’t about test cases, but are nonetheless addicted to test cases. When we provide responsible answers to naïve questions, we can help to address that problem.

I’m presenting Rapid Software Testing Explored Online November 9-12, timed for North American days and European/UK evenings. You can find more information on the class, and you can register for it.

James Bach teaches in European daytimes December 8-11. Rapid Software Testing Managed is coming too. Find scheduling information for all of our classes.