Blog Posts for the ‘Tools’ Category

s/automation/programming/

Thursday, June 2nd, 2016

Several years ago in one of his early insightful blog posts, Pradeep Soundarajan said this:

“The test doesn’t find the bug. A human finds the bug, and the test plays a role in helping the human find it.”

More recently, Pradeep said this:

Instead of saying, “It is programmed”, we say, “It is automated”. A world of a difference.

It occurred to me instantly that it could make a world of difference, so I played with the idea in my head.

Automated checks? “Programmed checks.” 

Automated testing? “Programmed testing.” 

Automated tester?  “Programmed tester.” 

Automated test suite?  “Programmed test suite.”

Let’s automate to do all the testing?  “Let’s write programs to do all the testing.”

Testing will be faster and cheaper if we automate. “Testing will be faster and cheaper if we write programs.”

Automation will replace human testers. “Writing programs will replace human testers.”

To me, the substitutions all generated a different perspective and a different feeling from the originals. When we don’t think about it too carefully, “automation” just happens; machines “do” automation. But when we speak of programming, our knowledge and experience remind us that we need people to do programming, and that good programming can be hard, and that good programming requires skill. And even good programming is vulnerable to errors and other problems.

So by all means, let’s use hardware and software tools skilfully to help us investigate the software we’re building.  Let’s write and develop and maintain programs that afford deeper or faster insight into our products (that is, our other programs) and their behaviour.  Let’s use and build tools that make data generation, visualisation, analysis, recording, and reporting easier. Let’s not be dazzled by writing programs that simply get the machinery to press its own buttons; let’s talk about how we might use our tools to help us reveal problems and risks that really matter to us and to our clients.  

And let’s consider the value and the cost and the risk associated with writing more programs when we’re already rationally uncertain about the programs we’ve got.

A Context-Driven Approach to Automation in Testing

Sunday, January 31st, 2016

(We interrupt the previously-scheduled—and long—series on oracles for a public service announcement.)

Over the last year James Bach and I have been refining our ideas about the relationships between testing and tools in Rapid Software Testing. The result is this paper. It’s not a short piece, because it’s not a light subject. Here’s the abstract:

There are many wonderful ways tools can be used to help software testing. Yet, all across industry, tools are poorly applied, which adds terrible waste, confusion, and pain to what is already a hard problem. Why is this so? What can be done? We think the basic problem is a shallow, narrow, and ritualistic approach to tool use. This is encouraged by the pandemic, rarely examined, and absolutely false belief that testing is a mechanical, repetitive process.

Good testing, like programming, is instead a challenging intellectual process. Tool use in testing must therefore be mediated by people who understand the complexities of tools and of tests. This is as true for testing as for development, or indeed as it is for any skilled occupation from carpentry to medicine.

You can find the article here. Enjoy!

Oracles from the Inside Out, Part 8: Successful Stumbling

Thursday, November 26th, 2015

When we’re building a product, despite everyone’s good intentions, we’re never really clear about what we’re building until we try to build some of it, and then study what we’ve built. Even after that, we’re never sure, so to reduce risk, we must keep studying. For economy, let’s group the processes associated with that study—review, exploration, experimentation, modelling, checking, evaluating, among many others—and call them testing. Whether we’re testing running code or testing ideas about it, testing at every step reveals problems in what we’ve built so far, and in our ideas about what we’ve built.

Clever people have the capacity to detect some problems and address them before they become bigger problems. A smart business analyst is aware of unusual exceptions in a workflow, recognizes an omission in the requirements document, and gets it corrected. An experienced designer goes over her design in her head, notices a gap in her model, and refines it. A sharp programmer, pairing with another, realizes that a function is using a data type that will overflow, and points out the problem such that it gets fixed right away.

Notice that in each one of these cases, it’s not quite right to say that the business analyst, the designer, or the programmer prevented a problem. It’s more accurate to say that a person detected a little problem and prevented it from becoming a bigger problem. Bug-stuff was there, but a savvy person stomped it while it was an egg or a nymph, before it could hatch or develop into a full-blown cockroach. In order to prevent bigger problems successfully, we have to become expert at detecting the small ones while they’re small.

Sometimes we can be clever and anticipate problems, and design our testing to shine light on them. We can build collaboration into our designs, review into our specifications, and pairing into our programming. We can set up static analysis tools that check code for inconsistency with explicit rules. When we’re dealing with running code, testing might take the form of specific procedures for a tester to follow; sometimes it takes the form of explicit conditions to observe; and sometimes it takes the form of automated checks. All of these approaches can help to find problems along the way.

It’s a fact that when we’re testing, we don’t always find the problems we set out to find. One reason might be, alas, that the problems have successfully evaded our risk ideas, our procedures, our coverage, and our oracles. But another reason might be that, thanks to people’s diligence, some problems were squashed before they had a chance to encounter our testing for them.

Conversely, some problems that we do find are ones that we didn’t anticipate. Instead, we stumble over them. “Stumbling” may sound unappealing until we consider the role that serendipity—accidental or incidental discovery—has played in every aspect of human achievement.

So here, I’m not talking about stumbling in terms of clumsiness. Instead, I’m speaking in terms of what we might find, against the odds, through a combination of diligent search, experimentation, openness to discovery, and alertness—as people have stumbled over diamonds, lost manuscripts, new continents, or penicillin. Chance favours the explorer and—as Pasteur pointed out—the prepared mind. If we don’t open our testing to the places where customers could stumble, the customers will find those places for us.

Productive stumbling can be extended and amplified by tools. They don’t have to be fancy tools by any means, either.

Example: Stuck for a specific idea about a risk heuristic, I created some tables of more-or-less randomized data in Excel, and used a Perl script to cover all of the possible values in a four-digit data field. One of those values returned an inappropriate result—one stumble over a gold nugget of a bug. Completely unexpectedly, though, I also stumbled over a sapphire: while scanning quickly through the log file using a blink oracle, I noticed that every now and then a transaction took ten times longer than it should have, courtesy of a startling and completely unrelated bug.
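For readers who want to try something similar, here is a rough Ruby sketch of the two ideas in that example: covering every value of a four-digit field, and applying a crude timing oracle to a log. The original work used Excel and Perl; the file names, the log format, and the ten-times-typical threshold are assumptions, not a record of that project.

```ruby
# Sketch only: generate every value of a four-digit field, then flag
# log entries that took far longer than typical. Assumes a tab-separated
# log whose last field is a duration in seconds.

# 1. Cover the whole four-digit input space.
File.open("four_digit_values.txt", "w") do |f|
  (0..9999).each { |n| f.puts format("%04d", n) }
end

# 2. A blunt timing oracle over the resulting log.
durations = File.foreach("transactions.log").map { |line| line.split("\t").last.to_f }
typical   = durations.sort[durations.size / 2]   # median-ish value

durations.each_with_index do |d, i|
  puts "line #{i + 1}: #{d}s (~#{(d / typical).round}x typical)" if d > typical * 10
end
```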

Example: At a client site, I had a suspicion that a test script contained an unreasonable amount of duplication. I opened the file in a text editor, selected the first line in a data structure, hit the Ctrl-F key, and kept hitting it. I applied a blink oracle again: most of the text didn’t change at all; tiny patches, representing a handful of variables, flickered. Within a few seconds I had discovered that the script wasn’t really doing anything significant except trying the same thing with different numbers. More importantly, I discovered that the tester needed real help in learning how to create flexible, powerful, and maintainable test code.
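A tool can give a similar signal with even less ceremony. This is a hypothetical sketch, not the script from that engagement: tally the lines of a test script and report the heaviest repeats, on the theory that heavy repetition is worth a closer look.

```ruby
# Sketch: count repeated lines in a test script (the file name is invented).
counts = Hash.new(0)
File.foreach("test_script.txt") { |line| counts[line.strip] += 1 }

counts.reject { |line, _n| line.empty? }
      .select { |_line, n| n > 5 }
      .sort_by { |_line, n| -n }
      .each    { |line, n| puts format("%4d  %s", n, line[0, 60]) }
```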

Example: I wrote a program as a testing exercise for our Rapid Software Testing class. A colleague used James Bach’s PerlClip tool to discover the limit on the amount of data that the program would accept. From this, he realized that he could determine precisely the maximum numeric value supported by the program, something that I, the programmer, had never considered. (When you’re in the building mindset, there’s a lot that you don’t consider.)
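PerlClip’s best-known trick is the counterstring: a string that reports its own length, so that when an input field truncates it you can read the limit off the last visible chunk. Here is a rough Ruby sketch of the idea; it is not PerlClip’s actual code.

```ruby
# Sketch of a counterstring generator: each asterisk sits at the position
# named by the digits immediately before it.
def counterstring(length)
  out = ""
  pos = length
  while pos > 0
    token = "#{pos}*"
    token = "*" * pos if token.length > pos   # no room left for the digits
    out = token + out
    pos -= token.length
  end
  out
end

puts counterstring(30)   # => *3*5*7*9*12*15*18*21*24*27*30*
```

Paste a long counterstring into a field; whatever survives tells you, at a glance, how many characters the field accepted.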

Example: Another colleague, testing the same program, used Excel to generate all of the possible values for one of the input fields. From this he determined that the program was interpreting input strings in ways that, once again, I had never considered. Just this test and the previous one revealed information that exploded my five-line description of the program into fifteen far more detailed lines, laden with surprises and exceptions. One of these lines represents a dangerous and subtle gotcha in the programming language’s standard libraries. All this learning came from a program that is, at its core, only two lines of code! What might we learn about a program that’s two million lines of code?

Example: In this series of posts on oracles, I’ve already recounted the tale of how James took data from hundreds of test runs, and used Excel’s conditional formatting feature to visualize the logged results. The visualizations instantly highlighted patterns that raised questions about the behaviour of a product, questions that fed back into refinements of the requirements and design decisions.

Example: While developing a tool to simulate multi-step transactions in a banking application, I discovered that the order in which the steps were performed had a significant impact on the bank’s profit on the overall transaction. This is only one instance of a pattern I’ve seen over and over again: while developing the infrastructure to perform checking, I stumble over bug after bug in the application to be tested. Subsequently, after the bugs are fixed and the product is stabilized and carefully maintained, the checks—despite their value as change detectors—don’t reveal bugs. Most of the value of the checks gets cashed in the testing activity that produces them.

Example: James performed 3000 identical queries on eBay, one query every two or three seconds, recording the number of results reported each time. He expected random variation over time (i.e. a “drunkard’s walk”). Instead, when he visualized the data, he saw suspicious repeating jumps and drops that looked anything but random. Analysis determined that he was probably seeing the effects of many servers responding to his query—some of which occasionally failed to contribute results before timing out.
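A sketch of that style of experiment, in Ruby. The result_count_for helper is a placeholder for whatever query the system under test supports; the query text, the file name, and the timing are invented.

```ruby
require "csv"
require "time"

# Placeholder: in a real experiment this would issue the query against the
# system under test and parse the reported number of results.
def result_count_for(query)
  rand(100_000..101_000)
end

QUERY = "the same query, every time"

CSV.open("query_counts.csv", "w") do |csv|
  csv << ["timestamp", "count"]
  3000.times do
    csv << [Time.now.iso8601, result_count_for(QUERY)]
    sleep(2 + rand)   # roughly one query every two or three seconds
  end
end
# Plot the count column over time; a drunkard's walk looks very different
# from repeating jumps and drops.
```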

These examples show how we can use tools powerfully: to generate data sets and increase coverage, so that we can bring specific conditions to our attention; to amplify signals amidst the noise; to highlight subtle patterns and make them clearly visible; to afford observation of things that we never expected to see; to perturb or stress the system such that rare or hidden problems become perceptible.

The traditional view of an oracle is an ostensibly “correct” reference that we can compare to the output from the program. A common view of test automation is that it means using a tool to act like a robotic and unimaginative user to produce output to be checked against a reference oracle. A pervasive view of testing holds that it is nothing more than simple output checking, focused on getting right answers and ignoring the value of raising important new questions. In Rapid Software Testing, we think this is too narrow and limiting a view of oracles, of automation, and of testing itself. Testing is exploring a product and experimenting with it, so that we can learn about it, discover surprising things, and help our clients evaluate whether the product they’ve got is the product they want. Automated checking is only one way in which we can use tools to aid in our exploration, and to shine light on the product—and excellent automated checking depends on exploratory work to help us decide what might be interesting to check and to help us to refine our oracles. An oracle is any means—a feeling, principle, person, mechanism, or artifact—by which we might recognize a problem that we encounter during testing. And oracles have another role to play, which I’ll talk about in the last post in this long series.

Oracles from the Inside Out, Part 5: Oracles as References as Media

Tuesday, September 15th, 2015

Try asking testers how they recognize problems. Many will respond that they compare the product to its specification, and when they see an inconsistency between the product and its specification, they report a bug. Others will talk about creating and running automated checks, using tools to compare output from the product to specific, pre-determined, expected results; when the product produces a result inconsistent with expectations, the check identifies a bug which the tester then reports to the developer or manager. It might be tempting to think of this as moving from the bottom right quadrant on this table to the bottom left.

Traditional talk about oracles refers almost exclusively to references. W.E. Howden, who introduced “oracle” as a term of testing art, described an oracle as “an external mechanism which can be used to check test output for correctness”. Yet thinking of oracles in terms of correctness leads to some pretty serious problems. (I’ve outlined some of them here.)

In the Rapid Software Testing namespace, we take a different, broader view of oracles. Rather than focusing on correctness, we focus on problems: an oracle is a means by which we recognize a problem when we encounter one during testing. Checking for correctness, as Howden puts it, may severely limit our capacity to notice many kinds of problems. A product or service can be correct with respect to some principle, but have plenty of problems that aren’t identified by that principle; and a product can produce incorrect results without the incorrectness representing a problem for anyone. When testers fixate on documented requirements, there’s a risk that they will restrict their attention to looking for inconsistencies with specific claims; when testers fixate on automated checks, there’s a risk that they will restrict their focus to inconsistency with a comparable algorithm. Focus your attention too narrowly on a particular oracle—or a particular class of oracle—and you can be confident of one thing: you’ll miss lots of bugs.

Documents and tools are media. In the most general sense, “medium” is descriptive of something in between, like “small” and “large”. But “medium” as a noun, a medium, can be between lots of things. A communication medium like radio sits between performers and an audience; a psychic medium, so the claim goes, provides a bridge between a person and the spirit world; when people want to exchange things of value, they often use money as a medium for the exchange. Marshall McLuhan, an early and influential media theorist, said that a medium is anything that humans create or use to effect change. Media are tools, technologies that people use to extend, enhance, enable, accelerate, or intensify human capabilities. Extension is the most obvious and prominent effect of media. Most people think of media in terms of communications media. A medium can certainly be printed pages or television screens that enable messages to be conveyed from one person to another. McLuhan viewed the phonetic alphabet as a technology—a medium that extended the range of speech over great distances and accelerated its transmission. But a cup of coffee is a medium too; it extends alertness and wakefulness, and when consumed socially with others, it can extend conversation and friendliness. Media, placed between a product and our observation of it, extend our capacity to recognize bugs.

McLuhan emphasized that media change things in many different ways at the same time. In addition to extending or enabling or accelerating our capabilities, McLuhan said, every new medium obsolesces one or more existing media, grabbing our attention away from old things; every new medium retrieves notions of formerly obsolescent media, making old things new again. McLuhan used heat as a metaphor for the degree to which media require the involvement of the user; a “cool” medium like the telephone, he said, requires the listener to participate and fill in the missing pieces of the experience; a “hot” medium like a movie provides stimulation to the ear and especially the eye, requiring less engagement from the viewer. Every medium, when “overheated” (McLuhan’s term for a medium that has been stretched or extended beyond its original or intended capacity), reverses into the opposite of what it might have been originally intended to accomplish. Socrates (and the King of Egypt) recognized that writing could extend memory, but could reverse into forgetfulness (see Plato’s dialogue Phaedrus). Coffee extends alertness and conversation, but too much of it and people become too wired to work and too addled to chat. A medium always draws attention to itself to some degree; an overheated medium may dazzle us so much that we begin to ignore what it contains or what we intended it to do for us. More importantly, a medium affects us. This is one of the implications of McLuhan’s famous but oblique statement “the medium is the message”. By “message”, he means “the change of scale or pace or pattern” that a new invention or innovation “introduces into human affairs.” (This explanation comes from Mark Federman, to whom I’m indebted for explaining McLuhan’s work to me over the years.)

When we pay attention, we can easily observe media overheating both in talk about testing and development work and in the work itself. Documents and tools frequently dominate conversations. In some organizations, a problem won’t be considered a bug unless it is inconsistent with an explicit statement in a specification or requirements document. Yet documents are only partial representations, subsets, of what people claim to have known or believed at some point in time, and times change. In some places, testing work is dominated by automated checking. Checks can be very valuable, providing great precision and fast feedback. But checks may focus on functional aspects of the product, and less on parafunctional attributes.

McLuhan’s work emphasizes that media are essentially neutral, agnostic to our purposes. It is our engagement with media that produces good or bad outcomes—good and bad outcomes. Perhaps the most important implication of McLuhan’s work is that media amplify whatever we are. If we’re fabulous testers, our tools extend our capabilities, helping us to be even more fabulous. But if we’re incompetent, tools extend our incompetence, allowing us to do bad testing faster and worse than we’ve ever been able to do it before. To the degree that we are inclined to avoid conflict and arguments, we will use documents to help us avoid conflict and arguments; to the degree that we are inclined to welcome discussion and the refinement of ideas, documents can help us do that. If we are disposed to be alert to a wide range of problems, automated checks will help us as we diversify our scope; if we are oblivious to certain kinds of problems in the product, automated checks will amplify our oblivion.

Reference oracles—documents, checking tools, representative data, comparable products—are unquestionably media, extending all of the other kinds of oracles: private and shared mental models, both private and shared feelings, conversations with others, and principles of consistency. How can we evaluate them? What do we use them for? And how can we use them to help us find problems without letting them overwhelm or displace all the other ways we might have of finding problems? That’s the subject of the next post.

Very Short Blog Posts (15): “Manual” and “Automated” Testers

Tuesday, April 1st, 2014

“Help Wanted. Established scientific research lab seeks Intermediate Level Manual Scientist. Role is intended to complement our team of Automated and Semi-Automated Scientists. The successful candidate will perform research and scientific experiments without any use of tools (including computer hardware or software). Requires good communication skills and knowledge of the Hypothesis Development Life Cycle. Bachelor’s degree or five years of experience in manual science preferred.”

Sounds ridiculous, doesn’t it? It should.

Related post:

“Manual” and “Automated” Testing

What Exploratory Testing Is Not (Part 3): Tool-Free Testing

Saturday, December 17th, 2011

People often make a distinction between “automated” and “exploratory” testing. This is like the distinction between “red” cars and “family” cars. That is, “red” (colour) and “family” (some notion of purpose) are in orthogonal categories. A car can be one colour or another irrespective of its purpose, and a car can be used for a particular purpose irrespective of its colour. Testing, whether exploratory or not, can make heavy or light use of tools. Testing, whether it entails the use of tools or not, can be highly scripted or highly exploratory.

“Exploratory” testing is not “manual” testing. “Manual” isn’t a useful word for describing software testing in any case. When you’re testing, it’s not the hands that do the testing, any more than when you’re riding a pedal bike it’s the feet that do the bike-riding. The brain does the testing; the hands, at best, provide one means of input and interaction with the thing we’re testing. And not even “manual” testing is manual in the sense of being tool- or machinery-free. You do use a computer when you’re testing, don’t you?

(Well, mostly, but not always. If you’re reviewing requirements, specifications, code, or documentation, you might be looking at paper, but you’re still testing. A thought experiment or a conversation about a product is a kind of a test; you’re questioning something in order to evaluate it, pitting ideas against other ideas in an unscripted way. While you’re reviewing, are you using a pen to annotate the paper you’re reading? A notepad to record your observations? Sticky tabs to mark important places in the text? Then you’re using tools, low-tech as they might be.)

Some people think of test automation in terms of a robot that pounds on virtual keys more quickly, more reliably, and more deterministically than a human could. That’s certainly one potential notion of test automation, but it’s very limiting. That traditional view of test automation focuses on performing checks, but that’s not the only way in which automation can help testing.

In the Rapid Software Testing class, James Bach and I suggest a more expansive view of test automation: any use of (software- or hardware-based) tools to support testing. This helps keep us open to the idea that machines can help us with almost any of the mimeomorphic, non-sapient aspects of testing, so that we can focus on and add power to the polimorphic, sapient aspects. Exploration is a polimorphic activity, but it can include and be supported by mimeomorphic actions. Cem Kaner and Doug Hoffman take a similar tack: exploratory test automation is “computer-assisted testing that supports learning of new information about the quality of the software under test.” Learning new information is one of the hallmarks of exploratory testing, which usually points towards emphasizing variation rather than repetition.

That said, there can be a role for mechanized repetition, even when you’re using a highly exploratory approach: when repeating aspects of the test are intended to support discovery of something new or surprising. The key is not whether you’re mechanizing the activity. The key is what happens at the end of the activity. The less the results of one activity are permitted to inform the next, the more scripted the approach. If the repetition is part of a learning loop—a cycle of probing, discovering, investigating, and interpreting—that feeds back on itself immediately, then the approach is exploratory. James has also posted a number of motivations for repeating tests. Each one can (with the possible exception of “avoidance or indifference”) be entirely consistent with and supportive of exploration.

There are some actions that tools can perform better than humans, as long as the action doesn’t require human judgment or wisdom. Humanity can even get in the way of some desirable outcome. For example, when your exploration of some aspect of a product is based on statistical analysis, and randomization is part of the test design, it’s important to remember that people are downright lousy at generating randomized data. Even when people believe that they’re choosing numbers at random, there are underlying (and usually quite unconscious) patterns and biases that inform their choices. If you want random numbers, tools can help.
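A small Ruby sketch of that point: let the tool generate the data, and record the seed so that a surprising result can be reproduced later. The field names and ranges are invented.

```ruby
# Sketch: reproducible "random" customer records for test input.
srand(20111217)   # any fixed seed; note it alongside the test results

customers = Array.new(50) do
  {
    name:    Array.new(8) { ("a".."z").to_a.sample }.join.capitalize,
    balance: rand(-50_000..1_000_000) / 100.0,                    # dollars, negatives allowed
    opened:  Time.at(rand(Time.new(1995, 1, 1).to_i..Time.now.to_i))
  }
end

p customers.first(3)
```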

Tools can support exploration in plenty of other ways: data generation; system configuration; simulation; logging and video capture; probes that examine the internal state of the system; oracles that detect certain kinds of error conditions in a product or generate plausible results for comparison; visualization of data sets, key elements to observe, relationships, or timing; recording and reporting of test activity.

A few years back, I was doing testing of a teller workstation application at a bank (I’ve written about this in How to Reduce the Cost of Software Testing). The other testers, working on domestic transactions, were working from scripts that contained painfully detailed and explicit steps and observations. (Part of the pain came from the fact that the scripts were supplemented with screen shots, and the text and the images didn’t always agree.) My testing assignment involved foreign exchange, and the testing tasks I had been given were unscripted and, to a large degree, self-determined. In order to learn the application quickly, I had to explore, but this in no way meant that I didn’t use tools. On the contrary, in fact. In that context, Excel was the most readily available and powerful tool on hand. I used it (and its embedded Visual Basic for Applications) to:

  • maintain and update (at a key stroke) enormous tables of currencies, rates, and transaction types
  • access appropriate entries from the table via regular expression parsing
  • model the business rules of the application under test
  • display the intended flow of money through a transaction
  • add visual emphasis to the salient outcomes of tests and test scenarios
  • provide, using a comparable algorithm, clear results to which the product’s results could be compared (see the sketch after this list)
  • help in performing extremely rapid evaluation of a test idea
  • create tables of customer data so that I could perform a test using a variety of personas
  • accelerate my understanding of the product and the test space
  • enhance my learning about Boolean algebra and how it could be used in algorithms
  • record my work and illustrate outcomes for my clients
  • perform quick calculations when necessary
  • help me find more actual problems than the other four testers combined
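To make the “comparable algorithm” bullet concrete, here is a toy sketch in Ruby rather than Excel and VBA: an independent, deliberately simple model of one conversion rule, used as a reference for the product’s output. The rates, the fee, and the observed value are invented.

```ruby
# Sketch of a comparable-algorithm oracle for a currency conversion.
RATES = { "USD" => 1.3605, "EUR" => 1.4412, "JPY" => 0.0112 }   # CAD per unit (invented)
FEE   = 0.005                                                   # bank's spread (invented)

def expected_cad(amount, currency)
  amount * RATES.fetch(currency) * (1 - FEE)
end

product_said = 13_536.97                      # value observed in the application
model_says   = expected_cad(10_000, "USD")
diff         = (model_says - product_said).abs

puts format("model: %.2f  product: %.2f  %s",
            model_says, product_said,
            diff <= 0.01 ? "(agrees to the cent)" : "(worth investigating)")
```

The model doesn’t have to be right in any absolute sense; it only has to be independent enough that a disagreement prompts a question.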

All of this activity happened in a highly exploratory way; each of the activities interacted with the others. I used very rapid cycles of looking at what I needed to learn next about the application, experimenting with and performing tests, programming, asking questions of subject matter experts and programmers and managers, reporting, reading reference documentation, debugging, and learning. Tight loops of activities happening in parallel are what characterize exploratory processes. Yet this was not tool-free work; tools were absolutely central to my exploration of the product, to my learning about it, and to the mission of finding bugs. Indeed, without the tools, I would have had much more limited ideas about what could be tested, and how it could be tested.

The explorers of old used tools: compasses and astrolabes, maps and charts, ropes and pulleys, ships and wagons. These days, software testers explore applications by using mind-mapping software and text editors; spreadsheets and calculators; data generation tools and search engines; scripting tools and automation frameworks. The concept that characterizes exploratory testing is not the input mechanism, which can be fingers on a keyboard, tables of data pumped into the program via API calls, bits delivered through the network, or signals from a variable voltage controller. Exploratory testing is about the way you work, and the extent to which test design, test execution, and learning support and reinforce each other. Tools are often a critical part of that process.

Can Exploratory Testing Be Automated?

Wednesday, September 22nd, 2010

In a comment on the previous post, Rahul asks,

One doubt which is lingering in my mind for quite sometime now, “Can exploratory testing be automated?”

There are (at least) two ways to interpret and answer that question. Let’s look first at answering the literal version of the question, by looking at Cem Kaner’s definition of exploratory testing:

Exploratory software testing is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

If we take this definition of exploratory testing, we see that it’s not a thing that a person does, so much as a way that a person does it. An exploratory approach emphasizes the individual tester, and his/her freedom and responsibility. The definition identifies design, interpretation, and learning as key elements of an exploratory approach. None of these are things that we associate with machines or automation, except in terms of automation as a medium in the McLuhan sense: an extension (or enablement, or enhancement, or acceleration, or intensification) of human capabilities. The machine to a great degree handles the execution part, but the work in getting the machine to do it is governed by exploratory—not scripted—work.

Which brings us to the second way of looking at the question: can an exploratory approach include automation? The answer there is absolutely Yes.

Some people might have a problem with the idea, because of a parsimonious view of what test automation is, or does. To some, test automation is “getting the machine to perform the test”. I call that checking. I prefer to think of test automation in terms of what we say in the Rapid Software Testing course: test automation is any use of tools to support testing.

If yes then up to what extent? While I do exploration (investigation) on a product, I do whatever comes to my mind by thinking in reverse direction as how this piece of functionality would break? I am not sure if my approach is correct but so far it’s been working for me.

That’s certainly one way of applying the idea. Note that when you think in a reverse direction, you’re not following a script. “Thinking backwards” isn’t an algorithm; it’s a heuristic approach that you apply and that you interact with. Yet there’s more to test automation than breaking. I like your use of “investigation”, which to me suggests that you can use automation in any way to assist learning something about the program.

I read somewhere on Shrini Kulkarni’s blog that automating exploratory testing is an oxymoron, is it so?

In the first sense of the question, Yes, it is an oxymoron. Machines can do checking, but they can’t do testing, because they’re missing the ability to evaluate. Here, I don’t mean “evaluation” in the sense of performing a calculation and setting a bit. I mean evaluation in the sense of making a determination about what people value; what they might choose or prefer.

In the second way of interpreting the question, automating exploratory testing is impossible—but using automation as part of an exploratory process is entirely possible. Moreover, it can be exceedingly powerful, about which more below.

I see a general perception among junior testers (even among ignorant seniors) that in exploratory testing, there are no scripts (read test cases) to follow but first version of the definition i.e. “simultaneous test design, test execution, and learning” talks about test design also, which I have been following by writing basic test cases, building my understanding and then observing the application’s behavior once it is done, I move back to update the test cases and this continues till stakeholders agree with state of the application.

Please guide if it is what you call exploratory testing or my understanding of exploratory testing needs modifications.

That is an exploratory process, isn’t it? Let’s use the rubric of Kaner’s definition: it’s a style of working; it emphasizes your freedom and responsibility; it’s focused on optimizing the quality of your work; it treats design, execution, interpretation, and learning in a mutually supportive way; and it continues throughout the project. Yet it seems that the focus of what you’re trying to get to is a set of checks. Automation-assisted exploration can be very good for that, but it can be good for so much more besides.

So, modification? No, probably not much, so it seems. Expansion, maybe. Let me give you an example.

A while ago, I developed a program to be used in our testing classes. I developed that program test-first, creating some examples of input that it should accept and process, and input that it should reject. That was an exploratory process, in that I designed, executed, and interpreted unit checks, and I learned. It was also an automated process, to the degree that the execution of the checks and the aggregating and reporting of results was handled by the test framework. I used the result of each test, each set of checks, to inform both my design of the next check and the design of the program. So let me state this clearly:

Test-driven development is an exploratory process.

The running of the checks is not an exploratory process; that’s entirely scripted. But the design of the checks, the interpretation of the checks, the learning derived from the checks, the looping back into more design or coding of either program code or test code, or of interactive tests that don’t rely on automation so much: that’s all exploratory stuff.

The program that I wrote is a kind of puzzle that requires class participants to test and reverse-engineer what the program does. That’s an exploratory process; there aren’t scripted approaches to reverse engineering something, because the first unexpected piece of information derails the script. In workshopping this program with colleagues, one in particular—James Lyndsay—got curious about something that he saw. Curiosity can’t be automated. He decided to generate some test values to refine what he had discovered in earlier exploration. Sapient decisions can’t be automated. He used Excel, which is a powerful test automation tool, when you use it to support testing. He invented a couple of formulas. Invention can’t be automated. The formulas allowed Excel to generate a great big table. The actual generation of the data can be automated. He took that data from Excel, and used the Windows clipboard to throw the data against the input mechanism of the puzzle. Sending the output of one program to the input of another can be automated. The puzzle, as I wrote it, generates a log file automatically. Output logging can be automated. James noticed the logs without me telling him about them. Noticing can’t be automated. Since the program had just put out 256 lines of output, James scanned it with his eyes, looking for patterns in the output. Looking for specific patterns and noticing them can’t be automated unless and until you know what to look for, BUT automation can help to reveal hitherto unnoticed patterns by changing the context of your observation. James decided that the output he was observing was very interesting. Deciding whether something is interesting can’t be automated. James could have filtered the output by grepping for other instances of that pattern. Searching for a pattern, using regular expressions, is something that can be automated. James instead decided that a visual scan was fast enough and valuable enough for the task at hand. Evaluation of cost and value, and making decisions about them, can’t be automated. He discovered the answer to the puzzle that I had expressed in the program… and he identified results that blew my mind—ways in which the program was interpreting data in a way that was entirely correct, but far beyond my model of what I thought the program did.

Learning can’t be automated. Yet there is no way that we would have learned this so quickly without automation. The automation didn’t do the exploration on its own; instead, automation super-charged our exploration. There were no automated checks in the testing that we did, so no automation in the record-and-playback sense, no automation in the expected/predicted result sense. Since then, I’ve done much more investigation of that seemingly simple puzzle, in which I’ve fed back what I’ve learned into more testing, using variations on James’ technique to explore the input and output space a lot more. And I’ve discovered that the program is far more complex than I could have imagined.

So: is that automating exploratory testing? I don’t think so. Is that using automation to assist an exploratory process? Absolutely.
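One step in that story is plainly mechanical: searching a log for a pattern. A couple of lines of Ruby are enough for that part; the log name and the pattern below are invented. The searching can be automated; the noticing and the deciding cannot.

```ruby
# Sketch: scan a log for lines matching a pattern and tally what was matched.
pattern = /accepted: (\d+)/
tally   = Hash.new(0)

File.foreach("puzzle.log") do |line|
  if (m = pattern.match(line))
    tally[m[1]] += 1
  end
end

tally.sort_by { |_value, count| -count }.first(10).each do |value, count|
  puts format("%6d  %s", count, value)
end
```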

For a more thorough treatment of exploratory approaches to automation, see

Investment Modeling as an Exemplar of Exploratory Test Automation (Cem Kaner)

Boost Your Testing Superpowers (James Bach)

Man and Machine: Combining the Power of the Human Mind with Automation Tools (Jonathan Kohl)

“Agile Automation” an Oxymoron? Resolved and Testing as a Creative Endeavor (Karen Wysopal)

…and those are just a few.

Thank you, Rahul, for the question.

Questions from Listeners (2): Is Unit Testing Automated?

Monday, June 28th, 2010

On April 19, 2010, I was interviewed by Gil Broza.  In preparation for that interview, we solicited questions from the listeners, and I promised to answer them either in the interview or in my blog.  Here’s the second one.

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

This question provides a great opportunity to look at a number of points—so many that I’d like to address only the first sentence in the question this time around. I’ll look at the second part of the question later on.

Expansive Definitions

I find the most helpful definitions and descriptions to be those that are expansive and inclusive. While testing, one big risk is that I might have narrow ideas about certain risks or threats to the value of the product. Thinking expansively helps me to avoid tunnel vision that would lead to my missing important problems. In conversations, thinking expansively helps me to remain alert to the possibility that the other person and I might be talking at cross-purposes. That can happen when one of us uses a word that means different things to each of us. It can also happen when we’re thinking of the same thing, but using different words. In fact, as Jerry Weinberg once remarked to James Bach, “A tester is someone who knows that things can be different.” Here’s an example of that. The questioner says that “unit testing is automated”. I’d argue that this refers to one part of testing, test execution, the part we can automate. Well, to me, things can be different.

Testing Includes Many Activities

Testing includes not only test execution, but also test design, learning, and reporting, all performed in cycles or loops. What is test design? As we say in the Rapid Software Testing course notes, test design includes

  • modeling the test space (that is, considering questions of what we could test; what’s in scope);
  • determining oracles (that is, figuring out the principles or mechanisms by which we’d recognize a problem, and considering how those principles or mechanisms might fail to help us recognize a problem)
  • determining coverage (that is, how much testing we’re going to do, given the scope)
  • determining procedures (how we’re going to perform the tests; how we’ll go about the business of test execution)

Test execution includes

  • configuring the product (obtaining it, setting it up for the purposes of a given test)
  • operating the product (exercising the product in some way to obtain coverage)
  • observing the product (applying the oracles that we’ve determined in advance, but also recognizing behaviours that trigger us to recognize and apply new oracles)
  • evaluating the product (comparing its behaviour to our oracles)
  • applying a stopping heuristic (deciding when the test is done)
Test execution may or may not include reporting, but reporting happens at some point. And when testing is being done well, learning is happening pretty much all the time. This isn’t a strictly linear process, by the way. Depending on your approach to testing, and depending on what you’re doing, these things may happen in the order that you see above, or they may happen all at once in an organic tangled ball, with lots of tight little loops. Sometimes all of the elements of testing are done by the same person, and the elements interact with each other very quickly. Sometimes one person designs a test and another person handles the execution, in which case the loops will be long or broken. If you separate test design and test execution (as happens in scripted testing), you separate the learning associated with each. Sometimes we’ll evaluate a result and stop a test; sometimes we’ll stop first and then interpret what we’ve seen. For a given test, some aspects may take much longer than others; some may be done more consciously or thoughtfully than others. But at some point in pretty much every test, each of the steps above happens.

Unit Testing Includes Many Activities

Like any other kind of testing, unit testing consists of cycles of design, execution, learning, and reporting. Like any other test, a unit test starts with some person having a test idea, a question that we want to ask about the program. A person designing a unit test typically frames that question in terms of a check—an observation linked to a decision rule such that both can be performed by a machine. The person writes program code to express that yes-or-no question, usually assisted by some kind of unit testing framework. Next, some person—or, more often, some process that a person has initiated—performs the checks. The check produces a result. Sometimes a person observes that result independently of other results; more often, some person (the author of the automation framework) has programmed a mechanism that provides a means of aggregating the results. Then some person interprets the aggregated results and figures out what needs to be done next—whether everything is okay, whether a test result suggests that the product should be revised, or whether the check is excellent or wanting or broken or irrelevant. And then the development cycle continues, in a loop that includes some development of the actual product too.
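A minimal sketch of the kind of check described above, using Minitest (bundled with Ruby). The function under test is invented; what matters is the shape: a yes-or-no question that a machine can ask and answer, designed and interpreted by a person.

```ruby
require "minitest/autorun"

# An invented example function: is a given year a leap year?
def leap_year?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

class LeapYearTest < Minitest::Test
  def test_ordinary_leap_year
    assert leap_year?(2016)
  end

  def test_century_is_not_a_leap_year
    refute leap_year?(1900)
  end

  def test_four_hundred_year_exception
    assert leap_year?(2000)
  end
end
```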

Most Parts of Unit Testing Are Sapient, Not Mechanical

Notice how many times the word “person” appears in the above description of unit testing. None of the steps in the process (with the exception of the running of the checks) can be automated, since each step requires a thinking person, rather than a machine, to seek information, to make decisions, and to control the overall process. Parts of unit testing can be assisted by automation, but the automation isn’t doing anything particularly on its own; it remains an extension of the person’s ability to execute and to observe.

What form might unit test automation take? Many people think in terms of a testing framework that sets up some conditions, executes some code from the product under test, and makes some assertions about the output of some function or some aspect of the state of the system. That’s cool, and quite powerful. But for years at Quarterdeck, I watched programmers doing unit testing (and did some myself) by stepping through code under various debuggers (DEBUG, SYMDEB, WDEB386, or Soft-ICE, a software-based simulacrum of an in-circuit emulator), watching the registers and the ports for each instruction. Sometimes I’m writing some stuff in Ruby, and I want to do a quick little test of a fairly trivial function that I know I’m going to throw away. In that case, I don’t bother with the testing framework; I run the code and inspect the variables in IRB, the Ruby interpreter, and get my information that way. Sometimes I write a function, and generate some data to test it using automation. Sometimes, while unit testing, I use tools to examine the contents of a database table or a file or the Windows registry. Are all these different things unit testing? Jerry Weinberg says that testing is “gathering information with the intention of informing a decision”. I’m testing a unit, and I’m using automation to assist that testing, even though (so it seems) people tend to hold a more narrow view of what unit testing is. Unit testing is testing done at the unit level.
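The “quick look in IRB” style mentioned above might look like this for a throwaway helper (the helper is invented); the comments show the values you would see at the prompt.

```ruby
# Sketch: poke at a trivial function interactively and eyeball the results.
def squeeze_spaces(s)
  s.gsub(/\s+/, " ").strip
end

squeeze_spaces("  too   many   spaces ")   #=> "too many spaces"
squeeze_spaces("")                         #=> ""
squeeze_spaces(" \t\n ")                   #=> ""
```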

Is stepping through the code the way that we should always do unit testing? Of course not. For the purpose of creating easily-runnable change detectors, the unit test framework is the way to go. Yet different approaches, tools, and techniques that we employ allow us to observe in different ways, discover different problems, and learn different things about the unit under test.

Finally, it’s important to note that the development of unit-level checks tends to reveal more problems than the running of them. Chip Groeder won a best paper award at the STAR conference in 1997 for a paper in which he claimed that 88% of the bugs that he found with automated tests were found during development of the tests (that is, the non-automated parts of the testing). (Thanks to Cem Kaner for pointing me to this.) Anecdotally, everyone that I speak to who uses automation for the execution of tests—whether at the unit level or not—says exactly the same thing. That’s not to say that automated checks are useless. On the contrary; checks, as change detectors, are very useful. Instead, my point is that unit testing is not automated; not the interesting parts. Unit checking is automated.

In summary:

  • Unit testing is a highly exploratory process, in that the loops are short, tightly integrated, and typically performed by the same person.
  • The most important parts of unit testing are the sapient parts—the design, programming, design of reports, interpretation of results, and the evaluation of what to do next.
  • The scripted part of unit testing—the execution of the checks—is the least interesting part of unit testing. And yet…
  • Many people seem to be fascinated by the mechanical parts, dazzled by lines on the screen, blissful upon observation of the green bar. And the same people say things like “unit testing is automated”. Why is that?

That’s a lot for now. I’ll answer the rest of the question in a future post.

Heuristics and Leadership

Friday, May 28th, 2010

In a recent blog post, James Bach discusses the essence of heuristics. A heuristic is a fallible method for solving a problem or making a decision. When used as an adjective, “heuristic” means fallible and conducive to learning. James ends the post by introducing a number of questions in order to test whether someone is teaching you a heuristic effectively. Meeta Prakash, in the comments, remarks “Your questions sound so much like my idea and interpretation of a ‘leader’.”

Yes, Meeta, the asking of those questions sounds like leadership to me too. What is leadership? Leadership is “creating an environment in which everyone is empowered” (that’s Jerry Weinberg’s definition). We believe in that. We believe that excellent testing starts with the skill set and the mindset of the individual tester. Other things might help, but excellence in testing must centre on the tester.

Discussion of the Method is Billy Vaughn Koen’s book on engineering. He describes the engineering method as “the use of heuristics to cause the best change in a poorly understood situation within the available resources”. He notes that the individual engineer, the engineering organization in which he works, and the engineering discipline overall each has a state of the art, which he abbreviates as “sota”. These sotas overlap one another in some places and cover new ground elsewhere. Each leads and lags the others in certain areas. The overall discipline has a sota that is more advanced than that of the organization and the individual in some places. The organization’s sota is aware of things that neither the individual nor the discipline has yet recognized. Each individual’s sota contains some knowledge that is unknown to both the organization and the discipline.

Each sota may advance before that of the other two, and the advances are ongoing. Each sota evolves in a different context. For each stage of evolution, we can’t be certain about what factors might be relevant to success or failure. Thus no practice nor method nor approach can be deemed to be best. We don’t know, can’t know, whether some method is best; we don’t know, can’t know, whether someone might invent a better method. We don’t know, can’t know, whether someone has already invented a better method. What we can do, however, is to create environments in which everyone is empowered to discover and apply new heuristics along with those that we already know. That’s important because the discipline and the organization never advance on their own. They advance when someone has the inspiration, the initiative, the courage, and the opportunity to try something new, uncertain as to whether the new approach will work.

Organizations and individuals can foster initiative by empowering people (or at the very least, by leaving them alone). The overall discipline tends not to encourage initiative much, so it seems. With prescriptive, restrictive standards, disciplines often discourage innovation. Why would disciplines do this? One reason is that disciplines are typically led by experts who, as McLuhan said, are heavily invested in their own expertise, and therefore resist change that would threaten that investment. Propaganda and the threat of unemployment are among the crude tools that experts wield. Want evidence for that in testing? You need look no further than the “experts”, the certifiers, and the standards enthusiasts in our craft; their narrow and flawed models of evidence and measurement; their intolerance of uncertainty; and their resistance to acknowledging the exploratory mindset. Yet without exploration, progress in any domain ceases while the rest of the world rushes past.

“All is heuristic,” declares Koen. That’s an absolute statement which appears to declare that there are no absolutes. But rather than denying the paradox, Koen embraces it. All is heuristic, he says, including the heuristic that all is heuristic. The notion that all is heuristic is pretty robust. Koen points out that even algorithms have contexts in which they work and contexts in which they fail, and that algorithms must be chosen by people applying judgement and skill in uncertain conditions. Yet he leaves open the possibility that someone, somewhere, some day, might discover an infallible method for solving a problem.

James, our colleagues, and I deal with a similar paradox when we contend that all testing is heuristic. There’s no test for infallibility! It might be that there are some occasions when there are infallible methods for testing. It’s just that we’ve never seen one, and that we can’t currently imagine a case in which a process or a standard could—without the application of judgement or skill—be guaranteed to solve some testing problem. That’s because skilled testers (let’s call them “we”, without claiming expertise, but asserting that we are students of the craft) have specific advantages over process models, methods, and tools (“they”):

  • We have situational awareness (they don’t).
  • We work from the assumption that we’re fallible (they don’t).
  • We have the capacity to make judgements on questions of cost vs. value (they don’t).
  • We have the ability to apply a stopping heuristic at any time (they don’t).
  • We have the intelligence to choose which heuristics are applicable and which are not (they don’t).
  • We have the opportunity to consult, in real time, with our clients (they don’t).
  • We have the inventiveness to work around a problem (they don’t).
  • We have the sensory apparatus to determine when a heuristic is failing (they don’t).
  • We have the humanity to notice something unexpected that might be a problem for people (they don’t).
  • We have the capacity to learn (they don’t).

And that’s only a partial list. A couple of notes: every one of these capabilities is itself heuristic; and it should be clear that I’m not talking only about skilled testers, but about any skilled discipline.

It’s not that bodies of knowledge or process models or standards never have anything interesting to say (although, in my experience, most of them do tend to be written in a style that induces immediate and profound slumber). It’s that none of these tools have any intrinsic relevance or value unless and until they are applied by people. If the tools are to be applied effectively, we need to recognize that they are all heuristic. We must test these heuristics and their validity with questions like the ones James poses. We must also recognize that the people applying them must have the skills to know how, when, and when not to apply them. To develop those heuristics and those skills, we need to create an environment in which everyone is empowered. That is what we believe.

I Update My Blog and Discover Testing Tools

Sunday, March 21st, 2010

For the last few weeks, I’ve been updating my blog and my Web site. This was inspired largely by Blogger’s decision to drop support for blog publishing via FTP. That would mean moving the blog to a .blogspot.com site, or to a custom domain that wouldn’t be developsense.com or a subdomain of it (later: not a subdomain, but a subfolder of http://www.developsense.com). Ugh. Many of my colleagues have taken to using WordPress, and I’ve been admiring the look and feel and features of their blogs, so off I went. Making the conversion has been a little arduous, but that’s largely because I’ve done a few things in addition to the conversion: I’ve made the blog look much more like the rest of my site, fixed a number of problems, and added a number of new features.

Along the way, I got quite a bit of help from a number of online resources and tools that I feel are worth mentioning, especially for testers who seek to learn about some of the underlying technologies.

W3 Schools (http://www.w3schools.com). This site offers tutorials and references for most of the important Web technologies, including HTML, XHTML, CSS, PHP, and plenty more. One of the coolest things about the W3Schools site is its ability to provide you with interactive examples via the TryIt editor: in one pane, you type text; in the other, you see the effects immediately. Short feedback loops are a great way of learning, for me.

Rubular (http://www.rubular.com). This handy online tool focuses on regular expressions. Like TryIt, it allows you to experiment with regular expressions and see their effects immediately. When you make a mistake, it’s easy to do experiments that explain it.
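The kind of quick experiment Rubular invites can also be run straight from Ruby itself (Rubular is, after all, a Ruby regular expression editor). The log line and pattern below are made up for illustration.

```ruby
# Sketch: try a pattern against a sample line and inspect the captures.
log_line = "2010-03-21 14:02:17 GET /blog/feed 200"
pattern  = /(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (\S+) (\d{3})/

if (m = pattern.match(log_line))
  p m.captures   # => ["2010-03-21", "14:02:17", "GET", "/blog/feed", "200"]
end
```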

Web Developer Toolbar for Firefox (https://addons.mozilla.org/en-US/firefox/addon/60). I found this browser tool indispensable for figuring out gnarly (and largely self-imposed) CSS problems. Among many other things, it allows you to trace the trail of styles that apply to a particular element in the browser window. Once you’ve done that, you can review the style sheets that are being applied to a page, and edit the styles on the fly. This is to CSS what a really good debugger is to other kinds of code. I now find it really easy to figure out problems not only on my own site, but also on other people’s sites, and I can perform experiments that test out possible solutions.

CSS Based Design (http://adactio.com/articles/1109/). This article by Jeremy Keith (who happens to be the fellow behind The Session http://www.thesession.org, a wonderful Irish traditional music resource) is so old, by Web standards, that it might as well have been written on stone tablets. But it’s also as direct, clear, and authoritative as other stuff written on stone tablets. It also provides the clearest and simplest explanation of margins, borders, and padding that I’ve been able to find.

BrowserShots.org (http://www.browsershots.org). Want to know what your page looks like on another browser? Want to know what your page looks like on another 46 browsers? Pass the address to BrowserShots, wait a while, and you’ll get to see the page on (as of this writing) up to 47 different browsers. (Admittedly, this begs the question as to why there are 47 different browsers, but I digress.) It’s a free and popular service, so there is a queue. Submit your page, go off and do something else.

The WordPress Codex (http://code.wordpress.org). This one is of less general utility, but if you’re setting up or troubleshooting a WordPress blog, it’s indispensable.

As for offline tools, the hands-down award winner is TextPad. I registered my first copy of it in 1997. It probably has the highest value-to-cost ratio of any software product that I’ve ever purchased.