Blog Posts for the ‘Regression’ Category

If We Do Sanity Testing Before Release, Do We Have To Do Regression Testing?

Monday, December 3rd, 2018

Here is an edition of the reply I offered to a question that someone asked on Quora. Bear in mind that it might be a good idea to follow the links for context.

If we do sanity testing before release, do we have to do regression testing?

What if I told you Yes? What if I told you No?

Some questions shouldn’t be answered. That is: some questions shouldn’t be answered with a Yes or a No without addressing the context first. No one can give you a good answer to your question unless they know you, your product, and your project’s context.

Even after that problem is addressed, people outside your context may not know what you mean by regression testing or sanity testing, and you can’t be sure of knowing what they mean. That applies to other terms in the conversation, too; maybe they’ll talk about “manual testing”; I don’t believe there’s such a thing as “manual testing”. Maybe you agree with them now; maybe you’ll agree with me after you’ve read the linked post. Or maybe after you read this one.

Some people will suggest that regression testing and sanity testing are fundamentally different somehow; I’d contend that a sanity test may be a shallow form of regression testing, when the sanity test is what I’ve talked about here, and when regression testing is testing focused on regression- or change-related risk. In order to sort that out, you’d have to talk it through to make sure that you’re not in shallow agreement.

Nonetheless, some people will try to answer your question. To prepare you for some of those answers: it’s probably not very helpful to think about needing to do one kind of testing or the other. It’s probably more helpful to think in terms of what you and your organization want to do, and choosing what to do based on what (you believe) you know about your product, and what (you believe) you want to know about it, given the situation.

While this is not an exhaustive list, here are a few factors to consider:

  • Do you and the developers already have a lot of experience with your product?
  • Is your product being developed in a careful, disciplined way?
  • Are the developers making small, simple, incremental changes whose risks they comprehend well?
  • Is the product relatively well insulated from dependencies on platforms (hardware, operating systems, middleware, browsers…) that vary a lot?
  • Are there already plenty of unit-level checks in place, such that the developers are likely to be aware of low-level coding errors early and easily?
  • Is it unusual to do a shallow pass through the features of the product and find bugs that stick out like sore thumbs?
  • Do you and the developers feel like you're working at a sustainable pace?

If the answer to all of those questions is Yes, then maybe your regression testing can afford to be more focused on deep, rare, hidden, subtle, emergent problems, which are unlikely to be revealed by a sanity test. Or maybe your product (or a given feature, or a given change, or whatever you’re focused on) entails relatively low risk, so deep regression testing isn’t necessary and a sanity test will do. Or maybe your product is poorly-understood and has changed a lot, so both sanity checking and deep regression testing could be important to you.

I can offer things for you to think about, but I don’t think it’s appropriate for me or anyone else to answer your question for you. The good news is that if you study testing seriously, practice testing, and learn to test, you’ll be able to make this determination in collaboration with your team, and answer the question for yourself.

James Bach and I teach Rapid Software Testing to help people to become smart, powerful, helpful, independent testers, with agency over their work. If you want help with learning about Rapid Software Testing for yourself or for your team, find out how you can attend a public class, live or on line, or request one in-house.

Exploratory Testing on an API? (Part 4)

Wednesday, November 21st, 2018

As promised, (at last!) here are some follow-up notes on previous installments in the series that starts here.

Let’s revisit the original question:

Do you perform any exploratory testing on APIs? How do you do it?

To review: there’s a problem with the question. Asking about “exploratory testing” is a little like asking about “vegetarian cauliflower”, “carbon-based human beings”, or “metallic copper”. Testing is fundamentally exploratory. Testing is an attempt by a self-guided agent to discover something unknown; to learn something about the product, with a special focus on finding problems and revealing risk.

One way to learn about the product is to develop and run a set of automated checks that access the product via the API. That is, a person writes code to direct the machine to operate and observe the product, apply decision rules, and report the outputs. This produces a script, but producing a script is not a scripted process! Learning about the product, designing the checks, and developing the code to implement those checks are all exploratory processes. Interpreting and explaining and reporting on what happened are exploratory processes too.
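As a sketch of that idea, here is a tiny check of the kind a person might write. The `fetch_rate` function, its endpoint, and its fields are hypothetical stand-ins for a real call to a product's API (via an HTTP client, say); the point is the shape of a check: operate the product, apply a decision rule, report an output.

```python
# A minimal sketch of an automated check against an API.
# 'fetch_rate' stands in for a real call to the product's API;
# the endpoint and field names here are hypothetical.

def fetch_rate(base, quote):
    # Stand-in for something like:
    #   requests.get(f"https://example.test/rates/{base}/{quote}").json()
    return {"base": base, "quote": quote, "rate": 1.37}

def check_rate_is_positive(base, quote):
    """Operate the product, apply a decision rule, report an output."""
    response = fetch_rate(base, quote)
    rate = response.get("rate")
    passed = isinstance(rate, (int, float)) and rate > 0
    return {"check": "rate_is_positive", "passed": passed, "observed": rate}

result = check_rate_is_positive("CAD", "USD")
print(result)  # → {'check': 'rate_is_positive', 'passed': True, 'observed': 1.37}
```

Designing that decision rule, and deciding what to do when it reports "failed", remain human, exploratory work.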

When we use a machine to execute automated checks, there's no discovery or learning involved in the performance of the check itself; the machine is doing the executing, and the machine doesn't discover or learn anything. We do.

Checks are like the system on your car that monitors certain aspects of your engine and illuminates the “check engine” light. Checks are, in essence, tools by which people become aware of specific conditions in the product. The machine learns no more than the dashboard light learns. It’s the humans, aided by tools that might include checking, who learn.

People learn as they interpret outcomes of the checks and investigate problems that the checks seem to indicate. The machinery has no way of knowing whether a reported "failed" output represents a problem in the product or a problem in the check. Checks don't know the difference between "changed" and "broken", between "different" and "fixed", between "good" and "bad". The machinery also has no way of knowing whether a reported "passed" output really means that the product is trouble-free; a problem in the check could be masking a problem in the product.

Testing via an API is exploratory testing via an API, because exploratory testing is, simply, testing. (Exploratory) testing is not simply "acting like a user", "testing without tools", or "a manual testing technique".

Throughout this whole series, what I’ve been doing is not “manual testing”, and it’s not “automated testing” either. I use tools to get access to the product via the API, but that’s not automated testing. There is no automated testing. Testing is neither manual nor automated. No one talks about “automated” or “manual” experiments. No one talks about “manual” or “automated” research. Testing is done neither by the hands, nor by the machinery, but by minds.

Tools do play an important role in testing. We use tools to extend, enhance, enable, accelerate, and intensify the testing we do. Tools play an important role in programming too, but no one refers to "manual programming". No one calls compiling "automated programming". Compiling is something that a machine can do; it is a translation between one set of strings (source code) and another set of strings (machine code). This is not to dismiss the role of the compiler, or of compiler writers; indeed, writing a sophisticated compiler is a job for an advanced programmer.

Programming starts in the complex, imprecise, social world of humans. Designers and programmers refine messy human communication into a form that's so orderly that a brainless machine can follow the necessary instructions and perform the intended tasks. Throughout the development process, testers explore and experiment with the product to find problems, to help the development team and the business to decide whether the product they've got is the product they want. Tools can help, but none of these processes can be automated.

In the testing work that I’ve described in the previous posts, I haven’t been “testing like a user”. Who uses APIs? It might be tempting to answer “application programs” (it’s an application programming interface, after all), or “machines”. But the real users of an API are human beings. These include the direct users—the various developers who write code to take advantage of what a product offers—and the indirect users—the people who use the products that programmers develop. For sure, some of my testing has been informed by ideas about actions of users of an API. That’s part of testing like a tester.

In several important ways, APIs offer a lot of opportunity for testability. Very generally, components and services with APIs tend to be of a smaller scale than entire applications, so studying and understanding them can be much more tractable. An API is deterministic and machine-specific. That means that certain kinds of risks due to human variability are of lower concern than they might be through a GUI, where all kinds of things can happen at any time.

The API is by definition a programming interface, so it’s natural to use that interface for automated checking. You can use validator checks to detect problems with the syntax of the output, or parallel algorithms to check the semantics of transactions through an API.
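Those two ideas can be sketched: a validator check on the syntax of an API's output, and a parallel algorithm as an oracle for its semantics. The endpoint, field names, and tax rate here are hypothetical, and the raw response is hard-coded where a real check would fetch it from the product.

```python
import json

# Two kinds of checks on an API's output (hypothetical tax-calculation
# endpoint; the response string stands in for a real fetch).

def validate_syntax(raw):
    """Validator check: is the output well-formed JSON with the
    fields and types we expect?"""
    try:
        data = json.loads(raw)
    except ValueError:
        return False
    return (isinstance(data.get("subtotal"), (int, float))
            and isinstance(data.get("tax"), (int, float))
            and isinstance(data.get("total"), (int, float)))

def parallel_oracle(data, tax_rate=0.13):
    """Parallel-algorithm check: compute the expected result
    independently, and compare it to the product's answer."""
    expected_tax = round(data["subtotal"] * tax_rate, 2)
    return abs(data["tax"] - expected_tax) < 0.005

raw_response = '{"subtotal": 100.00, "tax": 13.00, "total": 113.00}'
print(validate_syntax(raw_response))                # → True
print(parallel_oracle(json.loads(raw_response)))    # → True
```

Note that the parallel algorithm is itself code that can be wrong, in the same way or in different ways from the product; that's one reason a "passed" output deserves some skepticism.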

Once they’re written, it’s easy to repeat such checks, especially to detect regressions, but be careful. In Rapid Software Testing, regression testing isn’t simply repetition of checks. To us, regression testing means testing focused on risk related to change; “going backwards” (which is what “regress” means; the opposite of “progress”).

A good regression testing strategy is focused on what has changed, how it has changed, and what might be affected. That would involve understanding what has changed; testing the change itself; exploring around that; and a smattering of testing of stuff that should be unaffected by the change (to reveal hidden or misunderstood risk). This applies whether you are testing via the API or not; whether you have a set of automated checks or not; whether you run checks continuously or not.

If you are using automated checks, remember that they can help to detect unanticipated variations from specified results, but they don't show that everything works, and they don't show that nothing has broken. Instead, checks verify that outputs from given functions are consistent from one build to the next. That's only one kind of oracle. Do not simply confirm that everything is OK; actively search for problems. Explore around. Are all the checks passing? Ask "What else could go wrong?"

Automated checks can take on special relevance when they're in the form of contract checks. The idea here is to solicit sets of inputs and outputs from actual consumers of an API that represent specified, desired queries and results, and to agree to check the contract periodically from both the supplier and consumer ends. Because they're agreed upon by both parties, contract checks are really very good for first-order functional checking. Nonetheless, remember that such checks are heavily focused on confirmation, and not on discovery of problems and risks that aren't covered by the contracts. Testing doesn't stop with the contract checks passing; it starts there.
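One way to sketch that idea, assuming a hypothetical user-status endpoint: the contract is a list of agreed request/response examples, and either party can replay it against their own end. The `provider` function here is a stand-in for a real call to the supplier's API.

```python
# Sketch of a consumer-driven contract check. The contract is a set of
# example request/response pairs agreed between consumer and supplier;
# both can replay it periodically. Endpoint and fields are hypothetical.

CONTRACT = [
    {"request": {"user_id": 42},
     "response": {"user_id": 42, "status": "active"}},
]

def provider(request):
    # Stand-in for the supplier's real API endpoint.
    return {"user_id": request["user_id"], "status": "active"}

def check_contract(contract, call):
    """Replay each agreed example; report any mismatches."""
    failures = []
    for example in contract:
        actual = call(example["request"])
        if actual != example["response"]:
            failures.append({"expected": example["response"],
                             "actual": actual})
    return failures

print(check_contract(CONTRACT, provider))  # → [] (every example matched)
```

An empty report here confirms only what the contract anticipates; everything outside those examples is still terra incognita.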

That is, now that you’ve gone to the trouble of writing code to check for specific outputs, why stop there? I’ve used checks in an exploratory way by:

  • varying the input systematically to look for problems related to missing or malformed data, extreme values, messed-up character handling, and other foreseeable data-related bugs;
  • varying the input more randomly (“fuzzing” is one instance of this technique), to help discover surprising hidden boundaries or potential security vulnerabilities;
  • varying the order and sequences of input, to look for subtle state-related bugs;
  • writing routines to stress the product, pumping lots of transactions and lots of data through it, to find performance-related bugs;
  • capturing data (like particular values) or metadata (like transaction times) associated with the checks, visualizing it, and analyzing it, to see problems and understand risks in new ways.
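The first two bullets above might be sketched like this, with a stand-in `parse_quantity` function in place of a real call through the product's API. The pattern is what matters: probe with systematically and randomly varied input, tolerate anticipated rejections, and flag anything surprising for investigation.

```python
import random
import string

# Sketch of using a check in an exploratory way: vary the input and
# investigate anything other than the behaviour we anticipate.
# 'parse_quantity' is a hypothetical stand-in for the product.

def parse_quantity(text):
    # Stand-in for product behaviour: parse a positive integer quantity.
    value = int(text)
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value

def probe(inputs):
    """Run each input; record anything other than the errors we expect."""
    surprises = []
    for item in inputs:
        try:
            parse_quantity(item)
        except ValueError:
            pass                      # anticipated rejection
        except Exception as error:    # anything else is worth investigating
            surprises.append((item, repr(error)))
    return surprises

systematic = ["1", "0", "-1", "", " 7 ", "999999999999"]
random.seed(1)
fuzzed = ["".join(random.choices(string.printable, k=8)) for _ in range(100)]
print(probe(systematic + fuzzed))  # an empty list means nothing surprising turned up
```

Each surprise is a prompt for investigation, not automatically a bug; deciding which is which is the tester's job.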

A while back, Peter Houghton told me an elegant example of using checking in exploration. Given an API to a component, he produces a simple script that calls the same function thousands of times from a loop and benchmarks the time that the process took. Periodically he re-runs the script and compares the timing to the first run. If he sees a significant change in the timing, he investigates. About half the time, he says, he finds a bug.
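A minimal sketch of that technique, with a trivial stand-in for the component's function; in real use the baseline would be saved from the first run and compared against later runs.

```python
import time

# Sketch of the benchmarking technique described above: call the same
# function many times, time the loop, and compare against a baseline.
# 'function_under_test' is a stand-in for a real call via the API.

def function_under_test(x):
    return sum(i * x for i in range(100))

def benchmark(fn, iterations=10000):
    start = time.perf_counter()
    for _ in range(iterations):
        fn(7)
    return time.perf_counter() - start

baseline = benchmark(function_under_test)   # in practice, saved from the first run
current = benchmark(function_under_test)
# A large swing relative to the baseline is a prompt to investigate,
# not proof of a bug; a threshold like 2x is a judgment call.
if current > 2 * baseline or current < baseline / 2:
    print("significant timing change -- investigate")
```

The check itself is cheap and confirmatory; the value comes from the human who notices the change and goes looking for the reason.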

So, to sum up: all testing is exploratory. Exploration is aided by tools, and automated checking is an approach to using tools. Investigation of the unknown and discovery of new knowledge are the essence of exploration. We must explore to find bugs. All testing on APIs is exploratory.

Very Short Blog Posts (17): Regression Obsession

Thursday, April 24th, 2014

Regression testing is focused on the risk that something that used to work in some way no longer works that way. A lot of organizations (Agile ones in particular) seem fascinated by regression testing (or checking) above all other testing activities. It’s a good idea to check for the risk of regression, but it’s also a good idea to test for it. Moreover, it’s a good idea to make sure that, in your testing strategy, a focus on regression problems doesn’t overwhelm a search for problems generally—problems rooted in the innumerable risks that may beset products and projects—that may remain undetected by the current suite of regression checks.

One thing for sure: if your regression checks are detecting a large number of regression problems, there’s likely a significant risk of other problems that those checks aren’t detecting. In that case, a tester’s first responsibility may not be to report any particular problem, but to report a much bigger one: regression-friendly environments ratchet up not only product risk, but also project risk, by giving bugs more time and more opportunity to hide. Lots of regression problems suggest a project is not currently maintaining a sustainable pace.

And after all, if a bug clobbers your customer’s data, is the customer’s first question “Is that a regression bug, or is that a new bug?” And if the answer is “That wasn’t a regression; that was a new bug,” do you expect the customer to feel any better?

Related material:

Regression Testing (a presentation from STAR East 2013)
Questions from Listeners (2a): Handling Regression Testing
Testing Problems Are Test Results
You’ve Got Issues

Questions from Listeners (2a): Handling Regression Testing

Saturday, August 7th, 2010

This is a followup to an earlier post, Questions from Listeners (2): Is Unit Testing Automated? The original question was

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

Now I’ll deal with the second part of the question.

Part One: What Do We Really Mean By “Automation”?

Some people believe that “automation” means “getting the computer to do the testing”. Yet computers don’t do testing any more than compilers do programming, than cruise control does driving, than blenders do cooking. In Rapid Software Testing, James Bach and I teach that test automation is any use of tools to support testing.

When we perform tests on a running program, there's always a computer involved, so automation is always around to some degree. We can use tools to help us configure the program, to help us observe some aspect of the program as it's running, to generate data, to supply input to the program, to monitor outputs, to parse log files, to provide an oracle against which outputs can be compared, to aggregate and visualize results, to reconfigure the program or the system,… In that sense, all tests can be automated.

Some people believe that tests can be automated. I disagree. Checks can be automated. Checks are a part of an overall program of testing, and can aid it, but testing itself can't be automated. Testing requires human judgement to determine what will be observed and how it will be observed; testing requires human judgement to ascribe meaning to the test result. Human judgement is needed to ascribe significance to the meaning(s) that we ascribe, and human judgement is required to formulate a response to the information we've revealed with the test. Is there a problem with the product under test? The test itself? The logical relationship between the test and the product? Is the test relevant or not? Machines can't answer those questions. In that sense, no test can be automated.

Automation is a medium. That is, it’s an extension of some human capability, not a replacement for it. If we test well, automation can extend that. If we’re testing badly, then automation can help us to test badly at an accelerated rate.

Part Two: Why Re-run Every Test?

My car is 25 years old. Aside from some soon-to-be addressed rust and threadbare upholstery, it’s in very good shape. Why? One big reason is that my mechanic and I are constantly testing it and fixing important problems.

When I'm about to set out on a long journey in my car, I take it in to Geoffrey, my mechanic. He performs a bunch of tests on the car. Some of those tests are forms of review: he checks his memory and looks over the service log to see which tests he should run. He addresses anything that I've identified as being problematic or suspicious. Some of Geoffrey's tests are system-level tests, performed by direct observation: he listens to the engine in the shop and takes the car out for a spin on city streets and on the highway. Some of his tests are functional tests: he applies the brakes to see whether they lose pressure. Some of his tests are unit tests, assisted by automation: he uses a machine to balance the tires and a gauge to pressure-test the cooling system. Some of his smoke tests are refined by tools: a look at the tires is refined by a pressure gauge; when he sees wear on the tires, he uses a gauge to measure the depth of the tread. Some of his tests are heavily assisted by automation: he has a computer that hooks up to a port on the car, and the computer runs checks that give him gobs of data that would be difficult or impossible for him to obtain otherwise.

When I set out on a medium-length trip, I don’t take the car in, but I still test for certain things. I walk around the car, checking the brake lights and turn signals. I look underneath for evidence of fluid leaks. I fill the car with gas, and while I’m at the gas station, I lift the hood and check the oil and the windshield wiper fluid. For still shorter trips, I do less. I get in, turn on the ignition, and look at the fuel gauge and the rest of the dashboard. I listen to the sound of the engine. I sniff the air for weird smells–gasoline, coolant, burning rubber.

As I'm driving, I'm making observations all the time. Some of those observations happen below the level of my consciousness, only coming to my attention when I'm surprised by something out of the ordinary, like a bad smell or a strange sound. On the road, I'm looking out the window, glancing at the dashboard, listening to the engine, feeling the feedback from the pedals and the steering wheel. If I identify something as a problem, I might ignore it until my next scheduled visit to the mechanic, I might leave it for a little while but still take it in earlier than usual, or I might take the car in right away.

When Geoffrey has done some work, he tells me what he has done, so to some degree I know what he’s tested. I also know that he might have forgotten something in the fix, and that he might not have tested completely, so after the car has been in the shop, I need to be more alert to potential problems, especially those closely related to the fix.

Notice two things: 1) Both Geoffrey and I are testing all the time. 2) Neither Geoffrey nor I repeat all of the tests that we’ve done on every trip, nor on every visit.

When I’m driving, I know that the problems I’m going to encounter as I drive are not restricted to problems with my car. Some problems might have to do with others—pedestrians or animals stepping out in front of me, or other drivers making turns in my path, encroaching on my lane, tailgating. So I must remain attentive, aware of what other people are doing around me. Some problems might have to do with me. I might behave impatiently or incompetently. So it’s important for me to keep track of my mental state, managing my attention and my intention. Some problems have to do with context. I might have to deal with bad weather or driving conditions. On a bright sunny day, I’ll be more concerned about the dangers of reflected glare than about wet roads. If I’ve just filled the tank, I don’t have to think about fuel for another couple hundred miles at least. Because conditions around me change all the time, I might repeat certain patterns of observation and control actions, but I’m not going to repeat every test I’ve ever performed.

Yes, I recognize that software is different. If software were a car, programmers would constantly be adding new parts to the vehicle and refining the parts that are there. On a car, we don't add new parts very often. More typically, old parts wear out and get replaced. Either way, change is happening. After change, we concentrate our observation and testing on things that are most likely to be affected by the change, and on things that are most important. In software, we do exactly the same thing. But in software, we can take an extra step to reduce risk: low-level, automated unit tests that provide change detection and rapid feedback, and which are the first level of defense against accidental breakage. I wrote about that here.

Part Three: Think About Cost, Value, Risk, and Coverage

Testing involves interplay between cost, value, and risk. The risk is generally associated with the unknown—problems that you're not aware of, and the unknown consequences of those problems. The value is in the information you obtain from performing the test, and in the capacity to make better-informed decisions. There are lots of costs associated with tests. Automation reduces many of those costs (like execution time) and increases others (like development and maintenance time). Every testing activity, irrespective of the level of automation, introduces opportunity costs against potentially more valuable activities. A heavy focus on running tests that we've run before—and which have not been finding problems—represents opportunity cost against tests that we've never run, and against problems that won't be found by our repeated tests. A focus on the care and feeding of repeated tests diminishes our responsiveness to new risk. A focus on repetition limits our test coverage.

Some people object to the idea of relaxing attention on regression tests, because their regression tests find so many problems. Oddly, these people are often the same people who trot out the old bromide that bugs that are found earlier are less expensive to fix. To those people, I would say this: If your regression tests consistently find problems, you’ll probably want to fix most of them. But there’s another, far more important problem that you’ll want to fix: someone has created an environment that’s favourable to backsliding.

Regression Testing, part I

Wednesday, September 6th, 2006

More traffic from the Agile Testing mailing list; Grig Gheorghiu is a programmer in Los Angeles who has some thoughtful observations and questions.

I’m well aware of the flame wars that are going on between the ‘automate everything’ camp and the ‘rapid testing’ camp. I was hoping you can give some practical, concrete, non-sweeping-generalization-based examples of how your testing strategy looks like for a medium to large project that needs to ensure that every release goes out with no regressions. I agree that not all tests can be automated. For those tests that are not automated, my experience is that you need a lot of manual testers to make sure things don’t slip through the cracks.

That sounds like a sweeping generalization too. 🙂

I can’t provide you with a strategy that ensures that every release goes out with no regressions. Neither can anyone else.

Manual tests are typically slower to run than automated tests. However, they take almost no development time, and they have diversity and high cognitive value, thus they tend to have a higher chance of revealing new bugs. Manual tests can also reveal regression bugs, especially when they’re targeted for that purpose. Human testers are able to change their tests and their strategies based on observations and choices, and they can do it at an instant.

Automated tests can’t do that. At the end-user, system, and integration level, they tend to have a high development cost. At the unit level, the development cost is typically much lower. Unit tests (and even higher-level tests) are typically super-fast to run and ensure that all the tests that used to pass still pass.

When the discussion about manual tests vs. automated tests gets pathological, it’s because some people seem to miss the point that testing is about making choices. We have an infinite number of tests that we could run. Whatever subset of those tests we choose, and no matter how many we think we’re running, we’re still running a divided-by-infinity fraction of all the possibilities. That means that we’re rejecting huge numbers of tests with every test cycle. One of the points of the rapid testing philosophy is that by making intelligent choices about risks and coverage, we change our approach from compulsively rejecting huge numbers of tests to consciously rejecting huge numbers of tests.

So the overall strategy for regression testing is to make choices that address the regression risk. All that any strategy can do–automated, manual, or some combination of the two–is to provide you with some confidence that there has been minimal regression (where minimal might be equal to zero, but no form of testing can prove that).

Other factors can contribute to that confidence. Those factors could include small changes to the code; well-understood changes, based on excellent investigation of the original bug; smart, thoughtful developers; strolls through the code with the debugger; unit testing at the developer level; automated unit testing; paired developers; community ownership of the code; developers experienced with the code base; readable, maintainable, modular code; technical review; manual and automated developer testing of the fix at some higher level than the unit level; good configuration management. People called “testers” may or may not be involved in any of these activities, and I’m probably missing a few. All of these are filters against regression problems, and many of them are testing activities to some degree. Automated tests of some description might be part of the picture too.

When it comes to system-level regression testing, here are the specific parts of one kind of strategy. As soon as some code, any code, is available, rapid testers will test the change immediately after the repair is done. In addition, they’ll do significant manual testing around the fix. This need not require a whole bunch of testers, but investigative skills are important. Generally speaking, a fix for a well-investigated and well-reproduced bug can be tested pretty quickly, and feedback about success or failure of the fix can also be provided rapidly. We could automate tests at this higher level, but in my experience, at this point it is typically far more valuable to take advantage of the human tester’s cognitive skills to make sure that nothing else is broken. (Naturally, if a tool extends our capabilities in some form, we’ll use it if the cost is low and the value high.)

In an ideal scenario, it’s the unit tests that help to ensure that this particular problem doesn’t happen again. But even in a non-ideal scenario, things tend to be okay. Other filters heuristically capture the regression problems–and when regressions make it as far as human testers, thoughtful people will usually recognize that it’s a serious problem. If that happens, developers and development management tend to make changes farther up the line.

It was ever thus. In 1999, at the first Los Altos Workshop on Software Testing, a non-scientific survey of a roundtable of very experienced testers indicated that regression bugs represented about 15% of the total bugs found over the course of the project, and that the majority of these were found in development or on the first run of the automated tests. Cem Kaner, in a 1996 class that I was attending, mentioned that an empirical survey had noted a regression rate of about 6%. Both he and I are highly skeptical about empirical studies of software engineering, but those results could be translated, with a suitable margin of error, into "doesn't happen very much".

With agile methods–including lots of tests at the level of functional units–that turns into “hardly happens at all”, especially since the protocol is to write a new test, at the lowest level possible, for every previous failure determined by other means. This extra check is both necessary and welcome in agile models, because the explicit intention is to move at high velocity. Agile models mitigate the risk of high velocity, I perceive, by putting in low-level automated unit tests, where it’s cheap to automate–low cost and high value. Automated regression at the system level is relatively high cost and low value–that is, it’s expensive, if all we’re addressing is the risk of developers making silly mistakes.

I’ll have more to say about some specific circumstances in a future post. Meanwhile, I hope this helps.