Questions from Listeners (2a): Handling Regression Testing

This is a follow-up to an earlier post, Questions from Listeners (2): Is Unit Testing Automated? The original question was:

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

Now I’ll deal with the second part of the question.

Part One: What Do We Really Mean By “Automation”?

Some people believe that “automation” means “getting the computer to do the testing”. Yet computers don’t do testing any more than compilers do programming, than cruise control does driving, than blenders do cooking. In Rapid Software Testing, James Bach and I teach that test automation is any use of tools to support testing.

When we perform tests on a running program, there’s always a computer involved, so automation is always around to some degree. We can use tools to help us configure the program, to help us observe some aspect of the program as it’s running, to generate data, to supply input to the program, to monitor outputs, to parse log files, to provide an oracle against which outputs can be compared, to aggregate and visualize results, to reconfigure the program or the system, and so on. In that sense, all tests can be automated.

Some people believe that tests can be automated. I disagree. Checks can be automated. Checks are a part of an overall program of testing, and can aid it, but testing itself can’t be automated. Testing requires human judgement to determine what will be observed and how it will be observed; testing requires human judgement to ascribe meaning to the test result. Human judgement is needed to ascribe significance to the meaning(s) that we ascribe, and human judgement is required to formulate a response to the information we’ve revealed with the test. Is there a problem with the product under test? The test itself? The logical relationship between the test and the product? Is the test relevant or not? Machines can’t answer those questions. In that sense, no test can be automated.

Automation is a medium. That is, it’s an extension of some human capability, not a replacement for it. If we test well, automation can extend that. If we’re testing badly, then automation can help us to test badly at an accelerated rate.

Part Two: Why Re-run Every Test?

My car is 25 years old. Aside from some soon-to-be addressed rust and threadbare upholstery, it’s in very good shape. Why? One big reason is that my mechanic and I are constantly testing it and fixing important problems.

When I’m about to set out on a long journey in my car, I take it in to Geoffrey, my mechanic. He performs a bunch of tests on the car. Some of those tests are forms of review: he checks his memory and looks over the service log to see which tests he should run. He addresses anything that I’ve identified as being problematic or suspicious. Some of Geoffrey’s tests are system-level tests, performed by direct observation: he listens to the engine in the shop and takes the car out for a spin on city streets and on the highway. Some of his tests are functional tests: he applies the brakes to check to see if they lose pressure. Some of his tests are unit tests, assisted by automation: he uses a machine to balance the tires and a gauge to pressure-test the cooling system. Some of his smoke tests are refined by tools: a look at the tires is refined by a pressure gauge; when he sees wear on the tires, he uses a gauge to measure the depth of the tread. Some of his tests are heavily assisted by automation: he has a computer that hooks up to a port on the car, and the computer runs checks that give him gobs of data that would be difficult or impossible for him to obtain otherwise.

When I set out on a medium-length trip, I don’t take the car in, but I still test for certain things. I walk around the car, checking the brake lights and turn signals. I look underneath for evidence of fluid leaks. I fill the car with gas, and while I’m at the gas station, I lift the hood and check the oil and the windshield wiper fluid. For still shorter trips, I do less. I get in, turn on the ignition, and look at the fuel gauge and the rest of the dashboard. I listen to the sound of the engine. I sniff the air for weird smells–gasoline, coolant, burning rubber.

As I’m driving, I’m making observations all the time. Some of those observations happen below the level of my consciousness, only coming to my attention when I’m surprised by something out of the ordinary, like a bad smell or a strange sound. On the road, I’m looking out the window, glancing at the dashboard, listening to the engine, feeling the feedback from the pedals and the steering wheel. If I identify something as a problem, I might ignore it until my next scheduled visit to the mechanic, I might leave it for a little while but still take it in earlier than usual, or I might take the car in right away.

When Geoffrey has done some work, he tells me what he has done, so to some degree I know what he’s tested. I also know that he might have forgotten something in the fix, and that he might not have tested completely, so after the car has been in the shop, I need to be more alert to potential problems, especially those closely related to the fix.

Notice two things: 1) Both Geoffrey and I are testing all the time. 2) Neither Geoffrey nor I repeat all of the tests that we’ve done on every trip, nor on every visit.

When I’m driving, I know that the problems I’m going to encounter as I drive are not restricted to problems with my car. Some problems might have to do with others—pedestrians or animals stepping out in front of me, or other drivers making turns in my path, encroaching on my lane, tailgating. So I must remain attentive, aware of what other people are doing around me. Some problems might have to do with me. I might behave impatiently or incompetently. So it’s important for me to keep track of my mental state, managing my attention and my intention. Some problems have to do with context. I might have to deal with bad weather or driving conditions. On a bright sunny day, I’ll be more concerned about the dangers of reflected glare than about wet roads. If I’ve just filled the tank, I don’t have to think about fuel for another couple hundred miles at least. Because conditions around me change all the time, I might repeat certain patterns of observation and control actions, but I’m not going to repeat every test I’ve ever performed.

Yes, I recognize that software is different. If software were a car, programmers would constantly be adding new parts to the vehicle and refining the parts that are there. On a car, we don’t add new parts very often. More typically, old parts wear out and get replaced. Either way, change is happening. After change, we concentrate our observation and testing on things that are most likely to be affected by the change, and on things that are most important. In software, we do exactly the same thing. But in software, we can take an extra step to reduce risk: low-level, automated unit tests that provide change detection and rapid feedback, and which are the first level of defense against accidental breakage. I wrote about that here.
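
As a rough illustration of concentrating on what a change is most likely to affect and on what matters most, here is a minimal Python sketch. The check names, component names, and importance scores are all hypothetical, and so is the threshold. Assigning them takes human judgement, and the mapping between checks and components is only a model that may itself be wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    components: frozenset  # parts of the product this check exercises (our model)
    importance: int        # 1 (minor) to 5 (critical); a human judgement call


def select_checks(checks, changed, min_importance=4):
    """Return checks that touch a changed component or guard something critical."""
    selected = []
    for check in checks:
        touches_change = bool(check.components & changed)
        is_critical = check.importance >= min_importance
        if touches_change or is_critical:
            selected.append(check)
    # Most important first, then by name for a stable order.
    return sorted(selected, key=lambda c: (-c.importance, c.name))


# Hypothetical inventory of automated checks.
CHECKS = [
    Check("login", frozenset({"auth"}), 5),
    Check("export_csv", frozenset({"reports"}), 2),
    Check("invoice_totals", frozenset({"billing", "reports"}), 4),
    Check("theme_colours", frozenset({"ui"}), 1),
]

if __name__ == "__main__":
    # Suppose the latest change touched only the (hypothetical) reports component.
    for check in select_checks(CHECKS, changed={"reports"}):
        print(check.name)
```

A selection like this only decides which existing checks to run first. It says nothing about problems that sit outside the component model, which is where fresh, human-directed testing comes in.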

Part Three: Think About Cost, Value, Risk, and Coverage

Testing involves interplay between cost, value, and risk. The risk is generally associated with the unknown—problems that you’re not aware of, and the unknown consequences of those problems. The value is in the information you obtain from performing the test, and in the capacity to make better-informed decisions. There are lots of costs associated with tests. Automation reduces many of those costs (like execution time) and increases others (like development and maintenance time). Every testing activity, irrespective of the level of automation, introduces opportunity costs against potentially more valuable activities. A heavy focus on running tests that we’ve run before—and which have not been finding problems—represents opportunity cost against tests that we’ve never run and that won’t be found by our repeated tests. A focus on the care and feeding of repeated tests diminishes our responsiveness to new risk. A focus on repetition limits our test coverage.
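
To make the opportunity-cost point concrete, here is a small Python sketch that plans a fixed-length testing session by picking the activities with the best estimated value per minute. Everything in it is an assumption for illustration: the activity names, the minutes, and especially the value scores, which are invented numbers standing in for a tester's judgement. The greedy approach is a simple heuristic, not an optimal planner.

```python
def plan_session(activities, budget_minutes):
    """Greedily pick activities with the best estimated value per minute."""
    ranked = sorted(
        activities,
        key=lambda a: a["value"] / a["minutes"],
        reverse=True,
    )
    plan, remaining = [], budget_minutes
    for activity in ranked:
        # Skip anything that no longer fits in the time we have left.
        if activity["minutes"] <= remaining:
            plan.append(activity["name"])
            remaining -= activity["minutes"]
    return plan, remaining


# Hypothetical activities; "value" is a guess at how much we expect to learn.
ACTIVITIES = [
    {"name": "rerun full regression suite", "minutes": 120, "value": 3},
    {"name": "explore new import feature", "minutes": 60, "value": 8},
    {"name": "recheck last week's fix", "minutes": 20, "value": 5},
    {"name": "maintain flaky scripts", "minutes": 90, "value": 2},
]

if __name__ == "__main__":
    plan, left_over = plan_session(ACTIVITIES, budget_minutes=120)
    print(plan, left_over)
```

With these made-up numbers and a two-hour budget, the full rerun does not make the plan, because the time goes to activities expected to reveal more. Different estimates would give a different plan, which is the point: the estimates are where the thinking happens.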

Some people object to the idea of relaxing attention on regression tests, because their regression tests find so many problems. Oddly, these people are often the same people who trot out the old bromide that bugs that are found earlier are less expensive to fix. To those people, I would say this: If your regression tests consistently find problems, you’ll probably want to fix most of them. But there’s another, far more important problem that you’ll want to fix: someone has created an environment that’s favourable to backsliding.

6 replies to “Questions from Listeners (2a): Handling Regression Testing”

  1. Neither Geoffrey nor I repeat all of the tests that we’ve done on every trip, nor on every visit.

    Agreed, you didn’t repeat *all* of the tests. However, you and Geoffrey did conduct a subset of the tests (are they actually tests, or rather checks?). Most likely those tests were conducted many times before. Even on the same objects.

    They might be the same objects in three dimensions, but they’re different in the fourth: time. Just as you’re both the same and not the same as you were ten years ago, the object of our testing is the same, or different, depending on your perspective. The key difference here is that software doesn’t wear out. The key similarity is that, even while the software remains the same, parts of the system in which the software is an element change. The hardware, the operating system, the application framework, third-party libraries, interdependent programs, and the people who are using the software may all be changing. Since testing is infinite, the key is to focus on the tests that are most relevant. How do you choose which ones are relevant? Why not try Karen Johnson’s RCRCRC mnemonic for regression test heuristics: Recent, Core, Risk, Configuration, Repaired, and Chronic.

    Another question: When you step on the pedals without questioning if there is a problem with it, are you testing at that moment? Is it possible to test unconsciously?

    It’s certainly possible to suddenly realize that you’re testing, or that you’ve been testing. Oracles pop into your head as you recognize inconsistencies; not every test comes packaged with an expected, predicted result. James and I talk about that in the post here.

  2. The metaphor from answer #2 is not good from my perspective. For cars we know that there are certain parts which decline over time as they are used. For example, every 30k kilometers I need to exchange some vital part of the engine, as it will stop working if I don’t. (I forgot this once, and it cost me a lot of money, so I will remember.) There are parts in cars which degenerate by design.


    As I said in my comment to Michel above, software doesn’t wear out, but its context changes over time. So too should our choices about what to test and how we test it.

    This isn’t the same for the intangible software code. Sure, over time the software will decline if not maintained properly. But it’s the design here that starts to rot. There is a name for design maintenance in software development: our programmers call it refactoring. Exchanging old parts of the design with new ones, while still applying the latest tuning-kit. (My father was a car mechanic, and we used to do the check-ups together, too.)

    Nevertheless you explain the nature of software and testing with this metaphor in a vivid manner. You could have made the value of checklists for early car mechanics more obvious, though. Finally, the oracles that exist for cars have been much more evaluated once the mechanic gets in touch with them.

    Do you mean that our choices about what to test, our ideas about how we test, and our capacity for recognizing problems all change over time? If so, I agree.

    This might be a topic that still needs investigation in software…

    Right. Nothing is ever settled.

  3. Hi,

    Looks like your listener is working in a world where regression testing is a necessary evil, and is looking for ways to minimize the evil. Extending your part 3, where risk and opportunity costs are balanced, I’d recommend that your listener investigate a few areas:

    – Find out which regression tests are valued by customers. Usage logs or analytics data can give good information on what features are actually being used by customers. I’ve found that feature usage usually follows the principles taught by Signore Pareto (vast majority of usage is confined to a few features).

    – Build a capability for dependency analysis. In each new release, there are likely to be large areas of the code that are untouched. He or she can run a high level smoke/sanity test in those areas prior to release.

    Extending this further, regression testing is a very fertile area to apply risk based techniques.

    Good luck,

    John

  4. Agree with your points, Michael. I am a big fan of automation and use it as often as it makes sense. I also understand that it’s checking and not testing. In fact, I even named my product iCheckWebsite instead of iTestWebsite, because I know testing is impossible to automate.

    I also try to ensure that whatever rules we are checking with the automation – we check them as often as we can – at least with every major change in the system. Agree – there is little tangible value in seeing a green bar all the time, but it does feel good and I think this feel-good factor might affect our testing in a positive way (sometimes).

    What we do need to keep in mind though is – a green bar is not a true reflection of quality. It does not and cannot (IMHO) indicate that the product is good enough. In fact, sometimes it can be a big false positive which might have a negative effect on the project. I think too much focus on getting the green bar is wrong. Sometimes it is a false positive, and sometimes teams might spend more time ensuring that the bar is green (a visible indication of quality) rather than testing the project and delivering according to the needs of stakeholders.

    I do not have a problem with running the regression suite as often as we can (with every change in the product) – but with the understanding that this is not THE only thing I will do. As long as it is one more weapon in my armoury – it’s a good idea.

  5. Your comment to my reply got me pondering again (thanks for that!)…

    “It’s certainly possible to suddenly realize that you’re testing, or that you’ve been testing.”

    Testing is about questioning, about investigating. Can you give an example of how I question or investigate a product unconsciously? Every situation I can think of depends on a conscious activity.

    “Oracles pop into your head as you recognize inconsistencies”

    To me this isn’t testing. This is *evaluating*.

    “…not every test comes packaged with an expected, predicted result. James and I talk about that in the post here.”

    True, true. It’s one of my favorite, most inspiring blog posts.

  6. Whereas I agree with the basic theme of the article, I don’t agree with the metaphor of car and software testing. A car is a physical item and there is no dependency between different components. If the headlights are changed, it does not affect the engine. In the case of software, unless there is a very detailed dependency matrix defined, you cannot be certain of the side effects.

    Michael’s reply: Some comments in return.

    On the one hand, I think you might want to examine the idea that, on a car, there is no dependency between different components. I think what you mean is that you believe you understand the dependencies between certain components, based on your model of those relationships. That is, you believe that changing the headlights would not have an effect on the engine. On the other hand, suppose that you replaced a defective headlight switch with another defective headlight switch that drained the battery. Suppose that you thought that the problem was a burned-out headlight, but it turned out instead to be worn insulation on the wiring bundle leading to it. Suppose that you inadvertently bumped an already-loose electrical connection as you performed the repair. Cruise around car interest sites (Car Talk is terrific fun), or talk to a mechanic, and you’ll find no end of interesting and surprising problems. (For example, here one guy determined that a problem in his headlight system was related to brake fluid leaking onto one of the relays that controlled it.) As testers, it’s our responsibility to look at problems (effects, causes, and risks) from different perspectives and expansive models. Yet this still doesn’t mean that we have to test everything after every action. Since complete testing is impossible, we couldn’t do that even if we wanted to.

    Even if you do have a very detailed dependency matrix defined, you cannot be certain of the side effects. This is because on the one hand, the definition itself could be wrong; and on the other, one of the effects of the bug is that it exists or has an impact outside of the correct model. How do you find out whether those things are so? There are two common alternatives: 1) Testing before you release the product. 2) Finding out from the field, after the product has been released. A reasonable way to think about (1), in my view, is to test with a focus on risk, and also to do some other forms of testing that aren’t risk-based, to expose problems in your model of risk. You’ll be less likely to find problems in your risk model if you only repeat tests that you’ve run before.
