DevelopsenseLogo

Why Is Testing Taking So Long? (Part 1)

If you’re a tester, you’ve probably been asked, “Why is testing taking so long?” Maybe you’ve had a ready answer; maybe you haven’t. Here’s a model that might help you deal with the kind of manager who asks such questions.

Let’s suppose that we divide our day of testing into three sessions, each session being, on average, 90 minutes of chartered, uninterrupted testing time. That’s four and a half hours of testing, which seems reasonable in an eight-hour day interrupted by meetings, planning sessions, working with programmers, debriefings, training, email, conversations, administrivia of various kinds, lunch time, and breaks.

The reason that we’re testing is that we want to obtain coverage; that is, we want to ask and answer questions about the product and its elements to the greatest extent that we can. Asking and answering questions is the process of test design and execution. So let’s further assume that we break each session into average two-minute micro-sessions, in which we perform some test activity that’s focused on a particular testing question, or on evaluating a particular feature. That means in a 90-minute session, we can theoretically perform 45 of these little micro-sessions, which for the sake of brevity we’ll informally call “tests”. Of course life doesn’t really work this way; a test idea might a couple of seconds to implement, or it might take all day. But I’m modeling here, making this rather gross simplification to clarify a more complex set of dynamics. (Note that if you’d like to take a really impoverished view of what happens in skilled testing, you could say that a “test case” takes two minutes. But I leave it to my colleague James Bach to explain why you should question the concept of test cases.)

Let’s further suppose that we’ll find problems every now and again, which means that we have to do bug investigation and reporting. This is valuable work for the development team, but it takes time that interrupts test design and execution—the stuff that yields test coverage. Let’s say that, for each bug that we find, we must spend an extra eight minutes investigating it and preparing a report. Again, this is a pretty dramatic simplification. Investigating a bug might take all day, and preparing a good report could take time on the order of hours. Some bugs (think typos and spelling errors in the UI) leap out at us and don’t call for much investigation, so they’ll take less than eight minutes. Even though eight minutes is probably a dramatic underestimate for investigation and reporting, let’s go with that. So a test activity that doesn’t find a problem costs us two minutes, and a test activity that does find a problem takes ten minutes.

Now, let’s imagine one more thing: we have perfect testing prowess; that if there’s a problem in an area that we’re testing, we’ll find it, and that we’ll never enter a bogus report, either. Yes, this is a thought experiment.

One day we come into work, and we’re given three modules to test.

The morning session is taken up with Module A, from Development Team A. These people are amazing, hyper-competent. They use test-first programming, and test-driven design. They work closely with us, the testers, to design challenging unit checks, scriptable interfaces, and log files. They use pair programming, and they review and critique each other’s work in an egoless way. They refactor mercilessly, and run suites of automated checks before checking in code. They brush their teeth and floss after every meal; they’re wonderful. We test their work diligently, but it’s really a formality because they’ve been testing and we’ve been helping them test all along. In our 90-minute testing session, we don’t find any problems. That means that we’ve performed 45 micro-sessions, and have therefore obtained 45 units of test coverage.

(And if you’re viewing this under at least some versions of IE 7, you’ll see a cool bug in its handling of the text flow around the table.  You’ve been warned!)

Module Bug Investigation and Reporting
(time spent on tests that find bugs)
Test Design and Execution
(time spent on tests that don’t find bugs)
Total Tests
A 0 minutes (no bugs found) 90 minutes (45 tests) 45
The first thing after lunch, we have a look at Team B’s module. These people are very diligent indeed. Most organizations would be delighted to have them on board. Like Team A, they use test-first programming and TDD, they review carefully, they pair, and they collaborate with testers. But they’re human. When we test their stuff, we find a bug very occasionally; let’s say once per session. The test that finds the bug takes two minutes; investigation and reporting of it takes a further eight minutes. That’s ten minutes altogether. The rest of the time, we don’t find any problems, so that leaves us 80 minutes in which we can run 40 tests. Let’s compare that with this morning’s results.

Module Bug Investigation and Reporting
(time spent on tests that find bugs)
Test Design and Execution
(time spent on tests that don’t find bugs)
Total Tests
A 0 minutes (no bugs found) 90 minutes (45 tests) 45
B 10 minutes (1 test, 1 bug) 80 minutes (40 tests) 41
After the afternoon coffee break, we move on to Team C’s module. Frankly, it’s a mess. Team C is made up of nice people with the best of intentions, but sadly they’re not very capable. They don’t work with us at all, and they don’t test their stuff on their own, either. There’s no pairing, no review, in Team C. To Team C, if it compiles, it’s ready for the testers. The module is a dog’s breakfast, and we find bugs practically everywhere. Let’s say we find eight in our 90-minute session. Each test that finds a problem costs us 10 minutes, so we spent 80 minutes on those eight bugs. Every now and again, we happen to run a test that doesn’t find a problem. (Hey, even dBase IV occasionally did something right.) Our results for the day now look like this:

Module Bug Investigation and Reporting
(time spent on tests that find bugs)
Test Design and Execution
(time spent on tests that don’t find bugs)
Total Tests
A 0 minutes (no bugs found) 90 minutes (45 tests) 45
B 10 minutes (1 test, 1 bug) 80 minutes (40 tests) 41
C 80 minutes (8 tests, 8 bugs) 10 minutes (5 tests) 13
Because of all the bugs, Module C allows us to perform thirteen micro-sessions in 90 minutes. Thirteen, where with the other modules we managed 45 and 41. Because we’ve been investigating and reporting bugs, there are 32 micro-sessions, 32 units of coverage, that we haven’t been able to obtain on this module. If we decide that we need to perform that testing (and the module’s overall badness is consistent throughout), we’re going to need at least three more sessions to cover it. Alternatively, we could stop testing now, but what are the chances of a serious problem lurking in the parts of the module we haven’t covered? So, the first thing to observe here is:
Lots of bugs means reduced coverage, or slower testing, or both.

There’s something else that’s interesting, too. If we are being measured based on the number of bugs we find (exactly the sort of measurement that will be taken by managers who don’t understand testing), Team A makes us look awful—we’re not finding any bugs in their stuff. Meanwhile, Team C makes us look great in the eyes of management. We’re finding lots of bugs! That’s good! How could that be bad?

On the other hand, if we’re being measured based on the test coverage we obtain in a day (which is exactly the sort of measurement that will be taken by managers who count test cases; that is, managers who probably have an even more damaging model of testing than the managers in the last paragraph), Team C makes us look terrible. “You’re not getting enough done! You could have performed 45 test cases today on Module C, and you’ve only done 13!” And yet, remember that in our scenario we started with the assumption that, no matter what the module, we always find a problem if there’s one there. That is, there’s no difference between the testers or the testing for each of the three modules; it’s solely the condition of the product that makes all the difference.

This is the first in a pair of posts. Let’s see what happens tomorrow.

11 replies to “Why Is Testing Taking So Long? (Part 1)”

  1. I agree with you Michael, and it is really interesting to read the way you have presented this. Though the situations differ from team to team and a manager should understand those dynamics before measuring something 🙂

    Reply
  2. Michael, I really wonder if test managers have not figured this out till today, or do they know this but still want to control people rather than seeking value from them or need numbers to feed their bosses. It almost looks like a food chain 🙂

    -Sharath.B

    Reply
  3. I am working with team D, E, F… I tried to understand my management that if your developers throw rubbish, what can I do?

    Michael replies: I’d encourage management to read the blog post to which you’re referring. Print it out, if you like.

    In the end, though, managers are often reluctant to put quality requirements on programmers. The mission is “produce code”, rather than “produce code that works”. I think the best that we can do is to underscore the cost, in time and effort, associated with sloppy code. Sometimes we can discuss that with a programmer who can help to transmit the message, since usually the programmers are under time pressure, rather than quality pressure. In my experience, many programmers don’t like that either. Do you have any friends—even one or two—among the programmers with whom you might align?

    But the good thing is the length of my micro-sessions is reducing day by day.

    If your microsessions are getting shorter because you’ve improved your skill at investigating and reporting quickly, or because the programmers have helped in making the products easier to test, then that’s probably all to the good. On the other hand, if your sessions are shorter because you’re reducing your coverage, that might not be such a good thing. Let’s be careful out there.

    Reply
  4. i m into performance testing and working in a comparatively large project. The test coverage is so baad that we end up running 3 to 4 – 12 hr tests a month.

    Michael replies: That’s a test result.

    Main reasons behind thus being the large number of opened and reopened QCs.. which always delays the scripting until the las week of release schedule…

    That’s a test result too. The puzzle for me is why people ignore these test results.

    Reply

Leave a Comment