Breaking the Test Case Addiction (Part 9)

Saturday, February 15th, 2020

Last time, Frieda and I had been looking at visualizations of time spent on various testing activities, including work that fosters test coverage of the product (T time), bug investigation and reporting (B time), and setup work to get ready to test, or tidying up afterwards (S time).

“So…,” Frieda mused, “I could track T-time, and B-time, and S-time. But I’d be a little worried about watching the clock all the time, instead of concentrating on my testing. It’d be like micro-managing myself.”

“That is worth worrying about,” I replied. “The last thing we want to be is obsessive-compulsive clock watchers. So here’s a secret: to some degree, we misrepresent our accounting of session time.”

“Oh, great,” said Frieda. “I thought this whole discussion has been about establishing trust.”

“It is. But it’s also about accounting for what we do in a way that everybody can make sense of what’s happening. And although we care about accuracy, precision isn’t too big a deal. In session-based test management, we’re trying to account for the effort that we’ve put in, but we’re also trying to make things easy enough for our clients to comprehend. So we don’t watch the clock all the time. A reasonable estimate of how much time we spent on T, B, and S is good enough. Precision to the nearest five or ten per cent will do. We’re not lying, but we are simplifying; smoothing out the details so our clients don’t get overwhelmed or obsessed or fooled by the numbers. Remember, the point of all this isn’t score-keeping. It’s to prompt us to ask questions. Mostly: are we okay with how we’re spending time?”

“Here’s an example,” I continued. “One day, after the morning standup, I start working a charter that covers some area of the product. Things go smoothly for the first 20 minutes or so, and then a developer comes up and asks me to help him with reproducing a problem that someone else reported. That goes on for 15 minutes.

“Then I get back to work on the charter. There are quick little interruptions along the way—a phone call here, and an instant message there—but by and large I can handle them quickly and keep the flow going for an hour and a half. I run into some bugs, and I run into some problems with a test tool that amount to setup time.

“Then it’s lunch. When I come back, I’m still looking at the same area. I work at it for 25 minutes, and the development manager wanders by for a chat. That takes 20 minutes. I get back into testing for 45 minutes, and then it’s Paula’s birthday, so I go to the lunchroom and eat cake and chat for 15 minutes.

“I get back and do testing work for 40 minutes, and then another tester asks me to look over a coverage outline they’ve done. That takes 10 minutes. Then I get back to the charter, and work it for another 25 minutes, and wrap it up.

“Now: if we add all that up, that’s just over five hours of clock time, of which an hour in total was interruptions. 245 minutes were spent on actual testing. If we think of a session as 90 minutes, that’s pretty close to three sessions worth of work.
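The arithmetic in that example can be checked with a short sketch. The log layout and the names below (`LOG`, `SESSION_MINUTES`, `on_charter`, `interrupted`) are made up for illustration and are not part of any session-based test management tool; the point is the rough total, not the bookkeeping.

```python
# Clock-time log for the example day: (activity, minutes, counts_toward_charter).
# Lunch is left out, as in the story. All names here are illustrative only.
LOG = [
    ("testing", 20, True),
    ("helping a developer reproduce a problem", 15, False),
    ("testing", 90, True),
    ("testing", 25, True),
    ("chat with the development manager", 20, False),
    ("testing", 45, True),
    ("birthday cake", 15, False),
    ("testing", 40, True),
    ("reviewing a coverage outline", 10, False),
    ("testing", 25, True),
]

SESSION_MINUTES = 90  # nominal session length used in the example

on_charter = sum(minutes for _, minutes, counts in LOG if counts)
interrupted = sum(minutes for _, minutes, counts in LOG if not counts)
total = on_charter + interrupted
sessions = on_charter / SESSION_MINUTES

print(f"on charter:    {on_charter} min")              # 245 min
print(f"interruptions: {interrupted} min")             # 60 min
print(f"clock time:    {total // 60} h {total % 60} min")  # 5 h 5 min
print(f"sessions:      {sessions:.1f}")                # 2.7, reported as three
```

Nobody would keep a log like this during real work; a rough recollection at the end of the morning and the afternoon gives the same answer to within the precision that matters.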

“So when I’m reporting, I’ll probably submit that as two session sheets, one to describe what I did in the morning and the other for the afternoon. I’ll account for the work as three sessions worth of time. I’ll make a reasonable guess as to how much I spent on T-time, B-time, and S-time for each one. Again, precision to the nearest five or ten per cent is good enough. With the TBS numbers, we’re trying to identify approximately how badly our coverage has been interrupted. If we’re not okay with what the approximation suggests, we’ll look into the specifics.”

“But won’t managers get upset if we don’t report the numbers precisely?” Frieda asked.

“Trust me,” I said, laughing. “They’re not watching that closely. They never are. They can’t watch that closely; it’s not possible. They don’t have time to scrutinize everyone’s work every minute of every day. There’d be no point to it. Plus supervising people’s every move would undermine the social nature of work. People need to be unsupervised to some degree in order to feel trusted with, and to be responsible for, what they’re doing.

“Plus,” I noted, “if managers were watching closely, they would be horrified at how much time was being wasted on the care and feeding of test cases, and how little time was being spent on actual testing and collaborative work.”

“Heh,” said Frieda. “That’s true.”

“On the other hand,” I continued, “it would be quite reasonable and important for them to know if your session time is being swamped by bug investigation and reporting, or by setup or followup work, or if interruptions from outside the session are preventing you from performing at least a couple of sessions worth of coverage a day.”

“Doesn’t that vary a lot?” Frieda asked. “I mean… some groups do a lot of stuff in meetings. You know, like design meetings and grooming meetings and project planning meetings. Should we track those?”

“Sure,” I said, “if you like. The key is this: if everyone is completely happy with a situation, don’t bother trying to measure anything in particular. But if someone is unhappy, or if someone has a feeling that there might be something to be unhappy about, then pay some attention to it. For instance, someone might say that testing is taking too long…”

“I’ve heard that before,” said Frieda.

“Uh huh. Too long compared to what? What part, or parts, specifically are taking too long? Get some data. After you’ve collected the data, ask questions about it. Analyze it. Are testers spending a lot of time in bug investigation? Why is that? Is it because they’re being overly detailed in preparing their reports? Are they investigating bugs for longer than necessary? Is it because the bugs are subtle and hard to reproduce? Or is it because there are so many bugs that they’re overwhelming the testing time, and any opportunity for test coverage is destroyed?

“Each of those things should prompt a different management action, or a different change in behaviour. Maybe the problem is not really that the testing is taking too long, but that the developers are under too much pressure. They’re producing code so quickly that they don’t have a good handle on what they’re building, and they don’t have time to check their work. Or maybe the problem is that the testers are spending tons of time writing up bug reports—and maybe a solution to that would be to have the testers work right next to the developers. Then, instead of doing unnecessary paperwork, the testers could simply demo some bugs to the developers right away.

“The point of activity-based test management is to avoid turning testing work into production of artifacts. To prevent testers from being turned into test case machines.”

“What happens when somebody wants artifacts?” asked Frieda. “That’s a big reason managers say they want test cases… so they can know for sure that the work got done.”

“You know there’s a term for that, in our lingo: test integrity. Test integrity is about making sure the testing we say we did matches up with the testing we actually did. Are test cases the only way that managers could know that work got done?” I asked.

“Well…,” she replied. “I guess there’s debriefing, as we were talking about. But they want… evidence. You know, something in writing.”

“How about the tester’s notes?”

“Hmmm…” Frieda paused. “Most testers aren’t that great at taking notes.”

“I agree,” I said. “I’ve seen that too, and it can be a real problem. People doing good investigative work—journalists, lab researchers, detectives—need to keep good notes. Testers do too. I like to tell testers that it’s okay not to keep good notes… as long as you want to forget lots of important stuff.”

“Why aren’t testers good at taking notes?” Frieda asked.

“I think there’s a feedback loop at work,” I replied. “People don’t do good investigative work when they’re following formally scripted test cases — and they don’t tend to take good notes either. Why should they? They just do what the script tells them to do, and the mission turns from ‘test the product’ into ‘follow the script’. That makes testing rote, and boring, and it derails the task of looking for problems. Why even bother to take notes in that case? And then, since people don’t practice taking notes, their note-taking skills decline. And then when they’re given a chance to work in a less scripted way, they don’t take good notes. They forget important details of what they were up to, and even if they remember, they might not have evidence.”

“So,” I continued, “one way to get people to learn to keep good notes is to set them free from writing and executing test cases. But the deal is that, in return, they have to produce some kind of evidence of what they were thinking and doing. They can show me that stuff to supplement the debriefing, and we can review it together. Tidy notes, taken every couple of minutes or so, tend to be helpful. I’d like to see what their test ideas were, or what risks they considered as they went. If they’ve used specific test data and examined specific behaviours, they can show me lists or tables or mind maps. If they’ve written some code to help them test, they can show me the code and the output from it. Their notes don’t have to be ponderous or bureaucratic, but I want to see something that helps me to follow their thought process and develop trust.”

“Some managers are really worried about that integrity stuff,” said Frieda.

“That’s reasonable,” I replied. “If I were managing a project for which integrity were an issue, like in a medical hardware or software context, I still wouldn’t make people follow test cases most of the time. If stuff needs to be checked, automate it. For high integrity, I’d require formal session reports as part of the deliverables, and I’d give the testers constant feedback on them. In session-based test management, for instance, there’s this concept of the session sheet that combines test notes, data about the session, and references to artifacts that were generated during the session. Things like test results, snippets of test code, or even screenshots or videos if they’re helpful.

“Before the session, I might identify specific factors to examine, or output values to check. I might charter the tester to use tables of existing data. More often I’d get the tester to develop those things independently, and then show them to me along with the session sheet during the debriefing. Then we can discuss the tester’s choices and actions, and figure out how well we’re covering the product and what needs to be done next. And after that we can summarize session sheets into reports for managers, auditors, regulators, or anyone else who’s looking for something formal.”

For more on note-taking and session sheets, see https://www.developsense.com/presentations/etnotebook.pdf