The Motive for Metaphor

September 3rd, 2010

There’s a mildly rollicking little discussion going on in the Software Testing Club at the moment, in which Rob Lambert observes, “I’ve seen a couple of conversations recently where people are talking about red, green and yellow box testing.” Rob then asks “There’s the obvious black and white. How many more are there?”

(For what it’s worth, I’ve already made some comments about a related question here.)

At one point a little later in the conversation, Jaffamonkey (I hope that’s a pseudonym) replies,

If applied in modern context Black Box is essentially pure functional testing (or unit testing) whereas White Box testing is more of what testers are required to do, which is more about testing user journeys, and testing workflows, usability etc.

Of course, that’s not what I understand the classical distinction to be.

The classical distinction started with the notion of “black box” testing. You can’t see what’s inside the box, and so you can’t see how it’s working internally. But that may not be so important to you for a particular testing mission; instead, you care about inputs and outputs, and the internal implementation isn’t such a big deal. You’d take this kind of approach when a) you don’t have source code; or b) you’re intentionally seeking problems that you might not notice so quickly by inspection, but that you might notice by empirical experiments and observation; or maybe c) you may believe that the internal implementation is going to be varied or variable, so no point in taking it into account with respect to the current focus of your attention. I’m sure you can come up with more reasons.

This “black box” idea suggests a contrast: “glass box” testing. Since glass is transparent, you can see the inner workings, and the insight into what is happening internally gives you a different perspective for risks and test ideas. This is especially important when a) your mission involves testing what’s happening inside the box (programmers take this perspective more often than not); or b) your overall mission will be simpler, in some dimension, because of your understanding of the internals; or maybe c) you want to learn something about how someone has solved a particular problem. Again, I’m sure you can come up with lots more reasons; these are examples, not definitive lists.

Unhelpfully (to me), someone somewhere along the way decided that the opposite of “black” must be “white”; that black box testing was the kind where you can’t see inside the box; and that therefore white (rather than glass) box testing must be the name for the other stuff. At this point, the words and the model begin to part company.

Even less helpfully, people stopped thinking in terms of a metaphor and started thinking in terms of labels dissociated from the metaphor. The result is an interpretation like Jaffa’s above, where he (she?) seems to have inverted the earlier interpretations, for reasons I know not why. Who knows? Maybe it’s just a typo.

More unhelpfully still (to me), someone has (or several someones have) apparently come along with color-coding systems for other kinds of testing. Bill Matthews reports that he’s found

Red Box = “Acceptance testing” or “Error message testing” or “networking, peripherals testing and protocol testing”
Yellow Box = “testing warning messages” or “integration testing”
Green Box = “co-existence testing” or “success message testing”

Sources:

http://www.testrepublic.com/forum/topics/define-red-box-testing-yellow

http://www.geekinterview.com/question_details/27985

http://www.allinterview.com/showanswers/7077.html

http://www.coolinterview.com/interview/10080/

For me, there are at least four big problems here. First, there is already disagreement on which colours map to which concepts. Second, there is no compelling reason that I can see to associate a given colour with any of the given ideas. Third, the box metaphor doesn’t have a clear relationship to what’s going on in the mind or the practice of a tester. The colour is an arbitrary label on an unconstrained container. Fourth, since the definitions appear on interview sites and the sites disagree, there’s a risk that some benighted hiring manager will assume that there is only one interpretation, and will deprive himself of an otherwise skilled tester who read a different site. (To defend yourself against this fourth problem, use safety language: “Here’s what I understand by ‘chartreuse-box testing’. This is the interpretation given by this person or group, but I’m aware there may be other interpretations in your context.” For extra points, try saying something like, “Is that consistent with your interpretation? If not, I’d be happy to adopt the term the way you use it around here.” And meaning it. If they refuse to hire you because of that answer, it’s unlikely that working there would have been much fun.)

All of this paintbox of terms is unhelpful (to me) because it means another 30,000 messages on LinkedIn and QAForums, wherein enormous numbers of testers weigh in with their (mis)understandings of some other author’s terms and intentions—and largely with the intention of asking or answering homework questions, so it seems. The next step is that, at some point, some standards-and-certification body will have to come along and lay down the law about what colour testing you would have to do to find out how many angels can dance on the head of a pin, what colour the pin is, and whether the angels are riding unicorns. And then another, competing standards-and-certification body will object, saying that it’s not angels, it’s fairies, and it’s not unicorns, it’s centaurs, and they’re not dancing, they’re doing gymnastics. And don’t get us started on the pin! Courses and certifications on colour-mapping to mythological figures will be available (at a fee) to check (not test!) your ability to memorize a proprietary table of relationships. Meanwhile, most of the people involved in the discussion will have forgotten—in the unlikely event that they ever knew— that the point of the original black-and-glass exercise was to make things more usefully understandable. Verification vs. validation, anyone? One is building the right thing; the other is building the thing right. Now, quick: which is which? Did you have to pause to think about it? And if you find a problem wherein the thing was built wrong, or that the wrong thing was built, does anyone really care whether you were doing validation testing or verification testing at the time?

Well… maybe they do. So, all that said, remember this: no one outside your context can tell you what words you can or can’t use. And remember this too: no one outside your context can tell you what you can or can’t find useful. Some person, somewhere, might find it handy to refer to a certain kind of testing as “sky testing” and another kind of testing as “ground testing”, and still another as “water testing”. (No, I can’t figure it out either.) If people find those labels helpful, there’s nothing to stop them, and more power to them. But if the labels are unhelpful to you and only make your brain hurt, it’s probably not worth a lot of cycles to try to make them fit for you.

So here are some tests that you can apply to a term or metaphor, whether you produce it yourself or someone else produced it:

  • Is it vivid? That is (for a testing metaphor), does it allow you to see easily in your mind’s eye (hear in your mind’s ear, etc.) something in the realm of common experience but outside the world of testing?
  • Is it clear? That is, does it allow you to make a connection between that external reference and something internal to testing? Do people tend to get it the first time they hear it, or with only a modicum of explanation? Do people retain the connection easily, such that you don’t have to explain it over and over to the same people? Do people in a common context agree easily, without arguments or nit-picking?
  • Is it sticky? Is it easy to remember without having to consult a table, a cheat sheet, or a syllabus? Do people adopt the expression naturally and easily, and do they use it?

If the answer to these questions is Yes across the board, it might be worthwhile to spread the idea. If you’re in doubt, field-test the idea. Ask for (or offer) explanations, and see if understanding is easy to obtain. Meanwhile, if people don’t adopt the idea outside of a particular context, do everyone a favour: ditch it, or ignore it, or keep it within a much closer community.

In his book The Educated Imagination (based on the Massey Lectures, a set of broadcasts he did for the Canadian Broadcasting Corporation in 1963), Northrop Frye said, “Outside literature, the main motive for writing is to describe this world. But literature itself uses language in a way which associates our minds with it. As soon as you use associative language, you begin using figures of speech. If you say, “this talk is dry and dull”, you’re using figures associating it with bread and breadknives. There are two main kinds of association, analogy and identity, two things that are like each other and two things that are each other (my emphasis –MB). One produces a figure of speech called the simile. The other produces a figure called metaphor.”

When we’re trying to describe our work in testing, I think most people would agree that we’re outside the world of literature. Yet we learn most easily and most powerfully by association—by relating things that we don’t understand well to things that we understand a little better in some specific dimension. In reporting on our testing, we’re often dealing with things that are new to us, and telling stories to describe them. The same is true in learning about testing. Dealing with the new and telling stories leads us naturally to use associative language. Frye explains why we have to be cautious: “In descriptive writing, you have to be careful of associative language. You’ll find that analogy, or likeness to something else, is very tricky to handle in description, because the differences are as important as the resemblances. As for metaphor, where you’re really saying “this is that,” you’re turning your back on logic and reason completely because logically two things can never be the same thing and still remain two things.”

Having given that caution, Frye goes on to explain why we use metaphor, and does so in a way that I think might be helpful for our work: “The poet, however, uses these two crude, primitive, archaic forms of thought in the most uninhibited way, because his job is not to describe nature but to show you a world completely absorbed and possessed by the human mind…The motive for metaphor, according to Wallace Stevens, is a desire to associate, and finally to identify, the human mind with what goes on outside it, because the only genuine joy you can have is in those rare moments when you feel that although we may know in part, as Paul says, we are also a part of what we know.”

So the final test of a term or a metaphor or a heuristic, for me, is this:

  • Is it useful? That is, does it help you make sense of the world to the degree that you can identify an idea with something deeper and more resonant than a mere label? Does it help you to own your ideas?

Hire Ben Simo!

August 31st, 2010

I have four or five blog posts in the hopper, each almost ready to go. I’m working on a whole book and a chapter of another one, and I’m on a deadline that I’m about to blow. The kids are still out of school, and I really should be cooking dinner right now. And yet…

As I write, one of the best testers that I know is looking for work. His name is Ben Simo. He lives in Colorado Springs, Colorado (my understanding is that he’s willing to relocate). He’s well-versed in LoadRunner and Performance Center, like many other testers. Unlike (alas!) so many other testers with those bullet points on the résumé, he’s not inclined merely to go through the motions and use tools for checking. He is an astute, passionate critical thinker, entirely focused on investigating and defending the value of the products for which he’s responsible by identifying problems that threaten that value. And yet he’s not of the Quality Assurance school; he entirely understands that assuring quality is the responsibility of those who produce and manage the work: programmers, writers, designers, artists, and managers. His job, as he sees it, is to make quality assurance possible. He collaborates with the project community, investigates the product, and provides the most important, most timely information he can to the people who are producing and managing the work. With that, they can make the decisions they must make, informed by the very best technical information that he can provide.

He’s past President of the Association for Software Testing; he was Conference Chair for the 2009 Conference for the Association for Software Testing; he maintains one blog called Questioning Software and another called Is There A Problem Here?. He was recently given a three-part interview by UTest; you can read that here, and here, and here.

And, as of this writing, he’s available. For someone looking for a tester, he’s like the dream date that you spy across the dance floor who, a friend tells you, is single, smart, modest, and well-off. The only issue is that maybe he’s a little too modest. He won’t be on the dance floor for long, because some organization will come along and sweep him off his feet. And that organization will be exceptionally lucky.

And that organization could be yours. His email address is ben at qualityfrog (period) com.

Statistician or Journalist?

August 27th, 2010

Eric Jacobson has a problem, which he thoughtfully relates on his thoughtful blog in a post called “How Can I Tell Users What Testers Did?”. In this post, I’ll try to answer his question, so you might want to read his original post for context.

I see something interesting here: Eric tells a clear story to relate to his readers some problem that he’s having with explaining his work to others who, by his account, don’t seem to understand it well. In that story, he mentions some numbers in passing. Yet the numbers that he presents are incidental to the story, not central to it. On the contrary, in fact: when he uses numbers, he’s using them as examples of how poorly numbers tell the kind of story he wants to tell. Yet he tells a fine story, don’t you think?

In the Rapid Software Testing course, we present this idea (Note to Eric: we’ve added this since you took the class): To test is to compose, edit, narrate, and justify two parallel stories. You must tell a story about the product: how it works, how it fails, and how it might not work in ways that matter to your client (and in the context of a retrospective, you might like to talk about how the product was failing and is now working). But in order to give that story its warrant, you must tell another story: you must tell a story about your testing. In a case like Eric’s, that story would take the form of a summary report focused on two things: what you want to convey to your clients, and what they want to know from you (and, ideally, those two things should be in sync with each other).

To do that, you might like to consider various structures to frame your story. Let’s start with the elements of what we (somewhat whimsically) call The Universal Test Procedure (you can find it in the course notes for the class). From a retrospective view, that would include

  • your model of the test space (that is, what was inside and outside the scope of your testing, and in particular the risks that you were trying to address)
  • the oracles that you used
  • the coverage that you obtained
  • the test techniques you applied
  • the ways in which you configured the product
  • the ways in which you operated the product
  • the ways in which you observed the product
  • the ways in which you evaluated the product
  • the heuristics by which you decided to stop testing; and
  • what you discovered and reported, and how you reported it

You might also consider the structures of exploratory testing. Even if your testing isn’t highly exploratory, a lot of the structures have parallels in scripted testing.

Jon Bach says (and I agree) that testing is journalism, so look at the way journalists structure a story: they often start with the classic pyramid lead. They might also start with a compelling anecdote as recounted in What’s Your Story, by Craig Wortmann, or Made to Stick, by Chip and Dan Heath. If you’re in the room with your clients, you can use a whiteboard talk with diagrams, as in Dan Roam’s The Back of the Napkin. At the centre of your story, you could talk about risks that you addressed with your testing; problems that you found and that got addressed; problems that you found and that didn’t get addressed; things that slowed you down as you were testing; effort that you spent in each area; coverage that you obtained. You could provide testimonials from the programmers about the most important problems you found; the assistance that you provided to them to help prevent problems; your contributions to design meetings or bug triage sessions; obstacles that you surmounted; a set of charters that you performed, and the feature areas that they covered. Again, focus on what you want to convey to your clients, and what they want to know from you.

Incidentally, the more often and the more coherently you tell your story, the less explaining you’ll have to do about the general stuff. That means keeping as close to your clients as you can, so that they can observe the story unfolding as it happens. But when you ask “What metric or easily understood information can my test team provide users, to show our contribution to the software we release?”, ask yourself this: “Am I a statistician or a journalist?”


Other resources for telling testing stories:

Thread-Based Test Management: Introducing Thread-Based Test Management, by James Bach; and A New Thread, by Jon Bach (as of this writing, this is brand new stuff)

Telling Your Exploratory Story: A presentation at Agile 2010, by Jonathan Bach (I was unable to download anything other than a damaged version of this, but maybe it’s working now; please let me know)

Constructing the Quality Story (from Better Software, November 2009): Knowledge doesn’t just exist; we build it. Sometimes we disagree on what we’ve got, and sometimes we disagree on how to get it. Hard as it may be to imagine, the experimental approach itself was once controversial. What can we learn from the disputes of the past? How do we manage skepticism and trust and tell the testing story?

On Metrics:

Three Kinds of Measurement (And Two Ways to Use Them) (from Better Software, July 2009): How do we know what’s going on? We measure. Are software development and testing sciences, subject to the same kind of quantitative measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement to get maximum value with a minimum of fuss? One thing is for sure: we waste time and effort when we try to obtain six-decimal-place answers to whole-number questions. Unquantifiable doesn’t mean unmeasurable. We measure constantly WITHOUT resorting to numbers. Goldilocks did it.

Issues About Metrics About Bugs (Better Software, May 2009): Managers often use metrics to help make decisions about the state of the product or the quality of the work done by the test group. Yet measurements derived from bug counts can be highly misleading because a “bug” isn’t a tangible, countable thing; it’s a label for some aspect of some relationship between some person and some product, and it’s influenced by when and how we count… and by who is doing the counting.

On Coverage:

Got You Covered (from Better Software, October 2008): Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

Cover or Discover (from Better Software, November 2008): Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

A Map By Any Other Name (from Better Software, December 2008): A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

All Testing is (not) Confirmatory

August 24th, 2010

In a recent blog post, Rahul Verma suggests that all testing is confirmatory.

First, I applaud his writing of an exploratory essay. I also welcome and appreciate critique of the testing vs. checking idea. I don’t agree with his conclusions, but maybe in the long run we can work something out.

In mythology, there was a fellow called Procrustes, an ironmonger. He had an iron bed which he claimed fit anyone perfectly. He accomplished a perfect fit by violently lengthening or shortening the guest. I think that, to some degree, Rahul is putting the idea of confirmation into Procrustes’ bed.

He cites the Oxford Online Dictionary definition of confirm: (verb) establish the truth or correctness of (something previously believed or suspected to be the case). (Rahul doesn’t cite the usage notes, which show some different senses of the word.)

When I describe a certain approach to testing as “confirmatory” in my discussion of testing vs. checking, I’m not trying to introduce another term. Instead, I’m using an ordinary English adjective to identify an approach or a mindset to testing. My emphasis is twofold: 1) not on the role of confirmation in test results, but rather on the role of confirmation in test design; and 2) on a key word in the definition Rahul cites, “previously”.

A confirmatory mindset would steer the tester towards designing a test based on a particular and specific hypothesis. A tester working in a confirmatory way would be oriented towards saying, “Someone or something has told me that the product should be able to do X. My test will demonstrate that it can do X.” Upon the execution of the (passing) test, the tester would say “See? The product can do X.” Such tests are aimed in the direction of showing that the product can work.

Someone working from an exploratory or investigative mindset would have a different, broader, more open-ended mission. “Someone or something has told me that the product does X. What are the extents and limitations of what we think of as X? What are the implications of doing X? What essential component of X might we have missed in our thinking about previous tests? What else happens when I ask the product to do X? Can I make the product do P, by asking it to do X in a slightly different way? What haven’t I noticed? What could I learn from the test that I’ve just executed?” Upon performing the test, the tester would report on whatever interesting information she might have discovered, which might include a pass or fail component, but might not. Exploratory tests are aimed at learning something about the product, how it can work, how it might work, and how it might not work; or if you like, on “if it will work”, rather than “that it can work”. To those who would reasonably object: yes, yes, no test ever shows that a product will work in all circumstances. But the focus here is on learning something novel, often focusing on robustness and adaptability. In this mindset, we’re typically seeking to find out how the program deals with whatever we throw at it, rather than on demonstrating that it can hit a pitch in the centre of the strike zone.

I believe that, in his post, Rahul is focused on the evaluation of the test, rather than on test design. That’s different from what I’m on about. He puts confirmation squarely into result interpretation, defining the confirmation step as “a decision (on) whether the test passed or failed or needs further investigation, based on observations made on the system as a result of the interaction. The observations are compared against the assumption(s).” I don’t think of that as confirmation (“establishing the truth or correctness of something previously believed or suspected to be the case”). I think of that as application of an oracle; as a comparison of the observed behaviour with a principle or mechanism that would allow us to recognize a problem. In the absence of any countervailing reason for it to be otherwise, we expect a product to be consistent with its history; with an image that someone wants to project; with comparable products; with specific claims; with reasonable user expectations; with the explicit or implicit purpose of the product; with itself in any set of observable aspects; and with relevant standards, statutes, regulations, or laws. (These heuristics, with an example of how they can be applied in an exploratory way, are listed as the HICCUPP heuristics here. It’s now “HICCUPPS”; we recognized the “Standards and Statutes” oracle after the article was written.)

At best, your starting hypothesis determines whether applying an oracle suggests confirmation. If your hypothesis is that the product works (that is, that the product behaves in a manner consistent with the oracle heuristics), then your approach might be described as confirmatory. Yet the confirmatory mindset has been identified in both general psychological literature and testing literature as highly problematic. Klayman and Ha point out in their 1987 paper Confirmation, Disconfirmation, and Information in Hypothesis Testing that “In rule discovery, the positive test strategy leads to the predominant use of positive hypothesis tests, in other words, a tendency to test cases you think will have the target property.” For software testing, this tendency (a form of confirmation bias) is dangerous because of the influence it has on your selection of tests. If you want to find problems, it’s important to take a disconfirmatory strategy, one that includes tests of conditions outside the space of the hypothesis that the program works. “For example, when dealing with a major communicable disease (or software bugs –MB), it is more serious to allow a true case to go undiagnosed and untreated than it is to mistakenly treat someone.” Here, Klayman and Ha point out, if we want to prevent disease, the emphasis should be on tests that are outside of those that would exemplify a desired attribute (like good health). In the medical case, they say that would involve “examining people who test negative for the disease, to find any missed cases, because they reveal potential false negatives.” In testing, the object would be to run tests that challenge the idea that the test should pass. This is consistent with Myers’ analysis in The Art of Software Testing (which, interestingly, as it was written in 1979, predates Klayman and Ha’s paper).

As I see it, if we’re testing the product (rather than, say, demonstrating it), we’re not looking for confirmation of the idea that it works; we’re seeking to disconfirm the idea that it works. Or, as James Bach might put it, we’re in the illusion demolition business.

One other point: Rahul suggests “Testing should be considered complete for a given interaction only when the result of confirmation in terms of pass or fail is available.” To me, that’s checking. A test should reveal information, but it does not have to pass or fail. For example, I might test a competitive product to discover the features that it offers; such tests don’t have a pass or fail component to them. A tester might be asked to compare a current product with a past version to look for differences between the two. A tester might be asked to use a product and describe her experience with it, such that there’s an evaluation without explicit, atomic pass or fail criteria. “Pass and fail” are highly limiting in terms of our view of the product: I’m sure that the arrival of yet another damned security message on Windows Vista was deemed a pass in the suite of automated checks that got run on the system every night. But in terms of my happiness with the product, it’s a grinding and repeated failure. I think Rahul’s notion that a test must pass or fail is confused with the idea that a test should involve the application of a stopping heuristic. For a check, “pass or fail” is essential, since a check relies on the non-sapient application of a decision rule. For a test, pass-vs.-fail might be an example of the “mission accomplished” stopping heuristic, but there are plenty of other conditions that we might use to trigger the end of a test.
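
To make the idea of a check as a decision rule concrete, here is a minimal Python sketch. Everything in it is hypothetical: the Observation record, the check_security_prompt function, and the dialog text are invented for illustration and are not drawn from any real test suite. The rule can only answer pass or fail; it has nothing to say about whether the prompt is welcome.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What a hypothetical automated run recorded about one interaction."""
    dialog_shown: bool
    dialog_text: str


def check_security_prompt(obs: Observation) -> bool:
    """A check: a fixed decision rule that yields only pass (True) or fail (False)."""
    return obs.dialog_shown and "allow" in obs.dialog_text.lower()


# The rule is satisfied, so the check "passes" ...
obs = Observation(dialog_shown=True, dialog_text="Do you want to allow this program?")
print("PASS" if check_security_prompt(obs) else "FAIL")
# ... but nothing here records that the user has seen this prompt ten times today.
```

Deciding whether that passing result represents a problem for someone is the part that stays with the tester.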

Since Rahul appears to be a performance tester, perhaps he’ll relate to this example (the framing of which I owe to the work of Cem Kaner). Imagine a system that has an explicit requirement to handle 100,000 transactions per minute. We have two performance testing questions that we’d like to address. One is the load testing question: “Can this system in fact handle 100,000 transactions per minute?” To me, that kind of question often gets addressed with a confirmatory mindset. The tester forms a hypothesis that the system does handle 100,000 transactions per minute; he sets up some automation to pump 100,000 transactions per minute through the system; and if the system stays up and exhibits no other problems, he asserts that the test passes.

The other performance question is a stress testing question: “In what circumstances will the system be unable to handle a given load, and fail?” For that we design a different kind of experiment. We have a hypothesis that the system will fail eventually as we ramp up the number of transactions. But we don’t know how many transactions will trigger the failure, nor do we know the part of the system in which the failure will occur, nor do we know the way in which the failure will manifest itself. We want to know those things, so we have a different information objective here than for the load test, and we have a mission that can’t be handled by a check.
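
The difference between the two questions can be sketched in a few lines of Python. This is only an illustration under loud assumptions: simulated_system is a made-up stand-in with an invented hidden limit, and load_check and stress_probe are hypothetical names. A real stress test would also need a person to work out where the failure occurred and how it showed itself, which this sketch does not attempt.

```python
def simulated_system(tx_per_minute: int) -> bool:
    """Hypothetical stand-in for a real system: True if it survived the load."""
    hidden_limit = 137_000  # invented figure; the tester does not know it in advance
    return tx_per_minute <= hidden_limit


def load_check(system, required: int = 100_000) -> bool:
    """Confirmatory question: can the system handle the stated requirement?"""
    return system(required)


def stress_probe(system, start: int = 50_000, step: int = 10_000,
                 ceiling: int = 500_000):
    """Open-ended question: ramp the load and report where survival stops.

    Returns (last load survived, first load that failed); the second value
    is None if nothing failed below the ceiling.
    """
    last_ok = None
    load = start
    while load <= ceiling:
        if not system(load):
            return last_ok, load
        last_ok = load
        load += step
    return last_ok, None


print(load_check(simulated_system))    # True
print(stress_probe(simulated_system))  # (130000, 140000)
```

The first function yields a pass or fail; the second yields a finding that someone still has to interpret.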

In the latter test, there is a confirmatory dimension if you’re willing to look hard enough for it. We “confirm” our hypothesis that, given heavy enough stress, the system will exhibit some problem. When we apply an oracle that exposes a failure like a crash, maybe one could say that we “confirm” that the crash is a problem, or that behaviour we consider to be bad is bad. Even in the former test, we could flip the hypothesis, and suggest that we’re seeking to confirm the hypothesis that the program doesn’t support a load of 100,000 transactions per minute. If Rahul wants to do that, he’s welcome to do so. To me, though, labelling all that stuff as “confirmatory” testing reminds me of Procrustes.

Questions from Listeners (2a): Handling Regression Testing

August 7th, 2010

This is a followup to an earlier post, Questions from Listeners (2): Is Unit Testing Automated? The original question was

Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

Now I’ll deal with the second part of the question.

Part One: What Do We Really Mean By “Automation”?

Some people believe that “automation” means “getting the computer to do the testing”. Yet computers don’t do testing any more than compilers do programming, than cruise control does driving, than blenders do cooking. In Rapid Software Testing, James Bach and I teach that test automation is any use of tools to support testing.

When we perform tests on a running program, there’s always a computer involved, so automation is always around to some degree. We can use tools to help us configure the program, to help us observe some aspect of the program as it’s running, to generate data, to supply input to the program, to monitor outputs, to parse log files, to provide an oracle against which outputs can be compared, to aggregate and visualize results, to reconfigure the program or the system,… In that sense, all tests can be automated.
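
As one small example of a tool supporting testing in this sense, here is a Python sketch that parses log lines and tallies them by severity. The log format (timestamp, upper-case level, message) and the name summarize_levels are assumptions made up for the example, not the format of any particular product.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def summarize_levels(lines):
    """Count log lines per level, skipping lines that don't fit the assumed shape."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


sample = [
    "12:00:01 INFO service started",
    "12:00:02 ERROR timeout talking to database",
    "12:00:03 INFO retrying",
    "not a log line",
]
print(summarize_levels(sample))
```

The tool gathers and aggregates; deciding which of those lines matter, and to whom, is still up to the tester.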

Some people believe that tests can be automated. I disagree. Checks can be automated. Checks are a part of an overall program of testing, and can aid it, but testing itself can’t be automated. Testing requires human judgement to determine what will be observed and how it will be observed; testing requires human judgement to ascribe meaning to the test result. Human judgement is needed to ascribe significance to the meaning(s) that we ascribe, and human judgement is required to formulate a response to the information we’ve revealed with the test. Is there a problem with the product under test? The test itself? The logical relationship between the test and the product? Is the test relevant or not? Machines can’t answer those questions. In that sense, no test can be automated.

Automation is a medium. That is, it’s an extension of some human capability, not a replacement for it. If we test well, automation can extend that. If we’re testing badly, then automation can help us to test badly at an accelerated rate.

Part Two: Why Re-run Every Test?

My car is 25 years old. Aside from some soon-to-be addressed rust and threadbare upholstery, it’s in very good shape. Why? One big reason is that my mechanic and I are constantly testing it and fixing important problems.

When I’m about to set out on a long journey in my car, I take it in to Geoffrey, my mechanic. He performs a bunch of tests on the car. Some of those tests are forms of review: he checks his memory and looks over the service log to see which tests he should run. He addresses anything that I’ve identified as being problematic or suspicious. Some of Geoffrey’s tests are system-level tests, performed by direct observation: he listens to the engine in the shop and takes the car out for a spin on city streets and on the highway. Some of his tests are functional tests: he applies the brakes to check to see if they lose pressure. Some of his tests are unit tests, assisted by automation: he uses a machine to balance the tires and a gauge to pressure-test the cooling system. Some of his smoke tests are refined by tools: a look at the tires is refined by a pressure gauge; when he sees wear on the tires, he uses a gauge to measure the depth of the tread. Some of his tests are heavily assisted by automation: he has a computer that hooks up to a port on the car, and the computer runs checks that give him gobs of data that would be difficult or impossible for him to obtain otherwise.

When I set out on a medium-length trip, I don’t take the car in, but I still test for certain things. I walk around the car, checking the brake lights and turn signals. I look underneath for evidence of fluid leaks. I fill the car with gas, and while I’m at the gas station, I lift the hood and check the oil and the windshield wiper fluid. For still shorter trips, I do less. I get in, turn on the ignition, and look at the fuel gauge and the rest of the dashboard. I listen to the sound of the engine. I sniff the air for weird smells–gasoline, coolant, burning rubber.

As I’m driving, I’m making observations all the time. Some of those observations happen below the level of my consciousness, only coming to my attention when I’m surprised by something out of the ordinary, like a bad smell or a strange sound. On the road, I’m looking out the window, glancing at the dashboard, listening to the engine, feeling the feedback from the pedals and the steering wheel. If I identify something as a problem, I might ignore it until my next scheduled visit to the mechanic, I might leave it for a little while but still take it in earlier than usual, or I might take the car in right away.

When Geoffrey has done some work, he tells me what he has done, so to some degree I know what he’s tested. I also know that he might have forgotten something in the fix, and that he might not have tested completely, so after the car has been in the shop, I need to be more alert to potential problems, especially those closely related to the fix.

Notice two things: 1) Both Geoffrey and I are testing all the time. 2) Neither Geoffrey nor I repeat all of the tests that we’ve done on every trip, nor on every visit.

When I’m driving, I know that the problems I’m going to encounter as I drive are not restricted to problems with my car. Some problems might have to do with others—pedestrians or animals stepping out in front of me, or other drivers making turns in my path, encroaching on my lane, tailgating. So I must remain attentive, aware of what other people are doing around me. Some problems might have to do with me. I might behave impatiently or incompetently. So it’s important for me to keep track of my mental state, managing my attention and my intention. Some problems have to do with context. I might have to deal with bad weather or driving conditions. On a bright sunny day, I’ll be more concerned about the dangers of reflected glare than about wet roads. If I’ve just filled the tank, I don’t have to think about fuel for another couple hundred miles at least. Because conditions around me change all the time, I might repeat certain patterns of observation and control actions, but I’m not going to repeat every test I’ve ever performed.

Yes, I recognize that software is different. If software were a car, programmers would constantly be adding new parts to the vehicle and refining the parts that are there. On a car, we don’t add new parts very often. More typically, old parts wear out and get replaced. As such, change is happening. After change, we concentrate our observation and testing on things that are most likely to be affected by the change, and on things that are most important. In software, we do exactly the same thing. But in software, we can take an extra step to reduce risk: low-level, automated unit tests that provide change detection and rapid feedback, and which are the first level of defense against accidental breakage. I wrote about that here.

Part Three: Think About Cost, Value, Risk, and Coverage

Testing involves interplay between cost, value, and risk. The risk is generally associated with the unknown—problems that you’re not aware of, and the unknown consequences of those problems. The value is in the information you obtain from performing the test, and in the capacity to make better-informed decisions. There are lots of costs associated with tests. Automation reduces many of those costs (like execution time) and increases others (like development and maintenance time). Every testing activity, irrespective of the level of automation, introduces opportunity costs against potentially more valuable activities. A heavy focus on running tests that we’ve run before—and which have not been finding problems—represents opportunity cost against tests that we’ve never run, and against problems that won’t be found by our repeated tests. A focus on the care and feeding of repeated tests diminishes our responsiveness to new risk. A focus on repetition limits our test coverage.

Some people object to the idea of relaxing attention on regression tests, because their regression tests find so many problems. Oddly, these people are often the same people who trot out the old bromide that bugs that are found earlier are less expensive to fix. To those people, I would say this: If your regression tests consistently find problems, you’ll probably want to fix most of them. But there’s another, far more important problem that you’ll want to fix: someone has created an environment that’s favourable to backsliding.

Acceptance Tests: Let’s Change the Title, Too

August 4th, 2010

Gojko Adzic recently wrote a blog post called Let’s Change the Tune on some of our approaches in agile development. In changing the tune, some of the current words won’t fit so well, so he proposes (for example), “Specifying Collaboratively instead of test first or writing acceptance tests”. I have one more: I think we should change the title of this piece of agile development.

Acceptance tests are given a central role in agile development. They are typically used to express a requirement in terms of an atomic example, and they’re typically automated. That is, they’re expressed in the form of program code for a binary computer, code that helps us to determine whether some aspect of the product is functionally correct. When those tests pass, say certain proponents of the lore, we know we’re done. Yet acceptability of a product is multi-dimensional. In the end, the product is always being used by people to solve some problem. The code may perform certain functions exquisitely as part of a product that is an incomplete solution to the problem, that is hard to use, or that we hate. The expression of requirements and the determination of acceptability in terms of simplistic, binary decisions delegated to a computer seems to me like a bias towards processes and tools, rather than individuals and interactions.

Done as they usually are, acceptance tests are set very close to the beginning of a cycle of development. Yet during that development cycle, we tend to learn a significant amount about the scope of the problem to be solved, about technology, about risk, about trade-offs. If the acceptance tests remain static, the learning isn’t reflected in the acceptance tests. That seems to me like a bias towards following a plan, rather than responding to change.

Acceptance tests are examples of how the product should behave. Those tests are typically performed in very constrained, staged, artificial environments that are shadows of the environments in which the product will be used. Acceptance tests are not really tests, in the sense of testing the mettle of the product, subjecting it to the challenges and stresses of real-world use. Yet acceptance tests are often treated more as authoritative, definitive specifications for the product, instead of representative examples. That sounds to me like a bias towards comprehensive documentation, rather than working software.

Acceptance tests are often discussed as though they determined the completion of development. While the acceptance tests aren’t passing, we know we’re not done; when the acceptance tests pass, we’re done and, implicitly, the customer is obliged to accept the product as it is. That sounds to me like a bias towards negotiated contracts, rather than customer collaboration.

The idea that we’re done when the acceptance tests pass is a myth. As a tester, I can assure you that a suite of passing acceptance tests doesn’t mean that the product is acceptable to the customer, nor does it mean that the customer should accept it. It means that the product is ready for serious exploration, discovery, investigation, and learning–that is, for testing–so that we can find problems that we didn’t anticipate with those tests but that would nonetheless destroy value in the product.

When the acceptance tests pass, the product might be acceptable. When the acceptance tests fail, we know for sure that the product isn’t acceptable. Thus I’d argue that instead of talking about acceptance tests, we should be talking about them as rejection tests.

Post-script: Yes, I’d call them rejection checks. But you don’t have to.

When Should the Product Owner Release the Product?

August 3rd, 2010

In response to my previous blog post “Another Silly Quantitative Model”, Greg writes: In my current project, the product owner has assumed the risk of any financial losses stemming from bugs in our software. He wants to release the product to customers, but he is of course nervous. How do you propose he should best go about deciding when to release? How should he reason about the risks, short of using a quantitative model?

The simple answer is “when he’s not so nervous that he doesn’t want to ship”. What might cause him to decide to stop shipment? He should stop shipment when there are known problems in the product that aren’t balanced by countervailing benefits. Such problems are called showstoppers. A colleague once described “showstopper” as “any problem in the product that would make more sense to fix than to ship.”

When I was a product owner and I reasoned with the project team about showstoppers, we deemed as a showstopper

  1. Any single problem in the product that would definitely cause loss or harm (or sufficient annoyance or frustration) to users, such that the product’s value in its essential operation would be nil. Beware of quantifying “users” here. In the age of the Internet, you don’t need very many people with terrible problems to make noise disproportionate to the population, nor do you need those problems to be terrible problems when they affect enough people. The recent kerfuffle over the iPhone 4 is a case in point; the Pentium Bug is another.
  2. Any set of problems in the product that when taken individually would not threaten its value, but when viewed collectively would. That could include a bunch of minor irritants that confuse, annoy, disturb, or slow down people using the product; embarrassing cosmetic defects; non-devastating functional problems; parafunctional issues like poor performance or compatibility, and the like.

Now, in truth, your product owner might need to resort to a quantitative model here: he has to be able to count to one. One showstopper, by definition, is enough to stop shipment.

How might you evaluate potential showstoppers qualitatively? My colleague Fiona Charles has two nice suggestions: “Could a problem that we know about in this product trigger a front-page story in the Globe and Mail’s Report on Business, or in the Wall Street Journal?” “Could a problem that we know about in this product lead to a question being raised in Parliament?” Now: the fact is that we don’t, and can’t, know the answer to whether the problem will have that result, but that’s not really the point of the questions. The points are to explore and test the ways that we might feel about the product, the problems, and their consequences.

What else might cause nervousness for your client? Perhaps he’s worried that, other than the known problems, there are unanswered questions about the product. Those include

  3. Open questions whose answer would produce instances of (1) or (2) above.
  4. Unasked questions that, when asked, would fall into (3) above. Where would you get ideas for such questions? Try the Heuristic Test Strategy Model at http://www.satisfice.com/tools/satisfice-tsm-4p.pdf for an example of the kinds of questions that you might ask.

    Unanswered questions about the product are one indicator that you might not be finished testing. There are other indicators; you can read about them here: http://www.developsense.com/blog/2009/09/when-do-we-stop-test/

    Questions about how much we have (or haven’t) tested are questions about test coverage. I wrote three columns about that a while back. Here are some links and synopses:

    Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

    Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

    A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

    Whether you’ve established a clear feeling or are mired in uncertainty, you might want to test your qualitative evaluation with a quantitative model. For example, many years ago at Quarterdeck, we had a problem that threatened shipment: on bootup, our product would lock systems that had a particular kind of hard disk controller. There was a workaround, which would take a trained technical support person about 15 minutes to walk through. No one felt good about releasing the product in that state, but we were under quarterly schedule pressure. We didn’t have good data to work with, but we did have a very short list of beta testers and data about their systems. Out of 60 beta testers, three had machines with this particular controller. There was no indication that our beta testers were representative of our overall user population, but 5% of our beta testers had this controller. We then performed the thought experiment of destroying the productivity of 5% of our user base, or tech support having to spend 15 minutes with 5% of our user base (in those days, in the millions). How big was our margin of error? What if we were off by a factor of two, and ten per cent of our user base had that controller? What if we were off by a factor of five, and only one per cent of our user base had that controller? Suppose that only one per cent of the user base had their machines crash on startup; suppose that only a fraction of those users called in. Eeew. The story and the feeling, rather than the numbers, told us this: still too many. Is it irrational to base a decision on such vague numbers? If so, who cares? We made an “irrational” but conscious decision: fix the product, rather than ship it. That is, we didn’t decide based on the numbers, but rather on how we felt about the numbers. Was that the right decision? We’ll never know, unless we figure out how to get access to parallel universes. In this one, though, we know that the problem got fixed startlingly quickly when another programmer viewed it with fresh eyes; that the product shipped without the problem; that users with that controller never faced that problem; and that the tech support department continued in its regular overloaded state, instead of a super-overloaded one.
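
The thought experiment above can be written out as a few lines of arithmetic. This is only a sketch: the user-base figure of one million and the call-in fraction are placeholders (the story says only "in the millions" and "a fraction"), and the point, as in the story, is to have something to react to rather than something to prove.

```python
# Hypothetical figures: the post gives only "in the millions" and "a fraction".
USER_BASE = 1_000_000      # assumed placeholder for "in the millions"
CALL_IN_FRACTION = 0.25    # assumed placeholder for "only a fraction ... called in"
WORKAROUND_MINUTES = 15    # from the story: a 15-minute walkthrough

BETA_TESTERS = 60
AFFECTED_BETA_TESTERS = 3


def support_hours(share_affected: float) -> float:
    """Tech support hours needed if `share_affected` of users hit the bug."""
    callers = USER_BASE * share_affected * CALL_IN_FRACTION
    return callers * WORKAROUND_MINUTES / 60


observed = AFFECTED_BETA_TESTERS / BETA_TESTERS
scenarios = {
    "observed in beta (5%)": observed,
    "off by a factor of two (10%)": observed * 2,
    "off by a factor of five (1%)": observed / 5,
}

for label, share in scenarios.items():
    affected = round(USER_BASE * share)
    print(f"{label}: {affected:,} users affected, "
          f"{support_hours(share):,.0f} support hours")
```

Changing the two assumed constants moves every figure at once, which is a reminder of how little the inputs are actually known.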

    The decision to release is always a business decision, and not merely a technical one. The decision to release is not based on numbers or quantities; even for those who claim to make decisions “based on the numbers”, the decisions are really based on feelings about the numbers. The decision to release is always driven by balancing cost, value, knowledge, trust, and risk. Some product owners will have a bigger appetite for risk and reward; others will be more cautious. Being a product owner is challenging because, in the end, product owners own the shipping decisions. By definition, a product owner assumes responsibility for the risk of financial losses stemming from bugs in the software. That’s why they get the big bucks.

    Another Silly Quantitative Model

    July 14th, 2010

    John D. Cook recently issued a blog post, How many errors are left to find?, in which he introduces yet another silly quantitative model for estimating the number of bugs left in a program.

    The Lincoln Index, as Mr. Cook refers to it here, was used as a model for evaluating typographical errors, and was based on a method for estimating the population of a given species of animal. There are several terrible problems with this analysis.
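
For readers who haven't seen it, the model under discussion boils down to a single formula. In its usual capture-recapture form, two reviewers look for errors independently; if the first finds E1, the second finds E2, and S are found by both, the estimated total is E1 × E2 / S. A minimal sketch follows, with made-up numbers and a hypothetical function name. It shows what the model computes, not that the computation is meaningful.

```python
def lincoln_index(found_by_first: int, found_by_second: int, found_by_both: int) -> float:
    """Estimate a total population as E1 * E2 / S (capture-recapture form)."""
    if found_by_both == 0:
        # No overlap: the formula divides by zero and yields no estimate.
        raise ValueError("no errors in common; the estimate is undefined")
    return found_by_first * found_by_second / found_by_both


# Made-up example: two proofreaders find 20 and 15 typos, 10 of them in common.
first, second, both = 20, 15, 10
estimated_total = lincoln_index(first, second, both)
distinct_found = first + second - both
estimated_remaining = estimated_total - distinct_found

print(f"estimated total: {estimated_total:.0f}")
print(f"distinct found so far: {distinct_found}")
print(f"estimated remaining: {estimated_remaining:.0f}")
```

Every input is a count, so the formula presupposes that an "error" is a countable thing that both reviewers would recognize in the same way.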

    First, reification error. Bugs are relationships, not things in the world. A bug is a perception of a problem in the product; a problem is a difference between what is perceived and what is desired by some person. There are at least four ways to make a problem into a non-problem: 1) Change the perception. 2) Change the product. 3) Change the desire. 4) Ignore the person who perceives the problem. Any time a product owner can say, “That? Nah, that’s not a bug,” the basic unit of the system of measurement is invalidated.

    Second, even if we suspended the reification problem, the model is inappropriate. Bugs cannot be usefully modelled as a single kind of problem or a single population. Typographical errors are not the only problems in writing; a perfectly spelled and syntactically correct piece of writing is not necessarily a good piece of writing. Nor are plaice the only species of fish in the fjords, nor are fish the only form of life in the sea, nor do we consider all life forms as equivalently meaningful, significant, benign, or threatening. Bugs have many different manifestations, from capability problems to reliability problems to compatibility problems to performance problems and so forth. Some of those problems don’t have anything to do with coding errors (which themselves could be like typos or grammatical errors or statements that can be interpreted ambiguously). Problems in the product may include misunderstood requirements, design problems, valid but misunderstood implementation of the design, and so forth. If you want to compare estimating bugs in a program to a population estimate, it would be more appropriate to compare it to estimating the number of all biological organisms in a given place. Imagine some of the problems in doing that, and you may get some insight into the problem of estimating bugs.

    Third, there’s Dijkstra’s notion that testing can show the presence of problems, but not their absence. That’s a way of noting that testing is subject to the Halting Problem. Since you can’t tell if you’ve found the last problem in the product, you can’t estimate how many are left in it.

    Fourth, the Ludic Fallacy (part one). Discovery and analysis of problems in a product is not a probabilistic game, but a non-linear, organic system of exploration, discovery, investigation, and learning. Problems are discovered at neither a steady nor a random rate. Indeed, discoveries often happen in clusters as the tester learns about the program and things that might threaten its value. The Lincoln Index, focused on typos—a highly decidable and easily understood problem that could largely be accomplished by checking—doesn’t fit for software testing.

    Fifth, the Ludic Fallacy (part two). Mr. Cook’s analysis implies that all problems are of equal value. Those of us who have done testing and studied it for a long time know that, from one time to another, some testers find a bunch of problems, and others find relatively few. Yet those few problems might be of critical significance, and the many of lesser significance. It’s an error to think in terms of a probabilistic model without thinking in terms of the payoff. Related to that is the idea that the number of bugs remaining in the product may not be that big a deal. All the little problems might pale in significance next to the one terrible problem; the one terrible problem might be easily fixable while the little problems grind down the users’ will to live.

    Sixth, measurement-induced distortion. Whenever you measure a self-aware system, you are likely to introduce distortion (at best) and dysfunction (at worst), as the system adapts itself to optimize the thing that’s being measured. Count bugs, and your testers will report more bugs—but finding more bugs can get in the way of finding more important bugs. That’s at least in part because of…

    Seventh, the Lumping Problem (or more formally, Assimilation Bias). Testing is not a single activity; it’s a collection of activities that includes (at least) setup, investigation and reporting, and design and execution. Setup and investigation and reporting take time away from test coverage. When a tester finds a problem, she investigates and reports it. That time is time that she can’t spend finding other problems. The irony here is that the more problems you find, the fewer problems you have time to find. The quality of testing work also involves the quality of the report. Reporting time, since it isn’t taken into account in the model, will distort the perception of the number of bugs remaining.

    Eighth, estimating the number of problems remaining in the product takes time away from sensible, productive activities. Considering that the number of problems remaining is subjective, open-ended, and unprovable, one might be inclined to think that counting how many problems are left is a waste of time better spent on searching for other bad ones.

    I don’t think I’ve found the last remaining problem with this model.

    But it does remind me that when people see bugs as units and testing as piecework, rather than the complex, non-linear, cognitive process that it is, they start inventing all these weird, baseless, silly quantitative models that are at best unhelpful and that, more likely, threaten the quality of testing on the project.

    Questions from Listeners (2): Is Unit Testing Automated?

    June 28th, 2010

    On April 19, 2010, I was interviewed by Gil Broza.  In preparation for that interview, we solicited questions from the listeners, and I promised to answer them either in the interview or in my blog.  Here’s the second one.

    Unit testing is automated. When functional, integration, and system test cannot be automated, how to handle regression testing without exploding the manual test with each iteration?

    This question provides a great opportunity to look at a number of points—so many that I’d like to address only the first sentence in the question this time around. I’ll look at the second part of the question later on.

    Expansive Definitions

    I find the most helpful definitions and descriptions to be those that are expansive and inclusive. While testing, one big risk is that I might have narrow ideas about certain risks or threats to the value of the product. Thinking expansively helps me to avoid tunnel vision that would lead to my missing important problems. In conversations, thinking expansively helps me to remain alert to the possibility that the other person and I might be talking at cross-purposes. That can happen when one of us uses a word that means different things to each of us. It can also happen when we’re thinking of the same thing, but using different words. In fact, as Jerry Weinberg once remarked to James Bach, “A tester is someone who knows that things can be different.” Here’s an example of that. The questioner says that “unit testing is automated”. I’d argue that this refers to one part of testing, test execution, the part we can automate. Well, to me, things can be different.

    Testing Includes Many Activities

    Testing includes not only test execution, but also test design, learning, and reporting, all performed in cycles or loops. What is test design? As we say in the Rapid Software Testing course notes, test design includes

    • modeling the test space (that is, considering questions of what we could test; what’s in scope);
    • determining oracles (that is, figuring out the principles or mechanisms by which we’d recognize a problem, and considering how those principles or mechanisms might fail to help us recognize a problem)
    • determining coverage (that is, how much testing we’re going to do, given the scope)
    • determining procedures (how we’re going to perform the tests; how we’ll go about the business of test execution)

    Test execution includes

  • configuring the product (obtaining it, setting it up for the purposes of a given test)
  • operating the product (exercising the product in some way to obtain coverage)
  • observing the product (applying the oracles that we’ve determined in advance, but also recognizing behaviours that trigger us to recognize and apply new oracles)
  • evaluating the product (comparing its behaviour to our oracles)
  • applying a stopping heuristic (deciding when the test is done)

    Test execution may or may not include reporting, but reporting happens at some point. And when testing is being done well, learning is happening pretty much all the time. This isn’t a strictly linear process, by the way. Depending on your approach to testing, and depending on what you’re testing, these things may happen in the order that you see above, or they may happen all at once in an organic tangled ball, with lots of tight little loops. Sometimes all of the elements of testing are done by the same person, and the elements interact with each other very quickly. Sometimes one person designs a test and another person handles the execution, in which case the loops will be long or broken. If you separate test design and test execution (as happens in scripted testing), you separate the learning associated with each. Sometimes we’ll evaluate a result and stop a test; sometimes we’ll stop first and then interpret what we’ve seen. For a given test, some aspects may take much longer than others; some may be done more consciously or thoughtfully than others. But at some point in pretty much every test, each of the steps above happens.

    Unit Testing Includes Many Activities

    Like any other kind of testing, unit testing consists of cycles of design, execution, learning, and reporting. Like any other test, a unit test starts with some person having a test idea, a question that we want to ask about the program. A person designing a unit test typically frames that question in terms of a check—an observation linked to a decision rule such that both can be performed by a machine. The person writes program code to express that yes-or-no question, usually assisted by some kind of unit testing framework. Next, some person—or, far more often, some process that a person has initiated—performs the checks. The check produces a result. Sometimes a person observes that result independently of other results; more often, some person (the author of the automation framework) has programmed a mechanism that provides a means of aggregating the results. Then some person interprets the aggregated results and figures out what needs to be done next—whether everything is okay, whether a test result suggests that the product should be revised, or whether the check is excellent or wanting or broken or irrelevant. And then the development cycle continues, in a loop that includes some development of the actual product too.
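
To make the idea of a check concrete, here is a minimal sketch using Python's standard unittest module. The function `apply_discount` and its rules are invented for illustration; they stand in for whatever unit is under test. Each assertion is an observation tied to a decision rule that a machine can evaluate. Deciding which questions to ask, and what a failure means, stays with the person.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit under test: reduce `price` by `percent`."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountChecks(unittest.TestCase):
    def test_ten_percent_off(self):
        # Observation: the returned value. Decision rule: it equals 90.0.
        self.assertEqual(apply_discount(100.0, 10), 90.0)

    def test_rejects_percent_over_100(self):
        # Decision rule: an out-of-range percent raises ValueError.
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)


# A person initiates the run; the framework performs and aggregates the checks.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountChecks)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The aggregated `result` says only which checks passed or failed. Whether a failure points at the product, the check, or the assumption behind the check is still a question for whoever reads it.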

    Most Parts of Unit Testing Are Sapient, Not Mechanical

    Notice how many times the word “person” appears in the above description of unit testing. None of the steps in the process (with the exception of the running of the checks) can be automated, since each step requires a thinking person, rather than a machine, to seek information, to make decisions, and to control the overall process. Parts of unit testing can be assisted by automation, but the automation isn’t doing anything particularly on its own; it remains an extension of the person’s ability to execute and to observe.

    What form might unit test automation take? Many people think in terms of a testing framework that sets up some conditions, executes some code from the product under test, makes some assertions about the output of some function or some aspect of the state of the system. That’s cool, and quite powerful. But for years at Quarterdeck, I watched programmers doing unit testing (and did some myself) by stepping through code under various debuggers (DEBUG, SYMDEB, WDEB386, or Soft-ICE, a software-based simulacrum of an in-circuit emulator), watching the registers and the ports for each instruction. Sometimes I’m writing some stuff in Ruby, and I want to do a quick little test of a fairly trivial function that I know I’m going to throw away. In that case, I don’t bother with the testing framework; I run the code and inspect the variables in IRB, the Ruby interpreter, and get my information that way. Sometimes I write a function, and generate some data to test it using automation. Sometimes, while unit testing, I use tools to examine the contents of a database table or a file or the Windows registry. Are all these different things unit testing? Jerry Weinberg says that testing is “gathering information with the intention of informing a decision”. I’m testing a unit, and I’m using automation to assist that testing, even though (so it seems) people tend to hold a more narrow view of what unit testing is. Unit testing is testing done at the unit level.
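
As one illustration of generating data to test a function, here is a rough sketch in Python (not the Ruby and IRB workflow described above). The function under test, `dedupe_keep_order`, and the slower reference oracle are both invented for this example. The tool generates inputs and flags disagreements; a person still decides whether a disagreement points to a problem in the function, the oracle, or the test.

```python
import random


def dedupe_keep_order(items):
    """Hypothetical unit under test: drop repeats, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def reference_oracle(items):
    """Slower, more obvious implementation used for comparison."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def find_disagreements(trials=200, seed=1234):
    """Generate random inputs and collect any on which the two disagree."""
    rng = random.Random(seed)  # fixed seed so a surprising run can be repeated
    disagreements = []
    for _ in range(trials):
        data = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        if dedupe_keep_order(data) != reference_oracle(data):
            disagreements.append(data)
    return disagreements


disagreements = find_disagreements()
print(f"{len(disagreements)} disagreement(s) in 200 generated inputs")
```

An empty list here means only that these particular inputs exposed no difference between the two implementations. It says nothing about inputs that were never generated.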

    Is stepping through the code the way that we should always do unit testing? Of course not. For the purpose of creating easily-runnable change detectors, the unit test framework is the way to go. Yet different approaches, tools, and techniques that we employ allow us to observe in different ways, discover different problems, and learn different things about the unit under test.

    Finally, it’s important to note that the development of unit-level checks tends to reveal more problems than the running of them. Chip Groeder won a best paper award at the STAR conference in 1997 for a paper in which he claimed that 88% of the bugs that he found with automated tests were found during development of the tests (that is, the non-automated parts of the testing). (Thanks to Cem Kaner for pointing me to this.) Anecdotally, everyone I speak to who uses automation for the execution of tests, whether at the unit level or not, says exactly the same thing. That’s not to say that automated checks are useless. On the contrary; checks, as change detectors, are very useful. Instead, my point is that unit testing is not automated; not the interesting parts. Unit checking is automated.

    In summary:

    • Unit testing is a highly exploratory process, in that the loops are short, tightly integrated, and typically performed by the same person.
    • The most important parts of unit testing are the sapient parts: the design, programming, design of reports, interpretation of results, and the evaluation of what to do next.
    • The scripted part of unit testing—the execution of the checks—is the least interesting part of unit testing. And yet…
    • Many people seem to be fascinated by the mechanical parts, dazzled by lines on the screen, blissful upon observation of the green bar. And the same people say things like “unit testing is automated”. Why is that?

    That’s a lot for now. I’ll answer the rest of the question in a future post.

    Doing Development Work vs. Doing Quality Assurance

    June 5th, 2010

    Here’s a case where a comment and question were worthy of a post of their own.  In reference to my recent post, Testers:  Get Out of the Quality Assurance Business, Selim Mia writes:

    Hi Michael,

    I have started following your blog just from past few days and I like to thank you for all of your thoughtful posts by which reflects your craftsmanship.

    Thank you for reading, and thank you for thanking me.

    I have solely agreed all of your points/advice/discussions on this post. I had many confusion about the term QA and QC since the start of my testing career and still have many confusion, i think other testers have the same. i have been working in a department called “QA” in my organization but doing mostly testing tasks as like other companies in Bangladesh. But along with testing we have also doing some of the QA tasks (i think) and below i have mentioned some of these:

    • Check-in Review: we check, each developer at-least once in a day Check-in their source code into the svn repository (source code management system) with the comment what changes he made for this particular check-in and also reviewer name who pair reviewed the code before check-in.
    • Code review: we check, is the code reviewed by the technology expert in witch technology project is developing in the regular interval (at least for the new developer’s code, code of complex functionalities, etc) and also we ensure that actions has been taken for all the review comments.
    • Audit Process Framework: we check, are all the development processes are following by the all project members except their have enough justification and approval not to follow the particular process(es).
    • Audit Bug repository: we ensure all the reported bugs have been taken into action (not a bug, assigned, WIP, fix, won’t fix).
    • Audit Document Management System: we ensure that all the updated version of all documents of the particular project are stored on the DMS.

    Are not all above activities are part (of course, not all) of QA? Your kind words will be very much helpful to me.

    Regards,
    - Selim

    What a great question! Thank you for asking.

    The overarching mission for a tester, in my view, is to be of service to the project. Now, that’s not only the case for testers; I think it’s the overarching mission of anyone, everyone, on the project. We’re all in service to our paramount clients—the product owners, the business owners, the gold owners and the goal donors (as some Agile wags have said)—but we’re also in service to each other. When we’re thinking that way, the testers help the programmers by testing the product using a different skill set and mind set from the programmers; the programmers help the testers by providing a more testable product (log files, scriptable interfaces, and so on). Testers may help programmers to pinpoint the circumstances in which a bug happens; programmers help testers by providing explanations, test programs, hints on what to test. Testers learn to program; programmers learn to test. We support each other and learn from each other.

    The Agile people for years have been advocating the idea of the self-organizing team. I believe in that too. That means that, in principle, anyone on the team is empowered to do whatever work needs to be done. So if a programmer takes on the tasks of setting up and configuring test environments, or if the tester is recruited to review code or models or bugs (activities that help to assure quality as a part of a collaborative process), I’d say that’s cool.

    The audit stuff gives me pause. Auditing, in my view, is a kind of testing role: gathering information with the intention of informing a decision. Auditors don’t set policy or enforce rules; they provide information to management. In many process-model-obsessed organizations (here in the West, at least) the role has taken on a different slant: auditors are a kind of process police. In such organizations, people rearrange and reprioritize their work not to optimize its value, but to keep the auditors happy. This is a form of goal displacement. To me, the priority should be on providing service and value to our clients, including each other.

    If auditors discover some deviation from a set policy or a process model, I’d argue that the first step is to question the reasons for the deviation. Maybe someone is being sloppy; maybe someone is cutting corners; maybe someone is adding risk. But maybe someone has discovered a faster, less expensive, more efficient, more informative, more productive way of handling a task. Models always leave out something. Process models often leave out means by which we can encourage beneficial variation and change. I’ve never heard of an auditor reporting on some fabulous new problem-solving approach that someone has discovered internally. Most often, in my experience, process models leave out adaptability and people, as this remarkable TED talk describes.

    It’s neither a tester’s job nor an auditor’s job, in my view, to set or enforce policy, and I think it’s politically dangerous for us to be perceived that way. As soon as we are perceived to be responsible for enforcement, we run the risk of being seen as tattletales, busybodies, quality police. In that kind of environment, information will soon start to be hidden, which undermines the task of investigating the product and identifying problems with it that threaten its value.

    So, to the extent that you’re doing development work that helps to assure quality; to the extent that your teammates themselves are asking you to assist them; to the extent that you’re providing a service to them; to the extent that they appreciate what you’re doing as a service to them; and to the extent that they thank you for it, I’d say “rock on”, and congratulations.

    In another forum, a correspondent suggested “Maybe it’s all down to the “overall” thing – be part of the process, not a megalomaniac who thinks he owns it.” I absolutely agree with that.  To the extent that you’re doing “quality assurance”; to the extent that your managers are requiring you to impose on your teammates (or even worse, to the extent that you’re imposing without being asked by anyone); to the extent that you’re slowing down the project or inflicting help; to the extent that the programmers see your work as enforcing the contents of a process model or policy document; to the extent that you are barely tolerated or outright resented—well, as always, that’s up to you and your organization. But it’s not the kind of work that I would condone or accept myself.

    Again, thanks for writing.