<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-6567846</atom:id><lastBuildDate>Tue, 23 Feb 2010 19:43:44 +0000</lastBuildDate><title>DevelopSense Blog</title><description>Observations on software testing and quality, by Michael Bolton</description><link>http://www.developsense.com/blog/blog.shtml</link><managingEditor>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</managingEditor><generator>Blogger</generator><openSearch:totalResults>173</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8884191642960241964</guid><pubDate>Tue, 23 Feb 2010 15:44:00 +0000</pubDate><atom:updated>2010-02-23T12:27:31.942-05:00</atom:updated><title>Return to Ellis Island</title><description>&lt;a href="http://dnicolet1.tripod.com/agile/index.blog?entry_id=1989736"&gt;Dave Nicollette responds&lt;/a&gt; to my &lt;a href="http://www.developsense.com/blog/2010/02/ellis-island-bug.html"&gt;post on the Ellis Island bug&lt;/a&gt;.  I appreciate his continuing the conversation that started in the comments to my post.&lt;br /&gt;&lt;br /&gt;Dave says, "In describing a 'new' category of software defect he calls &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2010/02/ellis-island-bug.html"&gt;Ellis Island bugs&lt;/a&gt;...".&lt;br /&gt;&lt;br /&gt;I want to make it clear:  there is &lt;span style="font-style: italic;"&gt;nothing&lt;/span&gt; new about Ellis Island bugs, except the name.  
They've been with us forever, since before there were computers, even.&lt;br /&gt;&lt;br /&gt;He goes on to say "Using the typical behavior-driven approach that is popular today, one of the very first things I would think to write (thinking as a developer, not as a tester) is an example that expresses the desired behavior of the code when the input values are illogical. Protection against Ellis Island bugs is &lt;em&gt;baked in&lt;/em&gt; to contemporary software development technique."&lt;br /&gt;&lt;br /&gt;I'm glad Dave does that. I'm glad his team does that.  I'm glad that it's baked in to contemporary software development technique.  That's a &lt;span style="font-style: italic;"&gt;good thing&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;First, there's no evidence to suggest that excellent coding practices are universal, and plenty of evidence to suggest that they aren't.  Second, the Ellis Island problem is not a problem that you &lt;span style="font-style: italic;"&gt;introduce&lt;/span&gt; in your own code.  It's a class of problem that you have to &lt;span style="font-style: italic;"&gt;discover&lt;/span&gt;.  As Dave rightly points out,&lt;br /&gt;&lt;br /&gt;"...only way to catch this type of defect is by exploring the behavior of the code after the fact. Typical boundary-condition testing will miss some Ellis Island situations &lt;em&gt;because developers will not understand what the boundaries are supposed to be&lt;/em&gt;."&lt;br /&gt;&lt;br /&gt;The issue is not that "developers" will not understand what the boundaries are supposed to be.  (I think Dave means "programmers" here, but that's okay, because other developers, &lt;span style="font-style: italic;"&gt;including&lt;/span&gt; &lt;span style="font-style: italic;"&gt;testers&lt;/span&gt; won't understand what the boundaries are supposed to be either.)  
&lt;span style="font-style: italic;"&gt;People in general&lt;/span&gt; will not understand what the boundaries are supposed to be without testing and interacting with the built product.  And even then, people will understand only to the extent that they have the time and resources to test.&lt;br /&gt;&lt;br /&gt;Dave seems to have locked onto the triangle program as an example of a "badly developed program".  Sure it's a badly developed program.  I could do better than that, and so could Dave.  Part of the point of our exercise is that if the testers looked at the source code (which we supply, quietly, along with the program), they'd be more likely to find that kind of bug.  Indeed, when programmers are in the class and have the initiative to look at the source, they often spot that problem, and that provides an important lesson for the testers:  &lt;span style="font-style: italic;"&gt;it might be a really good idea to learn to read code&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Yet testing isn't just about questioning and evaluating the code that we write, because the code that we write is Well Tested and Good and Pure.  &lt;span style="font-style: italic;"&gt;We&lt;/span&gt; don't write badly developed programs.  That's a thing of the past.  Modern development methods make sure that problem never happens. &lt;irony&gt;The trouble is that APIs and libraries and operating systems and hardware ROMs weren't written by our ideal team.  They were written by other teams, whose minds and development practices and testing processes we do not, cannot, know.  How do we know that the code that we're calling &lt;span style="font-style: italic;"&gt;isn't&lt;/span&gt; badly developed code?  
We don't know, and so we have to test.&lt;br /&gt;&lt;br /&gt;I think we'd agree that Ruby, in general, is &lt;span style="font-style: italic;"&gt;much&lt;/span&gt; better developed software than the triangle program, so let's look at that instead.&lt;br /&gt;&lt;br /&gt;The Pickaxe says of the String::to_i() method: "&lt;span style="font-weight: bold;"&gt;If there is not a valid number at the start of str, 0 is returned. The method never raises an exception.&lt;/span&gt;"  That's cool.  Except that I see two things that are surprising.&lt;br /&gt;&lt;br /&gt;The first is that to_i returns zero instead of raising an exception. That is, it returns a value (quite probably the wrong value) in exactly the same data type as the calling function would expect. That leaves the door wide open for misinterpretation by someone who hasn't tested the function seeking that kind of problem. We thought we had done that, and we were mistaken.  Our tests were revealing accurately that invalid data of a certain kind was being rejected appropriately, but we weren't yet sensitized to a problem that was revealed only by later tests.&lt;br /&gt;&lt;br /&gt;The second surprising thing is that the documentation is flatly wrong:  to_i &lt;span style="font-style: italic;"&gt;absolutely does &lt;/span&gt;raise exceptions when you hand it a parameter outside the range 2 through 36.  We discovered that through testing too. That's interesting. I'd far rather it raised an exception on a number that it can't parse properly, so that I could more easily detect that situation and handle it more in the way that I'd like.&lt;br /&gt;&lt;br /&gt;Well, after a bunch of testing by students and experts alike, we finally surprised ourselves with some data and a condition that revealed the problem.  We thought that we had tested really well, and we found out that we hadn't caught everything.   So now I have to write some code that checks the string and the return value more carefully than Ruby itself does.  
That's okay.  No problem.  Now... that's &lt;span style="font-style: italic;"&gt;one method&lt;/span&gt; in &lt;span style="font-style: italic;"&gt;one class&lt;/span&gt; of all of Ruby.  What other surprises lurk?&lt;br /&gt;&lt;br /&gt;(Here's one.  When I copied the passage &lt;span style="font-weight: bold;"&gt;in bold&lt;/span&gt; above from my PDF copy of the Pickaxe, I got more than I bargained for:  in addition to the text that I copied, I got this:  "Report erratum Prepared exclusively for Michael Bolton".  Should I have been surprised by that or not?)&lt;br /&gt;&lt;br /&gt;Whatever problem we anticipate, we can insert code to check for that problem.  Good.  Whatever problem we discover, we can insert code to check for that problem too. That's great.  In fact, we check for &lt;span style="font-style: italic;"&gt;all&lt;/span&gt; the problems that our code could &lt;span style="font-style: italic;"&gt;possibly&lt;/span&gt; run into.  Or rather we &lt;span style="font-style: italic;"&gt;think&lt;/span&gt; we do, and &lt;span style="font-style: italic;"&gt;we don't know when we're not doing it&lt;/span&gt;. To address &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; problem, we've got a team around us who provides us with lots of test ideas, and pairs and reviews and exercises the code that we write, and we all do that stuff really well.&lt;br /&gt;&lt;br /&gt;The problem comes with the fact that when we're writing software, we're dealing with far more than just the software we write. That other software is typically a black box to us.  It often comes to us documented poorly and tested worse.  It does things that we don't know about, that we can't know about.  It may do things that its developers considered reasonable but that we would consider surprising.  Having been surprised, we might also consider it reasonable... 
&lt;span style="font-style: italic;"&gt;but we'd consider it surprising first&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Let me give you two more Ellis Island examples.  Many years ago, I was involved with supporting (and later program managing and maintaining) a product called DESQview.  Once we had a fascinating problem that we heard about from customers.  On a particular brand of video card (from a company called "Ahead"), typing DV wouldn't start DESQview and give you all that multitasking goodness.  Instead, it would cause the letters VD to appear in the upper left corner of the display, and then hang the system.  We called the manufacturer of that card, headquartered in Germany, and got one in.  We tested it, and couldn't reproduce the problem. Yet customers kept calling in with the problem.  At one point, I got a call from a customer who happened to be a systems integrator, and he had a card to spare.  He shipped it to us.&lt;br /&gt;&lt;br /&gt;The first Ellis Island surprise was that &lt;span style="font-style: italic;"&gt;this&lt;/span&gt; card, also called "Ahead", was from a Taiwanese company, not a German one.  The second surprise was that, at the beginning of a particular INT 10h call, the card saved the contents of the CPU registers, and restored them at the end of that call.  The Ellis Island issue here was that the BX register was not returned in its original state, but set to 0 instead.  After the fact, after the discovery, the programmer developed a terminate-and-stay-resident program to save and restore the registers, and later folded that code into DESQview itself to special-case that card.&lt;br /&gt;&lt;br /&gt;Now:  our programmers were fantastic.  They did a lot of the Agile stuff before Agile was named; they paired, they tested, they reviewed, they investigated.  This problem had nothing to do with the quality of the code that &lt;span style="font-style: italic;"&gt;they&lt;/span&gt; had written.  
It had everything to do with the fact that you'd expect someone using the processor not to muck with what was already there, combined with the fact that in our test lab we didn't have every video card on the planet.&lt;br /&gt;&lt;br /&gt;The oddest thing about Dave's post is that he interprets my description of the Ellis Island problem as an argument "to support &lt;em&gt;status quo&lt;/em&gt; role segregation."  Whaa...?   This has &lt;span style="font-style: italic;"&gt;nothing&lt;/span&gt; to do with role segregation.  &lt;span style="font-style: italic;"&gt;Nothing&lt;/span&gt;.  At one point, I say "the programmer's knowledge is, at best, a different set compared to what empirical testing can reveal."   That's true in any situation, be it a solo shop, a traditional shop, or an Agile shop.  It's true of &lt;span style="font-style: italic;"&gt;anyone's&lt;/span&gt; understanding of &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; situation.  There's always more to know than we think there is, and there's always another interpretation that one could take, rightly or wrongly.  Let me give you an example of that:&lt;br /&gt;&lt;br /&gt;When I say "the programmer's knowledge is, at best, a different set compared to what empirical testing can reveal," there is &lt;span style="font-style: italic;"&gt;nothing&lt;/span&gt; in that sentence, nor in the rest of the post, to suggest that the programmers shouldn't explore, or that testers should be the only ones to explore.  Dave simply &lt;span style="font-style: italic;"&gt;made that part up&lt;/span&gt;.  My post says one thing, mostly on epistemology: that we don't know what we don't know.  From my post, Dave takes another interpretation about organizational dynamics that is completely orthogonal to my point.  
Which, in fact, is an Ellis Island kind of problem on its own.&lt;/irony&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8884191642960241964?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/02/return-to-ellis-island.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-6944440369215725130</guid><pubDate>Wed, 10 Feb 2010 06:46:00 +0000</pubDate><atom:updated>2010-02-10T05:12:05.126-05:00</atom:updated><title>The Ellis Island Bug</title><description>A couple of years ago, I developed a version of a well-known reasoning exercise.  It's a simple exercise, and I implemented it as a really simple computer program.  I described it to &lt;a href="http://www.satisfice.com"&gt;James Bach&lt;/a&gt;, and suggested that we put it in our &lt;a href="http://www.developsense.com/courses.html"&gt;Rapid Software Testing&lt;/a&gt; class.&lt;br /&gt;&lt;br /&gt;James was skeptical.  He didn't figure from my description that the exercise would be interesting enough.  I put in a couple of little traps, and tried it a few times with colleagues and other unsuspecting victims, sometimes in person, sometimes over the Web.  Then I tried the actual exercise on James, using the program.  He helpfully stepped into one of the traps.  Thus emboldened, I started using the exercise in classes.  Eventually James found an occasion to start using it too.  He watched students dealing with it, had some epiphanies, tried some experiments.  At one point, he sat down with his brother Jon and they tested the program aggressively, and revealed a ton of new information about it&amp;mdash;much of which I hadn't known myself.  
And I &lt;span style="font-style: italic;"&gt;wrote&lt;/span&gt; the thing.&lt;br /&gt;&lt;br /&gt;Experiential exercises are like peeling an onion; beneath everything we see on the surface, there's another layer that we can learn about.  Today we made a discovery; we found a bug as we transpected on the exercise, and James put a name on it.&lt;br /&gt;&lt;br /&gt;We call it an Ellis Island bug.  Ellis Island bugs are data conversion bugs, in which a program silently converts an input value into a different value.  They're named for the tendency of customs officials at Ellis Island, a little way back in history, to rename immigrants unilaterally with names that were relatively easy to spell.  With an Ellis Island bug, you could reasonably expect an error on a certain input.  Instead you get the program's best guess at what you "really meant".&lt;br /&gt;&lt;br /&gt;There are lots of examples of this.  We have an implementation of the famous triangle program, written many years ago in Delphi.  The program takes three integers as input, with each number representing the length of a side of a triangle.  Then the program reports on whether the triangle is scalene, isosceles, or equilateral.   Here's the line that takes the input:&lt;br /&gt;&lt;br /&gt;function checksides (a, b, c : shortint) : string&lt;br /&gt;&lt;br /&gt;Here, no matter what numeric value you submit, the Delphi libraries will return that number as a signed integer between -128 and 127.  This leads to all kinds of amusing results: a side of length greater than 127 will invisibly be converted to a negative number, causing the program to report "not a triangle" until the number is 256 or greater; and entries like 300, 300, 44 will be interpreted as an equilateral triangle.&lt;br /&gt;&lt;br /&gt;Ah, you say, but no one uses Delphi any more.  So how about C?  We've been advised forever not to trust input formatting strings, and to parse them ourselves.  
How about Ruby?&lt;br /&gt;&lt;br /&gt;Ruby's String object supplies a to_i method, which converts a string to its integer representation.  Here's what the &lt;a href="http://www.pragprog.com/titles/ruby/programming-ruby"&gt;Pickaxe&lt;/a&gt; says about that:&lt;br /&gt;&lt;cite&gt;&lt;br /&gt;to_i         &lt;span style="font-style: italic;"&gt;str&lt;/span&gt;.to_i( &lt;span style="font-style: italic;"&gt;base&lt;/span&gt;=10 ) → &lt;span style="font-style: italic;"&gt;int&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Returns the result of interpreting leading characters in str as an integer base base (2 to 36). Given a base of zero, to_i looks for leading 0, 0b, 0o, 0d, or 0x and sets the base accordingly. Leading spaces are ignored, and leading plus or minus signs are honored. Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0 is returned. The method never raises an exception.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;We discovered a bunch of things today as we experimented with our program.  The most significant thing was the last two sentences: an invalid number is silently converted to zero, and no exception is raised! &lt;br /&gt;&lt;br /&gt;We found the problem because we thought we were seeing a different one.  Our program parses a string for three numbers. Depending upon the test that we ran, it appeared as though multiple signs were being accepted (+--+++--), but that only the first sign was being honoured.  Or that only certain terms in the string tolerated multiple signs.   Or that you could use multiple signs once in a string—no, twice. What the hell?  All our confusion vanished when we put in some debug statements and saw invalid numbers being converted to 0, a kind of guess as to what Ruby &lt;span style="font-style: italic;"&gt;thought&lt;/span&gt; you meant.&lt;br /&gt;&lt;br /&gt;This is by design in Ruby, so some would say it's not a bug. 
Yet it leaves Ruby programs spectacularly vulnerable to bugs wherein the programmer isn't aware of the behaviour of the language.  I knew about to_i's ability to accept a parameter for a number base (someone showed it to me ages ago), but I didn't know about the conversion-to-zero error handling.  I would have expected an exception, but it doesn't do that.  It just acts like an old-fashioned customs agent:  "S-C-H-U-M-A-C... What did you say?  Schumacher?  You mean Shoemaker, right? Let's just make that Shoemaker.  Youse'll like that better here, trust me."&lt;br /&gt;&lt;br /&gt;We also discovered that the method is incorrectly documented: to_i &lt;span style="font-style: italic;"&gt;does&lt;/span&gt; raise an exception if you pass it an invalid number base—37, for example.&lt;br /&gt;&lt;br /&gt;There are many more stories to tell about this program—in particular, how the programmer's knowledge is, at best, a different set compared to what empirical testing can reveal.  Many of the things we've discovered about this trivial program could not have been caught by code review; many of them aren't documented or are poorly documented both in the program and in the Ruby literature.  We couldn't look them up, and in many cases we couldn't have anticipated them if they hadn't emerged from testing.&lt;br /&gt;&lt;br /&gt;There are other examples of Ellis Island bugs.  A correspondent, Brent Lavelle, reports that he's seen a bug in which 50,00 gets converted to 5000, even if the user is from France or Germany (in those countries, a comma rather than a period denotes the decimal, and they use spaces where we use commas).&lt;br /&gt;&lt;br /&gt;Now:  boundary tests may reveal some Ellis Island bugs.    Other Ellis Island bugs defy boundary testing, because there's a catch:  many such tests would require you &lt;span style="font-style: italic;"&gt;to know what the boundary is&lt;/span&gt; and what is supposed to happen when it is crossed.  
From the outside, that's not at all clear. It's not even clear to the programmer, when libraries are doing the work. That's why it's insufficient to test at the boundaries that we know about already; that's why we must &lt;span style="font-style: italic;"&gt;explore&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-6944440369215725130?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/02/ellis-island-bug.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>10</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8735302325709059212</guid><pubDate>Thu, 04 Feb 2010 16:50:00 +0000</pubDate><atom:updated>2010-02-05T11:52:42.989-05:00</atom:updated><title>Testing and Management Parallels</title><description>Rikard Edgren, Henrik Emilsson and Martin Jansson collaborate on a blog called &lt;span style="font-style: italic;"&gt;&lt;a href="http://thetesteye.com/blog"&gt;thoughts from the test eye&lt;/a&gt;&lt;/span&gt;. In a satirical post from this past summer called "&lt;a href="http://thetesteye.com/blog/2009/07/scripted-vs-exploratory-testing-from-a-managerial-perspective/"&gt;Scripted vs Exploratory Testing from a Managerial Perspective&lt;/a&gt;", Martin proposes that "From a managerial perspective without knowing too much about testing, your sole experience comes from the scripted test environment…"  But I think that from a managerial perspective, there is another place you could look to understand skilled testing:  managing.  
I'll follow the points in Martin's post.&lt;br /&gt;&lt;br /&gt;If you're a capable manager, and you're managing other managers, you know that there are things for which scripting &lt;i&gt;doesn't&lt;/i&gt; work:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Control&lt;/b&gt;. Managers guide the managers working under them, but everyone involved knows that managers don't have &lt;span style="font-style: italic;"&gt;complete&lt;/span&gt; control over what they're managing. No script can capture the essence of management work.  (If scripts could do that, we'd have automated management by now.)  Managers know that when they have some written guidance on how workers are to perform certain tasks, effective workers and managers alike must adapt to the situation and use their judgement.  If, as a manager, you could script workers' actions completely, they wouldn't come to your office to ask for help, and you wouldn't have to assist, guide, motivate, or reprimand them.   You, the manager, have to observe a variety of things that cannot be anticipated, and respond to what actually happens.  You might have checklists, but you don't have a list of scripted tasks.  You recognize that knowing when management work will end for a particular project can be anticipated but not predicted with certainty.  Indeed, that's a function of the risks that you're hired to manage and the problems you're hired to solve.  As a manager, you're managing many things simultaneously.  You have the freedom and responsibility to carry out your work in the manner you think best, and you grant similar freedom and responsibility to your people.  &lt;span style="font-style: italic;"&gt;Isn't all that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Hierarchy&lt;/b&gt;. There is a structure to management, with different roles playing their part in the system.  
No competent manager supervising other managers would characterize management as "some people to do the thinking and others execute".  That would suggest that some managers think and other managers execute.  As a manager, you recognize that &lt;span style="font-style: italic;"&gt;all&lt;/span&gt; managers worthy of the name &lt;span style="font-style: italic;"&gt;both&lt;/span&gt; think &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; execute, with the recognition that an organization is stronger as a collaborative network.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Scalability&lt;/b&gt;.  You know that in management, you can't easily bring in people who can execute management scripts that other managers have written.  Managers need to own their processes. Getting new managers in the middle of a project would derail it, and you can't take just anyone.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Management Software&lt;/b&gt;  As a manager, you know that no tool—even one that costs several million dollars—can replace your judgment.  At best, it can collate data and generate excellent reports, but the decision-making is yours.  As a manager, you're leery of having your work overly mediated.  When you have important but mundane tasks to perform, you hand off the non-sapient parts to computing machinery, but you apply sapience to planning, designing, and programming the tools—and you apply sapience to observing the results, to determining their meaning and significance, and to your response.  When you have to delegate sapient work, you know that it can't be performed by a machine.  So you hire someone—a person, not a machine—to do the work with your collaboration and guidance.  
&lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Education&lt;/b&gt;. You look back on how you learned, and you realize that, whether you had years of schooling or learned on the job, you don't believe in mail-order management courses, and you harbour no illusions that a two-day course accompanied by a piece of paper can teach you how to be a manager; nor can you trust that someone brandishing a similar piece of paper is ready for a management job until you know a lot more about him.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What does Exploratory Testing (ET) include?  Well, it's kind of like management, isn't it?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Flatness In Organization.&lt;/b&gt;  Managers perform management actions as they go along. Managers do not need people to design their actions for them.  Managers foster leadership by empowering people to use their skills; guiding, but not controlling; granting freedom and requiring responsibility.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Chaos Can Be Tamed&lt;/b&gt;. You have no idea how you are going to manage, nor how the managers reporting to you will manage. You have not planned everything out in detail before you start managing; you can't, and you know you'd be fooling yourself if you pretended to do so. You cannot report exactly how much time you need, since you don't know everything in advance.  In fact, discovering what needs to be done is a key aspect of your work.   You recognize that management is a holistic process, not a linear one.  You will use your skills, combined with all of the information available to inform your decisions on time, scope, quality, innovation, skill, and learning.  
You will use feedback from your surroundings to gather the information you need to make decisions.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Scalability. &lt;/b&gt;When you're hiring people to be managers who report to you, you only want managers. If they're not ready for that, but show promise, you'll train and mentor them into the role.  Not anyone can be a manager. It is hard to just get anyone to help out since you cannot use just anyone from the organisation.   They need to learn real management skills to be effective, which means that, among other things, they must be given the freedom to make mistakes that can be observed and corrected in an empowering, fault-tolerant environment.  When looked at this way, management &lt;i&gt;does&lt;/i&gt; scale.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Skills-based Education&lt;/b&gt; Multiple-choice based certification for managers is insufficient. Better:  there are degree programs, and there are shorter skill-based courses that involve simulations, open discussion, and testing actual software.  Good courses are valuable supplements to an environment that fosters learning and innovation; courses that teach only management nomenclature are a waste of time and money.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Management Software Isn't Management&lt;/b&gt;. Management isn't done by software.  Major software vendors have tools for this, but they don't replace managers.  Customer relationship management software is not customer relationship management; enterprise resource management software isn't enterprise resource management.  
A real manager knows that it is what she thinks and what she does that is important; that for her real work--the analysis and decision making--her paper notepad is just as valid a tool as an Excel spreadsheet, and that no tool, no matter how big or how expensive or how powerful, is anything more than a tool.  &lt;span style="font-style: italic;"&gt;Isn't that like being a tester, and like managing testers?&lt;br /&gt;&lt;br /&gt;Excellent testing skill has much in common with excellent management skill.  As testers, maybe we can use the similarities between them to help explain the work that we do.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8735302325709059212?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/02/testing-and-management-parallels.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-3679147989308873184</guid><pubDate>Wed, 27 Jan 2010 19:37:00 +0000</pubDate><atom:updated>2010-01-28T17:53:11.408-05:00</atom:updated><title>Exploratory Testing IS Accountable</title><description>In &lt;a href="http://www.satisfice.com/blog/archives/401"&gt;this blog post&lt;/a&gt;, my colleague &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; &lt;a href="http://www.satisfice.com/blog/archives/401"&gt;talks about logging&lt;/a&gt; and its importance in support of &lt;a href="http://www.developsense.com/resources.shtml#exploratory_testing"&gt;exploratory testing&lt;/a&gt;.  
Logging takes care of one part of the accountability angle, and in an approach like &lt;a href="http://www.satisfice.com/sbtm"&gt;session-based test management&lt;/a&gt; (developed by James and his brother &lt;a href="http://www.quardev.com/"&gt;Jon&lt;/a&gt;), the test notes and the debrief take care of another part of it.&lt;br /&gt;&lt;br /&gt;Logging records &lt;i&gt;what happened from the perspective of the test system&lt;/i&gt;.  Good logging relieves the tester from having to record specific actions in detail; the machine does that.  The tester is thereby free to record test notes—a running account of the tester's ideas, questions, and results as he tested, or &lt;i&gt;what happened from the perspective of the tester&lt;/i&gt;.  Those notes form the meat of the session sheet, which also includes&lt;br /&gt;&lt;ul&gt;&lt;li&gt;coverage data&lt;/li&gt;&lt;li&gt;who did the testing&lt;/li&gt;&lt;li&gt;when they started&lt;/li&gt;&lt;li&gt;how long it took&lt;/li&gt;&lt;li&gt;the proportion of time spent on test design and execution, bug investigation and reporting, and setup&lt;/li&gt;&lt;li&gt;the proportion of the time spent on on-charter work vs. opportunity work&lt;/li&gt;&lt;li&gt;references to log files, data files, and related material such as scenarios, help files, specifications, standards, and so forth&lt;br /&gt;&lt;/li&gt;&lt;li&gt;and, of course,  bugs discovered and issues identified.&lt;/li&gt;&lt;/ul&gt;After the session or at the end of the day, the tester presents a report—the session sheet combined with an oral account—in the &lt;i&gt;debrief&lt;/i&gt;, a conversation between the tester and the test lead or test manager.  In the debrief, the test lead reviews—that is, &lt;span style="font-style: italic;"&gt;tests&lt;/span&gt;—the tester's experience and his report. The question "What happened?" 
gets addressed; the oral and written aspects of the report get discussed and evaluated; the session charter is confirmed or revised; holes are discovered and, where needed, plugged with followup testing; bug reports get reviewed; issues get brought up; coaching happens; mentoring happens; learning happens; knowledge gets transferred.  The goal here is for the tester and the test lead to be able to say, "&lt;i&gt;we can vouch for what was tested&lt;/i&gt;".&lt;br /&gt;&lt;br /&gt;The session sheet is structured in such a way that it can be scanned by a text-parsing tool written in Perl.  The measurements (in particular the coverage measurements) are collected and collated automatically into reports in the form of sortable HTML tables.  Session sheets are kept for later review, if they're needed.&lt;br /&gt;&lt;br /&gt;If logging in the program isn't available right away, screen recording tools (like &lt;a href="http://www.bbsoftware.co.uk/BBTestAssistant.aspx?cc=true"&gt;BB Test Assistant&lt;/a&gt;, &lt;a href="http://www.techsmith.com/camtasia.asp"&gt;Camtasia&lt;/a&gt;, &lt;a href="http://www.spectorsoft.com/products/SpectorPro_Windows/"&gt;Spector&lt;/a&gt;, ...) can provide a retrospective account of what happened.  (An over-the-shoulder video camera works too.)  Note that these tools simply record video (and, optionally, sound—which is good for narration).  Programmatic repetition of the session isn't the point.  Nor is the point to have a supervisor review the screen capture obsessively; that wastes time, and besides, nobody likes working for Big Brother.  
The idea is to use the video only when necessary—to aid in recollection where it's needed, and to help in troubleshooting hard-to-reproduce bugs.&lt;br /&gt;&lt;br /&gt;We suggest, where it doesn't get in the way, taking the test notes on the same machine as the application under test, and using the text editor window popping up as a way to link the execution of the application with bugs, test ideas or questions.  For bugs that don't appear to be state-critical, you can also take very brief notes for later followup.  Include a time stamp, where the time stamp is an index into the recording; then revisit the recording later if more detail is called for.  (In Notepad, you can press F5; in TextPad, Edit/Insert/Time, and it's macroable; other text editors almost certainly have a similar feature.)&lt;br /&gt;&lt;br /&gt;Between a charter, the session sheet, the oral report, data files, &lt;i&gt;and&lt;/i&gt; the logs &lt;i style="font-weight: bold;"&gt;and&lt;/i&gt; the debrief, it's hard for me to imagine a more accountable way of working.  Each aspect of the reporting structure reinforces the others. This is why I get confused when test managers talk about exploratory testing being "unaccountable" or "unmanageable" or "unstructured":  when I ask them what accountability and management mean to them, they point lamely to a pile of scripts or spreadsheets full of overspecified actions that were written weeks or months before the software was built, or they mumble something about not knowing what goes on in a tester's head.&lt;br /&gt;&lt;br /&gt;Any testing approach is manageable &lt;span style="font-style: italic;"&gt;when you choose to manage it&lt;/span&gt;.   
If you want structure, think about what you mean (maybe &lt;a href="http://www.developsense.com/resources.shtml#exploratory_testing"&gt;this guide to the structures of exploratory testing&lt;/a&gt; will help), identify the structures that are important to you, and develop those structures in your testers, in your team, and in your approaches.  If you want accountability, provide structures for it (like &lt;a href="http://www.satisfice.com/sbtm"&gt;session-based test management&lt;/a&gt;), and then require accountability.  If you find that your testers aren't sufficiently skilled, train them and mentor them.  (And if you don't know how to do that rapidly and effectively, &lt;a href="http://www.developsense.com/courses.html"&gt;we can help you&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;If there's something you don't like about the results you're getting, &lt;span style="font-style: italic;"&gt;manage&lt;/span&gt;: observe what's going on in your system of testing, and put in a control action where you want to change something.  If you want to know what's going on in a tester's head, observe her directly and interview her as she's testing; have her pair with another tester or a test lead; critique her notes; debrief her and coach her, until you get the results that you seek.  If you want to supercharge the efficiency of your testers, work with the programmers and their managers to focus on &lt;a href="http://www.satisfice.com/tools/testable.pdf"&gt;testability&lt;/a&gt;, with special attention paid to scriptable interfaces, &lt;a href="http://www.satisfice.com/blog/archives/401"&gt;logging&lt;/a&gt;, and at least some programmer testing.  (It might help to identify the information-hiding and feedback-loop-lengthening costs of the absence of testability.)  If you find individual debriefs taking too long, or if you want to share information more broadly within the test team, try group debriefs at the end of one day or the beginning of the next.  
If you want to add features to the reporting protocol, add them; if you want to drop them, drop them.  Experiment, re-evaluate, and tune your testing as you see fit.&lt;br /&gt;&lt;br /&gt;And if you have a more manageable and accountable approach than this for fostering the discovery of important problems in the product, &lt;span style="font-style: italic;"&gt;please&lt;/span&gt; let us know (&lt;a href="mailto:michael@developsense.com"&gt;me&lt;/a&gt;, or &lt;a href="mailto:james@satisfice.com"&gt;James&lt;/a&gt;, or &lt;a href="mailto:jbtestpilot@hotmail.com"&gt;Jon&lt;/a&gt;).  We'd &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; like to hear about it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-3679147989308873184?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/01/exploratory-testing-is-accountable.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-2837636033012933075</guid><pubDate>Sun, 17 Jan 2010 07:18:00 +0000</pubDate><atom:updated>2010-01-17T02:47:37.860-05:00</atom:updated><title>Disposable Time</title><description>In our &lt;a href="http://www.developsense.com/courses.html"&gt;Rapid Testing class&lt;/a&gt;, &lt;a href="http://www.satisfice.com"&gt;James Bach&lt;/a&gt; and I like to talk about an underappreciated tester resource:  &lt;i&gt;disposable time&lt;/i&gt;. Disposable time is &lt;i&gt;the time that you can afford to waste without getting into trouble&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Now, we want to be careful about what we mean by "waste", here.  It's not that you want to &lt;i&gt;waste&lt;/i&gt; the time.  You probably want to spend it wisely.  
It's just that you won't suffer harm if you do happen to waste it.  Disposable time is to your working hours what disposable income is to your total personal income.  (In fact, even that's not quite correct, strictly speaking; we actually mean discretionary income:  the money that's left over after you've paid for all of the things that you &lt;i&gt;must&lt;/i&gt; pay for—food, shelter, basic clothing, medical, and tax expenses.  The money that people call &lt;i&gt;disposable income&lt;/i&gt; is more properly called &lt;a href="http://en.wikipedia.org/wiki/Disposable_income"&gt;&lt;i&gt;discretionary income&lt;/i&gt;&lt;/a&gt;; as Wikipedia says, "the amount of 'play money' left to spend or save."  Oh well.  We'll go with the incorrect but popular interpretation of "disposable" here. )&lt;br /&gt;&lt;br /&gt;You're never being scrutinized every minute of every day.  Practically everyone has a few moments when no one important is watching.  In that time, you might&lt;br /&gt;&lt;br /&gt;&lt;li&gt;try a tiny test that hasn't been prescribed.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;try putting in a risky value instead of a safe value. 
&lt;/li&gt;&lt;br /&gt;&lt;li&gt;pretend to change your mind, or to make a mistake, and go back a step or two; users make mistakes, and error handling and recovery are often the most vulnerable parts of the program.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;take a couple of moments to glance at some background information relevant to the work that you're doing.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;write in your journal.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;see if any of your colleagues in technical support have a hot issue that can inform some test ideas.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;steal a couple of moments to write a tiny, simple program that will save you some time; use the saved time and the learning to extend your programming skills so that you can solve increasingly complex programming problems.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;spend an extra couple of minutes at the end of a coffee break befriending the network support people.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;sketch a workflow diagram for your product, and at some point show it to an expert, and ask if you've got it right.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;snoop around in the support logs for the product.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;add a few more lines to  a spreadsheet of data values&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;help someone else solve a problem that they're having.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;chat with a programmer about some aspect of the technology.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;even if you do nothing else, at least pause and look around the screen as you're testing.  Take a moment or two to recognize a new risk and write down a new question or a new test idea.  Report on that idea later on; ask your test lead, your manager, or a programmer, or a product owner if it's a risk worth investigating.  Hang on to your notes.  
When someone asks "Why didn't you find that bug," you may have an answer for them.&lt;/li&gt;&lt;br /&gt;If it turns out that you've made a bad investment, &lt;i&gt;oh well&lt;/i&gt;.  By definition, however large or small the period, disposable time is time that you can afford to blow without suffering consequences.&lt;br /&gt;&lt;br /&gt;On the other hand, you may have made a &lt;span style="font-style: italic;"&gt;good&lt;/span&gt; investment.  You may have found a bug, or recognized a new risk, or learned something important, or helped someone out of a jam, or built on a professional relationship, or surprised and impressed your manager.  You may have done all of these things at once.  Even if you feel like you've wasted your time, you've probably learned enough to insulate yourself from wasting &lt;span style="font-style: italic;"&gt;more&lt;/span&gt; time in the same way.  When you discover that an alley is blind, you're unlikely to return there when there are other things to explore.&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://www.amazon.com/Black-Swan-Impact-Highly-Improbable/dp/1400063515"&gt;The Black Swan&lt;/a&gt;, Nassim Nicholas Taleb proposes an investment strategy wherein you put the vast bulk of your money, your nest egg, in very safe securities. You then invest a small amount—an amount that you can afford to lose—in very speculative bets that have a chance of providing a spectacular return.  He calls that very improbable high-return event a positive Black Swan.  Your nest egg is like the part of your job that you must accomplish.  Disposable time is like your Black Swan fund; you may lose it all, but you have a shot at a big payoff.  But there's an important difference, too:  since learning is an almost inevitable product of using your disposable time, there's almost always some modest positive outcome.&lt;br /&gt;&lt;br /&gt;We encourage test managers to allow disposable time explicitly for their testers.  
As an example, Google provides its staff with &lt;a href="http://en.wikipedia.org/wiki/Google#Innovation_Time_Off"&gt;Innovation Time Off&lt;/a&gt;.  Engineers are encouraged to spend 20% of their time pursuing projects that interest them.  That sounds like a waste, until one learns that Google projects like  &lt;a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Gmail" title="Gmail"&gt;Gmail&lt;/a&gt;, &lt;a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Google_News" title="Google News"&gt;Google News&lt;/a&gt;, &lt;a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Orkut" title="Orkut"&gt;Orkut&lt;/a&gt;, and &lt;a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/AdSense" title="AdSense"&gt;AdSense&lt;/a&gt; came of these investments.&lt;br /&gt;&lt;br /&gt;What Google may not know is that even within the other 80% of the time that's ostensibly on mission, people still have, and are still using, &lt;span style="font-style: italic;"&gt;non-explicit&lt;/span&gt; disposable time.  People have that almost everywhere, whether they have explicit disposable time or not.&lt;br /&gt;&lt;br /&gt;If you're working in an environment where you're being watched so closely that none of this is possible, and where you're punished for learning or seeking problems, my advice is to make sure that slavery has been abolished in your jurisdiction.  Then find a job where your testing skills are valued and your managers aren't wasting their time by watching your work instead of doing theirs.   
But when you've got a few moments to fill, fill them and learn something!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-2837636033012933075?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/01/disposable-time.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-581121006228553109</guid><pubDate>Fri, 08 Jan 2010 00:11:00 +0000</pubDate><atom:updated>2010-01-08T10:51:31.213-05:00</atom:updated><title>Defect Detection Efficiency:  An Evaluation of a Research Study</title><description>Over the last several months, B.J. Rollison has been delivering presentations and writing articles and blog posts in which he cites a paper &lt;i&gt;Defect Detection Efficiency: Test Case Based vs. Exploratory Testing&lt;/i&gt;,&lt;span style="font-family: TimesNewRomanPSMT;"&gt; [DDE2007]&lt;/span&gt; by &lt;span style="font-family: TimesNewRomanPSMT;"&gt;Juha Itkonen, Mika V. Mäntylä and Casper Lassenius (First International Symposium on Empirical Software Engineering and Measurement, pp. 
61-70; the paper can be found &lt;a bitly="BITLY_PROCESSED" href="http://www.soberit.hut.fi/jitkonen/Publications/Itkonen_M%C3%A4ntyl%C3%A4_Lassenius_2007_ESEM.pdf"&gt;here&lt;/a&gt;).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: TimesNewRomanPSMT;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;I appreciate the authors’ intentions in examining the efficiency of exploratory testing.&amp;nbsp; That said, the study and the paper that describes it have some pretty serious problems.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Some Background on Exploratory Testing&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;It is common for people writing about exploratory testing to consider it a technique, rather than an approach. “Exploratory” and “scripted” are opposite poles on a continuum. At one pole, exploratory testing integrates test design, test execution, result interpretation, and learning into a single person at the same time.&amp;nbsp; At the other, scripted testing separates test design and test execution by time, and typically (although not always) by tester, and mediates information about the designer’s intentions by way of a document or a program. As James Bach has recently pointed out, the exploratory and scripted poles are like “hot” and “cold”.&amp;nbsp; Just as there can be warmer or cooler water, there are intermediate gradations to testing approaches. The extent to which an approach is exploratory is the extent to which the tester, rather than the script, is in immediate control of the activity.&amp;nbsp; A strongly scripted approach is one in which ideas from someone else, or ideas from some point in the past, govern the tester’s actions. 
Test execution can be &lt;i&gt;very scripted&lt;/i&gt;, as when the tester is given an explicit set of steps to follow and observations to make; &lt;i&gt;somewhat scripted&lt;/i&gt;, as when the tester is given explicit instruction but is welcome or encouraged to deviate from it; or &lt;i&gt;very exploratory&lt;/i&gt;, in which the tester is given a mission or charter, and is mandated to use whatever information and ideas are available, even those that have been discovered in the present moment.&lt;br /&gt;&lt;br /&gt;Yet the approaches can be blended.&amp;nbsp; James points out that the distinguishing attribute in exploratory and scripted approaches is the presence or absence of loops.&amp;nbsp; The most extreme scripted testing would follow a strictly linear approach; design would be done at the beginning of the project; design would be followed by execution; tests would be performed in a prescribed order; later cycles of testing would use exactly the same tests for regression.&lt;br /&gt;&lt;br /&gt;Let's get more realistic, though.&amp;nbsp; Consider a tester with a list of tests to perform, each using a data-focused automated script to address a particular test idea.&amp;nbsp; A tester using a highly scripted approach would run that script, observe and record the result, and move on to the next test.&amp;nbsp; A tester using a more exploratory approach would use the list as a point of departure, but upon observing an interesting result might choose to perform a different test from the next one on the list; to alter the data and re-run the test; to modify the automated script; or to abandon that list of tests in favour of another one.&amp;nbsp; That is, the tester's actions in the moment would not be &lt;i&gt;directed&lt;/i&gt; by earlier ideas, but would be &lt;i&gt;informed&lt;/i&gt; by them. 
Scripted approaches set out the ideas in advance, and when new information arrives, there's a longer loop between discovery and the incorporation of that new information into the testing cycle.&amp;nbsp; The more exploratory the approach, the shorter the loop.&amp;nbsp; Exploratory approaches do not preclude the use of prepared test ideas, although both James and I would argue that our craft, in general, places excessive emphasis on test cases and focusing techniques at the expense of more general heuristics and defocusing techniques.&lt;br /&gt;&lt;br /&gt;The point of all this is that neither exploratory testing nor scripted approaches are testing techniques, nor bodies of testing techniques.&amp;nbsp; They're approaches that can be applied to any testing technique.&lt;br /&gt;&lt;br /&gt;To be fair to the authors of [DDE2007], since publication of their paper there has been ongoing progress in the way that many people—in particular Cem Kaner, James Bach, and I—articulate these ideas, but the fundamental notions haven’t changed significantly.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Literature Review&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;While the authors do cite several papers on testing and test design techniques, they do not cite some of the more important and relevant publications on the exploratory side. 
&amp;nbsp;Examples of such literature include “&lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/performanceMeasure2006.pdf"&gt;Measuring the Effectiveness of Software Testers&lt;/a&gt;” (Kaner, 2003; slightly updated in 2006); and “&lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/metrics2004.pdf"&gt;Software engineering metrics: What do they measure and how do we know?&lt;/a&gt;” (Kaner &amp;amp; Bond, 2004); and "&lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/Top5SEissues.pdf"&gt;Inefficiency and Ineffectiveness of Software Testing: A Key Problem in Software Engineering&lt;/a&gt;” (Kaner 2006; to be fair to the authors, this paper may have been published too late to inform [DDE2007]), &amp;nbsp;&lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/tools/procedure.pdf"&gt;General Functionality and Stability Test Procedure (for Microsoft Windows 2000 Application Certification)&lt;/a&gt; (Bach, 2000); &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/tools/satisfice-tsm-4p.pdf"&gt;Satisfice Heuristic Test Strategy Model &lt;/a&gt;(Bach, 2000); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.ca/How-Break-Software-Practical-Testing/dp/0201796198"&gt;How To Break Software&lt;/a&gt; (Whittaker, 2002).&lt;br /&gt;&lt;br /&gt;The authors of [DDE2007] appear also to have omitted literature on the subject of exploration and its role in learning. 
Yet there is significant material on the subject, in both popular and more academic literature.&amp;nbsp; Examples here include &lt;i&gt;&lt;a href="http://bit.ly/2QMudl" rel="http://bit.ly/plugins/iframe?hashUrl=http%3A%2F%2Fbit.ly%2F2QMudl"&gt;Collaborative Discovery in a Scientific Domain&lt;/a&gt;&lt;/i&gt; (Okada and Simon; note that the subjects are testing software);&lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.ca/Exploring-Science-Cognition-Development-Discovery/dp/0262611767"&gt;&lt;i&gt; Exploring Science: The Cognition and Development of Discovery Processes&lt;/i&gt;&lt;/a&gt; (David Klahr and Herbert Simon); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.ca/Plans-Situated-Actions-Human-Machine-Communication/dp/0521337399"&gt;&lt;i&gt;Plans and Situated Actions&lt;/i&gt;&lt;/a&gt; (Lucy Suchman); &lt;a bitly="BITLY_PROCESSED" href="http://openlibrary.org/b/OL21330115M/Play_as_exploratory_learning"&gt;&lt;i&gt;Play as Exploratory Learning&lt;/i&gt;&lt;/a&gt; (Mary Reilly); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/How-Solve-Mathematical-Princeton-Science/dp/069111966X"&gt;&lt;i&gt;How to Solve It&lt;/i&gt;&lt;/a&gt; (George Polya); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Simple-Heuristics-That-Make-Smart/dp/0195143817"&gt;&lt;i&gt;Simple Heuristics That Make Us Smart&lt;/i&gt;&lt;/a&gt; (Gerd Gigerenzer); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Sensemaking-Organizations-Foundations-Organizational-Science/dp/080397177X"&gt;&lt;i&gt;Sensemaking in Organizations&lt;/i&gt;&lt;/a&gt; (Karl Weick); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Cognition-Bradford-Books-Edwin-Hutchins/dp/0262581469"&gt;&lt;i&gt;Cognition in the Wild&lt;/i&gt;&lt;/a&gt; (Edwin Hutchins); &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Social-Life-Information-Seely-Brown/dp/0875847625"&gt;&lt;i&gt;The Social Life of Information&lt;/i&gt;&lt;/a&gt; (Paul Duguid and John Seely Brown); &lt;a 
bitly="BITLY_PROCESSED" href="http://www.amazon.com/Sciences-Artificial-Herbert-Simon/dp/0262691914"&gt;&lt;i&gt;Sciences of the Artificial&lt;/i&gt;&lt;/a&gt; (Herbert Simon); all the way back to &lt;i&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/System-Logic-Ratiocinative-Inductive/dp/111720992X"&gt;A System of Logic, Ratiocinative and Inductive&lt;/a&gt;&lt;/i&gt; (John Stuart Mill, 1843).&lt;br /&gt;&lt;br /&gt;These omissions are reflected in the study and the analysis of the experiment, and that leads to a common problem in such studies: heuristics and other important cognitive structures in exploration are treated as mysterious and unknowable.&amp;nbsp; For example, the authors say, “For the exploratory testing sessions we cannot determine if the subjects used the same testing principles that they used for designing the documented test cases or if they explored the functionality in pure ad-hoc manner. For this reason it is safer to assume the ad-hoc manner to hold true.” &amp;nbsp;[DDE2007, p. 69]&amp;nbsp; Why assume?&amp;nbsp; At the very least, one could observe the subjects and debrief them, asking about their approaches.&amp;nbsp; In fact, this is exactly the role that the test lead fulfills in the practice of skilled exploratory testing.&amp;nbsp; And why describe the principles only as "ad-hoc"?&amp;nbsp; It's not like the principles can't be articulated. 
I talk about oracle heuristics in &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/articles/2005-01-TestingWithoutAMap.pdf"&gt;this article&lt;/a&gt;, and talk about stopping heuristics &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/09/when-do-we-stop-test.html"&gt;here&lt;/a&gt;; Kaner's &lt;a bitly="BITLY_PROCESSED" href="http://www.testingeducation.org/BBST/index.html"&gt;Black Box Software Testing&lt;/a&gt; course talks about test design heuristics; &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/testmethod.shtml"&gt;James Bach&lt;/a&gt;'s work talks about test strategy heuristics (especially &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/tools/satisfice-tsm-4p.pdf"&gt;here&lt;/a&gt;); James Whittaker's books talk about heuristics for finding vulnerabilities...&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Tester Experience&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;The study was performed using testers who were, in the main, novices.&amp;nbsp; “27 subjects had no previous experience in software engineering and 63 &lt;i&gt;had no previous experience in testing&lt;/i&gt;. 8 subjects had one year and 4 subjects had two years testing experience. &lt;i&gt;Only four subjects reported having some sort of training in software testing&lt;/i&gt; prior to taking the course.”&amp;nbsp; ([DDE2007], p. 65 my emphasis)&amp;nbsp; Testing—especially testing using an exploratory approach—is a complex cognitive activity. 
&amp;nbsp;If one were to perform a study on novice jugglers, one would likely find that they drop an approximately equal number of objects, whether they are juggling balls or knives.&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Tester Training&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;The paper notes that “subjects were trained to use the test case design techniques before the experiment.” However, the paper does not make note of any specific training in heuristics or exploratory approaches.&amp;nbsp; That might not be surprising in light of the weaknesses on the exploratory side of the literature review.&amp;nbsp; My experience, that of James Bach, and anecdotal reports from our clients suggest that even a brief training session can greatly increase the effectiveness of an exploratory approach.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Cycles of Testing&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;Testing happens in cycles. &amp;nbsp;In strongly scripted testing, the process tends to the linear.&amp;nbsp; All tests are designed up front; then those tests are executed; then testing for that area is deemed to be done. &amp;nbsp;In subsequent cycles, the intention is to repeat the original tests to make sure that bugs are fixed and to check for regression.&amp;nbsp; By contrast, exploratory testing is an organic and iterative process. 
&amp;nbsp;In an exploratory approach, the same area might be visited several times, such that learning from early “reconnaissance” sessions informs further exploration in subsequent “deep coverage” sessions.&amp;nbsp; The learning from those (and from ideas about bugs that have been found and fixed) informs “wrap-up sessions”, in which tests may be repeated, varied, or cut from new cloth.&amp;nbsp; The study makes no allowance for information and learning obtained during one round of testing to inform later rounds.&amp;nbsp; Yet such information and learning is typically of great value.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Quantitative vs. Qualitative Analysis&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;In the study, there is a great deal of emphasis placed on quantifying results, on experimental and on mathematical rigour.&amp;nbsp; However, such rigour may be misplaced when the products of testing are qualitative, rather than quantitative.&lt;br /&gt;&lt;br /&gt;Finding bugs is important, finding many bugs is important, and finding important bugs is especially important. Yet bugs and bug reports are by no means the only products of testing.&amp;nbsp; The study largely ignores the other forms of information that testing may provide.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The tester might learn something about test design, and feed that learning into her approach toward test execution, or vice versa. The value of that learning might be realized immediately (as in an exploratory approach) or over time (as in a scripted approach).&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The tester, upon executing a test, might recognize a new risk or missing coverage. 
That recognition might inform ideas about the design and choices of subsequent tests.&amp;nbsp; In a scripted approach, that’s a relatively long loop.&amp;nbsp; In an exploratory approach, upon noticing a new risk, the tester might choose to note findings for later on.&amp;nbsp; On the other hand, the discovery could be cashed immediately:&amp;nbsp; she&amp;nbsp; might choose to repeat the test, she might perform a variation on the same test, or might alter her strategy to follow a different line of investigation. &amp;nbsp;Compared to a scripted approach, the feedback loop between discovery and subsequent action is far shorter. &amp;nbsp;The study ignores the length of the feedback loops.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;In addition to discovering bugs that threaten the value of the product, the tester might discover &lt;i&gt;issues&lt;/i&gt;—problems that threaten the value of the testing effort or the development project overall.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The tester who takes an exploratory approach may choose to investigate a bug or an issue that she has found.&amp;nbsp; This may reduce the total bug count, but in some contexts may be very important to the tester’s client. &amp;nbsp;In such cases, the quality of the investigation, rather than the number of bugs found, would be important.&lt;/li&gt;&lt;/ul&gt;More work products from testing can be found &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/blog/wp-content/uploads/2009/10/et-dynamics22.pdf%20"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;“Efficiency” vs. “Effectiveness”&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;The study takes a very parsimonious view of “efficiency”, and further confuses “efficiency” with “effectiveness”.&amp;nbsp; Two tests are equally &lt;i&gt;effective&lt;/i&gt; if they produce the same effects. 
The discovery of a bug is certainly an important effect of a test.&amp;nbsp; There are other important effects too, as noted above, but they're not considered in the study.&lt;br /&gt;&lt;br /&gt;However, even if we decide that bug-finding is the only worthwhile effect of a test, two equally &lt;i&gt;effective&lt;/i&gt; tests might not be equally &lt;i&gt;efficient&lt;/i&gt;. &amp;nbsp;I would argue that efficiency is a relationship between effectiveness and cost.&amp;nbsp; An activity is more efficient if it has the same effectiveness at lower cost in terms of time, money, or resources.&amp;nbsp; This leads to what is by far the most serious problem in the paper...&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Script Preparation Time Is Ignored&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;i&gt;The authors’ evaluation of “efficiency” leaves out the preparation time for the scripted tests!&amp;nbsp; &lt;/i&gt;The paper says that the exploratory testing sessions took 90 minutes for design, preparation, and execution. The preparation for the scripted tests took &lt;i&gt;seven hours&lt;/i&gt;, while the scripted test execution sessions took 90 minutes, for a total of 8.5 hours.&amp;nbsp; This fact is not highlighted; indeed, it is not mentioned until the eighth of ten pages (page 68).&amp;nbsp; In journalism, that would be called &lt;i&gt;burying the lead&lt;/i&gt;.&amp;nbsp; In terms of bug-finding alone, the authors suggest that the results were of equivalent effectiveness,&lt;i&gt; yet the scripted approach took, in total, about 5.7 times as long as the exploratory approach (8.5 hours versus 1.5). 
&lt;/i&gt;What other problems could the exploratory testing approaches find given &lt;i&gt;seven additional hours&lt;/i&gt;?&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;Conclusions&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;The authors offer these four conclusions at the end of the paper:&lt;br /&gt;&lt;br /&gt;“First, we identify a lack of research on manual test execution from other than the test case design point of view. It is obvious that focusing only on test case design techniques does not cover many important aspects that affect manual testing. Second, our data showed no benefit in terms of defect detection efficiency of using predesigned test cases in comparison to an exploratory testing approach. Third, there appears to be no big differences in the detected defect types, severities, and in detection difficulty. Fourth, our data indicates that test case based testing produces more false defect reports.”&lt;br /&gt;&lt;br /&gt;I would offer to add a few other conclusions.&amp;nbsp; The first is from the authors themselves, but is buried on page 68:&amp;nbsp; “Based on the results of this study, we can conclude that an exploratory approach could be efficient, especially considering the average 7 hours of effort the subjects used for test case design activities.”&amp;nbsp; Or, put another way,&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;During test execution&lt;/i&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;unskilled &lt;/i&gt;testers found the same number of problems, irrespective of the approach that they took, but&lt;/li&gt;&lt;li&gt;preparation of scripted tests increased testing time &lt;i&gt;approximately by a factor of five&lt;/i&gt;&lt;/li&gt;&lt;li&gt;and appeared to add no significant value.&lt;/li&gt;&lt;/ul&gt;Now:&amp;nbsp; as much as I would like to cite this study as a significant win for exploratory testing, I can’t. 
&amp;nbsp;There are too many problems with it.&amp;nbsp; There’s not much value in comparing two approaches when those approaches are taken by unskilled and untrained people.&amp;nbsp; The study is heavy on data but light on information. There are no details about the bugs that were found and missed using each approach.&amp;nbsp; There’s no description of the testers’ activities or thought processes; just the output numbers.&amp;nbsp; There is the potential for interesting, rich stories on which bugs were found and which bugs were missed by which approaches, but such stories are absent from the paper.&amp;nbsp; Testing is a qualitative evaluation of a product; this study is a &lt;i&gt;quantitative&lt;/i&gt; evaluation of testing.&amp;nbsp; Valuable information is lost thereby.&lt;br /&gt;&lt;br /&gt;The authors say, “We could not analyze how good test case designers our subjects were and how much the quality of the test cases affected the results and how much the actual test execution approach.” &amp;nbsp;Actually, they could have analyzed that. 
&amp;nbsp;It’s just that they didn’t.&amp;nbsp; Pity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-581121006228553109?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2010/01/defect-detection-efficiency-evaluation.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>6</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-7332628846818689355</guid><pubDate>Sat, 26 Dec 2009 20:36:00 +0000</pubDate><atom:updated>2010-01-08T16:58:33.782-05:00</atom:updated><title>Handling an Overstructured Mission</title><description>Excellent testers recognize that excellent testing is not merely a process of confirmation, verification, and validation.&amp;nbsp; Excellent testing is a process of exploration, discovery, investigation, and learning.&lt;br /&gt;&lt;br /&gt;A correspondent whom I consider to be an excellent tester (let's call him Al) works in an environment where he is obliged by his managers to execute overly structured, highly confirmatory scripted tests. Al wrote to me recently, saying that he now realizes why that's frustrating for him:&amp;nbsp; every time he runs through a scripted test, he gets five new ideas that he wants to act upon. 
I think that's a wonderful thing, but when he acts on those ideas and fulfills his &lt;i&gt;implicit &lt;/i&gt;mission (finding important problems in the product), it diverts him from his &lt;i&gt;explicit&lt;/i&gt; mission (to complete some number of scripted tests per day), and he gets heat from his manager about that.&amp;nbsp; At the end of a couple of days, the manager wants to know why Al is behind schedule—even if Al has revealed important problems along the way—because the manager is focused on test effort in terms of test cases completed, rather than test ideas explored.&lt;br /&gt;&lt;br /&gt;I suggested to Al (as I suggest to you, if you're in that kind of situation) a workaround:&amp;nbsp; don't &lt;i&gt;act on &lt;/i&gt;the new test ideas; but do &lt;i&gt;note&lt;/i&gt; them.&amp;nbsp; Jot them down in handwritten notes or a text file, and especially note your motivation for them—ideas about risk, coverage, oracles, strategies, and the like. Tell your test manager or test lead that you didn't run tests associated with those ideas, and then ask, "Are you okay with us NOT running them?"&lt;br /&gt;&lt;br /&gt;In addition, check in with your manager more often than once every two days. Deliver a report, including new ideas, at one- to two-hour intervals.&amp;nbsp; If direct personal contact isn't available, try instant messages or email. If those don't work, batch them, but note the time at which you started and/or stopped a burst of testing activity.&lt;br /&gt;&lt;br /&gt;Al was excited about that.&amp;nbsp; "Wow!" he said.&amp;nbsp; "That also means defects arising from the new ideas are noted down. 
Currently, my management is under the impression that test cases are the things that reveal problems, but it's my acting on my test ideas that really reveals the problems."&amp;nbsp; He also noted, "There's another bad that comes from that.&amp;nbsp; If the test cases don't reveal problems, we take the problems that we've found and create a test case for them so that those problems aren't missed next time."&amp;nbsp; I've seen that happen a lot, too.&amp;nbsp; On the face of it, it doesn't sound like a bad idea—except that specific problems that are fixed and verified tend to remain fixed.&amp;nbsp; Repeating those tests is an opportunity cost against new tests that would reveal previously undiscovered problems.&lt;br /&gt;&lt;br /&gt;So:&amp;nbsp; the idea here is to make certain aspects of our work visible.&amp;nbsp; Scripted test cases often reveal problems as those cases are developed.&amp;nbsp; When those problems get fixed, the script loses its power.&amp;nbsp; Thus variation on the script, rather than following the script rigorously, tends to reveal the actual problem.&amp;nbsp; However, unless we're clear that this is happening, managers will mistakenly give credit to the wrong thing—namely, the script—rather than to the mindset and the skill set of the tester.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-7332628846818689355?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/12/handling-overstructured-mission.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8156895598208404745</guid><pubDate>Fri, 18 Dec 2009 15:36:00 +0000</pubDate><atom:updated>2009-12-18T10:36:40.120-05:00</atom:updated><title>Selena 
Delesie on Exploratory Test Chartering</title><description>A little while ago, I mentioned that I'd be writing more about &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/sbtm"&gt;session-based test management&lt;/a&gt; (SBTM).  For me, one thing that's great about having a community of students and colleagues is that they can save me lots of time and work.&lt;br /&gt;&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.selenadelesie.com/"&gt;Selena Delesie&lt;/a&gt; took the &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/courses.shtml"&gt;Rapid Software Testing course&lt;/a&gt; from me a few years back (that is, she was a student).  Since then, she has taken Rapid Testing and its practices, including SBTM, and made them her own.  This is exactly what James Bach and I aim for.&amp;nbsp; We want to help testers, test leads, and managers realize that the most important factor in excellent testing, bar none, is the mindset and the skill set of the individual tester.&amp;nbsp; This means taking the ideas in the course and internalizing them, adopting them, developing them, experimenting with them, altering them to fit your context.&amp;nbsp; We get people started by making them feel powerful, mostly by helping them to recognize the power and skills that they already have. Then, after the class, they can feel confident in doing the heavy lifting on their own. Selena is by no means our only student who has done that, but she's a paradigmatic example of what's possible.&lt;br /&gt;&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://selenadelesie.com/2009/12/16/be-an-explorer/"&gt;This post&lt;/a&gt; from her blog is a nice account of her appreciation of exploratory testing and of her career growth. 
That on its own would be good enough, but she's now blogged &lt;a bitly="BITLY_PROCESSED" href="http://selenadelesie.com/2009/12/18/charting-a-course/"&gt;a post on chartering sessions&lt;/a&gt;, and it's &lt;i&gt;excellent&lt;/i&gt;.&amp;nbsp; It identifies some of the common traps and misconceptions about chartering, and provides some sharp advice on how to avoid them. It talks not merely about how to charter, but how to do it in a way that affords the tester the freedom &lt;i&gt;and&lt;/i&gt; responsibility to do his or her best work. Highest recommendation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8156895598208404745?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/12/selena-delesie-on-exploratory-test.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8787510676465505787</guid><pubDate>Mon, 14 Dec 2009 05:19:00 +0000</pubDate><atom:updated>2009-12-14T21:30:55.608-05:00</atom:updated><title>Structures of Exploratory Testing:  Resources</title><description>In a Webinar that he did for uTest on December 10, James Whittaker mused aloud about what a great idea it would be to structure exploratory testing and capture ideas about it in a repository for sharing with others.  It seems to me that one ideal version of that would take the form of a bibliography in a book about exploratory testing, but apparently that's not available.  Yet I digress.&lt;br /&gt;&lt;br /&gt;The fact is, people have been doing exactly that for &lt;i&gt;years&lt;/i&gt;.  
And I do like the idea of having a repository and sharing, so here's a survey of some exploratory testing structures and some writing about them that I hope people will find helpful.  There are some excellent books out there, but for now, these ones are all online and free.  Expect updates.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Evolving Work Products, Skills and Tactics, ET Polarities, and Test Strategy.&lt;/b&gt;&lt;/i&gt;  James Bach, Jon Bach, and I authored the latest version of the &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/blog/wp-content/uploads/2009/10/et-dynamics22.pdf"&gt;Exploratory Skills and Dynamics&lt;/a&gt; list.  This is a kind of evolving master list of exploratory testing structures.  James describes it &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/blog/archives/370"&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Oracles.&lt;/i&gt;&lt;/b&gt;  The &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/articles/2005-01-TestingWithoutAMap.pdf"&gt;HICCUPPS consistency heuristics&lt;/a&gt;, which James Bach initiated and which I wrote about in this article for Better Software in 2005. 
(Actually, at the time it was only HICCUPP—History, Image, Comparable Products, Claims, User Expectations, Purpose, Product—but since then we've also added S, for Standards and Statutes.)  Mike Kelly also talks about HICCUPP &lt;a bitly="BITLY_PROCESSED" href="http://www.testingreflections.com/node/view/2635"&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Test Strategy.&lt;/b&gt;&lt;/i&gt;  James Bach's &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/tools/satisfice-tsm-4p.pdf"&gt;Heuristic Test Strategy Model&lt;/a&gt; isn't restricted to exploratory approaches, but certainly helps to guide and structure them.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Data Type Attacks, Web Tests, Testing Wisdom, Heuristics, and Frameworks. 
&lt;/i&gt;&lt;/b&gt;Elisabeth Hendrickson's &lt;a bitly="BITLY_PROCESSED" href="http://testobsessed.com/wordpress/wp-content/uploads/2007/02/testheuristicscheatsheetv1.pdf"&gt;Test Heuristics Cheat Sheet&lt;/a&gt; is a rich set of guideword heuristics and helpful reference information.&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Context Factors, Information Objectives.&lt;/b&gt;&lt;/i&gt;  Cem Kaner most recently delivered his &lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/QAIExploring.pdf"&gt;Tutorial on Exploratory Testing&lt;/a&gt; for the QAI Quest Conference in Chicago, 2008.  There's a similar, but not identical talk &lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/ETatQAI.pdf"&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Quick Tests.&lt;/i&gt;&lt;/b&gt; In our &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/courses.html"&gt;Rapid Software Testing course&lt;/a&gt;, James Bach and I talk about quick tests.  The &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/rst.pdf"&gt;course notes&lt;/a&gt; are available for free.  Fire up Acrobat and search for "Quick Tests".&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Coverage (specific).&lt;/b&gt;&lt;/i&gt; Michael Hunter's &lt;a bitly="BITLY_PROCESSED" href="http://www.ddj.com/blog/debugblog/archives/2007/05/you_are_not_don_31.html"&gt;You Are Not Done Yet&lt;/a&gt; is a detailed set of coverage ideas to help prompt further exploration when you &lt;i&gt;think&lt;/i&gt; you're done.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Coverage (general).&lt;/i&gt;&lt;/b&gt;  James Bach wrote this article in 2001, in which he summarizes test coverage ideas under the mnemonic "&lt;b&gt;S&lt;/b&gt;an &lt;b&gt;F&lt;/b&gt;rancisco &lt;b&gt;D&lt;/b&gt;ep&lt;b&gt;o&lt;/b&gt;t."—Structure, Function, Data, Platform, and Operations.  
Several years later, I convinced him to add an element to the list, so now it's "&lt;b&gt;S&lt;/b&gt;an &lt;b&gt;F&lt;/b&gt;rancisco &lt;b&gt;D&lt;/b&gt;ep&lt;b&gt;o&lt;/b&gt;&lt;b&gt;t&lt;/b&gt;."  The last T is for...&lt;i&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/i&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Time.&lt;/b&gt;&lt;/i&gt;  I realized a few years ago that some guideword heuristics might help us to pay attention to the ways in which products relate to time, and vice versa.  That turned into a Better Software article called &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/articles/2006-05-TimeForNewTestIdeas.pdf"&gt;&lt;i&gt;"Time for New Test Ideas"&lt;/i&gt;&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Tours.&lt;/b&gt;&lt;/i&gt;  Mike Kelly's &lt;a bitly="BITLY_PROCESSED" href="http://www.michaeldkelly.com/archives/50"&gt;FCC CUTS VIDS&lt;/a&gt; Touring Heuristics (note the date) provides a set of structured approaches for touring the application.&lt;b&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Stopping Heuristics.&lt;/i&gt;&lt;/b&gt;  There are structures to deciding when to stop a given test, a line of investigation, or a test cycle.  I catalogued them &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/09/when-do-we-stop-test.html"&gt;here&lt;/a&gt;, and Cem Kaner made a vital addition &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/10/when-do-we-stop-testing-one-more-sure.html"&gt;here&lt;/a&gt;. 
&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Accountability, Reporting Progress.&lt;/i&gt;&lt;/b&gt; James and Jon Bach's description of &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/sbtm"&gt;Session-Based Test Management&lt;/a&gt; is a set of structures for making exploratory testing sessions more accountable.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Procedure.&lt;/i&gt;&lt;/b&gt;  The &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/tools/procedure.pdf"&gt;General Functionality and Stability Test Procedure&lt;/a&gt;.  It was designed for Microsoft in the late 1990s by James Bach, and may be the first documented procedure to guide exploratory test execution and investigation.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Emotions.&lt;/i&gt;&lt;/b&gt;  I gave a talk on &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/presentations/2007-10-STARWest-EmotionsAndTestOracles.pdf"&gt;emotions as powerful pointers to test oracles&lt;/a&gt; at STAR West in 2007.  That helped to inspire some ideas about...&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Noticing, Observation.&lt;/i&gt;&lt;/b&gt;  At STAR East 2009, I did &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/presentations/NoticingSTAREast2009.pdf"&gt;a keynote talk on noticing&lt;/a&gt;, which can be important for exploratory test execution.  The talk introduces a number of areas in which we might notice, and some patterns to sharpen noticing.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Leadership.&lt;/i&gt;&lt;/b&gt;  For the 2009 QAI Conference in Bangalore, India, I did a plenary talk in which I noted several important structural similarities between &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/presentations/2009-11-STC9000-ExploratoryTestingLeadership.pdf"&gt;exploratory testing and leadership&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;So, there it is: a repository.  
I'll eventually reproduce it as part of the &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/resources.shtml"&gt;resources page&lt;/a&gt; on my Web site.  Feel free to share; comments and suggestions for additions are welcome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8787510676465505787?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/12/structures-of-exploratory-testing.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8761523682076267086</guid><pubDate>Wed, 09 Dec 2009 21:55:00 +0000</pubDate><atom:updated>2009-12-09T16:55:33.864-05:00</atom:updated><title>Best Bug... or Bugs?</title><description>And now for the immodest part of the &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/12/eurostars-test-lab-bravo.html"&gt;EuroSTAR 2009 Test Lab&lt;/a&gt; report:&amp;nbsp; I won the Best Bug award, although it's not clear to me which bug got the nod, since I reported several fairly major problems.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;I tested &lt;a bitly="BITLY_PROCESSED" href="http://www.openmedsoftware.org/wiki/Main_Page"&gt;OpenEMR&lt;/a&gt;.&amp;nbsp; For me, one candidate for the most serious problem would have been a consistent pattern of inconsistency in input handling and error checking.&amp;nbsp; I observed over a dozen instances of some kind of sloppiness.&lt;br /&gt;&lt;br /&gt;This reminded me of a problem that we testers often see in project work, the problem of measuring by &lt;i&gt;counting things&lt;/i&gt;—counting bugs, counting bug &lt;i&gt;reports&lt;/i&gt;, counting requirements.&amp;nbsp; When the requirement is to defend the application against overflowing 
text fields and vulnerability to input constraint attacks by hackers, how should we count?&amp;nbsp; How many mentions of that should there be?&amp;nbsp; One, in a statement of general principles at the beginning of a requirements document?&amp;nbsp; Hundreds, in a statement of specific purpose for each input field in a functional specification?&amp;nbsp; How many requirements are there to make sure that fields don't overflow?&amp;nbsp; How many requirements that they support only the characters, numbers, or date ranges that they're supposed to?&amp;nbsp; What about traceability?&amp;nbsp; If this is a genuine problem, and the requirements documents don't mention a particular requirement explicitly, should we refrain from reporting on a problem with that implicit requirement?&lt;br /&gt;&lt;br /&gt;When I report an issue—for example, that practically all of the input fields in OpenEMR have some kind of problem with them—should that count as one bug report?&amp;nbsp; Since it applies to hundreds of fields, should it count as hundreds of bug reports?&amp;nbsp; When such a pervasive overall problem exists, should the tester make a report for each and every field in which he observes a problem?&amp;nbsp; And if you want to answer Yes, to that question:&amp;nbsp; is it worth the opportunity cost to do that when there are plenty of other problems in the product?&lt;br /&gt;&lt;br /&gt;So again, there were so many instances of unconstrained and unchecked input that I stopped recording specifics and instead reported a general pattern in the bug tracking system.&amp;nbsp; My decision to do this was an instance of the Dead Horse &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/09/when-do-we-stop-test.html"&gt;stopping heuristic&lt;/a&gt;; reporting yet another instance of the same class of problem would be like flogging a dead horse.&amp;nbsp; I could have wasted a lot of time and energy reporting each instance of each problem I observed, along with 
specific symptoms and possible ramifications of each one.&amp;nbsp; Yet I'm very skeptical that this would serve the project well.&amp;nbsp; In my experience as a program manager for a product whose code was being developed outside our company, I found that there was steadily diminishing return in value for many reports of the same ilk.&amp;nbsp; When, in testing, we identified a general pattern of failure, we stopped looking for more instances.&amp;nbsp; We sent the product back to the development shop, and required the programmers and their testers to review the product through-and-through for that kind of problem.&lt;br /&gt;&lt;br /&gt;If I were to be evaluated on the number of bugs that I found, I'd find it hard to resist the easy pickings of yet another input constraint attack bug report.&amp;nbsp; Yet when I'm testing, every moment of bug investigation and reporting is, by some reckoning, another moment that I can't spend on obtaining more test coverage (more about that &lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/11/why-is-testing-taking-so-long-part-1.html"&gt;here&lt;/a&gt;).&amp;nbsp; By focusing on investigating and reporting on input problems (and thereby increasing my bug count), am I missing opportunities to design and perform tests on scheduling conflict-resolution algorithms, workflows, database integrity,...?&lt;br /&gt;&lt;br /&gt;There were two other fairly serious problems that I observed.&amp;nbsp; One was that the Chinese version of the product showed a remarkable number of English words, presumably untranslated, interspersed among the ideograms; I expected to see no English at all.&amp;nbsp; I treated that problem in the same way as the input constraint problem:&amp;nbsp; with a single report of a general problem.&lt;br /&gt;&lt;br /&gt;The second serious problem was that searches of various kinds would place a link in the address bar.&amp;nbsp; The link represented a command to a CGI script of some kind, which 
evidently constructed and forwarded a query to an underlying SQL database.&amp;nbsp; Backspacing over the last digit in the address bar and replacing it with a slash caused a lovely SQL error message to appear on the screen, unhandled by any of OpenEMR's code.&amp;nbsp; The message could have been used, said our local product owner, to expose the structure of the database to snoops or hackers.&amp;nbsp; I found that problem by a defocusing heuristic—looking at the browser, rather than the browser window.&lt;br /&gt;&lt;br /&gt;I don't know which of these problems took Best Bug honours.&amp;nbsp; I'm not sure that the presenters specified which bug they were crediting with Best Bug.&amp;nbsp; That makes a certain kind of sense, since I can't tell which of these problems is the most serious either.&amp;nbsp; After all, a problem isn't its own thing; it's a relationship between a person and a product or a situation.&amp;nbsp; There are plenty of ways to address a problem.&amp;nbsp; You could fix the product or the situation.&amp;nbsp; You could change the perspective or the perception of the person observing the problem, say by keeping the problem as it is but providing a workaround.&amp;nbsp; You could choose to ignore the problem yourself, which underscores the fact that a problem for some person might not be a problem for you.&amp;nbsp; That's why it's not helpful to count problems.&lt;br /&gt;&lt;br /&gt;Managers:&amp;nbsp; do you see how evaluating testers based on test cases or bug counts, rather than the value of reporting, will lead to distortion at best, and more likely to dysfunction?&amp;nbsp; Do you see how providing overstructured test scripts or test cases could reduce the diversity—and therefore the quality—of testing?&amp;nbsp; Do you see how the notion of "one test per requirement" or "one positive and one negative test per requirement" is misleading?&lt;br /&gt;&lt;br /&gt;Testers:&amp;nbsp; do you see how being evaluated on bug counts could lead to 
inattentional blindness with respect to problems more serious than the low-hanging fruit?&amp;nbsp; Do you see how focusing on bugs, rather than focusing on test coverage, could reduce the value of your testing?&lt;br /&gt;&lt;br /&gt;Instead of counting things, let's consider evaluating testing work in a different way.&amp;nbsp; Let's consider the overall testing story and its many dimensions.&amp;nbsp; Let's think about the story around each&amp;nbsp; bug, and each bug report—not just the number of reports, but the meaning and significance of each one.&amp;nbsp; Let's look at the value of the information to stakeholders, primarily to programmers and to product owners.&amp;nbsp; Let's think about the extent to which the tester makes things easier for others on the team, including other testers.&amp;nbsp; Let's look at the diversity of problems discovered, the diversity of approaches used, and the diversity of tools and techniques applied.&amp;nbsp; And rather than using this information to reward or punish testers, let's use it to guide coaching, mentoring, and training such that the focus is on developing skill for everyone.&lt;br /&gt;&lt;br /&gt;The dimensions above are qualitative, rather than quantitative.&amp;nbsp; Yet if our mission is to provide information to inform decisions about quality, we of all people should recognize that expressing value in terms of numbers often removes important information rather than adding it.&lt;br /&gt;&lt;br /&gt;Additional reading:&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Measuring-Managing-Performance-Organizations-Robert/dp/0932633366"&gt;Measuring and Managing Performance in Organizations&lt;/a&gt; (Robert D. 
Austin)&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.kaner.com/pdfs/metrics2004.pdf"&gt;Software Engineering Metrics:&amp;nbsp; What Do They Measure and How Do We Know?&lt;/a&gt; (Kaner and Bond)&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Quality-Software-Management-First-Order-Measurement/dp/0932633242"&gt;Quality Software Management, Vol. 2:&amp;nbsp; First Order Measurement&lt;/a&gt; (Weinberg)&lt;br /&gt;&lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Perfect-Software-Other-Illusions-Testing/dp/0932633692"&gt;Perfect Software (and Other Illusions About Testing)&lt;/a&gt; (Weinberg)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8761523682076267086?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/12/best-bug-or-bugs.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>5</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-947502453185810783</guid><pubDate>Wed, 09 Dec 2009 17:42:00 +0000</pubDate><atom:updated>2009-12-09T12:42:14.933-05:00</atom:updated><title>EuroSTAR's Test Lab:  Bravo!</title><description>One of the coolest things about EuroSTAR 2009 was the test lab set up by &lt;a bitly="BITLY_PROCESSED" href="http://www.workroom-productions.com/"&gt;James Lyndsay&lt;/a&gt; and Bart Knaack.&lt;br /&gt;&lt;br /&gt;James and Bart (who self-identified as Test Lab Rats) provided testers with the opportunity to have a go at two applications, FreeMind (an open-source mind-mapping program) and OpenEMR (an open-source product for tracking medical records).  
The Lab Rats did a splendid job of setting things up and providing the services and information that participants needed to get up and running quickly.&lt;br /&gt;&lt;br /&gt;Sponsorship in the form of five laptop computers was provided through the good graces of Steve Green at &lt;a bitly="BITLY_PROCESSED" href="http://www.testpartners.co.uk/"&gt;Test Partners,&lt;/a&gt; Stuart Noakes at &lt;a bitly="BITLY_PROCESSED" href="http://www.tcl-global.com/about.php"&gt;Transition Consulting Ltd&lt;/a&gt;., and Bart Knaack at &lt;a bitly="BITLY_PROCESSED" href="http://www.logica.com/"&gt;Logica&lt;/a&gt;.  James Lyndsay also lent a server and a router to the event.&lt;br /&gt;&lt;br /&gt;Sponsorship was also provided by tool vendors (here in alphabetical order) &lt;a bitly="BITLY_PROCESSED" href="http://www.andagon.de/english/index.html"&gt;Andagon&lt;/a&gt;, &lt;a bitly="BITLY_PROCESSED" href="http://www.microsoft.com/visualstudio/en-us/products/2010/default.mspx"&gt;Microsoft&lt;/a&gt;, &lt;a bitly="BITLY_PROCESSED" href="http://www.microfocus.com/Solutions/TestingASQ/"&gt;MicroFocus&lt;/a&gt;, &lt;a bitly="BITLY_PROCESSED" href="http://www.neotys.com/"&gt;Neotys&lt;/a&gt;, and &lt;a bitly="BITLY_PROCESSED" href="http://www.testingtech.com/"&gt;Testing Technologies&lt;/a&gt;.  These sponsors had their tools installed on the laptops, and presented their demos by applying them to OpenEMR and FreeMind as they were installed in the Test Lab.  
On a loose schedule, some of the presenters did talks and demonstrations of how they tested.&lt;br /&gt;&lt;br /&gt;The aforementioned Stuart Noakes and &lt;a bitly="BITLY_PROCESSED" href="http://www.aqis.eu/news.htm"&gt;Mieke Gievers&lt;/a&gt; gave advice and assistance to the Lab Rats.&lt;br /&gt;&lt;br /&gt;Well, that's all very nice, but what was it &lt;i&gt;like&lt;/i&gt;?&lt;br /&gt;&lt;br /&gt;As someone who spent a couple of hours in the lab, exploring the applications and listening in on the presentations, I'd say it was terrific (although the prospect that OpenEMR is being used in actual medical practices seemed faintly alarming). Both applications were sophisticated enough for some reasonably serious testing, and had interesting problems to discover and report.&lt;br /&gt;&lt;br /&gt;Interestingly, none of the certificationists or the standardization folks sat in the lab and &lt;i&gt;tested&lt;/i&gt;, to my knowledge.&lt;br /&gt;&lt;br /&gt;Bravo to James and Bart, to the sponsors, to the conference organizers and to the program committee for putting this together.&amp;nbsp; Let's see more &lt;i&gt;actual testing&lt;/i&gt; at testing conferences!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-947502453185810783?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/12/eurostars-test-lab-bravo.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-2307539223344339965</guid><pubDate>Wed, 25 Nov 2009 18:17:00 +0000</pubDate><atom:updated>2009-11-25T14:07:56.549-05:00</atom:updated><title>Why Is Testing Taking So Long? 
(Part 2)</title><description>&lt;a bitly="BITLY_PROCESSED" href="http://www.developsense.com/2009/11/why-is-testing-taking-so-long-part-1.html"&gt;Yesterday&lt;/a&gt; I set up a thought experiment in which we divided our day of testing into three 90-minute sessions. I also made a simplifying assumption that bursts of testing activity representing some equivalent amount of test coverage (I called it a micro-session, or just a "test") take two minutes. Investigating and reporting a bug that we find costs an additional eight minutes, so a test on its own would take two minutes, and a test that found a problem would take ten.&lt;br /&gt;&lt;br /&gt;Yesterday we tested three modules. We found some problems. Today the fixes showed up, so we'll have to verify them.&lt;br /&gt;&lt;br /&gt;Let's assume that a fix verification takes six minutes. (That's yet another gross oversimplification, but it sets things up for our little thought experiment.) We don't just perform the original micro-session again; we have to do more than that. We want to make sure that the problem is fixed, but we also want to do a little exploration around the specific case and make sure that the general case is fixed too. &lt;br /&gt;&lt;br /&gt;Well, at least we'll have to do that for Modules B and C. Module A didn't have any fixes, since nothing was broken. And Team A is up to its usual stellar work, so today we can keep testing Team A's module, uninterrupted by either fix verifications or by bugs. 
We get 45 more micro-sessions in today, for a two-day total of 90.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Fix Verifications&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;New Tests Today&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Two-Day Total&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt; &lt;td&gt;A&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;45&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Team B stayed an hour or so after work yesterday. They fixed the bug that we found, tested the fix, and checked it in. They asked us to verify the fix this afternoon. That costs us six minutes off the top of the session, leaving us 84 more minutes. Yesterday's trends continue; although Team B is very good, they're human, and we find another bug today. 
The test costs two minutes, and bug investigation and reporting costs eight more, for a total of ten. In the remaining 74 minutes, we have time for 37 micro-sessions. That means a total of 38 new tests today: one that found a problem, and 37 that didn't. Our two-day total for Module B is 79 micro-sessions.&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Fix Verifications&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;New Tests Today&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Two-Day Total&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td&gt;A&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;45&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td&gt;B&lt;br /&gt;&lt;/td&gt; &lt;td&gt;6 minutes (1 bug yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;10 minutes (1 test, 1 bug)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;74 minutes (37 tests)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;38&lt;br /&gt;&lt;/td&gt; &lt;td&gt;79&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Team C stayed late last night. Very late. They felt they had to. Yesterday we found eight bugs, and they decided to stay at work and fix them. (Perhaps this is why their code has so many problems; they don't get enough sleep, and produce more bugs, which means they have to stay late again, which means even less sleep...) In any case, they've delivered us all eight fixes, and we start our session this afternoon by verifying them. Eight fix verifications at six minutes each amounts to 48 minutes. So far as obtaining new coverage goes, today's 90-minute session with Module C is pretty much hosed before it even starts; 48 minutes—more than half of the session—is taken up by fix verifications, right from the get-go. We have 42 minutes left in which to run new micro-sessions, those little two-minute slabs of test time that give us some equivalent measure of coverage. Yesterday's trends continue for Team C too, and we discover four problems that require investigation and reporting. That takes 40 of the remaining 42 minutes. Somewhere in there, we spend two minutes of testing that doesn't find a bug. 
So today's results look like this:&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Fix Verifications&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;New Tests Today&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Two-Day Total&lt;/b&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center"&gt; &lt;td&gt;A&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;45&lt;br /&gt;&lt;/td&gt; &lt;td&gt;90&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center"&gt; &lt;td&gt;B&lt;br /&gt;&lt;/td&gt; &lt;td&gt;6 minutes (1 bug yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;10 minutes (1 test, 1 bug)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;74 minutes (37 tests)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;38&lt;br /&gt;&lt;/td&gt; &lt;td&gt;79&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center"&gt; &lt;td&gt;C&lt;br /&gt;&lt;/td&gt; &lt;td&gt;48 minutes (8 bugs yesterday)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;40 minutes (4 tests, 4 bugs)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;2 minutes (1 test)&lt;br /&gt;&lt;/td&gt; &lt;td&gt;5&lt;br /&gt;&lt;/td&gt; 
&lt;td&gt;18&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Over two days, we've been able to obtain only 20% of the test coverage for Module C that we've been able to obtain for Module A.&amp;nbsp; We're still at less than 1/4 of the coverage that we've been able to obtain for Module B.&lt;br /&gt;&lt;br /&gt;Yesterday, we learned one lesson:  &lt;b&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Lots of bugs means reduced coverage, or slower testing, or both.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;From today's results, here's a second:  &lt;b&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Finding bugs today means verifying fixes later, which means even less coverage or even slower testing, or both.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;So why is testing taking so long? 
One of the biggest reasons might be this:  &lt;b&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Testing is taking longer than we might have expected or hoped because, although we've budgeted time for testing, we lumped into it the time for investigating and reporting problems that we didn't expect to find.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Or, more generally,&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Testing is taking longer than we might have expected or hoped because we have a faulty model of what testing is and how it proceeds.&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;For managers who ask "Why is testing taking so long?", it's often the case that their model of testing doesn't incorporate the influence of things outside the testers' control.&amp;nbsp; Over two days of testing, the difference between the quality of Team A's code and Team C's code has a &lt;i&gt;profound&lt;/i&gt; impact on the amount of uninterrupted test design and execution work we're able to do. The bugs in Module C present interruptions to coverage, such that (in this very simplified model) we're able to spend only &lt;i&gt;one-fifth&lt;/i&gt; of our test time designing and executing tests. After the first day, we were already way behind; after two days, we're even further behind. And even here, we're being optimistic. 
With a team like Team C, how many of those fixes will be perfect, revealing no further problems and taking no further investigation and reporting time?&lt;br /&gt;&lt;br /&gt;And again, those faulty management models will lead to distortion or dysfunction.&amp;nbsp; If the quality of testing is measured by bugs found, then anyone testing Module C will look great, and people testing Module A will look terrible.&amp;nbsp; But if the quality of testing is evaluated by coverage, then the Module A people will look sensational and the Module C people will be on the firing line.&amp;nbsp; But remember, the differences in results here have &lt;i&gt;nothing&lt;/i&gt; to do with the quality of the testing, and &lt;i&gt;everything&lt;/i&gt; to do with &lt;i&gt;the quality of what is being tested&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;There's a psychological factor at work, too. If our approach to testing is confirmatory, with steps to follow and expected, predicted results, we'll design our testing around the idea that the product should do this, and that it should behave thus and so, and that testing will proceed in a predictable fashion. If that's the case, it's possible—probable, in my view—that we will bias ourselves towards the expected and away from the unexpected. If our approach to testing is exploratory, perhaps we'll start from the presumption that, to a great degree, we don't know what we're going to find. As much as managers, hack statisticians, and process enthusiasts would like to make testing and bug-finding predictable, people don't know how to do that such that the predictions stand up to human variability and the complexity of the world we live in. Plus, if you can predict a problem, why wait for testing to find it?&amp;nbsp; If you can really predict it, do something about it &lt;i&gt;now&lt;/i&gt;.&amp;nbsp; If you don't have the ability to do that, you're just playing with numbers.&lt;br /&gt;&lt;br /&gt;Now: note again that this has been a thought experiment. 
For simplicity's sake, I've made some significant distortions and left out an enormous amount of what testing is really like in practice.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I've treated testing activities as compartmentalized chunks of two minutes apiece, treading dangerously close to &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/presentations/againsttestcases.pdf"&gt;the unhelpful and misleading model of testing as development and execution of test cases&lt;/a&gt;.&amp;nbsp;&lt;/li&gt;&lt;li&gt;I haven't looked at the role of setup time and its impact on test design and execution.&amp;nbsp;&lt;/li&gt;&lt;li&gt;I haven't looked at the messy reality of having to wait for a product that isn't building properly.&amp;nbsp;&lt;/li&gt;&lt;li&gt;I haven't included the time that testers spend waiting for fixes.&amp;nbsp;&lt;/li&gt;&lt;li&gt;I haven't included the delays associated with bugs that block our ability to test and obtain coverage of the code behind them.&lt;/li&gt;&lt;li&gt;I've deliberately ignored the complexity of the code.&lt;/li&gt;&lt;li&gt;I've left out difficulties in learning about the business domain.&lt;/li&gt;&lt;li&gt;I've made highly simplistic assumptions about the quality and relevance of the testing and the quality and relevance of the bug reports, the skill of the testers in finding and reporting bugs, and so forth.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;And I've left out the fact that, as important as skill is, luck always plays a role in finding problems.&lt;/li&gt;&lt;/ul&gt;My goal was simply to show this:  &lt;i&gt;&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Problems in a product have a huge impact on our ability to obtain test coverage of that product.&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The trouble is that even this fairly simple observation is below the level of visibility of many managers. 
Why is it that so many managers fail to notice it?&lt;br /&gt;&lt;br /&gt;One reason, I think, is that they're used to seeing linear processes instead of organic ones, a problem that Jerry Weinberg describes in &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Becoming-Technical-Leader-Problem-Solving-Approach/dp/0932633021"&gt;Becoming a Technical Leader&lt;/a&gt;. Linear models "assume that observers have a perfect understanding of the task," as Jerry says. But software development isn't like that at all, and it can't be. By its nature, software development is about dealing with things that we haven't dealt with before (otherwise there would be no need to develop a new product; we'd just reuse the one we had). We're always dealing with the novel, the uncertain, the untried, and the untested, so our observation is bound to be imperfect. If we fail to recognize that, we won't be able to improve the quality and value of our work.&lt;br /&gt;&lt;br /&gt;What's worse about managers with a linear model of development and testing is that "they filter out innovations that the observer hasn't seen before or doesn't understand" (again, from &lt;i&gt;Becoming a Technical Leader&lt;/i&gt;). As an antidote for such managers, I'd recommend &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Perfect-Software-Other-Illusions-Testing/dp/0932633692"&gt;Perfect Software, and Other Illusions About Testing&lt;/a&gt; and &lt;a bitly="BITLY_PROCESSED" href="http://www.amazon.com/Lessons-Learned-Software-Testing-Kaner/dp/0471081124"&gt;Lessons Learned in Software Testing&lt;/a&gt; as primers. But mostly I'd suggest that they &lt;i&gt;observe the work of testing&lt;/i&gt;. In order to do that well, they may need some help from us, and that means that we need to observe the work of testing too. 
So over the next little while, I'll be talking more than usual about &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/sbtm"&gt;Session-Based Test Management&lt;/a&gt;, developed initially by James and Jon Bach, which is a powerful set of ideas, tools and processes that aid in observing and managing testing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-2307539223344339965?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/11/what-does-testing-take-so-long-part-2.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-3161610593071907152</guid><pubDate>Tue, 24 Nov 2009 20:55:00 +0000</pubDate><atom:updated>2009-11-25T14:05:08.139-05:00</atom:updated><title>Why Is Testing Taking So Long? (Part 1)</title><description>If you're a tester, you've probably been asked, "Why is testing taking so long?"  Maybe you've had a ready answer; maybe you haven't.  Here's a model that might help you deal with the kind of manager who asks such questions.&lt;br /&gt;&lt;br /&gt;Let's suppose that we divide our day of testing into three sessions, each session being, on average, 90 minutes of chartered, uninterrupted testing time.  That's four and a half hours of testing, which seems reasonable in an eight-hour day interrupted by meetings, planning sessions, working with programmers, debriefings, training, email, conversations, administrivia of various kinds, lunch time, and breaks.&lt;br /&gt;&lt;br /&gt;The reason that we're testing is that we want to obtain &lt;i&gt;coverage&lt;/i&gt;; that is, we want to ask and answer questions about the product and its elements to the greatest extent that we can.  
Asking and answering questions is the process of &lt;i&gt;test design and execution&lt;/i&gt;.  So let's further assume that we break each session into micro-sessions averaging two minutes each, in which we perform some test activity that's focused on a particular testing question, or on evaluating a particular feature.  That means in a 90-minute session, we can theoretically perform 45 of these little micro-sessions, which for the sake of brevity we'll informally call "tests".  Of course life doesn't really work this way; a test idea might take a couple of seconds to implement, or it might take all day.  But I'm modeling here, making this rather gross simplification to clarify a more complex set of dynamics.  (Note that if you'd like to take a &lt;i&gt;really&lt;/i&gt; impoverished view of what happens in skilled testing, you could say that a "test case" takes two minutes.  But I leave it to my colleague James Bach to explain why you should &lt;a bitly="BITLY_PROCESSED" href="http://www.satisfice.com/presentations/againsttestcases.pdf"&gt;reject the concept of test cases&lt;/a&gt;.) &lt;br /&gt;&lt;br /&gt;Let's further suppose that we'll find problems every now and again, which means that we have to do bug investigation and reporting.  This is valuable work for the development team, but it takes time that interrupts test design and execution, the stuff that yields test coverage.  Let's say that, for each bug that we find, we must spend an extra eight minutes investigating it and preparing a report.  Again, this is a pretty dramatic simplification.  Investigating a bug might take all day, and preparing a good report could take time on the order of hours.  Some bugs (think typos and spelling errors in the UI) leap out at us and don't call for much investigation, so they'll take less than eight minutes.  Even though eight minutes is probably a dramatic underestimate for investigation and reporting, let's go with that.  
So a test activity that &lt;i&gt;doesn't&lt;/i&gt; find a problem costs us two minutes, and a test activity that &lt;i&gt;does&lt;/i&gt; find a problem takes ten minutes.&lt;br /&gt;&lt;br /&gt;Now, let's imagine one more thing:  we have &lt;i&gt;perfect&lt;/i&gt; testing prowess; that if there's a problem in an area that we're testing, we'll find it, and that we'll never enter a bogus report, either.&amp;nbsp; Yes, this is a thought experiment.&lt;br /&gt;&lt;br /&gt;One day we come into work, and we're given three modules to test.&lt;br /&gt;&lt;br /&gt;The morning session is taken up with Module A, from Development Team A.  These people are amazing, hyper-competent.  They use test-first programming, and test-driven design.  They work closely with us, the testers, to design challenging unit checks, scriptable interfaces, and log files.  They use pair programming, and they review and critique each other's work in an egoless way.  They refactor mercilessly, and run suites of automated checks before checking in code.  They brush their teeth and floss after every meal; they're wonderful.  We test their work diligently, but it's really a formality because they've been testing and we've been helping them test all along.  In our 90-minute testing session, we don't find any problems.  
That means that we've performed 45 micro-sessions, and have therefore obtained 45 units of test coverage.&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Total Tests&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;A&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;45&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;br /&gt;The first thing after lunch, we have a look at Team B's module.  These people are very diligent indeed. Most organizations would be delighted to have them on board.  Like Team A, they use test-first programming and TDD, they review carefully, they pair, and they collaborate with testers.  But they're human.  When we test their stuff, we find a bug very occasionally; let's say once per session.  The test that finds the bug takes two minutes; investigation and reporting of it takes a further eight minutes.  That's ten minutes altogether.  
The rest of the time, we don't find any problems, so that leaves us 80 minutes in which we can run 40 tests.&amp;nbsp; Let's compare that with this morning's results.&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Total Tests&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;A&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;45&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;B&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;10 minutes (1 test, 1 bug)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;80 minutes (40 tests)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;41&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;After the afternoon coffee break, we move on to Team C's module.  Frankly, it's a mess.  Team C is made up of nice people with the best of intentions, but sadly they're not very capable.  They don't work with us at all, and they don't test their stuff on their own, either.  There's no pairing, no review, in Team C. 
To Team C, if it compiles, it's ready for the testers.  The module is a dog's breakfast, and we find bugs practically everywhere.  Let's say we find eight in our 90-minute session.  Each test that finds a problem costs us 10 minutes, so we spent 80 minutes on those eight bugs.  Every now and again, we happen to run a test that doesn't find a problem.  (Hey, even dBase IV occasionally did something right.)&amp;nbsp; Our results for the day now look like this:&lt;br /&gt;&lt;br /&gt;&lt;table align="left" border="2" cellpadding="3" cellspacing="3" height="129" style="width: 681px;"&gt;&lt;tbody&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Module&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Bug Investigation and Reporting&lt;br /&gt;(time spent on tests that find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Test Design and Execution&lt;br /&gt;(time spent on tests that &lt;i&gt;don't&lt;/i&gt; find bugs)&lt;/b&gt;&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;Total Tests&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;A&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;0 minutes (no bugs found)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;90 minutes (45 tests)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;45&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;B&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;10 minutes (1 test, 1 bug)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;80 minutes (40 tests)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;41&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;tr align="center" valign="middle"&gt;   &lt;td&gt;C&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;80 minutes (8 tests, 8 bugs)&lt;br /&gt;&lt;/td&gt;   &lt;td&gt;10 minutes (5 tests)&lt;br /&gt;&lt;/td&gt;   
&lt;td&gt;13&lt;br /&gt;&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Because of all the bugs, Module C allows us to perform thirteen micro-sessions in 90 minutes.  &lt;i&gt;Thirteen&lt;/i&gt;, where with the other modules we managed 45 and 41.  Because we've been investigating and reporting bugs, there are 32 micro-sessions, 32 units of coverage, that we &lt;i&gt;haven't&lt;/i&gt; been able to obtain on this module. If we decide that we need to perform that testing (and the module's overall badness is consistent throughout), we're going to need at least three more sessions to cover it.  Alternatively, we could stop testing now, but what are the chances of a serious problem lurking in the parts of the module we haven't covered? So, the first thing to observe here is:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Lots of bugs means reduced coverage, or slower testing, or both.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There's something else that's interesting, too.  If we are being measured based on the number of bugs we find (&lt;i&gt;exactly&lt;/i&gt; the sort of measurement that will be taken by managers who don't understand testing), Team A makes us look awful—we're not finding any bugs in their stuff.  Meanwhile, Team C makes us look great in the eyes of management.  We're finding lots of bugs! That's good!&amp;nbsp; How could that be bad?  &lt;br /&gt;&lt;br /&gt;On the other hand, if we're being measured based on the test coverage we obtain in a day (which is &lt;i&gt;exactly&lt;/i&gt; the sort of measurement that will be taken by managers who count test cases; that is, managers who probably have an &lt;i&gt;even more&lt;/i&gt; damaging model of testing than the managers in the last paragraph), Team C makes us look terrible.  "You're not getting enough done!  
You could have performed 45 test cases today on Module C, and you've only done 13!"  And yet, remember that in our scenario we started with the assumption that, no matter what the module, we always find a problem if there's one there.  That is, there's &lt;i&gt;no difference&lt;/i&gt; between the &lt;i&gt;testers&lt;/i&gt; or the &lt;i&gt;testing&lt;/i&gt; for each of the three modules; it's solely the condition of the &lt;i&gt;product&lt;/i&gt; that makes all the difference.&lt;br /&gt;&lt;br /&gt;This is the first in a pair of posts.  Let's see what happens &lt;a href="http://www.developsense.com/2009/11/what-does-testing-take-so-long-part-2.html"&gt;tomorrow&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-3161610593071907152?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/11/why-is-testing-taking-so-long-part-1.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8192355983922368659</guid><pubDate>Tue, 10 Nov 2009 18:14:00 +0000</pubDate><atom:updated>2009-11-10T13:51:35.683-05:00</atom:updated><title>"Merely" Checking or "Merely" Testing</title><description>The distinction between testing and checking got a big boost recently from &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; at the &lt;a href="http://www.oredev.org/"&gt;Øredev&lt;/a&gt; conference in Malmö, Sweden.  But a recent tweet by Brian Marick and a recent conversation with a colleague have highlighted an issue that I should probably address.&lt;br /&gt;&lt;br /&gt;My colleague suggested that somehow I may have underplayed the significance or importance or the worth of checking.  
&lt;a href="http://bit.ly/e1Nt5"&gt;Brian's tweet&lt;/a&gt; said,&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;I think the trendy distinction between "testing" and "checking" is a power play: which would you preface with "mere"? http://bit.ly/2Cuyj&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;As a consequence, I was worried that I had ever said "mere checking" or "merely checking" in one of my blog postings or on Twitter, so I researched it.  Apparently I had not; that was a relief.  However, the fact that I was suspicious even of myself suggests that maybe I need to clarify something.&lt;br /&gt;&lt;br /&gt;The distinction between testing and checking &lt;i&gt;is&lt;/i&gt; a power play, but it's not a power play between (say) testers and programmers.  It's a power play about the glorification of mechanizable assertions over human intelligence.  It's a power play between sapient and non-sapient actions.&lt;br /&gt;&lt;br /&gt;Recall that the action of a check has three parts to it.  Part one is an &lt;i&gt;observation&lt;/i&gt; of a product.  Part two is a &lt;i&gt;decision rule&lt;/i&gt;, by which we can compare that empirical observation of the product with an idea that someone had about it.  Part three is the setting of a bit (pass or fail, yes or no, true or false) that represents the &lt;i&gt;non-sapient&lt;/i&gt; application of both the observation and the decision rule.  Note, too, that this means that a check can be performed by one of two agencies:  1) a machine; 2) a sufficiently disengaged human; that is, a human who has been scripted to behave like a machine, and who has for whatever reason accepted that assignment.&lt;br /&gt;&lt;br /&gt;So &lt;span style="font-style: italic;"&gt;checks can be hugely important&lt;/span&gt;.  Checks are the means by which a programmer, engaged in test-driven development, tests his idea.  Checks are a valuable product (a by-product, some would say) of test-driven development.  
Checks are change detectors, tools that allow programmers to refactor with confidence.  Checks built into continuous integration are mechanisms to make sure that our builds can work well enough to be tested—or, if we're confident enough in the prior quality of our testing, can work well enough to be deployed.  Checks tend to shorten the loop between the implementation of an idea and the discovery of a problem that the checks can detect, since the checks are typically designed and run (a lot, iteratively) by the person doing the implementation.  Checks tend to speed up certain aspects of the post-programmer testing of the product, since good checks will find the kind of dopey, embarrassing errors that even the best programmers can make from time to time.  The need for checks sometimes (alas, not always) prompts us to create interfaces that can be used by programmers or testers to aid in later exploration.&lt;br /&gt;&lt;br /&gt;Checking represents the rediscovery of techniques that were around at least as early as 1957.  &lt;i&gt;"The first attack on the checkout problem may be made before coding has begun."  D. D. McCracken, Digital Computer Programming, 1957&lt;/i&gt;  (Thanks to Ben Simo for inspiring me to purchase a copy of this book.)  In 2007, I had dinner with Jerry Weinberg and Josh Kerievsky.  Josh asked Jerry if he did a lot of unit testing back in the day.  Jerry practically did a spit-take, saying "Yes, of course.  Computer time was hugely expensive, but we programmers were cheap.  Getting the program right was really important, so we had to test a lot."  Then he added something that hadn't occurred to me.  "There was another reason, too.  Apart from everything else, we tested because the machinery was so unreliable.  We'd run a test program, then run the program we wrote, then run the test program again to make sure that we got the same result the second time.  
We had to make sure that no tubes had blown out."&lt;br /&gt;&lt;br /&gt;So, in those senses, &lt;i&gt;checking rocks&lt;/i&gt;.  Checking has always rocked.  It seems that in some places, people forgot how much it rocks, and that the Agilists have rediscovered it.&lt;br /&gt;&lt;br /&gt;Yet it's important to note that checks &lt;i&gt;on their own&lt;/i&gt; don't deliver value unless there's sapient engagement with them.  What do I mean by that?&lt;br /&gt;&lt;br /&gt;As &lt;a href="http://www.satisfice.com/blog/archives/99"&gt;James Bach says here&lt;/a&gt;, "A sapient process is any process that relies on skilled humans."  Sapience is the capacity to act with human intelligence, human judgment, and some degree of human wisdom.&lt;br /&gt;&lt;br /&gt;It takes sapience to recognize the need for a check—a risk, or a potential vulnerability.  It takes sapience—testing skill—to express that need in terms of a test idea.  It takes sapience—more test design skill—to express that test idea in terms of a question that we could ask about the program.  Sapience—in terms of testing skill, and probably some programming skill—is needed to frame that question as a yes-or-no, true-or-false, pass-or-fail question.  Sapience, in the form of programming skill, is required to turn that question into executable code that can implement the check (or, far more expensively and with less value, into a test script for execution by a non-sapient human).  We need sapience—testing skill again—to identify an event or condition that would trigger some agency to perform the check.  We need sapience—programming skill again—to encode that trigger into executable code so that the process can be automated.&lt;br /&gt;&lt;br /&gt;Sapience disappears while the check is being performed.  
By definition, the observation, the decision rule, and the setting of the bit all happen without the cognitive engagement of a skilled human.&lt;br /&gt;&lt;br /&gt;Once the check has been performed, though, skill comes back into the picture for &lt;i&gt;reporting&lt;/i&gt;.  Checks are rarely done on their own, so they must be aggregated.  The aggregation is typically handled by the application of programming skill.  To make the outcome of the check observable, the aggregated results must be turned into a human-readable report of some kind, which requires both testing and programming skill.  The human observation of the report, intake, is by definition a sapient process.  Then comes &lt;i&gt;interpretation&lt;/i&gt;.  The human ascribes meaning to the various parts of the report, which requires skills of testing and of critical thinking.  The human ascribes significance to the meaning, which again takes testing and critical thinking skill.  Sapient activity by someone—a tester, a programmer, or a product owner—is needed to determine the response.  Upon deciding on significance, more sapient action is required—fixing the application being checked (by the production programmer); fixing or updating the check (by the person who designed or programmed the check); adding a new check (by whoever might want to do so) or getting rid of the check (by one or more people who matter, and who have decided that the check is no longer relevant).&lt;br /&gt;&lt;br /&gt;So:  the check &lt;i&gt;in and of itself&lt;/i&gt; is relatively trivial.  It's all that stuff around the check—the testing and programming and analysis activity—that's important, supremely important.  And as is usual with important stuff, there are potential traps.&lt;br /&gt;&lt;br /&gt;The first trap is that it might be easy to do any of the sapient aspects of checking badly.  
Since the checks are at their core &lt;i&gt;software&lt;/i&gt;, there might be problems in requirements, design, coding, or interpretation, just as there might be with any software.&lt;br /&gt;&lt;br /&gt;The second trap is that it can be easy to fall asleep somewhere between the reporting and interpretation stages of the checking process.  The green bar tells us that All Is Well, but we must be careful about that.  All is well &lt;i&gt;with respect to the checks that we've programmed&lt;/i&gt; is a very different statement.  Red tends to get our attention, but green is an addictive and narcotic colour.  A passing test is another White Swan, confirmation of our existing beliefs, proof by induction.  Now, we can't live without proof by induction, but induction can't alert us to new problems.  Millions of repeated tests, repeated thousands of times, don't tell us about the bugs that elude them.  We only need one Black Swan to bump into a devastating effect.&lt;br /&gt;&lt;br /&gt;The third trap is that we might believe that checking a program is all there is to testing it.  Checking done well incorporates an enormous amount of testing and programming skill, but some quality attributes of a program are not machine-decidable.  Checks are the kinds of tests that aren't vulnerable to &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;the halting problem&lt;/a&gt;.  Someone on a mailing list once said, "Once all the (automated) acceptance test pass (that is, all the checks), we know we're done."  I liked &lt;a href="http://www.jbrains.ca/"&gt;Joe Rainsberger&lt;/a&gt;'s reply, "No, you're not done; you're ready to give it to a real tester to kick the snot out of it."  
That kicking is usually expressed with greater emphasis on exploration, discovery, and investigation, and rather less on confirmation, verification, and validation.&lt;br /&gt;&lt;br /&gt;The fourth trap is a close cousin of the third trap:  at certain points, we might pay undue attention to the value of checking with respect to its cost.  Cost vs. value is a dominating problem with any kind of testing, of course.  One of the reasons that the Agile emphasis on testing remains exciting is that excellent checking lowers the cost of testing, and both help to defend the value of the program.  Yet checks may not be Just The Thing for some purposes.  Joe has expressed concerns in his series &lt;a href="http://www.jbrains.ca/permalink/239"&gt;Integration Tests are a Scam&lt;/a&gt;, and Brian Marick did too, a while ago, &lt;a href="http://www.exampler.com/blog/2008/03/23/an-alternative-to-business-facing-tdd/"&gt;An Alternative to Business-Facing TDD&lt;/a&gt;.  I think they're both making important points here, thinking of checks as a means to an end, rather than as a fetish.&lt;br /&gt;&lt;br /&gt;Fifth:  upon noting the previous four traps (and others), we might be tempted to diminish the value of checking.  That would be a mistake. Pretty much any program is made more testable by someone removing problems before someone else sees them.  Every bug or issue that we find could trigger investigation, reporting, fixing, and retesting, and that gives other (and potentially more serious) problems time to hide.  Checking helps to prevent those unhappy discoveries.  Excellent checking (which incorporates excellent testing) will tend to reduce the number of problems in the product at any given time, and thereby results in a more testable program.  James Bach points out that &lt;a href="http://www.satisfice.com/blog/archives/58"&gt;a good manual test could never be automated&lt;/a&gt; (he'd say "sapient" now, I believe).  
But note that in that same post he says that "if you can truly automate a manual test, it couldn’t have been a good manual test", and "if you have a great automated test, it’s not the same as the manual test that you believe you were automating".  The point is that there are such things as great automated tests, and some of them might be checks.&lt;br /&gt;&lt;br /&gt;So the power play is over which we're going to value: the &lt;i&gt;checks&lt;/i&gt; ("we have 50,000 automated tests") or the &lt;i&gt;checking&lt;/i&gt;.  Mere checks aren't important; but &lt;i&gt;checking&lt;/i&gt;&amp;mdash;the activity required to build, maintain, and analyze the checks&amp;mdash;is.  To paraphrase Eisenhower, with respect to checking, the checks are nothing; the checking is everything.  Yet the checking &lt;i&gt;isn't&lt;/i&gt; everything; neither is the testing.  They're both important, and to me, neither can be appropriately preceded with "mere", or "merely".&lt;br /&gt;&lt;br /&gt;There's one exception, though:  If you're only doing one or the other, it might be important to say, "You've merely been &lt;i&gt;testing&lt;/i&gt; the program; wouldn't you be better off &lt;i&gt;checking&lt;/i&gt; it, too?" 
or "That program hasn't been &lt;i&gt;tested&lt;/i&gt;; it's merely been &lt;i&gt;checked&lt;/i&gt;."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8192355983922368659?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-4300153191056255614</guid><pubDate>Tue, 10 Nov 2009 04:04:00 +0000</pubDate><atom:updated>2009-11-09T23:16:53.381-05:00</atom:updated><title>Testing, Checking, and Convincing the Boss to Explore</title><description>How is it useful to make the distinction between &lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;testing and checking&lt;/a&gt;?  One colleague (let's call him Andrew) recently found it very useful indeed.  I've been asked not to reveal his real name or his company, but he has very generously permitted me to tell this story.&lt;br /&gt;&lt;br /&gt;He works for a large, globally distributed company, which produces goods and services in a sector not always known for its nimbleness.  He's been a test manager with the company for about 10 years.  He's had a number of senior managers who have allowed him and his team to take an exploratory approach, almost a &lt;a href="http://en.wikipedia.org/wiki/Skunk_works"&gt;skunkworks&lt;/a&gt; inside the larger organization.  Rather than depending on process manuals and paperwork, he manages by direct interaction and conversation.  
He hires bright people, trains them, and grants them a fairly high degree of autonomy, balanced by frequent check-ins.&lt;br /&gt;&lt;br /&gt;Recently, on a Thursday, the relatively new CEO came to town and held an all-hands meeting for Andrew's division.  Andrew was impressed; the CEO seemed genuinely interested in cutting bureaucracy and making the organization more flexible, adaptable, and responsive to change.  After the CEO's remarks, there was a question-and-answer period.  Andrew asked if the company would be doing anything to make testing more effective and more efficient.  The CEO seemed curious about that, and jotted down a note on a piece of paper.  Andrew was given the mandate of following up with the VP responsible for that area.&lt;br /&gt;&lt;br /&gt;Late that afternoon, Andrew called me.  We chatted for a while on the phone.  He hadn't read my series on testing vs. checking, but he seemed intrigued.  I suggested that he read it, and that we get together and talk about it.&lt;br /&gt;&lt;br /&gt;As luck would have it, there was occasion to bring a few more people into the picture.  That weekend, we had a timely conversation with &lt;a href="http://www.quality-intelligence.com/"&gt;Fiona Charles&lt;/a&gt; who reminded us to focus on the issue of risk.  &lt;a href="http://www.amibug.com/"&gt;Rob Sabourin&lt;/a&gt; happened to be visiting on Saturday evening, so he, Andrew, and I sat down to compose a letter to the VP.   Aside from changing the names that would identify the parties involved, this is an unedited version of what we came up with:&lt;br /&gt;&lt;br /&gt;[Our Letter]&lt;br /&gt;&lt;br /&gt;Dear [Madam VP]...&lt;br /&gt;&lt;br /&gt;[Mr. 
CEO] asked me to send you this email as a follow up to a question that I posed during his recent trip to the [OurTown] office on [SomeDate] on the current state of the testing at [OurCompany] and how our testing effectiveness should be improved.&lt;br /&gt;&lt;br /&gt;The [OurTown]-based [OurDivision] test team has been very successful in finding serious issues with our products with a fairly small test team using exploratory test approaches. As an example, a couple of weeks ago one of my testers found a critical error in an emergency fix within his two days of exploratory testing in a load that had passed four person-&lt;i&gt;weeks&lt;/i&gt; of regression testing (scripted checking) by another team.&lt;br /&gt;&lt;br /&gt;Last week a Project Lead called me and asked if my team could perform a regression sweep on a third party delivery. I replied that we could provide the requested coverage with two person-&lt;i&gt;days&lt;/i&gt; of effort without disrupting our other commitments. He seemed surprised and delighted. He had come to us because [OurCompany]'s typical approach yielded a four-to-six person-&lt;i&gt;week&lt;/i&gt; effort which would have caused a delay in the project's subsequent release.&lt;br /&gt;&lt;br /&gt;Our experience using exploratory testing in [OurDivision] has demonstrated improved flexibility and adaptability to respond to rapid changes in priorities.&lt;br /&gt;&lt;br /&gt;Testing is not checking. Checking is a process of confirmation, validation, and verification in which we are comparing an output to a predicted and expected result. Testing is something more than that. Testing is a process of exploration, discovery, investigation, learning, and analysis with the goal of gathering information about risks, vulnerabilities, and threats to the value of the product. 
The current effectiveness of many groups' automated scripts is quite excellent, yet without supplementing these checks with "brain-engaged" human testing we run the risk of serious problems in the field impacting our customers, and the consequential bad press that follow these critical events.&lt;br /&gt;&lt;br /&gt;At [OurCompany] much of our "testing" is focused on checking. This has served us fairly well but there are many important reasons for broadening the focus of our current approach. While checking is very important, it is vulnerable to the "pesticide paradox". As bacteria develop a resistance to antibiotics, software bugs are capable of avoiding detection by existing tests (checks), whether executed once or repeated over and over. In order to reduce our vulnerability to field issues and critical customer incidents, we must supplement our existing emphasis on scripted tests (both manual and automated) with an active search for new problems and new risks.&lt;br /&gt;&lt;br /&gt;There are several strong reasons for integrating exploratory approaches into our current development and testing practices:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Scripted tests are perceived to be important for compliance with [regulatory] requirements.  They are focused on being repeatable and defensible. Mere compliance is insufficient—we need our products to &lt;i&gt;work&lt;/i&gt;.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Scripted checks take time and effort to design and prepare, whether they are run by machines or by humans. We should focus on reducing preparation cost wherever possible and reallocating that effort to more valuable pursuits.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Scripted checks take far more time and effort to execute when performed by a human than when performed by a machine. 
For scripted checks, machine execution is recommended over human execution, allowing more time for both human interaction with the product, and consequent observation and evaluation.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Exploratory tests take advantage of the human capacity for recognizing new risks and problems.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Exploratory testing is highly credible and accountable when done well by trained testers. The findings of exploratory tests are rich, risk-focused, and value-centered, revealing far more knowledge about the system than simple pass/fail results.&lt;/li&gt;&lt;/ul&gt;The quality of exploratory testing is based upon the skill set and the mindset of the individual tester. Therefore, I recommend that testers and managers across the organization be trained in the structures and disciplines of excellent exploratory testing. As teams become trained, we should systematically introduce exploratory sessions into the existing testing processes, observing and evaluating the results obtained from each approach.&lt;br /&gt;&lt;br /&gt;I have been actively involved in improving testing, in general, outside of [OurCompany]. I am on the board of a testing association and I have been attending, organizing and facilitating meetings of testers for many years.&lt;br /&gt;&lt;br /&gt;During this time, I have been exposed to much of the latest developments in software testing and I have led the implementation of Session Based Exploratory Testing within my department. In addition, over the past four years, I have been providing instruction in software testing both to the testers within my business unit and to companies outside of [OurCompany].&lt;br /&gt;&lt;br /&gt;I look forward to the opportunity to talk with you about this further.&lt;br /&gt;&lt;br /&gt;[/Our Letter]&lt;br /&gt;&lt;br /&gt;Now, I thought that was pretty strong.  But the response was far more gratifying than I expected.  Andrew sent the message on Sunday afternoon.  
The VP responded &lt;i&gt;by 8:45am on Monday morning&lt;/i&gt;.   Her reply was in &lt;i&gt;my&lt;/i&gt; mailbox before 10:00am.  The reply read:&lt;br /&gt;&lt;br /&gt;[The VP's Reply]&lt;br /&gt;&lt;br /&gt;Dear Andrew&lt;br /&gt;&lt;br /&gt;Thanks very much for the email. I find this very intriguing! I believe the distinction you make between testing and checking is quite insightful and I would like to connect with you to see how we can build these concepts and techniques into our quality management services as well as my central team verification tests. I will get a call together with [Mr. Bigwig] and [Mr. OtherBigwig] so that we can figure out the best way to incorporate your ideas. Again, many thanks!!&lt;br /&gt;&lt;br /&gt;[/The VP's Reply]&lt;br /&gt;&lt;br /&gt;A couple of key points:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The letter was much stronger thanks to collaboration.  Any one of the four of us could have written a good letter; the result was better than any of us could have done on our own.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The letter is sticky, in the sense that Chip and Dan Heath talk about in their book &lt;a href="http://www.amazon.com/Made-Stick-Ideas-Survive-Others/dp/1400064287"&gt;Made to Stick: Why Some Ideas Survive and Others Die&lt;/a&gt;.  It's not a profound book, but it contains some useful points to ponder.  The letter starts with two &lt;b&gt;s&lt;/b&gt;tories that are &lt;b&gt;s&lt;/b&gt;imple, &lt;b&gt;u&lt;/b&gt;nexpected, &lt;b&gt;c&lt;/b&gt;oncrete, &lt;b&gt;c&lt;/b&gt;redible, and &lt;b&gt;e&lt;/b&gt;motional (remember, the product manager was &lt;i&gt;surprised&lt;/i&gt; and &lt;i&gt;delighted&lt;/i&gt;).  Those initials can be rearranged to SUCCES, which is the mnemonic that the Heaths use for successful communication.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The testing vs. checking distinction is simpler and more memorable than "exploratory approaches" vs. "confirmatory scripted approaches".  
The explanation is available (and in most cases necessary), but "testing" and "checking" roll off the tongue quickly after the explanation has been absorbed.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;We managed to hit some of the most important aspects of good testing:  cost vs. value, risk focus, diversification of approaches, flexibility and adaptability, and rapid service to the larger organization.&lt;/li&gt;&lt;/ul&gt;After reviewing this post, Andrew said, "I like the post a lot.  Let's hope we end up helping a lot of people with it."  Amen.  You are, of course, welcome to use this letter as a point of departure for your own letter to the bigwigs.  If you'd like help, please feel free to &lt;a href="mailto:michael@developsense.com?subject=Testing%20vs.%20Checking%20Letter"&gt;drop me a line&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-4300153191056255614?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/11/testing-checking-and-convincing-boss-to.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>9</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-9099797487967162418</guid><pubDate>Fri, 30 Oct 2009 08:59:00 +0000</pubDate><atom:updated>2009-10-30T16:35:24.179-05:00</atom:updated><title>Comment on a Not-So-Good Article on Exploratory Testing</title><description>An article from a while back on StickyMinds entitled &lt;a href="http://www.stickyminds.com/sitewide.asp?ObjectId=6271&amp;amp;Function=edetail&amp;amp;ObjectType=ART&amp;amp;commex=1#6466"&gt;How To Choose Between Scripted and Exploratory Testing&lt;/a&gt; refers to a bunch of factors in making choices between scripted testing and exploratory testing.  
The problems start early:  &lt;i&gt;"As a test manager or engineer, you may be considering whether or not to use exploratory testing on your next project."&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;If you're not planning on investigating any problems that you find, I suppose that you &lt;span style="font-style: italic;"&gt;could&lt;/span&gt; choose not to use exploratory testing (investigating a bug is not a scripted or scriptable process).  If you don't allow the tester to control his or her actions, or to feed information from the last test into the thought process and action behind the next test, then you &lt;span style="font-style: italic;"&gt;could&lt;/span&gt; get away without taking an exploratory approach.  If the testing ideas were all declared and written down at the beginning of the project, you &lt;span style="font-style: italic;"&gt;could&lt;/span&gt; successfully avoid exploratory testing.  And if you used an all-&lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;checking&lt;/a&gt; strategy, maybe exploratory testing wouldn't be an approach you'd take.  But would you really want all of your testing to be confirmatory?  All validation and verification?  All checking to make sure that existing beliefs are correct, rather than probing the product to disconfirm those beliefs, with the goal of showing that they're inaccurate or incomplete?&lt;br /&gt;&lt;br /&gt;This isn't the worst article ever written on exploratory testing.   It is, however, typical of a certain kind of bad article.  It's full of assertions that aren't supported in the view of exploratory testing that a bunch of us (including Jon Bach, who, at the bottom of the article provides an unusually stern comment by his usual standards) have been developing for years.  Let's have a look at some of the points.  
Quotes from the original article are in italics.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Tester domain knowledge&lt;/b&gt;...If a tester does not understand the system or the business processes, it would be very difficult for him to use, let alone test, the application without the aid of test scripts and cases.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A few years back, a former co-worker contracted me to document a network analysis tool that was being rebranded, fixed, and upgraded by his new company.  The only documentation available was not only weak but also a couple of versions out of date.  In addition, I didn't know much about network analysis tools at all.  I reviewed some literature on the topic, and I interviewed the company's lead programmer.  But for me, the most rapid, relevant, and powerful learning came from interacting with the product, figuring out how it worked (and, at several points, how it appeared to fail), and building the story of the product in my mind and in the documentation that I created.  Without that interaction with the product, my learning would not have been rooted in experience, and would have been much slower. And the documentation that I produced was deemed excellent by the client.&lt;br /&gt;&lt;br /&gt;How can I test a product in a domain that I don't know much about?  The answer is that through testing, I learn, and I learn far more rapidly than by other means.   Anyone learning a complex cognitive activity learns far more, far more quickly, in the doing of it than in the reading about it.  Preschool kids do the same thing.  
Watch them; then watch how they tend to slow down and follow the schools' processes instead of their own.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;System complexity&lt;/b&gt;...End-to-end testing can be accomplished with exploratory testing; however, the capabilities and skill sets required are typically those of a more experienced test engineer.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;People often argue that exploratory testing needs capable, experienced, and skilled testers.  I agree.  The argument suggests (and often states outright) that scripted testing &lt;i&gt;doesn't&lt;/i&gt; need capable, experienced, and skilled testers.  To some degree that's true; when the scripted action is &lt;a href="http://www.developsense.com/2009/09/pass-vs-fail-vs-is-there-problem-here.html"&gt;checking&lt;/a&gt;, by definition skill and sapience aren't required for the moment of the check.  But sapience and skill &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;&lt;i&gt;are&lt;/i&gt; required&lt;/a&gt; for the design and construction of the check, and for the analysis and interpretation of the results, so the argument that scripted testing doesn't need skilled testers doesn't hold water if you want to do scripted testing &lt;i&gt;well&lt;/i&gt;.  Indeed, good testing of any kind requires skill.  If you or your testers are genuinely unskilled, training, coaching, mentoring, and a fault-tolerant environment that fosters learning will help to build skills quickly.&lt;br /&gt;&lt;br /&gt;I'm also puzzled by the argument in another way:  the (claimed) lack of a need for skilled testers is often presented as a &lt;i&gt;virtue&lt;/i&gt; of scripted testing.  This is like saying that the lack of a requirement for medical training is a virtue of quack medicine.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Level of documentation. Scripted testing generally flows from business requirements documents and functional specifications. 
When these documents do not exist or are deficient, it is very difficult to conduct scripted testing in a meaningful way. The shortcuts that would be required, such as scripting from the application as it is built, are accomplished as efficiently using exploratory testing.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Actually, there are several ways to do scripted testing in a meaningful way without business requirements documents or functional specifications being present or perfect.  A standout example is test-driven development, in which checks are developed prior to developing the application code.  Another:  unit tests, in which checks are prepared at some point after the application code has been developed.  Both forms of checks support refactoring.  Another example:  many Agile development shops use Fitnesse to develop, explore, and discover many of the requirements of the product, and to create scripted checks as development proceeds.&lt;br /&gt;&lt;br /&gt;In fact, no one of whom I am aware has ever seen perfect requirements documents; they're always deficient in some way for some person's purpose, and always to some degree that's intentional.  It's infeasible and wasteful to document every assumption; for programmers and testers with appropriate judgment and skill, it's also somewhat insulting.  Thus, whether developing either scripted or exploratory test ideas, we should treat imperfect documents as a default assumption.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Timeframes and deadlines.&lt;/b&gt; The lead-in time to test execution determines whether you can conduct test design before execution. When there is little or no time, you might at least need to start with exploratory testing while documented tests are prepared after the specifications become available. &lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Here is the implication, once again, that exploratory tests aren't documented.  
Yet exploratory ideas &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; be documented in advance, in the form of checklists or as charters.  Existing data files—a form of documentation—can be available in advance.  Exploratory tests can be guided by marked-up diagrams, by user documentation, or even by a functional specification.  Exploratory testing can be &lt;i&gt;informed&lt;/i&gt; by any kind of documentation you like.  One key distinction between exploratory and scripted approaches is not whether documentation is available in advance; with ET the difference is that the tester, rather than the document, is the &lt;i&gt;primary&lt;/i&gt; agency driving the design and execution of the test.  Another key distinction is that in an exploratory approach, detailed documentation of the tester's actions tends to be de-emphasized prior to test execution, and tends to be produced during or very shortly after testing.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Available resources.&lt;/b&gt; The total number of person-days of effort can determine which approach you should take. Formal test documentation has significant overhead that can be prohibitive where resources and budgets are tight.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Well, at least maybe we agree on something.  But the number of person-days of effort doesn't determine which approach you choose; people do that.  It's unclear here as to whether the person-days are a given, or whether the test manager gets to choose.  Moreover, I wouldn't use the word "formal" here, since documentation associated with exploratory approaches can be quite formal (look at &lt;a href="http://www.satisfice.com/sbtm"&gt;session-based test management&lt;/a&gt; as an example).  The word I'd use instead of "formal" above is "excessive" or "wasteful" or "overly detailed".  
I'd also suggest that it's fairly rare for testers to feel as though resources and budgets are ample; most testers I speak with maintain that resources and budgets are always tight, for them.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Skills required.&lt;/b&gt; The skill sets of your team members can affect your choices. Good test analysts may not necessarily be effective exploratory testers and vice versa. A nose for finding bugs quickly and efficiently is not a skill that is easily learned...&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Perhaps not.  James Bach, Jon Bach, James Lyndsay, and I (to name but four) have a lot of experience training testers, and our experience is that the capacity to find bugs can be learned more quickly than many people believe.  But habituate testers to working from detailed manual test scripts and I'll &lt;i&gt;guarantee&lt;/i&gt; that they don't develop skill quickly.  In fact, a huge part of learning to find problems quickly lies in the tester being given the freedom and responsibility to explore and to develop his own mental models of the product.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Verification.&lt;/b&gt; Verification requires something to compare against. Formal test scripts describe an expected outcome that is drawn from the requirements and specifications documents.  As such, we can verify compliance. Exploratory testing is compared to the test engineer’s expectations of how the application should work. &lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Many kinds of testing—not just verification—require something to compare against.  That "something" is called an oracle, a principle or mechanism by which we recognize a problem.  The claim here is that scripted testing is a good thing because we can verify "compliance" (to what?) based on requirements and specifications documents.  Actually, what we'd be verifying here is &lt;i&gt;consistency&lt;/i&gt; between some point in a test script and a specific claim made in one of the documents.  
There are two problems here.  One is the problem of &lt;a href="http://www.youtube.com/watch?v=Ahg6qcgoay4"&gt;inattentional blindness&lt;/a&gt;; focusing a tester's attention on a single observation drives the tester's attention away from other possible observations that might represent an issue with the product.  The second problem relates to the fact that requirements and specification documents (as above) can be presumed to contain errors, misinterpretations, and outdated information.&lt;br /&gt;&lt;br /&gt;Exploratory approaches defend against these two problems by their emphasis on the tester's skill set and mindset.  Rather than depending upon a single oracle (that of consistency with a specification), an exploratory approach emphasizes applying several oracles, including consistency with the product's history, with the image the development organization wants to project, with comparable products, with reasonable user expectations, with the intended purpose of the product, with other elements within the product, and with relevant standards, regulations, or laws.  Claims—statements in the requirement documents—are taken seriously by the exploring tester, of course, but the other kinds of oracles provide a rich set of possible comparisons, rather than the relatively impoverished single set provided by a script.  By keeping the tester cognitively engaged and under her own control, we lessen the risk of script-induced inattentional blindness.  Statements like &lt;i&gt;"Exploratory testing is compared to the test engineer’s expectations of how the application should work"&lt;/i&gt; are not only inaccurate, but also trivialize the complex set of heuristics that skilled testers bring to the game.&lt;br /&gt;&lt;br /&gt;At this point, I'm in agreement with Jon Bach.  Refuting the rest of the article would take more time than I've got at the moment.  
So, to keep the curious and eager people occupied:&lt;br /&gt;&lt;h3&gt;Web Resources&lt;br /&gt;&lt;/h3&gt;&lt;a href="http://www.developsense.com/2008/09/evolving-understanding-about.html"&gt;Evolving Understanding of Exploratory Testing&lt;/a&gt; (Bolton)&lt;br /&gt;&lt;a href="http://www.kaner.com/pdfs/QAIExploring.pdf"&gt;A Tutorial in Exploratory Testing&lt;/a&gt; (Kaner)&lt;br /&gt;&lt;a href="http://www.kaner.com/pdfs/ETatChoices.pdf"&gt;The Nature of Exploratory Testing&lt;/a&gt; (Kaner)&lt;br /&gt;&lt;a href="http://www.kaner.com/pdfs/ValueOfChecklists.pdf"&gt;The Value of Checklists and the Danger of Scripts: What Legal Training Suggests for Testers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.developsense.com/resources/et-dynamics22.pdf"&gt;Exploratory Testing Dynamics&lt;/a&gt; (James Bach, Jon Bach, Michael Bolton)&lt;br /&gt;&lt;a href="http://www.satisfice.com/tools/procedure.pdf"&gt;General Functionality and Stability Test Procedure (for Microsoft Windows 2000 Application Certification)&lt;/a&gt; (James Bach)&lt;br /&gt;&lt;a href="http://bit.ly/KTXML"&gt;Experiments in Problem Solving&lt;/a&gt; (Jerry Weinberg)&lt;br /&gt;&lt;a href="http://bit.ly/2QMudl"&gt;Collaborative Discovery in a Scientific Domain&lt;/a&gt; (Okada and Simon)&lt;br /&gt;- a study of collaborative problem-solving. 
Notice the subjects are testing software.&lt;br /&gt;&lt;h3&gt;Books&lt;/h3&gt;&lt;i&gt;Exploring Science: The Cognition and Development of Discovery Processes&lt;/i&gt; (David Klahr)&lt;br /&gt;&lt;i&gt;Plans and Situated Actions&lt;/i&gt; (Lucy Suchman)&lt;br /&gt;&lt;i&gt;Play as Exploratory Learning&lt;/i&gt; (Mary Reilly)&lt;br /&gt;&lt;i&gt;Exploratory Research in the Behavioral Sciences&lt;/i&gt;&lt;br /&gt;&lt;i&gt;Naturalistic Inquiry&lt;/i&gt;&lt;br /&gt;&lt;i&gt;How to Solve It&lt;/i&gt; (George Polya)&lt;br /&gt;&lt;i&gt;Simple Heuristics That Make Us Smart&lt;/i&gt; (Gerd Gigerenzer)&lt;br /&gt;&lt;i&gt;Blink&lt;/i&gt; (Malcolm Gladwell)&lt;br /&gt;&lt;i&gt;Gut Feelings&lt;/i&gt;  (Gerd Gigerenzer)&lt;br /&gt;&lt;i&gt;Sensemaking in Organizations&lt;/i&gt; (Karl Weick)&lt;br /&gt;&lt;i&gt;Cognition in the Wild&lt;/i&gt; (Edward Hutchins)&lt;br /&gt;&lt;i&gt;The Social Life of Information&lt;/i&gt; (Paul Duguid and John Seely Brown)&lt;br /&gt;&lt;i&gt;A System of Logic and Ratiocination&lt;/i&gt; (John Stuart Mill)&lt;br /&gt;&lt;i&gt;Sciences of the Artificial&lt;/i&gt; (Herbert Simon)&lt;br /&gt;&lt;br /&gt;Yes, learning about exploratory testing in a non-trivial way might take some practice and study.  
That goes for anything worth doing expertly, doesn't it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-9099797487967162418?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/10/comment-on-not-so-good-article-on.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>6</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-907270036766144109</guid><pubDate>Wed, 28 Oct 2009 01:43:00 +0000</pubDate><atom:updated>2009-10-27T21:44:39.494-05:00</atom:updated><title>Maturity Models Have It Backwards</title><description>At a couple of recent conferences, some people have asked me about one "maturity" model or another.  As one of the few people who has read the CMMI book from cover to cover, here's what I think:  In process-speak, the notion of maturity is backwards.&lt;br /&gt;&lt;br /&gt;A mature entity, in biology, is one that can survive and thrive without parental support. A mature being is one that has achieved an age and stage where it can reproduce, mutate, and diversify.  In general, diversity and sustainability come from interaction with other mature creatures.  An animal or plant attains and sustains maturity either by adapting to its environment, or by being adapted to it already, but no species is adapted to all environments.&lt;br /&gt;&lt;br /&gt;A mature person is one who is highly conscious of when it's appropriate to follow rules and when to break them.  A mature person is largely self-guided.  Only in exceptional circumstances does a mature person need to refer to or appeal to a rulebook at all.  
In such cases, the issue is that someone believes that the rulebook isn't working, and in such cases, consensus between mature individuals and organizations—not the rulebook itself—makes the determination as to what should happen next.  Mature people know that rulebooks need to be interpreted.&lt;br /&gt;&lt;br /&gt;We don't consider a person mature when he says or does the same thing over and over again, when he answers by rote, when he appeals to authority, or when he goes through the motions.  We consider a person mature when he is able to think and perform independently, when he behaves responsibly and respectfully towards others, and when he accepts responsibility for his actions.  We also don't mind when a mature person screws around every once in a while, as long as little or no harm comes of it.&lt;br /&gt;&lt;br /&gt;We generally think of mature people as being relaxed and easy-going, rather than rigid and uptight.  Mature people who can't get their way immediately don't stamp their feet, shout, or hold their breath until they turn blue.  Mature people recognize the possibility that other people's values, actions, languages, cultures, and means of expression are worthy of respect.  Mature people question themselves at least as quickly and as closely as they question others.  Mature people are forgiving and fault-tolerant, recognizing that immature people often need to make mistakes in order to learn.  Mature parents don't scare the kids, don't yell at them, don't try to protect them from every peril.  
Mature parents provide a supportive environment where kids can make mistakes and learn from them.&lt;br /&gt;&lt;br /&gt;One more thing about mature beings:  they eventually get old, and they die, to be replaced by other mature beings.&lt;br /&gt;&lt;br /&gt;So what does all this mean for "process maturity"?&lt;br /&gt;&lt;br /&gt;If maturity means the same thing for processes as for other living things, a genuinely mature process, whether for individuals or for groups, should incorporate freedom, responsibility, diversity, adaptability, and self-sufficiency.  A genuinely mature process shouldn't emphasize repeatability as a virtue unto itself, but rather as something set up to foster appropriate variation, sustainability, and growth.  A mature process should encourage risk-taking and mistakes while taking steps to limit the severity and consequence of the mistakes, because without making mistakes, learning isn't possible.&lt;br /&gt;&lt;br /&gt;What do you think?  How do process "maturity" models stack up?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Note: This post was inspired by Jerry Weinberg's writing, in particular the material on "maturity" in &lt;/span&gt;Quality Software Management, Volume 1:  Systems Thinking&lt;span style="font-style: italic;"&gt;.  
For an unusually sharp passage on the subject, see pages 20 and 21.&lt;/span&gt;&lt;br /&gt;&lt;p:colorscheme colors="#ffffff,#000000,#808080,#000000,#00ffcc,#99ccff,#3399ff,#3399ff"&gt;&lt;div shape="_x0000_s1026"&gt;&lt;div class="O" style="text-align: center;"&gt;&lt;/div&gt;  &lt;/div&gt;  &lt;/p:colorscheme&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-907270036766144109?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/10/maturity-models-have-it-backwards.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>19</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-7087672707913480587</guid><pubDate>Mon, 19 Oct 2009 21:10:00 +0000</pubDate><atom:updated>2009-10-19T18:05:44.691-05:00</atom:updated><title>The Testers' Christmas Present</title><description>So the holidays are coming up, and you're wondering what to get for your tester friends, or (if you're a tester) for your kids.&lt;br /&gt;&lt;br /&gt;Let me be the first this season to recommend &lt;span style="font-style: italic;"&gt;I Am A Bug&lt;/span&gt;, a perfectly charming little book by Robert Sabourin, and illustrated by his daughter Catherine, who was between 11 and 12 years old as the book was being published.  It's been around for several years.&lt;br /&gt;&lt;br /&gt;The secret is that &lt;span style="font-style: italic;"&gt;I Am A Bug&lt;/span&gt; is a serious testing book, cleverly disguised as a children's book.  It's one of the wisest books on testing I've ever read.  Each page begins with a big message, illustrated and elaborated below.  
Here's a little sample:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;A bee sting may hurt a bit&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 238px; height: 320px;" src="http://www.developsense.com/uploaded_images/Bee-Sting-738407.GIF" alt="" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-size:100%;" &gt;The same bug can be found in different computer programs.  In one program the bug may not cause much damage...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;But it can kill you if you're allergic to bees.&lt;/span&gt;&lt;br /&gt;&lt;img src="http://www.developsense.com/uploaded_images/Ambulance-738788.GIF" alt="" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-size:100%;" &gt;...but in another program it could be fatal.&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;The whole book is online, at Rob's Web site, http://www.amibug.com.  (The book is found under "Presentations".)  It's fun to read there, but order the dead-tree version for a very reasonable price.  You can get it from the usual online booksellers.  
Highly recommended.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-7087672707913480587?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/10/testers-christmas-present.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-485929998141654503</guid><pubDate>Tue, 13 Oct 2009 15:14:00 +0000</pubDate><atom:updated>2009-10-13T11:10:04.673-05:00</atom:updated><title>When Do We Stop Testing?  One More Sure Thing</title><description>Not too long ago, I posted &lt;a href="http://www.developsense.com/2009/09/when-do-we-stop-test.html"&gt;a list of stopping heuristics for testing&lt;/a&gt;.  As usual, such lists are always subjective, subject to refinement and revision, and under scrutiny from colleagues and other readers.  As usual, James Bach is a harsh critic (and that's a compliment, not a complaint).  We're still transpecting over some of the points; eventually we'll come out with something on which we agree.&lt;br /&gt;&lt;br /&gt;Joe Harter, &lt;a href="http://joeharterqa.wordpress.com"&gt;in his blog&lt;/a&gt;, suggests splitting "Pause That Refreshes" into two:  "Change in Priorities" and "&lt;strong style="font-weight: normal;"&gt;Lights are Off&lt;/strong&gt;".  The former kicks in when we know that there's still testing to be done, but something else is taking precedence.  The latter is that we've lost our sense of purpose&amp;mdash;as I suggested in the original post we might be tired, or bored, or uninspired to test and that a break will allow us to return to the product later with fresh eyes or fresh minds.  
Maybe they're different enough that they belong in different categories, and I'm thinking that they are.  Joe provides a number of examples of why the lights go out; one feels to me like "customary conclusion", another looks like "Mission Accomplished".  But his third point is interesting:  it's a form of &lt;a href="http://en.wikipedia.org/wiki/Parkinson%27s_Law"&gt;Parkinson's Law&lt;/a&gt;, "work expands to fill the time available for its completion".  Says Joe, &lt;span style="font-style: italic;"&gt;"The test team might be given more time than is actually necessary to test a  feature so they fill it up with old test cases that don’t have much meaning."&lt;/span&gt;  I'm not sure how often people feel as though they have more time than they need, but I am sure that I've seen (been in) situations where people seem to be bereft of new ideas and simply going through the motions.  So:  if that feeling comes up, one should consider Parkinson's Law and a Pause That Refreshes.  Maybe there's a new one there. But as Joe himself points out, "In the end it doesn’t matter if you use [Michael's] list, my list or any list at all.   These heuristics are rules of thumb to help thinking testers decide when testing  should stop.  The most important thing is that you are thinking about it."&lt;br /&gt;&lt;br /&gt;For sure, however, there is a glaring omission in the original list.  Cem Kaner pointed it out to me&amp;mdash;and that shouldn't have been necessary, because I've used this heuristic myself.  It focuses on the individual tester, but it might also apply to a testing or development team. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Mission Rejected&lt;/span&gt;.  
&lt;span style="font-style: italic;"&gt;We stop testing when we perceive a problem for some person&amp;mdash;in particular, an ethical issue&amp;mdash;that prevents us from continuing work on a given test, test cycle, or development project&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Would you continue a test if it involved providing fake test results?  Lying?  Damaging valuable equipment? Harming a human, as in the &lt;a href="http://en.wikipedia.org/wiki/Milgram_experiment"&gt;Milgram Experiment&lt;/a&gt; or the &lt;a href="http://en.wikipedia.org/wiki/Stanford_prison_experiment"&gt;Stanford Prison Experiment&lt;/a&gt;?  Maybe the victim isn't the test subject, but the client:   Would you continue a test if you believed that some cost of what you were doing&amp;mdash;including, perhaps, your own salary&amp;mdash;were grossly disproportionate to the value it produced?  Maybe the victim is you:   Would you stop testing if you believed that the client wasn't paying you enough?&lt;br /&gt;&lt;br /&gt;The consequences of ignoring this heuristic can be dire.  Outside the field of software testing, but in testing generally, a friend of mine worked in a science lab that did experiments on bone regeneration.  The experimental protocol involved the surgical removal of around one inch of bone from both forelegs of a dog (many dogs, over the course of the research), treating one leg as an experiment and the other as a control.  Despite misgivings, my friend was reasonably convinced of the value of the work.  Later, when he found out that these experiments had been performed over and over, and that no new science was really being done, he suffered a nervous breakdown and left the field.  
Sometimes testing doesn't have a happy ending.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-485929998141654503?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/10/when-do-we-stop-testing-one-more-sure.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-5756836231544969641</guid><pubDate>Wed, 30 Sep 2009 20:42:00 +0000</pubDate><atom:updated>2009-09-30T16:05:36.155-05:00</atom:updated><title>Context-free Questions For Testing and Checking</title><description>After a presentation on exploratory approaches and on &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-motive-for.html"&gt;testing vs. checking&lt;/a&gt; yesterday, a correspondent and old friend writes:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Although the presentation made good arguments for exploratory testing, I am not sure a small QA department can spare the resources unless a majority of regression checking can be moved to automation. Particularly in situations with short QA cycles.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Notice that he and I are using "testing" and "checking" &lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;in this specific way&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Any time someone makes an observation about what is or isn't possible, irrespective of the kind of testing (or checking) that they're doing, it suggests some questions for the testing, programming, and management teams.  
I'd ask my old friend:&lt;br /&gt;&lt;br /&gt;1) How much checking do you &lt;span style="font-style: italic;"&gt;need&lt;/span&gt; to do?&lt;br /&gt;&lt;br /&gt;2) What, specifically, suggests that checking needs to be done? What happens when you do it?  What doesn't happen when you do it?  What happens when you don't do it?  What doesn't happen when you don't do it?&lt;br /&gt;&lt;br /&gt;3) What, specifically, might suggest that the testers are the best people to do the checking?  What, specifically, might suggest that they &lt;span style="font-style: italic;"&gt;aren't&lt;/span&gt; the best people to do it?&lt;br /&gt;&lt;br /&gt;4) Where do your testers spend their time?  When you speak with the people who are actually testing, do they feel the time that they're spending on checking is worthwhile?  Do they have things to say about what slows down testing (or checking)?&lt;br /&gt;&lt;br /&gt;5) What are the risks that checking addresses well?  What risks are not addressed well by checking?&lt;br /&gt;&lt;br /&gt;These are open questions that &lt;span style="font-style: italic;"&gt;all&lt;/span&gt; teams can ask, regardless of the approach they're using now. Feel free to replace the word "checking" with "testing", and vice versa, wherever you like.&lt;br /&gt;&lt;br /&gt;I encourage and, when asked, help people to ask and answer these questions, and others like them.  I have no specific answers from the outset; I don't know you, and I don't know your context.  But &lt;span style="font-style: italic;"&gt;you do&lt;/span&gt;.  Maybe the questions can be helpful to you.  
I hope so.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-5756836231544969641?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/context-probing-questions-for-testing.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-6162092213541034058</guid><pubDate>Tue, 29 Sep 2009 19:51:00 +0000</pubDate><atom:updated>2009-10-09T11:19:45.372-05:00</atom:updated><title>A Letter To The Programmer</title><description>&lt;i&gt;This is a letter that I would not show to a programmer in a real-life situation.  I've often thought of bits of it at a time, and those bits come up in conversation occasionally, but not all at once.&lt;br /&gt;&lt;br /&gt;This is based on an observation of the chat window in Skype 4.0.0.226.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Dear Programmer,&lt;br /&gt;&lt;br /&gt;I discovered a bug today. I'll tell you how I found it. It's pretty easy to reproduce. There's this input field in our program. I didn't know what the intended limit was. It was documented somewhere, but that part of the spec got deleted when the CM system went down last week. I could have asked you, but you were downstairs getting another latte.&lt;br /&gt;&lt;br /&gt;Plus, it's really quick and easy to find out empirically; quicker than looking it up, quicker than asking you, even if you were here. There's this tool called PerlClip that allows me to create strings that look like this&lt;br /&gt;&lt;br /&gt;*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*...&lt;br /&gt;&lt;br /&gt;As you'll notice, the string itself tells you about its own length. 
The number to the left of each asterisk tells you the offset position of that asterisk in the string. (You can use whatever character you like for a delimiter, including letters and numbers, so that you can test fields that filter unwanted characters.)&lt;br /&gt;&lt;br /&gt;It takes a handful of keystrokes to generate a string of tremendous length, millions of characters. The tool automatically copies it to the Windows clipboard, whereupon you can paste it into an input field. Right away, you get to see the apparent limit of the field; find an asterisk, and you can figure out in a moment exactly how many characters it accepts. It makes it easy to produce all kinds of strings using Perl syntax, which saves you having to write a line of Perl script to do it and another few lines to get it into the clipboard. In fact, you can give PerlClip to a less-experienced tester that doesn't know Perl syntax at all (yet), show them a few examples and the online help, and they can get plenty of bang for the buck. They get to learn something about Perl, too. This little tool is like a keychain version of a Swiss Army knife for data generation. It's dead handy for analyzing input constraints.  It allows you to create all kinds of cool patterns, or data that describes itself, and you can store the output wherever you can paste from the clipboard. Oh, and it's free.&lt;br /&gt;&lt;br /&gt;You can get a copy of PerlClip &lt;a href="http://www.satisfice.com/tools.shtml"&gt;here&lt;/a&gt;, by the way. It was written by &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; and &lt;a href="http://www.tejasconsulting.com/"&gt;Danny Faught&lt;/a&gt;. The idea started with a Perl one-liner by Danny, and they built on each other's ideas for it.  I don't think it took them very long to write it. Once you've had the idea, it's a pretty trivial program to implement. 
But still, kind of a cool idea, don't you think?&lt;br /&gt;&lt;br /&gt;So anyway, I created a string a million characters long, and I pasted it into the chat window input field. I saw that the input field apparently accepted 32768 characters before it truncated the rest of the input. So I guess your limit is 32768 characters.&lt;br /&gt;&lt;br /&gt;Then I pressed "Send", and the text appeared in the output field. Well, not all of it. I saw the first 29996 characters, and then two periods, and then nothing else.  The rest of the text had vanished.&lt;br /&gt;&lt;br /&gt;That's weird.  It doesn't seem like a big deal, does it?  Yet there's this thing called &lt;span style="font-style: italic;"&gt;representativeness bias&lt;/span&gt;. It's a critical thinking error, the phenomenon that causes us to believe that a big problem always looks big from every angle, and that an observation of a problem with little manifestations always has little consequences.&lt;br /&gt;&lt;br /&gt;Our biases are influenced by our world views. For example, last week when that tester found that crash in that critical routine, everyone else panicked, but you realized that it was only a one-byte fix and we were back in business within a few minutes. It also goes the other way, though: something that looks trivial or harmless can have dire and shocking consequences, made all the more risky because of the trivial nature of the symptom.  If we think symptoms and problems and fixes are all alike in terms of significance, when we see a trivial &lt;i&gt;symptom&lt;/i&gt;,  no one bothers to investigate the &lt;i&gt;problem&lt;/i&gt;. &lt;i&gt;It's only a little rounding error, and it only happens on one transaction in ten, and it only costs half a cent at most.&lt;/i&gt; When that rounding error is multiplied over hundreds of transactions a minute, tens of thousands an hour... well you get the point.&lt;br /&gt;&lt;br /&gt;I'm well aware that, as a test, this is a toy. 
It's like a security check where you rattle the doorknob. It's like testing a car by kicking the tires. And the result that I'm seeing is like the doorknob falling off, or the door opening, or a tire suddenly hissing. For a tester, this is a mere bagatelle. It's a &lt;i&gt;trivial&lt;/i&gt; test.  Yet when a trivial test reveals something that we can't explain immediately, it might be a good idea to seek an explanation.&lt;br /&gt;&lt;br /&gt;A few things occurred to me as possibilities.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The first one is that someone, somewhere, is missing some kind of internal check in the code. Maybe it's you; maybe it's the guy who wrote the parser downstream, maybe it's the guy that's writing the display engine. But it seems to me as though you figured that &lt;i&gt;you&lt;/i&gt; could send 32768 bytes, &lt;i&gt;someone else&lt;/i&gt; has a limit of 29998 bytes. Or 29996, probably. Well, maybe.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe one of you isn't aware of the published limits of the third-party toolkits you're using. That wouldn't be the first time.  It wouldn't necessarily be negligence on your part, either—the docs for those toolkits are terrible, I know.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe the published limit is available, but there's simply a &lt;i&gt;bug&lt;/i&gt; in one of those toolkits. In that case, maybe there isn't a big problem here, but there's a much bigger problem that the toolkit causes elsewhere in the code.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you're not using third-party toolkits. Maybe they're toolkits that we developed here. Mind you, that's exactly the same as the last problem; if you're not aware of the limits, or if there's a bug, &lt;i&gt;who produced&lt;/i&gt; the code has no bearing on the &lt;i&gt;behaviour&lt;/i&gt; of the code.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you're not using toolkits at all, for any given function. 
Mind you, that doesn't change the nature of the problems above either.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe some downstream guy is truncating everything over 29996 bytes, placing those two dots at the end, and ignoring everything else, and he's not sending a return value to you to let you know that he's doing it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe he &lt;i&gt;is&lt;/i&gt; sending you a return value, but the wrong one.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe he's sending you a return value, and you're ignoring it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe he's sending you a return value, and you are paying attention to it, but there's some confusion about what it means and how it should be handled.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you're truncating the last two and a half kilobytes or so of data before you send it on, and we're not telling the user about it. Maybe that's your intention.  Seems a little rude to me to do that, but to you, it works as designed. To some user, it &lt;i&gt;doesn't&lt;/i&gt; work—as designed.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe there's no one else involved, and it's just you working on all those bits of the code, but the program has now become sufficiently complex that you're unable to keep everything in your head. That stands to reason; it &lt;i&gt;is&lt;/i&gt; a complicated program, with lots of bits and pieces.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you're depending on unit tests to tell you if anything is wrong with the individual functions or objects. But maybe nothing &lt;i&gt;is&lt;/i&gt; wrong with any particular one of them in isolation; maybe it's the interaction between them that's problematic.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you don't have any unit tests &lt;i&gt;at all&lt;/i&gt;. &lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe you &lt;i&gt;do&lt;/i&gt; have unit tests for this stuff.  From right here, I can't tell. 
If you do have them, I can't tell whether your checks are really great and you just missed one this time, or if you missed a few, or if you missed a bunch of them, or whether there's a ton of them and they're all really lousy.&lt;/li&gt;&lt;/ul&gt;Any of the above explanations could be in play, many of them simultaneously. No matter what, though, all your unit tests could pass, and you'd never know about the problem until we took out all the mocks and hooked everything up in the real system. Or deployed into the field.  (Actually, by now they're not &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-should-we-call-test.html"&gt;unit tests&lt;/a&gt;; they're just unit checks, since it's a while since this part of the code was last looked at and we've been seeing green bars for the last few months.)&lt;br /&gt;&lt;br /&gt;For any one of the cases above, since it's so easy to test and check for these things, I would think that if you or anyone else knew about this problem, your sense of professionalism and craftsmanship would tell you to do some testing, write some checks, and fix it. After all, as Uncle Bob Martin said, you guys don't want us to find &lt;i&gt;any&lt;/i&gt; bugs, right?&lt;br /&gt;&lt;br /&gt;But it's not my place to say that.  All that stuff is up to you.  I don't tell you how to do your work; I tell you what I observe, in this case entirely from the outside.  Plus it's only one test. I'll have to do a few more tests to find out if there's a more general problem. Maybe this is an aberration.&lt;br /&gt;&lt;br /&gt;Now, I know you're fond of saying, "No user would ever do that."  I think what you really mean is no user &lt;span style="font-style: italic;"&gt;that you've thought of&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;that you like&lt;/span&gt;, would do that &lt;span style="font-style: italic;"&gt;on purpose&lt;/span&gt;.  
But it might be a thought to consider users that you haven't thought of, however unlikely they and their task might be to you.  It could be a good idea to think of users that neither one of us like, such as hackers or identity thieves.  It could also be important to think of users that you &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; like who would do things &lt;span style="font-style: italic;"&gt;by accident&lt;/span&gt;.  People make mistakes all the time.  In fact, by accident, I pasted the text of this message into another program, just a second ago.&lt;br /&gt;&lt;br /&gt;So far, I've only talked about the source of the problem and the trigger for it. I haven't talked much about possible consequences, or risks. Let's consider some of those.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A customer could lose up to 2770 bytes of data. That actually sounds like a low-risk thing, to me. It seems pretty unlikely that someone would type or paste that much data in any kind of routine way.  Still, I did hear from one person that they like to paste stack traces into a chat window.  You responded rather dismissively to that.  It does sound like a corner case.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Maybe you don't report truncated data as a matter of course, and there are tons of other problems like this in the code, in places that I'm not yet aware of or that are invisible from the black box.  Not this problem, but a problem with the same kind of cause could lead to a much more serious problem than this unlikely scenario.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe there is a consistent pattern of user interface problems where the internals of the code handle problems but don't alert the user, even though the user might like to know about them.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe there's a buffer overrun. That worries me more—a lot more—than the stack trace thing above.  
You remember that this kind of problem used to be dismissed as a "corner case" back when we worked at Microsoft—and then how Microsoft shut down new product development and spent two months investigating these kinds of problems, back in the spring of 2002?  Hundreds of worms and viruses and denial of service attacks stem from problems whose outward manifestation looked exactly as trivial as this problem.  There are variations on it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe there's a buffer overrun that would allow other users to view a conversation that my contact and I would like to keep between ourselves.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe an appropriately crafted string could allow hackers to get at some of my account information.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe an appropriately crafted string could allow hackers to get at &lt;i&gt;everyone&lt;/i&gt;'s account information.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe there's a vulnerability that allows access to system files, as the Blaster worm did.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Maybe the product is now unstable, and there's a crash about to happen that hasn't yet manifested itself.  We never know for sure if a test is finished.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Here's something that I think is more troubling, and perhaps the biggest risk of all.  Maybe, by blowing off this report, you'll discourage testers from reporting a similarly trivial symptom of a much more serious problem.  In a meeting a couple of weeks ago, the last time a tester reported something like this, you castigated her in public for the apparently trivial nature of the problem.  She was embarrassed and intimidated.  These days she doesn't report anything except symptoms that she thinks you'll consider sufficiently dramatic.  In fact, just yesterday she saw something that she thought to be a pretty serious performance issue, but she's keeping mum about it.  
Some time several weeks from now, when we start to do thousands or millions of transactions, you may find yourself wishing that she had felt okay about speaking up today.  Or who knows; maybe you'll just ask her why she didn't find that bug.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;NASA calls this last problem "the normalization of deviance".  In fact, this tiny little inconsistency reminds me of the Challenger problem. Remember that? There were these O-rings that were supposed to keep two chambers of highly-pressurized gases separate from each other. It turns out that on seven of the shuttle flights that preceded the Challenger, these O-rings burned through a bit and some gases leaked (they called this "erosion" and "blow-by"). Various managers managed to convince themselves that it wasn't a problem, because it only happened on about a third of the flights, and the rings, at most, only burned a third of the way through.  Because these "little" problems didn't result in catastrophe the first seven times, NASA managers used this as evidence for safety.  Every successful flight that had the problem was taken as reassurance that NASA could get away with it.  In that sense, it was like Nassim Nicholas Taleb's turkey, who increases his belief in the benevolence of the farmer every day... until some time in the week before Thanksgiving.&lt;br /&gt;&lt;br /&gt;Richard Feynman, in his &lt;i&gt;Appendix to the Rogers Commission Report on the Space Shuttle Challenger Accident&lt;/i&gt;, nailed the issue:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;The phenomenon of accepting for flight, seals that had shown erosion and blow-by in previous flights, is very clear. The Challenger flight is an excellent example. There are several references to flights that had gone before. The acceptance and success of these flights is taken as evidence of safety. But erosion and blow-by are not what the design expected. They are warnings that something is wrong. 
The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way. The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. &lt;b&gt;When playing Russian roulette the fact that the first shot got off safely is little comfort for the next.&lt;/b&gt;&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;That's the problem with any evidence of any bug, at first observation; we only know about a &lt;i&gt;symptom&lt;/i&gt;, not the &lt;span style="font-style: italic;"&gt;cause&lt;/span&gt;, and not the &lt;i&gt;consequences&lt;/i&gt;. When the system is in an unpredicted state, it's in an &lt;i&gt;unpredictable&lt;/i&gt; state.&lt;br /&gt;&lt;br /&gt;Software is wonderfully deterministic, in that it does exactly what we tell it to do. But, as you know, there's sometimes a big difference between what we tell it to do and what we &lt;i&gt;meant&lt;/i&gt; to tell it to do. When software does what we tell it to do instead of what we meant, we find ourselves off the map that we drew for ourselves. And once we're off the map, we don't know where we are.&lt;br /&gt;&lt;br /&gt;According to Wikipedia, &lt;cite&gt;Feynman's investigations also revealed that there had been many serious doubts raised about the O-ring seals by engineers at Morton Thiokol, which made the solid fuel boosters, but communication failures had led to their concerns being ignored by NASA management. 
He found similar failures in procedure in many other areas at NASA, but singled out its software development for praise due to its rigorous and highly effective quality control procedures - then under threat from NASA management, which wished to reduce testing to save money given that the tests had always been passed.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;At NASA, back then, the software people realized that just because their checks were passing, it didn't mean that they should relax their diligence. They realized that what really reduced risk on the project was appropriate testing, lots of tests, and paying attention to seemingly inconsequential failures.&lt;br /&gt;&lt;br /&gt;I know we're not sending people to the moon here.  Even though we don't know the consequences of this inconsistency, it's hard to conceive of anyone dying because of it.  So let's make it clear:  I'm not saying that the sky is falling, and I'm not making a value judgment as to whether we should fix it.  That stuff is for you and the project managers to decide upon.  It's simply my role to observe it and report it.&lt;br /&gt;&lt;br /&gt;I think it might be important, though, for us to understand &lt;span style="font-style: italic;"&gt;why&lt;/span&gt; the problem is there in the first place.  That's because I don't know whether the problem that I'm seeing is a big deal.  And the thing is, until you've looked at the code, &lt;b&gt;&lt;i&gt;neither do you&lt;/i&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;As always, it's your call.  And as usual, I'm happy to assist you in running whatever tests you'd like me to run on your behalf.  I'll also poke around and see if I can find any other surprises.&lt;br /&gt;&lt;br /&gt;Your friend,&lt;br /&gt;&lt;br /&gt;The Tester&lt;br /&gt;&lt;br /&gt;&lt;i&gt;P.S. I &lt;b&gt;did&lt;/b&gt; run a second test. This time, I used PerlClip to craft a string of 100000 instances of :).  That pair of characters, in normal circumstances, results in a smiley-face emoticon.  
It seemed as though the input field accepted the characters literally, and then converted them to the graphical smiley face.  It took a long, long time for the input field to render this.  I thought that my chat window had crashed, but it hadn't.  Eventually it finished processing, and displayed what it had parsed from this odd input.  I didn't see 32768 smileys, nor 29996, nor 16384, nor 14998.  I saw exactly two dots.  Weird, huh?&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-6162092213541034058?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/letter-to-programmer.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>6</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-3902904352272988773</guid><pubDate>Mon, 28 Sep 2009 18:42:00 +0000</pubDate><atom:updated>2009-09-29T00:47:02.660-05:00</atom:updated><title>Should We Call Test-Driven Development Something Else?</title><description>In &lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;the first post in this series&lt;/a&gt;, I proposed "that those things that we usually call 'unit &lt;span style="font-weight: bold;"&gt;tests&lt;/span&gt;' be called 'unit &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;'."  I stand by the proposal, but I should clarify something important about it.  See, it's all a matter of timing.  
And, of course, &lt;a href="http://www.satisfice.com/blog/archives/99"&gt;&lt;span style="font-style: italic;"&gt;sapience&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;After &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt;'s blog post titled "&lt;a href="http://www.satisfice.com/blog/archives/358"&gt;Sapience and Blowing People's Minds&lt;/a&gt;", &lt;a href="http://www.jbrains.ca/"&gt;Joe Rainsberger&lt;/a&gt; commented:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;Sadly, the distinction between testing and checking makes describing test-driven development (TDD) somewhat awkward, because it’s a test when I write it and a check after I run it for the first time.  Am I test-driving or check-driving?&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;Joe has put his finger on something that's important:  that in &lt;a href="http://www.amazon.com/Mangle-Practice-Time-Agency-Science/dp/0226668037"&gt;the mangle of practice&lt;/a&gt;, things are constantly changing, and so are our perspectives on them.&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;The Elements of Testing and Checking&lt;/a&gt;, I broke down the process of developing, performing, and analyzing a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;.  The most important thing to note is that the &lt;span style="font-weight: bold;"&gt;check &lt;/span&gt;(an observation linked to a decision rule) can be performed non-sapiently, but that &lt;i&gt;everything surrounding it&lt;/i&gt;—the development and analysis of the &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;—&lt;i&gt;is&lt;/i&gt; &lt;a href="http://www.satisfice.com/blog/archives/99"&gt;sapient&lt;/a&gt;, and is &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;.  Test-driven development is first and foremost &lt;i&gt;development&lt;/i&gt;, and development is a sapient process.  
The interactive process of developing a &lt;span style="font-weight: bold;"&gt;check &lt;/span&gt;and analyzing its outcome is a sapient process; the development cycle includes having an idea, &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;it and responding to the information revealed by the &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt; (the whole process), even when the result is supplied by a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt; (an atomic part of the &lt;span style="font-weight: bold;"&gt;test &lt;/span&gt;process). TDD is an exploratory, heuristic process.  You don't know in advance what your solution is going to look like; you explore the problem space and build your solution iteratively, and you stop when you decide to stop.&lt;br /&gt;&lt;br /&gt;Several years ago, James and Jon Bach produced a set of exploratory skills, tactics, and dynamics:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Modeling&lt;/li&gt;&lt;li&gt;Resourcing&lt;/li&gt;&lt;li&gt;Chartering&lt;/li&gt;&lt;li&gt;Observing&lt;/li&gt;&lt;li&gt;Manipulating&lt;/li&gt;&lt;li&gt;Pairing (now called Collaborating)&lt;/li&gt;&lt;li&gt;Generation and Elaboration&lt;/li&gt;&lt;li&gt;Overproduction and Abandonment&lt;/li&gt;&lt;li&gt;Abandonment and Recovery&lt;/li&gt;&lt;li&gt;Refocusing (Focusing and Defocusing)&lt;/li&gt;&lt;li&gt;Alternating&lt;/li&gt;&lt;li&gt;Branching and Backtracking&lt;/li&gt;&lt;li&gt;Conjecturing&lt;/li&gt;&lt;li&gt;Recording&lt;/li&gt;&lt;li&gt;Reporting&lt;/li&gt;&lt;/ul&gt;(I believe that several other people have made contributions to the original list, including &lt;a href="http://www.kohl.ca/"&gt;Jonathan Kohl&lt;/a&gt; and &lt;a href="http://www.michaeldkelly.com/"&gt;Mike Kelly&lt;/a&gt;. 
I'd also include &lt;span style="font-style: italic;"&gt;tooling&lt;/span&gt;—building tools, rather than merely obtaining or resourcing them, and &lt;span style="font-style: italic;"&gt;orienteering&lt;/span&gt;—figuring out where you are in relation to where you want to be.  I think James disagrees. That's okay; good colleagues do that. The cool thing about such lists is that they can evolve as we think and learn more, and disagreeing helps us to figure out what's important eventually.  Maybe I'll drop them, or maybe James will adopt them.)&lt;br /&gt;&lt;br /&gt;The point is that these exploratory skills, tactics and dynamics apply not only to &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;, but to practically any open-ended heuristic process.  Note how TDD, done well, incorporates practically all of the stuff from James and Jon's original list, which was focused on &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;So the answer to the question in the title of this blog is this:  No; there's no need to rename TDD.  It really is &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt;-driven development.&lt;br /&gt;&lt;br /&gt;As James replied to Joe,&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;Strictly speaking you are "doing &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;" by "writing &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;", but not actually "writing &lt;span style="font-weight: bold;"&gt;tests&lt;/span&gt;." If you run the &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;unattended and accept the green bar as is, then that is not &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;. It requires absolutely no &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;skill to do that, just as you wouldn't say someone is doing programming just because they invoke a compiler and verify that the compile was successful. 
If the bar is NOT green, the process of investigating is &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;, as well as debugging, programming, etc.&lt;br /&gt;&lt;br /&gt;If you watch the &lt;span style="font-weight: bold;"&gt;tests &lt;/span&gt;(James means &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;here, I think --MB) run or ponder the deeper meaning of the green bar, you are doing &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;along with the &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Think of "&lt;span style="font-weight: bold;"&gt;test&lt;/span&gt;" as a verb rather than a noun, and it becomes clear that &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt;-driven design is truly &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt;-driven design, although the &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;is rather simplistic, based primarily on those little &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;. Once the design is done the automated &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;become useful as change detectors against the risk of regression. They certainly aid the &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;process, despite not being &lt;span style="font-weight: bold;"&gt;tests&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Checks &lt;/span&gt;definitely do NOT drive development. Development is never a rote and non-sapient process. 
It's far better to say &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt;-driven, because the design of the &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;is a thoughtful process.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;So what of the earlier business about calling unit &lt;span style="font-weight: bold;"&gt;tests &lt;/span&gt;"unit &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;"?&lt;br /&gt;&lt;br /&gt;For me, the distinction lies in the artifact—that xUnit thingy, or that rSpec assertion—and the way that you approach it.  A minor gloss on Joe's comment:  the thingy might &lt;i&gt;not&lt;/i&gt; be a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt; after you run it the first time, especially if it &lt;i&gt;doesn't&lt;/i&gt; pass.  At that point, it is still very much part of your conscious interaction with the business of creating working code; it's &lt;a href="http://en.wikipedia.org/wiki/Figure_and_ground_%28media%29"&gt;&lt;span style="font-style: italic;"&gt;figure&lt;/span&gt;&lt;/a&gt;, rather than &lt;span style="font-style: italic;"&gt;ground&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;After you've solved the problem that your unit of code is intended to solve, the thingy's prominence fades from figure into &lt;a href="http://en.wikipedia.org/wiki/Figure_and_ground_%28media%29"&gt;ground&lt;/a&gt;.  You're no longer really paying much attention to it. There's no design activity going on with respect to it, it gets performed automatically and non-sapiently, and its result gets ignored, especially when the result is positive and aggregated with dozens, hundreds, or thousands of other positive results.  At that point, it's no longer shedding any particular cognitive light on what you're doing, and  its &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;power has faded into a single pixel in a pale green glow.  
It's now a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;, no longer a &lt;span style="font-weight: bold;"&gt;test &lt;/span&gt;but a change detector. In fact, you might think of "&lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;" as an abbreviation for "&lt;span style="font-weight: bold;"&gt;ch&lt;/span&gt;ange det&lt;span style="font-weight: bold;"&gt;ec&lt;/span&gt;tor".  The change from a &lt;span style="font-weight: bold;"&gt;test &lt;/span&gt;to a &lt;span style="font-weight: bold;"&gt;check &lt;/span&gt;is a kind of reverse metamorphosis, as though an intriguing, fluttering butterfly has turned into a not-very-interesting, ponderous little green caterpillar.  That's not to say that it's not an important part of the Great Chain of Being; just that we tend not to pay much attention to it.  However, we might pay more attention to the caterpillar when it's &lt;span style="font-style: italic;"&gt;red&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;As I've said repeatedly, what you call them is less important than how you think of them.   As James says,&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;I wouldn't insist that people change their ordinary language. I see no problem calling whales "fish" or spiders "insects" in everyday life. But sometimes it matters to be precise. We should be ABLE to make strict distinctions when trying to help ourselves, or others, master our craft.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;At Agile 2009, Joe pointed out that if we can produce more code with fewer errors in it, we can get our products to real testing, and then to market more quickly.  And that means that we can get paid sooner.  So I agree with Joe here, too:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;I have to admit I like the pun of Check-Driven Development, even if it only works in American English.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The Testing vs. Checking Series&lt;/h2&gt;1. &lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;Testing vs. 
Checking&lt;/a&gt;&lt;br /&gt;2. &lt;a href="http://www.developsense.com/2009/09/transpection-and-three-elements-of.html"&gt;Transpection and the Three Elements of Checking&lt;/a&gt;&lt;br /&gt;3. &lt;a href="http://www.developsense.com/2009/09/pass-vs-fail-vs-is-there-problem-here.html"&gt;Pass vs. Fail vs. Is There a Problem Here?&lt;/a&gt;&lt;br /&gt;4. &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;Elements of Testing and Checking&lt;/a&gt;&lt;br /&gt;5. &lt;a href="http://www.developsense.com/2009/09/testing-checking-and-changing-language.html"&gt;Testing, Checking, and Changing the Language&lt;/a&gt;&lt;br /&gt;6. &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-motive-for.html"&gt;Tests vs. Checks:  The Motive for Distinguishing&lt;/a&gt;&lt;br /&gt;7. &lt;a href="http://www.developsense.com/2009/09/tester-asks-about-checking.html"&gt;A Tester Asks About Checking&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Related:  &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; on &lt;a href="http://www.satisfice.com/blog/archives/358"&gt;Sapience and Blowing People's Minds&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-3902904352272988773?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/tests-vs-checks-should-we-call-test.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-8306612885372468182</guid><pubDate>Mon, 21 Sep 2009 14:22:00 +0000</pubDate><atom:updated>2009-09-28T14:33:01.376-05:00</atom:updated><title>A Tester Asks About Checking</title><description>In a previous comment, Sunjeet asks&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Does not &lt;span 
style="font-weight: bold;"&gt;testing &lt;/span&gt;encompass &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;?  Can &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;alone be efficient without doing any &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;As I hope I made it clear in &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;Elements of Testing and Checking&lt;/a&gt;, the development and analysis of &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;is surrounded by plenty of &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;activity, and &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;may include a good deal of &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;. &lt;span style="font-weight: bold;"&gt; Testing&lt;/span&gt;, I think, can be vastly more efficient if we consider the ways in which &lt;span style="font-weight: bold;"&gt;checking &lt;/span&gt;can be helpful.  
Cem Kaner, in his 2004 paper &lt;span style="font-style: italic;"&gt;The Ongoing Revolution in Software Testing&lt;/span&gt;, said this:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;I think the recent advances in unit testing (Astels, 2003; Rainsberger, 2004) have been the most exciting progress that I've seen in testing in the last 10 years.&lt;br /&gt;&lt;br /&gt;With programmer-created and programmer-maintained change detectors (&lt;span font="normal"&gt;that is, &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;  -MB&lt;/span&gt;):&lt;br /&gt;• There is a near-zero feedback delay between the discovery of a problem and awareness of the programmer who created the problem&lt;br /&gt;• There is a near-zero communication cost between the discover of the problem and the programmer who will either fix the bug or fix the [&lt;b&gt;check&lt;/b&gt;]&lt;br /&gt;• The [&lt;b&gt;check&lt;/b&gt;] is tightly tied to the underlying coding problem, making troubleshooting much cheaper and easier than system-level &lt;b&gt;testing&lt;/b&gt;.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;And I agree.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Should testers shun &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;?  
Why not call &lt;span style="font-weight: bold;"&gt;checking &lt;/span&gt;as "confirmative &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;"?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;There is a role for &lt;span style="font-weight: bold;"&gt;testers &lt;/span&gt;to program and perform &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;where cost is low and value is high, but I think that if practices associated with XP really begin to take hold, in the long run &lt;span style="font-style: italic;"&gt;it behooves &lt;span style="font-weight: bold;"&gt;testers &lt;/span&gt;to get out of the &lt;span style="font-weight: bold;"&gt;checking &lt;/span&gt;business.&lt;/span&gt;  That's because the vast bulk of the &lt;span style="font-weight: bold;"&gt;checking &lt;/span&gt;work will be done by programmers; because &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;at the system level tend to be time-consuming, error-prone, and expensive when performed by humans (and expensive to automate when not performed by humans); because they drive humans to inattentional blindness.&lt;br /&gt;&lt;br /&gt;But there's another reason:  "confirmative testing" &lt;i&gt;isn't really &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;&lt;/i&gt;; it's &lt;i&gt;confirming&lt;/i&gt;.  It's looking at a white swan and saying, "All swans are white;"  at another and saying, "See?  All swans are white;" and at yet another and saying "Just like I told you; all swans are white."  There's an analogy in software, "It works on my machine."  "See?  It works on my machine."  "Just like I told you; it works on my machine."  
To find problems in a product, which is one of the key goals of &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;, we need to get out of the confirmatory mindset.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;I confirm that the exact problem is fixed - by exactly executing the steps mentioned in the bug report - by this I confirm that the bug and only that bug is fixed or not.  Brainless?  Yes... a machine could have done the same... yes BUT is this required ...YES we might not be deriving new quality value from it but we are CONFIRMING existing quality info from it...&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Let me suggest an alternative way of looking at this.&lt;br /&gt;&lt;br /&gt;In an environment where the bug report is vague &lt;i&gt;or&lt;/i&gt; your programmers are known or suspected of being unreliable with their bug fixes &lt;i&gt;or&lt;/i&gt; you know that the programmer has not created an automated &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;, then what you're describing might indeed be a very good idea.  (When &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;the fix, I might start by trying to reproduce the problem as exactly as I could, but I might also start with a slight variation on the problem to see if the general case has been fixed.  Either way, I'll likely end up performing a &lt;span style="font-weight: bold;"&gt;check &lt;/span&gt;to see if the special case of the problem has been fixed.)&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Task 2 i do is ...i look out for side effects ...regression...new &lt;span style="font-weight: bold;"&gt;test &lt;/span&gt;ideas ...execute more tests etc i.e. 
all the pillars of ET...&lt;/i&gt; &lt;span style="font-style: italic;"&gt;How is task 2 useful without 1?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you're following exactly the steps mentioned in the bug report &lt;i&gt;and&lt;/i&gt; the programmer has fixed the problem &lt;i&gt;and&lt;/i&gt; the programmer has already set up an automated &lt;span style="font-weight: bold;"&gt;check &lt;/span&gt;for the problem, then what you're doing is reproducing the programmer's effort (and the machine's effort) in performing a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt;, when it might be much more valuable to use your sapient skills and &lt;i style="font-weight: bold;"&gt;test&lt;/i&gt;.  Note that I &lt;span style="font-style: italic;"&gt;cannot&lt;/span&gt; decide either way.  I'm not in your context or your immediate situation.  But &lt;i&gt;you&lt;/i&gt; can.  To me, there's at least one clear circumstance in which it would be greatly more valuable for you to focus on Task 2:  &lt;i&gt;When someone has already done Task 1&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;2. Should i tell my lead/manager...buddy I am just a tester find a checker to do this! or get a machine to do this? if a machine needs to do this who is going to code/script a machine to do this ...wont that be a tester himself?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;In answer to the first question, let me ask this:  &lt;i&gt;Are you a &lt;span style="font-weight: bold;"&gt;tester&lt;/span&gt; or a &lt;span style="font-weight: bold;"&gt;checker&lt;/span&gt;?&lt;/i&gt;  Again, I don't know.  The quality of the questions you're asking suggests to me that you're a &lt;span style="font-weight: bold;"&gt;tester&lt;/span&gt;; that is, you're not accepting what I'm saying blindly, nor are you rejecting it out of hand.   
You're thinking critically about the idea, even if I may have &lt;a href="http://www.satisfice.com/blog/?p=358"&gt;blown your mind&lt;/a&gt; at first.&lt;br /&gt;&lt;br /&gt;Either way, I wouldn't advise telling your lead or your manager that you're refusing to do work.  But thinking in terms of &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;vs. &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;, if you so choose, might trigger a productive conversation between you about the relative cost and value of the activities that you (and the programmers) are doing.  It might indeed make more sense to get a machine to do the &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt; work—and for the programmers to insert some change detectors and take greater responsibility for the quality of their code, an idea that was one of the triggers for Extreme Programming and the Agile movement.&lt;br /&gt;&lt;br /&gt;As for who does the programming, note the passage from the very posting upon which you commented:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;"When someone asks, 'Can't we hire pretty much any programmer to write our &lt;span style="font-weight: bold;"&gt;test&lt;/span&gt; automation code?', we can point out that the quality of &lt;span style="font-weight: bold;"&gt;checking &lt;/span&gt;is conditioned largely by the quality of the &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;work that surrounds it, and emphasize that creating excellent &lt;span style="font-weight: bold;"&gt;checks &lt;/span&gt;requires excellent &lt;span style="font-weight: bold;"&gt;testing &lt;/span&gt;skill, in addition to programming skill."&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Thank you for your comments and questions.&lt;br /&gt;&lt;h2&gt;The Testing vs. Checking Series&lt;/h2&gt;1. &lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;Testing vs. Checking&lt;/a&gt;&lt;br /&gt;2. 
&lt;a href="http://www.developsense.com/2009/09/transpection-and-three-elements-of.html"&gt;Transpection and the Three Elements of Checking&lt;/a&gt;&lt;br /&gt;3. &lt;a href="http://www.developsense.com/2009/09/pass-vs-fail-vs-is-there-problem-here.html"&gt;Pass vs. Fail vs. Is There a Problem Here?&lt;/a&gt;&lt;br /&gt;4. &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;Elements of Testing and Checking&lt;/a&gt;&lt;br /&gt;5. &lt;a href="http://www.developsense.com/2009/09/testing-checking-and-changing-language.html"&gt;Testing, Checking, and Changing the Language&lt;/a&gt;&lt;br /&gt;6. &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-motive-for.html"&gt;Tests vs. Checks:  The Motive for Distinguishing&lt;/a&gt;&lt;br /&gt;7. &lt;a href="http://www.developsense.com/2009/09/tester-asks-about-checking.html"&gt;A Tester Asks About Checking&lt;/a&gt;&lt;br /&gt;8. &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-should-we-call-test.html"&gt;Tests vs. 
Checks:  Should We Call Test-Driven Development Something Else?&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Related:  &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; on &lt;a href="http://www.satisfice.com/blog/archives/358"&gt;Sapience and Blowing People's Minds&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-8306612885372468182?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/tester-asks-about-checking.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-5539288339784430154</guid><pubDate>Fri, 18 Sep 2009 14:55:00 +0000</pubDate><atom:updated>2009-09-28T14:35:00.579-05:00</atom:updated><title>Tests vs. Checks:  The Motive for Distinguishing</title><description>The word "criticism" has several meanings and connotations.  To criticize, these days, often means to speak reproachfully of someone or something, but criticism isn't always disparaging.  Way, way back when, I studied English literature, and read the work of many critics.  Literary critics and film critics aren't people who merely criticize, as we use the word in common parlance.  Instead, the role of the critic is to &lt;i&gt;contextualize&lt;/i&gt;—to observe and evaluate things, to shine light on some work so as to help people understand it better.&lt;br /&gt;&lt;br /&gt;So when I say that Dale Emery is a critic, that's a compliment.  On the subject of &lt;b&gt;testing&lt;/b&gt; vs. &lt;b&gt;checking&lt;/b&gt;, Dale recently remarked to me, "I think I understand the distinction. I don't yet understand what problem you're trying to solve with your specific choice of terminology.  Not the implications, but the problem."  
That's an excellent critical statement, in that Dale is not disparaging, but he's trying to tell me something that I need to recognize and deal with.&lt;br /&gt;&lt;br /&gt;My answer is that sometimes having different vocabulary allows us to recognize a problem and its solution more easily.  As Jerry Weinberg says, "A rose by any other name should smell as sweet, yet nobody can seriously doubt that we are often fooled by the names of things."  (&lt;i&gt;An Introduction to General Systems Thinking&lt;/i&gt;, p. 74).  He also says "If we have limited memories, &lt;i&gt;decomposing&lt;/i&gt; a system into noninteracting parts may enable us to predict behavior better than we could without the decomposition.  This is the method of science, which would not be necessary were it not for our limited brains." (&lt;i&gt;ibid&lt;/i&gt;, p. 134).&lt;br /&gt;&lt;br /&gt;The problem I'm trying to address, then, is that the word &lt;b&gt;test&lt;/b&gt; lumps a large number of concepts into a single word, and &lt;b&gt;testing&lt;/b&gt; lumps a similarly large number of activities together.  As &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; &lt;a href="http://www.satisfice.com/blog/?p=358"&gt;suggests&lt;/a&gt;, compiling is part of the activity of programming, yet we don't mistake compiling for programming, nor do we mistake the compiler for the programmer.&lt;br /&gt;&lt;br /&gt;If we have a conceptual item called a &lt;b&gt;check&lt;/b&gt;, or an activity called &lt;b&gt;checking&lt;/b&gt;, I contend that we suddenly have a new observational state available to us, and new observations to be made.  That can help us to resolve differences in perception or opinion.  
It can help us to understand the process of &lt;b&gt;testing&lt;/b&gt; at a finer level of detail, so that we can make better decisions about strategy and tactics.&lt;br /&gt;&lt;br /&gt;In the Agile 2009 session, "&lt;a href="http://jamesshore.com/Calendar/2009-08-27c.html"&gt;Brittle and Slow:  Replacing End-To-End Testing&lt;/a&gt;", Arlo Belshee and James Shore took this as a point of departure:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;End-to-end tests appear everywhere: test-driven development, story-test-driven development, acceptance testing, functional testing, and system testing. They're also slow, brittle, and expensive.&lt;/cite&gt;&lt;br /&gt;&lt;br /&gt;This was confusing to me.  My colleague Fiona Charles specializes in end-to-end system testing for great big projects.  The teams that she leads are fast, compared to others that I've seen.  Their tests are painstaking and detailed, but they're flexible and adaptable, not brittle.&lt;br /&gt;&lt;br /&gt;During the session, one person (presumably a programmer, but maybe not) said, "Manual &lt;b&gt;testing&lt;/b&gt; sucks."  There was a loud murmur of agreement from both the testers and the programmers in the room.&lt;br /&gt;&lt;br /&gt;I thought that was strange too.  I &lt;i&gt;love&lt;/i&gt; manual &lt;b&gt;testing&lt;/b&gt;. I like operating the product interactively and making observations and evaluations.  I like pretending that I'm a user of the program, with some task to accomplish or some problem to solve.  I like looking at the program from a more analytical perspective, too—thinking about how all the components of the product interact with one another, and where the communication between them might be vulnerable if distorted or disturbed or interrupted.  I like playing with the data, trying to figure out the pathological cases where the program might hiccup or die on certain inputs.  In my interaction with the program, I discover lots of things that appear to be problems.  
Upon making such a discovery, I'm compelled to investigate it.  As I investigate it, sometimes I find that it's a real problem, and sometimes I find that it isn't.  In this process, I learn about the system, about the ways in which it can work and the ways in which it might fail. I learn about my preconceptions, which are sometimes right and sometimes wrong.   As I test, I recognize new risks, whereupon I realize new test ideas.  I act on those test ideas, often right away. (By the way, I'm trying to get out of the habit of calling this stuff manual testing; I'm learning to call it &lt;a href="http://www.satisfice.com/blog/archives/99"&gt;&lt;i&gt;sapient&lt;/i&gt; testing&lt;/a&gt;, because it's primarily the eyes and the brain, not the hands, that are doing the work.)  Whatever you call it, manual testing doesn't suck; it &lt;i&gt;rocks&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;So are the programmer in question and all the people who applauded &lt;i&gt;ignorant&lt;/i&gt;?  That seems unlikely.  They're smart people, and they know tons about software development.  Are they &lt;i&gt;wrong&lt;/i&gt;?  Well, that's a value judgment, but it would seem to me that as smart people who solve problems for a living, it would be very surprising if they weren't engaged by exploration and discovery and investigation and learning.  So there must be another explanation.&lt;br /&gt;&lt;br /&gt;Maybe when they're talking about manual &lt;b&gt;testing&lt;/b&gt;, they're talking about &lt;i&gt;something else&lt;/i&gt;.  Maybe they're talking about behaving like an automaton and precisely following a precisely described set of steps, the last of which is to compare some output of the program to a predicted, expected value.  For a thinking human, that process is slow, and it's tedious, and it doesn't really engage the brain.  
And in the end, almost all the time, all we get is exactly what we expected to get in the first place.&lt;br /&gt;&lt;br /&gt;So if &lt;i&gt;that's&lt;/i&gt; what they're talking about, I agree with them.  Therefore: if we're going to understand each other more clearly, it would help to make the distinction between some kinds of manual &lt;b&gt;testing&lt;/b&gt; and other kinds.  The thing that we don't like, that none of us like, apparently, is manual &lt;b&gt;checking&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;Maybe Arlo and James were talking about end-to-end system &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt; being brittle and slow.  Maybe it's integration &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;, rather than integration &lt;span style="font-weight: bold;"&gt;tests&lt;/span&gt;, that are a scam, as Joe (J.B.) Rainsberger puts it &lt;a href="http://www.jbrains.ca/permalink/239"&gt;here&lt;/a&gt;, &lt;a href="http://www.jbrains.ca/permalink/242"&gt;here&lt;/a&gt;, &lt;a href="http://www.jbrains.ca/permalink/251"&gt;here&lt;/a&gt;, and &lt;a href="http://www.jbrains.ca/permalink/278"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So having a handle for a particular concept may make it easier for us to make certain observations and to carry on certain conversations.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If we can differentiate between manual &lt;b&gt;testing&lt;/b&gt; and manual &lt;b&gt;checking&lt;/b&gt;, we might be more specific about &lt;i&gt;what&lt;/i&gt;, &lt;i&gt;specifically&lt;/i&gt;, sucks.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If we can comprehend the difference between automated &lt;b&gt;tests&lt;/b&gt; and automated &lt;b&gt;checks&lt;/b&gt;, we can understand the circumstances in which one might be more valuable than the other.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;If we tease out the elements of developing, performing, and evaluating a &lt;span style="font-weight: bold;"&gt;check&lt;/span&gt; (as I attempted to do &lt;a 
href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;here&lt;/a&gt;) we might better see specific opportunities for increasing value or reducing cost.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;If we can recognize when we're &lt;b&gt;checking&lt;/b&gt;, rather than &lt;b&gt;testing&lt;/b&gt;, we can better recognize the opportunity to hand the work over to a machine.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If we can recognize that we're spending inordinate amounts of time and money preparing scripts directing outsourced testers in other countries to &lt;b&gt;check&lt;/b&gt;, rather than &lt;b&gt;test&lt;/b&gt;, we can recognize a waste of energy, time, money, and human potential, because &lt;b&gt;testers&lt;/b&gt; are capable of so much more than merely &lt;b&gt;checking&lt;/b&gt;.  (We might also detect the odour of immorality in asking people in developing countries to behave like machines, and instead consider giving a modicum of freedom and responsibility to them so that they can learn things about the product—things in which we might be very interested.)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If we can recognize that &lt;b&gt;checking&lt;/b&gt; alone doesn't yield new information, we can better recognize the need to de-emphasize &lt;b&gt;checking&lt;/b&gt; and emphasize &lt;b&gt;testing&lt;/b&gt; when that's appropriate.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;If we can recognize when &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt; is pointing us to areas of the product that appear to be vulnerable to breakage, we might choose to emphasize inserting more and/or better &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt;, so as to draw our attention to breakage should it occur ("change detectors", as Cem Kaner calls them).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;If we can distinguish between &lt;b&gt;testing&lt;/b&gt; and &lt;b&gt;checking&lt;/b&gt;, we can penetrate "the illusion that software systems 
are simple enough to define all the &lt;span style="font-weight: bold;"&gt;checks&lt;/span&gt; before any code is written", as my colleague Ben Simo recently pointed out—never mind all the &lt;span style="font-weight: bold;"&gt;tests&lt;/span&gt;.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;When someone asks, "Why didn't &lt;b&gt;testing&lt;/b&gt; find that bug when we spent all that money on all those automation tools?", maybe we can point to the fact that the tools foster &lt;b&gt;checking&lt;/b&gt; far more than they foster &lt;b&gt;testing&lt;/b&gt;.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Maybe we can recognize that &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt; tends to be helpful in preventing bugs that we can anticipate, but not so helpful at finding problems that we didn't anticipate.  For that we need &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt;.  Or, alas, sometimes, accidental discovery.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Maybe we'd be able to recognize that &lt;span style="font-weight: bold;"&gt;testing&lt;/span&gt; (but not &lt;span style="font-weight: bold;"&gt;checking&lt;/span&gt;) can reveal information on novel ways of using the product, information that can add to the perceived value of the product.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;When someone asks, "Can't we hire pretty much any programmer to write our &lt;b&gt;test&lt;/b&gt; automation code?", we can point out that the quality of &lt;b&gt;checking&lt;/b&gt; is conditioned largely by the quality of the testing work that surrounds it, and emphasize that creating excellent &lt;b&gt;checks&lt;/b&gt; requires excellent &lt;b&gt;testing&lt;/b&gt; skill, in addition to programming skill.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If we're interested in improving the efficiency and capacity of the &lt;b&gt;test&lt;/b&gt; group, we can point out that &lt;b&gt;test&lt;/b&gt; automation is far more than just &lt;b&gt;check&lt;/b&gt; automation.  
&lt;b&gt;Test&lt;/b&gt; automation is, in James Bach's way of putting it, &lt;i&gt;any use of tools to support &lt;b&gt;testing&lt;/b&gt;&lt;/i&gt;.  &lt;b&gt;Testing&lt;/b&gt; tools help us to generate &lt;b&gt;test&lt;/b&gt; data; to probe the internals of an application or an operating system; to produce oracles that use a different algorithm to produce a comparable result; to produce macros that automate a long sequence of actions in the application so that the tester can be quickly delivered to a place to start exploring and testing; to rapidly configure or reconfigure the application; to parse, sort, and search log files; to produce &lt;a href="http://www.satisfice.com/blog/?p=33"&gt;blink oracles&lt;/a&gt; for &lt;a href="http://www.developsense.com/articles/2006-09-BlinkOrYoullMissIt.pdf"&gt;blink testing&lt;/a&gt;...&lt;/li&gt;&lt;br /&gt;&lt;li&gt;When a programmer says to a tester, "You should only test this stuff; here are the boundary conditions," the tester can respond "I will &lt;b&gt;check&lt;/b&gt; that stuff, but I'm also going to &lt;b&gt;test&lt;/b&gt; for boundary conditions that you might not have been aware of, or that you've forgotten to tell me about, and for other possible problems too."&lt;/li&gt;&lt;br /&gt;&lt;li&gt;When we see a test group that is entirely focused on confirming that a product conforms to some requirements document, rather than investigating to discover things that might threaten the value of the product to its users, we can point out that they may be &lt;b&gt;checking&lt;/b&gt;, but they're not &lt;b&gt;testing&lt;/b&gt;.&lt;/li&gt;&lt;/ul&gt;Here's a passage from Jerry Weinberg, one that I find inspiring and absolutely true:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;"One of the lessons to be learned ... is that the sheer number of tests performed is of little significance in itself.  Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers.  
As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable.  With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests.  To do this, we must struggle to develop a suspicious nature as well as a lively imagination."&lt;/cite&gt;  &lt;i&gt;Jerry Weinberg, Computer Programming Fundamentals, 1961&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;To me, that's a magnificent paragraph.  But just in case, let's paraphrase it to make it (to my mind, at least) even clearer:&lt;br /&gt;&lt;br /&gt;&lt;cite&gt;"One of the lessons to be learned ... is that the sheer number of &lt;b&gt;checks&lt;/b&gt; performed is of little significance in itself.  Too often, the series of &lt;b&gt;checks&lt;/b&gt; simply proves how good the computer is at doing the same things with different numbers.  As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable.  With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each &lt;b&gt;test&lt;/b&gt; does some work not done by previous &lt;b&gt;checks&lt;/b&gt;.  To do this, we must struggle to develop a suspicious nature as well as a lively imagination."&lt;br /&gt;&lt;/cite&gt;&lt;br /&gt;Thank you to Dale for your critical questions, and to the others who have asked questions about the motivation for making the distinction and hanging a new label on it.  I hope this helps.  If it doesn't, &lt;a href="mailto:michael@developsense.com"&gt;please let me know&lt;/a&gt;, and we'll try to work it out.  In any case, there will be more to come.&lt;br /&gt;&lt;h2&gt;The Testing vs. Checking Series&lt;/h2&gt;1. 
&lt;a href="http://www.developsense.com/2009/08/testing-vs-checking.html"&gt;Testing vs. Checking&lt;/a&gt;&lt;br /&gt;2. &lt;a href="http://www.developsense.com/2009/09/transpection-and-three-elements-of.html"&gt;Transpection and the Three Elements of Checking&lt;/a&gt;&lt;br /&gt;3. &lt;a href="http://www.developsense.com/2009/09/pass-vs-fail-vs-is-there-problem-here.html"&gt;Pass vs. Fail vs. Is There a Problem Here?&lt;/a&gt;&lt;br /&gt;4. &lt;a href="http://www.developsense.com/2009/09/elements-of-testing-and-checking.html"&gt;Elements of Testing and Checking&lt;/a&gt;&lt;br /&gt;5. &lt;a href="http://www.developsense.com/2009/09/testing-checking-and-changing-language.html"&gt;Testing, Checking, and Changing the Language&lt;/a&gt;&lt;br /&gt;6. &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-motive-for.html"&gt;Tests vs. Checks:  The Motive for Distinguishing&lt;/a&gt;&lt;br /&gt;7. &lt;a href="http://www.developsense.com/2009/09/tester-asks-about-checking.html"&gt;A Tester Asks About Checking&lt;/a&gt;&lt;br /&gt;8. &lt;a href="http://www.developsense.com/2009/09/tests-vs-checks-should-we-call-test.html"&gt;Tests vs. 
Checks:  Should We Call Test-Driven Development Something Else?&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Related:  &lt;a href="http://www.satisfice.com/"&gt;James Bach&lt;/a&gt; on &lt;a href="http://www.satisfice.com/blog/archives/358"&gt;Sapience and Blowing People's Minds&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-5539288339784430154?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/tests-vs-checks-motive-for.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>8</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6567846.post-434533231346986872</guid><pubDate>Wed, 16 Sep 2009 18:36:00 +0000</pubDate><atom:updated>2009-09-16T14:38:10.049-05:00</atom:updated><title>Upcoming Events: KWSQA and STAR West</title><description>I'm delighted to have been asked to present a lunchtime talk at the &lt;a href="http://www.kwsqa.org/"&gt;Kitchener-Waterloo Software Quality Association&lt;/a&gt;, Wednesday September 30.  I'll be giving a reprise of my STAR East keynote talk, &lt;span style="font-style: italic;"&gt;What Haven't You Noticed Lately?  Building Awareness in Testers&lt;/span&gt;.   (The title has been pinched from &lt;a href="http://whatisthemessage.blogspot.com"&gt;Mark Federman&lt;/a&gt;, who got it from &lt;a href="http://www.youtube.com/watch?v=qz6cXkf06SU"&gt;Terence McKenna&lt;/a&gt;, who may have got it from &lt;a href="http://www.youtube.com/watch?v=A7GvQdDQv8g"&gt;Marshall McLuhan&lt;/a&gt;, but maybe not.)&lt;br /&gt;&lt;br /&gt;The following week, it's &lt;a href="http://www.sqe.com/STARWEST"&gt;STAR West&lt;/a&gt; in Anaheim, California.  
I'll be giving a half-day workshop, &lt;a class="LinkBoldBlue" href="http://www.sqe.com/starwest/Tutorials/Default.aspx?Day=Tuesday#TG"&gt;Tester's Clinic: Dealing with Tough Questions and Testing Myths&lt;/a&gt; and a track session, &lt;a class="LinkBoldBlue" href="http://www.sqe.com/starwest/Concurrent/Default.aspx?Day=Thursday#T17"&gt;The Skill of Factoring: Identifying What to Test&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'll also be giving a bonus session, &lt;span class="BlackText"&gt;&lt;a class="LinkBoldBlue" href="http://www.sqe.com/starwest/BonusSessions.aspx#BS3"&gt;Using the Secrets of Improv to Improve Your Testing&lt;/a&gt;.  I've done this one at Agile 2008 in Toronto, and at the &lt;a href="http://www.ayeconference.com"&gt;AYE Conference&lt;/a&gt; in 2006, and it's fun, but because so much of the learning comes from the participants, in the moment, it's also been remarkably insightful both times.  Improv is about being aware of your actions, the actions of others, and how they relate to each other—immediately.  Even dipping one's toe in it is very exciting.  &lt;a href="http://www.adamkwhite.com/"&gt;Adam White&lt;/a&gt; talks compellingly about his experience of a couple of rounds of classes with Second City, and he did a well-regarded improv session at CAST 2008.&lt;br /&gt;&lt;br /&gt;There's an official panel discussion hosted by Ross Collard on Wednesday at 6:30, and there's an official Meet-The-Presenter session Thursday morning.  The rest of the time, &lt;a href="http://www.satisfice.com"&gt;James Bach&lt;/a&gt; and I will be holding unofficial versions of both of those things.  We'll be bringing testing toys and testing games, and workshopping old and new exercises with whoever wants to come.  
He'll likely be talking about his new book, &lt;a href="http://www.amazon.com/Secrets-Buccaneer-Scholar-Self-Education-Pursuit-Lifetime/dp/1439109087"&gt;Secrets of a Buccaneer Scholar&lt;/a&gt;, a terrific memoir and guide to self-education.&lt;br /&gt;&lt;br /&gt;I'd like to meet you at the conference, but I'm not sure who you are.  If you'd like to do some hands-on testing puzzles, have a chat about testing vs. checking, or discuss anything you like, drop me a line—michael at developsense.com.&lt;br /&gt;&lt;/span&gt;&lt;span class="GreenBoldText"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6567846-434533231346986872?l=www.developsense.com%2Fblog%2Fblog.shtml' alt='' /&gt;&lt;/div&gt;</description><link>http://www.developsense.com/blog/2009/09/upcoming-events-kwsqa-and-star-west.html</link><author>michael.a.bolton@gmail.com (Michael Bolton http://www.developsense.com)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item></channel></rss>