Blog Posts from October, 2014

Testing is…

Tuesday, October 28th, 2014

Every now and again, someone makes a statement about testing that I find highly questionable or indefensible, whereupon I might ask them what testing means to them. All too often, they’re at a loss to reply because they haven’t really thought deeply about the matter; or because they haven’t internalized what they’ve thought about; or because they’re unwilling to commit to any statement about testing. And then they say something vague or non-committal like “it depends” or “different things to different people” or “that’s a matter of context”, without suggesting relevant dependencies, people, or context factors. So, for those people, I offer a set of answers from which they can choose one; or they can adopt the entire list wholesale; or they can use one or more items as a point of departure for something of their own invention. You don’t have to agree with any of these things; in that case, invent your own ideas about testing from whole cloth. But please: if you claim to be a tester, or if you are making some claim about testing, prepare yourself and have some answer ready when someone asks you “what is testing?”. Please. Here are some possible replies; I believe each one is Tweetable, or pretty close.

Testing is—among other things—reviewing the product and ideas and descriptions of it, looking for significant and relevant inconsistencies.
Testing is—among other things—experimenting with the product to find out how it may be having problems—which is not “breaking the product”.
Testing is—among other things—something that informs quality assurance, but is not in and of itself quality assurance.
Testing is—among other things—helping our clients to make empirically informed decisions about the product, project, or business.
Testing is—among other things—a process by which we systematically examine any aspect of the product with the goal of preventing surprises.
Testing is—among other things—a process of interacting with the product and its systems in many ways that challenge unwarranted optimism.
Testing is—among other things—observing and evaluating the product, to see where all those defect prevention ideas might have failed.
Testing is—among other things—a special part of the development process focused on discovering what could go badly (or what is going badly).
Testing is—among other things—exploring, discovering, investigating, learning, and reporting about the product to reveal new information.
Testing is—among other things—gathering information about the product, its users, and conditions of its use, to help defend value.
Testing is—among other things—raising questions to help teams to develop products that more quickly and easily reveal their own problems.
Testing is—among other things—helping programmers and the team to learn about unanticipated aspects of the product we’re developing.
Testing is—among other things—helping our clients to understand the product they’ve got so they can decide if it’s the product they want.
Testing is—among other things—using both tools and direct interaction with the product to question and evaluate its behaviours and states.
Testing is—among other things—exploring products deeply, imaginatively, and suspiciously, to help find problems that threaten value.
Testing is—among other things—performing actual and thought experiments on products and ideas to identify problems and risks.
Testing is—among other things—thinking critically and skeptically about products and ideas around them, with the goal of not being fooled.
Testing is—among other things—evaluating a product by learning about it through exploration, experimentation, observation and inference.

You’re welcome.

Facts and Figures in Software Engineering Research (Part 2)

Wednesday, October 22nd, 2014

On July 23, 2002, Capers Jones, Chief Scientist Emeritus of a company called Software Productivity Research, gave a presentation called “SOFTWARE QUALITY IN 2002: A SURVEY OF THE STATE OF THE ART”. In this presentation, he shows data on a slide titled “U.S. Averages for Software Quality”.

US Averages for Software Quality 2002

(Source: http://bit.ly/1rj19Ol, accessed September 5, 2014)

It is not clear what “defect potentials” means. A slide preceding this one says defect potentials are (or include) “requirements errors, design errors, code errors, document errors, bad fix errors, test plan errors, and test case errors.”

There is no description in the presentation of the link between these categories and the numbers in the “Defect Potential” column. Yes, the numbers are expressed in terms of “defects per function point”, but where did the numbers for these “potentials” come from?

In order to investigate this question, I spent just over a hundred dollars to purchase three books by Mr. Jones: Applied Software Measurement (Second Edition) (1997) [ASM2]; Applied Software Measurement: Assuring Productivity and Quality (Third Edition), 2008 [ASM3]; and The Economics of Software Quality (co-authored with Olivier Bonsignour) (2011). In [ASM2], he says:

The “defect potential” of an application is the sum of all defects found during development and out into the field when the application is used by clients and customers. The kinds of defects that comprise the defect potential include five categories:

  • Requirements defects
  • Design defects
  • Source code defects
  • User documentation defects
  • “Bad fixes” or secondary defects found in repairs of prior defects

The information in this book is derived from observations of software projects that utilized formal design and code inspections plus full multistage testing activities. Obviously the companies also had formal and accurate defect tracking tools available.

Shortly afterwards, Mr. Jones says:

Note that this kind of data is clearly biased, since very few companies actually track life-cycle defect rates with the kind of precision needed to ensure really good data on this subject.

That’s not surprising, and it’s not the only problem. What are the biases? How might they affect the data? Which companies were included, and which were not? Did each company have the same classification scheme for assigning defects to categories? How can this information be generalized to other companies and projects?

More importantly, what is a defect? When does a coding defect become a defect (when the programmer types a variable name in error?), and when might it stop being a defect (when the programmer hits the backspace key three seconds later?)? Does the defect get counted as a defect in that case?

What is the model or theory that associates the number 1.25 in the slide above with the potential for defects in design? The text suggests that “defect potentials” refers to defects found—but that’s not a potential, that’s an outcome.

In Applied Software Measurement, Third Edition, things change a little:

The term “defect potential” refers to the probable number of defects found in five sources: requirements, design, source code, user documents, and bad fixes… The data on defect potentials comes from companies that actually have lifecycle quality measures. Only a few leading companies have this kind of data, and they are among the top-ranked companies in overall quality: IBM, Motorola, AT&T, and the like.

Note the change: there’s been a shift from the number of defects found to the probable number of defects found. But surely defects were either found or they weren’t; how can they be “probably found”? Perhaps this is a projection of defects to be found—but what is the projection based on? The text does not make this clear. And the question has still been begged: What is the model or theory that associates the number 1.25 in the slide above with the potential for defects in design?

These are questions of construct validity, about which I’ve written before. And there are many questions that one could ask about the way the data has been gathered, controlled, aggregated, normalized, and validated. But there’s something more troubling at work here.

Here’s a similar slide from a presentation in 2005:
US Averages for Software Quality 2005

(Source: http://twin-spin.cs.umn.edu/sites/twin-spin.cs.umn.edu/files/SQA05l.pdf, accessed September 5, 2014)

From a presentation in 2008:
US Averages for Software Quality 2008

(Source: http://www.jasst.jp/archives/jasst08e/pdf/A1.pdf, accessed September 5, 2014)

From a presentation in 2010:
US Averages for Software Quality 2010

(Source: http://www.sqgne.org/presentations/2010-11/Jones-Nov-2010.pdf, accessed September 5, 2014)

From a presentation in 2012:
US Averages for Software Quality 2012

(Source: http://sqgne.org/presentations/2012-13/Jones-Sep-2012.pdf, accessed September 5, 2014)

From a presentation in 2013:
US Averages for Software Quality 2013

(Source: http://namcookanalytics.com/wp-content/uploads/2013/10/SQA2013Long.pdf, accessed September 5, 2014)

And here’s one from all the way back in 2000:
US Averages for Software Quality 2000

(Source: http://www.ifpug.org/Conference%20Proceedings/IFPUG-2000/IFPUG2000-14-Jones-Function_Points_And_Software_Value.pdf, accessed October 22, 2014)

What explains the stubborn consistency, over 13 years, of every single data point in this table?

I thank Laurent Bossavit for his inspiration and assistance in exploring this data.

Facts and Figures in Software Engineering Research

Monday, October 20th, 2014

On July 23, 2002, Capers Jones, Chief Scientist Emeritus of a company called Software Productivity Research, gave a presentation called “SOFTWARE QUALITY IN 2002: A SURVEY OF THE STATE OF THE ART”. In this presentation, he provided the sources for his data on the second slide:

SPR clients from 1984 through 2002
• About 600 companies (150 clients in Fortune 500 set)
• About 30 government/military groups
• About 12,000 total projects
• New data = about 75 projects per month
• Data collected from 24 countries
• Observations during more than a dozen lawsuits

(Source: http://bit.ly/ZDFKaT, accessed September 5, 2014)

On May 2, 2005, Mr. Jones, this time billed as Chief Scientist and Founder of Software Quality Research, gave a presentation called “SOFTWARE QUALITY IN 2005: A SURVEY OF THE STATE OF THE ART”. In this presentation, he provided the source for his data, again on the second slide:

SPR clients from 1984 through 2005
• About 625 companies (150 clients in Fortune 500 set)
• About 35 government/military groups
• About 12,500 total projects
• New data = about 75 projects per month
• Data collected from 24 countries
• Observations during more than 15 lawsuits

(Source: http://bit.ly/1vEJVAc, accessed September 5, 2014)

Notice that 34 months have passed between the two presentations, and that the “total projects” number has increased by 500. At 75 projects a month, we should expect that about 2,550 projects have been added to the original tally; yet only 500 have been added.
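
To make the arithmetic explicit, here is a minimal sketch of that cross-check in Python (the month count is approximate, derived from the presentation dates above; the figures are those claimed on the slides):

  # Cross-check of the claimed intake rate between the 2002 and 2005 presentations.
  # Month count is approximate (July 23, 2002 to May 2, 2005); figures are as claimed.
  months_elapsed = 34
  claimed_rate = 75                  # "new data = about 75 projects per month"
  expected_new = months_elapsed * claimed_rate
  reported_new = 12500 - 12000       # claimed totals: 2005 minus 2002
  print(expected_new, reported_new)  # 2550 expected vs. 500 reported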

On January 30, 2008, Mr. Jones (Founder and Chief Scientist Emeritus of Software Quality Research) gave a presentation called “SOFTWARE QUALITY IN 2008: A SURVEY OF THE STATE OF THE ART”. This time the sources (once again on the second slide) looked like this:

SPR clients from 1984 through 2008
• About 650 companies (150 clients in Fortune 500 set)
• About 35 government/military groups
• About 12,500 total projects
• New data = about 75 projects per month
• Data collected from 24 countries
• Observations during more than 15 lawsuits

(Source: http://www.jasst.jp/archives/jasst08e/pdf/A1.pdf, accessed September 5, 2014)

This is odd. Thirty-two months have passed since the May 2005 presentation. With new data being added at 75 projects per month, there should have been about 2,400 new projects since the prior presentation. Yet there has been no increase at all in the number of total projects.

On November 2, 2010, Mr. Jones (now billed as Founder and Chief Scientist Emeritus and as President of Capers Jones & Associates LLC) gave a presentation called “SOFTWARE QUALITY IN 2010: A SURVEY OF THE STATE OF THE ART”. Here are the sources, once again from the second slide:

Data collected from 1984 through 2010
• About 675 companies (150 clients in Fortune 500 set)
• About 35 government/military groups
• About 13,500 total projects
• New data = about 50-75 projects per month
• Data collected from 24 countries
• Observations during more than 15 lawsuits

(Source: http://www.sqgne.org/presentations/2010-11/Jones-Nov-2010.pdf, accessed September 5, 2014)

Here three claims about the data have changed: 25 companies have been added to the data sources, the total set now comprises 13,500 projects, and “about 50-75 projects” have been added (or are being added; this isn’t clear) per month. Thirty-three full months have passed since the January 2008 presentation (which came at the end of that month). At the claimed rates, we should expect between 1,650 and 2,475 new projects since the last presentation; yet the total has increased by only 1,000, well below even the lower bound. What does it mean to claim “new data = about 50-75 projects per month” when the new data appears to be coming in at a rate below the lowest rate claimed?

On May 1, 2012, Mr. Jones (CTO of Namcook Analytics LLC) gave a talk called “SOFTWARE QUALITY IN 2012: A SURVEY OF THE STATE OF THE ART”. Once again, the second slide provides the sources.

Data collected from 1984 through 2012
• About 675 companies (150 clients in Fortune 500 set)
• About 35 government/military groups
• About 13,500 total projects
• New data = about 50-75 projects per month
• Data collected from 24 countries
• Observations during more than 15 lawsuits

(Source: http://sqgne.org/presentations/2012-13/Jones-Sep-2012.pdf, accessed September 5, 2014)

Here there has been no change at all in any of the previous claims (except for the range of time over which the data has been collected). The claim that 50-75 projects are being added per month remains. At that rate, extrapolating from the claims in the November 2010 presentation, there should be between 14,400 and 14,850 projects in the data set. Yet the claim of 13,500 total projects also remains.

On August 18, 2013, Mr. Jones (now VP and CTO of Namcook Analytics LLC) gave a presentation called “SOFTWARE QUALITY IN 2013: A SURVEY OF THE STATE OF THE ART”. Here are the data sources (from page 2):

Data collected from 1984 through 2013
• About 675 companies (150 clients in Fortune 500 set)
• About 35 government/military groups
• About 13,500 total projects
• New data = about 50-75 projects per month
• Data collected from 24 countries
• Observations during more than 15 lawsuits

(Source: http://namcookanalytics.com/wp-content/uploads/2013/10/SQA2013Long.pdf, accessed September 5, 2014)

Once again, there is no change in the total number of projects, but the claim of 50-75 new projects per month remains. Based on the 2012 claim of 13,500 projects, the 15 months that have passed (more like 16, but we’ll be generous here), and the growth claims in these presentations, there should be between 14,250 and 14,625 projects in the data set.

Based on the absolute claim of 75 new projects per month in the period 2002-2008, and 50 per month in the remainder, we’d expect at least 20,250 projects by 2013. But let’s be conservative and generous, and assume just 50 new projects per month for the entire period from 2002 to 2013. That would be 600 new projects per year over 11 years: 6,600 projects added to 2002’s 12,000, for a total of 18,600 by 2013. Yet the total number of projects went up by only 1,500 over the 11-year period, less than one-quarter of what the “new data” claims would suggest.
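
The same extrapolation can be laid out for all of the presentations at once. Here is a small Python sketch, assuming the claimed totals and intake rates quoted above and approximate month counts between the presentation dates:

  # Compare claimed project totals with totals extrapolated from the claimed
  # intake rates. Month gaps are approximate; figures are as quoted above.
  presentations = [
      # (year, months since previous presentation, claimed total,
      #  claimed rate per month: low, high)
      ("2002", 0, 12000, 75, 75),
      ("2005", 34, 12500, 75, 75),
      ("2008", 32, 12500, 75, 75),
      ("2010", 33, 13500, 50, 75),
      ("2012", 18, 13500, 50, 75),
      ("2013", 15, 13500, 50, 75),
  ]

  low = high = presentations[0][2]  # start from the 12,000 projects claimed in 2002
  for year, months, claimed, low_rate, high_rate in presentations:
      low += months * low_rate
      high += months * high_rate
      print(f"{year}: claimed {claimed:,}; expected {low:,} to {high:,}")
  # By 2013 the extrapolation reaches 20,250 to 21,900 projects, against a claimed 13,500.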

In summary, we have two sets of figures in apparent conflict here. In each presentation,

1) the project data set is claimed to grow at a certain rate (50-75 per month, which amounts to 600-900 per year).
2) the reported number of projects grows at a completely different rate (on average, 136 per year).

What explains the inconsistency between the two sets of figures?

I thank Laurent Bossavit for his inspiration and help with this project.