Blog Posts from April, 2013

What Do You Mean By “Arguing Over Semantics”? (Part 2)

Thursday, April 4th, 2013

Continuing from yesterday…

As you may recall, my correspondent remarked:

“To be honest, I don’t care what these types of verification are called be it automated checking or manual testing or ministry of John Cleese walks. What I would like to see is investment and respect being paid to testing as a profession rather than arguing with ourselves over semantics.”

Here’s an example of the importance of semantics in testing. When someone says, “it works”, what do they mean? In the Rapid Testing class, we say that “it works” really means “it appears to meet some requirement to some degree”.

That expanded statement should raise a bunch of questions for a tester, a test manager, or a client: What observations gave rise to that appearance? Observations by whom? When did that person observe? Under what conditions? Did he induce any variation into those conditions? Did he observe repeatedly, or just the once? For a long time, or just a glance? If the observation was made somewhere, what might be different somewhere else? Has anything changed since the last observation? Which requirements were considered? Which requirement specifically seems to have been met? Only that requirement? Are there other explicit or implicit requirements that might not have been met? To some degree—to what degree? Have the right people been consulted on the degree to which the requirement seems to have been met? Do they agree? Are the requirement and the evidence that “it works” well understood by, and acceptable to, other people who might matter? Yes, that’s a lot of questions—and testers who ask them find nests of bugs living underneath the questions, and in the cracks between the answers.
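
To make that concrete, here is a tiny sketch, in Python, of the kind of automated check that often stands behind “it works”. Everything in it (the product function, the coupon, the numbers) is invented for illustration; it is not anyone’s real code. The check makes one observation, of one input, at one moment, under one reading of one requirement, which is to say it leaves almost every question above unasked.

    # Invented illustration: a single check behind the claim "it works".
    def discounted_total(price, coupon):
        # Hypothetical product code: "10% off" for one particular coupon.
        return price * 0.9 if coupon == "SAVE10" else price

    def check_discount():
        # One observation, of one input, under one set of conditions,
        # against one reading of one requirement.
        result = discounted_total(100, "SAVE10")
        assert result == 90

    check_discount()
    print("it works")  # ...appears to meet some requirement, to some degree

A check like this can pass while saying nothing about other prices, other coupons, rounding, currency, repeated application, or whether the people who matter agree that 90 is the right answer.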

Semantics has its parallel in science and measurement, in the form of a concept called “construct validity”. In measurement, construct validity centres around the issue of what counts as an instance of something you’re measuring, and what doesn’t count as such an instance. In their book Experimental and Quasi-Experimental Designs for Generalized Causal Inference (I have no idea why a book with such a catchy title isn’t flying off the shelves), Shadish, Cook and Campbell say,

“The naming of things is a key problem in all sciences, for names reflect category memberships that themselves have implications about relationships to other concepts, theories, and uses. This is true even for seemingly simple labeling problems. For example, a recent newspaper article reported a debate among astronomers over what to call 18 newly discovered celestial objects… The Spanish astronomers who discovered the bodies called them planets, a choice immediately criticized by some other astronomers… At issue was the lack of a match between some characteristics of the 18 objects (they are drifting freely through space and are only about 5 million years old) and some characteristics that are prototypical of planets (they orbit a star and require tens of millions of years to form). Critics said these objects were more reasonably called brown dwarfs, objects that are too massive to be planets but not massive enough to sustain the thermonuclear processes in a star. Brown dwarfs would drift freely and be young, like these 18 objects. The Spanish astronomers responded that these objects are too small to be brown dwarfs and are so cool that they could not be that young.” (p. 66)

Well, tomayto-tomahto, right? There are these objects out there, and they’re out there no matter what we call them. Brown dwarfs, planets… why bother quibbling? Shadish, Cook, and Campbell answer: “All this is more than just a quibble: if these objects really are planets, then current theories of how planets form by condensing around a star are wrong!” (my emphasis)

They end the passage by noting that construct validity is a much more difficult problem in social science field experiments—and I would argue that most of software testing is far closer to the field sciences than to astrophysics. More on that in future blog posts.

(Shadish, Cook, and Campbell cite the newspaper article as “Scientists quibble on calling discovery ‘planets’” (2000, October 6). The Memphis Commercial Appeal, p. A5. I did an online search through all of the Commercial Appeal’s articles for that day, and a more global search for newspaper articles with that title, but I was unable to find it. However, the controversy over what constitutes a planet continues: http://en.wikipedia.org/wiki/IAU_definition_of_planet.)

Labels are intertwined with our ontologies (our ways of categorizing the world) and our theories. Combustion used to be explained by an element called “phlogiston” that escaped when something was burned, and that had negative mass. When problems with that theory arose, phlogiston evolved from an element into a principle. The phlogiston theory had considerable explanatory power, and drove a good deal of invention and research. Eventually, though, people recognized enough problems in the phlogiston theory that Joseph Priestley’s discovery—”dephlogisticated air”—came to be called by Lavoisier’s name, the name we still use today: oxygen. Interestingly, oxidation began as a principle, before the element was identified. So theories of combustion went from element to principle to inverted principle and back to element. (See Steven Johnson’s The Invention of Air.)

In the old days, people simply got sick. Some treatments worked; others didn’t work so well. Modern doctors don’t casually confuse bacteria and viruses. They prescribe antibiotics for bacterial infections, and antiviral drugs for viral infections. Labels not only represent underlying meanings; they often incorporate those meanings.

If we’re pleading for professional respect for testing, it’s worth asking ourselves what we think is worthy of respect. I believe that what’s most respectable is our special interest in dispelling illusions, demolishing unwarranted confidence, recognizing unnoticed complexity, and revealing underlying truths in a rapidly changing and developing world. If you agree, I think you’ll see a professional problem in promoting ways of speaking that create or sustain illusions, instead of telling plain truths about our work. I think you’ll see a professional problem with oversimplified models of complex cognitive processes, and I think you’ll see a problem with keeping our vocabulary static.

To my correspondent way above: I understand that dealing with all this discussion is effortful. Taking on new ideas is a pain, and so is defending old ones that are being challenged. Revising some simple, seemingly stable beliefs is hard work. Recognizing that two labels might point to the same thing might feel like a distraction, and choosing whose vocabulary to use might be laden with politics and emotion. You’re almost certainly busy with getting stuff done, and it takes time to keep up with the conversation—never mind the racket. But be sure of this: investment and respect are paid to astronomy, chemistry, and medicine as professions precisely because it is the nature of a profession to question and redefine its ideas about itself. Studying a profession involves developing distinctions and definitions to aid in the study. If we’re going to talk seriously and credibly about developing skill in testing, it’s important for us to develop clarity on the activities we’re talking about.

I thank James Bach for his contributions to this essay.

What Do You Mean By “Arguing Over Semantics”?

Wednesday, April 3rd, 2013

Commenting on testing and checking, one correspondent responds:

“To be honest, I don’t care what these types of verification are called be it automated checking or manual testing or ministry of John Cleese walks. What I would like to see is investment and respect being paid to testing as a profession rather than arguing with ourselves over semantics.”

My very first job in software development was as a database programmer at a personnel agency. Many times, when I wrote a bit of new code, I got a reality check: the computer always did exactly what I said, and not necessarily what I meant. The difference was something that I experienced as a bug. Sometimes the things that I told the computer were consistent with what I meant to tell it, but the way I understood something and the way my clients understood something were different. In that case, the difference was something that my clients experienced as a bug, even though I didn’t, at first. The issue was usually that my clients and I didn’t agree on what we said or what we meant. That wasn’t out of ignorance or ill will. The problem was often that my clients and I had shallow agreement on a concept. A big part of the job was refining our words for things—and when we did that, we often found that the conversation refined our ideas about things too. Those revelations (Eric Evans calls them “knowledge crunching”) are part of the process of software development.
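
Here is an invented miniature of the kind of gap I mean; it is not the agency’s actual code, and the names and dates are made up. The code does exactly what I said; a client who asked for “candidates available by April 12th” might reasonably have meant something different from what I wrote.

    # Invented illustration: the computer does what I said, not what I meant.
    from datetime import date

    candidates = [("Ana", date(2013, 4, 5)), ("Raj", date(2013, 4, 12))]

    def available_by(cutoff):
        # I said "before the cutoff"; the client meant "on or before it".
        return [name for name, available in candidates if available < cutoff]

    print(available_by(date(2013, 4, 12)))  # ['Ana'] -- Raj is silently excluded

Nothing here is broken in the computer’s terms; the disagreement lives in the words, not in the syntax.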

As the only person on my development team, I was also responsible for preparing end-user documentation for the program. My spelling and grammar could be impeccable, and spelling and grammar checkers could confirm that my words were syntactically correct. But when my description of how to use the program was vague, inaccurate, or imprecise, the agents who used the application would get confused, or would make mistakes, or would miss out on something important. There was a real risk that the company’s clients wouldn’t get the candidates they wanted, or that some qualified person wouldn’t get a shot at a job. Being unclear had real consequences for real people.

A few years later, my friend Dan Spear—at the time, Quarterdeck’s chief scientist, and formerly the principal programmer of QEMM-386—accepted my request for some lessons in assembly language programming. He began the first lesson while we were both sitting back from the keyboard. “Programming a computer,” he began, “is the most humbling thing that you can do. The computer is like a mirror. It does exactly what you tell it to do, and in doing that, it reflects any sloppiness in your thinking or in your way of expressing yourself.”

I was a program manager (a technical position) at Quarterdeck for four years. Towards the end of my tenure, we began working on an antivirus product. One of the product managers (“product manager” was a marketing position) wanted to put a badge on the retail box: “24-hour support response time!” In a team meeting, we technical people made it clear that we didn’t provide 24-hour monitoring of our support channels. The company’s senior management clearly had no intention of staffing or funding 24-hour support, either. We were in Los Angeles, and the product was developed in Israel. It took development time—sometimes hours, but sometimes days—to analyse a virus and figure out ways to detect and to eradicate it. Nonetheless, the marketing guy (let’s call him Mark) continued to insist that that was what he wanted to put on the box. One of the programming liaisons (let’s call him Paul) spoke first:

Paul: “I doubt that some of the problems we’re going to see can be turned around in 24 hours. Polymorphic viruses can be tricky to identify and pin down. So what do you mean by 24-hour response time?”

Mark: “Well, we’ll respond within 24 hours.”

Paul: “With a fix?”

Mark: “Not necessarily, but with a response.”

Paul: “With a promise of a fix? A schedule for a fix?”

Mark: “Not necessarily, but we will respond.”

Paul: “What does it mean to respond?”

Mark: “When someone calls in, we’ll answer the phone.”

Sam (a support person): “We don’t have people here on the weekends.”

Mark: “Well, 24 hours during the week.”

Sam: “We don’t have people here before 7:00am, or after 5:00pm.”

Mark: “Well… we’ll put someone to check voicemail as soon as they get in… and, on the weekends… I don’t know… maybe we can get someone assigned to check voicemail on the weekend too, and they can… maybe, uh… send an email to Israel. And then they can turn it around.”

At this point, as the program manager for the product, I’d had enough. I took a deep breath, and said, “Mark, if you put ’24-hour response time’ on the box, I will guarantee that that will mislead some people. And if we mislead people to take advantage of them, knowing that we’re doing it, we’re lying. And if they give us money because of a lie we’re telling, we’re also stealing. I don’t think our CEO wants to be the CEO of a lying, stealing company.”

There’s a common thread that runs through these stories: they’re about what we say, about what we mean, and about whether we say what we mean and mean what we say. That’s semantics: the relationships between words and meaning. Those relationships are central to testing work.

If you feel yourself tempted to object to something by saying “We’re arguing about semantics,” try a macro expansion: “We’re arguing about what we mean by the words we’re choosing,” which can then be shortened to “We’re arguing about what we mean.” If we can’t settle on the premises of a conversation, we’re going to have an awfully hard time agreeing on conclusions.

I’ll have more to say on this tomorrow.