Blog Posts for the ‘Systems Thinking’ Category

Project Estimation and Black Swans (Part 4)

Monday, October 25th, 2010

Over the last few posts, exploratory automation has suggested some interesting things about project dynamics and estimation. What might we learn from these little mathematical experiments?

The first thing we need to do is to emphasize the fact that we’re playing with numbers here. This exercise can’t offer any real construct validity, since an arbitrary chunk of time combined with a roll of the dice doesn’t match software development in all of its complex, messy, human glory. In a way, though, that doesn’t matter too much, since the goal of this exercise isn’t to prove anything in particular, but rather to raise interesting questions and to offer suggestions or hints about where we might look next.

The mathematics appears to support an idea touted over and over by Agile enthusiasts, humanists, and systems thinkers alike: make feedback rapid and frequent. The suggestion we might take from the last model (fewer tasks and shorter projects) is that the shorter and better-managed the project, the less chance a Black Swan has to hurt you in any given project.
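To make that concrete, here is a minimal Monte Carlo sketch in Python of the kind of experiment these posts have been playing with. The probabilities and multipliers below are assumptions chosen for illustration, not the figures from the earlier models: most tasks come in on time, some double or quadruple, and roughly one in a hundred becomes a Black Swan.

import random

# A toy model of task duration; the probabilities and multipliers are
# invented for illustration. Each task is estimated at 1 unit of time.
def task_duration():
    roll = random.random()
    if roll < 0.80:
        return 1.0, False   # on time
    elif roll < 0.95:
        return 2.0, False   # a Wasted Morning: double the estimate
    elif roll < 0.99:
        return 4.0, False   # a Lost Day: quadruple the estimate
    else:
        return 16.0, True   # a Black Swan: the estimate is meaningless

# Estimate the chance that a project of num_tasks tasks contains
# at least one Black Swan.
def swan_probability(num_tasks, trials=10000):
    hits = sum(
        any(task_duration()[1] for _ in range(num_tasks))
        for _ in range(trials))
    return hits / trials

for n in (10, 50, 250):
    print("%4d tasks: chance of at least one Black Swan = %5.1f%%"
          % (n, 100 * swan_probability(n)))

With these made-up numbers, a ten-task project dodges the Black Swan more than nine times out of ten, while a 250-task project gets hit more than nine times out of ten. The risk per task hasn't changed; only the exposure has.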

Another plausible idea that comes from the math is to avoid projects where a power-law distribution applies—projects where you’re vulnerable to Wasted Mornings and Lost Days. Stay away from projects in Taleb’s Fourth Quadrant, projects that contain high-impact, high-uncertainty tasks. To the greatest degree possible, stick with things that are reasonably predictable, so that the statistics of random and unpredicted events don’t wallop us quite so often. Stay within the realm of the known, “in Mediocristan” as Taleb would say. Head for the next island, rather than trying to navigate too far over the current horizon.

In all that, there’s a caveat. It is of the essence of a Black Swan (or even a Black Cygnet) that it’s unpredicted and unpredictable. Ironically, the more successful we are at reducing uncertainty, the less often we’ll encounter rare events. The rarer the event, the less we know about it—and therefore, the less we’re aware of the range of its potential consequences. The less we know about the consequences, the less likely we are to know how to manage them—certainly the less specifically we know how to manage them. In short, the rarer the event, the less information and experience we’ll have to help us deal with it. One implication of this is that our Black Cygnets, in addition to adding time, have a chance of screwing up other things in ways that we don’t expect.

Some people would suggest that we eliminate variability and uncertainty and unpredictability. What a nice idea! By definition, uncertainty is the state of not knowing something; by definition, something that’s unpredictable can’t be predicted. Snowstorms happen (even in Britain!). Servers go down. Power cuts happen in India on a regular basis—on my last visit to India, I experienced three during class time, and three more in the evening during a two-day stay at a business class hotel. In North America, power cuts happen too—and because we’re not used to them, we aren’t prepared to deal with them. (To us they’re Black Swans, whereas to people who live in India, they’re Grey Swans.) Executives announce all-hands meetings, sometimes with dire messages. Computers crash. Post-It notes get jammed in the backup tape drive. People get sick, and if they’re healthy, their kids get sick. Trains are delayed. Bicycles get flat tires. And bugs are, by their nature, unpredicted.

So: we can’t predict the unpredictable. There is a viable alternative, though: we can expect the unpredictable, anticipate it to some degree, manage it as best we can, and learn from the experience. Embracing the unpredictable reminds me of the Fundamental Regulator Paradox, from Jerry and Dani Weinberg’s General Principles of Systems Design, which I’ve referred to before:

The task of a regulator is to eliminate variation, but this variation is the ultimate source of information about the quality of its work. Therefore, the better job a regulator does, the less information it gets about how to improve.

This suggests to me that, at least to a certain degree, we shouldn’t make our estimates too precise, our commitments too rigid, our processes too strict, our roles too closed, and our vision of the future too clear. When we do that, we reduce the flow of information coming in from outside the system, and that means that the system doesn’t develop an important quality: adaptability.
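As a toy illustration of the paradox (my own sketch in Python, not anything from the Weinbergs’ book), consider a regulator that cancels some fixed fraction of each disturbance. The better it regulates, the less variation leaks through to the output; and that leaked variation is the only outside evidence of what the disturbances look like.

import random
import statistics

# A toy regulator, invented for illustration: it observes each
# disturbance and cancels a fixed fraction of it (gain). Whatever
# leaks through is the only signal left for studying the disturbances.
def residual_variation(gain, n=10000):
    disturbances = [random.gauss(0, 1) for _ in range(n)]
    output = [d * (1 - gain) for d in disturbances]
    return statistics.stdev(output)

for gain in (0.0, 0.5, 0.9, 0.99):
    print("gain %4.2f: residual variation %.3f"
          % (gain, residual_variation(gain)))

At a gain of 0.99 the output is nearly flat: the regulation is excellent, and there is almost nothing left from which to learn about the disturbances it is absorbing.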

When I attended Jerry Weinberg’s Problem Solving Leadership workshop (PSL), one of the groups didn’t do so well on one of the problem-solving exercises. During the debrief, Jerry asked, “Why did you have such a problem with that? You handled a much harder problem yesterday.”

“The complexity of the problem screwed us up,” someone answered.

Jerry peered over the top of his glasses. He replied, “Your reaction to the complexity of the problem screwed you up.”

One of the great virtues of PSL is that it exposes you to lots of problems in a highly fault-tolerant environment. You get practice at dealing with surprises and behaviours that emerge from giving a group of people a moderately complex task, under conditions of uncertainty and time pressure. You get an opportunity to reflect on what happened, and you learn what you need to learn. That’s the intention of the Rapid Software Testing class, too: to expose people to problems, puzzles, and traps; to give people practice in recognizing and evading traps where possible; and to help them deal with problems effectively.

As Jerry has frequently pointed out, plenty of organizations fall victim to bad luck, but much of the time, it’s not the bad luck that does them in; it’s how they react to the bad luck. A lot of organizations hobble themselves when they fail to foster environments in which everyone is empowered to solve problems. That leaves problem-solving in the hands of individuals, typically people with the title of “manager”. Yet at the moment a problem is recognized, the manager may not be available, or may not be the best person to deal with the problem. So, another reason that estimation fails is that organizations and individuals are not prepared or empowered to deal—mentally, politically, and emotionally—with surprises. The ensuing chaos and panic leave them more vulnerable to Black Swans.

Next time, we’ll look at what all of this means for testing specifically, and for test estimation.

Challenges and Legibility

Thursday, October 14th, 2010

Lately, James Bach and I have been issuing challenges to some of our colleagues on Twitter, typically based on something they’ve said or observed. I think James would agree that the results have been very exciting. In our community, people build credibility by responding to challenges and probing the issues more deeply, and it’s been tremendous to see how some of them have risen to the challenge. For me, recent examples include Joe Harter and his response to the question “Why keep testing when we’ve got a swarm of bugs?”; and David O’Dowd and his recent tweets on how to address disagreement over the “right” temperature for a cup of coffee. It goes both ways, of course: we expect other people to challenge us, too. That’s how we test ideas.

Recently, James turned me on to an interesting Web site, authored by a fellow named Venkatesh Rao, and in particular to this blog post. I was very excited by the concept of legibility: making things more readable in a metaphorical sense, more understandable. To me, legibility is a powerful idea because it seems to explain a central conundrum in testing and in the management of software development: a good deal of the effort that we spend, so it seems, goes not into producing better stuff, but rather into attempting to make complex stuff more understandable. One approach to understanding complexity is to take the general systems view: model the system of interest in terms of other, simpler systems, and look at elements, relationships, control, feedback, and effects, and at the relationships among all of these. Another approach is to close your eyes to the complexity (as French governments and tax collectors tried to do in the 1800s) and pay attention only to a couple of specific elements in the model. Yet another approach, often used by large organizations and bureaucracies such as nation states, is a wholesale attempt to make the system more legible by eliminating complexity, which is to say by eliminating elements (as Prussian forest managers did in the 19th century, or as the builders of Brasilia did in the 20th).

I ordered the book Seeing Like a State to which Venkatesh refers, and I’m finding it interesting. More on that later, perhaps.

Before I ordered the book, though, I thought the idea of legibility would be of interest to a general systems thinker, so I sent a link along to Jerry Weinberg. He surprised me a little by replying,

“Well, yes, but it’s a far over-simplified vision itself. For instance, it doesn’t seem to account for why the “recipe” actually succeeds (value to some persons or groups). Think it through.”

Here’s my reply:


Thank you for the challenge. Let me see if I can answer it.

I think it does account for why the “recipe” actually succeeds, although it may gloss over the point somewhat.

  • Success is subject to the Relative Rule. (As I described in my chapter of The Gift of Time, the Relative Rule states that “for any abstract X, X is X to some person”.) That is, success is success to some person(s).
  • Success is measured by some persons at some time (a refinement of the Relative Rule that I identified and that Markus Gartner seized on). Any determination of success (at some time and for some purpose) is like observing the part of the curve that looks linear. We can’t say we’ve achieved the end result because a) not all the data is in yet, and b) as I’ve heard you say on a number of occasions, “nothing is ever settled”. (I think I’d like to call this The Unsettling Rule.)
  • Similarly, “complexity”, “reality”, “irrationality”, “orderliness”, “legibility”, etc. are all subject to the Relative Rule and the Unsettling Rule too. When Venkatesh says, “The big mistake in this pattern of failure is projecting your subjective lack of comprehension onto the object you are looking at, as ‘irrationality’”, that reminds me of your (Jerry’s) advice in the SHAPE Forum many years back: stop looking at it as “irrational”, and start looking at it as “rational from the perspective of a different set of values”.
  • Says Venkatesh, “This failure mode is ideology-neutral, since it arises from a flawed pattern of reasoning rather than values.” Well, that’s all very well, but you can’t have the concept of “a flawed pattern of reasoning” without imposing a value judgement.
  • By making something more legible, you might have a short-term effect that you consider negative, but which gives rise to a more “positive” long-term effect. For example, in the old days, anyone could cut down trees pretty much anywhere they liked. These days we seem to have a stricter sense of preserving some kinds of land from interference by the forestry business, and of using other kinds for what is effectively tree farming. “Legibility” is always in flux.
  • “Rational and unlivable grid-cities like Brasilia, versus chaotic and alive cities like São Paulo.” Yeah, but I’ve heard about problems in São Paulo, and I’m not convinced that Brasilia is less livable than São Paulo, based on those problems.

I could go on… but have I shown you some evidence of thinking it through?


Jerry’s response was,

“Well done.

You’ve got another blog post there, I think.”

So here is that blog post.

In his challenge to me, Jerry was encouraging me (and, by extension, Venkatesh) to think about things in a more complex and nuanced way. For me, the key lesson is to remember that whatever you see as “broken” is almost certainly working for someone. That person, being different from you, is to some degree looking at everything from the perspective of a different set of values. When you see a problem in a product, or organization, or system, addressing that problem is going to take some effort for someone, and that person might see neither the problem, nor its cost, nor the value of change as you see it. That person might have political authority over the situation, and like all people, that person is driven not only by rationality but also by emotion. That person might not even see you.

For example, as a tester, when you say that a product has “too many bugs”, it’s important to ask, “Too many compared to what?” “Too many for whom?” “Too many according to whom?” “Too many to meet what goal?” That’s one of the reasons that test framing is so important: your testing won’t be valued if it’s not congruent with the mission, whether implicit or explicit, that your client has in mind.

Now, having to deal with all this uncertainty and subjectivity might require us to give up an idealist Platonic sense of Goodness and Order and Godliness, and might force us to deal with messy, complex, and human concerns. But considering that we all have to live with each other, and that “ideal” is only ideal to some person, at some time, that might be a good thing.

Thank you to Jerry for his persistent, patient reminders.

Why We Do Scenario Testing

Saturday, May 1st, 2010

Last night I booked a hotel room using a Web-based discount travel service. The service’s particular shtick is that, in exchange for a heavy discount, you don’t get to know the name of the airline, hotel, or car company until you pay for the reservation. (Apparently the vendors are loath to admit that they’re offering these huge discounts—until they’ve received the cash; then they’re okay with the secret getting out.) When you’re booking a hotel, the service reveals the general location and the amenities. I made a choice that looked reasonable to me, and charged it to my credit card.

I had screwed up. When I got the confirmation, I noticed that I had booked for one night, when I should have booked for two. I wanted to extend my stay, but when I went back to the Web page, I couldn’t be sure that I was booking the same hotel. The names of the hotels are hidden, and I knew that the rates might change from night to night. One can obtain clues by looking at the amenities and the general location of the hotels, but I wanted to be sure. So instead of booking online, I called the travel service’s 1-800 number.

Jim answered the phone sympathetically. It turns out that not even the employees of the service can see the hotel name before a booking is made. However, this was a familiar problem to him, so it seemed, and he told me that he’d match the hotel by location and amenities, back out the first credit-card transaction for one night, and charge me for a new transaction of two nights. He managed to book the same hotel. So far so good.

I went to the hotel and checked in. The woman behind the counter asked for identification and a credit card for extras, and then she asked me, “How many keys will you be needing tonight, sir?” “Just one”, I said. She put a single key card into the electronic key programming machine, and handed the card to me. I took the elevator to room 761, which had a comfortable bed and a desk by a window with a nice view. I unpacked some of my things, and decided to go for a dip in the hot tub. When I came back upstairs, I changed into dry clothes, took out my laptop, plugged it in, and sat down at the desk.

The floor was shaking. I mean, it was really vibrating. Some big motor—an air-conditioning compressor? a water pump?—had turned the office chair into a massager. I stood up, and it seemed that half of the room, including the bed, was shaking. I tried to do a little work, but the vibration was enormously distracting. I called down to the front desk.

Peter answered the phone sympathetically. “I’ll send someone right up to check it out,” he said. Fair enough, but this problem was unlikely to go away any time soon, and until it did, I wanted another room. “No worries,” said Peter. “I’ll start the process now, and send someone up to check out the problem. Then you can come downstairs to exchange your key.” (“Why not send the new key up with the person coming upstairs?” I thought, but I didn’t say anything.) “I’ll need a few minutes to tidy up,” I said. “Very well, sir,” said Peter. I repacked my bags. A few minutes later, the phone rang, and Peter asked if I was ready for the staff member to arrive.  Yes.

After a short time, someone knocked on the door. He had a pair of new keys (two, not one), which he passed to me. He appeared skeptical at first, but I sat him down in the desk chair. “Oh, now I feel it,” he said. “Stand over here, next to the bed,” I said. He got up, moved over, and felt the shaking. “Wow,” he said. We chatted for a few more moments, speculating on where the shaking was coming from. He left to investigate, and I decamped to my new room, 1021, on another floor on the other side of the building. So far so good.

This morning on my way to the shower, I noticed that a piece of paper had been slipped under the door. It was the checkout statement for my stay, noting my arrival and departure dates and the various charges that had been made to my credit card, including state sales tax, county tax, and a service fee for Internet use. I noticed that the checkout date and time was this morning, but I’m not supposed to be leaving until tomorrow morning. I called the front desk.

Zhong-li answered the phone sympathetically. I explained the situation, noting that I had booked through a travel service twice, once for one night and then later for two, and that the first booking should have been backed out (but maybe the service hadn’t done that), plus I had changed rooms the night before, so maybe it was an issue with the service but maybe it was an issue with the hotel’s own system too. Or maybe it was only the hotel. “No problem,” he said. “We can extend your stay for another night. But you’ll have to come downstairs at some point today so that we can re-author your room keys.”

So here’s the thing: how many variables can you see here? How many interconnected systems? How many different hardware platforms are involved? What protocols do they use to communicate?  To create, read, update, and delete? What are the overall transactions here?  What are the atomic elements of each one?  How does each transaction influence others?  How is each influenced by others?  What are the chances that everything is going to work right, and that I will neither under nor overpay?  What are the chances that the travel service will overpay (or underpay) the hotel for my stay, even if my credit card shows the appropriate entries and reversals?

It’s not even a terribly complicated story, but look at how many subtleties there are to the scenario. Have you ever seen a user story that has the richness and complexity of even this relatively simple little story? And yet, if we pay attention, aren’t there lots of stories like mine every day? Does my story, long as it is, include everything that we’d need to program or test the scenario? Does the card below include everything?

[Image: Index Card]

Next question: if you want to create automated acceptance tests, do you want a scenario like this to be static, using record and playback to lock in on checking specific values in specific fields? Are we really going to get value from the story if we use the same data and the same outputs over and over again? This approach will be hard enough to program, but it will tend to be very brittle, resistant to change and variation. It will tend to miss details in the scenario that we would only learn about through repeated human interaction with the product.

Or would you prefer to have a flexible framework that allows you to explore and vary the scenario, designing and acting upon new test ideas, and observing the flow of each piece of data through each interconnected system? Might you be able to do this by exploiting testing tools that you’ve developed for the lower levels of the system and assembling them into progressively more powerful suites? This second approach will likely be even harder to program, although you might be able to take advantage of lower-level test APIs, probes, and data generators that you and the programmers have developed as you’ve gone along. This approach, though, will tend to be far more powerful: robust in the face of change, open to learning, and ready to incorporate new and varied test ideas. Think well, and choose wisely.
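To make the contrast concrete, here is a minimal sketch in Python of the second approach. Everything in it is invented for illustration: a tiny in-memory fake stands in for the interconnected booking and billing systems, where in real life you would be driving the lower-level test APIs and probes mentioned above. The point is the shape of the thing: vary the scenario, and check invariants that should hold across every variation.

import random
from dataclasses import dataclass, field

# An in-memory fake of a booking-and-billing system, invented for
# illustration; in practice these calls would go through real
# lower-level test APIs and probes.
@dataclass
class FakeBookingSystem:
    rate: float = 100.0
    ledger: list = field(default_factory=list)  # credit-card entries

    def book(self, nights):
        charge = nights * self.rate
        self.ledger.append(charge)
        return {"nights": nights, "charge": charge}

    def cancel(self, booking):
        self.ledger.append(-booking["charge"])  # reversal entry

def run_scenario(system, nights_first, nights_second):
    # One variation of the "rebooked stay" scenario: book, realize
    # the mistake, back the booking out, and book again.
    first = system.book(nights_first)
    system.cancel(first)
    second = system.book(nights_second)

    # Invariants that should hold for *every* variation:
    net = sum(system.ledger)
    assert net == second["charge"], "card charges reconcile to the rebooking"
    assert second["nights"] == nights_second, "the stay matches the rebooking"

# Explore the scenario space rather than replaying one recorded path.
for _ in range(100):
    run_scenario(FakeBookingSystem(),
                 nights_first=random.randint(1, 3),
                 nights_second=random.randint(1, 4))

Even this toy version checks something a record-and-playback script never would: that the reconciliation invariant survives variation in the data. A fuller version might add the room change, the hotel’s folio, and the travel service’s ledger as further systems to reconcile against one another.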

In either case, unless you have people exploring and interacting with the product and the story directly, I guarantee you will miss important points in the story and you’ll miss important problems in the product. Your tools, as helpful as they are, won’t ever pause and say, “What if…?” or “I wonder…” or “That’s funny…” You’ll need people to exercise skill, judgment, and imagination, and to interact with the system, not in a linear set of prescribed steps but in a thoughtful, inventive, risk-focused, and variable set of interactions.

In either case, you’ll also have a choice as to how to account for what you’re doing.  It’s one scenario, but is it only one test?  Is it dozens of tests?  Thousands?  If you use the second framework and induce variation, what does that do for your test count?  Or would it be better to report your work in an entirely different way, reporting on risks and test ideas and test activities, rather than try to quantify a complex intellectual interaction by using meaningless, quantitatively invalid units like “test cases” or “test steps”?

It’s been a while since I’ve posted this, but it’s time to do it again. This passage comes from a book on programming and on testing, written by Herbert Leeds and Gerald M. Weinberg (Jerry wrote this passage, he says). It’s understandable that people haven’t got the point yet, since the book is relatively new: it came out only 49 years ago (in 1961).  The emphases are mine.

“One of the lessons to be learned … is that the sheer number of tests performed is of little significance in itself. Too often, the series of tests simply proves how good the computer is at doing the same things with different numbers. As in many instances, we are probably misled here by our experiences with people, whose inherent reliability on repetitive work is at best variable. With a computer program, however, the greater problem is to prove adaptability, something which is not trivial in human functions either. Consequently we must be sure that each test does some work not done by previous tests. To do this, we must struggle to develop a suspicious nature as well as a lively imagination.”

Amen.