I'm co-teaching "computer networks" this quarter with my colleague Anne Rogers. While preparing, I came across the following anecdote about how a "system" must be able to tolerate imperfect behavior:
Larry David (on the show "Curb Your Enthusiasm") was frantically looking for a DVD case, but could not find it.
LD: "I don't know what happened. I have a system. I put the DVD in the player, and I put the case on top of the player. But now it is gone."
Friend: "That's not a system. A system is - you buy a bunch of empty DVD cases and put them next to the player."
More seriously, yesterday I read a fascinating article by Richard Cook on the causes of errors in complex systems such as hospitals, and on the tendency to blame these errors on "operators." Cook notes that:
Our research group, along with many others, has explored the implications of such accidents, now called complex systems failures, for about 15 years. A startling feature of these accidents was how often they were attributed to human error. About 85 percent of accidents in aviation, nuclear power, shipping, medicine and the military were attributed subsequently to human error.
And:
Scientifically grounded study leads to the conclusion that accidents are not the abnormal operation of broken systems but the normal operations of systems under economic, social and political pressure to produce more with less. ...
When we look closely we discover that these systems are performing far more successfully than we expect: That is to say, the rate of accidents is not very high but actually quite low.
Our studies in industrial settings, transportation and health care lead us to conclude that the reason there are so few accidents is because the operators prevent them from happening. Operators—meaning power plant workers, nurses, pilots and others—are constantly working to detect and forestall accidents. Paradoxically, they do this so well that we can mistakenly attribute the smooth running of these systems to its inherent qualities rather than to active intervention by its operators.
He notes that this perspective can be unpopular, because it removes the "simple" explanation that it is all the operator's fault. However, it is (I think) ultimately an optimistic view, in that it suggests a way forward.
He concludes:
Our research on accidents has come full circle. We started out trying to discover why systems sometimes fail, thinking that practitioner “error” was somehow the cause. Now, instead of viewing them as threats to safety, we recognize practitioners as part of the resilience that makes it possible for so many people to benefit from the complex, hurried and often conflicted conditions that surround health care.
Instead of being critical of operators when they fail to rescue the system from failure, we are trying to understand how it is that they so often succeed.
