
Note: This is not an attempt at a proper summary (simply because it would be too much work to grasp such a complex story in a few words). I just want to highlight some items and give a general overview of this fascinating book. I hope this tickles your interest and that you will read it for yourself!

On 14 April 1994, two US Air Force F-15 fighters accidentally shot down two US Army Black Hawk helicopters over Northern Iraq, killing all 26 people on board. A two-year investigation into the tragedy followed, but found no guilty party. The author researched the material, and the result is an amazing and fascinating book that takes one case and goes far beyond the boundaries of most incident investigations in order to make sense of what happened (and what did not) and what we can learn from it. Snook covers a wide array of theories, from behavioural to organisational, and builds a compelling case for his mechanism of practical drift, which eventually becomes the motor behind a situation where normal people doing their normal jobs led to tragedy. To quote a sub-chapter from the Introduction: “A Normal Accident in a High Reliability Organization”.

The motivation behind the book was the question of how such an accident could happen, given all the training, equipment, safeguards, organisation and more that were in place. The outcome of the official investigation was frustrating (especially since it was perfectly clear ‘what’ had happened), with no easy explanation of things to fix. Still, a staggering list of some hundred corrective actions was the consequence.

Snook says: “Some accidents are easily explained and hence reveal little”. This friendly fire accident is clearly not one of those. While it doesn’t teach us about broken parts, it does teach us a lot about the context and circumstances that can lead to accidents like this. It can teach us about the organisational and behavioural conditions that lead to tragedy. As a theoretical framework, Snook combined Perrow’s Normal Accident Theory and High Reliability Organization (HRO) theory. Having both perspectives helped to understand the build-up, with its long period of safe operation, and the inevitable failure.

The end of chapter 1 presents a ‘causal map’ that pictures many of the relevant factors discussed throughout the book. Interestingly, Snook took the empirical accounts and continuously asked “Why?”, as most safety professionals have learned to do. He then also systematically challenged the causal significance of each major fact, which may be an interesting and novel thought for some, but is (thinking about it) a good antidote against confirmation bias and therefore something that should be considered for general use!

A basic assumption for the analysis was that we cannot fully understand complex organisational events such as this shootdown by treating them in isolation. What Snook found in his research were largely normal people, behaving in normal ways, in a normal organisation. Accidents can occur from the unanticipated interaction of non-failing components. Independently benign factors at multiple levels of analysis interact in unanticipated ways over time, often leading to tragedy.

Chapter 2 describes the actual shootdown and its broader context. As the chapter stresses: context and history are very important. We cannot fully understand the shootdown by treating it as an isolated incident or as a string of disconnected events. And so some historical background is given, followed by a discussion of four generic processes of US military doctrine: command, control, management and leadership. Command and control in particular get a lot of attention. Then follows a description of the ‘players’: the AWACS, the fighters and the helicopters, with their personnel, organisation and context. After that comes a critical look at the causal map, going beyond conventional explanations for accidents (in effect, what the official investigation board came up with) and instead looking for multiple explanations, across levels and time. As a starting point, the answers found by the investigation board were transformed into questions, thereby benefiting from the work done by others and at the same time taking it further.

Chapters 3, 4 and 5 each explain a different level of the analysis, looking at individual, group and organisational factors respectively. Chapter 3 focuses on the question why the F-15 pilots misidentified the Black Hawk helicopters. At first, this seems to make no sense at all, because the US Black Hawk and the Russian Hind helicopters look quite different. It is easy to label this as pilot error and leave it at that, but as James Reason pointed out: “Human error is a consequence, not a cause”. Therefore, Snook tries to see through the pilots’ eyes. Asking why the helicopters were shot down is not so much a question of what decisions were made (especially not of what looks like the rational decision in hindsight), but of what made sense to the pilots, because meaning comes before decision. As Weick wrote: “The image here is one of people making do with whatever they have, comparing notes, often imitating one another directly, or indirectly, and then operating as if they had some sense of what was up, at least for the time being”.

As Snook’s account shows, many factors affected what the F-15 pilots saw. It is important to remember that our senses do not provide complete information and that our brains continuously try to fill in the gaps. Therefore, we often perceive what we expect, what we predict will be there. What one then sees (or hears) is compared to what is expected and, when close enough (satisficing), this perception is used as confirmation of what was expected, even though reality may be different. Expectations can become self-fulfilling prophecies. Also, reality is socially constructed, at least in part; in this case, for example, through the special command structure. Other factors included the situation the pilots were in, levels of arousal (and thereby narrowed focus), the automation of behaviour through overlearning, and the ambiguity of what was seen.

The next chapter discusses why the onlooking AWACS crew failed to intervene in the situation that was developing. This, too, is placed in the larger organisational context, operational history and command climate. One problem was that leadership failed with regard to crew formation. Too much reliance was placed on defined positions, standard operating procedures and scripted interaction, which turned out to be no replacement for deeply shared norms. The crew on board the AWACS was not a proper team, but just a collection of technically qualified people. In part, this is unavoidable, because the military (and others) need to operate with organisational shells: teams that form and perform on short notice from individuals with the required knowledge and skills. Still, these generic crews need more to become a real team.

Other factors on board included leadership and training. Snook devotes a lot of attention to the diffuse responsibility (everybody was responsible, and therefore no one was), which led everybody to believe either that things were all right (no one reacted to signals), or that they were taken care of, or that someone else (better qualified, with more authority, in greater proximity) would react. The presence of a shadow crew of experts may have reinforced this bystander effect. Snook argues that this case illustrates that, despite common HRO wisdom, social redundancy does not always work as a means of improving safety.

The fifth chapter tells the organisational-level account and tries to find out why the helicopter operations were not integrated into the other (air) task force operations. Snook starts by discussing the tradition of service separatism: army, navy and air force, which also played a role here. The helicopters were army and the rest of the operation air force. This separation is functional (e.g. because of different goals), but, as Snook says, whatever you divide, you have to put back together again. The more divided, the more effort this will require. Organisational designers and leaders both have to be sensitive not only to the rational demands of mechanical interdependence, but also to the deeper challenges presented by differing subunit orientations.

Snook then discusses interdependence: the extent to which subunit activities are interrelated so that changes in the state of one element affect the state of the others. Drawing on Thompson, three levels of interdependence are identified in order of increasing complexity: pooled (each part renders a discrete contribution to the whole and each is supported by the whole), sequential (activities are serial in nature; order is important) and reciprocal (activities relate to each other as both inputs and outputs).

There were various failures to coordinate. Coordination can happen through standardisation (best suited to pooled interdependence), by plan (for sequential interdependence) or by mutual adjustment (required to integrate reciprocal interdependence). As the book shows, things went wrong at all three levels.

Coordination by standardisation only works when procedures are indeed standard. Rules are important and have their function. One advantage is that some decisions can be taken ahead of time, thereby freeing up capacity to handle unexpected things. The problem arises, of course, when (as in this case) local meanings are attached to commonly used acronyms. Subtle failures like these are usually no problem and therefore do not get our attention. In the long run, however, they carry a much larger potential for organisational disaster than sudden large events.

Planning failed too. One example was that the F-15s were not informed about helicopters, which actually makes perfect sense, because F-15s are generally meant for high-altitude combat, while the all-round F-16s did receive this kind of information. This created a kind of tunnel vision that left the F-15 pilots blind to the operational realities of others.

Coordination by mutual adjustment failed because the various players had different radios and because of the policy of ‘Min Comm’: communication had to be as brief as possible. Sometimes, however, managing variability depends upon a richness of information that was not available in this case. Weick states that “too much richness introduces the inefficiencies of overcomplication, too little media richness introduces the inaccuracy of oversimplification”. The Min Comm policy pushed towards the latter.

Chapter 6 takes a cross-level, holistic approach (acknowledging the parts, and the several levels, but at the same time looking at the whole) and arrives at the theory of practical drift. Locally efficient practices can gain legitimacy through unremarkable repetition. Over time, globally designed but locally impractical procedures lose out to practical action when no one complains. Gradually, locally efficient behaviour becomes accepted practice.

Snook proposes a theoretical matrix for practical drift. On one axis are the logics of action (rule-based or task-based); on the other is the situational coupling (loose or tight). The third dimension is time, which takes people from one quadrant to the next. Organisational members shift back and forth between rule- and task-based logics of action depending on the context, and these shifts have a predictable impact on the smooth functioning of the organisation.

Things typically start in the first quadrant, labelled the Design State, which is tightly coupled and rule-based. Things are organised in a rational way, and typically (especially in this military situation) rule writers lay out detailed schemes, despite the many uncertainties that existed before the operations started. These rules are often designed for worst-case scenarios and err on the conservative side.

While rules are often designed for tightly coupled situations, the practical reality in large organisations is rather loosely coupled. We then find ourselves in the second quadrant, where the logics of action are rule-based but things are loosely coupled. During this first phase people will hang on to the rules and follow them, but gradually find that they are cumbersome, overdone and the like, especially given the normal constraints on time and resources.

And so there is a disconnect between what is written down and what reality is like. The consequence is a behaviourally anchored shift in logics of action from rule-based to task-based. The rules don’t match most of the time, and so pragmatic individuals adjust their behaviour accordingly; they act in ways that better align with their perceptions of current demands. They start bending and breaking the rules. Mismatches between the local demands of the situation and those of global design rules occur with increasing regularity as operators gain personal experience in the field. This is practical drift: the slow and steady uncoupling of local practice from written procedure. It is also the motor for moving from the second to the third quadrant. The behavioural balance tips in the direction of local adaptation at the expense of global synchronisation.

In the third quadrant there is an “applied” world, where locally pragmatic responses to the intimate demands of the task drive action. These applied solutions to real-world demands drive out what appear to local practitioners as ill-conceived design deficiencies. This is a normal phenomenon in all organisations; drift is normal. The problem is that people interact with others under the assumption that those others will conform to the standard rules of engagement, even when we ourselves only rarely do. The third quadrant is a rather stable and resilient situation, but only for as long as things stay loosely coupled. And nothing ever stays loosely coupled.

Sometimes small changes can suddenly turn a system from loosely coupled into tightly coupled, as happened on that fateful day in April 1994. The fourth quadrant is characterised by failure. Task-based logics of action are ill suited to a tightly coupled situation. Everyone was doing what they always did, perfectly normally. Only the combination of all those normal actions was very unlucky at that precise moment.

The common reaction after such a failure is a re-design into a new version of the first quadrant, often accompanied by over-correction, partly because the systemic nature of events is generally overlooked. Instead, one tries to fix some broken parts, for example through stricter rules or tighter supervision. Left unchecked, such organisational knee jerks provide the system with the necessary energy to kick off the next cycle into disaster. The tighter the rules, the greater the potential for sizable practical drift to occur as the inevitable influence of local tasks takes hold.

The final chapter draws some conclusions. One is that there were no bad guys, and so no one to blame. There weren’t any catastrophic failures of material or equipment, hence nothing to fix. This accident occurred not because of something extraordinary but because of the opposite.

The problem is that these things are hard or impossible to foresee: partly because we have limited capacity to process information, partly because we tend to see causality in a linear, deterministic way, and very much because complex organisations and systems are inherently unpredictable. Their enormous complexity outstrips our cognitive capacities as bounded information processors. We cannot handle an infinite number of possible outcomes.

One individual-level lesson that Snook delivers is that it is important to avoid framing causal questions as decisions. Blindly accepting the assumption that individual decisions are the key to understanding such events makes us logically preclude other, perhaps more fruitful possibilities. Ask for a decision, and the attribution falls onto the shoulders of the decision maker and away from the potential situational factors that influenced action.

A group-level lesson from this book is that redundancy is no miracle cure. Redundancy can make operations more difficult to understand and diagnose. Redundancy can be good, but it can also have adverse effects, and it is important to understand that redundant components are often less independent than assumed.

On an organisational level we can learn that just because something isn’t broken, this doesn’t mean that it isn’t breaking. Human beings are bad monitors of systems that rarely fail. Even near misses may not provide the needed hints, especially when we focus on the miss instead of on the near!

Summing up the lessons at the three levels:

  • Look beyond individual error by framing puzzling behaviour in complex organisations as individuals struggling to make sense.
  • Follow the basic design principles of high-performance teams and think twice about chasing the advantages of social redundancy.
  • Treat organisational states of integration and reliability with chronic suspicion. Recognise them for what they are: constant outcomes of dynamic systems, ongoing accomplishments that require preventative maintenance.

When approaching cases like these, do not look for a cause, but rather for the broader set of conditions that increased the likelihood of tragedies like these occurring. Who is conducting the search, and why, will determine the search. The question is better framed as: what were the general conditions present in the task force prior to the accident that increased the likelihood that such a tragedy might occur? Framing the issue that way allows for more play in the system.

What appears to be dangerous drift from a global perspective often looks more like adaptive sailing from the local perspective. This is comparable to Vaughan’s “normalisation of deviance”.

Snook lists the following general conditions that increased the likelihood of this accident:

  • A complex high-hazard organisation that couldn’t afford to learn from trial and error; hence the tendency to overdesign, and a bias to overcontrol.
  • A period of loosely coupled time long enough to generate substantial gaps between globally synchronised rules and local subgroup practice.
  • A reasonable chance that isolated subgroups would become tightly coupled at some point in time.

With regard to control, Snook remarks that we have the ability to interrupt the natural flow of things, and we do, quite often in a big way. But maybe we shouldn’t. When it comes to control, sometimes less is more. Our natural inclination when it comes to managing hazardous systems is still to control. And when that doesn’t work, to control some more.

(Princeton University Press, 2000, ISBN 0-691-09518-3)