Whenever I read professional literature, I tend to have highlighters and a pencil at hand. This gives me quick access to a book’s contents afterwards. It is also an indication of how intensively I have interacted with the book. Some books end up with hardly any highlighted passages or comments in the margin, meaning I have mainly been in a consumer, ‘nice to know’ mode. And then there are those with many notes in the margin. This can mean two things: either it’s a sign that I disagree a lot with the author, and/or that the book makes me think and generates ideas (mind that the one does not exclude the other!).
I think it was in 2006 that I read this book for the first time, and looking at the state I left it in almost a decade ago suggests that we are definitely talking about the latter scenario. This is a thought-provoking book. A pity that I didn’t do a proper summary at the time, but let me reconstruct some highlights and reflections based on what I jotted down in the margins.
Framework
The book is intended for practitioners rather than researchers, which bodes well. The aim is to offer a framework for understanding accidents so that we are better able to prevent them. At the same time it should provide a consistent basis for analysing accidents and a method for responding to them in an effective manner.
The framework chooses a systemic point of view, meaning that accidents are due to complex coincidences rather than distinct individual causes. This goes with a method that is based on the concept of barriers as an effective means to guard against accidents and their consequences.
As Hollnagel says on page 200 (which you will reach when you are 96% through the book), its main message is “that the way we think about systems has consequences for how we respond to them, both in direct interaction and in developing more considerate responses”. Or, as he put it 197 pages earlier: “We need to change our understanding of what accidents are, in order to effectively prevent them”.
Accidents and causes
The book defines an accident as a short, sudden, and unexpected event or occurrence that results in an unwanted and undesirable outcome. It must be the direct or indirect result of human activity rather than a natural event such as an earthquake. Accidents are linked to performance variations. Note that outcomes may sometimes go unnoticed. Note also that while accidents are unexpected, this does not mean that they are unimaginable. Even if we cannot prevent the event from taking place, we may still be able to prevent the outcome from occurring.
There is much to learn from failures that are less severe than accidents. One doesn’t have to wait as long between opportunities to learn, and the learning is less costly too because the consequences are smaller. With regard to the Heinrich triangle, Hollnagel rightly observes that the importance lies not in the actual numbers but in their meaning.
When something unexpected happens, people try to find an explanation. We even try to find causes when none exist. Humans are reluctant to accept that things happen without a cause: 1) our technological world is largely deterministic and reliable, 2) we are used to the laws of physics and cause-effect thinking, and 3) we are very uncomfortable in situations where we don’t know what to expect. This may lead to an attitude where any explanation is better than none.
Often one then looks for errors. A focus on errors often takes for granted that errors are the most important thing to look at. It also implies a rather simple cause-effect model instead of keeping an open mind and looking further.
The search for explanations is often based on the assumption that explanations can be deduced from the facts and that an objective truth can be found in the facts. Incorrect, according to Hollnagel - an intriguing point of view that goes contrary to common belief and practice. He mentions a number of pitfalls, like incomplete facts, facts that depend upon your choice of model, and facts that are just causally unrelated observations. He then goes on to explain the difference between explanations and causes. Judging from his example (and the lack of proper definitions), this is a pretty thin line. Investigations often look for causes because of the models used. Hollnagel argues that if accidents have explanations, we should rather try to account for how the accident took place and for what the conditions or events were that led up to it. The response should not be to seek out and destroy causes, but to identify the conditions that may lead to accidents and find effective ways of controlling them. Still, looking for causes may be useful as part of the explanation, rather than as ‘root causes’.
Another pitfall is our tendency to equate chronological ordering with causality, especially if the two events have something in common. Yet another is that the search for causes stops when an acceptable one has been found - but what is acceptable at one point in time may not always be so.
The chapter concludes that the determination of causes is a relative and pragmatic rather than an absolute and scientific process. The value of finding the ‘correct’ cause or explanation is that it becomes possible to do something constructively to prevent future accidents. A cause can be defined as the identification, after the fact, of a limited set of aspects of the situation that are seen as the necessary and sufficient conditions for the observed effects to have occurred. The cause is constructed rather than found, just as the label ‘human error’ is a judgement made only in hindsight.
The first chapter may shake up a couple of existing beliefs, but this will only keep you sharp because you’ll have to rethink a couple of assumptions and established truths. Great stuff so far.
Models
Chapter 2 is about thinking about accidents. As we saw above, we sometimes seek certainty rather than knowledge. An explanation is needed after an accident, and an explanation we must have. This may lead to the situation where finding an acceptable cause becomes more important than finding out why the accident really happened. This is not necessarily dishonest (not deliberately, at least), but can rather be seen as another example of ETTO (the efficiency-thoroughness trade-off, to which the book returns later): to save time and effort we don’t look beyond the first explanation found, especially when it confirms existing beliefs. The acceptability of a cause is furthermore often determined by psychological and sociological factors rather than logical ones.
David Hume argued that causality involved three components:
- Cause must be prior to effect in time
- Cause and effect must be close together in time and space
- There must be a necessary connection between cause and effect (the same cause always has the same effect)
Causation is inferred from observation but cannot be observed directly. The cause is constructed from understanding the situation, rather than found. Even more, a cause is selected from a set of possible causes, so it is the result of an act of inference rather than an act of deduction.
People are usually successful in reasoning from cause to effect. This tricks them into believing that the opposite can be done with equal justification. Concluding that the antecedent is true because the consequent is true - the classic fallacy of affirming the consequent - may be plausible, but there can also be something else going on. In accident analysis it’s not sufficient to be plausible, and one has to avoid logically incorrect conclusions.
It’s important to be aware of the accident model behind a description because the model determines both the search principles and the goals of the analysis (it also affects how you do risk assessment, by the way). The chapter then introduces three accident models, starting with the Sequential Accident Models (e.g. dominos). According to Dörner, sequential accident models are attractive because they encourage thinking in causal series rather than causal nets. They are also easy to represent graphically (fault trees), which helps communication of the results. Hollnagel then turns to root causes and, as we will see, Erik is not too fond of the term ‘root cause’, or at least of some of its mainstream applications. His main message is that accident investigation often stops too early, at a convenient or arbitrary point. Constraints in time and resources often determine the stopping point, rather than the completeness or correctness of the explanation.
The second type of models presented are Epidemiological Accident Models. These build upon four main points: 1) Performance deviations, 2) Environmental conditions, 3) Barriers and 4) Latent conditions. The latter are present within the system long before a recognizable accident sequence starts. They don’t start an accident, but can combine with active failures and become visible after the event. Epidemiological models are better suited than sequential models to discuss complexity, but they still draw very much on the same basis.
Systemic Accident Models are the third type. These models try to describe the characteristic performance on the level of the system as a whole, rather than on the level of specific cause-effect mechanisms or even epidemiological factors. Systemic Accident Models show how a number of factors must work together to produce correct performance. Conversely, performance failures can arise from instabilities in any of the control loops or from the interactions among them. In complex systems, small and simple changes and events can have non-linear and disproportionate effects.
Performance variability of people at the sharp end is determined by many factors. The main advantage of systemic models is their emphasis that accident analysis must be based on an understanding of the functional characteristics of the system, rather than on assumptions or hypotheses about internal mechanisms. Performance variability is necessary for users to learn and for a system to develop; monitoring of performance variability must therefore be able to distinguish between what is potentially useful and what is potentially harmful.
Barriers
Chapter 3 is relatively technical. It discusses the theory and concepts of barriers, barrier functions and barrier systems. Although one can describe an accident in terms of one or more barriers that have failed, the failure of a barrier is rarely a cause in itself.
Barriers are generally seen as something that 1) prevents an event from happening, or 2) lessens the impact of its consequences if it happens nevertheless. So barriers prevent or protect; they can be active or passive, permanent or temporary. Understanding barriers and their failures can help improvement.
The chapter distinguishes between barriers by a number of classifications, characteristics or functions, including the following (a toy code sketch follows the list):
- Protection, Detection, Warning, Recovery, Containment, and Escape.
- Physical, Functional, Symbolic, Incorporeal (non-material).
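To make this taxonomy a bit more tangible, here is a minimal sketch of how one could model it - my own illustration in Python, not anything from the book. The names Barrier, BarrierSystem and BarrierPurpose are mine, and the finer functions from the first bullet (Detection, Warning, etc.) could extend the purpose enum:

```python
from dataclasses import dataclass
from enum import Enum, auto

class BarrierSystem(Enum):
    """The nature of the barrier (how it works)."""
    PHYSICAL = auto()      # e.g. a fence; works regardless of interpretation
    FUNCTIONAL = auto()    # e.g. an interlock; sets preconditions for an action
    SYMBOLIC = auto()      # e.g. a warning sign; requires interpretation to work
    INCORPOREAL = auto()   # e.g. a rule or law; not materially present at all

class BarrierPurpose(Enum):
    """What the barrier is for."""
    PREVENTION = auto()    # stop the event from happening
    PROTECTION = auto()    # lessen the consequences if it happens anyway

@dataclass
class Barrier:
    name: str
    system: BarrierSystem
    purpose: BarrierPurpose
    active: bool           # active vs. passive
    permanent: bool        # permanent vs. temporary

# A seat belt: a physical, protective, passive, permanent barrier.
seat_belt = Barrier("seat belt", BarrierSystem.PHYSICAL,
                    BarrierPurpose.PROTECTION, active=False, permanent=True)
```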
The final part of the chapter looks at barrier quality through elements like Efficiency/Adequacy, Resources required, Robustness/Reliability, Delay in Implementation, Applicability in safety critical tasks, Availability, Evaluation and Dependence on Humans.
The role of barriers in accidents
Chapter 4 deals with understanding the role of barriers in accidents. On the one side we have the straightforward, well-understood role of being something that blocks the path of an accident (and thus prevents it). The other side is less well understood, however: a barrier not only prevents something unwanted from happening; barriers can also have unwanted side-effects and, under certain conditions, even the opposite of the desired effect.
The chapter starts by stressing once more that a failing barrier isn’t a cause in itself, although it may have an effect on the further development of the accident. A barrier system exists to prevent something from happening, but the harmful influences it is expected to guard against are obviously different from the barrier as such.
The chapter spends some space discussing various graphical representations (event tree, fault tree, causal tree and variation tree) and their limitations. Hollnagel then introduces a hexagonal function representation featuring six aspects: Precondition, Input, Time, Control, Resource and Output (see the sketch below). This systemic view emphasises how functions depend upon each other and can therefore show how unexpected connections may suddenly appear. The analogy is that individual ‘snowflakes’ may come together and create an ‘avalanche’, i.e. an uncontrolled outcome.
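As an illustration of how such a representation can expose unexpected connections, here is a small sketch - again my own construction, not Hollnagel’s notation; FramFunction and couplings are hypothetical names. Couplings emerge wherever the Output of one function feeds an aspect of another:

```python
from dataclasses import dataclass, field

@dataclass
class FramFunction:
    """One hexagonal node with the six aspects from the book."""
    name: str
    inputs: list[str] = field(default_factory=list)         # what the function acts on
    outputs: list[str] = field(default_factory=list)        # what it produces
    preconditions: list[str] = field(default_factory=list)  # what must hold before it starts
    resources: list[str] = field(default_factory=list)      # what it needs or consumes
    time: list[str] = field(default_factory=list)           # temporal constraints
    controls: list[str] = field(default_factory=list)       # what supervises or regulates it

def couplings(functions: list[FramFunction]) -> list[tuple[str, str, str]]:
    """List the places where one function's output feeds another function's
    aspect; these links are where variability can propagate and combine."""
    links = []
    for src in functions:
        for out in src.outputs:
            for dst in functions:
                for aspect in ("inputs", "preconditions", "resources", "time", "controls"):
                    if out in getattr(dst, aspect):
                        links.append((src.name, dst.name, aspect))
    return links

# Two toy functions: the briefing's output is a precondition for the work.
briefing = FramFunction("prepare briefing", outputs=["briefing held"])
work = FramFunction("execute maintenance", preconditions=["briefing held"])
print(couplings([briefing, work]))
# [('prepare briefing', 'execute maintenance', 'preconditions')]
```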
Since we cannot be certain that functions will always be connected in a specific way, barriers must be considered in relation to individual functions rather than in relation to an overall structure. Barriers can themselves be described as functions, which makes it easy to account for the effects of latent conditions and common failures. Barrier functions have multiple effects, of which some are unintended, and they may also affect each other, thus making the result less than straightforward.
Barriers are only one way of responding to the threat of an accident - just one possible tool, not some kind of miracle tool. The use of barriers for prevention and protection must be seen as relative to other solutions. A properly conducted risk assessment will make it evident where barrier functions must be located to bring about the necessary improvements in safety.
Systemic accident model
Chapter 5 discusses a systemic accident model that sees accidents as emerging phenomena in complex systems; as the result of an aggregation of conditions rather than the inevitable effect of a chain of causes.
It appears as if we are caught in a vicious circle:
- Technological innovations open up new possibilities and enable the construction of new or improved systems,
- We use these new and improved systems to make our lives ‘easier’ and thereby come to depend on them,
- In order to meet the demands for safe and reliable operation we require further technological innovation, which - unintentionally - may create new possibilities, etc.
Systems become increasingly complex, and Perrow suggested that they will become uncontrollable for humans: accidents in tightly coupled, complex systems are inevitable. This book, however, suggests that it is not the complexity itself but the variability of performance that is the main reason for accidents. This variability is definitely not the same as ‘human error’ and should not be considered erroneous or unconstructive as such.
On the contrary, the variability is a necessary condition for the proper functioning of systems of even moderate complexity; without it they would not work. The variability comes from the need to be adaptive in a constructive manner, to be able to make ends meet. Humans are usually able to cope with the imposed complexity because they can adjust what they do and how they do it to match the current conditions. The net result of human performance is efficient because people quickly learn to disregard those aspects or conditions that normally are insignificant.
As far as the level of individual human performance is concerned, local optimisation - through shortcuts, heuristics, and expectation-driven actions - is the norm rather than the exception. Indeed, normal performance is not that which is prescribed by rules and regulations but rather that which takes place as a result of the adjustments, i.e. the equilibrium that reflects the regularity of the work environment.
This chapter introduces the ETTO principle. The trade-off between efficiency and thoroughness is necessary for a complex system to work in the first place, and therefore it is in general useful rather than harmful. ETTO is generally a source of success rather than a source of failure. But besides being useful, it is also a source of variability and may facilitate an accident (e.g. because it invokes a number of heuristics that may be inappropriate in certain situations). This can be explained by the physical concept of resonance.
This leads to the presentation of stochastic resonance as an accident model (one that truly resonates with me, because I have actually experienced situations where several factors that were nominally okay enhanced each other and led to unexpected and unwanted results). The main building blocks for the systemic accident model are: Human performance variability, Technological glitches and failures, Latent conditions and Impaired or missing barriers. These factors don’t line up linearly to produce an accident; rather, they combine in unexpected ways so that small variations lead to a detectable and unwanted outcome. Therefore it is often of limited value to find out the exact specific causes of an accident. It’s more valuable to look at what brought it about - look at what is typical for an accident rather than what is unique.
The timing of accidents is unpredictable because they are the result of non-linear processes. We can, however, forecast where accidents are likely to occur by characterising the variability of the system, specifically the variability of components and subsystems and how this may combine in unwanted ways. Prevention can be of two kinds: barriers and performance variability management.
And so Hollnagel proposes his FRAM method (I think this book contained one of the first presentations of FRAM).
Preventing accidents
Chapter 6 is about accident prevention. While we cannot prevent all accidents, it’s desirable to prevent as many as possible and to learn from the ones that do happen. After a short discussion of risk and the importance of imagination in understanding risks, Hollnagel shows how FRAM can help in ways that sequential models cannot, through four steps (sketched in code after the list):
- Identifying essential system functions
- Determining the potential for variability
- Defining the functional resonance (the main source of risks in this view)
- Deciding on appropriate countermeasures.
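Purely as a reading aid, the four steps could be strung together like this, reusing the FramFunction sketch from above. Everything here is hypothetical - the names and the naive scoring are mine, and FRAM itself prescribes no code:

```python
def assess_variability(fn: FramFunction) -> float:
    """Step 2: rate 0..1 how variable this function's performance is.
    A real assessment would consider e.g. the timing and precision of
    the function's output; this is just a placeholder."""
    return 0.5  # placeholder score

def fram_sketch(functions: list[FramFunction]) -> list[tuple[str, str, str]]:
    # Step 1: the essential functions are given as input.
    scores = {fn.name: assess_variability(fn) for fn in functions}  # step 2
    # Step 3: flag couplings where both ends are notably variable,
    # i.e. where variability could combine and 'resonate'.
    risky = [link for link in couplings(functions)
             if scores[link[0]] > 0.4 and scores[link[1]] > 0.4]
    # Step 4 (not shown): decide on barriers or monitoring for these couplings.
    return risky
```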
At the time I was unsure about the practical use of FRAM, but I surely saw some good value in the underlying thoughts. I still have reservations with regard to ease of use (interestingly, Hollnagel says on page 178: “In accident analysis, as in many other aspects of work, it is prudent to be pragmatic and not make things more complex than necessary”), but in the meantime FRAM has been tested some more and examples of its application can be found online (besides, I see its greatest future as a tool for risk assessment rather than for accident investigation).
The final pages of the book deal with performance variability management, especially the recognition of the dual nature of performance. Because variability is normal, it should be possible to find indicators of the accident before the actual onset and use these as the basis for preventive actions. Alas, the book feels a bit ‘rushed’ and unfinished at the end and may leave the reader with some questions about how to deal with this further on.
(Ashgate, 2004, ISBN 978-0-7546-4301-2, hardcover)