
OPERATIONAL SAFETY: A NEW FRONTIER IN DAM SAFETY MANAGEMENT
Dr. D.N.D. Hartford*
* British Columbia Hydro and Power Authority (BC Hydro)
6911 Southpoint Drive, Burnaby, BC, V3N 4X8, Canada
e-mail: des.hartford@bchydro.com, webpage: www.bchydro.com

Keywords: Operational Modes, Functional Analysis, Safety Management, Uncertainty

Abstract. This paper examines the underlying philosophy and form of contemporary risk
assessment practices and questions whether or not the claimed benefits of risk assessment
are as real as they are perceived and disseminated at conferences and in the literature.
The paper then goes further and addresses the question “Has the ‘rush to risk
assessment’ been at the expense of other methods of dam safety management?” The
paper highlights key threats to the safety of dams associated with operational matters that
contemporary methods of dam safety and risk analysis omit or cannot include. It sets out
a rationale for a holistic systems engineering approach to dam safety analysis that is
much richer and more inclusive than the established decompositional approach to the
analysis of dam safety be it by traditional deterministic means or by means of risk
analysis. The paper sets out a rationale that argues against consideration of failure
modes as the foundation of a dam safety analysis, a view that has been central to modern
dam safety practice since modern approaches emerged in the 1970’s. Instead, the paper
considers that the functionality of the dam and reservoir system and the operational
modes of these functions serve as a much more useful starting point for dam safety
analysis and management of risk from dams.

1. INTRODUCTION
Limitations of contemporary accepted practice in risk assessment for dams became
apparent shortly after these methods were first introduced in a significant way in the early
1990’s. Just how significant some of these limitations might be became clearer in dam
safety analysis problems involving reservoir operations and flow control. These problems
are quite different to the commonly analysed problems of safety during floods and
earthquakes where the driving forces that the dam responds to are due to randomly
occurring natural events. In these conventional cases, the probability of demand is
derived directly from the statistical characteristics of the hydrological and seismic
hazards for the region where the dam is located. In these cases, the probability of failure
of the dam is usually expressed as the product of the marginal probability of a demand on the
system, P(demand), and the conditional probability of inadequate system response given
that demand, P(inadequate response | demand).
In contemporary practice, demand normally relates to the physical loads applied to the
structure in terms of structural forces and flow volume and the capacity of the structure
relates to its physical strength and volumetric capacity, as characterized by flood volumes,
flow velocities, earthquake shaking and static forces due to the hydraulic pressure. The
analysis is normally of the static type characterized in terms of the peak force and peak
volume that serves to challenge the inherent structural and hydraulic capacity of the dam and
appurtenant structures. In many respects, the analysis of probability of failure is an extension
of the well-established load-resistance equations of factor of safety where the margin of
capacity over demand is represented in probabilistic terms.
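
As a worked illustration of the two formulations described above, the product form and the load-resistance form can be written as follows. The symbols C (capacity), D (demand) and FS (factor of safety), and all numerical values, are assumptions introduced here for illustration only; they are not taken from any particular dam.

```latex
% Product form: marginal probability of demand times conditional response
P(\text{failure}) = P(\text{demand}) \, P(\text{inadequate response} \mid \text{demand})

% Hypothetical values: P(demand) = 10^{-3} per year, conditional probability = 10^{-1}
P(\text{failure}) = 10^{-3} \times 10^{-1} = 10^{-4} \ \text{per year}

% Deterministic margin and its probabilistic counterpart
FS = \frac{C}{D}, \qquad P(\text{failure}) = P(C < D)
```

Both forms are static: they take a single peak demand and a single capacity, with no representation of how either evolves over time.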
Contemporary risk-informed dam safety practices follow the long established
deterministic philosophy of dam safety where a dam would be considered to be safe if it
could pass the design flood, withstand the design earthquake and exhibit industry accepted
factors of safety under normal, static operating conditions. This philosophy does not consider
demands caused by operational issues such as spillway gates failing to open for any of the
many possible operational reasons during normal inflows. Further, the philosophy did not
consider floods and earthquakes occurring consecutively as can occur during the flood
season. This was because floods and earthquakes are independent natural events and from a
design perspective, the joint probability of occurrence of the design flood and the design
earthquake was considered to be so low as to not constitute a concern. However, this design
focused thinking did not consider that a dam can be expected to suffer some degree of
damage to its structure or components during a moderate earthquake, and still be in a
damaged state when the flood season arrives.
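
The difference between the two views can be sketched numerically. The following is a minimal illustration, assuming independent events and entirely hypothetical annual probabilities (none of the values comes from a real hazard study); it only shows why a damaged-state scenario may deserve attention even when the joint design events do not.

```python
# All probabilities are hypothetical and serve only to show the arithmetic.
P_DESIGN_FLOOD = 1e-4     # annual probability of the design flood
P_DESIGN_QUAKE = 1e-4     # annual probability of the design earthquake
P_MODERATE_QUAKE = 1e-2   # annual probability of a damaging moderate earthquake
P_STILL_DAMAGED = 0.5     # probability the damage is unrepaired at flood season
P_ORDINARY_FLOOD = 1e-1   # annual probability of a flood the damaged dam cannot pass


def joint_design_events():
    """Design flood and design earthquake in the same year (assumed independent)."""
    return P_DESIGN_FLOOD * P_DESIGN_QUAKE


def damaged_state_then_flood():
    """Moderate earthquake, damage persists, then an ordinary flood arrives."""
    return P_MODERATE_QUAKE * P_STILL_DAMAGED * P_ORDINARY_FLOOD


if __name__ == "__main__":
    print(f"joint design events:   {joint_design_events():.1e}")
    print(f"damaged state + flood: {damaged_state_then_flood():.1e}")
```

Under these assumed values the damaged-state scenario is several orders of magnitude more likely than the coincidence of the two design events, although a different choice of inputs would change the ratio.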
Thus it became clear that contemporary risk analysis methods do not consider a great
many conditions that might occur during the life of a dam scheme with the result that the
estimate of risk that is arrived at is an underestimate of the actual risk. It might be argued that
contemporary practice can simply be extended to include these other factors and that its
current form is limited but not fundamentally flawed. However, the limitation is inherent in
the inductive analyses that dominate contemporary risk analysis practice for dams. As noted in the Fault
Tree Handbook (page II-1)1: “For systems that exhibit any degree of complexity (i.e., for
most systems), attempts to identify all possible system hazards or all possible component
failure modes - both singly and in combination - become simply impossible.”
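
The scale of the enumeration problem can be sketched with a simple count. The example below is an illustration only: it assumes each component has three discrete states ("working", "degraded", "failed"), and the component names are hypothetical.

```python
from itertools import product


def joint_state_count(n_components, states_per_component):
    """Number of distinct combinations of component states."""
    return states_per_component ** n_components


def enumerate_joint_states(components, states):
    """Explicitly list every combination; feasible only for very small systems."""
    return [dict(zip(components, combo))
            for combo in product(states, repeat=len(components))]


if __name__ == "__main__":
    states = ("working", "degraded", "failed")  # assumed three-state model
    small = enumerate_joint_states(("gate_1", "gate_2", "level_sensor"), states)
    print(len(small), "combinations for 3 components")
    for n in (10, 20, 40):
        print(n, "components:", joint_state_count(n, len(states)), "combinations")
```

The count ignores the order and timing of state changes, so it understates the number of distinct event chains an analyst would need to consider.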
Thus, the degree of complexity of the design of the system, which in turn controls the
complexity of operation of the system, makes the static analysis of the risk associated with a
dam challenging. However, dams do not operate statically; the operational state generally
changes over time as water flows into and out of the reservoir. In essence, the hydraulic load
applied to the dam is dynamic as controlled both by inflow and various orifice settings which
also change dynamically. This leads to complications in defining the hydraulic demand on
the dam. Further, the demands on operating components of the dam such as spillway gates or
production intake gates also change dynamically making definition of all of the actual
demands much more complicated than is assumed in static analysis.
While P(demand) may be a static probability that can be associated with external
natural processes like floods or earthquakes as in contemporary practice, more often, the
probability of demand is generated dynamically by the functioning of the system itself. The
pool level (and thus, load) on a spillway gate depends dynamically on how the dam and
reservoir as a system has been operated up to that time, and on how the dam and reservoir as
a system responds. Time is essential, as is the possibly complex interaction of many
operational factors. The critical chains of events leading to an accident or failure may have an
emergent character, meaning that they are not obvious or easily identified a priori but arise
from the interactions of the system itself. An analyst creating an event tree would likely never
conceive of them, and if he or she did, the important element of time might be impossible to
account for.
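
The dependence of the load on operating history can be sketched with a simple storage balance. This is a minimal illustration, not a reservoir routing model: the constant surface area, the fixed discharge per gate, the operating rule and all numerical values are assumptions made here.

```python
def simulate_pool(inflows, gates_available, initial_level=100.0,
                  area=1.0e6, gate_capacity=250.0, dt=3600.0):
    """Step a simple storage balance and return the pool level after each step.

    inflows         : inflow at each step (m^3/s)
    gates_available : number of operable spillway gates at each step
    initial_level   : starting pool level (m)
    area            : reservoir surface area (m^2), assumed constant
    gate_capacity   : discharge of one open gate (m^3/s), assumed constant
    dt              : time step (s)
    """
    level = initial_level
    levels = []
    for q_in, n_gates in zip(inflows, gates_available):
        # Operating rule (assumed): pass the inflow if gate capacity allows.
        q_out = min(q_in, n_gates * gate_capacity)
        level += (q_in - q_out) * dt / area
        levels.append(level)
    return levels


if __name__ == "__main__":
    inflows = [400.0] * 6              # the same moderate inflow in both cases
    all_gates = [2, 2, 2, 2, 2, 2]
    one_gate_out = [2, 1, 1, 1, 2, 2]  # one gate unavailable for three hours
    print(simulate_pool(inflows, all_gates)[-1])
    print(simulate_pool(inflows, one_gate_out)[-1])
```

With identical inflows, the pool level at the end of the period differs between the two runs only because of the gate availability history, which is the sense in which the demand is generated by the functioning of the system itself.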
Considering all of the above, it can be taken as given that contemporary risk analysis
practice results in an underestimation of the risk with no way of knowing the extent to which
the risk has been underestimated.

2. THE PHILOSOPHY OF THE FOUNDATIONS OF A NEW APPROACH


In my paper on Re-thinking Risk Analysis in Dam Safety Practice2 at this conference,
I introduce the idea that treating dams as systems is ambitious but necessary because
the probabilistic risk analysis and risk modelling approaches that have evolved for dam safety
over the past four decades, since the Reactor Safety Study of the early 1970’s3, are
inadequate to the task. These include engineering methodologies such as failure modes and
effects analysis (FMEA), fault tree analysis (FTA), event tree analysis (ETA), and other
engineering reliability methods4.
In contrast to the consideration of extreme loads versus structural or geotechnical
capacities, experience has shown that many dam failures, and perhaps the majority of dam
incidents, do not result from extreme geophysical loads but rather from operational events.
These incidents and failures occur because an unusual combination of reasonably common
events occurs, and that unusual combination of events has an adverse outcome. For example,
a moderately high, but nowhere near extreme, reservoir inflow occurs; the sensor and
SCADA system fail to provide early warning for some unanticipated reason; one or more
spillway gates are unavailable due to maintenance, or an operator makes an error, or there is
no operator on site and it takes a long time for one to arrive; and the pool was uncommonly
high at the time. This chain of reasonable events, none of which by itself is particularly
dangerous, can in combination lead to an incident or even a failure.
While, in principle, one could accommodate an unusual sequence of operational
events in a contemporary risk analysis, it is unlikely that anyone would. Contemporary
risk analysis suffers from the limitation that only previously identified and enumerated
chains of events enter the analysis. An unforeseen or unusual combination of usual
conditions that is not specifically identified and enumerated cannot influence the
outcome of a probabilistic risk analysis (PRA). As a result, the availability on
demand, reliability, durability, resilience and maintainability of most flow-control
systems remain uncertain. Maintenance policies also change over the life of a dam. This
makes it difficult to integrate flow control into static, risk-based approaches for
assessing the safety of dams, to assess their ability to meet performance goals, and to
find good corrective measures where inadequacies exist.
The consideration of incident or failure scenarios resulting from operational chains of
events not specifically identified requires a new approach. This approach sees dams as
systems and includes the effects of successive or sudden changes of state due to operational
and maintenance activities, human and organisational factors, and laws, policies and
procedures, all of which occur in varying environmental conditions. It must attempt to
capture ‘emergent behaviours’ of the dam system. Emergent behaviours are those patterns
that arise through the interactions of system components and which are not necessarily
exhibited by the components themselves. They are, to a large extent, unidentifiable ex ante
but are observed in the system’s operating interactions. Thus, decompositional engineering
techniques such as event-tree and fault-tree analyses are not well suited to exposing these
types of problems.
Instead of focusing on failure modes as is the case in both traditional dam safety
practice and in contemporary risk assessment approaches, the essence of operational safety is
to focus on operational modes as the starting point and build the inspection, monitoring and
surveillance activities around the operational modes of the components. This philosophy will
be addressed in more detail in Section 5 below, where it will be demonstrated that a focus on
the operational modes of components and how these operational modes control the behaviour
of the system can be used to help overcome the limitations of established practices
where only previously identified and enumerated chains of events enter the analysis of
safety.

3. OPERATIONAL SAFETY
Dam owners are required to ensure that their dams are both structurally safe and are
operated in a safe manner where the term “safe” does not mean “absolutely safe”, but “safe in
a relative way”.
Dams, along with their associated spillways and other waterways, are built to retain
and control the flow of water for purposes of power production, water supply, navigation,
recreation, flood mitigation and possibly other things. Waterways, in this context, means all
facilities for water passage from a reservoir, whether outlets, channels, conduits, such as
penstocks and tunnels, or spillways. Waterways are themselves complex structures or
systems comprising structural, mechanical and electric subsystems, such as gates and control
equipment. A dam system – in contrast to just the dam itself – comprises the body of the dam
along with the various waterways past the dam, and usually with accompanying mechanical
and electrical equipment for on-site operational control. There could be a powerhouse or
other type of production unit included. The mechanical and electrical equipment increasingly
includes ever more complex real-time sensors and supervisory control and data acquisition
(SCADA) control systems. Cost pressures continually drive towards increasingly complex
automation. In a broader context, the dam system may also be considered to include the
whole of the reservoir and its surrounding drainage, communication links and the human
organisation responsible for operating the system, including on-site operators, the dispatch
centre and company policy makers. The state and nature of these many components of a dam
system do not remain static during the lifetime of the system because of wear, ageing and
maintenance, as well as changes to the surrounding infrastructure and to the society that is
being served.

3.1 The Buncefield and Taum Sauk Operational Failures


The period 10th – 14th December 2005 provides two remarkably similar operational
failures of hazardous installations on different continents and in different industries: the
Buncefield (UK) fuel storage tank explosion and fire, and the failure of the upper reservoir of
the Taum Sauk pumped storage scheme (USA).
“On the night of Saturday 10 December 2005, Tank 912 at the Hertfordshire Oil
Storage Limited (HOSL) part of the Buncefield oil storage depot was filling with petrol. The
tank had two forms of level control: a gauge that enabled the employees to monitor the filling
operation; and an independent high-level switch (IHLS) which was meant to close down
operations automatically if the tank was overfilled. The first gauge stuck and the IHLS was
inoperable – there was therefore no means to alert the control room staff that the tank was
filling to dangerous levels. Eventually large quantities of petrol overflowed from the top of
the tank. A vapour cloud formed which ignited causing a massive explosion and a fire that
lasted five days. Fortunately there was no loss of life, largely due to the occurrence of the
accident during the early hours of a Sunday morning when there was nobody at the plant
(Figure 1).”5
“The upper reservoir of the Taum Sauk Pumped Storage Project was overtopped
during the final pumping cycle the morning of December 14, 2005. Overtopping of the 10 ft.
high parapet wall and subsequent failure of the rockfill embankment formed a breach about
720 feet wide at the top of the rockfill dam and 430 feet at the base of the dam. Reservoir
data indicate that pumping stopped at 5:15 AM December 14, 2005 with the initial breach
forming at approximately the same time. Breach widening formed quickly, and complete
evacuation of the 4,350 acre-ft. upper reservoir occurred within about 25 minutes. The
breach flow passed into the East Fork of the Black River (the river upstream of the lower
Taum Sauk Dam) through a State park and campground area and into the lower reservoir.
Upon leaving the Lower Taum Sauk Dam Spillway area, the high flows proceeded
downstream of the Black River to the town of Lesterville, MO, located about 3.5 miles
downstream from the Lower Dam. The incremental rise in the river level was about 2 feet
which remained within the banks of the river. Fortunately, there was no loss of life in either
case, although in the case of Taum Sauk the superintendent of Johnson's Shut-Ins and Taum
Sauk State Parks and his family were swept away when the wall of water obliterated their
home. They survived, suffering from injuries and exposure.”6

Figure 1. Fire at the Buncefield Oil Storage Depot (2005)

Figure 2. Taum Sauk Upper Reservoir before and after the failure

According to the COMAH report on the Buncefield case5: “Failures of design and
maintenance in both overfill protection systems and liquid containment systems were the
technical causes of the initial explosion and the seepage of pollutants to the environment in
its aftermath. However, underlying these immediate failings lay root causes based in broader
management failings:
• Management systems in place at HOSL relating to tank filling were both
deficient and not properly followed, despite the fact that the systems were
independently audited.
• Pressures on staff had been increasing before the incident. The site was fed by
three pipelines, two of which control room staff had little control over in terms
of flow rates and timing of receipt. This meant that staff did not have sufficient
information easily available to them to manage precisely the storage of
incoming fuel.
• Throughput had increased at the site. This put more pressure on site
management and staff and further degraded their ability to monitor the receipt
and storage of fuel. The pressure on staff was made worse by a lack of
engineering support from Head Office.
Cumulatively, these pressures created a culture where keeping the process operating
was the primary focus and process safety did not get the attention, resources or priority that
it required.”
Although the Taum Sauk Failure Investigation6 was limited to physical causes of the
failure, the primary causes of failure were remarkably similar to those of Buncefield as
follows6:
• “The pressure transducers that monitored reservoir water levels became
unattached from their supports causing erroneous water level readings.
• The emergency backup level probes were set at an elevation above the lowest
points along the parapet wall; thus, they failed their protection role because
this enabled overtopping to occur before the probes could trigger shutdown.
• The normal operating high water levels of 1 ft. below the top of the parapet
wall was too near the top of the wall to allow for any mistakes of
misoperation.
• Visual monitoring of the Upper Reservoir water levels was almost nonexistent
and there was no systematic “ground–proofing” recorded of the relationship
of the top of the wall and associated water levels actually being achieved.
• There was no overflow spillway to safely carry accidental over-pumped water
downstream and below the dam.”
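
The second cause listed above, backup probes set higher than the lowest point of the parapet wall, lends itself to a simple configuration check. The sketch below is illustrative only: the function names are hypothetical and the elevations are invented values in feet, not the surveyed Taum Sauk data.

```python
def probe_margin(probe_elevation, crest_elevations):
    """Margin between the lowest surveyed crest point and the probe elevation.

    A positive margin means the probe is wetted before water reaches the
    lowest point of the crest; a zero or negative margin means overtopping
    can begin before the probe can trigger a shutdown.
    """
    return min(crest_elevations) - probe_elevation


def probe_is_protective(probe_elevation, crest_elevations, required_margin=0.0):
    """True if the probe trips with more than the required margin to spare."""
    return probe_margin(probe_elevation, crest_elevations) > required_margin


if __name__ == "__main__":
    crest_survey_ft = [1598.0, 1597.2, 1596.9, 1597.6]  # hypothetical survey points
    probe_ft = 1597.4                                   # hypothetical probe setting
    print(probe_margin(probe_ft, crest_survey_ft))
    print(probe_is_protective(probe_ft, crest_survey_ft))
```

A check of this kind is only as good as the crest survey behind it, which is why the absence of “ground-proofing” noted in the same list matters.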

In a commentary on the Taum Sauk Failure and subsequent investigations, Rogers
and Watkins7 noted the following:
“Visual oversight of the pumped storage operations were recommended by
Cooke in 1967 and initially implemented by Union Electric soon thereafter.
Sometime between 1968 and the failure in 2005, visual oversight was
discarded as being an unnecessary precaution by the operators (probably,
because there hadn’t been any safety incidents of note until the Niagara Falls
incidents in September 2005). The absence of visual inspections meant that the
deterioration of freeboard (due to progressive creep displacement of the
instrumentation conduits) was not noticed until the first overtopping incident
on September 25, 2005. At this juncture the actual water levels should have
been “ground truthed,” or compared with the levels being reported by the
reservoir’s instrumentation as reported by FERC Taum Sauk Investigation
Team in 2006. Instead, it was assumed that increasing the freeboard by three
feet would provide an adequate margin of error to account for the
instrumentation problems.
A retrospective review of the reservoir stage records suggests that
something was awry with the instrumentation because it repeatedly shows
water levels that do not make sense, based on the conditions prior to the
failure. Some examples include: 1) the water level within the reservoir not
rising when both pumping units were on; 2) the level rising 1 foot in 20
minutes with both pumping units on (it should have reported a 2.5 foot rise),
and, 3) a 1.9 foot decrease in the reservoir level with both pumps operating.
The system was not programmed to report or flag abnormal inflow rates to
alert plant operators although it was recorded in the facility’s computers as
reported by the FERC Taum Sauk Investigation Team in 2006.”

The above examples provide stark illustrations of the problems of operational safety
and the analysis of the risks that it presents. They also illustrate the problem of “unusual
combinations of usual conditions” as the operational status of the components and functions
of both installations at the time of failure was little different to the preceding days and even
years. However, there were subtle differences on the fateful days when the accidents
occurred. These incidents also illustrate the breadth and depth of analysis that might be
required to prevent incidents, failures, and tragedies caused by these “unusual combinations
of usual conditions”.
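
The instrumentation anomalies quoted above suggest the kind of consistency check that was reportedly not programmed into the system. The following is a minimal sketch under stated assumptions: the 2.5 ft expected rise for both pumps over 20 minutes is taken from the quoted commentary, but applying it to every record, the 0.5 ft tolerance, and the function and record names are assumptions made here for illustration.

```python
def is_consistent(reported_rise_ft, expected_rise_ft, tolerance_ft=0.5):
    """True if the reported level change agrees with the expected change."""
    return abs(reported_rise_ft - expected_rise_ft) <= tolerance_ft


def flag_anomalies(records, tolerance_ft=0.5):
    """Return the records whose reported rise disagrees with the expected rise.

    Each record is a tuple: (label, reported_rise_ft, expected_rise_ft).
    """
    return [r for r in records if not is_consistent(r[1], r[2], tolerance_ft)]


# Reported values follow the quoted examples; the expected value of 2.5 ft is
# assumed to apply to each interval, and the last record is hypothetical.
RECORDS = [
    ("no rise with both pumps on", 0.0, 2.5),
    ("1 ft rise in 20 minutes", 1.0, 2.5),
    ("1.9 ft decrease with both pumps on", -1.9, 2.5),
    ("hypothetical normal reading", 2.4, 2.5),
]

if __name__ == "__main__":
    for label, reported, expected in flag_anomalies(RECORDS):
        print(f"ALERT: {label} (reported {reported} ft, expected {expected} ft)")
```

A flag of this kind does not identify the cause; it only prompts the comparison of reported and actual water levels that the commentary describes as “ground truthing”.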

3.2 Addressing the problem of latent threats to safety


Techniques to address these types of latent threats to the safety of installations are
emerging, and include the Functional Resonance Analysis Method (FRAM) of Hollnagel8,9 and
the Systems-Theoretic Accident Model and Processes (STAMP) of Leveson10. The incidents
described above also show that such accidents are not purely technical failures: human
and organisational factors contribute to the incident or failure. Many parts of the
organisation contribute to latent conditions; the contributions can be broadly outlined as including
the following:

• Licensing arrangements
• Societal expectations (including political expectations in the past and present)
• Organisation’s social responsibility (including corporate values and principles)
• Risk appetite (strategic and operational risk)
• Organisation’s strategies and policies
• Organisational culture
• Organisational arrangements
• Management and procedural arrangements including asset management
arrangements, and the maintenance and replacement regime.
• Human resourcing and competence (including compensation and rewards)
• Budgeting, financing and investment arrangements
• System reliability and availability targets and measures
• Human factors
• Design of the operations regime
• Implementation of the operations regime including forecasting
• Operator error in real-time operations
• Failures in the safety assurance process

In many respects, the above list points to factors that go beyond the human and organisational
factors within the boundaries of the owner of the hazardous installation and extend to wider
social factors. These wider social considerations can be illustrated by the following
developments after the Taum Sauk failure7:

“During the spring 2006 and 2007 legislative sessions the Missouri
governor and state legislature considered revising their dam safety act
(initially adopted in 1977, but not funded until 1981) to improve inspection
and maintenance of dams deemed to be a danger if they were to fail (e.g. lying
above populated areas). Some legislators from rural counties and agricultural
areas worried about increased costs associated with regulations so they voted
against the bill, defeating the measure.”

Recognition of the social dimensions of systems and of the causes of accidents, namely
that engineered systems are embedded within social systems, is not new; see, for example, Miles11:
“Underlying every technology is at least one basic science, although the
technology may be well developed long before the science emerges (e.g.,
glassmaking). Overlying every technical or civil system is a social system
which provides purpose, goals, and decision criteria.”

Or as Leveson10 points out,

“Effectively preventing accidents in complex systems requires using accident
models that include that social system as well as the technology and its
underlying science. Without understanding the purpose, goals and decision
criteria used to construct and operate systems, it is not possible to completely
understand and prevent accidents.”

Against this background it is important to recognise that the public and political
environment may also contribute to the latent undesirable conditions. For example, while the
notion of the hazard potential of dams and reservoirs is well established and central to dam
engineering and dam safety, dams are frequently not considered as hazardous entities in the
public domain especially if the benefits of dams such as water supply, environmental flows,
power generation and recreational water sports are valued attributes. Further, reservoir-river
system operation, which involves the control of vast amounts of hydraulic energy on a
continual basis, is a hazardous process in the sense that process failure can lead to release of
the hazardous agent (the water) often with a catastrophic outcome. Under conditions at which
energy production is at its greatest, loss of control can be most threatening, and recovery
from loss of control most difficult if the design and implementation of the totality of the
system controls does not accommodate these “loss of control” circumstances.
Some elements of the system controls are more critical than others under normal
operational conditions. However, less critical elements can become critical to hydraulic
control if the system transitions to a state that renders them so. Thus a decision to defer
maintenance of a redundant feature may be the final causal factor in loss of control if it is
called into service. Situations such as this typify the way that “unusual combinations of not
uncommon conditions” can arise during operation within the normal operational parameters
and can lead to loss of control and system failure. This then points to the need to avoid the
“screening out potential failure modes of components or systems” as occurs in some
contemporary risk analysis practices. Instead, given the catastrophic loss dimensions of dam
failures and incidents it is more appropriate to retain all potential failure modes of
components and systems. As the US Nuclear Regulatory Commission's Fault Tree Handbook1 recommended as
far back as 1981: “In FMEA (and its variants) we can identify, with reasonable certainty,
those component failures having "non-critical" effects, but the number of possible component
failure modes that can realistically be considered is limited. Conservatism dictates that
unspecified failure modes and questionable effects be deemed "critical".” This observation
points to a limited role, if any, for screening out failure modes on the basis of perceived
low probability.
Dams and reservoirs are not “fully engineered” systems in the sense that they depend
in part on natural systems; thus, dam engineering and dam safety assessment are difficult to
capture in simple rules and procedures. Dams and reservoirs by their very nature are not
deterministic systems, and their operation cannot be defined in deterministic procedures.

4. OPERATING ENVIRONMENT AND OPERATING OBJECTIVES


The legal regime and licensing arrangements of the jurisdiction in which the dam is
located govern the modus operandi of the owner-operator. Legal regimes for the storage and
release of water have existed over millennia. The legal regime reflects the customs of the
people in the jurisdiction and is a product of the social dynamics of that jurisdiction. There is
no reason to expect that the social dynamics and resulting culture and legal regime of one
jurisdiction can be imposed on a different society. Similarly, there is no reason to assume
that safety criteria or risk acceptance criteria that are considered to be appropriate for one
jurisdiction should apply elsewhere. The role of the ALARP principle in dam safety decision-
making is just one aspect of contemporary risk assessment practice that is not well
understood, and it is not correct to assume that ALARP is a generally accepted legal principle
that is applicable to all jurisdictions. Every jurisdiction should decide on the level of safety
that is appropriate for its circumstances.
Beyond the written laws which can be enforced either in a pre-emptive way or in a
reactive way, societal expectations of dam owners also control how the owner operates. Dam
owners may enter into types of social contracts with the communities affected by the dam,
reservoir and their operations regarding the management and use of the water. Such
arrangements often come under the heading of Social Licence to Operate, over and above the
licensing requirements applied by the responsible authority.
While these legal and social arrangements ultimately govern the methods of
operations, they do not necessarily control how an organisation operates in an absolute sense
on a day to day basis. The actual operations are controlled by the “directing mind” of the dam
owning organisation. The term “directing mind” is used to describe the thought processes that
guide the methods of operations of the organisation. These constitute the core of the
organisational and human factors and organisational culture that influence safety
performance.
The operational objectives are normally stated in general terms that are relevant to the
legal and social contexts of the operation of the dam and reservoir. These general terms and
objectives must then be transformed into operational procedures.

4.1 Implementation of the operating objectives


The operational procedures are generally focused on three questions:
• When is water to be released?
• How much water is to be released at a given time?
• By what means is the water to be released?

However, none of this can be achieved in isolation from the overall operational arrangements of
the dam owner. The components and assets that are required to achieve the operational
objectives must be established and maintained as part of the whole organisation. Dams and
reservoirs are very significant capital intensive fixed assets built for a purpose that are
best managed in terms of an Asset Management System with associated asset
management processes (Figure 3). The ellipse at the performance point in Figure 3 defines
the capability of the asset system to function normally and respond to the effects of
disturbances.

Figure 3. ISO 55000 (PAS 55) “Overview” asset management process diagram12

The objectives of dam and reservoir asset management include:

• ensuring the safety of the assets
• delivering the required availability of the assets
• meeting required levels of service and quality
• demonstrating economic and efficient use of funds
• optimising expenditure over time while maintaining an emphasis on safety
• demonstrating sound stewardship of the assets
• providing a strong basis for the “Safety Case” for the dam and reservoir.

From an operational safety perspective, securing adequate long-term investment
funding for maintenance and renewal, in competition with all the other potential uses of
available funding, is one of the key drivers for improved dam and reservoir asset
management. In the absence of a clear asset management strategy and well defined asset
management plans, organisations may not be able to secure the funding or plan the necessary
maintenance work effectively. This can lead to a maintenance backlog in which essential
maintenance is deferred to release funding for other purposes, which degrades safety.
Effective dam and reservoir asset management depends on the context of the business
and the physical nature of the assets. It requires the business context and the physical assets
to work in harmony. This is notoriously complex and can be very difficult to achieve. Where
an industry or business is dominated by infrastructure assets, the assets will have varied life
spans from possibly as little as 5 years up to 120 years and longer in the case of many large
dams. This means that there is no easy way to quantify the immediate benefits of effective
infrastructure asset management.
Asset management has a key objective of optimising strategies, policies, plans and
activities in the context of conflicting objectives. Infrastructure asset management should
encompass every stage of the life of the asset. The construction industry tends to concentrate
on certain parts of the infrastructure asset life cycle, often with different players at the
different stages. For long life infrastructure assets:
• the plan and design, and construct and handover stages are a very small part of
the whole life of the asset
• the renew and decommission stage, including repair, renewal and replacement,
can also be a very small part of the whole life cycle
• the eventual decommissioning phase (which may include deconstruction,
demolition and recycling) may be very brief, although sometimes taking place long after the
asset has passed out of useful service.
Single purpose reservoirs are not common, although there are numerous examples of
dams built with the sole objective of controlling and mitigating floods. Such flood control
reservoirs often remain empty for prolonged periods of time and fill only during flood
periods, attenuating flood waters. There are also reservoirs built strictly for irrigation, and in
many countries there are thousands of small dams built and operated by farmers to provide
water for crops. Some dams are built for recreation, with the sole objective of capturing water
during the freshet and maintaining a water level during the rest of the year for boating,
swimming, or fishing. Some dams are built strictly for hydroelectric generation and their only
purpose is to maximize generation output.
What happens most often, however, is that the way a single purpose dam is operated
changes over time because other goals are added in response to growing demands. In many
cases the construction of the dam invites further development downstream, and the
expectations of riparian communities change. Where once there was no demand for flood
control, communities have now been built in the downstream floodplain which require
protection. The dam could have been built with the sole goal of generating power, but now the
communities demand that the dam should also provide recreational benefits and even local
water supply and flood control.
The end result is that the dam and reservoir provide products and services that yield
an overall net benefit, recognising that full quantitative enumeration of the benefits may be
impossible. The products and services of (water) dams and reservoirs are all derived from the
inflows and the stored volume with the result that the effective operation of the reservoir
entails the development of a plan for the use of the water that is in harmony with the asset
management plan. In some organisations the reservoir operations may be within the asset
management organisation, in others, the reservoir operations may be part of the production
organisation. The relationship between the safety of the dam and maximising the benefits of
the water must be carefully considered. The relationship between maximising benefits and
securing safety for dams and reservoirs differs subtly from the relationship between
production benefits and safety in many hazardous process industries: in the case of dams
and reservoirs, production is part of the safety defences. This is because production
provides significant safety benefits by way of control of the hydraulic hazard. In hazardous
process industries, safety concerns typically lead to a reduction or shutdown of production,
whereas for dams maintaining or increasing production typically improves safety during
dam incidents.
Figure 4 provides a schematic representation of how a dam and reservoir system is
operated [12], which gives an initial sense of the types of interdependencies and
interrelationships in the operation of a dam and reservoir system.

Figure 4. Schematic representation of the operation of dam and reservoir assets


These interdependencies between safety, production and asset management can lead
to a complex array of interconnected activities, with numerous opportunities for things to go
wrong or to be executed in a way that, over time, is not in the interests of safety. One
question that arises is “just how numerous are the opportunities?” The nature of this problem
is illustrated in the following sub-section.

4.2 Operational analysis of inspection and maintenance activities


Inspections and maintenance are fundamental to safety assurance, and yet these
essential functions are either considered in a simplistic way or not at all in contemporary dam
safety and risk analysis practices. A comprehensive safety analysis of a dam and reservoir
system should consider the “operational”, the “structural” and the “hydraulic” functions
together as they are interdependent. This necessarily requires considerably more detail than
the “failure modes identification and classification” workshops that are central to
contemporary risk analysis practices. Typically, these contemporary failure modes
identification workshops do not consider operational factors even though they could.
However, the effort of doing so would be enormous given that, in these contemporary
practices, failure modes are typically defined as sequences of events from initiation to failure.
The difficulties associated with this systems function approach to failure modes analysis
have been illustrated in my “Re-thinking Risk Analysis” paper [2].
To help uncover the types of circumstances that underpin “unusual combinations of
usual conditions” it is useful to examine the number of opportunities for things to go wrong
in a sector of a routine inspection and maintenance process for spillway gates. Spillway gate
inspection, testing and maintenance is an essential element of the operational management of
dams and reservoirs, as it is the means of assurance of the relevance and accuracy of spillway
gate reliability parameters used in a spillway system reliability analysis. The broader
application is in the reliability of the totality of the discharge function, and the illustration
outlined below is just as applicable to maintenance, inspection and testing of the hydraulic
production systems. It is also applicable to any other function that relies on maintenance,
inspection and testing, including human competence. Hollnagel’s Functional Resonance
Analysis Method (FRAM) is used to illustrate the considerations that should be included in
such an analysis. The four principal steps in a FRAM analysis (Figure 5) are:

1. Identify essential system functions; characterise each function by six basic parameters
(based on the Structured Analysis and Design Technique)
2. Characterise the (context dependent) potential variability (using a checklist)
3. Define functional resonance based on possible dependencies (couplings) among
functions
4. Identify barriers for variability (damping factors) and specify required performance
monitoring.

Figure 5. Unit of a Functional Resonance Model [8]


Step 1: Identifying essential system functions, and characterizing each
function by six basic parameters. The functions are described through six
aspects, in terms of their input (I, that which the function uses or transforms),
output (O, that which the function produces), preconditions (P, conditions that
must be fulfilled to perform a function), resources (R, that which the function
needs or consumes), time (T, that which affects time availability), and control
(C, that which supervises or adjusts the function), and may be described in a
table and subsequently visualized in a hexagonal representation. The main
result from this step is a FRAM ‘‘model’’ with all basic functions identified.
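
As a minimal sketch of the tabular description in Step 1, one function and its six aspects could be recorded as follows. The class name, field names and the spillway gate test entries are hypothetical and are used here only for illustration; they are not taken from the FRAM literature or from the figures in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FramFunction:
    """One function described by its six aspects (I, O, P, R, T, C)."""
    name: str
    inputs: set[str] = field(default_factory=set)         # I: what the function uses or transforms
    outputs: set[str] = field(default_factory=set)        # O: what the function produces
    preconditions: set[str] = field(default_factory=set)  # P: conditions that must be fulfilled
    resources: set[str] = field(default_factory=set)      # R: what the function needs or consumes
    time: set[str] = field(default_factory=set)           # T: what affects time availability
    control: set[str] = field(default_factory=set)        # C: what supervises or adjusts the function

    def aspects(self) -> dict[str, set[str]]:
        """Return the six aspects keyed by their conventional letters."""
        return {
            "I": self.inputs,
            "O": self.outputs,
            "P": self.preconditions,
            "R": self.resources,
            "T": self.time,
            "C": self.control,
        }


# Hypothetical entry for a spillway gate test; every label is illustrative only.
gate_test = FramFunction(
    name="Test spillway gate",
    inputs={"test work order"},
    outputs={"test record"},
    preconditions={"gate isolated from remote control"},
    resources={"qualified operator", "hoist power supply"},
    time={"scheduled test window"},
    control={"test procedure"},
)
```

A table of such entries, one per function, would correspond to the FRAM “model” that Step 1 is said to produce.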

Step 2: Characterization of the (context dependent) potential variability
through common performance conditions. Eleven common performance
conditions (CPCs) are identified in the FRAM method to be used to elicit the
potential variability: 1) availability of personnel and equipment, 2) training,
preparation, competence, 3) communication quality, 4) human machine
interaction, operational support, 5) availability of procedures, 6) work
conditions, 7) goals, number and conflicts, 8) available time, 9) circadian
rhythm, stress, 10) team collaboration, and 11) organizational quality. These
CPCs address the combined human, technological, and organizational aspects
of each function. After identifying the CPCs, the variability needs to be
determined in a qualitative way in terms of stability, predictability,
sufficiency, and boundaries of performance.

Step 3: Defining the functional resonance based on possible
dependencies/couplings among functions and the potential for functional
variability. The output of the functional description of step 1 is a list of
functions each with their six aspects. Step 3 identifies instantiations, which are
sets of couplings among functions for specified time intervals. The
instantiations illustrate how different functions are active in a defined context.
The description of the aspects defines the potential links among the functions.
For example, the output of one function may be an input to another function,
or produce a resource, fulfil a pre-condition, or enforce a control or time
constraint. Depending on the conditions at a given point in time, potential
links may become actual links; hence producing an instantiation of the model
for those conditions. The potential links among functions may be combined
with the results of step 2, the characterization of variability. That is, the links
specify where the variability of one function may have an impact, or may
propagate. This analysis thus determines how resonance can develop among
functions in the system. For example, if the output of a function is
unpredictably variable, another function that requires this output as a resource
may be performed unpredictably as a consequence. Many such occurrences
and propagations of variability may have the effect of resonance; the added
variability under the normal detection threshold becomes a ‘signal’, a high risk
or vulnerability.
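
The derivation of potential links in Step 3 can be sketched as follows, under simplifying assumptions: each function is reduced to sets of labels per aspect, a potential coupling exists wherever an output label of one function matches an input, precondition, resource, time or control label of another, and variability is assumed to propagate along every such coupling. The function names and labels are hypothetical, and the sketch does not attempt to quantify resonance.

```python
# Each function is a dict mapping an aspect letter to a set of labels.
# All names and labels below are hypothetical illustrations.
functions = {
    "Plan test": {"O": {"test work order"}},
    "Assign staff": {"O": {"qualified operator"}},
    "Test gate": {
        "I": {"test work order"},
        "R": {"qualified operator"},
        "O": {"test record"},
    },
    "Review results": {"I": {"test record"}},
}


def potential_couplings(functions):
    """List (upstream, downstream, aspect, label) for every output that
    matches an I, P, R, T or C aspect of a different function."""
    links = []
    for up_name, up in functions.items():
        for label in sorted(up.get("O", set())):
            for down_name, down in functions.items():
                if down_name == up_name:
                    continue
                for aspect in ("I", "P", "R", "T", "C"):
                    if label in down.get(aspect, set()):
                        links.append((up_name, down_name, aspect, label))
    return links


def affected_by(source, links):
    """Functions reachable from `source` along couplings, i.e. where
    variability in the output of `source` could propagate."""
    reached, frontier = set(), [source]
    while frontier:
        current = frontier.pop()
        for up, down, _aspect, _label in links:
            if up == current and down not in reached:
                reached.add(down)
                frontier.append(down)
    return reached


links = potential_couplings(functions)
```

In this toy model, `affected_by("Assign staff", links)` returns both "Test gate" and "Review results": variability in staffing reaches the review step only through the test, which is the kind of indirect dependency that Step 3 is intended to expose.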

Step 4: Identifying barriers for variability (damping factors) and specifying
required performance monitoring. Barriers are hindrances that may either
prevent an unwanted event from taking place, or protect against the consequences
of an unwanted event. Barriers can be described in terms of barrier systems
(the organizational and/or physical structure of the barrier) and barrier
functions (the manner by which the barrier achieves its purpose). In FRAM,
four categories of barrier systems are identified: 1) physical barrier systems
block the movement or transportation of mass, energy, or information, 2)
functional barrier systems set up pre-conditions that need to be met before an
action (by human and/or machine) can be undertaken, 3) symbolic barrier
systems are indications of constraints on action that are physically present and
4) incorporeal barrier systems are indications of constraints on action that are
not physically present. Besides recommendations for barriers, FRAM is aimed
at specifying recommendations for the monitoring of performance and
variability, to be able to detect undesired variability.

The process of inspection and maintenance can be set out in a process flow diagram
as illustrated in Figure 6. All aspects of dam safety management in the operational phase can
be systemised in terms of an organisation’s management system, which at the detailed level
of maintenance, inspection and testing could be of the form illustrated in Figure 6. This
process can be represented in a FRAM type analytical framework as shown in Figure 7.
A FRAM type model of the testing stage of the management procedure (Step 4) could
be represented as shown in Figure 8. There are obviously links between the functions in
Figure 8: one or more of the six fundamental parameters in one step of the process is
linked to one or more of the six fundamental parameters of another step (Figure 9).


Figure 6. Maintenance Management Procedure within an Asset Management System

Figure 7. FRAM type concept model of the maintenance and testing procedure


Figure 8. FRAM type model of the testing procedure

There may even be several links between functional units and several reasons for
links between fundamental parameters. These links define the relationships between the
functions. Figure 9 illustrates the functional tasks and functional sub-tasks between some of
the links that might arise between steps 4 and 6 of the inspection and maintenance
management procedure. By proceeding in this manner for other activities and functions in the
hydraulic control process, a system functional model of the form illustrated in Figure 9
emerges.

Figure 9. Model (limited) of the functions and dependencies involved in the testing
procedure.
Clearly, there are numerous opportunities for things to go wrong between steps 4 and
6 of the inspection and maintenance process. This provides an indication of the potential for
organisational and human factors to compromise the whole inspection and maintenance
process, and that is assuming that inspection and maintenance has not been compromised by
other asset management and operational decisions. This leads to the conclusion that a high
degree of quality control and quality assurance is warranted to avoid the possibility that
operational activities compromise the safety of the dam.

5. OPERATIONAL MODES AND OPERATIONAL MODES MONITORING

The idea of “operational modes” as a basis for identifying failure modes was set out in
my Re-thinking Risk Analysis paper at this conference [2] and elsewhere [13, 14]. The essence of the
approach is that if one understands how a dam functions, then one can understand what
functional failure looks like. The causes of functional failure of components can be analysed
at a later stage. Thus, if one knows that the function of a spillway hoist motor is to rotate a
hoist drum which in turn applies a tensile force to a spillway hoist cable, then one knows that
the failure mode of the motor is loss of rotation. Thus, I am proposing that the starting point
for the safety analysis of dams should be in the establishment of the manner in which the dam
and reservoir function as a system under all operational conditions within the design
envelope, rather than the common practice of attempting to identify the failure modes, often
in a brainstorming workshop environment. This process involves defining the functional
modes of the components as well as the relationships between the functions of the
components.
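
The hoist example can be sketched as a simple functional chain in which each failure mode is stated as the loss of the corresponding function. This is a minimal illustration only: the first two entries follow the text above, the third (the cable lifting the gate) is an assumption added to complete the chain, and the series dependency between the entries is also assumed.

```python
# Ordered functional chain for a spillway gate hoist.
chain = [
    ("hoist motor", "rotate hoist drum"),
    ("hoist drum", "apply tensile force to hoist cable"),
    ("hoist cable", "lift spillway gate"),  # assumed, not stated in the text
]


def failure_mode(function: str) -> str:
    """A functional failure mode is the loss of the stated function."""
    return f"loss of function: {function}"


def functions_lost(chain, failed_component):
    """Functions lost when one component fails, assuming a simple series
    chain in which each function depends on the one before it."""
    names = [name for name, _ in chain]
    start = names.index(failed_component)
    return [function for _, function in chain[start:]]
```

For example, `functions_lost(chain, "hoist drum")` lists the drum and cable functions but not the motor function, since the motor can still rotate even though nothing downstream responds. The causes of the drum failure are deliberately absent from the sketch, in line with the point that causes can be analysed at a later stage.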
In general and to be comprehensive, one must go beyond the physical components
and consider all entities that are engaged in the operation of the dam and reservoir, including
those entities that are indirectly and possibly only remotely involved. It appears to be best to
begin with a static (no time considerations) overview understanding of the functioning of the
dam and reservoir, gradually expanding the analysis to capture more considerations and
improving the resolution of the analysis by going to increasing levels of detail. Obviously,
for many dam and reservoir systems which tend to be complex, this process can be expected
to take a great deal of effort and consume vast resources.
Considering the amount of effort that goes into the analysis of the inspection and
maintenance function, and the complexity of the spillway analysis illustrated
in Figure 5 of my Re-thinking Risk Analysis paper [2], the question arises as to whether
analysis of operational functions is worth the effort, especially given that significant
confidence has been developed within the profession in the established brainstorming of
failure modes in a workshop format.

5.1 Operational modes analysis or failure modes analysis?


Although dam failures occur occasionally, dam safety practices have served dam
owners, governments and the public well over the decades, with dam failures remaining
relatively rare amongst the world population of dams. Failures of dams are obvious and are
reported; what is less well reported is the number of safety incidents that occur. While the
number of dam failures is very low compared to the number of operational dams, this low
number does not necessarily guarantee that such a large number of dams are operationally
safe. This arises because safety cannot be measured directly; rather, we measure safety in
terms of “unsafety”. Further, since the advent of risk analysis for dam safety it has been taken
as given that risk is the antonym of safety and that acceptable risk means “safe enough”, but
this assumption can be challenged from a philosophical perspective [15]. It can be argued
that safety is more than the antonym of risk, while at the same time it can be argued that
safety is the antonym of risk from some perspectives of risk [16]. In general, however, it is
unwise to assume that risk is the mathematical complement of safety.
The net result is that there is no way presently to measure safety directly, and
therefore no direct way of determining if the available methods of dam safety analysis are
really giving us an objectively robust statement of the level of safety. Further, there is now
sufficient knowledge of risk analysis methods to know that all methods of estimating risk,
and therefore all statements of the level of risk associated with dams, yield an underestimate of
the actual level of risk, with no way of understanding just how much too low the estimate is.
In other words, we don’t know if the identification of failure modes by contemporary
methods is exhaustive. This leads to a dilemma of errors of omission in risk analysis.
Something invariably gets left out. This then raises the question as to how many factors and
considerations can reasonably be left out of the risk-informed safety analysis without having
a detrimental effect on the robustness of the analysis and the determined level of safety. This
question can only be answered in a logical fashion if it is possible to conduct an exhaustive
analysis of the system, something that is impossible for a dam because dams and reservoirs
are not fully engineered deterministic systems. There is always natural variability in the
foundations, abutments, and the natural forces that are applied to the dam.
Failure modes types of analysis follow the well proven scientific method of
reductionist analysis whereby a complex system is increasingly de-aggregated into smaller
and smaller parts until the parts are amenable to analysis. The results of these individual
analyses are then recombined to provide the performance of the system. However, this
process assumes:
• The interactions between the resolved parts are either non-existent or so weak
that they can be neglected. This condition is necessary in order for the analysis
of the individual parts to actually be performed logically and mathematically
independently of each other.
• The relationships describing the behaviour of parts are linear. This condition is
necessary to ensure that the equations that describe the behaviour of parts are of
the same form as the equation describing the whole system, thereby permitting
the summative process required to “put the parts back together”.
In addition, the entities that reductionist methods are applied to are typically complex
systems, and the conditions of superposition and linearity do not apply in systems
defined as “parts in interaction” where,
• The interactions between the parts are “strong” or,
• The interactions between the parts are “non-linear”.
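
The linearity condition can be illustrated with a minimal numerical check of superposition: a response obeys superposition if the response to two combined inputs equals the sum of the responses to each input alone. The two response functions below are hypothetical stand-ins chosen only to show the contrast; they do not model any dam component.

```python
def superposes(response, a, b, tol=1e-9):
    """True if response(a + b) equals response(a) + response(b) within tol."""
    return abs(response(a + b) - (response(a) + response(b))) <= tol


def linear_response(load):
    """Hypothetical part whose response is proportional to the load."""
    return 2.0 * load


def nonlinear_response(load):
    """Hypothetical part whose response grows with the square of the load."""
    return load ** 2
```

With inputs 3.0 and 4.0, the linear response passes the check while the quadratic one does not, so analysing the two inputs separately and adding the results would misstate the combined response of the non-linear part. This is the gap that the summative step of a reductionist analysis cannot close.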
The result is that gaps can be expected between the analysis assumptions and the
actual functioning of the entity. Therefore, there are two fundamental reasons why a
contemporary risk analysis results in an underestimate of the risk in the system:
1. Significant failure modes are left out unintentionally (or perhaps intentionally
based on low probability)
2. The interdependencies between components and functions are not considered

Alternatively, as introduced above, one could begin with a comprehensive operational
modes analysis as discussed elsewhere. Operational modes of components and entities can be
identified from the design, or determined by back analysis of the as-built functioning system.
The failure modes can be derived directly from the operational modes because the failure
mode of the component is the antonym of the operational mode. However, the identification
of failure mechanisms, that is the chains of events that result in failure as opposed to failure
modes of components and entities, is rather more difficult as the number of components is
typically large, as is the number of interactions and interdependencies. This means that
the number of ways in which adverse conditions can combine
to give “unusual combinations of usual conditions” as the underlying cause of the failure can
be enormous, even unfathomably large.
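
A rough sense of this growth can be obtained by counting the subsets of n identified adverse conditions that contain at least two members. This is a deliberately simplified count offered as a sketch: it ignores the order and timing of the conditions and does not ask whether a given combination is physically credible, and the function name is hypothetical.

```python
from math import comb


def combinations_of_conditions(n: int, min_size: int = 2) -> int:
    """Number of distinct subsets of n conditions with at least min_size members."""
    return sum(comb(n, k) for k in range(min_size, n + 1))
```

Ten conditions already give 1013 combinations of two or more, and forty conditions give more than 10^12, which is well beyond what a workshop could enumerate.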
Thus we are faced with an irresolvable dilemma: we either restrict the risk-informed
analysis of safety to an overview level, as is the case with Failure Modes and Effects Analysis,
recognising that many factors and considerations will probably be left out, or we carry out a
detailed analysis that is as exhaustive as possible, recognising that some things will probably
be left out but that we will have done the best we can. While both approaches suffer from the
possibility that factors that have not been considered can arise, the problem is worse with
respect to those situations where identified actual or potential failure modes have been
omitted on the basis of low probability. This is because it is certain that failure modes that
are eliminated on the basis of low probability can occur. No-one should be surprised if the
screened out failure mode materialises: as soon as a probability is assigned, it
becomes certain that the event can happen sometime. It is also certain that the time of
occurrence of the event is unknowable, but that it could be tomorrow or sometime in the
near future. This then points again to the idea of analysis of all knowable failure modes and
combinations thereof, at least to the extent that is practicable, inevitably omitting those that
are unknown and unknowable a priori.
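
The point about screening can be made concrete with a short calculation of the probability that an event with a small per-trial probability occurs at least once over many trials. The sketch below assumes independent trials with a constant probability, which is itself the kind of simplification questioned in this paper, and every numerical value is hypothetical.

```python
def prob_at_least_once(p: float, trials: int) -> float:
    """Probability of at least one occurrence in `trials` independent
    trials, each with probability p of occurrence."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    return 1.0 - (1.0 - p) ** trials


# Hypothetical values for illustration only.
screened_annual_p = 1e-5   # assumed annual probability of one screened-out mode
service_life_years = 100   # assumed service life
screened_modes = 50        # assumed number of screened-out modes

one_mode_over_life = prob_at_least_once(screened_annual_p, service_life_years)
any_mode_over_life = prob_at_least_once(
    screened_annual_p, service_life_years * screened_modes
)
```

Under these assumed numbers a single screened-out mode has roughly a 0.1% chance of occurring over the service life, while the chance that at least one of the fifty occurs is just under 5%. The calculation says nothing about when an occurrence would take place, which remains unknowable.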
However, this brings us back to the problem of the resources and effort required. This
also requires a judgment of practicability by someone in authority on behalf of the owner, as
it is always possible that the owner will have to deal with an incident or failure. The incident
or failure may reveal circumstances that are at best embarrassing and at worst negligent, even
if the judgment of practicability is itself reasonable and in good conscience.
So what can we do? Two things: early detection of deteriorating conditions, and
application of conservatism in the face of uncertainty. Operational modes monitoring is an
effective way of at least getting ahead of finding failure modes and developing failure
mechanisms.

5.2 Operational modes monitoring as the platform for dam safety management?
While the effort involved in comprehensive identification of failure modes and failure
mechanisms is clearly problematic from a resource perspective, and while an exhaustive
identification appears to be impossible, knowledge of the operational modes of dams
provides an exceptionally powerful means of managing safety and controlling risk. This is
because deviations from the expected operating state provide the first signs of development
of potential failure sequences, permitting intervention to correct any deviation [13, 14] and restore
the system to its normal stable state. Monitoring of the operational modes of a dam and its
components in the form of visual inspections, monitoring, and regular testing provides the
foundation of an effective dam safety management process. These activities can be carried
out at various levels of refinement where the most fundamental elements are available to and
implementable by all dam owners regardless of their institutional strength and sophistication.
Even if the design intent is not known (design details have been lost) or as-constructed
records are inaccurate or don’t exist, it should be possible to re-establish the design intent,
erring on the conservative side to account for uncertainty. Leakage and deformation are the
two most fundamental indicators of poor performance, but regular inspection and testing of
equipment and components is an essential activity to uncover the operationally benign
deviations from expected performance.
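
At its most basic level, monitoring for deviations from the expected operating state could be sketched as a comparison of readings against expected bands. The indicator names, units and limits below are placeholders chosen for illustration and are not recommended values; a missing reading is treated as a deviation on the assumption that an unobserved function cannot be assumed to be performing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedBand:
    """Expected operating range for one monitored indicator."""
    indicator: str
    low: float
    high: float


def deviations(bands, readings):
    """Return (indicator, reason) for readings that are missing or out of band."""
    flagged = []
    for band in bands:
        value = readings.get(band.indicator)
        if value is None:
            flagged.append((band.indicator, "no reading"))
        elif not band.low <= value <= band.high:
            flagged.append(
                (band.indicator, f"{value} outside [{band.low}, {band.high}]")
            )
    return flagged


# Hypothetical bands and readings; units and limits are placeholders.
bands = [
    ExpectedBand("leakage_l_per_s", 0.0, 2.0),
    ExpectedBand("crest_settlement_mm", 0.0, 5.0),
    ExpectedBand("gate_lift_time_s", 80.0, 120.0),
]
readings = {"leakage_l_per_s": 3.1, "crest_settlement_mm": 1.2}
flagged = deviations(bands, readings)
```

Here the leakage reading and the absent gate test result are both flagged, while the settlement reading is not. Erring on the conservative side, as suggested above, would correspond to narrowing the bands where the design intent is uncertain.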

6. CONCLUSIONS

Dam safety clearly involves a great deal more than structural withstand and hydraulic
discharge capacity. The physical causes of loss of structural performance are relatively few
in comparison to the causes of inadequacy of hydraulic capacity. However, inadequacy in
monitoring and surveillance can lead to structural performance deterioration that leads to a
failure process. While the “rush to risk assessment” put major emphasis on failure modes and
their identification, the notion of failure modes identification long pre-dates the advent of
contemporary risk analysis methods in dam safety [17]. The rush to risk assessment also brought
with it probability, and in particular subjective probability, which is not without its difficulties.
There is no doubt that screening out of failure modes and failure mechanisms on the basis of
low probability is non-conservative. Such non-conservatism is not advisable given the
consequences of dam failure. The notion of basing the management of the safety of a dam on
a risk analysis process that is incomplete and non-conservative in dealing with its
incompleteness is not in the interests of dam safety. Instead, it is proposed that the focus
should be on the “operational” characteristics of the dam and reservoir system, and careful
monitoring for deviations from this operational state, with intervention to correct deviations
as early as possible.

REFERENCES
[1] USNRC. Fault Tree Handbook. Washington: US Nuclear Regulatory Commission, 1981.
[2] Hartford, DND. Re-thinking risk analysis in dam safety practice. 3rd International Dam
World Conference. Foz do Iguaçu, Brazil.
[3] USNRC. Reactor Safety Study. Washington: US Nuclear Regulatory Commission, 1975.
[4] Kumamoto, H. and Henley, EJ. Probabilistic risk assessment and management for
engineers and scientists. New York: IEEE, 1996.
[5] Competent Authority for the Control of Major Accident Hazards. Buncefield: Why did it
happen? UK Health and Safety Executive, 2011.
[6] Hendron, AJ, Ehasz, JL and Paul, K. Taum Sauk Upper Dam Breach – Technical Reasons
for the Breach, FERC No. P-2277. Federal Energy Regulatory Commission, Washington,
DC, 2006.
[7] Rogers, JD and Watkins, CM. Overview of the Taum Sauk pumped storage power plant
upper reservoir failure, Reynolds County, MO. Paper 2-43, Proc. 6th International Conference
on Case Histories in Geotechnical Engineering, Arlington, VA, August 2008.
[8] Hollnagel, E. Functional Resonance Accident Model: method and examples.
Cognitive Systems Engineering Laboratory, University of Linköping, Sweden, 2005.
open-psa.org/joomla1.5/index2.php?option=com_sobi2&sobi2Task (link of June 2016).
[9] Hollnagel, E. FRAM – the functional resonance analysis method. Ashgate Publishing
Company Ltd, 2012.
[10] Leveson, NG. Engineering a Safer World: Systems Thinking Applied to Safety. The MIT
Press, 2012.
[11] Miles, RF. Systems Concepts: Lectures on Contemporary Approaches to Systems.
John Wiley & Sons Inc, 1973.
[12] Hartford, DND, Baecher, GB, Zielinski, PA, Patev, RC, Ascilla, R and Rytters, K.
Operational Safety of Dams and Reservoirs. Thomas Telford, 2016.
[13] Hartford, DND and Rigbey, SJ. Operational modes monitoring for prevention of failure
of dams within the design envelope. ATCOLD Symposium on Hydro Engineering, 26th
Congress, 86th Annual Meeting of International Commission on Large Dams, Vienna, 2018.
[14] Hartford, DND. Operational Safety and Prevention of Failure of Dams within the Design
Envelope. Q101-R21. 26th Congress, ICOLD, Vienna, 2018.
[15] Möller, N, Hansson, SO and Peterson, M. Safety is more than the antonym of risk.
Journal of Applied Philosophy, 23, 419–432, 2006.
[16] Aven, T. Safety is the antonym of risk for some perspectives of risk. Safety
Science, 47, 925–930, 2009.
[17] US Bureau of Reclamation. Safety Evaluation of Existing Dams. 1983.
