Session Number: 3
Making the Most of Alarms as a Layer of Protection
Todd Stauffer
Director Alarm Management Services, exida LLC
Abstract
Alarms and operator response are one of the first layers of protection in
preventing a plant upset from escalating into a hazardous event. This paper
discusses practices and procedures for maximizing the risk reduction of this
layer when it is considered in a layer of protection analysis. It reviews how the
performance of the alarm system can impact the probability of failure on
demand (PFD) when the alarm is in service. Key recommendations will be
drawn from the new ISA-18.2 standard on alarm management.
Introduction
Alarm management and functional safety go hand-in-hand. Alarms play a
significant role in maintaining plant safety. They are a means of risk reduction
(layer of protection) to prevent the occurrence of a process hazard, as shown in
Figure 1. The performance of the alarm system can impact the design of the
safety instrumented system (SIS) by limiting the level of risk reduction that can
be credited to an alarm. This affects the integrity requirement of any Safety
Instrumented Function (SIF) that is used in conjunction with an alarm.
Figure 2 – Risk Reduction through the use of multiple protection layers [3]
Thus poor alarm management could reduce the protective capability of this
layer or eliminate it altogether, which could mean that the actual risk
reduction no longer meets or exceeds the company-defined tolerable risk level.
This could have a ripple effect on the Safety Integrity Level (SIL) requirements
for numerous SIFs throughout the plant. The higher the SIL, the more
complicated and expensive the Safety Instrumented System (SIS) becomes. A higher
SIL will also require more frequent proof testing, which adds cost and can be
burdensome in many plants.
Safety Control Systems Conference – IDC Technologies (May 2010)
Alarm Management
In June of 2009 the International Society of Automation (ISA) released the
standard ANSI/ISA-18.2, “Management of Alarm Systems for the Process
Industries” (ISA-18.2) [4]. ISA-18.2 provides a framework for the successful
design, implementation, operation and maintenance of alarm systems in a
process plant. It contains guidance on how to address the most common alarm
management problems and on how to sustain the performance of the alarm
system over time. The standard is expected to be “recognized and generally
accepted good engineering practice” (RAGAGEP) by both insurance
companies and regulatory agencies.
ISA-18.2 prescribes following a lifecycle approach, similar to the functional
safety standard IEC 61511/ ISA-84 [5,6]. Following the alarm management
lifecycle helps achieve optimum alarm system performance and thus is
absolutely critical to making the most of alarms as a layer of protection.
Figure: The ISA-18.2 alarm management lifecycle. Stages: (A) Philosophy,
(B) Identification, (C) Rationalization, (D) Detailed Design, (E) Implementation,
(F) Operation, (G) Maintenance, (H) Monitoring & Assessment,
(I) Management of Change, (J) Audit.
One of the key elements of following the lifecycle is understanding the definition
of an alarm.
Alarm: An audible and/or visible means of indicating to the operator an
equipment malfunction, process deviation, or abnormal condition requiring a
response [4].
It is interesting to note that the tank high level alarm in the Buncefield depot
incident would not have qualified as an IPL since the alarm was not
independent from the initiating event (the failure of the associated tank level
measurement).
In a LOPA the frequency of a potentially dangerous event is calculated by
multiplying the frequency of the initiating event by the probability of failure on
demand (PFD) of each individual layer of protection. In the example LOPA of
Figure 4, the likelihood of a fire occurring after the release of flammable
materials is calculated assuming that the initiating event (the loss of jacket
cooling water) occurs once every two years. In this example the operator
response to alarm layer was assigned a PFD of 0.2.
Figure 4. Example LOPA
Initiating Event: Loss of Cooling Water (0.5 / yr)
Protection Layer #1: Process Design
Protection Layer #2: Operator Response (to Alarm)
Protection Layer #3: Pressure Relief Valve
Protection Layer #4: No Ignition
Branch probabilities shown: 0.3, 0.07, 0.2, 0.01
Outcomes: Fire (2.10E-05 / yr), No Event
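The LOPA arithmetic can be reproduced in a few lines. The following is a minimal sketch, not a LOPA tool: the function name is hypothetical, the operator response PFD of 0.2 comes from the text, and the other three probabilities are the remaining branch values shown in Figure 4. Since the result is a simple product, the order of the layers does not matter.

```python
from math import prod


def mitigated_event_frequency(initiating_frequency_per_year, layer_pfds):
    """Initiating event frequency multiplied by the PFD of every protection layer."""
    for pfd in layer_pfds:
        if not 0.0 <= pfd <= 1.0:
            raise ValueError(f"PFD must be between 0 and 1, got {pfd}")
    return initiating_frequency_per_year * prod(layer_pfds)


initiating_frequency = 0.5            # loss of cooling water, once every two years
operator_response_pfd = 0.2           # operator response to alarm (from the text)
other_layer_pfds = [0.07, 0.01, 0.3]  # remaining branch probabilities in Figure 4

# Fire frequency with the alarm credited as a layer of protection.
fire_frequency = mitigated_event_frequency(
    initiating_frequency, [operator_response_pfd, *other_layer_pfds]
)

# Same scenario if the alarm cannot be credited at all (PFD = 1.0).
fire_frequency_no_operator_credit = mitigated_event_frequency(
    initiating_frequency, [1.0, *other_layer_pfds]
)

print(f"with alarm credit:    {fire_frequency:.2e} / yr")
print(f"without alarm credit: {fire_frequency_no_operator_credit:.2e} / yr")
```

The first result corresponds to the 2.10E-05 outcome in Figure 4. The second shows that losing the operator response layer raises the calculated fire frequency by a factor of five, a gap that would then have to be closed by other layers such as a SIF.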
Determining PFD
The PFD for the operator’s response to an alarm can be determined by adding
two separate contributions:
1) the probability that the alarm fails to annunciate, and
2) the probability that the operator fails to successfully detect, diagnose, and
respond to the alarm correctly and within the allowable time.
To analyze the PFD of the operator response to alarm layer we must first look
at the sequence of events which would make for a successful operator
response. The first step after the initiating event is the triggering / annunciation
of the alarm. If a failure were to occur in the hardware or software associated
with the alarm (the sensor, the control logic for triggering the alarm, or the
operator interface) then the alarm would never be annunciated. This represents
the probability that the alarm fails to annunciate on demand.
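The two contributions described above can be combined as in the sketch below. The function names and the example values (0.01 for annunciation failure, 0.1 for operator failure) are illustrative assumptions, not figures from this paper. The second function shows the exact combination for two independent failure modes, which the simple sum approximates closely when both probabilities are small.

```python
def operator_response_pfd(pfd_annunciation, pfd_operator):
    """Additive approximation: annunciation failure plus operator failure, capped at 1.0."""
    return min(1.0, pfd_annunciation + pfd_operator)


def operator_response_pfd_exact(pfd_annunciation, pfd_operator):
    """Exact value assuming the two failure modes are independent.

    The layer succeeds only if the alarm annunciates AND the operator responds.
    """
    return 1.0 - (1.0 - pfd_annunciation) * (1.0 - pfd_operator)


# Hypothetical example values, chosen only for illustration.
approx = operator_response_pfd(0.01, 0.1)
exact = operator_response_pfd_exact(0.01, 0.1)

print(f"additive approximation: {approx:.4f}")
print(f"exact (independent):    {exact:.4f}")
```

The additive form is slightly conservative, since it never gives a lower PFD than the exact expression.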
Once the alarm is annunciated, a series of steps must be performed by the
operator to bring the process back to the normal operating range (reference) as
shown in Figure 5.
Reviewing this table shows that there are specific conditions that must be met
for a PFD of 0.1 or 0.01 to be appropriate. It is very rare that conditions in a
process plant would be conducive to claiming a 0.01 PFD.
A survey of the literature shows that there is some variation in
recommended PFD. EEMUA 191, which also provides performance-based
guidelines, recommends not using a PFD below 0.01 for any operator action,
even if it is multiple alarmed and very simple [11]. The IEC 61511 / ISA 84
standard also provides PFD recommendations that can be used for performing
a LOPA (shown in Table 2).
which is the time between the initiating event and occurrence of the hazardous
event, then the alarm plus operator action has failed as a layer of protection.
Therefore a key requirement for a Safety IPL alarm to be valid is
$t_{\text{Detect, Diagnose, \& Respond}} + t_{\text{Deadtime}} < t_{\text{Process Safety Time}}$.
In a high pressure event, the process safety time may be short (on the order of
30 seconds), which would preclude the use of operator intervention and dictate
the need for an automated protection response. One company has set a
minimum operator response time of 10 minutes and prescribed that any alarm
which has a process safety time of less than 10 minutes cannot be claimed as
a layer of protection (PFD = 1.0).
In some cases it is possible to adjust the alarm limit (setpoint) to increase the
time available for the operator so that it is greater than the minimum operator
response time.
$t_{\text{Detect, Diagnose, \& Respond}} > t_{\text{Minimum Operator Response Time}}$
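The timing requirements above lend themselves to a simple screening check. The sketch below is one possible way to encode them, with a hypothetical function name and all times in minutes. The 10 minute default mirrors the company rule mentioned in the text and should be treated as a site-specific assumption.

```python
def credited_pfd(nominal_pfd, response_time_min, deadtime_min,
                 process_safety_time_min, minimum_response_time_min=10.0):
    """Return the PFD that may be claimed for an operator response to alarm layer.

    Returns 1.0 (no credit) when the timing requirements are not met.
    """
    # Company rule from the text: no credit when the process safety time
    # is shorter than the minimum operator response time.
    if process_safety_time_min < minimum_response_time_min:
        return 1.0
    # Key requirement: detect/diagnose/respond time plus deadtime must be
    # less than the process safety time.
    if response_time_min + deadtime_min >= process_safety_time_min:
        return 1.0
    return nominal_pfd


# High pressure event with a 30 second (0.5 minute) process safety time.
high_pressure_case = credited_pfd(0.1, 0.3, 0.1, 0.5)

# Slower event: 15 min to respond, 5 min deadtime, 30 min process safety time.
slow_event_case = credited_pfd(0.1, 15.0, 5.0, 30.0)

# Same slow event, but the operator needs 28 minutes to respond.
late_response_case = credited_pfd(0.1, 28.0, 5.0, 30.0)

print(high_pressure_case, slow_event_case, late_response_case)
```

Only the second case keeps its nominal PFD; the other two fall back to 1.0 because the operator cannot complete the response in time.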
Adjusting the alarm limit can have significant tradeoffs. Setting the alarm limit
closer to the normal operating conditions will provide the operator with greater
time to respond, but may result in nuisance alarms under some operating
conditions. The occurrence of nuisance alarms can reduce the operator’s
confidence in the alarm and affect the probability that they would initiate the
required actions in the event of a genuine alarm. Setting the alarm limit closer
to the consequence threshold maintains the operator’s confidence in the alarm,
but affects the probability that the operator would complete the required action
in time.
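The effect of the alarm limit on the time available can be estimated with a rough calculation. The sketch below assumes the process variable ramps toward the consequence threshold at a constant rate, which is a simplification introduced here for illustration; the function name and the tank level numbers are hypothetical.

```python
def time_available_minutes(alarm_limit, consequence_threshold, rate_per_minute):
    """Time between alarm activation and reaching the consequence threshold.

    Assumes a constant rate of change (linear ramp), which real processes
    may not follow.
    """
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return (consequence_threshold - alarm_limit) / rate_per_minute


# Hypothetical tank: consequence at 95 % level, filling at 0.5 % per minute.
time_with_limit_at_92 = time_available_minutes(92.0, 95.0, 0.5)
time_with_limit_at_88 = time_available_minutes(88.0, 95.0, 0.5)

print(time_with_limit_at_92, time_with_limit_at_88)
```

Under these assumptions, moving the limit from 92 % to 88 % increases the available time from 6 to 14 minutes, at the cost of placing the limit closer to normal operation where nuisance alarms become more likely.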
Rationalize the Alarm Database to ensure that every alarm is needed and
prioritized
The modern DCS makes it extremely easy to add alarms without significant
effort or cost. This has led to alarm overload in the control room and a
proliferation of nuisance alarms. Alarm rationalization is the process of finding
the minimum set of alarms that are needed to keep the process safe and in the
normal operating range. Rationalization entails reviewing potential or existing
alarms to justify that they meet the criteria for being an alarm. It includes
defining and documenting the design attributes (such as priority, limit, type and
Once nuisance alarms are identified, they are returned to the Rationalization
stage of the lifecycle where they are reviewed and redesigned as necessary.
Poor configuration practices are one of the leading causes of nuisance alarms.
Effective use of alarm deadbands and on / off-delays can reduce or eliminate
them. An ASM study found that diligent use of on / off-delays was able to help
reduce the number of alarms by 45 – 90% [14]. Having an unrationalized alarm
configuration can also contribute to nuisance alarms.
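To illustrate how deadbands and on-delays suppress chattering, the following sketch counts how often a high alarm activates for a noisy signal hovering around its limit. It is a simplified model rather than any DCS vendor's implementation: the function name, the sample data, and the expression of the on-delay as a number of consecutive samples are all assumptions made for this example.

```python
def count_high_alarm_activations(samples, limit, deadband=0.0, on_delay_samples=0):
    """Count activations of a high alarm over a sequence of process values.

    deadband: the alarm clears only once the value falls below limit - deadband.
    on_delay_samples: the value must stay at or above the limit for this many
        additional consecutive samples before the alarm activates.
    """
    active = False
    activations = 0
    consecutive_above = 0
    for value in samples:
        if active:
            if value < limit - deadband:
                active = False
                consecutive_above = 0
        else:
            if value >= limit:
                consecutive_above += 1
            else:
                consecutive_above = 0
            if consecutive_above > on_delay_samples:
                active = True
                activations += 1
    return activations


# Hypothetical noisy level signal oscillating around a limit of 80.0.
noisy_level = [79.6, 80.2, 79.8, 80.3, 79.7, 80.1, 79.9, 80.4]

no_filtering = count_high_alarm_activations(noisy_level, limit=80.0)
with_deadband = count_high_alarm_activations(noisy_level, limit=80.0, deadband=1.0)
with_on_delay = count_high_alarm_activations(noisy_level, limit=80.0, on_delay_samples=1)

print(no_filtering, with_deadband, with_on_delay)
```

With this data the unfiltered alarm activates four times, the deadband reduces that to a single activation, and the on-delay suppresses the single-sample excursions entirely. Whether suppressing such excursions is acceptable depends on the process and must be decided during rationalization.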
One important performance metric is the “steady state” rate at which alarms are
presented to the operator. In order to provide adequate time to respond
effectively, an operator should be presented with no more than one to two
alarms on average every ten minutes. As shown in Figure 9, alarm
management tools make it easy to benchmark alarm system performance such
as the average number of alarms per 10 minutes. If the measured performance
exceeds the target value, then the reliability of the operator’s response to a
safety IPL alarm will be compromised and the effective PFD may be higher
than assumed in the LOPA.
Figure 9. Measuring Alarm Load on the Operator (Avg # of Alarms /10 mins)
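The alarm load metric can also be computed directly from an alarm journal. The sketch below is a minimal illustration, assuming the journal is available as a list of annunciation timestamps; the function name, the sample data, and the choice of 2.0 as the target (the upper end of the range quoted above) are assumptions.

```python
from datetime import datetime, timedelta


def average_alarms_per_10_minutes(alarm_times, period_start, period_end):
    """Average number of alarms annunciated per 10 minute interval in a period."""
    duration_minutes = (period_end - period_start).total_seconds() / 60.0
    if duration_minutes <= 0:
        raise ValueError("period_end must be after period_start")
    in_period = [t for t in alarm_times if period_start <= t < period_end]
    return len(in_period) / (duration_minutes / 10.0)


# Hypothetical one hour journal with nine alarms.
start = datetime(2010, 5, 1, 8, 0)
end = start + timedelta(hours=1)
alarm_times = [start + timedelta(minutes=m) for m in (1, 4, 9, 15, 22, 30, 41, 48, 55)]

TARGET_PER_10_MINUTES = 2.0  # assumed target, upper end of "one to two"

rate = average_alarms_per_10_minutes(alarm_times, start, end)
within_target = rate <= TARGET_PER_10_MINUTES

print(f"average alarms per 10 minutes: {rate:.1f} (within target: {within_target})")
```

An average over a long period can hide alarm floods, so in practice this figure would be reviewed alongside peak rates rather than on its own.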
Conclusion
Alarm system design and performance have a significant impact on the ability of
the operator to maintain plant safety. In particular it affects the probability of
failure on demand of alarms and operator intervention when used as a layer of
protection. The presence of nuisance alarms, alarm floods and poorly designed
HMI screens (from a human factors point of view) will have a direct effect on
whether the operator can detect, diagnose and respond to an alarm in time.
Some practitioners might go as far as downgrading the risk reduction provided
by an alarm protection layer if the alarm system does not meet certain
performance criteria!
Following the alarm management lifecycle and recommendations of ISA-18.2
will help get the most out of alarms as a layer of protection. Because of the
interaction between functional safety design and alarm management,
practitioners are urged to take a holistic approach leading to increased plant
safety, reduced risk, and better operational performance.
References
1. “The Buncefield Investigation” -
www.buncefieldinvestigation.gov.uk/reports/index.htm
2. Stauffer, T., Sands, N., and Dunn, D., “Get a Life(cycle)! Connecting
Alarm Management and Safety Instrumented Systems” ISA Safety &
Security Symposium (2010).
3. ANSI/ISA-84.00.01-2004 Part 3 (IEC 61511-3 Mod) “Functional Safety:
Safety Instrumented Systems for the Process Industry Sector- Part 3”