
GCPS 2018

__________________________________________________________________________

Functional Safety Practices for Operations & Maintenance

Denise Chastain-Knight, PE, CFSE, CCPSC


Exida Consulting, LLC
80 North Main Street
Sellersville, PA, USA
dchastainknight@exida.com

Copyright exida, 2018, all rights reserved.

Prepared for Presentation at


American Institute of Chemical Engineers
2018 Spring Meeting and 14th Global Congress on Process Safety
Orlando, Florida
April 22 – 25, 2018

AIChE shall not be responsible for statements or opinions contained
in papers or printed in its publications

Keywords: Functional Safety, SIF, SIS, maintenance, proof test, demand, operations, O&M,
diagnostics

Abstract
Operation and maintenance teams bear a heavy responsibility to ensure that Safety
Instrumented System integrity and reliability are sustained. Decisions made long before operation
begins, or outside the operating environment, often impact the ability to be successful. Design
choices fix physical configuration and instrumentation capabilities. Inadequate bid assessment
results in procurement of equipment that does not meet the specification. Turnaround schedules
are extended, stretching proof test intervals beyond the design plan. Budget constraints don’t provide
for resources to collect field failure information and complete performance analysis. Repair and
replacement are deferred. All these things conspire to reduce the reliability of a SIS and increase
the probability that it will fail when needed. This paper will discuss some of the significant
challenges that Operations and Maintenance teams face, and recommend techniques to
incorporate as good engineering practices.

1 Introduction
A Safety Instrumented System (SIS) is a high reliability system designed to detect specific
process conditions and act to take the process to a safe state. The system is comprised of sensors,
logic solver(s), and final elements that make up permissive, preventative or mitigating Safety
Instrumented Functions (SIF). For the purpose of this paper, we will consider the scenario where
a SIF is an interlock designed to protect against a single hazard that may have one or more
potential initiating events. The SIF is designed to achieve a targeted risk reduction expressed as
Probability of Failure. The reliability of a SIF is assessed with three criteria: 1) Average
Probability of Failure on Demand (PFDavg) in low demand mode or Probability of Failure per
Hour (PFH) in high demand or continuous mode; 2) architectural constraints; and 3) systematic
capability. Many variables influence the pass/fail criteria and a holistic management approach is
required to provide a system that is compliant at startup, maintainable throughout its lifespan and
attains the risk reduction required over time.
Functional safety is a lifecycle approach for the management of SIS. Many decisions made
during the lifecycle prior to commissioning have a significant impact on the Operations &
Maintenance (O&M) phase. Failure to recognize hazards, underestimating demand or risk
mitigation requirements, design methodology, and incomplete specification can result in an
inflexible or inadequate design. Conversely, overestimating can lead to complex and expensive
designs that require more maintenance and increase lifecycle costs. Business decisions, such as
increasing time between turnarounds or accepting specification variance in order to procure
from the lowest cost supplier, place additional constraints on O&M management options.

2 O&M Requirements of IEC 61511


O&M requirements for SIS are described in IEC 61511 Clause 16. The objective of the phase is
“To ensure that the functional safety of the SIS is maintained during operation and
maintenance.” [1] O&M responsibilities include routine proof testing of the SIF, calibration, repair,
documentation, data analysis, and proactive replacement of components before end of useful life.
Although functional safety standards have been available for more than 15 years, organizations
have been struggling to successfully implement the O&M phase. The 2nd edition of IEC 61511
includes additional language to clarify requirements. Key responsibilities for O&M include, but
are not limited to:

• SIS maintenance plan and procedures development
• Maintenance planning
• Periodic inspection and proof testing
• Preventative and reactive maintenance activities
• Response to faults and failures identified by diagnostics
• Performance data collection and analysis
• Verification of conformity to procedures
• Competency of responsible personnel
• Stage 4 Functional Safety Assessment (FSA)

A written SIS maintenance plan is critical to ensure that functional safety is sustainable. The plan
should describe the methods and procedures required to carry out SIS activities. Supporting
procedures are written to ensure quality and consistency of maintenance activities. The
procedures will define how specific activities are to be carried out, timeline for executing
activities, and procedures for collecting and analyzing relevant data.

Proof test procedures are specifically required and IEC 61511 Section 16.3.1 outlines specific
requirements for content. An individual procedure is to be prepared for each SIF. The purpose of
a proof test procedure is to identify dangerous failures so repairs may be effected. The
procedures will describe in detail the steps of a proof test and full function test including
inspection, documentation of as found conditions, trip points, steps in test execution and
information to be recorded during the test. The test must be conducted in accordance with the
schedule set in the SRS and repeated following any repairs to the SIF.
Other procedures will describe methods to be employed when SIFs are disabled or bypassed to
mitigate risk (compensating measures) in absence of SIF protection, and procedure for returning
SIF to service after bypass. Diagnostic and fault detection alarms may be employed to detect
failures between proof tests. Procedures must define the action to be taken on alarms to verify
failures and actions to be taken following a demand on a SIF. Repairs must be made in a timely
manner and within the Maximum Permitted Repair Time (MPRT).

Data is to be collected during the O&M phase of the lifecycle. The data is to be analyzed and
compared to requirements set forth in the Safety Requirements Specification (SRS). Information
to be monitored includes frequency and cause of:

• demand on the SIF,
• spurious trips,
• SIF equipment failure,
• demand on and failure of Independent Protection Layers (IPLs).

Information gained by the analysis is fed into re-verification to determine if components need to
be replaced, proof test intervals or methods need to be changed, or other adjustments are
necessary.

Provisions must be in place in many departments within the organization, providing an
infrastructure to support SIS objectives. The components used in SIS functions are typically
specialized and replacement devices must be stored by individual tag, rather than in shared
stores. Parts storage space may need to be expanded to provide segregation of SIS and general
parts, and procedures for procuring, receiving, storing and use may need revision. Activities are
to be executed by personnel who are qualified in SIS processes; therefore, new roles may be
created with associated competency and training requirements specified, and new training
content must be developed.

IEC 61511 requires routine monitoring of the health of the SIS, as well as periodic assessments
(i.e. Stage 4 FSA) to confirm the implementation and effectiveness of supporting procedures, and
revalidation to confirm reliability. In addition to procedure and performance data review, an
audit should review operating experience in normal and abnormal conditions.

3 Turnaround, Demand and Useful Life


There are a number of variables that impact SIF compliance to the three hurdles described in the
introduction. Turnaround (TAR) cycle often defines the full Proof Test Interval (PTI) because
many existing systems are not designed with bypass capability to permit online testing. Run to
failure is a common operating practice; however, SIF components must be proactively replaced
within useful life. With TAR schedules and budgets being established by others, organizations
are challenged to meet requirements of the O&M phase of SIS lifecycle.
For low demand SIFs, proof testing is an important opportunity to identify and correct failures
between demands on the SIF. A SIS may be initially designed for a specified proof test interval
that corresponds to the site TAR schedule. If a decision is made to lengthen the time between
TARs, reliability of SIFs is negatively impacted. Reliability curves are a graphical representation
of how probability of failure changes over a SIF’s Mission Time (MT). Figure 1 is a reliability
curve for a SIF with a 1 year PTI. The curved portion of the chart represents the reduction in
reliability of the SIF due to unidentified failures. The vertical lines represent the improved
reliability gained through proof testing and repair. Increasing peak height illustrates the effect of
imperfect proof testing and repair and reflects how reliability, expressed as PFDavg, is degraded
over time. The dashed horizontal line indicates the PFDavg over the Mission Time. RRF and
SIL are indicated to the right for reference.

[Figure: reliability curve; the right-hand scale marks RRF 10 (SIL 1), RRF 100 (SIL 2) and RRF 1000 (SIL 3)]

Figure 1. Sample Reliability Curve 12 Month PTI

Risk Reduction Factor (RRF), the inverse of PFDavg, is an intuitive representation of SIF risk
reduction capability. The smaller the RRF, the less risk is reduced. As proof test intervals are
increased, the positive impact of proof testing is diminished and SIF PFDavg is increased, thus
RRF is decreased.
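
The relationship between PFDavg, RRF and the SIL bands marked beside Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original analysis: the function names are hypothetical, and the band limits are the usual low demand mode PFDavg ranges (for example, SIL 1 covers PFDavg from 0.01 up to, but not including, 0.1).

```python
def rrf_from_pfdavg(pfd_avg: float) -> float:
    """Risk Reduction Factor is the inverse of PFDavg."""
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be in the range (0, 1]")
    return 1.0 / pfd_avg


def sil_achieved(pfd_avg: float) -> int:
    """Return the low demand SIL band for a PFDavg; 0 means SIL 1 is not met."""
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be in the range (0, 1]")
    if pfd_avg >= 1e-1:
        return 0
    if pfd_avg >= 1e-2:
        return 1
    if pfd_avg >= 1e-3:
        return 2
    if pfd_avg >= 1e-4:
        return 3
    return 4


# RRF values of the kind discussed in the text, converted back to PFDavg.
for rrf in (16.3, 11.9, 9.9, 6.9):
    pfd = 1.0 / rrf
    print(f"RRF {rrf:5.1f} -> PFDavg {pfd:.4f} -> SIL {sil_achieved(pfd)}")
```

Working on PFDavg rather than RRF keeps the band edges aligned with how the SIL ranges are normally tabulated; a value just short of RRF 10 therefore falls outside SIL 1.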
Where bypass is not available, a total shutdown is usually required in order to complete full
function proof tests, so in many low demand applications the PTI must correspond to the site TAR
schedule. As the TAR interval is extended, the PTI must also be increased, and reliability is reduced.
For example, consider a simple low demand SIL 1 function where a single sensor trips to close
2oo2 valves. Table 1 illustrates how the calculated RRF decreases as PTI is increased.
Table 1. Impact of PTI on SIF Reliability

Case              PTI schedule   SIL Achieved   RRF (1/PFDavg)
SRS requirement   12 mo          Target 1       Min RRF 10
TAR 1 yr          12 mo          1              16.3
TAR 2 yr          24 mo          1              11.9
TAR 3 yr          36 mo          0              9.9
TAR 4 yr          48 mo          0              8.5
TAR 5 yr          60 mo          0              6.9

Probability of failure increases as PTI is increased because the restorative benefits of proof
testing and repair are not realized as frequently. Table 1 illustrates that the PTI for the example
SIF may be extended to a 2 year period and it will still provide specified minimum risk reduction
(RRF of 10), but further extension of PTI will result in insufficient risk mitigation. Figure 2
compares the reliability curves for the example SIF cases for SRS specified 12 month PTI and
the 60 month PTI extended turnaround schedule.

[Figure: two reliability curve panels. 12 Month PTI: SIL 1 achieved, RRF 16.3. 60 Month PTI: SIL 0 achieved, RRF 6.9.]

Figure 2. Reliability Curve 12 and 60 Month PTI
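
The downward trend of RRF with longer PTI can be approximated with the widely used first-order formula for a single channel with perfect proof tests, PFDavg ≈ λDU × TI / 2. The sketch below is an illustration under that assumption only. The dangerous undetected failure rate is a hypothetical value, so the results are not expected to reproduce Table 1, whose figures come from a calculation that is not detailed here.

```python
HOURS_PER_MONTH = 8760.0 / 12.0  # average month length in hours


def pfd_avg_simple(lambda_du_per_hour: float, pti_months: float) -> float:
    """First-order PFDavg for one channel: lambda_DU * TI / 2 (valid when small)."""
    return lambda_du_per_hour * pti_months * HOURS_PER_MONTH / 2.0


def max_pti_months(lambda_du_per_hour: float, min_rrf: float) -> float:
    """Longest proof test interval that still gives RRF >= min_rrf."""
    return 2.0 / (lambda_du_per_hour * min_rrf * HOURS_PER_MONTH)


LAMBDA_DU = 5.0e-6  # hypothetical dangerous undetected failure rate, per hour
MIN_RRF = 10.0      # minimum RRF for a SIL 1 target

for pti in (12, 24, 36, 48, 60):
    rrf = 1.0 / pfd_avg_simple(LAMBDA_DU, pti)
    status = "meets" if rrf >= MIN_RRF else "below"
    print(f"PTI {pti:2d} mo: RRF {rrf:5.1f} ({status} minimum RRF {MIN_RRF:.0f})")

limit = max_pti_months(LAMBDA_DU, MIN_RRF)
print(f"Longest PTI for RRF {MIN_RRF:.0f}: {limit:.1f} months")
```

Because the approximation is linear in the test interval, doubling the PTI halves the RRF. A model that also accounts for imperfect proof testing, as Figure 1 does, degrades somewhat differently.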


Reliability theory, used for probability of failure calculation, is based on the concept of constant
random failure rate during useful life. There will be a period of early failure, or infant mortality
where weak components fail, a constant predictable failure period, and an increase when wear-
out is significant. Useful life is determined based on the expectation that components will be
replaced before wear-out failure rate becomes more significant than the predictable failure rate.
Figure 3, known as the ‘bathtub’ curve, illustrates the concept.

Figure 3. Bathtub Curve

It is a common practice in basic process control to operate components to failure. The practice is
accepted because BPCS instruments are active and routinely monitored by operators, so failures
are usually readily identified. In low demand applications, SIS components are passive and not
routinely monitored by operators; therefore, SIS components must be replaced (or restored to
like new condition) at the end of useful life while failure rates are in the predictable range.
Failure to replace SIS components before end of useful life is likely to result in a SIF component
being failed in place when needed for protection.
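
Where replacement can only be done during a turnaround, the useful life requirement becomes a planning check: any device whose useful life ends before the turnaround after the next one has to be replaced at the upcoming turnaround. The sketch below shows one way to express that check. The class, function names, tags, dates and useful life figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SisDevice:
    tag: str                  # hypothetical instrument tag
    installed: date           # date placed in service (or restored to like new)
    useful_life_years: float  # from manufacturer data or site experience


def end_of_useful_life(device: SisDevice) -> date:
    """Approximate date at which the device leaves the constant failure rate period."""
    return device.installed + timedelta(days=round(device.useful_life_years * 365.25))


def replace_at_next_tar(devices, following_tar: date):
    """Tags whose useful life ends before the turnaround after the next one.

    If these devices are not replaced at the upcoming turnaround, they will
    operate past useful life before the following opportunity.
    """
    return [d.tag for d in devices if end_of_useful_life(d) < following_tar]


devices = [
    SisDevice("PT-101", date(2010, 6, 1), 10.0),
    SisDevice("XV-205", date(2016, 3, 15), 15.0),
]
print(replace_at_next_tar(devices, following_tar=date(2023, 4, 1)))
```

Passing the date of the following turnaround, rather than the next one, is the design choice that matters here: it is the lengthened TAR interval that pushes devices past useful life.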
SIF demand mode is determined based on the relationship between the demand frequency and
the test and repair frequency. SIFs are classified as low demand when the proof test frequency is
at least twice the demand frequency. Reliability is improved because repairs can be implemented
between demands. In high demand and continuous mode SIFs, the benefit of proof testing is not
realized because the hazard is considered to be always present. When SIL determination fails to
consider multiple demands on the system, or if PTI is increased, O&M teams may find that SIFs
designed for low demand are actually high demand SIFs and target risk reduction is not
achieved.
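
The classification rule stated above (low demand when the proof test frequency is at least twice the demand frequency) can be sketched as follows. This is an illustration, not a complete demand mode assessment: the function name and frequencies are hypothetical, and the demand frequency is assumed to be the simple sum of the initiating event frequencies, with no credit taken for other protection layers.

```python
def demand_mode(initiating_event_freqs_per_year, pti_months: float) -> str:
    """Classify a SIF using the rule stated in the text.

    Low demand: proof test frequency is at least twice the demand frequency.
    Demand frequency is taken as the sum of all initiating event frequencies.
    """
    if pti_months <= 0:
        raise ValueError("pti_months must be positive")
    demand_per_year = sum(initiating_event_freqs_per_year)
    proof_tests_per_year = 12.0 / pti_months
    return "low" if proof_tests_per_year >= 2.0 * demand_per_year else "high"


# Hypothetical SIF with two initiating events, each expected once in 10 years.
single_cause = [0.1]
all_causes = [0.1, 0.1]

print(demand_mode(single_cause, pti_months=12))  # one cause, yearly proof test
print(demand_mode(all_causes, pti_months=48))    # all causes, 48 month proof test
```

The example shows both effects at once: counting every initiating event raises the demand frequency, and stretching the PTI lowers the proof test frequency, so a SIF assumed to be low demand can end up classified as high demand.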
Data from a recent project illustrate issues that can develop when project activities are not
aligned with O&M objectives. Project SIL determination methodology did not consider multiple
initiating events when setting the SIL target, and it was assumed all SIFs were low demand mode. In
addition, the project utilized a 12 mo PTI, whereas operations intended a 48 mo TAR schedule. The
unit was not designed to permit on-line proof testing, so PTI was tied to the TAR schedule. SIL
targeting was re-evaluated before startup considering the expected operations basis. Table 2 is a
summary of the project and operations findings for SIL targeting when considering PTI and
impact of multiple demands [2].

Table 2. PTI and Demand Impact on SIL Targeting

Basis / demand mode                              SIL 3  SIL 2  SIL 1  CPI  Total

Project (single initiating event, 12 mo PTI, low demand mode assumed)
  Low Demand                                     5      44     159    0    208

Operations (multiple initiating event, 48 mo PTI, demand mode assessed)
  Critical Process Interlock (CPI) (a) in BPCS:  108  108
  Low Demand:                                    3  24  27
  Low Demand with residual RRF (b):              5  2  7
  High Demand:                                   0  7  7
  High Demand with residual RRF:                 11  20  31

The impact of re-evaluating the SIL target incorporating PTI and demand frequency was
significant. The operations team was able to reduce the number of SIFs by classifying a number
of them as Critical Process Interlocks (CPI) and taking advantage of BPCS separation (c). The CPIs
are managed similarly to BPCS elements, and the lifecycle costs will be lower than managing as
SIFs. Because the project failed to recognize multiple demand sources and longer PTI, more than
half the remaining SIFs were high demand, had to mitigate residual risk, or both. Design had to
be modified on many SIFs to achieve desired risk reduction.

(a) CPI is a critical process interlock implemented in BPCS.
(b) Residual RRF describes the minimum risk reduction required within a SIL band.
(c) The process included PLC controlled package equipment with controls and interlocks independent of plant BPCS
and SIS.

4 Inherited from Design
Resolving the discrepancy between project and operation assessment of SIL requirements can be
an expensive journey. On-line (partial) proof tests and diagnostics can be useful tools to close
gaps; however, design modifications and device replacements are often required to implement them. In
order to perform on-line proof testing, process isolation (bypass) and testing access must be
provided. Diagnostics are only possible if the devices are diagnostic capable, and a supporting
infrastructure is provided for diagnostic feedback and administrative controls.

The O&M phase is entered with the expectation that the procured equipment will meet the integrity
requirements specified in the SRS; however, reality sometimes falls short. For example, proper
device acquisition can be compromised by inadequate specification or failure to recognize that
vendor data does not meet the requirements of IEC 61511. Clause 11.9.3 states “The reliability
data used when quantifying the effect of random failures shall be credible, traceable,
documented, justified and shall be based on field feedback from similar devices used in a similar
operating environment.” [3] The requirement may be satisfied in one of two ways: certification or
Proven in Use justification. The latter is not addressed here, but the subject is covered by others [4].
Specifying that SIS components must be SIL certified devices is thought to be the ‘easy’
solution; however, it is not foolproof. Device certification claims must be vetted to
assure the end user that the SIS and SIF requirements are met. IEC 61508 compliant
certification is completed by a Certification Body (CB) accredited by a nationally recognized
Accreditation Body (AB) that is a member of the International Accreditation Forum (IAF). A
certificate for an IEC 61508 systematic capability compliant device will display logos for
both the CB and AB, as illustrated in Figure 4.

Figure 4. CB and AB Logo Examples [5]


When presented with documentation that lacks either AB or CB logos displayed on the
certificate, exida policy is to treat the item as non-certified and vet failure rate data claims
against industry databases to confirm that it falls within process experience limits. For example,
the SILSafeData tool, http://silsafedata.com/, provides upper and lower boundaries for dangerous
undetected failure rates for devices commonly used in process industries. Data that falls outside
the SILSafeData limits on the optimistic side suggests that the equipment is more reliable
than it actually is. The example presented in Table 3 illustrates a typical experience where the
owner/operator discovers that the final element components for a SIL 2 1oo2 voting SIF were
selected based on documentation with optimistic data. In this case the final elements were
replaced with certified devices.

Table 3. Verification Scenario Comparison

Situation                                  Verification basis              SIL Achieved   RRF
Reliability reported by project            Optimistic data                 2              173
Reliability credited by verification       Generic industry data           1              72
Reliability achieved by replacing final    Replacement certified devices   2              119
elements with certified devices
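
The vetting step described before Table 3, comparing a claimed dangerous undetected failure rate with upper and lower reference limits, reduces to a simple range check. The sketch below is illustrative only: the function name is hypothetical and the limits are placeholder numbers, not values taken from SILSafeData or any other database.

```python
def vet_failure_rate(claimed_du: float, lower_limit: float, upper_limit: float) -> str:
    """Compare a claimed dangerous undetected failure rate with reference limits.

    All rates must use the same units (for example failures per hour).
    """
    if lower_limit <= 0 or upper_limit < lower_limit:
        raise ValueError("limits must satisfy 0 < lower_limit <= upper_limit")
    if claimed_du < lower_limit:
        return "optimistic"    # claims better reliability than experience supports
    if claimed_du > upper_limit:
        return "conservative"  # worse than the reference range
    return "within limits"


# Placeholder limits for a device type, failures per hour (hypothetical).
LOWER, UPPER = 5.0e-7, 5.0e-6

print(vet_failure_rate(1.0e-7, LOWER, UPPER))
print(vet_failure_rate(1.2e-6, LOWER, UPPER))
```

A claim flagged as optimistic would then be replaced with generic industry data for verification, which is the situation shown in the second row of Table 3.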

One SIF design parameter that is specified in the SRS is SIF response time. A common error is
to set this time based on the device response capability, or to not specify it at all. A SIF is
designed to protect against a specific hazard scenario. The purpose for setting the SIF response
time is to ensure that the SIF action is fast enough to prevent the consequence from occurring.
Proper methodology is to first calculate process safety time: the time between process deviation
reaching the SIF trip point, and the ultimate consequence. SIF response time is then set to no
more than 50% of the process safety time (d). This can be a challenge where process safety time is
short. If it is realized during the O&M phase that SIF response is too slow, resolution can be very
costly. Devices may need to be replaced, often requiring piping arrangement changes. If the SIF
response time cannot be practically reduced, process operating limits may need to be imposed to
increase process safety time, which can reduce production rates.
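
The 50% rule above can be checked with simple arithmetic once the process safety time is known. The sketch below assumes that the total SIF response time is the sum of the sensor, logic solver and final element response times. Function names and timing values are hypothetical.

```python
def response_time_budget(process_safety_time_s: float, fraction: float = 0.5) -> float:
    """Maximum allowed SIF response time: a fraction of the process safety time."""
    if process_safety_time_s <= 0 or not 0.0 < fraction <= 1.0:
        raise ValueError("process safety time must be > 0 and fraction in (0, 1]")
    return fraction * process_safety_time_s


def sif_response_time(sensor_s: float, logic_solver_s: float, final_element_s: float) -> float:
    """Total SIF response time, taken here as the sum of the component times."""
    return sensor_s + logic_solver_s + final_element_s


def response_time_ok(total_response_s: float, process_safety_time_s: float) -> bool:
    """True when the SIF acts within 50% of the process safety time."""
    return total_response_s <= response_time_budget(process_safety_time_s)


# Hypothetical case: 60 s process safety time, so the budget is 30 s.
fast_valve = sif_response_time(sensor_s=2.0, logic_solver_s=0.5, final_element_s=20.0)
slow_valve = sif_response_time(sensor_s=2.0, logic_solver_s=0.5, final_element_s=35.0)
print(fast_valve, response_time_ok(fast_valve, 60.0))
print(slow_valve, response_time_ok(slow_valve, 60.0))
```

The same comparison applies to the timed trip recorded during a proof test: the measured response replaces the sum of component values.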

Procurement activities can introduce issues that impact O&M SIF management. The most
common is failure to technically vet bids, resulting in purchase of a lower cost device that does
not meet SIF requirements. This can be due to the selection of a different manufacturer, selecting
a non-certified version of a model family, or selecting the correct model family but failing to
select features that support testing and diagnostics, or a device that is not properly suited to the
application (e.g. severe service). The consequence of incorrect procurement is often a failure to meet
one or more of the SIL verification hurdles. Resolution requires replacement of the device with a model that
meets the original technical specification, or implementation of a process or procedural
limitation that can reduce production or increase shutdown frequency.

(d) Based on the Nyquist theory that in order to adequately reproduce a signal, it should be sampled at 2x the highest
frequency that is to be recorded.

5 Proof testing
Proof testing is done to identify dangerous failures so that repairs can be completed in a timely
manner. A proof test is not only an electro-mechanical assessment but must also include a visual
inspection (e.g. to look for plugged impulse lines, signs of corrosion and/or loose wiring or
terminals). The effectiveness of a test, or Proof Test Coverage (PTC), is determined based on the
percentage of dangerous failures identified by the test method. The full proof test conducted
during a shutdown will generally provide the highest PTC. Continuous or periodic diagnostics
may be used to identify dangerous failures with effectiveness described as Diagnostic Coverage
(DC). Diagnostic tests will have a coverage factor less than the full test. DC may be
credited for electronic component (e.g. transmitter) on-line self-diagnostics, on-line partial proof
tests or SIF actuation. Ideally O&M organizations will develop effective [6] proof tests to identify
failures in a timely manner, and implement a data analysis program to analyze data and
implement improved processes.

Testing, repair, replacement and monitoring are the dominant activities during the O&M phase.
A full proof test is required per the interval specified in the SRS following a predefined proof
test procedure. This test should include pretest preparation, as found/as left documentation, step
by step test, and posttest closeout. If repairs are required, the proof test must be repeated before
returning to operation. The proof test should include a trip test to demonstrate full function, and
it should be timed to confirm SIF response time is met.

On-line partial proof tests can be a useful tool for O&M teams. An on-line test, such as a valve
Partial Stroke Test (PST) can increase reliability over the SIF mission time. Figure 5 illustrates
the benefit of a partial stroke test for a SIL 2 SIF with 1oo2 voting on the final elements. The
objective is to increase the full proof test interval from 12 months to 48 months by implementing a
yearly PST (e).

(e) Logic solver proof test is completed at 48 month intervals in both examples.

[Figure: two reliability curve panels. 12 Month PTI: SIL 2 achieved, RRF 192. 48 Month PTI with 12 Month PST: SIL 2 achieved, RRF 122 (RRF 91, SIL 1 without PST).]

Figure 5. Benefit of Manual PST
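
The way a partial stroke test recovers part of the benefit of a full proof test can be illustrated with a common first-order approximation for a single valve: failures the PST can reveal are exposed for the PST interval, and the remainder are exposed for the full proof test interval. The sketch below assumes a perfect full proof test and uses a hypothetical failure rate and PST coverage, so it shows the ordering of the three cases rather than the 1oo2 values in Figure 5.

```python
HOURS_PER_YEAR = 8760.0


def pfd_avg_with_pst(lambda_du: float, pst_coverage: float,
                     pst_interval_h: float, full_test_interval_h: float) -> float:
    """First-order PFDavg for one valve with partial stroke and full proof tests."""
    if not 0.0 <= pst_coverage <= 1.0:
        raise ValueError("pst_coverage must be between 0 and 1")
    revealed_by_pst = pst_coverage * lambda_du * pst_interval_h / 2.0
    revealed_by_full = (1.0 - pst_coverage) * lambda_du * full_test_interval_h / 2.0
    return revealed_by_pst + revealed_by_full


LAMBDA_DU = 2.0e-6   # hypothetical dangerous undetected failure rate, per hour
PST_COVERAGE = 0.6   # hypothetical fraction of failures a PST can reveal

cases = {
    "12 mo full test": pfd_avg_with_pst(LAMBDA_DU, 0.0, HOURS_PER_YEAR, HOURS_PER_YEAR),
    "48 mo full test + 12 mo PST": pfd_avg_with_pst(
        LAMBDA_DU, PST_COVERAGE, HOURS_PER_YEAR, 4 * HOURS_PER_YEAR),
    "48 mo full test, no PST": pfd_avg_with_pst(
        LAMBDA_DU, 0.0, HOURS_PER_YEAR, 4 * HOURS_PER_YEAR),
}
for name, pfd in cases.items():
    print(f"{name}: PFDavg {pfd:.5f}, RRF {1.0 / pfd:.0f}")
```

The PST case lands between the two full-test-only cases, which matches the ordering shown in Figure 5. How close it comes to the 12 month result depends on the PST coverage that can honestly be claimed.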

Diagnostics can provide continuous monitoring and immediately identify certain failures. With
prompt identification and repair, the SIF reliability is maintained. Full and partial proof tests are
performed periodically so dangerous failures can be present between tests. Diagnostics provide
real time identification of some failures and a quick repair will minimize the at risk period.
Figure 6 illustrates the benefits of automatic diagnostics on a SIL 2 toxic gas detection SIF with
a desired 2 year PTI (f).

[Figure: two reliability curve panels. 24 Month PTI: SIL 1 achieved, RRF 24. 24 Month PTI with sensor diagnostics: SIL 2 achieved, RRF 128.]

Figure 6. Benefit of Automatic Diagnostics

(f) Logic solver proof test is completed at 48 month intervals in both examples.

Proof and diagnostic testing are robust O&M tools to maintain the reliability of a SIF. A critical
component to the success of these tools is accurate representation of the proof or diagnostic test
coverage. Whether the test is a full proof test conducted during a TAR, a partial on-line proof
test, or a diagnostic test, the coverage allocated must be based on the percentage of potential
dangerous failures that can be identified by the test. Assumed values for PTC and DC tend to be
overly optimistic. O&M teams should evaluate proof test and diagnostic coverage for each
individual test based on the practices that are implemented.
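
One way to ground a coverage claim in actual practice is to list the dangerous failure modes of a device, assign each a share of the dangerous failure rate, and credit only the modes that the test as performed can reveal. The sketch below illustrates the arithmetic; the failure modes and rates are hypothetical and would normally come from a device FMEDA or site failure data.

```python
def coverage(failure_modes: dict, detected: set) -> float:
    """Fraction of the dangerous failure rate that a given test can reveal."""
    total = sum(failure_modes.values())
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    unknown = detected - set(failure_modes)
    if unknown:
        raise ValueError(f"unknown failure modes: {sorted(unknown)}")
    return sum(rate for mode, rate in failure_modes.items() if mode in detected) / total


# Hypothetical dangerous failure modes for a shutdown valve (rates per hour).
valve_modes = {
    "stuck open": 6.0e-7,
    "closes too slowly": 1.0e-7,
    "seat leakage": 3.0e-7,
}

stroke_only = coverage(valve_modes, {"stuck open"})
stroke_timed = coverage(valve_modes, {"stuck open", "closes too slowly"})
stroke_timed_leak = coverage(valve_modes, set(valve_modes))

print(f"Stroke test only:             {stroke_only:.0%}")
print(f"Stroke test, timed:           {stroke_timed:.0%}")
print(f"Timed stroke plus leak test:  {stroke_timed_leak:.0%}")
```

The point of the exercise is that coverage follows from the steps actually written into the procedure: a stroke test that is not timed and not leak tested cannot be credited for those failure modes.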

A special case is the question of allowing test coverage credit for an actual SIF trip.
Considering a trip equivalent to a full proof test is inaccurate because condition of wear elements
is not evident by a process trip. At most, trip of a SIF may have some level of diagnostic benefit.
When caused by a failure outside the SIF, a diagnostic coverage can be determined and credited.
If the trip is spurious, or caused by a SIF component safe failure, the diagnostic credit is limited
to the devices not involved in the failure. In addition, the frequency of a trip cannot be predicted,
therefore cannot be included in predictive verification models. Data collected from SIF trips can
be useful to the O&M team in determining if the SIF reliability is consistent with expectations
and may help to identify potential systematic issues with personnel or procedures.

A critical responsibility of the O&M team is to monitor the performance of the SIS and ensure
that every SIF achieves intended reliability. Data collection to document SIF demand, testing,
and repair is the foundation for monitoring and a key input to periodic assessment, and leading
and/or lagging indicators should be utilized [7]. How and what data is collected can impact the
analytical benefit [8]. Information about demand and failure must be collected and analyzed to
confirm that system performance is consistent with specification, verification and validation
basis. Demand data for every IPL intended to reduce demand on a SIF or considered in setting
the SIL target, as well as the actual demands on a SIF should be recorded. A periodic demand
evaluation should be conducted to confirm that the actual demand on a SIF is consistent with
demand rate specified in the SRS. When a spurious trip occurs, an exhaustive failure analysis
should be conducted to determine the root cause of the failure and document the cause of that
failure. Routine proof and diagnostic testing must include provisions to capture as found/as left
information. The duration of repair periods following diagnostic alarm, faults or trips must be
recorded to confirm repairs are implemented within MPRT. Data must be analyzed and
compared to design parameters to confirm that the SIF is meeting design objectives, and initiate
corrective action if it does not.
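
Two of the comparisons described above, observed demand rate against the SRS demand rate and repair duration against MPRT, can be sketched as follows. The record layout, work order numbers and limits are hypothetical. A real program would also consider the statistical uncertainty of a small number of demands.

```python
def observed_demand_rate(demand_count: int, years_observed: float) -> float:
    """Observed demands per year over the review period."""
    if years_observed <= 0:
        raise ValueError("years_observed must be positive")
    return demand_count / years_observed


def demand_rate_exceeded(demand_count: int, years_observed: float,
                         srs_demands_per_year: float) -> bool:
    """True when observed demand is higher than the rate assumed in the SRS."""
    return observed_demand_rate(demand_count, years_observed) > srs_demands_per_year


def repairs_over_mprt(repair_durations_h: dict, mprt_h: float) -> list:
    """Work orders whose repair duration exceeded the MPRT."""
    return sorted(wo for wo, hours in repair_durations_h.items() if hours > mprt_h)


# Hypothetical review: 3 demands in 4 years against an SRS basis of 0.1 per year.
print(observed_demand_rate(3, 4.0), demand_rate_exceeded(3, 4.0, 0.1))

# Hypothetical repair records (hours from fault detection to return to service).
repairs = {"WO-1001": 36.0, "WO-1002": 80.0, "WO-1003": 72.0}
print(repairs_over_mprt(repairs, mprt_h=72.0))
```

Either flag is a trigger for the corrective action loop: re-verification of the SIF with the observed demand rate, or a review of why repairs are not being completed within MPRT.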

6 Conclusions
The majority of a SIF’s lifespan is spent in the O&M phase of the functional safety lifecycle.
O&M teams must contend with issues inherited from previous phases of the lifecycle, and
business decisions that negatively impact SIF reliability. An operating organization must learn to
recognize issues inherited from upstream in the lifecycle and develop requirements that can be
fed into the Functional Safety Management Plan to reduce the potential for similar flaws in the
future. Maintenance must develop and use robust testing and diagnostic procedures to monitor
the health of SIS devices, and implement a routine analysis program to confirm design basis
parameters are met.
7 References
[1] IEC 61511-1 Ed 2.0, “Functional Safety: Safety instrumented systems for the process
industry sector – Part 1: Framework, definitions, system, hardware and application
programming requirements,” IEC, Table 2, Geneva, Switzerland 2016
[2] D. Chastain-Knight, R. Butz, and W. Donaldson. “Functional Safety Management
Planning,” Mary Kay O’Connor Process Safety Center, November 2017.
[3] IEC 61511-1 Ed 2.0, “Functional Safety: Safety instrumented systems for the process
industry sector – Part 1: Framework, definitions, system, hardware and application
programming requirements,” IEC, Clause 11.9.3, Geneva, Switzerland 2016
[4] Rachel Amkreutz and Iwan van Beurden, “What does Proven in Use Imply?”, Hydrocarbon
Processing, 2004
[5] Iwan van Beurden and William M. Goble, “Safety Instrumented System Design, Techniques
and Design Verification”, Figure 8.6, ISA, Research Triangle Park, NC. 2018
[6] D. Chastain-Knight and J. Jenkins, “Effective Proof Testing for Low Demand Safety
Instrumented Functions” Chemical Processing, Fall 2018
[7] Steve Gandy, “Conforming to IEC 61511: Operation and Maintenance Requirements”,
http://www.exida.com/Resources/Whitepapers
[8] William M. Goble. “Field Failure Data – the Good, the Bad and the Ugly”,
http://www.exida.com/Resources/Whitepapers/Field-Failure-Rates-The-Good-The-Bad-The-Ugly
