H. Paul Barringer, P.E., Barringer & Associates, Inc., P.O. Box 3985, Humble, TX 77347, Phone: 713-852-6810, FAX: 713-852-3749
Abstract
Reliability is the probability of equipment or processes to function without failure, when operated correctly, for a given
interval of time, under stated conditions. Reliability numbers, by themselves, lack meaning for making improvements. For
business, the financial issue of reliability is controlling the cost of unreliability from equipment and process failures which
waste money. Reliability issues are understandable when converted into monetary values by using actual plant data. Several
reliability engineering tools are discussed.
Are top ten Pareto items identified?
--What are the specific items for correction?
--Who is responsible for corrective actions?
--What is the budget (time and cost) for change?

Answering these questions requires identifying the expected annual failure occurrences (both
obvious failures and cutback/slow-down failures) and then calculating the losses. These facts
help motivate reliability improvement programs.

Reliability requirements for businesses change with competitive conditions and business risks;
the playing field is always tilting. Unreliability values change with business conditions. You
don't need the best reliability in the world for your business; you just need a cost advantage
over your fiercest competitor. Even low cost industry providers need reliability improvement
programs because reliability does not stand still. Unreliability costs usually increase.

Motivations for reliability improvements are driven by the cost of unreliability, which tells
the magnitude of the pain. Where the pain occurs within the plant is important. Why the pain
occurs gets to the root cause of the problem for corrective improvement programs. Don't rely on
a magic bullet to fix reliability problems because seldom will correcting one item make a big
change in overall reliability.

Reliability Tools
Reliability tools showed their real value in the 1930s, 40s, and 50s when used on exotic
military programs. Fortunately many reliability tools do not need a rocket scientist to use
them cost effectively. Some simple reliability tools provide big gains quickly and defer the
use of higher powered tools for squeezing out the remaining improvements. In all cases, score
cards for reliability improvements in business need measurements in dollars.

Use of reliability tools is evolutionary:
--Start reliability improvement programs with simple arithmetic and spreadsheets. Quantify
  important cost and failure numbers.
--Gain momentum with good maintenance practices. Improve teamwork using total productive
  maintenance programs. Use root cause analysis to efficiently solve problems.
--Learn and use a host of straightforward reliability engineering tools to solve problems.
--Spur improvement programs by using statistics to quantify and understand scatter in the
  results.
--Build Monte Carlo computer models to simulate plant availability, reliability,
  maintainability, capability, and life cycle costs for deciding reliability strategies.
--Keep all reliability scores measured in dollars.

The reliability improvement hierarchy shown in Figure 1 depends on having lower level programs
operational as precedents to accomplish higher level objectives.

[Figure 1 pyramid, top to bottom: REP, Root Cause Analysis, Total Productive Maintenance,
Good Maintenance Practices]
Figure 1: Reliability Hierarchy

The reliability improvement hierarchy uses a host of new tools to reduce both failures and the
effects of failures. This work-process initiative is to gain a beachhead and expand territory
by improvements. Few lasting improvements are maintained without improving reliability as a
work-process using four important programs:
--Good maintenance practices (GMP) use the best-of-class practices for doing activities. This
  requires having trained people at various levels with a commitment to on-going training. Use
  good procedures and practices to avoid calamities. Teach new techniques and verify the
  workforce has accepted them as requirements for correct performance, such as precision
  alignment of rotating equipment along with precision alignment of all piping and equipment.
  Start teamwork between maintenance and production departments.
--Total productive maintenance (TPM) is a way of life for involvement of production personnel
  in appropriate maintenance tasks for tender loving care of both equipment and processes. Aim
  TPM for effective use of equipment and loss prevention by a preventive maintenance effort
  with involvement of all people from top to bottom in the organization. Promote preventive
  maintenance through the TPM program by use of self-directed small work groups.
--Root cause analysis (RCA) works on defining problems into categories such as people,
  procedures, or hardware. Demonstrate that RCA solutions to problems will prevent recurrence,
  meet the organization's goal, and be within an individual's control for preventing
  recurrence.
--Reliability engineering principles (REP) use new tools to solve old nagging problems. The
  program is interested in solving the vital few problems that save/make the most money to
  reduce the overall cost of unreliability. Many tools use bathtub concepts to match correct
  tools to cost effective strategies by applying science and engineering to
  reliability-centered maintenance (RCM) efforts.

Work-processes for reliability improvements start at ground level with a firm foundation and
build toward greater successes. Successful, on-going, reliability improvement programs don't
just happen overnight; they are the results of well thought out programs which are carefully
implemented.

Generally, reliability work-process programs are driven by the need for improved profitability.
They begin by determining the cost of unreliability. This exercise requires plant failure data
and cost data.

Age-to-failure data- Acquiring good data for equipment sounds easy. In many ways, it is a
difficult task. The data acquisition task is as difficult as taking beautiful and salable
photos with a camera. Photos are advertised as easy to acquire. However, few candid photos are
sold and used in professional literature. Taking a beautiful photo requires:
1. Good data logging equipment, i.e., a camera,
2. Knowledge of the art and science of photography,
3. Careful illumination of the subject,
4. Clear understanding of the photo's purpose, and
5. Photo appreciation for either art or commerce.

Failure of just one element results in a useless photo. Not all photos made by the experts are
salable; most are junk. Trying to take a beautiful photo by simply walking down the street
taking snapshots won't produce good results. Similarly, acquiring photos rapidly is equally
unproductive. In short, taking a good photo requires a carefully constructed plan; it doesn't
just happen!

Abernethy (1996) says acquiring equipment failure data has three basic requirements (items
1-3); and for commercial businesses, add two other elements (items 4-5):
1) Define an unambiguous time origin,
2) Define a scale measuring the passage of time,
3) The meaning of failure must be entirely clear,
4) Measure cost consequences for failure, and
5) Gain data analysis expertise for using data.

A thoughtful plan to acquire a few pieces of carefully logged age-to-failure data for equipment
is better than vast quantities of poorly planned data. Notice the parallels between photography
and failure data.

People in many plants say they lack any data (Barringer 1995) when in fact, data is all around
them in various degrees of usefulness. Most industrial plants have been acquiring equipment
failure data for many years and seldom is the data analyzed in a scientific manner. Rarely do
people acquiring the data see the data used to solve their problems. The net result is vast
data banks of nearly useless information acquired haphazardly and annotated poorly. Today's
task is to mine through piles of existing data while acquiring new age-to-failure data in a
carefully thought-out manner so it can be used for an economic advantage. The key phrase to
remember is age-to-failure and of course that requires a consistent definition of failure.

The field of reliability offers many technical guidelines for how data should be acquired,
annotated, and used for analysis. In many cases, failures need a death certificate just as
occurs with human failures. Death certificates for humans have been so productive in producing
analyzable results that it is now illegal in the civilized world for a person to be buried
without a death certificate listing age and cause of death.

Substantial amounts of failure data exist in the literature awaiting knowledgeable use of the
facts for improving the reliability of plants and equipment (IEEE 1984) (RAC 1994) (Barringer
1993-reading list). New data acquisition initiatives by the Center For Chemical Process Safety
(CCPS 1989), using proven skills of data analysis experts at Det Norske Veritas (OREDA 1992),
are underway for acquiring data from chemical plants and refineries. Don't wait. Start with
common sense data from your plants. Make failure data work for solving your personal and
company needs.

Many plants now know their mean time between failure (MTBF) for pumps from special emphasis on
reducing failures on rotating equipment. Because of redundancy, pumps (and other rotating
equipment) today seldom shut down plants. However, non-rotating equipment often causes plant
shutdowns, and few plants know MTBF for heat exchangers, columns, reactors, and so forth. It is
important to use failure data to prevent failures and concentrate solutions on the vital few
problems. For many plants the beachhead is secure in the rotating equipment area. It is time to
reassign resources into non-rotating equipment areas (while holding gains in rotating equipment
areas) to reduce and prevent failures which cause plant outages.

Plant data for calculating failure rates is accumulated in a simple form for evaluating the big
picture in blocks as illustrated in Figure 2. Use a reasonable length of time for the study
period. Count the number of failures occurring during the time interval to calculate MTBF. The
failure rate is the reciprocal of MTBF. This effort for using plant failure data involves only
arithmetic, for example: MTBF for block C = (35,040 hours)/(2 failures) = 17,520 hours/failure.

Block Diagram Of Plant
                        A          B           C          Summary
Study Interval (hrs)    43,800     26,280      35,040     8,760 hrs/yr
Number Of Failures      1          3           2          1.7 failures/yr
MTBF                    43,800     8,760       17,520     5,153 hrs/fail.
Failure Rate            22.8E-06   114.2E-06   57.1E-06   194.1E-06 fail/hr
Figure 2: Plant MTBF And Failure Rate

From the failure rate details, another spreadsheet is prepared to determine the lost time from
the failures, which leads to pricing the cost of unreliability.

Block Diagram Of Plant
                              A          B           C          Summary
Failure Rate                  22.8E-06   114.2E-06   57.1E-06   194.1E-06 fail/hr
Failures Per Year             0.2        1.0         0.5        1.7
Corrective Time Per Failure   18         24          83         40.6 hrs
Lost Time/yr                  3.6        24          41.5       69.1 hrs
Figure 3: Time Lost From Unreliability

From Figure 3 it is clear all failures are not equal. Setting work priorities only on the basis
of failure body counts can be misleading. It's important to determine average corrective times
for failures to make good estimates for total downtimes; lost production time for the plant is
also lost money.
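The arithmetic behind Figures 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the author's spreadsheet: the study intervals, failure counts, and corrective times are taken from the two figures, while the names `blocks` and `block_summary` are hypothetical. The plant is treated as a series system, so block failure rates are added to get the plant failure rate.

```python
HOURS_PER_YEAR = 8760

# name: (study interval in hours, failures counted, corrective hours per failure)
# Values are read from Figures 2 and 3.
blocks = {
    "A": (43_800, 1, 18),
    "B": (26_280, 3, 24),
    "C": (35_040, 2, 83),
}


def block_summary(study_hours, failures, corrective_hours):
    """Return MTBF, failure rate, failures/yr, and lost hours/yr for one block."""
    mtbf = study_hours / failures                  # hours per failure
    failure_rate = 1.0 / mtbf                      # failures per hour
    failures_per_year = failure_rate * HOURS_PER_YEAR
    lost_hours_per_year = failures_per_year * corrective_hours
    return mtbf, failure_rate, failures_per_year, lost_hours_per_year


results = {name: block_summary(*data) for name, data in blocks.items()}

# Series plant: failure rates add, and plant MTBF is the reciprocal of the sum.
plant_failure_rate = sum(r[1] for r in results.values())
plant_mtbf = 1.0 / plant_failure_rate
plant_failures_per_year = sum(r[2] for r in results.values())
plant_lost_hours = sum(r[3] for r in results.values())

for name, (mtbf, rate, per_year, lost) in results.items():
    print(f"{name}: MTBF={mtbf:,.0f} h  rate={rate:.1e}/h  "
          f"failures/yr={per_year:.1f}  lost={lost:.1f} h/yr")
print(f"Plant: MTBF={plant_mtbf:,.0f} h  failures/yr={plant_failures_per_year:.1f}  "
      f"lost={plant_lost_hours:.1f} h/yr")
```

Block C has half the failures of block B but loses more time each year, which is the point made about failure body counts.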
Average times for corrective efforts can be calculated in the same manner as shown in Figure 2.

Generally several questions arise in using failure data

Block Diagram Of Plant
Losses          A        B         C         Summary
Gross Margin    36,000   240,000   415,000   691,000
caused by numerous breakdowns. Examination of the reliability equation for Figure 5 shows the
chance of meeting a five year turnaround interval without failures is only 0.02%, so this plant
will pay a high price for outages requiring corrective maintenance! Furthermore, inclusion of
lost time from turnaround outages (whenever they occur) will pull down availability. Likewise,
inclusion of costly turnarounds will also increase the cost of unreliability.

How much can we afford to spend to fix the problems causing unreliability? Using a rule of
thumb for one year payback, Figure 4 tells the expenditure limits: Don't spend more than
$625,000 to correct problems in Cell C or more than $365,000 in Cell B.

The cost of unreliability index is a simple and practical reliability tool for converting
failure data into costs. When the entire organization can understand the problem on one side of
a sheet of paper, effort focuses on solving important problems.

Of course other reliability data can be converted into uncomplicated, figure-of-merit
performance indices useful for getting a grip on reliability by using mean times between
failure. Reliability is observed when mean time between failure (MTBF) is large compared to the
mission time. Likewise, small values for mean time indices, compared to the mission time,
reflect unreliability.

Accuracy of these simple indices is improved when many data points are screened using well
known statistical tools described in the IEEE, OREDA, and RAC publications referenced above.
When only a small volume of data is available it is best analyzed using Weibull analysis
techniques for component failures to arrive at better estimates of MTBF.

Probability plots-Failure data is chaotic because of scatter in the data. Data scatter can be
studied arithmetically for first, quick look results, or refined into statistical details
providing richer descriptions when converted into straight line plots of time-to-failure
against cumulative chances for the failure.

Most engineers need graphical representation of data to fully understand problems. Without
graphs, engineers are often overwhelmed by scatter, as age-to-failure data provides no
traditional X-Y facts for making conventional plots.

Probability tools are growing in importance with the use of personal computers which generate
the curves with ease (Fulton 1996). Weibull probability charts are the tool of choice for
reliability problems because they often tell failure modes (how components die):
--infant mortality: use a run to failure strategy,
--chance failures: use a run to failure strategy, or
--wearout failure modes: consider a timed replacement strategy based on costs.

Data from Weibull plots support RCM decisions based on highly idealized bathtub curves (Moubray
1992). Weibull plots tell component failure modes.

Weibull charts are particularly valuable for pointing in the correct direction for finding root
causes of problems using a few data points. Larger quantities of data add confidence to the
decision making process, but at considerably greater expense for acquiring both failures and
data. The motivation for using probability charts is to understand failure data and reduce
costly failures by appropriate corrective actions. Consider the curves in Figure 6.

The slope of the line tells the failure mode is wearout and the chances for failure can be read
directly from the Y-axis.

In Figure 6, Pump B's seal life is short. By standing and waiting for duty it experiences a
service which is 38.1/10.4 = 3.7 times more severe than Pump A based on characteristic life
values shown in the figure. Perhaps a better strategy may be to swing the pumps into service
every other week rather than continue the current strategy of operation.
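The chances of failure that would be read from a Weibull line can also be computed directly. The sketch below uses the standard two-parameter Weibull distribution with the eta (characteristic life) and beta (slope) values printed on Figure 6; a beta greater than 1 corresponds to the wearout mode. It also checks the 0.02% five year figure under an assumption: Figure 5 is not reproduced in this copy, so the reliability equation is taken to be the exponential case (beta = 1) with the plant failure rate of 194.1E-06 failures/hour from Figure 2. The function names are illustrative only.

```python
import math


def weibull_reliability(t, eta, beta):
    """Chance of surviving to age t for a two-parameter Weibull distribution."""
    return math.exp(-((t / eta) ** beta))


def weibull_cdf(t, eta, beta):
    """Cumulative chance of failure by age t (the Y-axis of a Weibull chart)."""
    return 1.0 - weibull_reliability(t, eta, beta)


# (eta in months, beta) as printed on Figure 6.
PUMPS = {"Pump A": (38.132, 3.901), "Pump B": (10.402, 2.151)}

# Ratio of characteristic lives, quoted in the text as 38.1/10.4 = 3.7.
severity_ratio = PUMPS["Pump A"][0] / PUMPS["Pump B"][0]

# Assumption: exponential reliability (beta = 1, eta = 1/failure rate) with the
# plant failure rate from Figure 2 and a five year mission in hours.
plant_failure_rate = 194.1e-6
five_years_hours = 5 * 8760
five_year_reliability = weibull_reliability(
    five_years_hours, 1.0 / plant_failure_rate, 1.0
)

for name, (eta, beta) in PUMPS.items():
    chance = weibull_cdf(12, eta, beta)
    print(f"{name}: chance of seal failure by 12 months = {chance:.1%}")
print(f"Severity ratio (eta A / eta B): {severity_ratio:.1f}")
print(f"Chance of five years without a plant failure: {five_year_reliability:.2%}")
```

At an age equal to eta the cumulative chance of failure is always 1 - 1/e, about 63.2%, whatever the slope, which is why eta is a convenient single number for comparing the two pumps.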
[Figure 6 chart, WinSMITH Weibull Software: Pump Seal Life Strategy. Y-axis: Occurrence CDF %
(1 to 99); X-axis: Pump Seal Life (months), 1 to 100.]
Pump A--Primary Pump, Run To Failure
Pump B--Back-up Pump, Runs Only When Pump A Fails
          Eta      Beta    r^2     n/s
Pump B    10.402   2.151   0.983   10/0
Pump A    38.132   3.901   0.983   10/0
Figure 6: Weibull Probability Chart

Knowing the odds for success/failure is an important fact for assessing risks. Probability
charts are easily interpreted, and simple plots of probabilities multiplied by costs can be
plotted against time to quantify decisions and consider alternatives. In practice we seldom
have too many data points for assessing risks. Weibull plots use few data and help the decision
making process; some data is better than no data for making cost effective decisions.

Monte Carlo models-Plant availability/reliability models are constructed starting with P&ID
diagrams and predominant failure modes determined by Weibull analysis. Monte Carlo models use
failure data statistics along with computer generated random numbers to gain insight into
economic improvements.

Some simple, but practical, Monte Carlo demonstration models are available in Excel
spreadsheets to provide ideas about how Monte Carlo models function. The demonstration models
also show end results for availability, reliability, and cost of unreliability when changes are
made to components of the model. These models can be downloaded from the Internet (Barringer
1996a).

Results from Monte Carlo models depend on good failure data (either actual or engineered data),
good repair data, and accurate configuration by the modeler of how the plant physically
operates. Of course models are only approximations of plants and making a model does not
improve reliability; improvements must be converted into hardware.

Static spreadsheet models always produce the same results each time using arithmetic failure
data. However, equipment never fails at the same age-to-failure or repair times in real life.
Thus Monte Carlo models give different answers for availability, reliability, and costs each
time the model is solved.

Now with today's fast PCs and Monte Carlo models, thousands of solutions can be generated
quickly at very low cost to model thousands of years of operation. From the multitude of
answers produced by the model come trends, plant characteristics, and often alternatives for
correcting deficiencies without incurring the expense and delay of building hardware.

Monte Carlo reliability models can realistically assess plant conditions when combined with
costs, repair times, and statistical events. Monte Carlo simulation models are very helpful for
considering approximate operating conditions in a plant, including cost effective sizing of
tankage to provide protection for short duration equipment failures.

Good simulation models help determine maintenance strategies and turnaround timing for
equipment renewal, as simulation models are usually based on simple, heuristic rules. Heuristic
rules are based on observed behavior of components or systems and are easy to construct using
knowledge based software.

Reliability models stimulate creative ideas for solving costly problems and prevent replication
of old problems. Reliability models offer a scientific method for studying actions, responses,
and costs in the virtual laboratory of the computer using actual failure data from existing
plants. Monte Carlo models are never better than the data supplied.

Monte Carlo models provide a way to search for lowest cost operating alternatives and
conditions by predicting the outcome of conditions, events, and equipment. Monte Carlo models
aid in finding the lowest long term cost of ownership.
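A minimal Monte Carlo sketch, in Python rather than the Excel demonstration models mentioned above, shows how random ages-to-failure and repair times give a different availability each time the model is solved. It is not the author's model and rests on several assumptions: the three blocks form a series plant, times to failure and repair times are drawn from exponential distributions with the MTBF and mean corrective times of Figures 2 and 3 (a real model would use Weibull results for each failure mode), overlapping outages are ignored, and the seed and function names are arbitrary.

```python
import random

HOURS_PER_YEAR = 8760

# name: (MTBF in hours, mean corrective hours per failure), from Figures 2 and 3.
BLOCKS = {"A": (43_800, 18), "B": (8_760, 24), "C": (17_520, 83)}


def simulate_downtime(mtbf, mean_repair, horizon_hours, rng):
    """Total repair hours for one block over one random operating history."""
    clock = 0.0
    downtime = 0.0
    while True:
        clock += rng.expovariate(1.0 / mtbf)       # random age-to-failure
        if clock >= horizon_hours:
            return downtime
        repair = rng.expovariate(1.0 / mean_repair)  # random repair time
        repair = min(repair, horizon_hours - clock)  # stop at end of horizon
        downtime += repair
        clock += repair


def simulate_plant_availability(years, rng):
    """Availability of the series plant for one simulated history.

    Simplification: outages of different blocks are assumed not to overlap,
    so block downtimes are simply added.
    """
    horizon = years * HOURS_PER_YEAR
    lost = sum(simulate_downtime(mtbf, repair, horizon, rng)
               for mtbf, repair in BLOCKS.values())
    return 1.0 - lost / horizon


rng = random.Random(1996)  # arbitrary seed so the run can be repeated
runs = [simulate_plant_availability(1000, rng) for _ in range(20)]
mean_availability = sum(runs) / len(runs)

print(f"Lowest run:  {min(runs):.4%}")
print(f"Highest run: {max(runs):.4%}")
print(f"Mean of {len(runs)} runs of 1000 years: {mean_availability:.4%}")
```

Each run gives a slightly different availability, and the mean settles near the static estimate of 1 - 69.1/8,760 from Figure 3. Multiplying the simulated lost hours by a cost per hour would turn the same runs into a distribution for the cost of unreliability.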
Other reliability tools-The list of reliability tools is long (Barringer 1996b). The real key
to success is putting reliability tools, developed in the past 60 years, to work for cost
reductions. Few engineers receive university level training in the use of reliability tools for
solving practical problems. The new reliability engineering tools require educating management
and training engineers in their use to reduce risk, reduce costs, and improve operations.

Summary
Use failure data from existing plants with new tools to solve reliability problems in practical
ways. Businesses cannot afford too little nor too much reliability; reliability must be
harmonized with cost issues.

Know the cost of unreliability based on failure data. It is an important reliability index. The
cost of unreliability must be engineered and controlled by solving top level items on the
Pareto list.

As expertise in using failure data grows, use Weibull analysis to help with problem solving by
putting statistics to work on practical problems. Weibull results give details for supporting
RCM activities.

Train staffs in use of new reliability tools to gain a competitive business advantage by
increasing skills to reduce costs. Cutting edge companies are using, or considering, new tools
such as Monte Carlo models for finding cost effective alternatives to reduce their cost of
unreliability.

References
Abernethy, Dr. Robert B., 1996, The New Weibull Handbook, 2nd edition, Self published, North
Palm Beach, FL, Phone: 407-842-4082.

Barringer, H. Paul, 1993, Reliability Engineering Principles, Self published, Humble, TX,
Phone: 713-852-6810.

Barringer, H. Paul, and Weber, David P., 1995, Where Is My Data For Making Reliability
Improvements, Hydrocarbons Processing Magazine 4th International Reliability Conference,
Houston, TX.

Barringer, H. Paul, 1996a, temporary web address at http://www.forbes.net//barring and look for
a permanent world wide web site address at http://www.barringer1.com

Barringer, H. Paul, 1996b, An Overview Of Reliability Engineering Principles, Energy Week 1996
Conference, PennWell Conferences & Exhibitions, Book IV, Houston, TX.

Center For Chemical Process Safety, 1989, Guidelines For Process Equipment Reliability Data,
American Institute of Chemical Engineers, New York.

Fulton, Wes, 1996, WinSMITH Weibull version 1.0J for Windows 3.1, Windows 95 and Windows NT,
Fulton Findings, Torrance, CA, Phone: 310-548-6358.

Moubray, John, 1992, Reliability-centered Maintenance, Industrial Press, New York.

IEEE Std 500-1984: IEEE Guide to the Collection and Presentation of Electrical, Electronic,
Sensing Component, and Mechanical Equipment Reliability Data for Nuclear-Power Generating
Stations, John Wiley & Sons, Inc., New York.

Omdahl, Tracy Philip, 1988, Reliability, Availability, and Maintainability (RAM) Dictionary,
ASQC Quality Press, Milwaukee, WI, Phone: 1-800-248-1946.

OREDA-1992 (Offshore Reliability Data), Contact Andy Wolford at DNV Technica, 16340 Park Ten
Place, Suite 100, Houston, TX 77084, phone 713-647-4225, or FAX 713-647-2858.

RAC-NPRD-95, 1994, Nonelectronic Parts Reliability Data 1995, Reliability Analysis Center, PO
Box 4700, Rome, NY 13442-4700, Phone: (800)-526-4802.
May 5, 1996