
System Reliability

Resit Unal
Engineering Management & Systems Engineering Dept.
Old Dominion University
runal@odu.edu

Slide 1
System Life Cycle Concepts
NEED

Acquisition phase:
1. CONCEPTUAL/PRELIMINARY DESIGN
2. DETAIL DESIGN & DEVELOPMENT
3. PRODUCTION/CONSTRUCTION

Utilization phase:
4. OPERATION & SUPPORT
5. PHASE OUT / DISPOSAL

Slide 2
Lifecycle Costs (LCC)

• 60 – 80% of LCC is spent during the operation phase

• Reliability is a major cost driver
  (failures, repair, lost operation time, redesign, ...)

• 70% of LCC is committed during the design phase:
  design fixes how the system will be operated and maintained

Slide 3
System Reliability

• Engineering is concerned with how products/systems work, but we also need to understand:

• The ways in which they fail, the effects of failures, and the aspects of design which affect the likelihood of failure.

• This is the subject of Reliability Engineering.

Slide 4
Defining Reliability

Reliability is the probability that a given system will perform as anticipated under given operating conditions.

It can be used to predict the probability that a system will operate for a specified number of hours, or to estimate an average time between failures.

Slide 5
Failure Patterns: Bath Tub Curve
[Figure: bathtub curve, failure rate λ(t) versus time. Burn-in: decreasing failure rate. Useful life: constant failure rate (random failures). Wear-out: increasing failure rate.]

Slide 6
Non-Repairable systems

• The instantaneous probability of the first and only failure is called the failure rate.

• Mean Time to Failure (MTTF)

Slide 7
Repairable systems

• Mean Time Between Failures (MTBF)

• Repair takes time, V(t)= repair rate

• Mean time to Repair (MTTR)

• Availability: $A = \frac{MTBF}{MTBF + MTTR}$

Slide 8
Tasks of Reliability
F(t) = P(T ≤ t)   (probability of failure by time t)
R(t) = 1 - F(t) = P(T > t)

T = time to failure (a random variable)

I) First task is to derive & study this equation

II) Find the best way to increase Reliability

Slide 9
Find Best Ways to Increase Reliability

1. Reduce complexity
2. Increase R of components/subsystems
3. Parallel redundancy
4. Stand-by redundancy
5. Preventive Maintenance
6. Repair
7. Combination

Slide 10
Failure f(t): Exponential Distribution

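1. Failures occur at random intervals.
2. Failure rate stays constant (time independent).

[Figure: failure rate λ(t) versus time. The Constant Failure Rate (CFR) region at level λ, modeled by the exponential f(t), lies between the decreasing λ(t) ↓ and increasing λ(t) ↑ regions.]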

Slide 11
Exponential Distribution; CFR

$f(t) = \lambda e^{-\lambda t}$

$R(t) = e^{-\lambda t}$

$F(t) = 1 - e^{-\lambda t}$

MTTF = $1/\lambda$ (time unit)

Constant Failure Rate (time independent): the failure rate plotted against time is a horizontal line at λ.

Slide 12
Example

Reliability of a pump seal is given by $R(t) = e^{-\lambda t}$, with λ = 0.0005

MTTF = $\int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$

MTTF = 1/0.0005 = 2000
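
The pump seal example can be checked numerically. The following is a minimal sketch, assuming the time unit is hours (the slide does not state it); the function name `exp_reliability` is illustrative only.

```python
import math

def exp_reliability(lam: float, t: float) -> float:
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

lam = 0.0005        # failure rate from the example (time unit assumed: hours)
mttf = 1.0 / lam    # closed form of the integral of R(t) from 0 to infinity

# Reliability evaluated at t = MTTF equals exp(-1) for any exponential model.
r_at_mttf = exp_reliability(lam, mttf)

print(f"MTTF = {mttf:.0f}")
print(f"R(MTTF) = {r_at_mttf:.3f}")
```

Note that under a constant failure rate only about 37% of units survive to the MTTF, so the MTTF should not be read as a guaranteed life.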

Slide 13
Time Dependent Failure Distributions

• WEIBULL DISTRIBUTION

• $f(t) = \frac{m}{\theta}\left(\frac{t}{\theta}\right)^{m-1} e^{-(t/\theta)^m}$

• m = Shape parameter
• θ = Scale parameter (unit of time)

[Figure: failure rate versus time]

Slide 14
Weibull Distribution

m < 1: Decreasing failure rate (burn-in)

m > 1: Increasing failure rate (wear-out)

[Figures: failure rate versus time for each case]

Slide 15
Weibull Distribution

• Example: Ball bearing, Weibull distribution

m = 4, θ = 100, 50-hour mission

$F(t) = 1 - e^{-(t/\theta)^m}$

$F(50) = 1 - e^{-(50/100)^4} = 0.0606$

$R(50) = 1 - F(50) = 0.9394$
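
The ball bearing numbers can be reproduced with a few lines. This is a minimal sketch using the parameters given above (m = 4, θ = 100, t = 50); the function name `weibull_cdf` is illustrative.

```python
import math

def weibull_cdf(t: float, m: float, theta: float) -> float:
    """F(t) = 1 - exp(-(t/theta)^m): probability of failure by time t."""
    return 1.0 - math.exp(-((t / theta) ** m))

m, theta = 4, 100            # shape and scale from the example
f50 = weibull_cdf(50, m, theta)
r50 = 1.0 - f50              # reliability for the 50-hour mission

print(f"F(50) = {f50:.4f}")
print(f"R(50) = {r50:.4f}")
```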

Slide 16
General System Reliability Models
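• 1. Series (non-redundant) system

[Block diagram: component 1 (R1) in series with component 2 (R2)]

$R_1 = e^{-\lambda_1 t}$, $R_2 = e^{-\lambda_2 t}$

$R_{ss} = R_1 R_2$

In general, for k components: $R_{ss} = \prod_{i=1}^{k} R_i$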

Slide 17
Series System

0.85 0.85
R1 R2

Rss = (0.85)(0.85)

= 0.7225

For series systems, high reliability of components/subsystems is required.

Slide 18
Parallel Reliability Model

• Active Redundancy

[Block diagram: components R1 and R2 in parallel]

$R_{PS} = R_1 + R_2 - R_1 R_2$

Reliability for a parallel system of k components:

$R_{PS} = 1 - \prod_{i=1}^{k} (1 - R_i)$

Slide 19
Parallel Reliability Model

• Two-component system, R1 = R2 = 0.85

[Block diagram: R1 = 0.85 in parallel with R2 = 0.85]

RPS = R1 + R2 - R1 R2 = 0.85 + 0.85 - (0.85)(0.85) = 0.9775 ≈ 0.98

Active redundancy: Reliability increases
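
The series and parallel formulas can be compared side by side. A minimal sketch, assuming independent components; the helper names `series` and `parallel` are illustrative.

```python
from math import prod

def series(reliabilities):
    """Series system: every component must work, so reliabilities multiply."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Active parallel system: fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

components = [0.85, 0.85]          # the two-component examples above
r_series = series(components)      # about 0.7225
r_parallel = parallel(components)  # about 0.9775

print(f"series   = {r_series:.4f}")
print(f"parallel = {r_parallel:.4f}")
```

The same two components give roughly 0.72 in series and roughly 0.98 in parallel, which is the motivation for redundancy.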

Slide 20
m out–of–N Units System

• Active redundancy. At least m units out of N must function for the system to operate normally. If the units are identical and independent, the number of working units follows a Binomial Distribution:

$R_{m/N} = \sum_{k=m}^{N} \binom{N}{k} R^{k} (1-R)^{N-k}$

m/N Active Redundancy

Slide 21
m–out–of–N Units System

Aircraft has 4 identical, independent engines with R = 0.98

At least 2 engines must function (Active redundancy).

• $R_{2/4} = \sum_{k=2}^{4} \binom{4}{k} (0.98)^{k} (1 - 0.98)^{4-k}$

• $R_{2/4} = 0.99996$
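
The binomial sum for the aircraft example can be evaluated directly. A minimal sketch, assuming identical and independent units as stated; the function name `m_out_of_n` is illustrative.

```python
from math import comb

def m_out_of_n(m: int, n: int, r: float) -> float:
    """Probability that at least m of n identical, independent units work."""
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))

# Aircraft example: 4 engines, at least 2 must function, R = 0.98 each.
r_2_of_4 = m_out_of_n(2, 4, 0.98)

print(f"R(2 out of 4) = {r_2_of_4:.6f}")
```

The sum evaluates to approximately 0.999968, consistent with the figure quoted in the example.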

Slide 22
Complex System Reliability Analysis
Methods

• Network Reduction Approach
• Fault Tree Analysis
• FMEA: Failure Modes and Effects Analysis
• FMECA: Failure Modes, Effects and Criticality Analysis

Slide 23
System Reliability Analysis Methods
I. Network Reduction Approach

Example: [Reliability block diagram of the system, with components a1, a2, a3, a4, b1, b2, and c]

$R_{SYST} = (2R_b - R_b^2)\,R_c$


Slide 24
Fault-Tree Analysis (FTA)

• FTA: Top-down approach (Bell Labs/Boeing)

– Start by identifying an undesirable event, the TOP EVENT

– Events that can lead to the Top Event are described with Logic Operators (AND, OR, EOR..)

Slide 25
FTA Logic Operators

• AND Gate
• OR Gate

• AND Gate: Provides a true output if ALL inputs are true.

A  B  | AND
0  0  |  0
0  1  |  0
1  0  |  0
1  1  |  1

Slide 26
FTA Logic Operators

• OR Gate: Provides a true output if one or more inputs are true.

A  B  | OR
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  1

Slide 27
FTA Logic Operators

• $F_{OR} = 1 - \prod_{i=1}^{n} (1 - F_i)$

• $F_{AND} = \prod_{i=1}^{n} F_i$

Slide 28
FTA Example

System designed to deliver emergency cooling to a nuclear reactor.

Protection system will not deliver a signal to pump & valve actuators
(p of failure = 0.0001)

Pump will fail to start when the actuation signal is received (p = 0.02)

A valve will fail to open when the actuation signal is received (p = 0.1)

The reservoir will be empty at the time of the accident (p = 0.00005)

Slide 29
FTA: Emergency cooling to a nuclear reactor

Coolant Sys Fails, pc = 0.000903 (OR)
  Both Subsys Fail, pvs = 0.000888 (AND)
    Pump/Valves Fails, ppv = 0.0298 (OR)
      Pump Fails, pp = 0.02
      Both Valves Fail, pvs = 0.01 (AND)
        Valve Fails, pv = 0.1
        Valve Fails, pv = 0.1
    Pump/Valves Fails, ppv = 0.0298 (OR)
      Pump Fails, pp = 0.02
      Both Valves Fail, pvs = 0.01 (AND)
        Valve Fails, pv = 0.1
        Valve Fails, pv = 0.1
  Reservoir Dry, pr = 0.000005
  Signal Fail, pps = 0.00001
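
The tree above can be evaluated bottom-up with the OR and AND gate formulas. This is a minimal sketch, assuming independent basic events and using the probabilities printed on the tree; the gate types are inferred from the arithmetic of the tree values, and the variable names are illustrative.

```python
from math import prod

def or_gate(probs):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

def and_gate(probs):
    """Output event occurs only if all input events occur."""
    return prod(probs)

# Basic event probabilities as printed on the tree.
p_valve = 0.1
p_pump = 0.02
p_reservoir = 0.000005
p_signal = 0.00001

p_both_valves = and_gate([p_valve, p_valve])           # about 0.01
p_pump_valves = or_gate([p_pump, p_both_valves])       # about 0.0298
p_both_subsys = and_gate([p_pump_valves, p_pump_valves])  # about 0.000888
p_top = or_gate([p_both_subsys, p_reservoir, p_signal])   # about 0.000903

print(f"pump/valves subsystem = {p_pump_valves:.4f}")
print(f"both subsystems fail  = {p_both_subsys:.6f}")
print(f"top event             = {p_top:.6f}")
```

With these inputs the dominant contribution to the top event comes from the loss of both pump/valve subsystems; the reservoir and signal events add only about 0.000015.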

Slide 30
FTA: Emergency Cooling to a Nuclear Reactor

Using FTA Analysis:

• Probability of failure = 0.000915

• Reliability = 0.9991

Slide 31
FTA Use Advantages/Issues

1. One event at a time


2. Provides insight into system behavior
3. Top-down approach
4. FTA can get complicated for large systems
5. Difficult to handle degraded component states.

Slide 32
FMEA

FMEA = Failure Modes and Effects Analysis

• Concerned with determining design reliability (R) by considering potential failures and their effects on the system.
• List each failure mode and its effect on paper.
• Bottom-Up Approach.

Slide 33
FMEA

• “Military Standards: Procedures for performing failure modes, effects and criticality analysis” (1980)

• TYPICAL STEPS IN FMEA:

1. SYSTEM DEFINITION. Identify systems that may fail.

Slide 34
FMEA

2. IDENTIFICATION OF FAILURE MODES.

Ways components may fail:
• Short
• Rupture
• Fracture
• Power Loss
• Out-of-Tolerance

• Operational & Environmental Conditions should be listed.

Slide 35
FMEA

3. DETERMINE CAUSE.
– Stress
– Contamination
– Evaporation
– Fatigue
– Wear-Out
– Corrosion
– Errors

Slide 36
FMEA Documentation

Failure Mode | Cause               | Failure Mechanism | Action
Fracture     | Excessive Vibration | Fatigue           | Redesign Mounts

Slide 37
FMEA Documentation

4. ASSESSMENT OF THE EFFECT (leakage, rupture).

Failure Mode | Cause                     | Failure Mechanism | Effect
Brittle seal | Sustained low temperature | Leakage           | Critical

Slide 38
FMEA

5. CLASSIFICATION OF SEVERITY

I. Catastrophic: Major damage/loss of life


II. Critical: Mission may be lost
III. Marginal: System degraded
IV. Negligible: Minor, with no effect on performance

Slide 39
FMEA

6. PROBABILITY OF OCCURRENCE.
Reliability testing, Failure Data, Expert Judgment

When sufficient data do not exist:

Military Standard: Procedures for performing a FMECA (1980)

Slide 40
FMEA

• High Prob. of failure (P ≥ 0.20)


• Moderate Prob. of fail. (0.1 ≤ P < 0.20)
• Occasional (0.01 ≤ P < 0.10)
• Unlikely (P < 0.001)

7. CORRECTIVE ACTION.

Slide 41
FMECA

List:
• Failure Modes
• Causes of failure
• Possible Effects
• Probability of Occurrence
• Criticality
• Possible Action
FMECA
Handbook of Reliability Engineering and Management
Slide 42
FMECA

Slide 43
FMEA/FMECA

• Serves as a technique for detecting each possible failure mode
• All possible failure modes & their effects on mission, people, & system can be identified
• Provides useful input data for performing system safety and maintainability analysis
• Systematic approach to classifying hardware failures

Slide 44
FMEA/FMECA

• Provides input for development of built-in test software and equipment
• Can be used for design comparison studies
• Provides improved communication
• Procedure begins at the detailed level and works upward.

Slide 45
Failure Data Collection, Analysis
• FAILURE DATA USES
1. Compute Failure Rate
2. Determine failure distribution
3. Decisions on Redundancy
4. Trade-off Studies
5. Replacement Studies
6. Preventative Maintenance Decisions
7. Availability
8. Design Changes

Slide 46
Failure Data Collection, Analysis

• LIFE TESTING
– Time-to-failure (DOE Techniques)

• FIELD DATA
– # of Failures

Slide 47
Identifying Failure Distribution
We try to fit the data to a known distribution f(t)
1. Collect data
2. Hypothesize a distribution
3. Plot data on appropriate graph paper for this
distribution
4. If there is a good fit: the data points will be
clustered along a straight line
5. Estimate distribution parameters from the slope &
intercept

Slide 48
Fitting Data to an Exponential Distribution

Constant Failure Rate (λ)

$R(t) = e^{-\lambda t}$
$F(t) = 1 - e^{-\lambda t}$

$\ln\left(\frac{1}{1-F}\right) = \lambda t$   This is in the form of y = mx

Estimate λ from the slope.

Slide 49
Fitting Data to an Exponential Distribution

Example: Failure data given. We think it is Exponential

i   t_i   ln(1/(1-F))
1    80   0.11778
2   134   0.25132
3   148   0.40546
4   186   0.58778
5   238   0.81093
6   450   1.09861
7   581   1.50407
8   890   2.19722

Slide 50
Fitting Data to an Exponential Distribution

• ℓn(1/(1-F)) = λt, which has the form y = mx

• Slope is λ: λ = 0.0025

• MTTF = 1/λ = 400 hrs

[Figure: ln(1/(1-F)) plotted against t (0 to 1000) with linear fit y = 0.0025x + 0.0346, R² = 0.9783]
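
The straight-line fit can be reproduced without graph paper. This is a minimal sketch using ordinary least squares on the eight failure times from the example; it assumes the plotting position F_i = i/(n + 1), which reproduces the tabulated ln(1/(1-F)) values but is not stated on the slide. The helper `least_squares` is illustrative.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Failure times from the example table.
times = [80, 134, 148, 186, 238, 450, 581, 890]
n = len(times)

# Assumed plotting position F_i = i / (n + 1), transformed to ln(1/(1-F)).
ys = [math.log(1.0 / (1.0 - i / (n + 1))) for i in range(1, n + 1)]

lam, intercept = least_squares(times, ys)  # slope estimates lambda
mttf = 1.0 / lam

print(f"lambda = {lam:.4f}, intercept = {intercept:.4f}, MTTF = {mttf:.0f}")
```

The unrounded slope gives an MTTF of roughly 404 hours; rounding λ to 0.0025 first gives the 400 hours quoted above.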

Slide 51
Fitting Data to Weibull Distribution (m, θ)

• $F(t) = 1 - \exp[-(t/\theta)^m]$

• Linearize this by taking the log twice.

• $\ln\left[\ln\left(\frac{1}{1-F(t)}\right)\right] = m \ln t - m \ln\theta$

• This is in the form of: Y = mx + b, with slope m and intercept b = -m ln θ

Slide 52
Fitting Data to Weibull Distribution (m, θ)

Failure Data given. We think it is Weibull distributed.

i   t_i   ln t    ln(ln(1/(1-F)))
1    67   4.204   -1.706
2   120   4.787   -0.904
3   130   4.867   -0.366
4   220   5.393    0.092
5   290   5.669    0.582

Slide 53
Fitting Data to Weibull Distribution (m, θ)

From graph: m = 1.53 (slope), θ = 197 hrs

[Figure: ln(ln(1/(1-F))) plotted against ln t with linear fit y = 1.5307x - 8.0889, R² = 0.9676]
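
The Weibull parameters can be recovered from the same kind of line fit. This is a minimal sketch that applies ordinary least squares to the transformed values tabulated in the example and then uses the intercept relation b = -m ln θ; the helper `least_squares` is illustrative, and small differences from the graph values come from rounding in the table.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Transformed data from the example table: x = ln t, y = ln(ln(1/(1-F))).
ln_t = [4.204, 4.787, 4.867, 5.393, 5.669]
ln_ln = [-1.706, -0.904, -0.366, 0.092, 0.582]

m, b = least_squares(ln_t, ln_ln)  # slope is the shape parameter m
theta = math.exp(-b / m)           # from b = -m * ln(theta)

print(f"m = {m:.2f}, theta = {theta:.0f} hrs")
```

A shape parameter above 1 indicates an increasing failure rate, so this data set would fall in the wear-out regime.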

Slide 54
Operational Reliability Analysis

• Using Reducible Markov Chains

– MARKOV CHAIN ANALYSIS


• A Probabilistic Technique

Slide 55
Space Transportation Vehicle, STV

• STV on the launch site, no problems


• Launch preparations
• Launch pad operations
• STV in powered ascent
• Orbital operations
• Re-entry
• Landing, Site-1
• Post flight checkout
Success oriented path
Slide 56
What can go wrong ?
• Delay due to problems in launch preparations
• Launch delay, minor problems
• Launch delay, major problems
• Abort
• Landing, Contingency site
• Post Flight Check, Minor problems
• Post Flight Check, Major problems
• Attrition
• Major Damage/scrap

Slide 57
Reducible Markov Chains

• What Information we can get?

E = Expected number of times the process will cycle before the STV is trapped in an absorbing state (expected life)

A = Probabilities of reaching a particular trapping (failing) state
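
Both quantities come from the fundamental matrix of an absorbing Markov chain, N = (I - Q)^-1, where Q holds the transition probabilities among the non-absorbing states. The sketch below is a toy illustration only: the two transient states, the two absorbing states, and every transition probability are hypothetical and are not the STV model.

```python
# Hypothetical chain, NOT the STV model.
# Transient states: 0 = "ready on pad", 1 = "in flight".
# Absorbing states: 0 = "attrition", 1 = "major damage/scrap".
Q = [[0.00, 0.99],   # ready -> ready, ready -> flight
     [0.97, 0.00]]   # flight -> ready, flight -> flight
R = [[0.00, 0.01],   # ready -> attrition, ready -> scrap
     [0.02, 0.01]]   # flight -> attrition, flight -> scrap

def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Fundamental matrix N = (I - Q)^-1: expected visits to each transient state.
i_minus_q = [[1.0 - Q[0][0], -Q[0][1]],
             [-Q[1][0], 1.0 - Q[1][1]]]
N = inverse_2x2(i_minus_q)

# B = N R: probability of ending in each absorbing state.
B = [[sum(N[i][k] * R[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

expected_cycles = N[0][0]  # expected visits to "ready", starting from "ready"

print(f"expected cycles  = {expected_cycles:.2f}")
print(f"P(attrition)     = {B[0][0]:.3f}")
print(f"P(major damage)  = {B[0][1]:.3f}")
```

Raising the per-cycle survival probabilities in Q increases the expected number of cycles, which is the kind of sensitivity a model like this is used to explore.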

Slide 58
Operational Reliability Model for STV

Slide 59
Results of Markov Chain Analysis

E = 47.98    LCC = $11,018

Probability of Attrition = 0.64
Probability of Major damage = 0.36

Sensitivity Analysis:

LAUNCH RELIABILITY | EXPECTED LIFE
0.995              | 47.98
0.99               | 33.66
0.98               | 25.31
0.95               | 14.5

Improved reliability makes a significant difference to the expected life of the STV.

Slide 60
Maintained Systems

I. Preventive Maintenance: Performed before failure occurs
   Measure: Resulting increase in reliability

II. Corrective Maintenance: Performed after failure occurs (repair)
   Measure: Availability, the probability that the system will be operational when needed

Slide 61
Maintained Systems

• Maintenance Issues
– Cost
– Safety
– Prob. of Maintenance Introducing Failure
– Human Reliability

Repair Times & Maint. Probability are more variable than Failure Rates of Hardware

Slide 62
Preventive Maintenance

• Assume Ideal Preventive Maintenance:

• System is restored to as-good-as-new condition.

• How much reliability improvement results from preventive maintenance?

Slide 63
Preventive Maintenance- CFR
Exponential: Constant Failure Rate
• Preventive maintenance has no effect on reliability

[Figure: constant failure rate λ versus time (exponential)]

DON’T DO IT, as preventive maintenance itself may introduce failures

Slide 64
Preventive Maintenance (Wear Out)

• Effect of preventive maintenance on aging or wear (Weibull m > 1)

• WEIBULL m > 1: $R(t) = e^{-(t/\theta)^m}$

[Figure: failure rate increasing with time]

Preventive maintenance has a Positive Effect

Slide 65
Preventive Maintenance
Failure rate | Distribution  | Preventive maintenance?
Decreasing   | WEIBULL m < 1 | DON’T
Constant     | EXPONENTIAL   | DON’T (“LEAVE IT ALONE”)
Increasing   | WEIBULL m > 1 | DO
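
The three cases can be compared numerically. The sketch below relies on the standard result for ideal (as-good-as-new) preventive maintenance at a fixed interval T, R_M(t) = R(T)^n · R(t - nT) for nT ≤ t < (n+1)T, which is not written out on the slides; the values θ = 100, T = 20 and t = 100 are hypothetical.

```python
import math

def weibull_r(t: float, m: float, theta: float) -> float:
    """Weibull reliability R(t) = exp(-(t/theta)^m); m = 1 is exponential."""
    return math.exp(-((t / theta) ** m))

def r_with_pm(t: float, interval: float, m: float, theta: float) -> float:
    """Reliability at t under ideal preventive maintenance every `interval`."""
    n = int(t // interval)  # completed maintenance intervals
    return weibull_r(interval, m, theta) ** n * weibull_r(t - n * interval, m, theta)

theta, t, interval = 100.0, 100.0, 20.0  # hypothetical values

for m in (0.5, 1.0, 2.0):
    without_pm = weibull_r(t, m, theta)
    with_pm = r_with_pm(t, interval, m, theta)
    print(f"m = {m}: without PM = {without_pm:.4f}, with PM = {with_pm:.4f}")
```

Under these assumptions maintenance lowers reliability for m < 1, leaves it unchanged for m = 1, and raises it for m > 1, matching the DON’T / DON’T / DO guidance.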

Slide 66
Corrective Maintenance (Repair)

• Corrective Maintenance: Performed after failure

• Interested in:
  • Reliability, but also
  • # of Failures
  • Time required to make repairs

Slide 67
Corrective Maintenance (Repair)

• With corrective maintenance, two new parameters come into play:

I. AVAILABILITY

II. MAINTAINABILITY

Slide 68
Corrective Maintenance (Repair)

• AVAILABILITY: The probability that a system is available for use at a given time (the fraction of time a system is in an operational state)

• MAINTAINABILITY: A measure of how fast a system may be repaired after a failure.

Slide 69
AVAILABILITY
POINT AVAILABILITY A(t):
A(t) = Probability that the system is operating at time t.

INTERVAL OR MISSION AVAILABILITY A*(T):
A*(T) = Average of A(t) over an interval or mission of length T

$A^*(T) = \frac{1}{T}\int_0^T A(t)\,dt$

Slide 70
Steady State Availability

• It is often found that after some initial transient effects, A(t) assumes a time-independent value.

• $A^*(\infty) = \lim_{T\to\infty} \frac{1}{T}\int_0^T A(t)\,dt$

STEADY STATE AVAILABILITY

Slide 71
Mean Time to Repair (MTTR)
• Constant Repair Rate: Vr(t) → Vr

• MTTR = $1/V_r$ = Mean time to repair

• Availability tends to depend more on MTTR than on the details of the repair distribution

Slide 72
Availability

• $A = \frac{V_r}{\lambda + V_r}$

• $A = \frac{MTTF}{MTTF + MTTR}$

• Availability tends to depend more on MTTR than on the details of the repair distribution

Slide 73
EXAMPLE
i    Tf (DAYS)   Tr (DAYS)
1      12.8        13
2      14.2        14.8
3      25.4        25.8
4      31.4        33.3
5      35.3        35.6
6      56.4        57.3
7      62.7        62.8
8     131.2       134.9
9     146.7       150.0
10    177.0       177.1

Tf = Time Failed, Tr = Time Repaired

Slide 74
EXAMPLE

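• a) Calculate 6 month (182.5 days) availability from data. There are 10 failures.
• b) Estimate MTTF & MTTR from data.

a) Total downtime = Σ(Tr - Tf) = 11.5 days, so A*(182.5) = 1 - 11.5/182.5 = 0.937
b) MTTF = (total operating time before each failure)/10 = 165.6/10 = 16.56 DAYS
   MTTR = 11.5/10 = 1.15 DAYS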
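
These estimates can be reproduced from the failure and repair times in the example table. A minimal sketch, assuming the system starts operating at day 0 and that each operating period runs from the previous repair to the next failure; variable names are illustrative.

```python
# Failure and repair times (days) from the example table.
failed = [12.8, 14.2, 25.4, 31.4, 35.3, 56.4, 62.7, 131.2, 146.7, 177.0]
repaired = [13.0, 14.8, 25.8, 33.3, 35.6, 57.3, 62.8, 134.9, 150.0, 177.1]
horizon = 182.5  # six months, in days

# Downtime of each failure: repair completion minus failure time.
downtimes = [r - f for f, r in zip(failed, repaired)]

# Operating period before each failure: starts at 0 or at the previous repair.
starts = [0.0] + repaired[:-1]
uptimes = [f - s for s, f in zip(starts, failed)]

mttr = sum(downtimes) / len(downtimes)
mttf = sum(uptimes) / len(uptimes)
availability = 1.0 - sum(downtimes) / horizon

print(f"MTTF = {mttf:.2f} days, MTTR = {mttr:.2f} days")
print(f"6-month availability = {availability:.3f}")
```

The operating time after the last repair is left out of the MTTF estimate because no failure ended it. The steady-state formula MTTF/(MTTF + MTTR) gives about 0.935 with these estimates, close to the interval value.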

Slide 75
Conclusions
• Reliability is a major cost driver
• Reliability Definitions
• Failure Patterns, Distributions
• How to determine failure patterns
• Failure Data Analysis Methods
• Operational Reliability Modeling
• Maintainability, Maintenance Decisions
• Availability

Slide 76
Resources

• Reliability & Maintainability Engineering: C. Ebeling.
• Reliability Engineering: E.E. Lewis.
• Handbook of Reliability Engineering.
• Military Standards: Procedures for performing failure modes, effects and criticality analysis.

Slide 77
