System Reliability: Resit Unal Engineering Management & Systems Engineering Dept. Old Dominion University Runal@odu - Edu

System Reliability
Resit Unal
Engineering Management & Systems
Engineering Dept.
Old Dominion University
runal@odu.edu
Slide 1
System Life Cycle Concepts
NEED
Acquisition phase:
1. CONCEPTUAL PRELIMINARY DESIGN
2. DETAIL DESIGN & DEVELOPMENT
3. PRODUCTION/CONSTRUCTION
Utilization phase:
4. OPERATION & SUPPORT
5. PHASE OUT / DISPOSAL
Slide 2
Lifecycle Costs (LCC)
• 60 – 80% of LCC is spent during the operation phase
• Reliability is a major cost driver (failures, repair, lost operation time, redesign, ...)
• 70% of LCC is committed during the design phase: design fixes how the system will be operated and maintained
Slide 3
System Reliability
• Engineering is concerned with how products/systems work, but we also need to understand
• the ways in which they fail, the effects of failures, and the aspects of design which affect the likelihood of failure.
• This is the subject of Reliability Engineering.
Slide 4
Defining Reliability
Reliability is the probability that a given system will perform as anticipated under given operating conditions.
It can predict the probability that a system will operate for a specified number of hours or a certain average time between failures.
Slide 5
Failure Patterns: Bath Tub Curve
[Figure: bathtub curve of failure rate λ(t) versus time. Burn-in: decreasing failure rate. Useful life: constant failure rate (random failures). Wear-out: increasing failure rate.]
Slide 6
Non-Repairable systems
• The instantaneous probability of the first and only failure is called the failure rate.
• Mean Time to Failure (MTTF)
Slide 7
Repairable systems
• Mean Time Between Failures (MTBF)
• Repair takes time, V(t) = repair rate
• Mean Time to Repair (MTTR)
• Availability: $A = \dfrac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$
Slide 8
Tasks of Reliability
T = time to failure
F(t) = P(T ≤ t)
R(t) = 1 - F(t) = P(T > t)
I) First task is to derive & study this equation
II) Find the best way to increase Reliability
Slide 9
Find Best Ways to Increase Reliability
1. Reduce complexity
2. Increase R of components/subsystems
3. Parallel redundancy
4. Stand-by redundancy
5. Preventive Maintenance
6. Repair
7. Combination
Slide 10
Failure f(t): Exponential Distribution
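1. Failures occur at random intervals.
2. Failure rate stays constant (time independent).
[Figure: failure rate λ(t) versus time. The exponential distribution corresponds to the constant failure rate (CFR) region, between the decreasing and increasing failure rate regions.]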
Slide 11
Exponential Distribution; CFR
$f(t) = \lambda e^{-\lambda t}$
$R(t) = e^{-\lambda t}$
$F(t) = 1 - e^{-\lambda t}$
MTTF = $1/\lambda$ (time unit)
Constant Failure Rate (time independent)
[Figure: failure rate versus time, constant at λ.]
Slide 12
Example
Reliability of a pump seal given by $R(t) = e^{-\lambda t}$
$\lambda = 0.0005$
$\mathrm{MTTF} = \int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$
MTTF = 2000
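The pump seal result can be checked numerically. The sketch below compares the closed form 1/λ with a trapezoidal approximation of the integral of R(t); the function names, the integration horizon, and the step count are illustrative assumptions, not part of the slides.

```python
import math

def exponential_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

def mttf_numeric(lam, horizon_multiple=40.0, steps=200_000):
    """Approximate MTTF = integral of R(t) dt over [0, inf) with the trapezoidal rule.

    The upper limit is truncated at horizon_multiple / lam (an assumption),
    where R(t) is already negligible.
    """
    upper = horizon_multiple / lam
    h = upper / steps
    total = 0.5 * (exponential_reliability(0.0, lam) + exponential_reliability(upper, lam))
    for k in range(1, steps):
        total += exponential_reliability(k * h, lam)
    return total * h

lam = 0.0005                      # failure rate from the pump seal example
mttf_closed_form = 1.0 / lam      # MTTF = 1 / lambda
mttf_integrated = mttf_numeric(lam)
print(mttf_closed_form, round(mttf_integrated, 2))
```

Both values should agree to within the numerical integration error, which illustrates that MTTF is the area under the reliability curve.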
Slide 13
Time Dependent Failure Distributions
• WEIBULL DISTRIBUTION
• $f(t) = \dfrac{m}{\theta}\left(\dfrac{t}{\theta}\right)^{m-1} e^{-(t/\theta)^m}$
• m = Shape parameter
• θ = Scale parameter (unit of time)
[Figure: failure rate versus time.]
Slide 14
Weibull Distribution
m < 1: Decreasing failure rate (burn-in)
m > 1: Increasing failure rate (wear-out)
[Figures: failure rate versus time for each case.]
Slide 15
Weibull Distribution
• Example: Ball Bearing, Weibull Distribution
m = 4, θ = 100
50 Hour Mission
$F(t) = 1 - e^{-(t/\theta)^m}$
F(50) = 0.0606
R(50) = 0.9394
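A minimal sketch of the ball bearing calculation. The hazard function is the standard Weibull failure rate (m/θ)(t/θ)^(m-1), which is not written out on the slides and is included here only to show how the shape parameter drives the failure rate trend; function names are illustrative.

```python
import math

def weibull_cdf(t, m, theta):
    """F(t) = 1 - exp(-(t/theta)^m): probability of failure by time t."""
    return 1.0 - math.exp(-((t / theta) ** m))

def weibull_hazard(t, m, theta):
    """Standard Weibull failure rate: (m/theta) * (t/theta)^(m-1)."""
    return (m / theta) * (t / theta) ** (m - 1)

m, theta, mission = 4, 100.0, 50.0   # values from the ball bearing example
f_50 = weibull_cdf(mission, m, theta)
r_50 = 1.0 - f_50
print(round(f_50, 4), round(r_50, 4))

# With m > 1 the failure rate grows with time (wear-out)
print(weibull_hazard(25.0, m, theta) < weibull_hazard(50.0, m, theta))
```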
Slide 16
General System Reliability Models
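• 1. Series (non-redundant) system
[Block diagram: component 1 (R1) in series with component 2 (R2).]
$R_1 = e^{-\lambda_1 t}$, $R_2 = e^{-\lambda_2 t}$
$R_{ss} = R_1 R_2$
$R_{ss} = \prod_{i=1}^{k} R_i$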
Slide 17
Series System
[Block diagram: R1 = 0.85 in series with R2 = 0.85.]
Rss = (0.85)(0.85) = 0.7225
For series systems, high reliability of components/subsystems is required.
Slide 18
Parallel Reliability Model
• Active Redundancy
[Block diagram: R1 in parallel with R2.]
$R_{PS} = R_1 + R_2 - R_1 R_2$
Reliability for parallel system:
$R_{PS} = 1 - \prod_{i=1}^{k} (1 - R_i)$
Slide 19
Parallel Reliability Model
• Two Component System
[Block diagram: R1 = 0.85 in parallel with R2 = 0.85.]
$R_{PS} = R_1 + R_2 - R_1 R_2$
R1 = R2 = 0.85
RPS = 0.9775 ≈ 0.98
Active redundancy: Reliability increases
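The series and parallel formulas can be compared side by side for the same two 0.85 components. This is a small sketch assuming independent components; the function names are illustrative.

```python
from math import prod

def series_reliability(reliabilities):
    """All components must work: product of the component reliabilities."""
    return prod(reliabilities)

def parallel_reliability(reliabilities):
    """At least one component must work: 1 - product of the unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

components = [0.85, 0.85]
r_series = series_reliability(components)
r_parallel = parallel_reliability(components)
print(round(r_series, 4), round(r_parallel, 4))
```

The same two components give a system reliability below either component in series and above either component in parallel.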
Slide 20
m out–of–N Units System
• Active redundancy. At least m units out of N must function for the system to operate normally. If the units are identical and independent → Binomial Distribution
$R_{m/N} = \sum_{k=m}^{N} \binom{N}{k} R^{k} (1-R)^{N-k}$
[Figure: m/N active redundancy.]
Slide 21
m–out–of–N Units System
Aircraft has 4 identical, independent engines with R = 0.98
At least 2 engines must function (Active redundancy).
• $R_{2/4} = \sum_{k=2}^{4} \binom{4}{k} (0.98)^{k} (1 - 0.98)^{4-k}$
• $R_{2/4} \approx 0.99997$
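The binomial sum for the four-engine aircraft can be evaluated directly. A minimal sketch, assuming identical and independent units as stated in the example; the function name is illustrative.

```python
from math import comb

def m_out_of_n_reliability(m, n, r):
    """Probability that at least m of n identical, independent units work."""
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))

r_2_of_4 = m_out_of_n_reliability(2, 4, 0.98)
print(r_2_of_4)
```

As a sanity check, m = 1 reproduces the parallel formula and m = N reproduces the series formula.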
Slide 22
Complex System Reliability Analysis Methods
• Network Reduction Approach
• Fault Tree Analysis
• FMEA: Failure Modes and Effects Analysis
• FMECA: Failure Modes, Effects and Criticality Analysis
Slide 23
System Reliability Analysis Methods
I. Network Reduction Approach
Example:
[Figure: reliability block diagram with blocks a1, a2, a3, a4, b1, b2 and c, reduced to a single system reliability.]
$R_{SYST} = (2R_b - R_b^2)\,R_c$

Slide 24
Fault-Tree Analysis (FTA)
• FTA – Top down approach (Bell Labs/Boeing)
– Start with identifying an undesirable event: the TOP EVENT
– Events that can lead to the Top-Event are described with Logic Operators (AND, OR, EOR..)
Slide 25
FTA Logic Operators
• AND Gate
• OR Gate
• AND Gate: Provides a true output if ALL inputs are true.
A | B | AND
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
Slide 26
FTA Logic Operators
• OR Gate: Provides a true output if one or more inputs are true.
A | B | OR
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
Slide 27
FTA Logic Operators
• $F_{OR} = 1 - \prod_{i=1}^{n} (1 - F_i)$
• $F_{AND} = \prod_{i=1}^{n} F_i$
Slide 28
FTA Example
System designed to deliver emergency cooling to a nuclear reactor.
Protection system will not deliver a signal to pump & valve actuators (p of failure = 0.0001)
Pump will fail to start when the actuation signal is received (p = 0.02)
A valve will fail to open when the actuation signal is received (p = 0.1)
The reservoir will be empty at the time of the accident (p = 0.00005)
Slide 29
FTA: Emergency cooling to a nuclear reactor
Coolant Sys Fails, pc = 0.000903 (OR of the three events below)
  Both Subsys Fail, pvs = 0.000888 (AND of the two pump/valve subsystems)
    Pump/Valves Fails, ppv = 0.0298 (OR)
      Pump Fails, pp = 0.02
      Both Valves Fail, pvs = 0.01 (AND)
        Valve Fails, pv = 0.1
        Valve Fails, pv = 0.1
    Pump/Valves Fails, ppv = 0.0298 (OR)
      Pump Fails, pp = 0.02
      Both Valves Fail, pvs = 0.01 (AND)
        Valve Fails, pv = 0.1
        Valve Fails, pv = 0.1
  Reservoir Dry, pr = 0.000005
  Signal Fail, pps = 0.00001
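The tree can be evaluated bottom-up with the AND/OR gate formulas. The sketch below assumes independent basic events and uses the probabilities printed on the tree itself (reservoir 0.000005, signal 0.00001), which differ from the values given in the problem statement; the gate structure is inferred from the intermediate values on the tree. Function and variable names are illustrative.

```python
from math import prod

def or_gate(probabilities):
    """Output event occurs if at least one input occurs: 1 - prod(1 - F_i)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def and_gate(probabilities):
    """Output event occurs only if all inputs occur: prod(F_i)."""
    return prod(probabilities)

# Basic event probabilities as printed on the tree (assumption: independent events)
p_valve = 0.1
p_pump = 0.02
p_reservoir = 0.000005
p_signal = 0.00001

p_both_valves = and_gate([p_valve, p_valve])            # both valves fail
p_pump_valves = or_gate([p_pump, p_both_valves])        # one pump/valve subsystem fails
p_both_subsys = and_gate([p_pump_valves, p_pump_valves])  # both subsystems fail
p_top = or_gate([p_both_subsys, p_reservoir, p_signal])   # coolant system fails

print(round(p_pump_valves, 4), round(p_both_subsys, 6), round(p_top, 6))
```

Under these assumptions the intermediate and top-event values match the ones printed on the tree (0.0298, 0.000888, 0.000903).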
Slide 30
FTA: Emergency Cooling to a Nuclear Reactor
Using FTA Analysis:
• Probability of failure = 0.000915
• Reliability = 0.9991
Slide 31
FTA Use Advantages/Issues
1. One event at a time

2. Provides insight into system behavior
3. Top-down approach
4. FTA can get complicated for large systems
5. Difficult to handle degraded component states.
Slide 32
FMEA
FMEA = Failure Modes and Effects Analysis
• Concerned with determining design R by considering potential failures and their effects on the system.
• List each failure mode and its effect on paper.
• Bottom-Up Approach.
Slide 33
FMEA
• “Military Standards: Procedures for performing failure modes, effects and criticality analysis” (1980)
• TYPICAL STEPS IN FMEA:
1. SYSTEM DEFINITION. Identify systems that may fail.
Slide 34
FMEA
2. IDENTIFICATION OF FAILURE MODES. Ways components may fail:
• Short
• Rupture
• Fracture
• Power Loss
• Out-of-Tolerance
• Operational & Environmental Conditions should be listed.
Slide 35
FMEA
3. DETERMINE CAUSE.
– Stress
– Contamination
– Evaporation
– Fatigue
– Wear-Out
– Corrosion
– Errors
Slide 36
FMEA Documentation
Failure Mode | Cause | Failure Mechanism | Action
Fracture | Excessive vibration | Fatigue | Redesign mounts
Slide 37
FMEA Documentation
4. ASSESSMENT OF THE EFFECT (leakage, rupture).
Failure Mode | Cause | Failure Mechanism | Effect
Brittle seal | Sustained low temperature | Leakage | Critical
Slide 38
FMEA
5. CLASSIFICATION OF SEVERITY
I. Catastrophic: Major damage/loss of life
II. Critical: Mission may be lost
III. Marginal: System degraded
IV. Negligible: Minor, with no effect on performance
Slide 39
FMEA
6. PROBABILITY OF OCCURRENCE.
Sources: Reliability testing, Failure Data, Expert Judgment
When sufficient data do not exist, use the probability levels of the Military Standard: Procedures for Performing a FMECA (1980)
Slide 40
FMEA
• High Prob. of failure (P ≥ 0.20)

• Moderate Prob. of fail. (0.1 ≤ P < 0.20)
• Occasional (0.01 ≤ P < 0.10)
• Unlikely (P < 0.001)
7. CORRECTIVE ACTION.
Slide 41
FMECA
List:
• Failure Modes
• Causes of failure
• Possible Effects
• Probability of Occurrence
• Criticality
• Possible Action
[Figure: FMECA. Source: Handbook of Reliability Engineering and Management]
Slide 42
FMECA
Slide 43
FMEA/FMECA
• Serves as a technique for detecting each possible failure mode
• All possible failure modes & effects on mission, people, & system can be identified
• Provides useful input data for performing system safety and maintainability analysis
• Systematic approach to classifying hardware failures
Slide 44
FMEA/FMECA
• Provides input for development of built-in test software and equipment
• Can be used for design comparison studies
• Provides improved communication
• Procedure begins at the detailed level and works upward.
Slide 45
Failure Data Collection, Analysis
• FAILURE DATA USES
1. Compute Failure Rate
2. Determine failure distribution
3. Decisions on Redundancy
4. Trade-off Studies
5. Replacement Studies
6. Preventative Maintenance Decisions
7. Availability
8. Design Changes
Slide 46
Failure Data Collection, Analysis
• LIFE TESTING
– Time-to-failure (DOE Techniques)
• FIELD DATA
– # of Failures
Slide 47
Identifying Failure Distribution
We try to fit the data to a known distribution f(t)
1. Collect data
2. Hypothesize a distribution
3. Plot data on appropriate graph paper for this distribution
4. If there is a good fit, the data points will be clustered along a straight line
5. Estimate distribution parameters from the slope & intercept
Slide 48
Fitting Data to an Exponential Distribution
Constant Failure Rate (λ)
$R(t) = e^{-\lambda t}$
$F(t) = 1 - e^{-\lambda t}$
$\ln\left[\dfrac{1}{1-F(t)}\right] = \lambda t$. This is in the form of y = mx
Estimate λ
Slide 49
Example: Failure data given. We think it is Exponential.
i | t_i | ln[1/(1-F)]
1 | 80 | 0.11778
2 | 134 | 0.25132
3 | 148 | 0.40546
4 | 186 | 0.58778
5 | 238 | 0.81093
6 | 450 | 1.09861
7 | 581 | 1.50407
8 | 890 | 2.19722
Slide 50
• $\ln[1/(1-F)] = \lambda t$, which has the form y = mx
• Slope is λ
• Fitted line: y = 0.0025x + 0.0346, R² = 0.9783
• λ = 0.0025
• MTTF = 1/λ = 400 hrs
[Figure: ln[1/(1-F)] plotted against t (0 to 1000) with the linear fit.]
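The fit can be reproduced with an ordinary least-squares line. The sketch below assumes the plotting position F_i = i/(n+1), which is not stated on the slides but reproduces the tabulated ln[1/(1-F)] values; the helper name is illustrative.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

times = [80, 134, 148, 186, 238, 450, 581, 890]   # failure times from the table
n = len(times)

# Assumed plotting position: F_i = i / (n + 1)
f_hat = [i / (n + 1) for i in range(1, n + 1)]
ys = [math.log(1.0 / (1.0 - f)) for f in f_hat]

lam, intercept = least_squares_line(times, ys)
mttf = 1.0 / lam
print(round(lam, 4), round(intercept, 4), round(mttf, 1))
```

The unrounded slope gives an MTTF slightly above 400 hrs; the slide value of 400 hrs follows from rounding λ to 0.0025 first.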
Slide 51
Fitting Data to Weibull Distribution (m, θ)
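• $F(t) = 1 - \exp[-(t/\theta)^m]$
• Linearize this by taking the logarithm twice.
• $\ln\left[\ln\left(\dfrac{1}{1-F(t)}\right)\right] = m \ln t - m \ln\theta$
• This is in the form of y = mx + b, with x = ln t, slope m, and intercept b = -m ln θ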
Slide 52
Failure data given. We think it is Weibull distributed.
i | t_i | ln t | ln[ln(1/(1-F))]
1 | 67 | 4.204 | -1.706
2 | 120 | 4.787 | -0.904
3 | 130 | 4.867 | -0.366
4 | 220 | 5.393 | 0.092
5 | 290 | 5.669 | 0.582
Slide 53
From the graph: m = 1.53 (slope), θ = 197 hrs
Fitted line: y = 1.5307x - 8.0889, R² = 0.9676
[Figure: ln[ln(1/(1-F))] plotted against ln t with the linear fit.]
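The Weibull parameters follow from the fitted line: the slope is m, and θ is recovered from the intercept b = -m ln θ. The sketch below again assumes the plotting position F_i = i/(n+1), which is not stated on the slides but closely reproduces the tabulated values; the helper name is illustrative.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

times = [67, 120, 130, 220, 290]   # failure times from the table
n = len(times)

# Assumed plotting position: F_i = i / (n + 1)
f_hat = [i / (n + 1) for i in range(1, n + 1)]
xs = [math.log(t) for t in times]
ys = [math.log(math.log(1.0 / (1.0 - f))) for f in f_hat]

m, b = least_squares_line(xs, ys)
theta = math.exp(-b / m)           # from b = -m * ln(theta)
print(round(m, 4), round(b, 4), round(theta, 1))
```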
Slide 54
Operational Reliability Analysis
• Using Reducible Markov Chains
– MARKOV CHAIN ANALYSIS

• A Probabilistic Technique
Slide 55
Space Transportation Vehicle, STV
• STV on the launch site, no problems

• Launch preparations
• Launch pad operations
• STV in powered ascent
• Orbital operations
• Re-entry
• Landing, Site-1
• Post flight checkout
Success oriented path
Slide 56
What can go wrong ?
• Delay due to problems in launch preparations
• Launch delay, minor problems
• Launch delay, major problems
• Abort
• Landing, Contingency site
• Post Flight Check, Minor problems
• Post Flight Check, Major problems
• Attrition
• Major Damage/scrap
Slide 57
Reducible Markov Chains
• What information can we get?
E = Expected number of times the process will cycle before the STV is trapped in an absorbing state (expected life)
A = Probabilities of reaching a particular trapping (failing) state
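Both quantities come from the standard absorbing Markov chain result: with Q the transient-to-transient transition probabilities and R the transient-to-absorbing ones, the fundamental matrix N = (I - Q)^-1 gives expected visits and B = N R gives absorption probabilities. The sketch below uses a hypothetical two-state flight cycle with made-up probabilities; it is not the STV model from the slides, whose transition matrix is not given here.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# HYPOTHETICAL model and numbers, for illustration only.
# Transient states: 0 = ready on pad, 1 = in flight.
# Absorbing states: 0 = attrition, 1 = major damage/scrap.
Q = [[0.00, 0.99],    # pad -> flight (launch succeeds)
     [0.97, 0.00]]    # flight -> pad (landing and turnaround succeed)
R = [[0.00, 0.01],    # pad -> major damage
     [0.02, 0.01]]    # flight -> attrition, flight -> major damage

size = len(Q)
identity_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(size)]
                    for i in range(size)]
N = invert(identity_minus_q)   # fundamental matrix: expected visits to each transient state
B = matmul(N, R)               # absorption probabilities

expected_cycles = N[0][0]      # expected visits to "ready on pad", starting on the pad
p_attrition, p_major_damage = B[0]
print(round(expected_cycles, 2), round(p_attrition, 3), round(p_major_damage, 3))
```

In this toy model, raising the launch success probability in Q lengthens the expected life, which is the kind of sensitivity the STV analysis reports.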
Slide 58
Operational Reliability Model for STV
Slide 59
Results of Markov Chain Analysis
E = 47.98, LCC = $11,018
Probability of Attrition = 0.64
Probability of Major damage = 0.36
Sensitivity Analysis:
LAUNCH RELIABILITY | EXPECTED LIFE
0.995 | 47.98
0.99 | 33.66
0.98 | 25.31
0.95 | 14.5
Improved reliability makes a significant difference to the expected life of the STV.
Slide 60
Maintained Systems
I. Preventive Maintenance: Performed before Failure Occurs
Measure: Resulting Increase in Reliability
II. Corrective Maintenance: Performed after Failure Occurs (Repair)
Measure: Availability: The Probability That the System will be Operational When Needed
Slide 61
Maintained Systems
• Maintenance Issues
– Cost
– Safety
– Prob. of Maintenance Introducing Failure
– Human Reliability
Repair Times & Maint. Probability are more Variable than Failure Rates of Hardware
Slide 62
Preventive Maintenance
• Assume Ideal Preventive Maintenance:
• System is Restored to as-good-as-new Condition.
• How much reliability improvement from preventive maintenance?
Slide 63
Preventive Maintenance- CFR
Exponential: Constant Failure Rate
• Preventive Maint. has No Effect On Reliability
[Figure: failure rate versus time, constant at λ (exponential).]
DON’T DO IT, as Preventive Maintenance itself may introduce failures
Slide 64
Preventive Maintenance (Wear Out)
• Effect of Preventive Maintenance on aging or wear (Weibull m > 1)
[Figure: failure rate versus time, increasing (Weibull m > 1).]
$R(t) = e^{-(t/\theta)^m}$
Preventive maintenance has a Positive Effect
Slide 65
Preventive Maintenance
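[Figure: failure rate versus time across the three regions.]
Failure rate pattern | Preventive maintenance?
WEIBULL m<1 | DON’T
EXPONENTIAL (CFR) | DON’T
WEIBULL m>1 | DO
Where preventive maintenance does not help: “LEAVE IT ALONE”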
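The three cases can be checked numerically. The sketch below uses the standard textbook result for ideal (as-good-as-new) preventive maintenance at interval T, R_M(t) = R(T)^n R(t - nT) with n = floor(t/T); this formula is not written on the slides, and θ = 100, T = 20, t = 100 are hypothetical values chosen for illustration.

```python
import math

def weibull_reliability(t, m, theta):
    """R(t) = exp(-(t/theta)^m); m = 1 is the exponential (CFR) case."""
    return math.exp(-((t / theta) ** m))

def reliability_with_pm(t, interval, m, theta):
    """Ideal PM every `interval`: R_M(t) = R(T)^n * R(t - n*T), n = floor(t / T)."""
    n = int(t // interval)
    return (weibull_reliability(interval, m, theta) ** n
            * weibull_reliability(t - n * interval, m, theta))

theta, interval, t = 100.0, 20.0, 100.0   # hypothetical values
results = {}
for m in (0.5, 1.0, 2.0):
    without_pm = weibull_reliability(t, m, theta)
    with_pm = reliability_with_pm(t, interval, m, theta)
    results[m] = (without_pm, with_pm)
    print(m, round(without_pm, 4), round(with_pm, 4))
```

With these values, preventive maintenance lowers reliability for m < 1, leaves it unchanged for m = 1, and raises it for m > 1, which matches the DON’T / DON’T / DO summary.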
Slide 66
Corrective Maintenance (Repair)
• Corrective Maint: Performed after Failure

• Interested in:
• Reliability, but Also,

• # of Failures
• Time Required To Make Repairs
Slide 67
• With corrective maintenance, two new parameters come into play:
I. AVAILABILITY
II. MAINTAINABILITY
Slide 68
• AVAILABILITY: The probability that a system is available for use at a given time (the fraction of time a system is in an operational state)
• MAINTAINABILITY: A measure of how fast a system may be repaired after a failure.
Slide 69
AVAILABILITY
POINT AVAILABILITY A(t):
A(t) = Probability that the system is operating at time t.
INTERVAL OR MISSION AVAILABILITY A*(T):
A*(T) = Average of the point availability over the interval or mission [0, T]
$A^*(T) = \dfrac{1}{T}\int_0^T A(t)\,dt$
Slide 70
Steady State Availability
• It is often found that after some initial transient effects, A(t) assumes a time-independent value.
• $A^*(\infty) = \lim_{T\to\infty} \dfrac{1}{T}\int_0^T A(t)\,dt$
STEADY STATE AVAILABILITY
Slide 71
Mean Time to Repair (MTTR)
• Constant Repair Rate: $V_r(t) \to V_r$
• MTTR = $1/V_r$ = Mean time to repair
• Availability tends to depend more on MTTR than on the details of the repair distribution
Slide 72
Availability
• $A = \dfrac{V_r}{\lambda + V_r}$
• $A = \dfrac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
• Availability tends to depend more on MTTR than on the details of the repair distribution
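The two availability expressions are the same quantity when both the failure rate and the repair rate are constant. A short derivation, assuming MTTF = 1/λ and MTTR = 1/V_r as defined on the earlier slides:

```latex
A = \frac{V_r}{\lambda + V_r}
  = \frac{1/\mathrm{MTTR}}{1/\mathrm{MTTF} + 1/\mathrm{MTTR}}
  = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```

The last step multiplies the numerator and denominator by MTTF × MTTR.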
Slide 73
EXAMPLE
i Tf (DAYS) Tr (DAYS)
1 12.8 13
2 14.2 14.8
3 25.4 25.8
4 31.4 33.3
5 35.3 35.6
6 56.4 57.3
7 62.7 62.8
8 131.2 134.9
9 146.7 150.0
10 177.0 177.1
Tf = Time Failed, Tr = Time Repaired
Slide 74
EXAMPLE
• a) Calculate the 6 month (182.5 days) availability from the data. There are 10 failures.
• b) Estimate MTTF & MTTR from the data
A*(182.5) = 0.937
MTTF = 16.56 DAYS
MTTR = 1.15 DAYS
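The three results can be reproduced from the failure and repair times. This sketch assumes the system starts operating at day 0 and that MTTF averages only the ten operating periods that ended in a failure (the operating time after the last repair is left out), which is what reproduces the 16.56 days figure; variable names are illustrative.

```python
failed = [12.8, 14.2, 25.4, 31.4, 35.3, 56.4, 62.7, 131.2, 146.7, 177.0]
repaired = [13.0, 14.8, 25.8, 33.3, 35.6, 57.3, 62.8, 134.9, 150.0, 177.1]
mission_days = 182.5

# Total downtime is the sum of the individual repair durations
downtime = sum(tr - tf for tf, tr in zip(failed, repaired))
availability = 1.0 - downtime / mission_days
mttr = downtime / len(failed)

# Each operating period runs from the previous repair (or day 0) to the next failure
starts = [0.0] + repaired[:-1]
uptimes = [tf - start for start, tf in zip(starts, failed)]
mttf = sum(uptimes) / len(uptimes)

print(round(availability, 3), round(mttf, 2), round(mttr, 2))
```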
Slide 75
Conclusions
• Reliability is major cost driver
• Reliability Definitions
• Failure Patterns, Distributions
• How to determine failure patterns
• Failure Data Analysis Methods
• Operational Reliability Modeling
• Maintainability, Maintenance Decisions
• Availability
Slide 76
Resources
• Reliability & Maintainability Engineering: C. Ebeling.

• Reliability Engineering: E.E. Lewis.
• Handbook of Reliability Engineering.
• Military Standards: Procedures for performing failure modes, effects and criticality analysis.
Slide 77
