
Introduction to Reliability

Reliability is an inherent feature of design. It is concerned with performance in the field, as opposed to quality of production (conformance to design specifications).
Definition: Reliability is the probability that a system will perform in a satisfactory manner for a given period of time when used under specified operating conditions.

Introduction to Reliability (cont)

What is satisfactory? All critical functions must be performed. Time-oriented quantitative factors: MTBF, or $P(X > t_0)$ with $X$ = lifetime. Qualitative factors count, too.
Operating conditions: use, handling, transport, installation, and storage.

Reliability in the System Life-Cycle

Conceptual Design Phase
Define the reliability requirements of the system. Plan the reliability program.

Preliminary Design Phase

Allocate reliability requirements. Predict the reliability of components and subsystems. Provide reliability estimates to cost-estimating and design trade-off studies. Participate in design reviews. Assess subsystem and component supplier reliability estimates.

Reliability in the System Life-Cycle (cont)

Detail Design Phase
More detailed reliability prediction. Assist in detail design decisions, logistic support analysis, and prototype development. Recommend changes prior to production. Evaluate the reliability of the prototype. Participate in other test and evaluation activities related to reliability.

Reliability in the System Life-Cycle (cont)

Production/Construction Phase
Monitor production. Perform reliability tests of selected items:
Qualification tests: prior to production, repetitive tests to determine MTBF, degradation, and failure modes.
Acceptance tests: random or 100% testing of items exiting production, to assure that the reliability demonstrated during qualification testing is being achieved in production items.
Collect and analyze data on operational tests (product evaluation tests at a designated site). Recommend corrective action. Continue to update reliability models and predictions.

Reliability in the System Life-Cycle (cont)

System Use Phase
Data collection and analysis. Reliability improvement studies. Change recommendations. Equipment redesign projects.

Measures of Reliability
Let $T$ be the random variable measuring the lifetime of an item (time to first, or next, failure). The range space of $T$ is $\{t : t \ge 0\}$. Tests to establish the PDF and parameters of $T$ are called life testing. The cumulative distribution function $F(t) = P(T \le t)$ is called the failure distribution function.

Measures of Reliability (cont)
The reliability function is:
$$R(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(x)\,dx$$

[Figure: F(t) and R(t) versus t, probability axis from 0 to 1; panel: Reliability Density Function]

Four ways to determine R(t) for a particular system: (1) Test many systems to failure and develop the curve empirically. (2) Test many subsystems, use historical field data on others, develop subsystem reliability functions, and combine them with a reliability system model. (3) Extrapolate past experience with similar systems. (4) Physical properties: hypothesize a certain distribution.
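Method (1) can be sketched in Python as a simple survival fraction: estimate R(t) as the share of tested units whose lifetime exceeds t. The lifetimes below are hypothetical, and the helper name `empirical_reliability` is illustrative, not from the text.

```python
from bisect import bisect_right

def empirical_reliability(failure_times, t):
    """Estimate R(t) as the fraction of units with lifetime T_i > t."""
    times = sorted(failure_times)
    survivors = len(times) - bisect_right(times, t)  # count of T_i strictly greater than t
    return survivors / len(times)

# Hypothetical lifetimes (hours) of 10 units tested to failure
lifetimes = [120, 340, 410, 560, 700, 820, 990, 1200, 1500, 2100]
for t in (0, 500, 1000, 2000):
    print(t, empirical_reliability(lifetimes, t))
```

With many more units the step curve approaches the true R(t); with only ten it is coarse.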

Failures and Failure Rates

Three types of failure (see Figure 12.4): initial (failure at $t = 0$), random, and wearout. If initial failures are to be disregarded in the analysis, then use
$$g(t) = \frac{f(t)}{1 - P(T = 0)}, \quad t > 0$$
as the density for $(T \mid T > 0)$.

Failures and Failure Rates (cont)

The hazard function is the instantaneous failure rate at time $t$, given survival up to $t$. Its formula is:
$$h(t) = \frac{f(t)}{R(t)} = \frac{-R'(t)}{R(t)}$$

Note: $H(t) = \int_0^t h(x)\,dx$, the number of failures in $[0, t]$, is called the failure count function.


Failures and Failure Rates (cont)

How are H(t), R(t), and F(t) related?
$$H(t) = \int_0^t \frac{-R'(x)}{R(x)}\,dx = -\ln R(x)\Big|_0^t = -\ln R(t) + \ln R(0) = -\ln R(t),$$
since $R(0) = 1$. So
$$R(t) = e^{-H(t)}, \qquad F(t) = 1 - e^{-H(t)}$$
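As a numerical check of $R(t) = e^{-H(t)}$, the sketch below assumes a hypothetical Weibull lifetime (shape 2, scale 1000 hours, chosen only for illustration). It computes $h(t) = f(t)/R(t)$, integrates it with the trapezoidal rule to get $H(t)$, and compares $e^{-H(t)}$ with the closed-form $R(t)$.

```python
import math

BETA, ETA = 2.0, 1000.0   # hypothetical Weibull shape and scale (hours)

def R(t):
    """Weibull reliability function."""
    return math.exp(-((t / ETA) ** BETA))

def f(t):
    """Weibull density f(t) = -R'(t)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * R(t)

def h(t):
    """Hazard function h(t) = f(t) / R(t)."""
    return f(t) / R(t)

def H(t, steps=10_000):
    """Cumulative hazard: integral of h over [0, t] by the trapezoidal rule."""
    dx = t / steps
    total = 0.5 * (h(0.0) + h(t)) + sum(h(k * dx) for k in range(1, steps))
    return total * dx

t = 800.0
print(R(t), math.exp(-H(t)))   # the two values agree closely
```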


Mean Lifetime (Time Between Failures)

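$$\text{Mean life} = E(T) = \int_0^{\infty} t\,f(t)\,dt = \int_0^{\infty} [1 - F(t)]\,dt = \int_0^{\infty} R(t)\,dt$$

Example: Random failures are often modeled by assuming the time to failure is exponential with rate $\lambda$:
$$f(t) = \lambda e^{-\lambda t},\ t \ge 0 \ (0 \text{ otherwise}); \qquad F(t) = 1 - e^{-\lambda t}; \qquad R(t) = e^{-\lambda t}$$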



Example (cont)

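$$h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda = \text{constant}$$

Also, because $R(t) = e^{-H(t)}$, $H(t) = \lambda t$, which is linear in $t$. Since $\theta = E(T) = 1/\lambda$:
$$P(T \le \theta) = F(\theta) = 1 - e^{-\lambda\theta} = 1 - e^{-1} = 1 - 0.3679 = 0.6321$$
$$P(T > \theta) = 0.3679,$$
independent of $\lambda$ (or $\theta$).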
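A small sketch of the exponential example, assuming a hypothetical $\lambda = 1/1000$ per hour: integrating $R(t)$ numerically recovers $E(T) = \theta = 1/\lambda$, and $F(\theta)$ comes out at 0.6321 regardless of the rate chosen.

```python
import math

LAM = 1 / 1000.0   # hypothetical failure rate (failures per hour)
THETA = 1 / LAM    # MTBF = 1 / lambda

def R(t):
    """Exponential reliability function R(t) = exp(-lambda * t)."""
    return math.exp(-LAM * t)

def mean_life(upper=50 * THETA, steps=200_000):
    """E(T) as the integral of R(t) over [0, upper] (trapezoidal rule).

    The tail beyond `upper` is negligible for an exponential lifetime.
    """
    dx = upper / steps
    total = 0.5 * (R(0.0) + R(upper)) + sum(R(k * dx) for k in range(1, steps))
    return total * dx

print(mean_life())     # approximately 1000 = THETA
print(1 - R(THETA))    # F(theta) = 1 - e^{-1}, about 0.6321
```

Changing `LAM` changes `THETA` but not `1 - R(THETA)`, which is the point of the slide.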


Examples on Pages 349

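Example 1: 5 components did not fail in 600 hours; 5 others failed at various points, for a total of 4180 unit test hours.
$$\lambda = \frac{5 \text{ failures}}{4180 \text{ hours}} = 0.001196 \text{ failures/hour}$$

Example 2: Operating cycle = 168.8 hours; downtime = 26.8 hours; operating time = 168.8 - 26.8 = 142 hours.

Examples on Pages 352-353 (cont)

Number of failures = 6, so $\lambda = 6/142 = 0.042$ failures/hour and MTBF $= 1/\lambda = 142/6 \approx 23.67$ hours (23.81 hours if the rounded $\lambda = 0.042$ is inverted). Mean downtime MDT $= 26.8/6 = 4.4667$ hours.
Operational availability, only if we treat MTBF = MTBM (instant maintenance):
$$A_o = \frac{\text{MTBM}}{\text{MTBM} + \text{MDT}} = \frac{23.67}{23.67 + 4.4667} = 0.841$$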
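The arithmetic of both examples can be reproduced in a few lines. This sketch computes MTBF as 142/6 ≈ 23.67 hours; the slide's 23.81 hours comes from inverting the rounded λ = 0.042. Either way, $A_o$ reduces to operating time over cycle time, 142/168.8 ≈ 0.841.

```python
# Example 1: 5 failures observed in 4180 total unit test hours
lam1 = 5 / 4180
print(round(lam1, 6))   # 0.001196

# Example 2: one operating cycle with downtime and 6 failures
cycle, downtime, failures = 168.8, 26.8, 6
operating_time = cycle - downtime   # 142 hours
lam2 = failures / operating_time    # failures per operating hour
mtbf = 1 / lam2                     # = 142 / 6
mdt = downtime / failures           # mean downtime per failure
a_o = mtbf / (mtbf + mdt)           # treats MTBF as MTBM
print(round(lam2, 3), round(mtbf, 2), round(mdt, 4), round(a_o, 3))   # 0.042 23.67 4.4667 0.841
```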

Other examples are on the handouts: Hines and Montgomery, Example 15-7; Halpern, Examples 10-1 through 10-6.
Note: For the exponential failure model, $R(t) = e^{-\lambda t}$ is the first term of a Poisson distribution with parameter $\lambda t$, i.e., the probability of zero failures in $[0, t]$.

What if Failure Rate Not Constant?

Distribution    Failure rate h(t) behavior
Normal          Increasing function
Lognormal       Various shapes
Weibull         Decreasing (β < 1), constant (β = 1), increasing (β > 1)
Gamma           Decreasing (n < 1), constant (n = 1), increasing (n > 1)
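The Weibull row can be illustrated with a short sketch, assuming the common two-parameter form $h(t) = (\beta/\eta)(t/\eta)^{\beta - 1}$ with a scale parameter $\eta$ that the table does not name (set to 1 here for illustration).

```python
def weibull_hazard(t, beta, eta=1.0):
    """h(t) = (beta/eta) * (t/eta)**(beta - 1) for a Weibull(beta, eta) lifetime."""
    return (beta / eta) * (t / eta) ** (beta - 1)

for beta in (0.5, 1.0, 2.0):
    rates = [round(weibull_hazard(t, beta), 3) for t in (0.5, 1.0, 2.0)]
    print(beta, rates)   # decreasing, constant, increasing in t
```

With β = 1 the Weibull reduces to the exponential model and its constant failure rate.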

What if Failure Rate Not Constant? (cont)

Either use a different h(t) for each time interval over which the rate is constant, or use the average failure rate (AFR) between $t_1$ and $t_2$:
$$\text{AFR}(t_1, t_2) = \frac{\int_{t_1}^{t_2} h(t)\,dt}{t_2 - t_1} = \frac{H(t_2) - H(t_1)}{t_2 - t_1} = \frac{\ln R(t_1) - \ln R(t_2)}{t_2 - t_1}$$

Note: $\text{AFR}(0, t) = \dfrac{H(t)}{t} = \dfrac{-\ln R(t)}{t}$
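A minimal sketch of the AFR formula, assuming a hypothetical Weibull lifetime with shape 2 and scale 1000 hours (increasing hazard), so the later interval shows the higher average rate.

```python
import math

def afr(R, t1, t2):
    """Average failure rate over [t1, t2]: (ln R(t1) - ln R(t2)) / (t2 - t1)."""
    return (math.log(R(t1)) - math.log(R(t2))) / (t2 - t1)

def weibull_R(t, beta=2.0, eta=1000.0):
    """Hypothetical Weibull reliability (increasing hazard for beta > 1)."""
    return math.exp(-((t / eta) ** beta))

print(afr(weibull_R, 0.0, 500.0))     # H(500)/500 = 0.25/500 = 0.0005
print(afr(weibull_R, 500.0, 1000.0))  # (1.0 - 0.25)/500 = 0.0015
```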

Concepts Our Text Skips

Renewal rate function $r(t)$: instantaneous failure rate at time $t$, accounting for replacement of failed items with new components from the same population as the original parts.
Censored Type I data: a fixed test duration $T$ is preset. Units that do not fail before $T$ are censored, in that the data do not account for their survival beyond time $T$. If $T$ is poorly chosen, there may be no failures by time $T$. Then what?


Concepts Our Text Skips (cont)

Censored Type II data: a fixed number of failures is prespecified; $n$ items are tested until $r$ fail. If $r$ is poorly chosen, the test may take too long.
Readout time data: record actual failure times of each failed component.


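Estimation of λ for Exponential Life

$$\hat\lambda = \frac{\text{number of failures}}{\text{total unit test hours}}$$

Type I censored data ($n$ items, $r$ failures, test ends at fixed time $T$):
$$\hat\lambda = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)T}, \qquad t_i = \text{time of the } i\text{th failure}$$

Type II censored data (test ends at the $r$th failure time $t_r$):
$$\hat\lambda = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)t_r}$$

If a system has $n$ components and the system fails when the first component fails, then
$$\lambda_s = \sum_{i=1}^{n} \lambda_i$$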
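The two censored-data estimators and the series rate can be sketched as plain functions. The failure times are hypothetical, chosen so that the Type I total matches the 4180 unit test hours of Example 1 (5 failures, 5 survivors at 600 hours); the function names are illustrative.

```python
def lambda_type1(failure_times, n, T):
    """Type I censoring: test stops at fixed time T; n - r units survive to T."""
    r = len(failure_times)
    return r / (sum(failure_times) + (n - r) * T)

def lambda_type2(failure_times, n):
    """Type II censoring: test stops at the r-th failure, at time t_r."""
    r = len(failure_times)
    t_r = max(failure_times)
    return r / (sum(failure_times) + (n - r) * t_r)

def series_lambda(rates):
    """System fails at the first component failure: lambda_s = sum of lambda_i."""
    return sum(rates)

# Hypothetical failure times (hours): 5 of 10 units fail before T = 600 h
times = [100, 200, 250, 280, 350]
print(lambda_type1(times, n=10, T=600))   # 5 / 4180
print(lambda_type2(times, n=10))          # 5 / (1180 + 5 * 350)
```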



System Reliability Models

Defined: math models of the system that show functional relationships among subsystems, components, etc. Examples:
Reliability block diagram: shows all possible success/failure combinations; series and parallel, and also k-out-of-n configurations; any closed path through the system is a success; may not resemble the system physically.
Standby redundancy.

System Reliability Models (cont)

Coherent systems models.
Fault tree analysis and other cause-consequence diagrams: work from top-level events (failures) to primary events (causes).



Series Configuration
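[Figure: blocks 1, 2, ..., n connected in series]

Static model:
$$R_s = \prod_{i=1}^{n} R_i = R_1 R_2 \cdots R_n$$

Dynamic model:
$$R_s(t) = \prod_{i=1}^{n} R_i(t), \qquad h_s(t) = \sum_{i=1}^{n} h_i(t), \qquad H_s(t) = \sum_{i=1}^{n} H_i(t)$$

Exponential subsystem failure models:
$$R_s(t) = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t}, \qquad h_s(t) = \lambda_s = \sum_{i=1}^{n} \lambda_i, \qquad \theta_s = \text{MTBF} = \frac{1}{\sum_{i=1}^{n} \lambda_i}$$

See example on page 354.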
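A minimal sketch of the series formulas, with hypothetical subsystem reliabilities and failure rates: the static model multiplies reliabilities, and the exponential model adds rates.

```python
import math

def series_reliability(reliabilities):
    """Static series model: R_s = R_1 * R_2 * ... * R_n."""
    return math.prod(reliabilities)

def series_exponential(rates, t):
    """Return (R_s(t), lambda_s, MTBF_s) for exponential subsystems in series."""
    lam_s = sum(rates)
    return math.exp(-lam_s * t), lam_s, 1 / lam_s

# Hypothetical values
print(series_reliability([0.95, 0.98, 0.99]))          # about 0.9217
print(series_exponential([1e-4, 2e-4, 2e-4], t=1000))  # R_s = e^{-0.5}, MTBF about 2000 h
```

Note that a series system is never more reliable than its weakest block.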



Active Parallel Configuration


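[Figure: blocks 1, 2, ..., n connected in parallel]

Static:
$$R_a = 1 - \prod_{i=1}^{n} (1 - R_i)$$

Dynamic:
$$R_a(t) = 1 - \prod_{i=1}^{n} [1 - R_i(t)]$$

Identical components:
$$R_a(t) = 1 - [1 - R(t)]^n$$

The system fails only if all $n$ subsystems fail.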
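The active-parallel formulas translate directly into code. This is a sketch with hypothetical branch reliabilities; the product runs over branch unreliabilities because the system fails only when every branch fails.

```python
import math

def parallel_reliability(reliabilities):
    """Active parallel: R_a = 1 - prod(1 - R_i); fails only if every branch fails."""
    return 1 - math.prod(1 - r for r in reliabilities)

def parallel_identical(r, n):
    """n identical components in active parallel: R_a = 1 - (1 - R)**n."""
    return 1 - (1 - r) ** n

print(parallel_reliability([0.9, 0.8]))   # approximately 1 - 0.1 * 0.2 = 0.98
print(parallel_identical(0.9, 3))         # approximately 1 - 0.1**3 = 0.999
```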


Example 1
Always keep in mind: redundancy has a cost.

Components in parallel   Weight   Benefit/cost
1                        5 lb     -
2                        10 lb    .0475 / 5 lb
3                        15 lb    .002375 / 5 lb
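The reliability column of this table did not survive extraction. The increments .0475 and .002375 are exactly what a component reliability of 0.95 produces, so the sketch below assumes R = 0.95 per component (an inference, not stated on the slide) and recomputes the gain from each added 5 lb unit.

```python
def parallel_identical(r, n):
    """n identical components in active parallel: R_a = 1 - (1 - R)**n."""
    return 1 - (1 - r) ** n

R_COMPONENT = 0.95   # assumed; reproduces the slide's increments
WEIGHT_LB = 5        # weight per component, from the table

counts = (1, 2, 3)
reliab = [parallel_identical(R_COMPONENT, n) for n in counts]
gains = [round(b - a, 6) for a, b in zip(reliab, reliab[1:])]
weights = [n * WEIGHT_LB for n in counts]
print(weights, [round(r, 6) for r in reliab], gains)   # gains: [0.0475, 0.002375]
```

The twentyfold drop in benefit per added 5 lb is the diminishing return the slide warns about.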



Example 2
Exponential Subsystem Lifetime, Identical Subsystems

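$$R_a(t) = 1 - \left(1 - e^{-\lambda t}\right)^n$$
$$\theta_a = \int_0^{\infty} R_a(t)\,dt = \sum_{i=1}^{n} \frac{1}{i\lambda} = \theta \sum_{i=1}^{n} \frac{1}{i}$$

E.g., if $n = 3$ and $1/\lambda = \theta = 1000$ hours:
$$\theta_a = \frac{1000}{1} + \frac{1000}{2} + \frac{1000}{3} = 1000 + 500 + 333.33 = 1833.33 \text{ hours}$$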
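To check the harmonic-sum result, this sketch compares the closed form with a direct trapezoidal integration of $R_a(t)$ for the slide's case ($n = 3$, $\theta = 1000$ hours). The truncation point of 40θ is an assumption; the remaining tail is negligible.

```python
import math

def parallel_mtbf(theta, n):
    """theta_a = theta * (1 + 1/2 + ... + 1/n) for n identical exponential units."""
    return theta * sum(1 / i for i in range(1, n + 1))

def parallel_mtbf_numeric(theta, n, upper_factor=40, steps=200_000):
    """Integrate R_a(t) = 1 - (1 - exp(-t/theta))**n with the trapezoidal rule."""

    def r_a(t):
        return 1 - (1 - math.exp(-t / theta)) ** n

    upper = upper_factor * theta
    dx = upper / steps
    total = 0.5 * (r_a(0.0) + r_a(upper)) + sum(r_a(k * dx) for k in range(1, steps))
    return total * dx

print(parallel_mtbf(1000, 3))          # 1833.33...
print(parallel_mtbf_numeric(1000, 3))  # agrees to within rounding
```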


Special Configurations
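K-out-of-n configuration: the system works only if at least $k$ of $n$ components are working. Assume identical components with reliability $R(t)$:
$$R_s(t) = \sum_{i=k}^{n} \binom{n}{i} [R(t)]^i [1 - R(t)]^{n-i}$$

If $R(t) = e^{-\lambda t} = e^{-t/\theta}$ (exponential), then
$$\theta_s = \sum_{i=k}^{n} \frac{1}{i\lambda} = \theta \sum_{i=k}^{n} \frac{1}{i}$$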
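A minimal sketch of the k-out-of-n formulas for identical, independent components. With k = 1 it reduces to active parallel, and with k = n to series, which makes a convenient sanity check. The 500-hour mission and θ = 1000 hours are hypothetical.

```python
from math import comb, exp

def k_out_of_n(r, k, n):
    """P(at least k of n identical, independent components work)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

def k_out_of_n_mtbf(theta, k, n):
    """theta_s = theta * sum_{i=k}^{n} 1/i for identical exponential components."""
    return theta * sum(1 / i for i in range(k, n + 1))

r = exp(-500 / 1000)   # R(t) at t = 500 h with theta = 1000 h (hypothetical)
print(k_out_of_n(r, 2, 3))
print(k_out_of_n_mtbf(1000, 2, 3))   # 1000 * (1/2 + 1/3)
```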




Special Configurations (cont)

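Combined series-parallel. Key: treat components in parallel as a single component, then expand.

Block A in series with the parallel pair B, C:
$$R_s = R_A \cdot R_{B \cup C} = R_A [1 - (1 - R_B)(1 - R_C)]$$

Parallel pair A, B in series with parallel pair C, D:
$$R_s = [1 - (1 - R_A)(1 - R_B)][1 - (1 - R_C)(1 - R_D)]$$

See pages 354-355.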
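The "collapse parallel groups, then expand" rule maps naturally onto two small combinators that can be nested. The helper names `ser` and `par` and the block reliabilities are hypothetical.

```python
def par(*rs):
    """Reliability of blocks in active parallel: 1 - prod(1 - R_i)."""
    q = 1.0
    for r in rs:
        q *= 1 - r
    return 1 - q

def ser(*rs):
    """Reliability of blocks in series: prod(R_i)."""
    p = 1.0
    for r in rs:
        p *= r
    return p

RA, RB, RC, RD = 0.95, 0.90, 0.90, 0.85   # hypothetical block reliabilities
print(ser(RA, par(RB, RC)))               # R_A [1 - (1-R_B)(1-R_C)]
print(ser(par(RA, RB), par(RC, RD)))      # [1-(1-R_A)(1-R_B)][1-(1-R_C)(1-R_D)]
```

Because each call returns a single reliability, any reliability block diagram built only from series and parallel groups can be evaluated by nesting calls.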

Availability Measurement
Inherent Availability (Ideal Support Environment)
$$A_i = \frac{\text{MTBF}}{\text{MTBF} + \bar{M}_{ct}}$$
$\bar{M}_{ct}$ = mean corrective maintenance time = mean time to repair (MTTR).
Does not include preventive maintenance, logistics delay, or administrative delay.

Achieved Availability (Ideal Support Environment)
$$A_a = \frac{\text{MTBM}}{\text{MTBM} + \bar{M}}$$
$\bar{M}$ = mean active maintenance time = weighted average of corrective and preventive maintenance time.
MTBM = mean time between any maintenance action, corrective or preventive.


Availability Measurement (cont)
Operational Availability (Actual Support Environment)
$$A_o = \frac{\text{MTBM}}{\text{MTBM} + \text{MDT}}$$

MDT = mean downtime = weighted average of active maintenance (corrective and preventive) and delays (logistical and administrative).
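The three availability measures share the form up / (up + down) and differ only in what counts as downtime. The sketch below uses hypothetical inputs. It assumes (not stated on the slides) that $\bar{M}$ is weighted by the frequency of corrective and preventive actions, that MTBM combines the two action rates, and that MDT is $\bar{M}$ plus an assumed mean delay.

```python
def availability(up, down):
    """Generic ratio used by A_i, A_a and A_o: up / (up + down)."""
    return up / (up + down)

def mean_active_maintenance(lam, mct, fpt, mpt):
    """M-bar as a frequency-weighted average (assumed weighting).

    lam: corrective actions per hour (1/MTBF); mct: mean corrective time
    fpt: preventive actions per hour; mpt: mean preventive time
    """
    return (lam * mct + fpt * mpt) / (lam + fpt)

# Hypothetical inputs (hours)
mtbf, mct = 500.0, 5.0
fpt, mpt = 1 / 2000, 20.0
mtbm = 1 / (1 / mtbf + fpt)   # any maintenance action, corrective or preventive
m_bar = mean_active_maintenance(1 / mtbf, mct, fpt, mpt)
mdt = m_bar + 16.0            # assumed 16 h mean logistics/administrative delay

a_i = availability(mtbf, mct)     # inherent
a_a = availability(mtbm, m_bar)   # achieved
a_o = availability(mtbm, mdt)     # operational
print(round(a_i, 4), round(a_a, 4), round(a_o, 4))   # 0.9901 0.9804 0.9434
```

As expected, each measure that admits more kinds of downtime comes out lower.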


Comments on Availability
Availability is a function of both the reliability of the prime item and the logistics support subsystem. The equipment designer can exert little control over support operations, but can design in built-in diagnostics, easy access, and rapid disconnect/connect.



Comments on Availability (cont)

The proper balance of reliability and maintainability (R&M) must be decided in the early stages, when flexibility is greatest. Discussion of availability is always in some context: actual failure or not; which mission, and what is critical to its success; availability of maintenance crew, equipment, and spares.


Reliability Techniques in System Design Phase

Conceptual Design Phase: assignment of the system reliability goal based on mission analysis, cost analysis, and technical limits.
Preliminary Design: block diagram models; estimation of $R_i(t)$ functions; study of failure points and solutions.


Reliability Techniques in System Design Phase (cont)

Preliminary Design Phase (cont): definition of success/failure criteria; budgeting and revision of reliability requirements.
Detail Design: material and parts selection; standardization; test and evaluation requirements for suppliers; series-parallel recommendations; de-rating.


Standardization means selecting components and materials whose reliability characteristics are known, as well as their degradation under stress and aging. This indirectly eases the burden on spare parts inventories, because the same component is used in several systems.



De-rating: use a part in an application below its rated value. It is a type of overdesign that provides a reliability margin. Steps: identify the operating interval; select the de-rating percentage (see RCA Corp. table); calculate the de-rated value of the component to be used.
Example: ceramic capacitor for a 100 V (max) application. RCA recommends 70% de-rating: $0.7X = 100$, so $X = 142.86$ V is the minimum rating required for the component.
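The de-rating step is a single division. This sketch (hypothetical helper name `min_rating`) returns the smallest part rating that keeps the operating value within the de-rated fraction.

```python
def min_rating(operating_value, derating_fraction):
    """Smallest rating X such that derating_fraction * X >= operating_value."""
    return operating_value / derating_fraction

print(round(min_rating(100.0, 0.70), 2))   # 142.86 (volts)
```

In practice one would then pick the next standard rating above this value.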

Binomial Expansion to Explain Parallel-Redundant Systems

Consider 3 identical components in parallel. $P$ = probability of operation of each; $Q = 1 - P$ = probability of failure of each.
$$(P + Q)^3 = \binom{3}{0}P^3 + \binom{3}{1}P^2Q + \binom{3}{2}PQ^2 + \binom{3}{3}Q^3 = P^3 + 3P^2Q + 3PQ^2 + Q^3$$

The four terms correspond to: all 3 up; 2 up, one failed; one up, two failed; all 3 down.

$$P(\text{system operating}) = P^3 + 3P^2Q + 3PQ^2 = 1 - Q^3 = 1 - (1 - P)^3$$
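The binomial argument can be confirmed by brute force: enumerate all $2^3$ up/down states, add the probabilities of those with at least one unit up, and compare with $1 - (1 - P)^3$. The function name is illustrative.

```python
from itertools import product

def parallel_by_enumeration(p, n=3):
    """Sum probabilities of all up/down states with at least one unit up."""
    q = 1 - p
    total = 0.0
    for state in product((True, False), repeat=n):   # True = unit operating
        ups = sum(state)
        if ups >= 1:
            total += p**ups * q ** (n - ups)
    return total

p = 0.9
print(parallel_by_enumeration(p), 1 - (1 - p) ** 3)   # both about 0.999
```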


Binomial Expansion to Explain Parallel-Redundant Systems (cont)

Let $P_A = P_B = P_C = P_D = 0.9$. Which configuration is more reliable? Why?
[Figure: two configurations of components A, B, C, D]
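The diagrams for this question did not survive extraction. The following is a sketch, assuming the usual pair of layouts: the series paths A-C and B-D placed in parallel (system-level redundancy), versus the parallel pairs (A, B) and (C, D) placed in series (component-level redundancy). If the original figure differs, the numbers will too.

```python
P = 0.9  # reliability of each of A, B, C, D (from the slide)

# Assumed layout 1: series paths A-C and B-D in parallel (system-level redundancy)
system_level = 1 - (1 - P * P) ** 2
# Assumed layout 2: parallel pair (A, B) in series with parallel pair (C, D)
component_level = (1 - (1 - P) ** 2) ** 2

print(round(system_level, 4), round(component_level, 4))   # 0.9639 0.9801
```

Under these assumptions, layout 2 survives every failure combination that layout 1 survives, plus mixed pairs such as A and D failing together, which is why component-level redundancy comes out ahead.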

Parallel Redundancy Has Its Drawbacks

Each subsystem must have a switch to ensure its failure doesn't disable the remaining components; sometimes it is necessary to disconnect the failed subsystem. Redundancy increases weight, volume, cost, and sometimes complexity. The failure-sensing device may itself be unreliable.

Alternatives to Redundancy
Reduce the number of parts. Simplify. Improve the reliability level of parts used, especially at critical nodes. Burn-in of parts. On-board spares and repairs.



Standby Redundancy
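Assume cold standby: the standby unit is not energized until failure is detected in the original component. Assume the reliability of the decision switch is 100%. The lifetime variable is $T = T_1 + \cdots + T_n$. Standby is always more reliable than simple parallel if the switch is 100% reliable.
[Figure: component 1 and standby component 2 connected through decision switch DS]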


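Standby Redundancy (cont)

Assume the lifetime variable is
$$T = T_1 + T_2 + \cdots + T_n$$
Then
$$E(T) = \sum_{i=1}^{n} E(T_i), \qquad V(T) = \sum_{i=1}^{n} V(T_i) \quad (\text{independent } T_i)$$

If each $T_i$ is exponential with rate $\lambda$, $T$ is gamma$(\lambda, n)$:
$$E(T) = \frac{n}{\lambda}, \qquad V(T) = \frac{n}{\lambda^2}$$

For $n = 2$:
$$R(t) = P(\text{system life} > t \mid \text{one standby}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t}$$
For $n = 3$:
$$R(t) = P(\text{system life} > t \mid \text{two standbys}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t} + \frac{(\lambda t)^2}{2!}e^{-\lambda t}$$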
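The standby formulas are the first $n$ terms of a Poisson sum, which generalizes to any $n$. This sketch (hypothetical θ = 1000 hours and a 1000-hour mission) compares cold standby with a perfect switch against active parallel with the same number of units.

```python
import math

def standby_reliability(lam, t, n):
    """Cold standby, perfect switch, n identical exponential units:
    R(t) = e^{-lam t} * sum_{k=0}^{n-1} (lam t)^k / k!  (gamma survival function)."""
    x = lam * t
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(n))

def active_parallel_reliability(lam, t, n):
    """n identical exponential units in active parallel."""
    return 1 - (1 - math.exp(-lam * t)) ** n

lam, t = 1 / 1000, 1000   # hypothetical: theta = 1000 h, mission of 1000 h
for n in (2, 3):
    print(n, standby_reliability(lam, t, n), active_parallel_reliability(lam, t, n))
```

With these inputs standby gives about 0.736 (n = 2) and 0.920 (n = 3), against about 0.600 and 0.747 for active parallel, consistent with the claim that a perfect-switch standby beats simple parallel.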


Benefits of Computerized Reliability Models

Helps keep track of reliability relationships across levels of design and within a given level. Rapid sensitivity analysis: is the overall R goal even feasible? Study the effects of different R allocations, of configuration changes, and of substituting different components. Perform worst-case analysis. Can be adapted to multiple missions (in essence, one model for each set of mission equipment and conditions). Can be used to evaluate proposed modifications to an existing system.

Analytical Methods to Support Reliability Estimation and Assist in Design Decisions

Stress-strength analysis. Critical-useful-life analysis for complex systems (radar, missiles, computers). Failure mode and effect analysis. Worst-case analysis. Sneak-circuit analysis. Safety analysis techniques: fault-tree analysis, task and error analysis, hazard analysis.


Discussion of Stress-Strength Analysis

Measures resistance to stress (strength). Examples: operating wattage versus rated wattage; operating temperature versus rated temperature; pounds per square inch. Includes: stress distribution, especially maximum stress; stress causes, timing, and frequency; stress testing, such as metal fatigue tests.


Discussion of Critical-Useful-Life Analysis

Critical-useful-life analysis: identification of the critical item list and the requirements of each of these items for preventive maintenance, corrective maintenance, and replacement. Includes studies of how to eliminate critical items through redesign.



Discussion of FMEA
Failure mode and effect analysis: identification of all possible failure modes of equipment, their possible causes, and their possible immediate and ultimate effects on the system and operation. Formal documentation in words, not diagrams. Estimation of probability of occurrence. Classify each failure by criticality. Describe corrective action alternatives.

Discussion of Worst-Case Analysis

Worst-case analysis: examining how the performance of an electrical circuit (or other device) will change over time as a result of drift in part characteristics. Provides guidance on how to allow for part parameter variation in design.



Discussion of Sneak-Circuit Analysis

Sneak-circuit analysis: use of math models to identify any unanticipated signal paths in a circuit that may degrade performance or introduce failure.


Reliability Prediction at Part, Circuit, and Subsystem Level

Based On:
Similar equipment: extrapolate (not very accurate). Number and complexity of active element groups (controllers or converters of energy): part types, counts, and failure rates are combined into an estimate of system reliability. Prediction based on testing, such as stress tests.

Used For:
Higher-level reliability prediction. Input to maintenance and logistic support analysis. Comparison with the requirement: where are we over or under on reliability?


Reliability Degradation Studies/ Action

Determine and correct potential or actual adverse effects due to: storage, packing, transportation, and handling; unpacking, assembly, and set-up; preventive and corrective maintenance; carelessness; wrong tools and equipment; not following (or not knowing) the proper procedure.


Reliability Test and Evaluation

To answer the question: will the mature system achieve its MTBF requirement in operation? Testing should be part of an integrated test plan covering the entire specification.
Type I tests are early enough in the design process that design changes are fairly cheap.

Type II and III tests must:

Follow approved procedures (first drafts of technical manuals and training courses); use the test and support equipment specified in the maintenance concept and detailed in the LSA; be provided with (test) supply support; be carefully planned, instrumented, documented, and analyzed.


Type II Reliability Testing

Evaluation of prototype and early production models, using producer personnel. Includes: reliability qualification tests to determine MTBF and MTBM; failure sequences, detection, and performance degradation; maintenance procedure adequacy; maintenance-induced failures; production sampling acceptance tests.

Types of Type 2 Tests

Sequential qualification tests: environmental test chambers; environmental test cycle and equipment duty cycle; multiple identical test items; statistics-based accept-reject test plan. The producer's risk and consumer's risk usually range from 0.05 to 0.25 (negotiated).



Types of Type 2 Tests (cont)

Reliability acceptance testing: plot MTBF versus time; look for growth or decline.
Reliability life testing: to determine the failure distribution.
Continuous (steady): fixed time, count failures; or fixed number of failures, count time.
Step-stress (accelerated) testing: step up stress until all units fail; aids in planning burn-in.

Type 3 Testing
Definition: operational testing using:
A group of production units; a designated field test site; a representative mix of mission profiles; user personnel (trained first); the first sets of support equipment and spares.

All elements of the system are operational and evaluated together. This is where the true R, M, A, and other performance measures are known for the first time, rather than estimated via models plus some Type 1 and 2 test data.