Three Barriers Created due to Design Techniques
Fault Tolerant Systems
“To find fault is easy; to do better may be difficult” -- Plutarch
A “fault-tolerant system” is one that continues to perform at the desired level of service in spite of failures in some components
Fault Tolerance: the system property of recovering from partial failure
Fail-stop: The component exhibits crash failures, but
its failure can be detected (either through
announcement or timeouts)
Fail-silent: The component exhibits omission or
crash failures; clients cannot tell what
went wrong
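The timeout-based detection mentioned for fail-stop components can be sketched as a heartbeat monitor. This is a minimal illustration, not a method taken from the slides: the class name `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatMonitor:
    """Suspects a component has crashed when no heartbeat arrives in time.

    Hypothetical helper for illustration; `timeout` is in seconds.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # component name -> time of last heartbeat

    def heartbeat(self, component, now):
        # Record that `component` announced itself at time `now`.
        self.last_seen[component] = now

    def suspected_failed(self, now):
        # A component is suspected once its silence exceeds the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=2.0)
monitor.heartbeat("a", now=0.0)
monitor.heartbeat("b", now=0.0)
monitor.heartbeat("a", now=3.0)          # "a" keeps reporting, "b" goes quiet
failed = monitor.suspected_failed(now=4.0)
```

A fail-silent component gives the monitor less to work with: a missed heartbeat shows that something is wrong, but not what.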
Fault Masking: preventing the introduction of errors into a system due to faults
Or
Reconfiguration: the process of eliminating a faulty component and restoring the system to some operational state
Reconfiguration Steps
Fault detection: recognizing that a fault has occurred
•Long-Life Systems: satellites, unmanned/manned space probes
•Critical Control/Computation: aircraft controllers, space shuttles
•Hazardous Production Environments: chemical industry (methyl isocyanate, as in the Bhopal gas tragedy; nitric acid)
•High-Availability
•Maintenance Postponement
Unreliability: Q(t) = F(t) / N, where F(t) is the number of failed components and N is the total number of components
[Figure: failure rate versus time (bathtub curve) with three regions: early life / burn-in period; useful life period with constant failure rate λ (random failures); wear-out / end-of-life period]
EXPONENTIAL FAILURE LAW ⇒Relates Reliability &
Failure Rate
λov = Σ Nk λk
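As a small worked example, the overall failure rate λov = Σ Nk λk can be combined with the standard form of the exponential failure law, R(t) = exp(–λt). The part counts and failure rates below are made-up illustration values, and the function names are hypothetical.

```python
import math


def overall_failure_rate(parts):
    """Sum N_k * lambda_k over (count, failure rate per hour) pairs."""
    return sum(count * rate for count, rate in parts)


def reliability(failure_rate, hours):
    """Exponential failure law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * hours)


# Hypothetical bill of materials: (number of parts, failures per hour each).
parts = [(10, 1e-6), (4, 5e-6), (1, 2e-5)]

lam_ov = overall_failure_rate(parts)   # 1e-5 + 2e-5 + 2e-5 failures per hour
r_1000 = reliability(lam_ov, 1000.0)   # reliability after 1000 hours
```

The sum treats every part as necessary for the system to work, so each added part raises λov and lowers R(t).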
MTBF = ∫ R(t) dt (from t = 0 to ∞) = 1/λ (hours)
MTBF ≈ t / (1 – R(t)), valid when λt ≪ 1 (since R(t) ≈ 1 – λt)
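The two MTBF expressions can be checked numerically. The sketch below assumes the exponential form R(t) = exp(–λt) and a hypothetical failure rate of 1e-4 per hour. It integrates R(t) with a simple trapezoidal rule, truncated at 20/λ, and compares the result with 1/λ and with the estimate t / (1 – R(t)).

```python
import math


def mtbf_by_integration(lam, steps=200_000):
    """Trapezoidal estimate of the integral of exp(-lam*t) from 0 to 20/lam."""
    upper = 20.0 / lam            # the tail beyond this point is negligible
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-lam * upper))
    for i in range(1, steps):
        total += math.exp(-lam * i * h)
    return total * h


def mtbf_from_reliability(t, r):
    """Estimate t / (1 - R(t)); only accurate while lam * t is small."""
    return t / (1.0 - r)


lam = 1e-4                        # hypothetical failure rate, per hour
exact = 1.0 / lam                 # 10000 hours
integrated = mtbf_by_integration(lam)
estimated = mtbf_from_reliability(100.0, math.exp(-lam * 100.0))
```

With λt = 0.01 the estimate lands within about half a percent of 1/λ; the error grows as λt increases.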
RELIABILITY CURVE: Reliability Decreases with Increasing Time
[Figure: reliability R(t), starting at 1.0, plotted against time t]
TYPES of AVAILABILITY
Inherent availability depends on: inherent design (theoretical maximum value)
Operational availability depends on: inherent design, availability of spare parts, maintenance policy
Highly Available Systems ⇒ May Have Frequent
Inoperability Periods of Extremely Short Duration
Availability ⇒ Depends on the Frequency of Inoperability and the Speed of Repair
Availability ⇒ Important Design Goal
A = MTBF / (MTBF + MTTR), with λ = 1/MTBF
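A short worked example of the ratio MTBF / (MTBF + MTTR), using made-up values. The function names are hypothetical, and the conversion to yearly downtime assumes 8760 operating hours per year.

```python
def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail, hours_per_year=8760.0):
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * hours_per_year


a = availability(mtbf_hours=990.0, mttr_hours=10.0)   # 0.99
downtime = annual_downtime_hours(a)                   # about 87.6 hours/year
```

Halving MTTR at a fixed MTBF roughly halves the downtime, which is why quick repair matters as much as infrequent failure.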
Ao = MTBM / (MTBM + MDT)
[Figure: lines of constant availability (A = 85%, 90%, 95%, 99%) on logarithmic axes, with MTBM on the vertical axis and mean down time in hrs (equivalently 1/(mean down time) in 1/hrs) on the horizontal axis]
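The relation Ao = MTBM / (MTBM + MDT) can also be read in reverse: for a given MTBM, it fixes the largest mean down time that still meets a target availability. The sketch below solves for MDT, giving MDT = MTBM (1 – A) / A. The function names and numbers are illustrative assumptions.

```python
def operational_availability(mtbm_hours, mdt_hours):
    """Ao = MTBM / (MTBM + MDT)."""
    return mtbm_hours / (mtbm_hours + mdt_hours)


def max_mean_down_time(mtbm_hours, target):
    """Largest MDT that still achieves `target` availability (0 < target < 1)."""
    return mtbm_hours * (1.0 - target) / target


mdt_limit = max_mean_down_time(mtbm_hours=100.0, target=0.95)   # about 5.26 h
check = operational_availability(100.0, mdt_limit)              # back to 0.95
```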
Maintainability Improves as Time to Repair Decreases
SAFETY: S(t) ⇒ Probability that the system either performs correctly or discontinues its function without compromising the safety of other systems or people
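Series system (components 1, 2, …, N):
Rov = R^N; MTBF = 1 / (Nλ)
Reliability falls to the N-th power of the component reliability; MTBF decreases by a factor of N
Parallel system:
Rov = 1 – Π (1 – Ri); i = 1 to N
Rov = 1 – (1 – R)^N for identical components
MTBF = (1/λ) Σ (1/j); j = 1 to N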
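The series and parallel formulas can be compared side by side. This sketch assumes independent components. The MTBF helpers further assume identical components with a constant failure rate λ and no repair. Function names and numbers are illustrative.

```python
import math


def series_reliability(rs):
    """All components must work: product of the R_i."""
    return math.prod(rs)


def parallel_reliability(rs):
    """At least one component must work: 1 - product of (1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


def series_mtbf(lam, n):
    """MTBF of n identical components in series: 1 / (n * lam)."""
    return 1.0 / (n * lam)


def parallel_mtbf(lam, n):
    """MTBF of n identical components in parallel: (1/lam) * sum of 1/j."""
    return sum(1.0 / j for j in range(1, n + 1)) / lam


r_series = series_reliability([0.9, 0.9, 0.9])       # 0.729
r_parallel = parallel_reliability([0.9, 0.9, 0.9])   # 0.999
m_series = series_mtbf(1e-3, 2)                      # 500 hours
m_parallel = parallel_mtbf(1e-3, 2)                  # 1500 hours
```

The harmonic sum shows the diminishing return of redundancy: the second parallel unit adds half of 1/λ to the MTBF, the third only a third.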
DIFFERENT INTERCONNECTIONS
[Figure: two different interconnections of the four components A, B, C, D]
Always Rps > Rsp
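The inequality can be checked numerically under one reading of the subscripts, which is an assumption here because the diagram does not survive. Rsp is taken to mean two series chains (A-B and C-D) placed in parallel, and Rps to mean two parallel pairs (A with C, B with D) placed in series. All four components are assumed identical and independent, with reliability R.

```python
def r_series_parallel(r):
    """Two series chains of two components each, placed in parallel."""
    chain = r * r
    return 1.0 - (1.0 - chain) ** 2


def r_parallel_series(r):
    """Two parallel pairs of components, placed in series."""
    pair = 1.0 - (1.0 - r) ** 2
    return pair * pair


r_sp = r_series_parallel(0.9)   # 1 - (1 - 0.81)^2 = 0.9639
r_ps = r_parallel_series(0.9)   # (1 - 0.01)^2     = 0.9801
```

Under these assumptions the difference works out to Rps – Rsp = 2R²(1 – R)², which is positive for every R strictly between 0 and 1. Redundancy applied at the component level therefore beats redundancy applied to whole chains.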