Classification Based on the Nature of
H/W Faults
• Art of War: to win a battle, you need to
understand your enemy (and the
environment).
• For F/T: to combat computer faults, we
need to understand the nature of faults.

chp04 1
Temporal Nature of H/W Faults
(cont)
• Fault duration (temporal)
– Duration of causes and effects
– Categories
• Permanent faults
• Intermittent faults
• Transient faults
– Most common: intermittent and transient faults

chp04 2
Permanent Faults
• The cause remains indefinitely unless it is repaired
– h/w permanently damaged due to:
• wear and tear
• design mistakes
• manufacturing defects, etc.
• Capability/Risk of inducing an error is
always there
• Example: stuck-at fault in memory
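The stuck-at example can be illustrated with a minimal sketch. The class `StuckAtMemory` and its parameters are hypothetical names chosen for this illustration, and the model assumes a byte-wide memory with a single damaged bit.

```python
class StuckAtMemory:
    """Toy byte-wide memory in which one bit of one cell is stuck at 0 or 1."""

    def __init__(self, size, fault_addr, fault_bit, stuck_value):
        self.cells = [0] * size
        self.fault_addr = fault_addr    # address of the damaged cell
        self.fault_bit = fault_bit      # bit position that is stuck
        self.stuck_value = stuck_value  # 0 or 1

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.fault_addr:
            mask = 1 << self.fault_bit
            # The damaged bit always reads as stuck_value, whatever was written.
            value = (value | mask) if self.stuck_value else (value & ~mask)
        return value


mem = StuckAtMemory(size=4, fault_addr=2, fault_bit=0, stuck_value=1)
mem.write(1, 0b100)       # healthy cell
mem.write(2, 0b100)       # damaged cell, bit 0 should be 0
print(bin(mem.read(1)))   # 0b100
print(bin(mem.read(2)))   # 0b101: the permanent fault produced an error
```

Rewriting the damaged cell does not help: the fault stays until the h/w is repaired or replaced.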
chp04 3
Intermittent Faults
• The cause will not disappear
• environmental stress
• design deficiencies
• partially defective components due to aging
• h/w fatigue
• heat sensitivity
• voltage threshold
• The capability of inducing an error may not always
be present
• depending on its state: benign or active
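The benign/active distinction can be sketched with a heat-sensitive component. The class `HeatSensitiveGate` and the 85 °C threshold are assumptions made for illustration only; the point is that the cause (the heat sensitivity) never goes away, while the fault is active only under certain conditions.

```python
class HeatSensitiveGate:
    """Toy AND gate whose output is forced to 0 while it is too hot."""

    def __init__(self, threshold_c=85.0):
        self.threshold_c = threshold_c   # hypothetical sensitivity threshold
        self.temperature_c = 25.0

    @property
    def fault_active(self):
        # Benign below the threshold, active above it.
        return self.temperature_c > self.threshold_c

    def output(self, a, b):
        if self.fault_active:
            return 0          # wrong result whenever a and b are both 1
        return a & b


gate = HeatSensitiveGate()
readings = []
for temp in (25.0, 90.0, 25.0, 90.0):
    gate.temperature_c = temp
    readings.append(gate.output(1, 1))
print(readings)   # [1, 0, 1, 0]
```

The error comes and goes with the temperature, but the same defect is responsible each time.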
chp04 4
Transient Faults
• The cause exists for some period of time and
then disappears, even without repair
• temporary environmental, electrical, or mechanical
conditions
– power jitter
– ionisation due to cosmic rays or alpha particles
– electro-magnetic interference
– solar wind/flares

• The capability of inducing an error is transient


• No direct h/w damage; h/w still usable
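A transient fault can be sketched as a one-off bit flip in otherwise healthy memory. The helper `single_event_upset` is a hypothetical name, and the flip is triggered by hand here rather than by a real physical event.

```python
def single_event_upset(cells, addr, bit):
    """Flip one stored bit once, as a particle strike might."""
    cells[addr] ^= 1 << bit


golden = [0b10101010] * 4      # reference copy of the intended contents
memory = list(golden)

single_event_upset(memory, addr=1, bit=3)
corrupted = memory[1] != golden[1]
print(corrupted)               # True: the stored value is now wrong

memory[1] = golden[1]          # simply rewriting the cell removes the error
restored = memory == golden
print(restored)                # True: the h/w itself was never damaged
```

Unlike the permanent case, a rewrite is enough here because the cause has already disappeared.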
chp04 5
Latent Faults
• A latent fault: an existing fault that is
lurking in a system and has yet to be activated
to produce an error
– most common: permanent h/w faults (and
software faults)
• A latent error: a system error that has neither
been detected nor resulted in a system failure
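A small sketch of when a latent fault becomes activated, assuming a memory cell whose bit 0 is stuck at 0. `STUCK_BIT`, `store`, and `fault_activated` are hypothetical names for this illustration.

```python
STUCK_BIT = 0x01   # assumption: bit 0 of the cell is permanently stuck at 0


def store(value):
    """Return what the faulty cell actually holds after a write."""
    return value & ~STUCK_BIT & 0xFF


def fault_activated(value):
    """The fault produces an error only if the stored value differs."""
    return store(value) != value


print(fault_activated(0b0110))   # False: bit 0 was 0 anyway, fault stays latent
print(fault_activated(0b0111))   # True: fault activated, cell now holds an error
```

In the second case the wrong value sits in the cell as a latent error until something reads it and either detects the mismatch or fails because of it.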

chp04 6
Having understood the concept
and nature of computer faults, we
ask, “How could we implement a
fault-tolerance approach to
combat computer faults?” …next
slide.

chp04 7
Computing Techniques to
Implement a Fault-Tolerance
Approach
What are the techniques used to achieve
computer fault tolerance? Answer:
– redundancy technique (the core requirement)
– fault/error detection technique
– fault masking technique
– fault confinement/isolation technique
– fault/error diagnosis technique
– reconfiguration technique
– system recovery technique
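Several of the techniques listed above can be seen together in a triple modular redundancy (TMR) voter, a standard textbook scheme that is used here as an illustration and is not taken from these slides. The function names and the faulty replica are hypothetical. Three replicas provide the redundancy, the majority vote masks a single wrong output, and the list of disagreeing replicas supports detection and diagnosis.

```python
from collections import Counter


def tmr_vote(outputs):
    """Return (majority_value, indices_of_disagreeing_modules)."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # All three replicas disagree: the fault cannot be masked.
        raise RuntimeError("no majority among replicas")
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects


def good(x):
    return x * x


def faulty(x):
    return x * x + 1   # hypothetical replica with a fault


modules = [good, faulty, good]
value, suspects = tmr_vote([m(7) for m in modules])
print(value, suspects)   # 49 [1]
```

A reconfiguration step could then take replica 1 out of service; that part is left out of this sketch.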

chp05_6_7 8
We first look at the core requirement,
Redundancy, in the next lecture.

What is redundancy?

How many types of redundancy are there, and what are they?

chp04 9
