Bathtubs
● Failure rate of hardware is considered to follow a “bathtub” distribution
[Figure: failure rate over time, with three phases: “Infant mortality”, “Normal life” (“useful life”), “Wear-out”]

Improving reliability of a system
● Define fault model describing what could go wrong
  ● E.g., bit-flips in memory, spurious interrupts, noisy sensor data
● Assess fault tolerance of system
  ● Can a fault lead to a failure? How critical?
  ● Testing, fault injection
● Measures to improve fault tolerance
  ● E.g., error correction codes, redundancy
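
The three steps for improving reliability (fault model, assessment by fault injection, countermeasure) can be illustrated with a small sketch. It is not taken from the slides: the fault model is assumed to be a single bit-flip in one stored word, the countermeasure is triple modular redundancy with a bitwise majority voter, and all names (`majority_vote`, `inject_bit_flip`, `run_campaign`) are hypothetical.

```python
import random


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def inject_bit_flip(word: int, bit: int) -> int:
    """Assumed fault model: exactly one bit of the word is flipped."""
    return word ^ (1 << bit)


def run_campaign(trials: int = 1000, width: int = 8, seed: int = 0) -> dict:
    """Inject one bit-flip per trial; count failures with and without redundancy."""
    rng = random.Random(seed)
    failures = {"unprotected": 0, "tmr": 0}
    for _ in range(trials):
        value = rng.randrange(1 << width)
        bit = rng.randrange(width)

        # Unprotected: the only stored copy is corrupted.
        if inject_bit_flip(value, bit) != value:
            failures["unprotected"] += 1

        # Redundant: one of three copies is corrupted, then the voter decides.
        copies = [value, value, value]
        victim = rng.randrange(3)
        copies[victim] = inject_bit_flip(copies[victim], bit)
        if majority_vote(*copies) != value:
            failures["tmr"] += 1
    return failures


if __name__ == "__main__":
    print(run_campaign())
```

Under this fault model every injected fault becomes a failure in the unprotected case, while the voter masks all of them. The result depends entirely on the fault model: two flips of the same bit in different copies would defeat the voter, which is why the fault model has to be defined first.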
Properties of recovery blocks (2)
● Of course, introduces new problems
  ● Side-effects of alternates can be hard to undo
  ● Some computations might be impossible to repeat
  ● Checkpointing can be expensive
  ● Maybe no time for repetition
● Concept is related to transactional memory

Watchdogs
● One of the most important techniques
● Watchdog is a component that monitors the system
● If system does not react any more, watchdog restarts it
● Should be as independent as possible from system
● But: most micro-controllers have built-in watchdogs (often with independent clock)
Watchdogs in general