A New Diskless Checkpointing Approach for Multiple Processor Failures Ge-Ming Chiu, Member, IEEE Computer
INTRODUCTION
Checkpoint: a snapshot of the current application state.
Used to restart execution in case of failure. Very important in large-scale distributed computing.
4/29/12
Diskless checkpointing
Neighbor-based
Each processor saves its checkpoints in their entirety in the memory of peer processors.
Parity-based
Uses a dedicated checkpoint processor to store the parity of the checkpoints taken by all the application processors, computed with XOR operations.
Reed-Solomon coding-based
Encodes the checkpoints of multiple processors using Reed-Solomon erasure coding techniques.
Extra dedicated processors are needed for storing checkpoint data.
Extra processors can be hard to find, e.g., in mobile computing systems.
Adding processors increases the failure probability.
Memory overhead.
System Model
... ,Pn-1,
GOALS
1. Diskless checkpointing scheme to tolerate up to k simultaneous failures.
2. Reduce memory overhead.
Important terms:
1. 2.
(Figure: P5, S5, S7, S8, S9)
Steps:
Each Pi sends its checkpoint to at least k other processors (CSi), so that at least one member of CSi remains alive for each failed processor.
Pi also stores a copy of its state in a distinct section of its memory, to help other failed processors decode their previous checkpoints.
Each Pi calculates the parity of the checkpoints from CCi using XOR and stores only the parity result in memory.
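The parity step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes every checkpoint is a byte string of the same length (for example, padded to a common size), and the names `xor_bytes` and `parity` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length checkpoints (equal length is an assumption)."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(checkpoints):
    """XOR all received checkpoints into one parity block."""
    return reduce(xor_bytes, checkpoints)


# Hypothetical checkpoints received by one processor from the members of its CCi.
received = [b"\x0f\xf0", b"\x01\x10", b"\xaa\x55"]
stored_parity = parity(received)  # only this block is kept in memory
```

The stored parity has the size of a single checkpoint no matter how many checkpoints were received, which is where the memory saving comes from.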
Recovery
P5 wants to recover.
Node P6 is used.
P6 state: S6 = P1 + P2 + P5 (where + denotes XOR)
P1 and P2 supply their checkpoints; P6 supplies S6.
P5 = S6 + P1 + P2
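The recovery of P5 can be sketched as follows. This is a minimal illustration that assumes equal-length byte-string checkpoints; the byte values and the helper names `xor_all` and `recover` are hypothetical.

```python
def xor_all(blocks):
    """XOR a list of equal-length byte strings together."""
    result = bytes(len(blocks[0]))
    for block in blocks:
        result = bytes(x ^ y for x, y in zip(result, block))
    return result


def recover(parity_block, surviving_checkpoints):
    """Recover the one missing checkpoint covered by a parity block."""
    return xor_all([parity_block, *surviving_checkpoints])


# Hypothetical checkpoint contents for P1, P2, and P5.
p1, p2, p5 = b"\x11\x22", b"\x0a\x0b", b"\xc3\x5a"

s6 = xor_all([p1, p2, p5])       # parity held by P6: S6 = P1 + P2 + P5
restored = recover(s6, [p1, p2])  # P5 = S6 + P1 + P2
```

This works because XOR is its own inverse: XORing S6 with P1 and P2 cancels their contributions and leaves P5.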
Fundamentals of CSi
the
k.
the
to ensure good
(Figure: P3 and P4 contribute to S0; P0 contributes to S2 and S4.)
S0 = P3 + P4
S1 = P2 + P3
S2 = P0 + P1
S3 = P2 + P4
S4 = P0 + P1
(Figure: P3 and P4 contribute to S0; P0 contributes to S1 and S2.)
S0 = P3 + P4
S1 = P0 + P4
S2 = P0 + P1
S3 = P1 + P2
S4 = P2 + P3
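One way to compare the two parity assignments above is to test, for every failure pattern, whether the surviving parities still determine the failed checkpoints. The sketch below does this with a rank check over GF(2). That check is an illustration chosen here, not the criterion stated in the slides. It assumes that parity Si is held by Pi and is lost when Pi fails, that surviving processors still have their own checkpoints, and that k = 2. All function names are hypothetical.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def recoverable(parity_sets, failed):
    """True if the surviving parities determine every failed checkpoint."""
    failed = sorted(failed)
    column = {p: 1 << i for i, p in enumerate(failed)}
    rows = []
    for holder, members in parity_sets.items():
        if holder in failed:
            continue  # parity stored on a failed processor is lost
        mask = 0
        for p in members:
            mask |= column.get(p, 0)  # keep only the unknown (failed) terms
        rows.append(mask)
    return gf2_rank(rows) == len(failed)


def unrecoverable_patterns(parity_sets, k):
    """All failure sets of size 1..k that cannot be decoded."""
    procs = sorted(parity_sets)
    return [f for size in range(1, k + 1)
            for f in combinations(procs, size)
            if not recoverable(parity_sets, f)]


# parity_sets[i] lists the processors whose checkpoints are XORed into Si.
first = {0: {3, 4}, 1: {2, 3}, 2: {0, 1}, 3: {2, 4}, 4: {0, 1}}
second = {0: {3, 4}, 1: {0, 4}, 2: {0, 1}, 3: {1, 2}, 4: {2, 3}}

bad_first = unrecoverable_patterns(first, k=2)
bad_second = unrecoverable_patterns(second, k=2)
```

Under these assumptions the first assignment cannot recover from the simultaneous failure of P0 and P1, because both parities that cover them (S2 and S4) are the same combination P0 + P1. The second assignment has no unrecoverable pattern of up to two failures.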
How? P1 CS0 ;
CS0
Theorems
For all Pi and Pr,
Design of the CSi sets
Design of CS0
PSR Sequence:
d0, d1, d2, ..., dr-1 is a PSR sequence if no l, m, p, and q with 0 ≤ l ≤ m < p ≤ q ≤ r-1 satisfy,
Construct a PSR sequence of 3 (i.e., k - 1) positive integers.
Select the sequence with the minimum sum, D.
E.g.: d0 = 1, d1 = 3, d2 = 2, and D = 6.
First element of CS0: P(D+1) = P7.
Add d0, d1, d2 as successive increments to P7.
CS0 = {P7, P8, P11, P13}
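The construction of CS0 from the increments can be sketched as below. `build_cs0` follows the steps exactly as listed. `is_psr` rests on an assumption: the slide's PSR condition is cut off, so the sketch assumes it forbids two disjoint runs of consecutive elements from having equal sums, which is consistent with the example 1, 3, 2 but is not confirmed by the source. Both function names are hypothetical.

```python
def is_psr(d):
    """ASSUMED PSR test: no two disjoint consecutive runs of d have equal sums."""
    r = len(d)
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def run_sum(a, b):
        # Sum of d[a..b], inclusive.
        return prefix[b + 1] - prefix[a]

    for l in range(r):
        for m in range(l, r):
            for p in range(m + 1, r):
                for q in range(p, r):
                    if run_sum(l, m) == run_sum(p, q):
                        return False
    return True


def build_cs0(d):
    """Processor indices in CS0: start at D + 1, then add the increments in turn."""
    total = sum(d)           # D
    members = [total + 1]    # first element P(D+1)
    for inc in d:
        members.append(members[-1] + inc)
    return members


cs0 = build_cs0([1, 3, 2])  # indices of the processors in CS0
```

For d = 1, 3, 2 this gives the indices 7, 8, 11, 13, matching CS0 = {P7, P8, P11, P13}. Under the assumed condition, the ordering 1, 2, 3 would be rejected because 1 + 2 equals 3.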
Requirements
Ensure Theorem 2.
Performance Analysis
?
Thank You.