
A New Diskless Checkpointing Approach for Multiple Processor Failures

Ge-Ming Chiu, Member, IEEE Computer Society, and Jane-Ferng Chiu

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING

Presented by Linda Maria Pulickal, S7 CSE

4/29/12

INTRODUCTION
Checkpoint: a snapshot of the current application state.

Used to restart execution in case of failure. Very important in large-scale distributed computing.


What is Diskless Checkpointing?

Checkpoints are stored in the main memory (primary storage) of peer processors.

No need for secondary storage, which saves time.

Advantages of the Diskless Approach

No disk-access latency, so less performance degradation.

Usable when stable storage is unavailable, e.g., in mobile computing systems.

Effective at large scale (10,000-100,000 processors).

Diskless checkpointing approaches
Neighbor-based
Each processor saves its checkpoints in their entirety in the memory of peer processors.

Parity-based
Uses a dedicated checkpoint processor to store the parity of the checkpoints taken by all the application processors, computed using XOR operations.

Reed-Solomon coding-based
Encodes the checkpoints of multiple processors using Reed-Solomon erasure coding techniques.

Problem with existing techniques

Extra dedicated processors are needed to store checkpoint data.
Extra processors can be difficult to find, e.g., in mobile computing systems.
Adding processors increases the failure probability.
Memory overhead.

System Model

A collection of n processors (or nodes), P0, P1, P2, ..., Pn-1, interconnected by a (wired or wireless) network.

GOALS
1. A diskless checkpointing scheme that tolerates up to k simultaneous failures.
2. Reduced memory overhead.

Basic Operation of the Proposed Scheme


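Important terms:
1. Checkpoint Storage Nodes (CSi): the processors to which Pi sends its checkpoint.
2. Checkpoint Coverage Nodes (CCi): the processors whose checkpoints Pi combines into its parity.

[Figure: node P5 with its checkpoint coverage set CCi (P1, P2, P3, P4; parity S5) and its checkpoint storage set CSi (parities S7, S8, S9, S10).]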

Steps:

Each Pi sends its checkpoint to at least k other processors (CSi), so that at least one node of CSi remains alive for each failed processor.

Pi also stores a copy of its state in a distinct section of its memory, to help other failed processors decode their previous checkpoints.

Each Pi calculates the parity of the checkpoints from CCi using XOR and stores only the parity result in memory.

Advantage: the parity needs memory space only equal to the size of the largest checkpoint.
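
The parity step can be sketched as follows. This is a minimal illustration, assuming checkpoints are byte strings and that shorter checkpoints are zero-padded to the length of the longest one (the padding rule is an assumption, the slides do not specify it). The function name `xor_parity` and the sample data are hypothetical.

```python
def xor_parity(checkpoints):
    """XOR byte strings together, treating shorter ones as zero-padded."""
    size = max(len(c) for c in checkpoints)
    parity = bytearray(size)  # starts as all zero bytes
    for checkpoint in checkpoints:
        for idx, byte in enumerate(checkpoint):
            parity[idx] ^= byte
    return bytes(parity)

# Hypothetical checkpoints received by one node from its coverage set CCi.
coverage_checkpoints = [b"state-of-P1", b"P2", b"P5-state"]
parity = xor_parity(coverage_checkpoints)

# The stored parity is only as large as the largest checkpoint.
print(len(parity), max(len(c) for c in coverage_checkpoints))
```

Whatever the number of checkpoints in CCi, the node keeps a single buffer whose size matches the largest of them.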


[Figure: the conceptual framework of the diskless checkpointing approach.]

Recovery
P5 wants to recover.

Node P6 is used, with its stored parity S6.

P6 state: S6 = P1 XOR P2 XOR P5

Recovery: P5 = S6 XOR P1 XOR P2
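
The recovery of P5 can be checked with a few lines of code. This is a sketch under the assumption that all checkpoints have the same length; the helper `xor_bytes` and the byte values are hypothetical.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical 4-byte checkpoints of P1, P2, and P5.
p1 = b"\x01\x02\x03\x04"
p2 = b"\x10\x20\x30\x40"
p5 = b"\xaa\xbb\xcc\xdd"

# Parity held by P6: S6 = P1 XOR P2 XOR P5.
s6 = xor_bytes(xor_bytes(p1, p2), p5)

# P5 fails; its checkpoint is rebuilt from S6 and the surviving P1 and P2.
recovered_p5 = xor_bytes(xor_bytes(s6, p1), p2)
print(recovered_p5 == p5)
```

The recovery works because XOR-ing a value twice cancels it, so removing P1 and P2 from S6 leaves only P5.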

DETERMINING THE CHECKPOINT STORAGE NODE SET


Safe Recovery Criterion
For any failed processor Pi, at least one node in CSi is alive and has all of its other checkpoint coverage nodes (those besides Pi) intact.
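
The criterion can be tested mechanically for a given failure set. The sketch below assumes each node's coverage set is known as a set of processor indices; the names `storage_sets` and `safe_to_recover` and the five-node layout are hypothetical, chosen only for illustration.

```python
def storage_sets(coverage):
    """Invert coverage sets: CS[i] is the set of nodes whose parity includes Pi."""
    storage = {i: set() for i in coverage}
    for holder, covered in coverage.items():
        for i in covered:
            storage[i].add(holder)
    return storage

def safe_to_recover(coverage, failed):
    """Check the criterion as stated: every failed Pi needs a live node in
    CSi whose other coverage nodes are all alive."""
    storage = storage_sets(coverage)
    for i in failed:
        usable = any(
            holder not in failed and not ((coverage[holder] - {i}) & failed)
            for holder in storage[i]
        )
        if not usable:
            return False
    return True

# Hypothetical layout: coverage[j] lists the processors folded into Pj's parity.
coverage = {0: {3, 4}, 1: {0, 4}, 2: {0, 1}, 3: {1, 2}, 4: {2, 3}}

print(safe_to_recover(coverage, {0, 3}))
print(safe_to_recover(coverage, {0, 1}))
```

In this layout the failure of P0 and P3 meets the criterion, while the failure of P0 and P1 does not, because P0's only live storage node also covers the failed P1.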


Fundamentals of CSi
The cardinality of CSi must be at least k.

The cardinality of CCi is chosen to ensure good load balance.

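[Figure: P0 holds parity S0 over P3 and P4; P0's own checkpoint is covered by S2 and S4.]

S0 = P3 XOR P4
S1 = P2 XOR P3
S2 = P0 XOR P1
S3 = P2 XOR P4
S4 = P0 XOR P1

Not a good design.

Why? CS0 ∩ CS1 = {P2, P4}, which has more than 1 element. If P0 and P1 both fail, S2 and S4 are each missing two checkpoints, so neither P0 nor P1 can be recovered.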

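[Figure: P0 holds parity S0 over P3 and P4; P0's own checkpoint is covered by S1 and S2.]

S0 = P3 XOR P4
S1 = P0 XOR P4
S2 = P0 XOR P1
S3 = P1 XOR P2
S4 = P2 XOR P3

Not a good design.

Why? P1 ∈ CS0, yet CS0 ∩ CS1 = {P2}. If P0 and P1 both fail, P0's storage node P1 is lost and P2's parity S2 also covers the failed P1, so no node in CS0 meets the safe recovery criterion.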

Theorems
(1) For all Pi and Pr with i ≠ r: |CSi ∩ CSr| ≤ 1.

(2) For each Pi: CSi ∩ CSr = ∅ for any Pr ∈ CSi.
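
Both conditions are easy to check in code. The sketch below applies them to the storage sets of the two five-processor example designs shown earlier; the sets were read off the parity equations by hand, and the function names are hypothetical.

```python
from itertools import combinations

def satisfies_theorem_1(storage):
    """True if any two distinct storage sets share at most one node."""
    return all(len(storage[i] & storage[r]) <= 1
               for i, r in combinations(storage, 2))

def satisfies_theorem_2(storage):
    """True if CSi and CSr are disjoint whenever Pr is in CSi."""
    return all(not (storage[i] & storage[r])
               for i in storage for r in storage[i])

# First example design: S2 and S4 both equal P0 XOR P1.
design_a = {0: {2, 4}, 1: {2, 4}, 2: {1, 3}, 3: {0, 1}, 4: {0, 3}}
# Second example design: each parity covers two neighbouring processors.
design_b = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {0, 4}, 4: {0, 1}}

print(satisfies_theorem_1(design_a))  # condition (1) on the first design
print(satisfies_theorem_1(design_b))  # condition (1) on the second design
print(satisfies_theorem_2(design_b))  # condition (2) on the second design
```

The first design breaks condition (1), and the second passes (1) but breaks condition (2), which matches the reasons given on the two example slides.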


Design of CSi's

Cyclic design concept: every CSi is derived from CS0.

So we only need to focus on the design of CS0.
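
A minimal sketch of the cyclic idea, assuming that "cyclic" means shifting every index in CS0 by i modulo n (the exact formula is not in the slide text, so this rule is an assumption). The function names are hypothetical; the CS0 value and n = 20 are taken from the k = 4 example in this deck.

```python
def cyclic_storage_sets(cs0, n):
    """Assumed rule: CSi = { (j + i) mod n : j in CS0 } for i = 0..n-1."""
    return {i: {(j + i) % n for j in cs0} for i in range(n)}

def coverage_sets(storage):
    """Invert storage sets: CC[h] is the set of processors that send to Ph."""
    coverage = {i: set() for i in storage}
    for i, holders in storage.items():
        for h in holders:
            coverage[h].add(i)
    return coverage

storage = cyclic_storage_sets({7, 8, 11, 13}, 20)
coverage = coverage_sets(storage)

print(sorted(storage[1]))
print({len(c) for c in coverage.values()})
```

Under this assumed rule every node ends up covering the same number of processors, which is the load-balance property asked of CCi.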


Design of CS0
PSR Sequence:
d0, d1, d2, ..., dr-1 is a PSR sequence if no l, m, p, and q with 0 ≤ l ≤ m < p ≤ q ≤ r-1 satisfy,

Steps for k =4:

1. Construct a PSR sequence of 3 (i.e., k - 1) positive integers.
2. Select the sequence with the minimum sum, D. E.g., d0 = 1, d1 = 3, d2 = 2, and D = 6.
3. The first element of CS0 is P(D+1) = P7.
4. Add d0, d1, d2 as successive increments starting from P7.
5. CS0 = {P7, P8, P11, P13}
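
The construction of CS0 from a given increment sequence can be sketched as below. The function name `build_cs0` is hypothetical, and the code takes the PSR sequence as given rather than searching for one. It needs Python 3.8 or later for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate

def build_cs0(increments):
    """Start at index D + 1, where D is the sum of the increments,
    then add the increments cumulatively."""
    first = sum(increments) + 1
    return list(accumulate(increments, initial=first))

# Increments from the k = 4 example: d0 = 1, d1 = 3, d2 = 2, so D = 6.
print(build_cs0([1, 3, 2]))
```

For the example sequence this yields the indices 7, 8, 11, and 13, that is, CS0 = {P7, P8, P11, P13}.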

Requirements

The total number of processors in the system must be at least 3D + 2.

This ensures that Theorem 2 holds.
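
The effect of the size requirement can be seen on the k = 4 example, where D = 6 and 3D + 2 = 20. The sketch below assumes the cyclic rule CSi = { (j + i) mod n : j in CS0 }, which is an assumption about the design rather than a formula quoted from the slides; the function name is hypothetical.

```python
def theorem_2_holds(cs0, n):
    """Check that CSi and CSr are disjoint whenever Pr is in CSi,
    with every CSi obtained from CS0 by an assumed shift modulo n."""
    storage = {i: {(j + i) % n for j in cs0} for i in range(n)}
    return all(not (storage[i] & storage[r])
               for i in storage for r in storage[i])

cs0 = [7, 8, 11, 13]  # from the k = 4 example, D = 6

print(theorem_2_holds(cs0, 20))  # n = 3D + 2
print(theorem_2_holds(cs0, 19))  # one processor fewer
```

With 20 processors the condition holds, while with 19 it fails: P13 is in CS0, and the shifted set CS13 wraps around to contain P7, which is also in CS0.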


Performance Analysis


?

Thank You.
