Distributed systems
Takashi Nanya
Canon, Inc.
Purpose: Fault tolerance, Performance enhancement, Extensibility, Resource sharing.
All information describing the global system state (process states and data) must be maintained so that all participating processes have a consistent and identical view of the global state.
Issues: Clock synchronization, Mutual exclusion, Concurrency control, Multiple copy update, Error recovery.
Clock Synchronization
Distributed systems are asynchronous in nature: computation and communication incur variable delays, so processes may be inconsistent in recognizing the temporal order in which events occurred.
[Figure: space-time diagram of processes P1, P2 and P3 exchanging messages, with events including B, D, E, F and G]
A (logical) clock Ci for each process Pi is a function that assigns a number Ci(a) to every event a in that process. Here a->b means that event a happened before event b.
Clock condition (correctness): for any events a and b, if a->b then C(a) < C(b).
The clock condition is satisfied if the following two conditions hold:
C1: If a and b are events in process Pi and a comes before b, then Ci(a) < Ci(b).
C2: If a is the sending of a message by process Pi and b is the receipt of the same message by process Pj, then Ci(a) < Cj(b).
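Conditions C1 and C2 can be met with a simple counter per process. The following is a minimal Python sketch, assuming each process increments its counter by one per event (the step size is an assumption; any positive increment works). The class `LamportClock` and its method names are hypothetical.

```python
class LamportClock:
    """Logical clock Ci for one process Pi (hypothetical helper)."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        # C1: every later event in the same process gets a larger number
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; its clock value travels with the message
        return self.local_event()

    def receive(self, timestamp: int) -> int:
        # C2: the receipt must be numbered higher than the send event
        self.time = max(self.time, timestamp) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
a = p1.send()       # a: P1 sends a message
b = p2.receive(a)   # b: P2 receives it
print(a, b)         # 1 2
```

Taking the maximum on receipt is what guarantees C2 even when the receiver's counter lags behind the sender's.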
Anomalous behavior
A logical clock based on the relation => may cause anomalous behavior.
[Figure: event a precedes event b through a phone call made outside the system]
This can happen because the system has no way of knowing the actual precedence a->b, which is established by a phone call external to the system. We therefore need a system of physical clocks.
Physical clock
Ci(t) denotes the reading of clock Ci at physical time t. For Ci to be a true physical clock, the following must be satisfied:
PC1: There exists a constant $\kappa \ll 1$ such that for all i, $\left| dC_i(t)/dt - 1 \right| < \kappa$.
PC2: There exists a constant $\varepsilon$ such that for all i, j, $\left| C_i(t) - C_j(t) \right| < \varepsilon$.
To prevent anomalous behavior, let $\mu$ be a number less than the shortest transmission time for interprocess messages. It must then be ensured that, for any i, j, $C_i(t+\mu) - C_j(t) > 0$.
Combining this with PC1 gives $C_i(t+\mu) - C_i(t) > (1-\kappa)\mu$. Since $C_i(t+\mu) - C_j(t) = [C_i(t+\mu) - C_i(t)] + [C_i(t) - C_j(t)] > (1-\kappa)\mu - \varepsilon$ by PC2, the condition $C_i(t+\mu) - C_j(t) > 0$ holds if $\varepsilon/(1-\kappa) \le \mu$.
Let m be a message sent at physical time t and received at time t', and let the minimum transmission delay $\mu_m$ of m be known to the process that receives m. Assuming PC1, PC2 can be ensured by the following implementation rules:
IR1: For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and $dC_i(t)/dt > 0$.
IR2: (a) If Pi sends a message m at physical time t, then m contains a timestamp $T_m = C_i(t)$. (b) Upon receiving a message m at time t', process Pj sets $C_j(t')$ equal to $\max\{C_j(t'-0),\ T_m + \mu_m\}$.
To synchronize physical clocks, a process only needs to know its own clock reading and the timestamps of the messages it receives.
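Rules IR1 and IR2 can be simulated in a few lines. This Python sketch assumes each clock runs at a constant positive rate between messages (one simple way to satisfy IR1); the class `PhysicalClock` and the parameter names `rate`, `start` and `mu_m` are hypothetical.

```python
class PhysicalClock:
    """Simulated physical clock obeying IR1 and IR2 (hypothetical helper)."""

    def __init__(self, rate: float, start: float = 0.0) -> None:
        if rate <= 0:
            raise ValueError("IR1 requires dC/dt > 0")
        self.rate = rate      # dC/dt; PC1 asks for |rate - 1| < kappa
        self.base_t = 0.0     # physical time of the last adjustment
        self.base_c = start   # clock reading at base_t

    def read(self, t: float) -> float:
        # IR1: between messages the clock advances continuously at a positive rate
        return self.base_c + self.rate * (t - self.base_t)

    def send(self, t: float) -> float:
        # IR2(a): the message carries the timestamp Tm = Ci(t)
        return self.read(t)

    def receive(self, t: float, tm: float, mu_m: float) -> float:
        # IR2(b): Cj(t') := max(Cj(t' - 0), Tm + mu_m); the clock only jumps forward
        new_c = max(self.read(t), tm + mu_m)
        self.base_t, self.base_c = t, new_c
        return new_c


a = PhysicalClock(rate=1.0, start=100.0)   # clock running ahead
b = PhysicalClock(rate=1.0, start=0.0)
tm = a.send(10.0)                          # Tm = Ca(10) = 110.0
print(b.receive(12.0, tm, mu_m=1.0))       # 111.0: b jumps forward to Tm + mu_m
```

A clock that is already ahead of $T_m + \mu_m$ keeps its reading, so no clock is ever set backward.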
The Byzantine Generals Problem is the problem of coping with a situation in which one or more faulty components of a system send conflicting information to different parts of the system.
A group of generals of the Byzantine army are camped with their troops around an enemy city. Communicating with one another only by messenger, the generals must agree upon a common battle plan. However, some of the generals may be traitors trying to prevent the loyal generals from reaching agreement.
[Byzantine Generals Problem]: A commanding general must send an order to his n-1 lieutenant generals such that
IC1: All loyal lieutenants obey the same order.
IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
[Figure: three-general case (two panels), in each of which a lieutenant relays "he said retreat"]
Algorithm OM(1)
[Figure, two panels: (1) the commander sends v to Lieutenants 1, 2 and 3, with one traitor present; (2) a traitorous commander sends x, y and z to Lieutenants 1, 2 and 3, who relay the values they received to one another]
All lieutenants obtain the same value majority(x, y, z).
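The two OM(1) scenarios can be checked with a small simulation. This Python sketch assumes every lieutenant takes the majority of the value received from the commander and the values relayed by the other lieutenants, falling back to a default order RETREAT when no strict majority exists; the functions `om1`, `majority`, `honest` and `traitor_3` are hypothetical names.

```python
from collections import Counter

RETREAT = "retreat"


def majority(values):
    # Majority value if one exists, otherwise the default order RETREAT
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else RETREAT


def om1(commander_orders, relay):
    """OM(1) with three or more lieutenants (hypothetical sketch).

    commander_orders: lieutenant id -> value the commander sent to that lieutenant.
    relay(j, i, v): value lieutenant j passes to lieutenant i after receiving v;
    a loyal lieutenant returns v, a traitor may return anything.
    """
    lieutenants = sorted(commander_orders)
    decisions = {}
    for i in lieutenants:
        received = [commander_orders[i]]
        received += [relay(j, i, commander_orders[j]) for j in lieutenants if j != i]
        decisions[i] = majority(received)
    return decisions


def honest(j, i, v):
    return v


def traitor_3(j, i, v):
    # Lieutenant 3 lies to everyone
    return RETREAT if j == 3 else v


# Loyal commander, traitorous Lieutenant 3: loyal Lieutenants 1 and 2 obey "attack"
print(om1({1: "attack", 2: "attack", 3: "attack"}, traitor_3))
# Traitorous commander, loyal lieutenants: all decide majority(x, y, z)
print(om1({1: "attack", 2: "retreat", 3: "attack"}, honest))
```

In the second case each loyal lieutenant ends up with the same multiset {x, y, z}, which is why they all agree.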
Signed message
If the traitors' ability to lie can be restricted, an algorithm exists that copes with m traitors for any number (≥ m+2) of generals.
A4 (additional assumption):
(a) A loyal general's signature cannot be forged, and any alteration of the contents of his signed messages can be detected.
(b) Anyone can verify the authenticity of a general's signature.
(No assumption is made about a traitorous general's signature. His signature may be forged by another traitor, thereby permitting collusion among the traitors.)
The commander sends a signed order to each of his lieutenants. Each lieutenant then adds his signature to that order and sends it to the other lieutenants, who add their signatures and send it to others, and so on.
Let x:i denote the value x signed by General i. Thus, x:i,j denotes the value x signed by i, and then that value x:i signed by j.
Let General 0 be the commander. Each lieutenant i maintains a set Vi of properly signed orders he has received so far.
Lieutenants 1 and 2 obey the order choice({attack, retreat}) and know that the commander is a traitor, because his signature appears on two different orders.
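The signed-message scheme and the example above can be sketched as follows. This Python simulation assumes all lieutenants are loyal (only the commander may be a traitor), m ≥ 1, and a `choice` function that returns the single order when Vi has one element and RETREAT otherwise; signatures are modeled as a tuple of signer ids and are unforgeable by construction (A4). The names `sm` and `choice` are hypothetical.

```python
from collections import deque

RETREAT = "retreat"


def choice(values):
    # Assumed deterministic choice: the single order if Vi has exactly one
    # element, otherwise the default order RETREAT
    return next(iter(values)) if len(values) == 1 else RETREAT


def sm(commander_orders, m):
    """Signed-message algorithm with loyal lieutenants (hypothetical sketch).

    commander_orders: lieutenant id -> order signed by General 0 (the commander).
    A message is (value, chain of signers); signatures are assumed unforgeable (A4).
    Returns (decisions, V) where V[i] is lieutenant i's set of received orders.
    """
    lieutenants = sorted(commander_orders)
    V = {i: set() for i in lieutenants}
    queue = deque((i, v, (0,)) for i, v in commander_orders.items())
    while queue:
        i, v, chain = queue.popleft()
        if v in V[i]:
            continue                    # order already known; nothing to relay
        V[i].add(v)
        if len(chain) - 1 < m:          # fewer than m lieutenant signatures so far
            signed = chain + (i,)       # lieutenant i adds his signature
            for j in lieutenants:
                if j not in signed:
                    queue.append((j, v, signed))
    return {i: choice(V[i]) for i in lieutenants}, V


# Traitorous commander: "attack" to Lieutenant 1, "retreat" to Lieutenant 2
decisions, V = sm({1: "attack", 2: "retreat"}, m=1)
print(decisions)  # {1: 'retreat', 2: 'retreat'}
```

Both lieutenants end with Vi = {attack, retreat}, so any deterministic `choice` makes them agree.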
Concurrency Control
Even if mutual exclusion is realized at the level of the basic actions of processes, an inconsistent state may appear when two or more processes try to access the same database.
Example: two clients A and B may wish to send $10 and $20, respectively, to a common account independently of one another.
Process A: A1: READ BALANCE, ADD $10; A2: WRITE BALANCE
Process B: B1: READ BALANCE, ADD $20; B2: WRITE BALANCE
Making READ and WRITE individually atomic is not enough! What happens if the READ and WRITE commands are executed in the order A1, B1, A2, B2? Both processes read the same old balance, and B's write overwrites A's, so the $10 deposit is lost.
Transaction: a sequence of READ and WRITE commands sent by a client to the file system.
Concurrency control (serializability control): executing multiple transactions that occur simultaneously as serializable atomic actions.
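The lost update can be reproduced by replaying the steps in a given order. This Python sketch assumes an initial balance of 100 (an arbitrary value); the function `run` and the step encoding "A1", "A2", "B1", "B2" mirror the example but are hypothetical.

```python
AMOUNTS = {"A": 10, "B": 20}  # deposits from the example


def run(schedule, balance=100):
    """Execute steps such as "A1"/"A2" in the given order; return the final BALANCE.

    The initial balance of 100 is an arbitrary assumption.
    """
    local = {}  # value each process computed after READ BALANCE, ADD amount
    for step in schedule:
        proc, phase = step[0], step[1]
        if phase == "1":        # READ BALANCE, ADD amount
            local[proc] = balance + AMOUNTS[proc]
        else:                   # WRITE BALANCE
            balance = local[proc]
    return balance


print(run(["A1", "A2", "B1", "B2"]))  # 130: serial, both deposits applied
print(run(["A1", "B1", "A2", "B2"]))  # 120: A's $10 deposit is lost
```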
Serializability
T1: READ Y; Y=Y-20; WRITE Y; READ Z; Z=Z+20; WRITE Z
T2: READ X; X=X-10; WRITE X; READ Y; Y=Y+10; WRITE Y
Serial Execution: T1 and T2 run one after the other.
Serializable Execution (T3):
  T1          T2
  READ Y      READ X
  Y=Y-20      X=X-10
  WRITE Y     WRITE X
  READ Z      READ Y
  Z=Z+20      Y=Y+10
  WRITE Z     WRITE Y
[Figure: Non-serializable Execution (T4 to T6)]
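One standard way to test whether an interleaving is serializable (not stated on the slide) is the precedence-graph test for conflict serializability: add an edge Ti -> Tj whenever a step of Ti conflicts with a later step of Tj on the same item (at least one of them a WRITE), and check that the graph is acyclic. A minimal Python sketch; the function name `is_conflict_serializable` and the (transaction, op, item) encoding are hypothetical.

```python
from itertools import combinations


def is_conflict_serializable(schedule):
    """schedule: (transaction, op, item) steps in execution order, op in {"R", "W"}.

    Builds the precedence graph (edge Ti -> Tj when a step of Ti conflicts with
    a later step of Tj) and reports whether it is acyclic.
    """
    edges = {t: set() for t, _, _ in schedule}
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        # combinations preserves order, so the first step ran earlier
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges[t1].add(t2)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in edges}

    def has_cycle(t):
        color[t] = GREY
        for u in edges[t]:
            if color[u] == GREY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in edges)


# Serializable execution T3, keeping only READ/WRITE steps (local computations do not conflict)
t3 = [("T1", "R", "Y"), ("T2", "R", "X"), ("T1", "W", "Y"), ("T2", "W", "X"),
      ("T1", "R", "Z"), ("T2", "R", "Y"), ("T1", "W", "Z"), ("T2", "W", "Y")]
# Interleaving A1, B1, A2, B2 from the bank-account example
lost_update = [("A", "R", "BAL"), ("B", "R", "BAL"), ("A", "W", "BAL"), ("B", "W", "BAL")]
print(is_conflict_serializable(t3))           # True (equivalent to T1 then T2)
print(is_conflict_serializable(lost_update))  # False
```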
2-phase locking
A lock is an access privilege on a data item that is granted to a particular transaction so that only one transaction can access the data item at a time. When a transaction tries to access a data item, it must lock the item before accessing it and unlock it on finishing the access.
For 2-phase locking to guarantee consistency, each transaction:
(1) does not lock a data item that has already been locked,
(2) locks a data item before accessing it,
(3) unlocks all the data items before finishing the transaction,
(4) once it has unlocked a data item, does not acquire any more locks.
Each transaction is thus divided into two phases, a growing phase and a shrinking phase: the number of locked items increases monotonically in the growing phase and decreases monotonically in the shrinking phase.
2-phase locking makes all transactions serializable, i.e., they behave as atomic actions!
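Rules (1) to (4) can be enforced per transaction with a shared lock table and a flag that records when the shrinking phase has begun. In this Python sketch a conflicting request raises an exception; a real scheduler would usually make the transaction wait instead (an assumption of the sketch). The classes `Transaction`, `LockConflict` and `TwoPhaseViolation` are hypothetical.

```python
class LockConflict(Exception):
    """Raised when an item is already locked by another transaction."""


class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after it has unlocked."""


class Transaction:
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared dict: data item -> holder's name
        self.held = set()
        self.shrinking = False        # False: growing phase, True: shrinking phase

    def lock(self, item):
        if self.shrinking:            # rule (4)
            raise TwoPhaseViolation(f"{self.name} cannot lock {item} after unlocking")
        holder = self.lock_table.get(item)
        if holder is not None and holder != self.name:  # rule (1)
            # A real scheduler would usually make the transaction wait here
            raise LockConflict(f"{item} is already locked by {holder}")
        self.lock_table[item] = self.name
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)        # KeyError if this transaction does not hold item
        del self.lock_table[item]
        self.shrinking = True         # the first unlock ends the growing phase

    def finish(self):
        # Rule (3): unlock all remaining items before the transaction ends
        for item in sorted(self.held):
            self.unlock(item)


table = {}
t1, t2 = Transaction("T1", table), Transaction("T2", table)
t1.lock("Y")
t1.lock("Z")       # growing phase of T1
t1.unlock("Y")     # shrinking phase of T1 begins
t2.lock("Y")       # T2 may now lock Y
t1.finish()
print(table)       # {'Y': 'T2'}
```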
Timestamp
Every transaction is given a timestamp when it occurs, and every request for accessing a data item is given its transaction's timestamp. If requests for accessing a data item conflict, they are granted in timestamp order, earliest first.
Algorithm for the scheduler at each site:
For each data item X, the scheduler records the largest timestamp W(X) of the WRITE requests and the largest timestamp R(X) of the READ requests that have been processed.
For a READ request with timestamp T: if T < W(X), the scheduler rejects the READ request. Otherwise, it outputs the READ request and sets R(X) to MAX(R(X), T).
For a WRITE request with timestamp T: if T < MAX(R(X), W(X)), the scheduler rejects the WRITE request. Otherwise, it outputs the WRITE request and sets W(X) to T.
If a READ or WRITE request is rejected, the requesting transaction is aborted, assigned a new, larger timestamp, and restarted.
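The scheduler rules translate almost line for line into code. This Python sketch assumes timestamps are positive integers, so 0 can stand for "no request processed yet"; the class `TimestampScheduler` is a hypothetical name, and aborting/restarting is left to the caller.

```python
class TimestampScheduler:
    """Per-site scheduler following the rules above (hypothetical class name)."""

    def __init__(self):
        self.R = {}  # R(X): largest timestamp of processed READ requests
        self.W = {}  # W(X): largest timestamp of processed WRITE requests

    def read(self, item, ts):
        """Return True if the READ is output, False if it is rejected."""
        if ts < self.W.get(item, 0):
            return False
        self.R[item] = max(self.R.get(item, 0), ts)
        return True

    def write(self, item, ts):
        """Return True if the WRITE is output, False if it is rejected."""
        if ts < max(self.R.get(item, 0), self.W.get(item, 0)):
            return False
        self.W[item] = ts
        return True


s = TimestampScheduler()
print(s.read("X", 5))   # True: no WRITE on X yet
print(s.write("X", 3))  # False: a later transaction (5) already read X
print(s.write("X", 7))  # True
print(s.read("X", 6))   # False: X was already written with timestamp 7
print(s.read("X", 8))   # True: the aborted reader restarts with a larger timestamp
```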
Commit control for a replicated database ensures that a transaction is either committed by every site or aborted by every site (all or nothing). It involves (a) commit control for a single transaction and (b) serialization of concurrent transactions.
Error recovery
When a transient fault or a process abort occurs, the affected processes are rolled back to a point (checkpoint) prior to the occurrence of the fault.
Checkpointing: recording a snapshot of the entire state of a process at a given moment, as needed to restart the process from that point.
[Figure: processes A and B with checkpoints CA and CB, a communication between them, and a failure]
CA: checkpoint for process A; CB: checkpoint for process B.
If a communication line crosses the line linking CA and CB, the system state will be inconsistent when the failed process is rolled back to checkpoint CA.
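The crossing condition can be checked mechanically: a communication line crosses the CA-CB line when its send and its receipt lie on different sides of the checkpoints. This Python sketch assumes each event is identified by its process's local time and that event times never coincide with checkpoint times; the function `crossing_messages` is hypothetical.

```python
def crossing_messages(ca, cb, messages):
    """Return the messages whose communication line crosses the CA-CB line.

    ca, cb: local times at which checkpoints CA (process A) and CB (process B) were taken.
    messages: (sender, send_time, receiver, recv_time) tuples, sender/receiver in {"A", "B"}.
    """
    cut = {"A": ca, "B": cb}
    crossing = []
    for sender, t_send, receiver, t_recv in messages:
        sent_before = t_send < cut[sender]
        received_before = t_recv < cut[receiver]
        # The line crosses the checkpoint line if its two ends lie on different sides
        if sent_before != received_before:
            crossing.append((sender, t_send, receiver, t_recv))
    return crossing


msgs = [("A", 2, "B", 3),   # sent and received before the checkpoints
        ("A", 6, "B", 4)]   # sent after CA, received before CB
print(crossing_messages(5, 5, msgs))  # [('A', 6, 'B', 4)]
```

After a rollback to CA, the second message would have been received by B but never sent by A, which is the inconsistency described above.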
Domino effects
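If processes establish their checkpoints independently of each other, the domino effect can occur: rolling back one process forces other processes to roll back further, possibly all the way to their starting points.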
[Figure, two panels: processes A, B and C with starting points SA, SB and SC and independently established checkpoints CA1, CA2, CB1, CB2, CB3, CC1 and CC2]
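The cascade of rollbacks can be computed as a fixed point. This Python sketch assumes that only orphan messages (receipt kept, send undone) force a further rollback, that each process's checkpoint list includes its starting point at time 0, and that event times never equal checkpoint times; the function `recovery_line` and the example data are hypothetical.

```python
def recovery_line(checkpoints, messages, failed, now):
    """Roll processes back until no orphan message remains.

    checkpoints: process -> list of local checkpoint times; 0 is the starting point (S).
    messages: (sender, send_time, receiver, recv_time) tuples in local times.
    failed: the process that fails; now: process -> current local time.
    Returns process -> local time of the state each process restarts from.
    """
    def last_checkpoint(p, t):
        return max(c for c in checkpoints[p] if c <= t)

    cut = dict(now)
    cut[failed] = last_checkpoint(failed, now[failed])
    changed = True
    while changed:
        changed = False
        for s, t_send, r, t_recv in messages:
            # Orphan: the receipt survives the rollback but the send is undone
            if t_send >= cut[s] and t_recv < cut[r]:
                cut[r] = last_checkpoint(r, t_recv)
                changed = True
    return cut


checkpoints = {"A": [0, 4.5, 8.5], "B": [0, 2.5, 6.5]}
messages = [("A", 10, "B", 9), ("B", 7, "A", 8), ("A", 5, "B", 6),
            ("B", 3, "A", 4), ("A", 1, "B", 2)]
print(recovery_line(checkpoints, messages, "A", {"A": 12, "B": 12}))
# {'A': 0, 'B': 0}: both processes roll back to their starting points
print(recovery_line(checkpoints, messages[:1], "A", {"A": 12, "B": 12}))
# {'A': 8.5, 'B': 6.5}
```

Each rollback can only move a process to an earlier checkpoint, so the loop terminates; with the full message list every checkpoint is invalidated in turn, which is the domino effect.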