
March 18-22, 2013, IIITDM-Jabalpur: Dependable Computing

Distributed systems
Takashi Nanya, Canon, Inc.

Issues in distributed systems


Distributed Systems
Include an arbitrary number of system processes and user processes
Consist of two or more processor/memory modules
Processes communicate with each other by message passing (no shared memory)
Global control is needed for inter-process communication and system management
Communication between processes is subject to variable delays

Purpose: fault tolerance, performance enhancement, extensibility, resource sharing
All information describing the global system state (process states and data) must be maintained so that all participating processes have a consistent and identical view of the global state
Issues:
Clock Synchronization, Mutual Exclusion, Concurrency control, Multiple copy update, Error recovery

Clock Synchronization
Distributed systems are asynchronous in nature
Computation and communication are subject to variable delays
As a result, processes may disagree on the temporal ordering of event occurrences
[Figure: time lines of processes P1, P2, and P3 exchanging messages]

P2 perceives P1 -> P2, while P3 perceives P2 -> P1!



Atomic action (1)


Main problem in distributed systems: maintaining consistency
Basic concept for a solution: the atomic action
To realize atomic actions (consistency control), processes need to reach a common agreement on:
the temporal ordering of event occurrences in the system
the global system state and its state transitions
This agreement may be impaired by delays in inter-process communication and by faults in nodes and/or links
Approaches: clock synchronization, Byzantine agreement

Atomic action (2)


A method of process structuring that allows the writer of a procedure to secure the benefits of atomicity, i.e., indivisibility, non-interference, and strict sequencing
The basic notion for solving consistency problems in distributed systems
A generalization of the notion of transactions used in database concurrency control
Several definitions:
1) An action is atomic if the process performing it is not aware of the existence of any other active process (it can detect no spontaneous state change) and no other process is aware of the activity of this process (its state changes are concealed) while the process is performing the action.
2) An action is atomic if the process performing it does not communicate with other processes while it is executing the action.
3) Actions are atomic if, as far as other processes are concerned, they can be considered indivisible and instantaneous, such that their effects on the system are as if they were interleaved as opposed to concurrent.

Nested structure of Atomic Actions


Defined relatively at any level of process structure
An atomic action at one level consists of two or more atomic actions at a lower level

[Figure: nested atomic actions A to G spanning processes P1, P2, and P3]

Logical Clock (1)


L. Lamport: Time, clocks, and the ordering of events in a distributed system, Comm. ACM, Vol. 21, No. 7, pp. 558-565 (1978)
System: a collection of distinct processes, each of which consists of a sequence of events
Event a happened before event b, denoted by a -> b, is defined by:
If a and b are in the same process, and a comes before b, then a -> b
If a is the sending of a message by one process and b is the receipt of the same message by another process, then a -> b
If a -> b and b -> c, then a -> c

A (logical) clock Ci for each process Pi is a function that assigns a number Ci(a) to any event a in that process
(Correct) Clock Condition: for any events a, b: if a -> b then C(a) < C(b)
The Clock Condition is satisfied if the following two conditions hold:
C1: If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b)
C2: If a is the sending of a message by process Pi, and b is the receipt of that message by process Pj, then Ci(a) < Cj(b)

Logical Clock (2)


Assume that the processes are algorithms and the events represent certain actions during their execution. Process Pi's clock is represented by a register Ci, so that Ci(a) is the value contained in Ci during the event a
Conditions C1 and C2, and therefore the Clock Condition, are satisfied if the following implementation rules are obeyed:
IR1: Each process Pi increments Ci between any two successive events
IR2: (a) If event a is the sending of a message m by process Pi, then m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm
Hence these simple implementation rules guarantee a correct system of logical clocks
A system of clocks satisfying the Clock Condition can be used to place a total ordering on the set of all system events
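The two implementation rules fit in a few lines of Python. This is a minimal sketch, not code from the slides; the class `LamportClock` and its method names are hypothetical, and incrementing by exactly 1 is just one way to satisfy IR1 and IR2(b).

```python
class LamportClock:
    """Logical clock register Ci of one process Pi (hypothetical helper)."""

    def __init__(self):
        self.value = 0

    def tick(self):
        # IR1: increment Ci between any two successive events of Pi
        self.value += 1
        return self.value

    def send(self):
        # IR2(a): a send is an event; its timestamp Tm = Ci(a) travels with m
        return self.tick()

    def receive(self, tm):
        # IR2(b): make Ci >= its present value and > Tm; the receipt is an event
        self.value = max(self.value, tm) + 1
        return self.value


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()       # local event a in P1
tm = p1.send()      # P1 sends m carrying timestamp Tm
b = p2.receive(tm)  # P2 receives m (event b), so a -> b
print(a, tm, b)     # 1 2 3, consistent with the Clock Condition
```

Using `max(...) + 1` on receipt covers both parts of IR2(b) at once: the new value can never fall below the old one and always exceeds Tm.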

Total ordering of events


Define a relation => as follows: if a is an event in process Pi and b is an event in process Pj, then a => b if and only if either
(i) Ci(a) < Cj(b), or
(ii) Ci(a) = Cj(b) and Pi << Pj,
where << is an arbitrary total ordering of the processes
Then => is a total ordering; that is, => completes the happened-before partial ordering to a total ordering

Total ordering is useful in implementing a distributed system!
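As a sketch, the relation => can be turned into a comparison function for sorting timestamped events. Events are modeled here as (Ci(a), process name) pairs, and the process ordering << is a hypothetical rank table; any fixed total order of the processes would do.

```python
from functools import cmp_to_key

# Hypothetical process ordering <<: P1 << P2 << P3
PROCESS_RANK = {"P1": 0, "P2": 1, "P3": 2}


def precedes(a, b):
    """a => b for events a = (Ci(a), Pi) and b = (Cj(b), Pj)."""
    (ca, pa), (cb, pb) = a, b
    return ca < cb or (ca == cb and PROCESS_RANK[pa] < PROCESS_RANK[pb])


def compare(a, b):
    if precedes(a, b):
        return -1
    if precedes(b, a):
        return 1
    return 0  # same clock value and same process: by IR1, the same event


events = [(3, "P2"), (1, "P3"), (3, "P1"), (2, "P2")]
ordered = sorted(events, key=cmp_to_key(compare))
print(ordered)  # [(1, 'P3'), (2, 'P2'), (3, 'P1'), (3, 'P2')]
```

Ties on the clock value are broken only by the process order, which is exactly what makes => total even though -> is partial.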

Mutual Exclusion Problem


Find an algorithm for granting a single shared resource to processes that satisfies the following three conditions:
(I) A process that has been granted the resource must release it before it can be granted to another process.
(II) Different requests for the resource must be granted in the order in which they are made.
(III) If every process that is granted the resource eventually releases it, then every request is eventually granted.
This is a non-trivial problem. A central scheduling process will not work, because requests can reach the scheduler in an order different from the order in which they were made, violating (II)!

Distributed algorithm for M.E.


1. To request the resource, process Pi sends the message "Tm: Pi requests resource" to every other process and puts that message on its own request queue, where Tm is the timestamp of the message.
2. When process Pj receives the message "Tm: Pi requests resource", it places it on its request queue and sends a (timestamped) acknowledgment message to Pi.
3. To release the resource, process Pi removes any "Tm: Pi requests resource" message from its request queue and sends a (timestamped) "Pi releases resource" message to every other process.
4. When process Pj receives a "Pi releases resource" message, it removes any "Tm: Pi requests resource" message from its request queue.
5. Process Pi is granted the resource when the following two conditions are satisfied:
(i) There is a "Tm: Pi requests resource" message in its request queue that is ordered before any other request in its queue by the relation =>.
(ii) Pi has received a message from every other process timestamped later than Tm.
Note that conditions (i) and (ii) of rule 5 are tested locally by Pi.
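The five rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions: channels are reliable and FIFO (as in Lamport's paper), the clocks follow IR1 and IR2, each process holds the resource for one simulation step, and the names `Network`, `Process`, and `simulate` are hypothetical. The request queue is a set of (Tm, pid) pairs, so `min()` realizes the relation => with process ids as the ordering <<.

```python
from collections import deque


class Network:
    """Reliable FIFO channels between every pair of processes (assumed)."""

    def __init__(self, n):
        self.channels = {(i, j): deque() for i in range(n) for j in range(n) if i != j}

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)

    def deliver_one(self, procs):
        # Deliver the oldest message of the first non-empty channel, if any.
        for (src, dst), channel in self.channels.items():
            if channel:
                procs[dst].receive(*channel.popleft())
                return True
        return False


class Process:
    def __init__(self, pid, n, net):
        self.pid, self.n, self.net = pid, n, net
        self.clock = 0
        self.queue = set()        # request queue of (Tm, pid) entries
        self.last_seen = [0] * n  # latest timestamp received from each Pj
        self.request = None       # own pending (Tm, pid), if any

    def _tick(self):
        self.clock += 1
        return self.clock

    def _broadcast(self, kind, tm):
        for j in range(self.n):
            if j != self.pid:
                self.net.send(self.pid, j, (kind, tm, self.pid))

    def request_resource(self):                    # rule 1
        tm = self._tick()
        self.request = (tm, self.pid)
        self.queue.add(self.request)
        self._broadcast("REQ", tm)

    def release_resource(self):                    # rule 3
        self.queue.discard(self.request)
        self.request = None
        self._broadcast("REL", self._tick())

    def receive(self, kind, tm, src):
        self.clock = max(self.clock, tm) + 1       # IR2(b)
        self.last_seen[src] = tm                   # FIFO: timestamps only grow
        if kind == "REQ":                          # rule 2
            self.queue.add((tm, src))
            self.net.send(self.pid, src, ("ACK", self._tick(), self.pid))
        elif kind == "REL":                        # rule 4
            self.queue = {r for r in self.queue if r[1] != src}

    def granted(self):                             # rule 5, tested locally
        if self.request is None or min(self.queue) != self.request:
            return False
        return all(self.last_seen[j] > self.request[0]
                   for j in range(self.n) if j != self.pid)


def simulate(n, requesters):
    """Return the requests in the order they were granted."""
    net = Network(n)
    procs = [Process(i, n, net) for i in range(n)]
    for i in requesters:
        procs[i].request_resource()
    grants, holder = [], None
    while True:
        granted = [p for p in procs if p.granted()]
        assert len(granted) <= 1, "mutual exclusion violated"
        if holder is not None:
            holder.release_resource()   # leave after holding for one step
            holder = None
        elif granted:
            holder = granted[0]         # enter the critical section
            grants.append(holder.request)
        if not net.deliver_one(procs) and holder is None and not granted:
            break
    return grants


print(simulate(3, [2, 0]))  # [(1, 0), (1, 2)]: equal timestamps, P0 << P2
```

The `assert` inside `simulate` checks condition (I) at every step, while a message is delivered and the holder still has the resource.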

Anomalous behavior
Logical clock based on the relation => may cause anomalous behavior

[Figure: computer A sends request "TA: Req A" to computer C; after a phone call from A's side, computer B sends request "TB: Req B" to computer C]

TB < TA can happen, although actually a -> b

This can happen because the system has no way of knowing the actual precedence a -> b, which is established by a phone call external to the system. Hence we need a system of physical clocks.

Physical clock
Ci(t) denotes the reading of clock Ci at physical time t. For Ci to be a true physical clock, the following must be satisfied:
PC1: There exists a constant κ << 1 such that for all i: |dCi(t)/dt - 1| < κ
PC2: There exists a constant ε such that for all i, j: |Ci(t) - Cj(t)| < ε
To prevent anomalous behavior, let μ be a number less than the shortest transmission time for interprocess messages. It must then be ensured that, for any i, j, and t: Ci(t+μ) - Cj(t) > 0
PC1 implies that Ci(t+μ) - Ci(t) > (1-κ)μ
Using PC2, it follows that Ci(t+μ) - Cj(t) > 0 holds if ε/(1-κ) ≤ μ
Let m be a message sent at physical time t and received at time t', and let the minimum transmission delay μm ≥ 0 of m be known to the process that receives m
Assuming PC1, PC2 can be ensured by the following implementation rules:
IR1': For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and dCi(t)/dt > 0
IR2': (a) If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t). (b) Upon receiving a message m at time t', process Pj sets Cj(t') equal to max(Cj(t'-0), Tm + μm)
To synchronize physical clocks, a process only needs to know its own clock reading and the timestamps of the messages it receives
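A minimal simulation of IR1' and IR2' shows a slow clock being pulled forward by a message from a fast one. Assumptions, all hypothetical: each clock runs at a constant rate, the drift bound KAPPA = 0.01 stands in for κ, and readings and times are plain floats on one time axis.

```python
KAPPA = 0.01  # hypothetical drift bound for PC1


class PhysicalClock:
    """Clock Ci following IR1' and IR2' (hypothetical helper)."""

    def __init__(self, reading, rate):
        assert abs(rate - 1) < KAPPA, "PC1 violated"
        self.reading = reading
        self.rate = rate  # dCi/dt, assumed constant between messages

    def advance(self, dt):
        # IR1': with no message received, Ci runs continuously with dCi/dt > 0
        self.reading += self.rate * dt

    def send(self):
        # IR2'(a): the message carries Tm = Ci(t)
        return self.reading

    def receive(self, tm, mu_m):
        # IR2'(b): Cj(t') = max(Cj(t' - 0), Tm + mu_m); never moves backward
        self.reading = max(self.reading, tm + mu_m)
        return self.reading


a, b = PhysicalClock(0.0, 1.001), PhysicalClock(0.0, 0.999)
a.advance(1000.0)
b.advance(1000.0)          # b now lags a by about 2 time units
tm = a.send()
b.advance(0.5)             # actual transmission delay, unknown to b
b.receive(tm, mu_m=0.25)   # b only knows the minimum delay mu_m
print(b.reading > tm)      # True: b jumped forward to Tm + mu_m
```

Because the receiver only adds the known minimum delay μm, its clock may still lag the sender slightly after the update; the bound ε in PC2 comes from repeating such exchanges often enough.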

Byzantine Generals Problem


L. Lamport et al.: The Byzantine generals problem, ACM Trans. Prog. Lang. Syst., Vol. 4, No. 3, pp. 382-401 (1982)

A problem of coping with a situation in which one or more faulty components of a system send conflicting information to different parts of the system
A group of generals of the Byzantine army are camped with their troops around an enemy city
Communicating with one another only by messenger, the generals must agree upon a common battle plan
However, some of the generals may be traitors trying to prevent the loyal generals from reaching agreement
[Byzantine Generals Problem]: A commanding general must send an order to his n-1 lieutenant generals such that
IC1: All loyal lieutenants obey the same order
IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends

Impossibility for n<3m+1


With oral messages, no solution for fewer than 3m+1 generals can cope with m traitors
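[Figure, scenario 1: the loyal commander sends "attack" to Lieutenants 1 and 2; Lieutenant 2 is a traitor and tells Lieutenant 1 "he said retreat"]

[Figure, scenario 2: the commander is a traitor and sends "attack" to Lieutenant 1 and "retreat" to Lieutenant 2; Lieutenant 2 tells Lieutenant 1 "he said retreat"]

There is no way for Lieutenant 1 to distinguish between the two scenarios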

Oral message for n ≥ 3m+1


An oral message is one whose contents are completely under the control of the sender, so a traitorous sender can transmit any possible message
Assumptions:
A1: Every message that is sent is delivered correctly
A2: The receiver of a message knows who sent it
A3: The absence of a message can be detected
We inductively define the Oral Message algorithm OM(m), for all non-negative integers m, by which a commander sends an order to n-1 lieutenants
OM(m) solves the Byzantine Generals Problem for 3m+1 or more generals in the presence of at most m traitors
We consider the case in which the only possible decisions are attack or retreat
The algorithm is described in terms of lieutenants obtaining a value rather than obeying an order

Oral Message algorithm OM(m)


Algorithm OM(0)
(1) The commander sends his value to every lieutenant.
(2) Each lieutenant uses the value he receives from the commander, or uses the value RETREAT if he receives no value.
Algorithm OM(m), m > 0
(1) The commander sends his value to every lieutenant.
(2) For each i, let vi be the value Lieutenant i receives from the commander, or else RETREAT if he receives no value. Lieutenant i acts as the commander in Algorithm OM(m-1) to send the value vi to each of the n-2 other lieutenants.
(3) For each i, and each j ≠ i, let vj be the value Lieutenant i received from Lieutenant j in step (2) (using Algorithm OM(m-1)), or else RETREAT if he received no such value. Lieutenant i uses the value majority(v1, v2, ..., vn-1).
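OM(m) translates almost line for line into a recursive Python function. This is a sketch under assumptions: every message is delivered (A1 to A3 hold, so the "no value" case never arises), majority() returns RETREAT when there is no strict majority, and the traitor behavior (alternating ATTACK/RETREAT by receiver position) is a hypothetical choice; the algorithm is meant to work for any traitor behavior. The names `om`, `transmit`, and `majority` are illustrative.

```python
from collections import Counter

ATTACK, RETREAT = "ATTACK", "RETREAT"


def majority(values):
    """Majority value if one exists, otherwise RETREAT."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else RETREAT


def transmit(sender, position, value, traitors):
    """Value actually sent. Hypothetical traitor model: alternate by position."""
    if sender in traitors:
        return ATTACK if position % 2 == 0 else RETREAT
    return value


def om(m, commander, lieutenants, value, traitors):
    """Run OM(m); return {lieutenant: value he obtains}."""
    # Step (1): the commander sends his value to every lieutenant.
    received = {lt: transmit(commander, k, value, traitors)
                for k, lt in enumerate(lieutenants)}
    if m == 0:
        return received  # OM(0) step (2): use the value received
    # Step (2): each lieutenant j acts as the commander in OM(m-1).
    relayed = {j: om(m - 1, j, [i for i in lieutenants if i != j],
                     received[j], traitors)
               for j in lieutenants}
    # Step (3): majority of vi and the values vj obtained from the others.
    return {i: majority([received[i]] +
                        [relayed[j][i] for j in lieutenants if j != i])
            for i in lieutenants}


# n = 4 generals, m = 1: loyal commander 0, traitorous Lieutenant 3
print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))
```

The recursion makes the cost visible: OM(m) sends on the order of n^(m+1) messages, which is why m is kept small in practice.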

Algorithm OM(1)
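[Figure, case 1: the loyal commander sends v to Lieutenants 1, 2, and 3; Lieutenant 3 is a traitor and relays x instead of v]
Lieutenant 2 obtains the correct value v = majority(v, v, x)

[Figure, case 2: the commander is a traitor and sends x, y, and z to Lieutenants 1, 2, and 3, respectively; each lieutenant relays the value he received]
All lieutenants obtain the same value majority(x, y, z)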

Signed message
If the traitors' ability to lie can be restricted, an algorithm exists that copes with m traitors for any number (≥ m+2) of generals
A4 (additional assumption):
(a) A loyal general's signature cannot be forged, and any alteration of the contents of his signed messages can be detected
(b) Anyone can verify the authenticity of a general's signature
(No assumption is made about a traitorous general's signature. His signature may be forged by another traitor, thereby permitting collusion among the traitors.)
The commander sends a signed order to each of his lieutenants
Each lieutenant then adds his signature to that order and sends it to the other lieutenants, who add their signatures and send it to others, and so on
Let x:i denote the value x signed by General i. Thus, x:i:j denotes the value x signed by i, and then that value x:i signed by j
Let General 0 be the commander
Each lieutenant i maintains a set Vi of properly signed orders he has received so far

Signed message algorithm SM(m)


Initially Vi = ∅.
(1) The commander signs and sends his value to every lieutenant.
(2) For each i:
(A) If Lieutenant i receives a message of the form v:0 from the commander and he has not yet received any order, then (i) he lets Vi equal {v}; (ii) he sends the message v:0:i to every other lieutenant.
(B) If Lieutenant i receives a message of the form v:0:j1:...:jk and v is not in the set Vi, then (i) he adds v to Vi; (ii) if k < m, then he sends the message v:0:j1:...:jk:i to every lieutenant other than j1, ..., jk.
(3) For each i: when Lieutenant i will receive no more messages, he obeys the order choice(Vi), where choice is a fixed deterministic rule that returns v if Vi = {v} and RETREAT if Vi is empty.

[Figure, case 1: the commander sends attack:0 to Lieutenant 1 and retreat:0 to Lieutenant 2; Lieutenant 1 relays attack:0:1 and Lieutenant 2 relays retreat:0:2]
Lieutenants 1 and 2 obey the order choice({attack, retreat}) and know that the commander is a traitor, because his signature appears on two different orders

[Figure, case 2: the commander sends attack:0 to Lieutenants 1 and 2; Lieutenant 1 relays attack:0:1 and Lieutenant 2 relays attack:0:x]
Lieutenant 1 obeys the order choice({attack})
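A message-queue simulation of SM(m) reproduces the first case above. Simplifying assumptions: only the commander may be a traitor (lieutenants relay faithfully), a signed message v:0:j1:...:jk is modeled as the pair (v, (0, j1, ..., jk)) whose signer chain cannot be forged per A4, and `choice` returns RETREAT whenever Vi is not a single order. The names `sm` and `choice` are hypothetical.

```python
from collections import deque

ATTACK, RETREAT = "ATTACK", "RETREAT"


def choice(orders):
    """Hypothetical choice(V): the single order if V = {v}, otherwise RETREAT."""
    return next(iter(orders)) if len(orders) == 1 else RETREAT


def sm(m, n, orders):
    """Simulate SM(m): commander 0, loyal lieutenants 1..n-1.

    orders[i] is the value the commander signs for Lieutenant i.
    Returns the set Vi of each lieutenant.
    """
    V = {i: set() for i in range(1, n)}
    # Step (1): the commander signs and sends his value to every lieutenant.
    inbox = deque((i, orders[i], (0,)) for i in range(1, n))
    while inbox:  # step (3) applies once no more messages will arrive
        i, v, chain = inbox.popleft()
        direct = len(chain) == 1                 # message of the form v:0
        if (direct and V[i]) or (not direct and v in V[i]):
            continue                             # neither (A) nor (B) applies
        V[i].add(v)                              # (A)(i) / (B)(i)
        k = len(chain) - 1                       # lieutenants who already signed
        if direct or k < m:                      # (A)(ii) / (B)(ii)
            for j in range(1, n):
                if j != i and j not in chain:
                    inbox.append((j, v, chain + (i,)))
    return V


# Case 1 above: a traitorous commander signs two different orders.
V = sm(1, 3, {1: ATTACK, 2: RETREAT})
decisions = {i: choice(V[i]) for i in V}
print(V, decisions)
```

Both lieutenants end with the same set Vi, so any deterministic `choice` gives them the same decision; a set with more than one element also exposes the commander's double signature.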

Concurrency Control
Even if mutual exclusion is realized at the level of basic process actions, an inconsistent state may appear when two or more processes access the same database
Example: two clients A and B may wish to send $10 and $20, respectively, to a common account, independently of one another
Process A: A1: READ BALANCE; ADD $10; A2: WRITE BALANCE
Process B: B1: READ BALANCE; ADD $20; B2: WRITE BALANCE

Making READ and WRITE individually atomic actions is not enough! What will happen if the READ and WRITE commands are executed in the order A1, B1, A2, B2, for example?
Transaction: a sequence of READ and WRITE commands sent by a client to the file system
Concurrency control (serializability control): executing multiple simultaneous transactions as serializable atomic actions
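Replaying the two orders answers the question. In this sketch the initial balance of $100 and the function name `run_deposits` are hypothetical; each process keeps a private copy of BALANCE between its READ and its WRITE.

```python
def run_deposits(schedule, balance=100):
    """Execute steps such as ("A", "READ") in order; return the final BALANCE."""
    amounts = {"A": 10, "B": 20}   # A adds $10, B adds $20
    local = {}                     # each process's private copy of BALANCE
    for proc, op in schedule:
        if op == "READ":           # A1/B1: READ BALANCE, then ADD locally
            local[proc] = balance + amounts[proc]
        else:                      # A2/B2: WRITE BALANCE
            balance = local[proc]
    return balance


serial = [("A", "READ"), ("A", "WRITE"), ("B", "READ"), ("B", "WRITE")]
interleaved = [("A", "READ"), ("B", "READ"), ("A", "WRITE"), ("B", "WRITE")]
print(run_deposits(serial))       # 130
print(run_deposits(interleaved))  # 120: A's $10 deposit is lost
```

In the order A1, B1, A2, B2 both processes read the same old balance, so B2 overwrites the value written by A2.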


Serializability
Initial values: X=20, Y=40, Z=60; X+Y+Z is preserved by the serial and serializable executions

Serial execution (T1, then T2):
T1: READ Y; Y=Y-20; WRITE Y; READ Z; Z=Z+20; WRITE Z
T2: READ X; X=X-10; WRITE X; READ Y; Y=Y+10; WRITE Y

Serializable execution (T3 and T4 interleaved row by row):
T3          T4
READ Y      READ X
Y=Y-20      X=X-10
WRITE Y     WRITE X
READ Z      READ Y
Z=Z+20      Y=Y+10
WRITE Z     WRITE Y

Non-serializable execution (in time order):
T5: READ X; X=X-10; WRITE X
T6: READ Y; Y=Y-20
T5: READ Y
T6: WRITE Y
T5: Y=Y+10; WRITE Y
T6: READ Z; Z=Z+20; WRITE Z
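The three executions can be replayed to check the invariant X+Y+Z. This sketch assumes the step orderings listed above (reconstructed from the slide figure), models each transaction's local variables as private copies, and uses hypothetical names `execute` and `transaction`.

```python
def execute(schedule, initial):
    """Run (transaction, op, item, delta) steps in order; return the database."""
    db, local = dict(initial), {}
    for txn, op, item, delta in schedule:
        key = (txn, item)            # each transaction's private copy of an item
        if op == "READ":
            local[key] = db[item]
        elif op == "UPDATE":         # e.g. Y=Y-20, computed on the private copy
            local[key] += delta
        else:                        # "WRITE"
            db[item] = local[key]
    return db


def transaction(name, steps):
    return [(name, op, item, delta) for op, item, delta in steps]


MOVE_Y_TO_Z = [("READ", "Y", 0), ("UPDATE", "Y", -20), ("WRITE", "Y", 0),
               ("READ", "Z", 0), ("UPDATE", "Z", 20), ("WRITE", "Z", 0)]
MOVE_X_TO_Y = [("READ", "X", 0), ("UPDATE", "X", -10), ("WRITE", "X", 0),
               ("READ", "Y", 0), ("UPDATE", "Y", 10), ("WRITE", "Y", 0)]
INITIAL = {"X": 20, "Y": 40, "Z": 60}

t1, t2 = transaction("T1", MOVE_Y_TO_Z), transaction("T2", MOVE_X_TO_Y)
serial = t1 + t2

t3, t4 = transaction("T3", MOVE_Y_TO_Z), transaction("T4", MOVE_X_TO_Y)
serializable = [step for row in zip(t3, t4) for step in row]

t5, t6 = transaction("T5", MOVE_X_TO_Y), transaction("T6", MOVE_Y_TO_Z)
non_serializable = t5[:3] + t6[:2] + [t5[3], t6[2]] + t5[4:] + t6[3:]

for name, schedule in [("serial", serial), ("serializable", serializable),
                       ("non-serializable", non_serializable)]:
    db = execute(schedule, INITIAL)
    print(name, db, "X+Y+Z =", sum(db.values()))
```

The serial and serializable runs both end with X=10, Y=30, Z=80 (sum 120). In the non-serializable run T5 reads Y before T6 writes it, so T6's decrement is lost: Y=50 and the sum becomes 140.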

2-phase locking
A lock is an access privilege on a data item, granted to a particular transaction so that only one transaction can access the data item at a time
When a transaction tries to access a data item, it must lock the item before accessing it and unlock it on finishing the access
For 2-phase locking to guarantee consistency, each transaction:
does not lock a data item that has already been locked (by another transaction)
locks a data item before accessing it
unlocks all its data items before finishing
once having unlocked a data item, does not acquire any more locks
Each transaction is thus divided into two phases, a growing phase and a shrinking phase. The number of locked items increases monotonically in the growing phase and decreases monotonically in the shrinking phase
2-phase locking makes all transactions serializable, i.e., atomic actions!
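The rules above can be checked mechanically. This sketch assumes exclusive locks only; `LockTable` enforces the first rule across transactions, and `is_two_phase` checks the remaining rules for one transaction's sequence of (operation, item) steps. Both names and the step format are hypothetical.

```python
class LockTable:
    """Exclusive locks: an item locked by another transaction cannot be locked."""

    def __init__(self):
        self.owner = {}   # data item -> transaction holding its lock

    def lock(self, txn, item):
        if self.owner.get(item, txn) != txn:
            return False  # held by someone else: the requester must wait
        self.owner[item] = txn
        return True

    def unlock(self, txn, item):
        if self.owner.get(item) == txn:
            del self.owner[item]


def is_two_phase(ops):
    """Check one transaction's (op, item) sequence against the rules above."""
    held, shrinking = set(), False
    for op, item in ops:
        if op == "LOCK":
            if shrinking:        # no new locks after the first unlock
                return False
            held.add(item)       # growing phase
        elif op == "UNLOCK":
            if item not in held:
                return False
            held.remove(item)
            shrinking = True     # shrinking phase has begun
        elif item not in held:   # READ/WRITE without holding the lock
            return False
    return not held              # everything unlocked before finishing


two_phase = [("LOCK", "Y"), ("READ", "Y"), ("WRITE", "Y"),
             ("LOCK", "Z"), ("READ", "Z"), ("WRITE", "Z"),
             ("UNLOCK", "Y"), ("UNLOCK", "Z")]
not_two_phase = [("LOCK", "Y"), ("READ", "Y"), ("WRITE", "Y"), ("UNLOCK", "Y"),
                 ("LOCK", "Z"), ("WRITE", "Z"), ("UNLOCK", "Z")]
print(is_two_phase(two_phase), is_two_phase(not_two_phase))  # True False
```

The second transaction releases Y and then acquires Z, so another transaction could slip in between and observe a state that no serial order would produce.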

Timestamp
Every transaction is given a timestamp when it occurs
Every request for accessing a data item is given its transaction's timestamp
If requests for accessing a data item conflict, they are granted in timestamp order, earliest first
Algorithm for the scheduler at each site:
For each data item X, the scheduler records the largest timestamp W(X) of WRITE requests and the largest timestamp R(X) of READ requests that have been processed
For a READ request with timestamp T: if T < W(X), the scheduler rejects the READ request. Otherwise, it outputs the READ request and sets R(X) to MAX(R(X), T)
For a WRITE request with timestamp T: if T < MAX(R(X), W(X)), the scheduler rejects the WRITE request. Otherwise, it outputs the WRITE request and sets W(X) to T
If a READ or WRITE request is rejected, the requesting transaction is aborted, assigned a new, larger timestamp, and restarted
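The per-site scheduler keeps just two numbers per data item. A minimal sketch, assuming positive integer timestamps and treating an unseen item as having R(X) = W(X) = 0; the class name `TimestampScheduler` is hypothetical, and the abort/restart step is left to the caller.

```python
class TimestampScheduler:
    """Timestamp-ordering scheduler for one site, following the rules above."""

    def __init__(self):
        self.R = {}   # R(X): largest timestamp of processed READs
        self.W = {}   # W(X): largest timestamp of processed WRITEs

    def read(self, x, t):
        if t < self.W.get(x, 0):
            return False  # rejected: abort, take a larger timestamp, restart
        self.R[x] = max(self.R.get(x, 0), t)
        return True

    def write(self, x, t):
        if t < max(self.R.get(x, 0), self.W.get(x, 0)):
            return False  # rejected: a younger transaction already used X
        self.W[x] = t
        return True


s = TimestampScheduler()
print(s.read("X", 5))   # True:  R(X) = 5
print(s.write("X", 3))  # False: 3 < R(X) = 5
print(s.write("X", 7))  # True:  W(X) = 7
print(s.read("X", 6))   # False: 6 < W(X) = 7
```

Rejections always hit the older transaction, so every conflicting pair of operations executes in timestamp order and the resulting schedule is equivalent to running the transactions serially in that order.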

Multiple Copy Update


Multiple copies of a complete database, distributed to meet higher reliability/availability requirements, must be kept consistent
Commit: make all the updates made by a transaction permanent
Abort: roll back (or undo) a transaction to ensure that no effect of the transaction remains in the database

Commit control for a replicated database:
Ensures that a transaction is either committed by every site or aborted by every site (all or nothing)
Involves a) commit control for a single transaction, and b) serialization of concurrent transactions


2-phase commit protocol


Given a designated coordinator node, commit control for a single transaction can be realized by the following 2-phase commit protocol:
Commit-request phase: the coordinator sends a "query to commit" message to all the other nodes. Each node replies to the coordinator with an "agree to commit" message if the transaction succeeded, or an "abort" message if the transaction failed
Commit phase: if the coordinator receives "agree to commit" from all the other nodes, it sends them a "commit" message; otherwise it sends a "roll back" message to all the nodes
For access control of a replicated database, including serialization of concurrent transactions, see:
L. Svobodova: Attaining resilience in distributed systems, Chapter 5 of Dependability of Resilient Computers (Ed. by T. Anderson), BSP Professional Books (1989)
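The coordinator's side of the 2-phase commit protocol fits in one function. This sketch assumes no node or link failures during the protocol (no timeouts are modeled), and the `Participant` class and its method names are hypothetical.

```python
class Participant:
    """Hypothetical non-coordinator node holding the outcome of its local work."""

    def __init__(self, name, succeeded):
        self.name, self.succeeded, self.state = name, succeeded, "ACTIVE"

    def on_query_to_commit(self):
        # Commit-request phase: vote according to the local outcome
        return "AGREE_TO_COMMIT" if self.succeeded else "ABORT"

    def on_decision(self, decision):
        # Commit phase: make the updates permanent or undo them
        self.state = "COMMITTED" if decision == "COMMIT" else "ROLLED_BACK"


def coordinator(participants):
    """Run both phases; return "COMMIT" or "ROLLBACK"."""
    votes = [p.on_query_to_commit() for p in participants]
    if all(v == "AGREE_TO_COMMIT" for v in votes):
        decision = "COMMIT"
    else:
        decision = "ROLLBACK"
    for p in participants:
        p.on_decision(decision)
    return decision


nodes = [Participant("site1", True), Participant("site2", True),
         Participant("site3", False)]
print(coordinator(nodes), [p.state for p in nodes])
```

A single "abort" vote rolls back every site, which is the all-or-nothing property required of commit control for a replicated database.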

Error recovery
When a transient fault or a process abort occurs, the affected processes are rolled back to a point (checkpoint) prior to the occurrence of the fault
Checkpointing: recording a snapshot of the entire state of a process at a given moment, as needed to restart the process from that point
[Figure: processes A and B with checkpoints CA and CB, a communication between them, and a failure of process A]

CA: checkpoint for process A; CB: checkpoint for process B
If a communication line intersects the line that links CA and CB, the system state will be inconsistent when the failed process is rolled back to checkpoint CA
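Whether a message crosses the line linking two checkpoints reduces to comparing each endpoint with its process's checkpoint. In this sketch all events and checkpoints are placed on one global time axis (a simplifying assumption), and the function name and tuple layout are hypothetical.

```python
def crossing_messages(messages, checkpoints):
    """Messages whose send and receive events lie on different sides of the line.

    messages: list of (sender, send_time, receiver, receive_time)
    checkpoints: {process: time at which its checkpoint was taken}
    """
    return [
        (s, ts, r, tr)
        for s, ts, r, tr in messages
        if (ts < checkpoints[s]) != (tr < checkpoints[r])
    ]


checkpoints = {"A": 8, "B": 3}
messages = [("A", 1, "B", 2),   # before both checkpoints: harmless
            ("A", 5, "B", 6)]   # sent before CA, received after CB: crosses
print(crossing_messages(messages, checkpoints))  # [('A', 5, 'B', 6)]
```

Rolling A back to CA and B back to CB here would leave the message counted as sent by A but never received by B.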

Domino effects
If processes establish their checkpoints independently of each other, the domino effect can occur
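[Figure, top: processes A, B, and C with start points SA, SB, SC and independently established checkpoints CA1, CA2, CB1, CB2, CC1, CC2; rollback cascades back to the start points (domino effect)]

[Figure, bottom: the same processes with an additional checkpoint CB3]

A recovery line is created if CB3 is additionally established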
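The domino effect can be reproduced by repeatedly rolling back any process whose kept event is paired with an undone one, until no message crosses the line. This sketch uses a hypothetical two-process timeline (not the slide's exact figure), one global time axis, and the crossing rule from the previous slide; `recovery_line` is an illustrative name. Processes that did not fail may keep their current state, represented by INF.

```python
INF = float("inf")


def recovery_line(checkpoints, messages, failed):
    """Return the restart point of each process after rollback propagation.

    checkpoints: {process: ascending checkpoint times, starting with the start point}
    messages: list of (sender, send_time, receiver, receive_time)
    failed: the process that failed; the others start from their current state
    """
    points = {p: list(ts) + ([] if p == failed else [INF])
              for p, ts in checkpoints.items()}
    pos = {p: len(v) - 1 for p, v in points.items()}  # latest point first

    def line(p):
        return points[p][pos[p]]

    def roll_back_before(p, t):
        while line(p) > t:  # assumes every start point precedes all events
            pos[p] -= 1

    changed = True
    while changed:
        changed = False
        for s, ts, r, tr in messages:
            if ts > line(s) and tr < line(r):    # receipt kept, sending undone
                roll_back_before(r, tr)
                changed = True
            elif ts < line(s) and tr > line(r):  # sending kept, receipt undone
                roll_back_before(s, ts)
                changed = True
    return {p: line(p) for p in points}


messages = [("B", 1, "A", 2), ("A", 4, "B", 4.5), ("B", 6, "A", 6.5),
            ("B", 8, "A", 8.2), ("A", 8.5, "B", 10)]
independent = {"A": [0, 3, 7], "B": [0, 5, 9]}
print(recovery_line(independent, messages, failed="A"))  # {'A': 0, 'B': 0}
with_extra = {"A": [0, 3, 7], "B": [0, 5, 7.5, 9]}
print(recovery_line(with_extra, messages, failed="A"))   # {'A': 7, 'B': 7.5}
```

With independent checkpoints, each rollback undoes an interaction that forces the partner further back, until both restart from their start points. One extra checkpoint on B at 7.5, placed between the messages B sends at 6 and 8, gives a consistent pair and stops the cascade, playing the role of CB3 in the figure.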
