• Key concepts
Introduction
IPC
Group communication
Introduction
• In a distributed system, processes executing on different computers often need to communicate with each other to achieve some common goal.
Introduction (contd...)
• Shared-memory approach
  One process writes A into a shared common memory area; another process reads A from it.
• Message-passing approach
  One process sends A; the other process receives A.
Desirable Features of a good message-passing system
• Simplicity
• Uniform semantics
  Local communication
  Remote communication
• Efficiency
  If the message-passing system is not efficient, IPC becomes expensive and users will be reluctant to use the mechanism.
Features of a good message-passing system (contd...)
  Some optimizations normally adopted:
  - Avoiding the cost of establishing and terminating a connection between the processes for each and every message exchange
  - Minimizing the cost of maintaining connections
  - Piggybacking of acknowledgements
• Reliability
  Distributed systems are prone to node crashes and link failures, so lost messages must be retransmitted (retransmission is usually based on timeouts).
  Timeout-based retransmission can produce duplicate messages.
Features of a good message-passing system (contd...)
• Correctness (related to group communication)
  Atomicity: a message is delivered either to all receivers or to none
  Ordered delivery: messages arrive in an order acceptable to the application
  Survivability: message delivery is guaranteed despite failures
• Flexibility
  IPC primitives must be flexible enough to permit any kind of control flow between the cooperating processes, including synchronous and asynchronous send and receive.
Features of a good message-passing system (contd...)
• Security
  - Authentication of the sender and the receiver
  - Encryption of messages
• Portability
  Two aspects of portability:
  - The message-passing system itself should be portable
  - Applications written using the IPC primitives of the message-passing system should be portable. Heterogeneity must therefore be considered while designing the message-passing system
Issues in IPC by message passing
• A message is a block of information formatted by a sending process in such a manner that it is meaningful to the receiving process
• It consists of a fixed-length header and a variable-size collection of typed data objects
• The header consists of:
Issues in IPC (contd...)
Structural information
Addresses
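A minimal Python sketch of a message with a fixed-length header followed by a variable-size body. The four header fields and their sizes are assumptions chosen for illustration (the slide names only addresses and structural information); `Message`, `pack` and `unpack` are hypothetical names.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (assumed for illustration): sender address,
# receiver address, message type, and body length, in network byte order.
HEADER_FORMAT = "!IIHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # always 14 bytes


@dataclass
class Message:
    sender: int
    receiver: int
    msg_type: int
    body: bytes  # variable-size data section

    def pack(self) -> bytes:
        # Fixed-length header first, so the receiver always knows where the
        # header ends and how many body bytes follow.
        header = struct.pack(HEADER_FORMAT, self.sender, self.receiver,
                             self.msg_type, len(self.body))
        return header + self.body

    @classmethod
    def unpack(cls, data: bytes) -> "Message":
        sender, receiver, msg_type, length = struct.unpack(
            HEADER_FORMAT, data[:HEADER_SIZE])
        return cls(sender, receiver, msg_type,
                   data[HEADER_SIZE:HEADER_SIZE + length])
```

For example, `Message(1, 2, 7, b"hello").pack()` produces a 14-byte header plus 5 body bytes, and `Message.unpack` recovers the same fields.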
Issues in IPC (contd...)
• Issues in IPC are addressed by:
  Synchronization
  Buffering
  Process addressing
  Failure handling
  Group messaging
Synchronization
• Semantics used for synchronization may be broadly classified as:
  Blocking: its invocation blocks the execution of its invoker
  Nonblocking: its invocation does not block the execution of its invoker
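The difference between blocking and nonblocking semantics can be sketched with Python's standard `queue` module, treating a `queue.Queue` as a stand-in for a receiver's message buffer (an assumption for illustration; a kernel-level receive primitive differs in detail). The function names are hypothetical.

```python
import queue
import threading


def blocking_receive(q: queue.Queue, timeout=None):
    # Blocks the invoker until a message is available. With a timeout,
    # queue.Empty is raised if nothing arrives in time.
    return q.get(block=True, timeout=timeout)


def nonblocking_receive(q: queue.Queue):
    # Returns at once: the waiting message, or None if there is none.
    try:
        return q.get_nowait()
    except queue.Empty:
        return None


def demo_blocking():
    q = queue.Queue()
    early = nonblocking_receive(q)                          # nothing yet -> None
    threading.Timer(0.1, q.put, args=("hello",)).start()  # message arrives later
    late = blocking_receive(q, timeout=2)                   # invoker waits for it
    return early, late
```

The nonblocking call returns immediately with nothing, while the blocking call suspends its caller until the delayed message arrives.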
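Synchronous mode of communication with both send and receive primitives having blocking-type semantics
[Figure: The receiver executes Receive(message) and its execution is suspended. The sender executes Send(message) and its execution is suspended. When the message arrives, the receiver's execution resumes and it executes Send(acknowledgement); when the acknowledgement arrives, the sender's execution resumes. Legend: blocked state, executing state.]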
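A sketch of the synchronous exchange in the figure, assuming two in-process queues stand in for the network: one carries the message and one carries the acknowledgement. `sync_send`, `sync_receive` and `demo_synchronous` are hypothetical names.

```python
import queue
import threading


def sync_send(channel: queue.Queue, ack_channel: queue.Queue, message):
    # Blocking send: deliver the message, then stay suspended until the
    # receiver's acknowledgement comes back.
    channel.put(message)
    return ack_channel.get()


def sync_receive(channel: queue.Queue, ack_channel: queue.Queue):
    # Blocking receive: suspended until a message arrives, then send an
    # acknowledgement to release the sender.
    message = channel.get()
    ack_channel.put(("ACK", message))
    return message


def demo_synchronous():
    channel, ack_channel = queue.Queue(), queue.Queue()
    received = []
    receiver = threading.Thread(
        target=lambda: received.append(sync_receive(channel, ack_channel)))
    receiver.start()                       # receiver blocks in Receive
    ack = sync_send(channel, ack_channel, "A")
    receiver.join()
    return received, ack
```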
Buffering
• Messages can be transmitted from one process to another by copying the body of the message from the address space of the sending process to the address space of the receiving process
• The message-buffering strategy in IPC is strongly related to the synchronization strategy
• Four types of buffering strategy are:
  Null buffer (or no buffering)
  Single-message buffer
  Unbounded-capacity buffer
  Finite-bound (or multiple-message) buffer
Buffering (contd...)
• Null buffer (or no buffering)
  There is no place to temporarily store the message
  Strategies used are:
  - The message remains in the sender process's address space, and the execution of the send is delayed until the receiver executes the corresponding receive
  - The message is simply discarded, and the timeout mechanism is used to resend the message after a timeout period
Buffering (contd...)
• Single-message buffer
  The null-buffer strategy is not suitable for synchronous communication:
  - A message may have to be transferred two or more times, and the receiver of the message has to wait for the entire time taken to transfer the message across the network
  Synchronous communication mechanisms in distributed systems therefore use a single-message buffer strategy
  A buffer having the capacity to store a single message is used on the receiver's node
Buffering (contd...)
• The idea is to keep the message ready for use at the location of the receiver
• The request message is buffered on the receiver's node if the receiver is not ready to receive the message
• The message buffer may be either in the kernel's address space or in the receiver process's address space
[Figure: single-message buffer on the receiver's side of the node boundary]
Buffering (contd...)
• Finite-bound (or multiple-message) buffer
  A buffer of unbounded capacity is practically impossible, so buffers have a finite capacity
Buffering (contd...)
[Figure: Message transfer in asynchronous send with multiple-message buffering strategy. Send places MSG in a multiple-message buffer (mailbox/port) on the receiver's side; Receive takes it from there.]
• The message is first copied from the sending process's memory into the receiving process's mailbox
• The message is then copied from the mailbox to the receiver's memory when the receiver calls for the message
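A sketch of a finite-bound mailbox showing the two copies described above. Rejecting a message when the buffer is full is an assumed overflow policy (the slides do not specify one), and `Mailbox` is a hypothetical name.

```python
from collections import deque


class Mailbox:
    """Finite-bound (multiple-message) buffer on the receiver's side."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._slots = deque()

    def send(self, sender_memory: bytearray) -> bool:
        # Assumed policy: report failure when full (a real system might
        # instead block the sender as a form of flow control).
        if len(self._slots) >= self.capacity:
            return False
        self._slots.append(bytes(sender_memory))  # copy 1: sender -> mailbox
        return True

    def receive(self, receiver_memory: bytearray) -> bool:
        if not self._slots:
            return False
        receiver_memory[:] = self._slots.popleft()  # copy 2: mailbox -> receiver
        return True
```

Because the mailbox stores its own copy, the sender may reuse its buffer as soon as `send` returns.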
Multidatagram messages
• Maximum transfer unit (MTU)
  Upper bound on the size of data that can be transmitted at a time
• A message whose size is greater than the MTU has to be fragmented into pieces of at most MTU size, which are sent separately
• Each fragment is sent in a packet (known as a datagram)
• Messages smaller than the MTU can be sent in a single packet (known as single-datagram messages)
• Messages larger than the MTU have to be fragmented and sent in multiple packets (known as multidatagram messages)
• Disassembling messages on the sender side and reassembling them on the receiver side is the responsibility of the message-passing system
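A sketch of fragmentation and reassembly, assuming each datagram carries its sequence number and the total fragment count (a hypothetical header choice) so the receiver can rebuild the message even if fragments arrive out of order.

```python
from __future__ import annotations


def fragment(message: bytes, mtu: int) -> list[tuple[int, int, bytes]]:
    """Split a message into datagrams of at most `mtu` data bytes.
    Each datagram is (sequence number, total count, data)."""
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    total = len(chunks)
    return [(seq, total, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(datagrams) -> bytes:
    """Rebuild the message from datagrams received in any order."""
    datagrams = list(datagrams)
    total = datagrams[0][1]
    by_seq = {seq: data for seq, _, data in datagrams}
    if len(by_seq) != total:
        raise ValueError("missing fragments")
    return b"".join(by_seq[seq] for seq in range(total))
```

With an MTU of 1000 bytes, a 2500-byte message becomes a multidatagram message of three packets, while a short message stays a single-datagram message.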
Encoding and decoding of message data
• The structure of the message data should be preserved between the sending and receiving processes
• This goal is very difficult to achieve in both heterogeneous and homogeneous systems, for two reasons
Encoding and decoding of message data (contd...)
• Two representations are used for encoding and decoding message data:
  Tagged representation
  - The type of each program object is encoded in the message along with its value
  - Because the coded data format is self-describing:
    o The receiving process does not need prior knowledge of how to decode the data
  Untagged representation
  - The message data contains only program objects
  - No information is included in the message data to specify the type of each program object
    o The receiving process must have prior knowledge of how to decode the received data
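The two representations can be contrasted with Python's `struct` module. The one-byte type tags are an assumption for illustration; the untagged version works only because both sides agree on the format string in advance. All function names are hypothetical.

```python
import struct

# Hypothetical one-byte type tags for the tagged representation.
TAGS = {int: b"i", float: b"d"}
FORMATS = {b"i": "!q", b"d": "!d"}


def encode_tagged(values) -> bytes:
    # Each value is preceded by its type tag, so the data is self-describing.
    out = b""
    for v in values:
        tag = TAGS[type(v)]
        out += tag + struct.pack(FORMATS[tag], v)
    return out


def decode_tagged(data: bytes) -> list:
    # No prior knowledge needed: read a tag, then the value it describes.
    values, pos = [], 0
    while pos < len(data):
        fmt = FORMATS[data[pos:pos + 1]]
        size = struct.calcsize(fmt)
        values.append(struct.unpack(fmt, data[pos + 1:pos + 1 + size])[0])
        pos += 1 + size
    return values


def encode_untagged(values, fmt: str) -> bytes:
    # Only the values travel; `fmt` must be agreed upon beforehand.
    return struct.pack(fmt, *values)


def decode_untagged(data: bytes, fmt: str) -> tuple:
    # The receiver must already know the layout to interpret the bytes.
    return struct.unpack(fmt, data)
```

For `[42, 3.5, -7]` the tagged encoding costs 27 bytes against 24 for the untagged one: the extra bytes buy independence from prior agreement.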
Process addressing
• A message-passing system usually supports two types of process addressing:
  Explicit addressing
  - The sender explicitly names the process with which it wants to communicate
Process addressing (contd...)
  Implicit addressing
  - A process willing to communicate does not explicitly name a process for communication
    o Send-any (service_id, message)
      Send a message to any process that provides the service of type "service_id"
    o Receive-any (process_id, message)
      Receive a message from any process and return the "process_id" of the process from which the message was received
Process addressing (contd...)
• Processes can be identified by the combination of three fields:
  machine_id, local_id, machine_id
  - The first field identifies the node on which the process is created
Process addressing (contd...)
• Link-based addressing:
  When a process migrates from its current node to a new node, link information {process id, new machine id} is left on its previous node.
  On the new node, a new local id is assigned to the process, and its process identifier and the new local id are entered in a mapping table that the kernel of the new node maintains for all processes created on another node but running on this node.
  If the value of the third field is equal to the first field, the message is sent to the node on which the process was created.
Process addressing (contd...)
• Drawbacks: even though it supports process migration, link-based addressing suffers from two main drawbacks:
  - The overhead of locating a process may be large if the process has migrated several times during its lifetime
  - It may not be possible to locate a process if an intermediate node on which the process once resided is down
• Both process-addressing methods are nontransparent because the machine identifier must be specified
• What are the alternatives?
Process addressing (contd...)
1. Centralized process identifier allocator
   Maintains a counter. When it receives a request for an identifier, it returns the current value of the counter and then increments the counter
   It suffers from poor reliability and scalability
2. Two-level naming scheme for processes
   1. Machine-independent high-level name
   2. Machine-dependent low-level name
   A centralized (or replicated, distributed) name server maintains the mapping table that maps high-level names to low-level names
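A sketch of both alternatives: a counter-based centralized allocator, and a name server table for the two-level scheme. Representing the low-level name as a (machine_id, local_id) pair is an assumption; all class and method names are hypothetical.

```python
import threading


class CentralizedIdAllocator:
    """Single allocator that hands out process identifiers from a counter."""

    def __init__(self, start: int = 0):
        self._counter = start
        self._lock = threading.Lock()   # serialize concurrent requests

    def allocate(self) -> int:
        with self._lock:
            value = self._counter       # return the current value...
            self._counter += 1          # ...and increment the counter
            return value


class NameServer:
    """Maps machine-independent high-level names to machine-dependent
    low-level names, assumed here to be (machine_id, local_id) pairs."""

    def __init__(self):
        self._table = {}

    def register(self, high_level: str, machine_id: str, local_id: int):
        self._table[high_level] = (machine_id, local_id)

    def resolve(self, high_level: str):
        return self._table[high_level]
```

Programs use only the high-level name; if the process migrates, re-registering updates the low-level name without the client noticing. The single allocator object also shows why the first scheme scales poorly: every request goes through one point.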
Failure handling
• Possible problems in IPC due to different types of system failures:
  Loss of request message
  Loss of response message
  Unsuccessful execution of the request (the receiver's computer crashes)
Failure handling (contd...)
[Figure: Sender and receiver timelines. (a) Request message is lost: the sender sends a request that never reaches the receiver. (b) Response message is lost: the request executes successfully, but the response sent back is lost. (c) Receiver's computer crashed: the request message reaches the receiver, whose computer crashes and is restarted.]
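Failure handling (contd...)
• Four-message reliable IPC protocol for client-server communication between two processes
[Figure: The client sends a Request and the server returns an Acknowledgement; the server later sends the Reply and the client returns an Acknowledgement. The client remains in the blocked state until the reply arrives.]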
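Failure handling (contd...)
• Three-message reliable IPC protocol for client-server communication between two processes
[Figure: The client sends a Request; the server's Reply also acknowledges the request; the client then sends an Acknowledgement for the reply. The client remains in the blocked state until the reply arrives.]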
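Failure handling (contd...)
• Two-message reliable IPC protocol for client-server communication between two processes
[Figure: The client sends a Request and remains in the blocked state until the server's Reply arrives; the reply acts as the acknowledgement.]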
Failure handling (contd...)
• Fault-tolerant communication between a client and a server
[Figure: The client sends a REQUEST message; the server executes it successfully and returns a Response.]
Failure handling (contd...)
• Idempotency and handling of duplicate request messages
Failure handling (contd...)
• Example: nonidempotent operation
/* Total_Marks is server state that persists between calls.
   Attendance thresholds and bonuses are assumed; attendance 87 earns +2,
   as in the example that follows. */
int Total_Marks = 43;

int Cal_Final_Marks (int End_Sem_Marks, int attndnce)
{
    Total_Marks += End_Sem_Marks;
    if (attndnce > 95)
        Total_Marks += 5;
    else if (attndnce > 90)
        Total_Marks += 3;
    else if (attndnce > 85)
        Total_Marks += 2;
    else if (attndnce > 80)
        Total_Marks += 1;
    return Total_Marks;
}
Failure handling (contd...)
CLIENT                                   SERVER (Total_Marks = 43)
Send request Cal_Final_Marks (34, 87) -> Execute Cal_Final_Marks:
                                           Total_Marks = 43 + 34 + 2 = 79
                                           Return (79)   (reply lost)
Timeout; resend the request           -> Total_Marks = 79 + 34 + 2 = 115
                                           Return (115)
Receive Total_Marks = 115
[Figure: A nonidempotent procedure]
Failure handling (contd...)
  When the client receives no response, it cannot determine whether the failure was due to a server crash or to the loss of the request or response message.
  Using timeouts, the client resends the request.
  Repeated execution of nonidempotent requests results in "orphan" executions.
  How can we ensure that a nonidempotent request is executed only once?
  By using exactly-once semantics.
  Exactly-once semantics is implemented using a unique identifier for each request on the client side and a reply cache on the server side.
Failure handling (contd...)
CLIENT                                   SERVER (Total_Marks = 43)
Send Request01:
  Cal_Final_Marks (34, 87)            -> Check reply cache for Request01: NOT FOUND
                                           Execute Cal_Final_Marks:
                                           Total_Marks = 43 + 34 + 2 = 79
                                           Save reply in cache; Return (79)   (reply lost)
Timeout; retransmit Request01:
  Cal_Final_Marks (34, 87)
Reply cache:
  Request identifier | Reply to be sent
  Request01          | 79
  Request02          | 45
  ..                 | ..
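A Python sketch of exactly-once semantics for the `Cal_Final_Marks` example, assuming the server keys its reply cache by the client-supplied request identifier. The attendance bands in `bonus` are assumed (only "attendance 87 earns 2" is fixed by the example), and `MarksServer` is a hypothetical name.

```python
def bonus(attendance: int) -> int:
    # Assumed attendance bands; 87 -> 2 matches the example above.
    if attendance > 95:
        return 5
    if attendance > 90:
        return 3
    if attendance > 85:
        return 2
    if attendance > 80:
        return 1
    return 0


class MarksServer:
    def __init__(self, total_marks: int):
        self.total_marks = total_marks  # server state changed by every execution
        self.reply_cache = {}           # request identifier -> reply to be sent

    def cal_final_marks(self, end_sem_marks: int, attendance: int) -> int:
        # Nonidempotent: executing it twice adds the marks twice.
        self.total_marks += end_sem_marks + bonus(attendance)
        return self.total_marks

    def handle(self, request_id: str, end_sem_marks: int, attendance: int) -> int:
        # A retransmitted request finds its reply in the cache, so the
        # procedure is not executed a second time.
        if request_id in self.reply_cache:
            return self.reply_cache[request_id]
        reply = self.cal_final_marks(end_sem_marks, attendance)
        self.reply_cache[request_id] = reply  # save before sending
        return reply
```

Retransmitting `Request01` returns 79 again instead of producing 115, which is the orphan execution the plain procedure suffers from.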
Failure handling (contd...)
  When the BLAST protocol is used, a node or communication link failure leads to the loss of packets.
  To solve this:
  - Use a bitmap to identify the packets of a message, adding two extra fields to the header: the total number of packets, and a bitmap specifying the position of the packet.
  - Use the "SELECTIVE REPEAT" method to retransmit only the lost packets after the timeout period.
Failure handling (contd...)
[Figure: The sender sends a request message; the receiver returns the packets of the response message; after a timeout, only the missing packets are resent.]
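A sketch of the receiver side of the bitmap scheme: each arriving packet carries the total number of packets and its position, and after a timeout only the positions whose bits are still clear are requested again. `Reassembler` is a hypothetical name.

```python
class Reassembler:
    """Tracks which packets of a multipacket message have arrived."""

    def __init__(self, total_packets: int):
        self.total = total_packets
        self.bitmap = 0            # bit k set => packet k has arrived
        self.packets = {}

    def receive(self, position: int, data: bytes):
        self.bitmap |= 1 << position
        self.packets[position] = data

    def missing(self) -> list:
        # Selective repeat: only these positions are resent after the
        # timeout, not the whole message.
        return [k for k in range(self.total) if not self.bitmap & (1 << k)]

    def complete(self) -> bool:
        return self.bitmap == (1 << self.total) - 1

    def message(self) -> bytes:
        return b"".join(self.packets[k] for k in range(self.total))
```

If packets 1 and 3 of a five-packet response are lost, `missing()` returns `[1, 3]`, and only those two are retransmitted.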
Group communication
• Three types of group communication:
  One-to-many
  Many-to-one
  Many-to-many
Group communication (contd...)
• One-to-many communication
  Also known as multicast communication
  A special case of multicast communication is broadcast communication
  - The message is sent to all processors connected to a network
• Group management
  - Closed group: only the members of the group can send a message to the group
  - Open group: any process in the system can send a message to the group
  - Centralized group servers (with replication) for dynamic management of group members
Group communication (contd...)
• Group addressing
  A two-level naming scheme is normally used for group addressing
Group communication (contd...)
• Message delivery to receiver processes
  User applications use high-level group names in programs
  The centralized group server maintains a mapping of high-level group names to their low-level names
  The group server also maintains a list of the process identifiers of all the processes in each group
Group communication (contd...)
• Buffered and unbuffered multicast
  Multicast is an asynchronous communication mechanism
  Multicast send cannot be synchronous because:
  - It is unrealistic to expect a sending process to wait until all the receiving processes that belong to the multicast group are ready to receive the multicast message
  - The sending process may not be aware of all the receiving processes that belong to the multicast group
  For unbuffered multicast, the message is not buffered
  - It is lost if a receiving process is not in a state to receive it
  For buffered multicast, the message is buffered for the receiving processes
  - Each process of the group receives the message
Group communication (contd...)
• Two types of semantics for one-to-many communication:
  - Send-to-all semantics
  - Bulletin-board semantics
Group communication (contd...)
• Flexible reliability in multicast communication
  In one-to-many communication, the degree of reliability is normally expressed in the following forms:
  - 0-reliable
  - 1-reliable
  - m-out-of-n-reliable
  - All-reliable
Group communication (contd...)
• Atomic multicast
  Has an all-or-nothing property
  When a message is sent to a group, it is either received by all processes that are members of the group or not received by any of them
• Many-to-one communication
  Multiple senders send messages to a single receiver
  The single receiver may be selective or nonselective
  A selective receiver specifies a unique sender
  - Message exchange takes place only if that sender sends a message
Group communication (contd...)
• Many-to-many communication
  Multiple senders send messages to multiple receivers
Group communication (contd...)
• Absolute ordering
  All messages are delivered to all receiver processes in the exact order in which they were sent
  The system is assumed to have a clock at each machine, and the clocks are synchronized with each other
  Global timestamps are used as message identifiers
  The kernel of the receiver places incoming messages in a queue
  A sliding-window mechanism is used to deliver messages periodically
  Messages whose timestamps fall within the current window are delivered to the receiver
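A sketch of the receiver-side queue for absolute ordering, assuming synchronized clocks so that timestamps are globally comparable. Each call to `deliver_window` plays the role of one periodic window; the class name is hypothetical.

```python
import heapq


class AbsoluteOrderingQueue:
    """Receiver-side queue ordered by global timestamps."""

    def __init__(self):
        self._heap = []   # (timestamp, message), smallest timestamp first

    def arrive(self, timestamp: float, message):
        heapq.heappush(self._heap, (timestamp, message))

    def deliver_window(self, window_end: float) -> list:
        # Called periodically: deliver, in timestamp order, every queued
        # message whose timestamp falls within the current window. Later
        # messages stay queued in case an earlier one is still in transit.
        delivered = []
        while self._heap and self._heap[0][0] <= window_end:
            delivered.append(heapq.heappop(self._heap)[1])
        return delivered
```

Even if m2 arrives before m1, both are delivered as m1, m2 once the window covers their timestamps.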
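Group communication (contd...)
• Absolute ordering
[Figure: Senders S1 and S2 and receivers R1 and R2 on time lines. In the first panel, m1 is sent at time t1 and m2 at time t2 (t1 < t2), and both R1 and R2 receive m1 before m2. In the second panel, again with t1 < t2, both receivers receive m2 before m1.]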
Group communication (contd...)
• Implementation of consistent ordering (contd...)
Group communication (contd...)
• ABCAST protocol:
  This sequence number should be greater than the previous number used by the sender, so a counter is used.
  2. On receiving the message, each member of the group returns a proposed sequence number to the sender
  - Member i calculates its proposed sequence number as
      $\max(F_{max}, P_{max}) + 1 + i/N$
    o $F_{max}$: largest final sequence number agreed upon so far for a message received by the group
    o $P_{max}$: largest sequence number proposed so far by this member
    o $N$: total number of members in the multicast group
    o $i$: member number
Group communication (contd...)
• ABCAST protocol:
  3. When the sender has received the proposed sequence numbers from all the members, it selects the largest one as the final sequence number for the message and sends it to all members in a COMMIT message
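A simulation of steps 2 and 3 of ABCAST under simplifying assumptions: no message loss, members numbered 1 to N, and `Fraction` used so that the i/N term stays exact. Step 1 (the sender's temporary number) and final delivery are not modeled; `Member` and `abcast` are hypothetical names.

```python
from fractions import Fraction


class Member:
    def __init__(self, i: int, n: int):
        self.i, self.n = i, n
        self.f_max = 0   # largest final sequence number agreed so far
        self.p_max = 0   # largest sequence number this member has proposed

    def propose(self) -> Fraction:
        # Step 2: max(Fmax, Pmax) + 1 + i/N; the i/N term keeps proposals
        # from different members distinct.
        p = max(self.f_max, self.p_max) + 1 + Fraction(self.i, self.n)
        self.p_max = p
        return p

    def commit(self, final):
        self.f_max = max(self.f_max, final)


def abcast(members) -> Fraction:
    # Step 3: the sender takes the largest proposal as the final number
    # and sends it to every member in a COMMIT message.
    final = max(m.propose() for m in members)
    for m in members:
        m.commit(final)
    return final
```

With three fresh members the first final number is 2 (member 3 proposes 0 + 1 + 3/3); the next message then receives 4, so numbers agreed by the group keep increasing.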
Group communication (contd...)
• Causal ordering
  Two message-sending events are said to be causally related if they are related by the happened-before relation
Group communication (contd...)
• Causal ordering
[Figure: Causal ordering among senders S1, S2 and receivers R1, R2, R3 exchanging messages m1, m2 and m3 over time.]
CBCAST protocol
Group communication (contd...)
4. To send a message, a process increments the value of its own component in its own vector and sends the vector as part of the message
5. When a message arrives at a receiver process's site, it is buffered by the runtime system, which tests two conditions to decide whether the message can be delivered or must be delayed to ensure causal-ordering semantics:
  - $S[i] = R[i] + 1$, and
  - $S[j] \le R[j]$ for all $j \ne i$
  where S is the vector of the sender process, R is the vector of the receiver process, and i is the sender's index
Group communication (contd...)
  $S[i] = R[i] + 1$ ensures that the receiver has not missed any message from the sender
  $S[j] \le R[j]$ for all $j \ne i$ ensures that the sender has not received any message that the receiver has not yet received
6. If the message passes these two tests, the runtime system delivers it to the user process
7. Otherwise, the message is left in the buffer, and the test is carried out again for it when a new message arrives
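A sketch of the CBCAST delivery test and the buffering of steps 5 to 7. Updating R[i] after each delivery is an assumption here (the earlier protocol steps are not reproduced above); `can_deliver` and `on_arrival` are hypothetical names.

```python
def can_deliver(s: list, r: list, i: int) -> bool:
    """CBCAST test for a message with sender vector s from process i,
    at a receiver whose vector is r."""
    if s[i] != r[i] + 1:           # receiver must not have missed a message from i
        return False
    # sender must not have seen anything the receiver has not seen
    return all(s[j] <= r[j] for j in range(len(s)) if j != i)


def on_arrival(buffer: list, r: list, message) -> list:
    """message = (sender index, sender vector, payload). Buffer it, then
    deliver every buffered message that passes the test, re-testing the
    buffer after each delivery. Returns the payloads delivered."""
    buffer.append(message)
    delivered, progress = [], True
    while progress:
        progress = False
        for msg in list(buffer):
            i, s, payload = msg
            if can_deliver(s, r, i):
                buffer.remove(msg)
                r[i] += 1          # assumed: receiver records the delivery
                delivered.append(payload)
                progress = True
    return delivered
```

For example, if process 1 sends m2 after having received m1 from process 0, and m2 reaches process 2 first, m2 waits in the buffer until m1 arrives; both are then delivered in causal order.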
Group communication (contd...)
[Figure: CBCAST protocol for implementing causal ordering]
Remote Procedure Calls
• RPC is a special case of the general message-passing model of IPC
• RPC has become a widely accepted IPC mechanism in distributed systems because of the following features:
  Simple call syntax
  Familiar semantics (similar to local procedure calls)
  Well-defined interface
  Ease of use
  Generality
  Efficiency
  It can be used as an IPC mechanism to communicate between processes on different machines as well as between different processes on the same machine
RPC model
• The RPC model is similar to the procedure call model used for the transfer of control and data within a program:
  For making a procedure call, the caller places arguments to the procedure in some well-specified location
  Control is then transferred to the sequence of instructions that constitutes the body of the procedure
  The procedure body is executed in a newly created execution environment
  After the procedure's execution, control returns to the calling point, possibly returning a result
Typical Model of an RPC
[Figure: The caller (client process) calls the procedure and waits for the reply. The callee (server process) receives the request, starts procedure execution, executes the procedure, then sends the reply and waits for the next request. The caller then resumes execution. The call can also be asynchronous, so that the client can do other tasks while waiting for the reply.]
Transparency of RPC
• A transparent RPC mechanism is one in which local procedures and remote procedures are indistinguishable to programmers
1. Syntactic transparency
  - A remote procedure call should have exactly the same syntax as a local procedure call
2. Semantic transparency
Transparency of RPC (contd...)
• Differences between RPCs and local procedure calls (LPCs):
  With RPC, the called procedure is executed in an address space that is disjoint from the calling program's address space, so the remote procedure cannot access any variables or data values in the calling program's environment
  RPCs are more vulnerable to failure than LPCs
  - since they involve two different processes and possibly a network and two different computers
  RPCs consume much more time (100-1000 times more) than LPCs
  - due to the involvement of a communication network
Implementation of RPC mechanism
• Implementation of the RPC mechanism involves five elements of program:
  1. The client
  2. The client stub
  3. The RPCRuntime
  4. The server stub
  5. The server
• The client, the client stub, and one instance of RPCRuntime execute on the client machine
• The server, the server stub, and one instance of RPCRuntime execute on the server machine
Implementation of RPC mechanism
[Figure: Client machine and server machine; a call packet travels from the client machine to the server machine and a result packet travels back (steps 1-10).]
Implementation of RPC mechanism
• Client
  User process that initiates an RPC
Implementation of RPC mechanism
• RPCRuntime
  Handles the transmission of messages across the network between the client and server machines
  It is responsible for retransmissions, acknowledgements, packet routing, and encryption
  The RPCRuntime on the client machine receives the call request message from the client stub and sends it to the server machine. It also receives the result message from the server and passes it to the client stub
  The RPCRuntime on the server machine receives the result message from the server stub and sends it to the client machine. It also receives the call request message from the client and passes it to the server stub
Implementation of RPC mechanism
• Server stub
  Two tasks:
• Server
  On receiving a call request from the server stub, the server executes the appropriate procedure and returns the result of the procedure execution to the server stub
Implementation of RPC mechanism
• Stub generation:
  Two ways:
  - Manually: the RPC implementor provides a set of translation functions from which a user can construct stubs
  - Automatically: an Interface Definition Language (IDL) is used to define the interface between a client and a server
• RPC messages:
  Two types of messages are involved in the implementation of an RPC system:
  - Call messages
  - Reply messages
Implementation of RPC mechanism
• Call messages:
Implementation of RPC mechanism
[Figure: RPC reply message format]
2) Server creation
Stateful server
[Figure: Stateful file server. The client process calls Open (Filename, Mode); the server process records the file in a table (File id, Mode, R/W pointer) and returns (Fid). Later, the client calls Close (fid) and the server returns (Successful).]
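Stateless server
[Figure: Stateless file server. The client process calls Read (Filename, 0, 200, buffer) and the server process returns bytes 0 to 199; the next call, Read (Filename, 400, 20, buffer), again carries the file name and position, so the server does not need a table of File id, Mode, and R/W pointer.]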
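A sketch contrasting the two servers, with an in-memory dictionary standing in for the file system (an assumption). The stateful server keeps the R/W pointer per file id; the stateless server expects every request to carry the file name and position. Class names and the file contents are hypothetical.

```python
FILES = {"notes.txt": bytes(range(256)) * 4}   # hypothetical 1024-byte file


class StatefulFileServer:
    def __init__(self):
        self.table = {}      # fid -> [filename, mode, r/w pointer]
        self.next_fid = 0

    def open(self, filename, mode):
        self.next_fid += 1
        self.table[self.next_fid] = [filename, mode, 0]
        return self.next_fid

    def read(self, fid, n):
        entry = self.table[fid]
        filename, _, pos = entry
        data = FILES[filename][pos:pos + n]
        entry[2] = pos + len(data)   # the server remembers the client's position
        return data

    def close(self, fid):
        del self.table[fid]
        return "Successful"


class StatelessFileServer:
    def read(self, filename, position, n):
        # Each request is self-contained; nothing is kept between calls.
        return FILES[filename][position:position + n]
```

If the stateful server crashes, its table (and every client's position) is lost; the stateless server can simply be restarted, at the cost of longer request messages.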
Server Creation Semantics
• Server processes may either be created and installed before their client processes or be created on a demand basis
• Based on the time duration for which RPC servers survive, RPC servers are classified as:
  1. Instance-per-call server
  2. Instance-per-session server
  3. Persistent server
Server Creation Semantics
1. Instance-per-call server: not a commonly used approach because:
   It is a stateless approach; state information must be preserved either at the client process (time consuming, and data abstraction is lost) or at the server OS (expensive)
   Multiple invocations of the same server become more expensive
2. Instance-per-session server: the server exists for the entire session in which the client and server interact. The server can maintain internal state information, and the overhead of server creation and destruction is minimized
3. Persistent server: the server remains in existence indefinitely. Unlike the other two, a persistent server can be shared
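Communication protocols for RPCs
1. The Request (R) protocol
[Figure: For the first RPC, the client sends a request message and the server executes the procedure; the next RPC again consists of a request message followed by procedure execution. No reply is sent.]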
Communication protocols for RPCs
• The Request protocol
  Used in RPCs in which the called procedure has nothing to return and the client requires no confirmation that the procedure has been executed
  Only one message per call is transmitted
Communication protocols for RPCs
2. The Request/Reply (RR) protocol
[Figure: For each RPC, the client sends a request message and the server answers with a reply message.]
Communication protocols for RPCs
3. The Request/Reply/Acknowledge-reply (RRA) protocol
[Figure: For the first RPC, the client sends a request message, and the server executes the procedure and sends a reply message; the next RPC follows the same pattern.]
• The client acknowledges a reply message only if it has received the replies for all the previous requests
• The server deletes information from its reply cache only after receiving an acknowledgement for it from the client
• Loss of an acknowledgement is harmless, since a later acknowledgement message confirms receipt of the replies for all earlier requests
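A sketch of the server side of RRA, assuming requests carry increasing integer identifiers so that one acknowledgement can cover all earlier replies. `RRAServer` is a hypothetical name.

```python
class RRAServer:
    """Reply cache for the Request/Reply/Acknowledge-reply protocol."""

    def __init__(self, procedure):
        self.procedure = procedure
        self.cache = {}          # request id -> reply

    def request(self, req_id: int, arg):
        if req_id not in self.cache:          # new request: execute once
            self.cache[req_id] = self.procedure(arg)
        return self.cache[req_id]             # duplicate: resend cached reply

    def acknowledge(self, req_id: int):
        # The client acknowledges a reply only after receiving all earlier
        # replies, so this ack also covers every smaller id. That is why a
        # lost earlier acknowledgement does no harm.
        for rid in [r for r in self.cache if r <= req_id]:
            del self.cache[rid]
```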
Client Server Binding
• Binding: the process by which a client becomes associated with a server so that calls can take place
  Server locating:
  1. Broadcasting:
     The message is broadcast to all nodes
     The node housing the desired server responds
     Easy to implement and suitable for small networks; expensive for large networks
  2. Binding agent:
     A name server used to bind a client to a server
     The name server maintains the binding table
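Client Server Binding
[Figure: Binding agent (name server) with client process and server process, steps 1-4: (1) the server registers itself with the binding agent; (2) the client asks the binding agent for the server's location; (3) the binding agent returns the location; (4) the client calls the server.]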
Client Server Binding
Advantages of using a binding agent:
• It can support multiple servers having the same interface type, so that any of the available servers may be used to service the client's request
• The binding agent can balance the load evenly among the servers providing the same service
• A user authorization facility can be provided for binding
Disadvantages:
• The overhead becomes large when many client processes are short-lived
• The binding agent may become a performance bottleneck
Client Server Binding
Binding time:
1. Compile-time binding: the server's network address is hard-coded
   Extremely inflexible (if the configuration changes)
2. Link-time binding: the client requests the binding agent before making a call
   The server process exports its services by registering them with the binding agent
   The client makes an import request to the binding agent for the service before making the call
   The binding agent returns the server details to the client
   The client caches them to avoid contacting the binding agent for subsequent calls
3. Call-time binding
   The client is bound to a server at the time when it calls the server for the first time during its execution
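A sketch of link-time binding with a binding agent. Round-robin selection among servers offering the same service is an assumed load-balancing policy, and the class names and addresses in the usage are made up for illustration.

```python
import itertools


class BindingAgent:
    """Name server holding the binding table: service name -> servers."""

    def __init__(self):
        self.table = {}
        self._cycles = {}

    def register(self, service: str, address: str):
        # Server side: export a service by registering it.
        self.table.setdefault(service, []).append(address)
        self._cycles[service] = itertools.cycle(self.table[service])

    def lookup(self, service: str) -> str:
        # Client side: import request; assumed round-robin load balancing.
        return next(self._cycles[service])


class Client:
    def __init__(self, agent: BindingAgent):
        self.agent = agent
        self.cache = {}          # avoids contacting the agent on later calls
        self.lookups = 0

    def server_for(self, service: str) -> str:
        if service not in self.cache:
            self.cache[service] = self.agent.lookup(service)
            self.lookups += 1
        return self.cache[service]
```

Two clients asking for the same service are spread over the two registered servers, and each client contacts the agent only once.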
Client Server Binding - Call-time Binding
[Figure: Client process, binding agent, and server process interacting in steps 1-5; subsequent calls are sent directly from the client process to the server process.]
Complicated RPCs
• Two types of complicated RPCs are:
  1. RPCs involving long-duration calls or large gaps between calls
     - Two methods are used to handle them:
       o Periodic probing of the server by the client
       o Periodic generation of an acknowledgement by the server
  2. RPCs involving arguments and/or results that are too large to fit in a single-datagram packet
     - A long RPC argument or result is fragmented and transmitted in multiple packets
Special types of RPCs
1. Callback RPC
2. Broadcast RPC
3. Batch-mode RPC
1. Callback RPC
• In normal RPC, the caller and callee processes have a client-server relationship, whereas callback RPC uses a peer-to-peer paradigm in which a node acts as both client and server
• Callback RPC is used by interactive applications that require intermediate user inputs
• During procedure execution, the server process makes a callback RPC to the client process
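[Figure: Callback RPC. The server starts procedure execution, then stops it temporarily and makes a callback to the client; the client processes the callback request and sends a reply (result of callback); the server resumes procedure execution, and the procedure execution ends.]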
[Figure: A call returns a tag immediately (Return (tag)); the client carries out other activities and polls for the result with Check for result (Tag) while the server executes the procedure. Other labels: Reply (Tag), Reply (Tag, Parameter), Reply (result), Return (result), Acknowledgement.]
  - Too small a timeout value causes timers to expire too often, resulting in unnecessary retransmissions
  - Too large a timeout value causes a needlessly long delay in the event that a message is actually lost
  - The client stub file and the XDR filters file are compiled to get a client stub object file