
Chapter 1: Introduction

Ajay Kshemkalyani and Mukesh Singhal


Distributed Computing: Principles, Algorithms, and Systems

Cambridge University Press

A. Kshemkalyani and M. Singhal (Distributed Computing), Introduction, CUP 2008


Definition

Autonomous processors communicating over a communication network


Some characteristics:
- No common physical clock
- No shared memory
- Geographical separation
- Autonomy and heterogeneity


Distributed System Model

Figure 1.1: A distributed system connects processors by a communication network. Each node consists of a processor (P) with memory bank(s) (M), and the nodes are linked by a communication network (WAN/LAN).


Relation between Software Components


Figure 1.2: Interaction of the software components at each process. The distributed application sits on top of the distributed software (middleware libraries), which in turn uses the network protocol stack (application, transport, network, and data link layers) managed by the operating system. The middleware and protocol layers together determine the extent of the distributed protocols.


Motivation for Distributed Systems

- Inherently distributed computation
- Resource sharing
- Access to remote resources
- Increased performance/cost ratio
- Reliability
  - availability, integrity, fault-tolerance
- Scalability
- Modularity and incremental expandability


Parallel Systems

- Multiprocessor systems (direct access to shared memory, UMA model)
  - Interconnection network: bus, multi-stage switch
  - E.g., Omega, Butterfly, Clos, Shuffle-exchange networks
  - Interconnection generation function, routing function
- Multicomputer parallel systems (no direct access to shared memory, NUMA model)
  - Bus, ring, mesh (with/without wraparound), hypercube topologies
  - E.g., NYU Ultracomputer, CM* Connection Machine, IBM Blue Gene
- Array processors (colocated, tightly coupled, common system clock)
  - Niche market, e.g., DSP applications


UMA vs. NUMA Models

Figure 1.3: Two standard architectures for parallel systems. (a) Uniform memory access (UMA) multiprocessor system. (b) Non-uniform memory access (NUMA) multiprocessor. P denotes a processor and M a memory bank; in both architectures, the processors may locally cache data from memory.


Omega, Butterfly Interconnects

Figure 1.4: Interconnection networks for shared memory multiprocessor systems. (a) 3-stage Omega network (n = 8, M = 4). (b) 3-stage Butterfly network (n = 8, M = 4). In both, processors P0-P7 (addresses 000-111) connect through three stages of 2x2 switches to memory banks M0-M7.


Omega Network

- n processors, n memory banks
- log n stages, with n/2 switches of size 2x2 in each stage
- Interconnection function: output i of a stage is connected to input j of the next stage, where

      j = 2i            for 0 <= i <= n/2 - 1
      j = 2i + 1 - n    for n/2 <= i <= n - 1

- Routing function: in any stage s, at any switch: to route to destination j, if the (s + 1)-th MSB of j is 0 then route on the upper wire, else [the (s + 1)-th MSB of j is 1] route on the lower wire
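The interconnection and routing functions above translate directly into code; a minimal Python sketch (the function names are our own):

```python
import math

def omega_shuffle(i, n):
    """Interconnection (perfect shuffle): output i of one stage is
    wired to input j of the next stage."""
    if i <= n // 2 - 1:
        return 2 * i
    return 2 * i + 1 - n

def omega_route(dest, n):
    """Routing: at stage s, the (s+1)-th MSB of the destination decides
    the output wire: 0 -> upper, 1 -> lower."""
    stages = int(math.log2(n))
    wires = []
    for s in range(stages):
        bit = (dest >> (stages - 1 - s)) & 1
        wires.append("upper" if bit == 0 else "lower")
    return wires

print(omega_shuffle(3, 8))    # 6
print(omega_shuffle(5, 8))    # 3  (= 2*5 + 1 - 8)
print(omega_route(0b101, 8))  # ['lower', 'upper', 'lower']
```

For destination 101, the three switches route lower, upper, lower, one decision per address bit.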


Interconnection Topologies for Multiprocessors

Figure 1.5: (a) 2-D mesh with wraparound (a.k.a. torus). (b) 4-D hypercube with 16 nodes labeled 0000 through 1111. Each node is a processor + memory unit.
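In a hypercube, two nodes are adjacent exactly when their labels differ in one bit, which yields a simple dimension-order routing scheme; a sketch (function names are illustrative):

```python
def hypercube_neighbors(node, dim):
    """Neighbors in a dim-dimensional hypercube: flip each single bit."""
    return [node ^ (1 << d) for d in range(dim)]

def hypercube_route(src, dst, dim):
    """Dimension-order (e-cube) routing: correct the differing address
    bits from the lowest dimension to the highest; the path length
    equals the Hamming distance between src and dst."""
    path, cur = [src], src
    for d in range(dim):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d
            path.append(cur)
    return path

print(hypercube_neighbors(0b0000, 4))      # [1, 2, 4, 8]
print(hypercube_route(0b0000, 0b1011, 4))  # [0, 1, 3, 11]
```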


Flynn's Taxonomy
Figure 1.6: SIMD, MISD, and MIMD modes. C denotes a control unit, P a processing unit, I an instruction stream, and D a data stream.


- SISD: Single Instruction Stream, Single Data Stream (traditional)
- SIMD: Single Instruction Stream, Multiple Data Streams
  - scientific applications, applications on large arrays
  - vector processors, systolic arrays, Pentium/SSE, DSP chips
- MISD: Multiple Instruction Streams, Single Data Stream
  - E.g., visualization
- MIMD: Multiple Instruction Streams, Multiple Data Streams
  - distributed systems, vast majority of parallel systems


Terminology

- Coupling
  - Interdependency/binding among modules, whether hardware or software (e.g., OS, middleware)
- Parallelism: T(1)/T(n)
  - A function of both the program and the system
- Concurrency of a program
  - Measures productive CPU time vs. time spent waiting for synchronization operations
- Granularity of a program
  - Amount of computation vs. amount of communication
  - Fine-grained programs are suited to tightly coupled systems
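As a toy illustration of these ratio measures (all numbers are hypothetical):

```python
def parallelism(t1, tn):
    """Speedup T(1)/T(n): time on one processor over time on n processors."""
    return t1 / tn

def granularity(computation_ops, communication_ops):
    """Granularity: amount of computation per unit of communication."""
    return computation_ops / communication_ops

# Hypothetical program: 120 s sequentially, 20 s on 8 processors.
print(parallelism(120.0, 20.0))     # 6.0 (an ideal 8-processor run would give 8.0)
print(granularity(1_000_000, 500))  # 2000.0 ops per message: relatively coarse-grained
```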


Message-passing vs. Shared Memory

- Emulating MP over SM:
  - Partition the shared address space
  - Send/Receive emulated by writing to/reading from a special mailbox per pair of processes
- Emulating SM over MP:
  - Model each shared object as a process
  - A Write to a shared object is emulated by sending a message to the owner process for the object
  - A Read from a shared object is emulated by sending a query to the owner of the shared object
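The SM-over-MP emulation can be sketched with one owner per shared object that serializes all accesses; a minimal Python sketch with threads standing in for processes and queues for channels (all names are our own):

```python
import queue
import threading

def owner_process(requests):
    """Owner of one shared object: services read/write messages in order."""
    value = 0
    while True:
        op, arg, reply = requests.get()
        if op == "write":
            value = arg
            reply.put("ack")
        elif op == "read":
            reply.put(value)
        else:  # "stop"
            return

requests = queue.Queue()
threading.Thread(target=owner_process, args=(requests,), daemon=True).start()

def write(x):
    """Emulated Write: send a message to the owner, wait for its ack."""
    reply = queue.Queue()
    requests.put(("write", x, reply))
    return reply.get()

def read():
    """Emulated Read: send a query to the owner, wait for the value."""
    reply = queue.Queue()
    requests.put(("read", None, reply))
    return reply.get()

write(42)
print(read())  # 42
```

Because the owner handles one message at a time, all accesses to the object are serialized, which is what gives the emulation its consistency.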


Classification of Primitives (1)

- Synchronous (Send/Receive)
  - Handshake between sender and receiver
  - Send completes when the corresponding Receive completes
  - Receive completes when the data is copied into the receiver's buffer
- Asynchronous (Send)
  - Control returns to the process when the data has been copied out of the user-specified buffer


Classification of Primitives (2)

- Blocking (Send/Receive)
  - Control returns to the invoking process only after processing of the primitive (whether synchronous or asynchronous) completes
- Nonblocking (Send/Receive)
  - Control returns to the process immediately after invocation
  - Send: even before the data is copied out of the user buffer
  - Receive: even before the data may have arrived from the sender


Non-blocking Primitive

Send(X, destination, handle_k)    // handle_k is a return parameter
...
Wait(handle_1, handle_2, ..., handle_k, ..., handle_m)    // Wait always blocks

Figure 1.7: A nonblocking send primitive. When the Wait call returns, at least one of its parameters is posted.

- The return parameter is a system-generated handle
  - Use it later to check the status of completion of the call
  - Keep checking (in a loop, or periodically) whether the handle has been posted
  - Or issue a Wait(handle_1, handle_2, ...) call with a list of handles
  - The Wait call blocks until one of the stipulated handles is posted
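The handle-and-Wait pattern maps naturally onto futures; a sketch in Python, where a Future plays the role of the handle and concurrent.futures.wait plays the role of Wait (the network transfer is simulated):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

pool = ThreadPoolExecutor(max_workers=4)

def transmit(data, dest):
    time.sleep(0.01)  # simulated network transfer
    return f"delivered {data!r} to {dest}"

def Send(data, dest):
    """Nonblocking Send: returns a system-generated handle (a Future)
    immediately, before the data has been transmitted."""
    return pool.submit(transmit, data, dest)

h1 = Send("X", "P1")            # control returns at once
h2 = Send("Y", "P2")
done, pending = wait([h1, h2])  # Wait blocks until the handles are posted
print(h1.result())              # delivered 'X' to P1
```

With return_when=FIRST_COMPLETED, wait() instead returns as soon as at least one handle is posted, matching the Figure 1.7 semantics exactly.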


Blocking/Nonblocking; Synchronous/Asynchronous; Send/Receive Primitives
Figure 1.8: Illustration of the four send and two receive primitives: (a) blocking synchronous Send with blocking Receive, (b) nonblocking synchronous Send with nonblocking Receive, (c) blocking asynchronous Send, (d) nonblocking asynchronous Send. Legend: S and S_C mark the issuing of a Send primitive and the completion of its processing; R and R_C mark the same for Receive; P marks the completion of a previously initiated nonblocking operation; W marks a Wait issued to check completion of a nonblocking operation. The timing diagrams distinguish the duration spent copying data from or to the user buffer from the duration for which the process issuing the primitive is blocked.



Asynchronous Executions: Message-passing System

Figure 1.9: Asynchronous execution in a message-passing system. Processes P0-P3 exchange messages m1-m7; each process timeline contains internal events, send events, and receive events.


Synchronous Executions: Message-passing System


Figure 1.10: Synchronous execution in a message-passing system. Processes P0-P3 proceed in lockstep through rounds 1, 2, and 3.


In any round/step/phase: (send | internal) (receive | internal)

(1) Sync_Execution(int k, n)    // k rounds, n processes
(2)   for r = 1 to k do
(3)     process i sends a message to (i + 1) mod n and (i - 1) mod n;
(4)     each process i receives a message from (i + 1) mod n and (i - 1) mod n;
(5)     compute app-specific function on received values.
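Under stated assumptions (threads stand in for processes, a barrier marks the round boundary, and the app-specific function is taken, for illustration, to be the sum of received values), the routine above can be simulated as:

```python
import threading

def sync_execution(k, n):
    inbox = [[0, 0] for _ in range(n)]  # slots: [from left nbr, from right nbr]
    results = [[] for _ in range(n)]
    barrier = threading.Barrier(n)

    def proc(i):
        value = i
        for _ in range(k):
            inbox[(i + 1) % n][0] = value  # send to (i + 1) mod n
            inbox[(i - 1) % n][1] = value  # send to (i - 1) mod n
            barrier.wait()                 # all sends of this round are done
            value = sum(inbox[i])          # receive from both neighbours
            results[i].append(value)       # compute app-specific function
            barrier.wait()                 # all receives done before next round

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(sync_execution(1, 4))  # [[4], [2], [4], [2]]
```

The two barriers per round are what make the execution synchronous: no process starts sending for round r + 1 until every process has finished receiving in round r.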


Synchronous vs. Asynchronous Executions (1)

- Sync vs. async processors; sync vs. async primitives; sync vs. async executions
- Asynchronous execution
  - No processor synchrony; no bound on the drift rate of clocks
  - Message delays are finite but unbounded
  - No bound on the time for a step at a process
- Synchronous execution
  - Processors are synchronized; clock drift rate is bounded
  - Message delivery occurs in one logical step/round
  - Known upper bound on the time to execute a step at a process


Synchronous vs. Asynchronous Executions (2)

- Difficult to build a truly synchronous system; the abstraction can be simulated
- Virtual synchrony:
  - asynchronous execution, with processes synchronizing as per application requirements
  - execute in rounds/steps
- Emulations:
  - An async program on a sync system: trivial (A is a special case of S)
  - A sync program on an async system: a tool called a synchronizer


System Emulations

Figure 1.11: Emulations among the principal system classes: asynchronous message-passing (AMP), asynchronous shared memory (ASM), synchronous message-passing (SMP), and synchronous shared memory (SSM), related by the emulations MP->SM, SM->MP, A->S, and S->A.


- Assumption: failure-free system
- If system A can be emulated by system B:
  - a problem not solvable in B is not solvable in A
  - a problem solvable in A is solvable in B


Challenges: System Perspective (1)


- Communication mechanisms: e.g., Remote Procedure Call (RPC), remote object invocation (ROI), message-oriented vs. stream-oriented communication
- Processes: code migration, process/thread management at clients and servers, design of software and mobile agents
- Naming: easy-to-use identifiers needed to locate resources and processes transparently and scalably
- Synchronization
- Data storage and access
  - Schemes for data storage, search, and lookup should be fast and scalable across the network
  - Revisit file system design
- Consistency and replication
  - Replication for fast access, scalability, and avoiding bottlenecks
  - Requires consistency management among replicas


Challenges: System Perspective (2)

- Fault-tolerance: correct and efficient operation despite link, node, and process failures
- Distributed systems security
  - Secure channels, access control, key management (key generation and key distribution), authorization, secure group management
- Scalability and modularity of algorithms, data, and services
- Some experimental systems: Globe, Globus, Grid


Challenges: System Perspective (3)

- API for communications and services: ease of use
- Transparency: hiding implementation policies from the user
  - Access: hide differences in data representation across systems; provide uniform operations to access resources
  - Location: the locations of resources are transparent
  - Migration: relocate resources without renaming them
  - Relocation: relocate resources even as they are being accessed
  - Replication: hide replication from the users
  - Concurrency: mask the use of shared resources
  - Failure: reliable and fault-tolerant operation


Challenges: Algorithm/Design (1)

- Useful execution models and frameworks, to reason with and design correct distributed programs:
  - Interleaving model
  - Partial order model
  - Input/Output automata
  - Temporal Logic of Actions
- Dynamic distributed graph algorithms and routing algorithms
  - System topology: a distributed graph, with only local neighborhood knowledge
  - Graph algorithms: building blocks for group communication, data dissemination, object location
  - Algorithms need to deal with dynamically changing graphs
  - Algorithm efficiency also impacts resource consumption, latency, traffic, and congestion


Challenges: Algorithm/Design (2)

- Time and global state
  - 3-D space, 1-D time
  - Physical time (clock) accuracy
  - Logical time captures inter-process dependencies and tracks relative time progression
  - Global state observation: the inherently distributed nature of the system
  - Concurrency measures: concurrency depends on program logic, execution speeds within logical threads, and communication speeds


Challenges: Algorithm/Design (3)

- Synchronization/coordination mechanisms
  - Physical clock synchronization: hardware drift needs correction
  - Leader election: select a distinguished process, despite inherent symmetry
  - Mutual exclusion: coordinate access to critical resources
  - Distributed deadlock detection and resolution: need to observe global state; avoid duplicate detection and unnecessary aborts
  - Termination detection: a global state of quiescence; no CPU processing and no in-transit messages
  - Garbage collection: reclaim objects no longer pointed to by any process


Challenges: Algorithm/Design (4)

- Group communication, multicast, and ordered message delivery
  - Group: processes sharing a context, collaborating
  - Multiple joins, leaves, failures
  - Concurrent sends: semantics of delivery order
- Monitoring distributed events and predicates
  - Predicate: a condition on the global system state
  - Used in debugging, environmental sensing, industrial process control, and analyzing event streams
- Distributed program design and verification tools
  - Debugging distributed programs


Challenges: Algorithm/Design (5)

- Data replication, consistency models, and caching
  - Fast, scalable access
  - Coordinate replica updates
  - Optimize replica placement
- World Wide Web design: caching, searching, scheduling
  - A global-scale distributed system; end-users
  - Read-intensive; prefetching over caching
  - Object search and navigation are resource-intensive
  - User-perceived latency


Challenges: Algorithm/Design (6)

- Distributed shared memory abstraction
  - Wait-free algorithm design: a process completes its execution irrespective of the actions of other processes, i.e., (n - 1)-fault resilience
  - Mutual exclusion
    - Bakery algorithm, semaphores, algorithms based on atomic hardware primitives, fast algorithms for contention-free access
  - Register constructions
    - Revisit assumptions about memory access
    - What is the behavior under concurrent, unrestricted access to memory?
    - A foundation for future architectures, decoupled from technology (semiconductor, biocomputing, quantum, ...)
  - Consistency models:
    - coherence versus access cost trade-off
    - weaker models than the strict consistency of uniprocessors


Challenges: Algorithm/Design (7)


- Reliable and fault-tolerant distributed systems
  - Consensus algorithms: processes reach agreement in spite of faults (under various fault models)
  - Replication and replica management
  - Voting and quorum systems
  - Distributed databases and commit: ACID properties
  - Self-stabilizing systems: an illegal system state eventually changes back to a legal state; requires built-in redundancy
  - Checkpointing and recovery algorithms: roll back and restart from an earlier saved state
  - Failure detectors:
    - It is difficult to distinguish a slow process/message from a failed process/never-sent message
    - Algorithms that suspect a process as having failed and converge on a determination of its up/down status


Challenges: Algorithm/Design (8)

- Load balancing: to reduce latency and increase throughput, dynamically; e.g., server farms
  - Computation migration: relocate processes to redistribute workload
  - Data migration: move data, based on access patterns
  - Distributed scheduling: across processors
- Real-time scheduling: difficult without a global view; network delays make the task harder
- Performance modeling and analysis: network latency to access resources must be reduced
  - Metrics: theoretical measures for algorithms, practical measures for systems
  - Measurement methodologies and tools


Applications and Emerging Challenges (1)

- Mobile systems
  - Wireless communication: unit disk model; broadcast medium (MAC); power management, etc.
  - CS perspective: routing, location management, channel allocation, localization and position estimation, mobility management
  - Base station model (cellular model)
  - Ad-hoc network model (rich in distributed graph theory problems)
- Sensor networks: processors with an electro-mechanical interface
- Ubiquitous or pervasive computing
  - Processors embedded in and seamlessly pervading the environment
  - Wireless sensor and actuator mechanisms; self-organizing; network-centric; resource-constrained
  - E.g., intelligent home, smart workplace


Applications and Emerging Challenges (2)

- Peer-to-peer computing
  - No hierarchy; symmetric roles; self-organizing; efficient object storage and lookup; scalable; dynamic reconfiguration
- Publish/subscribe, content distribution
  - Filtering information to extract that of interest
- Distributed agents
  - Processes that move and cooperate to perform specific tasks; coordination, controlling mobility, software design and interfaces
- Distributed data mining
  - Extract patterns/trends of interest
  - Data not available in a single repository


Applications and Emerging Challenges (3)

- Grid computing
  - A grid of shared computing resources; use idle CPU cycles
  - Issues: scheduling, QoS guarantees, security of machines and jobs
- Security
  - Confidentiality, authentication, availability in a distributed setting
  - Managing wireless, peer-to-peer, and grid environments
    - Issues: e.g., lack of trust, broadcast media, resource constraints, lack of structure
