
Intro to communication

9/8/09

Hypercubes
• Advantages:
– Best diameter and bisection width we have seen so far.
– Practically, these are very useful bounds and multiple systems
use a hypercube network topology.

• Disadvantages:
– Number of links per node is not constant
– p = 2^d, so there are d links per node = Θ(log p)

Bit labeling
• Because the number of processors along
any given dimension is 2, we can
represent a single coordinate by a single
bit: 0 or 1

• Each processor, therefore, has an identifier of d bits.
– For example, (0,1,1) in an 8-processor hypercube

Helpful facts
• Flipping any of the d bits returns a neighbor of a
given processor.
– Each flip corresponds to a link in the network.

• Because at most d bits can be flipped, a processor has d neighbors.

• This also gives us a quick algorithm to check if any two processors are neighbors.
– Hamming distance of 1
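
To make this concrete, here is a small sketch in Python (not from the slides; the helper names are mine) that lists a processor's neighbors by bit flips and checks the Hamming-distance-1 condition:

def neighbors(proc_id, d):
    # Flip each of the d bits once: every flip follows one link,
    # so these are exactly the d neighbors in the hypercube.
    return [proc_id ^ (1 << j) for j in range(d)]

def are_neighbors(a, b):
    # Two processors are neighbors iff their ids differ in exactly
    # one bit, i.e., their Hamming distance is 1.
    return bin(a ^ b).count("1") == 1

# Processor (0,1,1) = 0b011 in an 8-processor (d = 3) hypercube:
print(neighbors(0b011, 3))           # [2, 1, 7]
print(are_neighbors(0b011, 0b111))   # True: ids differ only in the top bit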

Communication paths
• First, find the bits in which the source and destination processor ids differ.

• Route the message to a neighbor that differs in one of those bits, and continue until the message arrives.

• If the processor ids differ in k bits, there are k! possible shortest paths.
– High connectivity = good
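
A minimal routing sketch in the same spirit (Python, illustrative only): it flips the differing bits from lowest to highest, which picks one of the k! shortest paths.

def route(src, dst):
    # Flip, one at a time, each bit in which src and dst differ.
    # Each flip is one hop to a neighbor, so the path has k hops.
    path, cur, diff, j = [src], src, src ^ dst, 0
    while diff:
        if diff & 1:             # ids differ in bit j: flip it
            cur ^= (1 << j)
            path.append(cur)
        diff >>= 1
        j += 1
    return path

print(route(0b001, 0b110))       # [1, 0, 2, 6]: 3 differing bits, 3 hops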

Our model of parallel computation


• As described previously, our model will be:

– p processors connected by an interconnection network
– Each processor has its own local memory
– Accessing remote memory requires using the network

• We want processors to communicate simultaneously
– Using MPI function calls for your assignments
– Further complicated because accessing remote
memory often requires a send/receive

Details
• We will assume that only one message is being
routed per processor.

• Before a message is sent, it must be prepared: a destination attached, error correction applied, and any other preprocessing done.

• This is called setup time, and we will use τ to model this parameter.

More details
• The transmission of the message can be
performed at the rate of µ time per word.

• It follows that total communication time can be modeled as:
– τ + µm, where m is the size of the message

• To be comparable to computation, we usually measure this in floating point operations, or flops.
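
As a tiny sketch of the model (the numbers below are placeholders in flop units, not measurements from the course):

def comm_time(m, tau, mu):
    # Time to send one message of m words: setup cost tau,
    # plus mu per word, both expressed in units of one flop.
    return tau + mu * m

# With tau = 3000 and mu = 50, even a 1-word message costs
# about as much as 3050 floating point operations.
print(comm_time(1, 3000, 50))        # 3050
print(comm_time(10000, 3000, 50))    # 503000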

Supporting simultaneous
communication
• A processor can send and receive a message at
any given time step.

• It is clear that a processor cannot perform multiple sends and receives at the same time.

• Therefore, we would like communication steps such that no two destinations are the same:
– We will call this a permutation network

Example
• This permutation (2 3 1 0) implies:
– Processor 0 sends a message to 2 (0, 2)
– Processor 1 sends a message to 3 (1, 3)
– Processor 2 sends a message to 1 (2, 1)
– Processor 3 sends a message to 0 (3, 0)

• Note that any step where a processor is not involved can be represented by communication with itself, e.g., (0,0).
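
A quick check that a proposed step really is a permutation (a sketch; the function name is mine):

def is_permutation(dest):
    # dest[i] is the processor that processor i sends to; the step is
    # a permutation iff no two sources share a destination.
    return sorted(dest) == list(range(len(dest)))

print(is_permutation([2, 3, 1, 0]))  # True: the example above
print(is_permutation([2, 2, 1, 0]))  # False: processors 0 and 1 both target 2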

Multistage communication networks

• Multistage interconnection networks (details next class and in text) are topologically equivalent.

• It can be shown that these can achieve a permutation in at most 3 steps.

Take home messages


• We make the theoretical assumption that
any permutation can be supported in
parallel (with some constant overhead).

• We will use permutation networks as our interconnection network in this course.
– Simplifies algorithm design and analysis

In class example
• In small groups, what are some
permutations that can be achieved using a
ring network?

• What about a hypercube? Come up with one example for a 3D cube of 8 processors.

Often used permutations


• The permutation network, by definition,
allows any communication where no two
destinations are the same in one step

• Even so, we usually only use the following:


– Shift permutations
– Hypercubic permutations

Shift permutations
• A shift permutation is one such that each
processor communicates with its neighbor

• Left shift:
– Processor i communicates with processor (i - 1) mod p

• Right shift:
– Processor i communicates with processor (i + 1) mod p
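
In code (a sketch; the destinations follow the definitions above, the helper names are mine):

def left_shift_dest(i, p):
    # Left shift: processor i sends to (i - 1) mod p.
    return (i - 1) % p

def right_shift_dest(i, p):
    # Right shift: processor i sends to (i + 1) mod p.
    return (i + 1) % p

p = 4
print([left_shift_dest(i, p) for i in range(p)])   # [3, 0, 1, 2]
print([right_shift_dest(i, p) for i in range(p)])  # [1, 2, 3, 0]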

Hypercubic permutations
• Let d = ceiling (log p)

• Just as in a static hypercube, each processor’s id can be viewed as a string of d bits

• Fix a bit position j. The communication in which each processor pairs with the processor whose id differs only in bit j is a hypercubic permutation.

• There are d hypercubic permutations
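
A sketch of the j-th hypercubic permutation as a destination list (Python; assumes p = 2^d, and the helper name is mine):

def hypercubic_perm(j, d):
    # Processor i exchanges with the processor whose id differs only in bit j.
    p = 1 << d
    return [i ^ (1 << j) for i in range(p)]

d = 3
for j in range(d):                    # exactly d such permutations
    print(j, hypercubic_perm(j, d))
# 0 [1, 0, 3, 2, 5, 4, 7, 6]
# 1 [2, 3, 0, 1, 6, 7, 4, 5]
# 2 [4, 5, 6, 7, 0, 1, 2, 3]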

Another reason why
hypercubes are cool

• Obviously, these can be simultaneously routed on any hypercube network.

• This is better than many multistage networks, some of which we will discuss next class.

Tips for parallel algorithm analysis


• In any algorithm we design, we will
separate its computation and
communication costs.

• For example:
– Computation time = O(f(n,p))
– Communication time = O(τg(n,p) + µh(n,p))

A simple algorithm example
• Suppose we’d like to add n numbers,
where n >> p.

• For simplicity, let’s assume p evenly divides n.

• What is one algorithm for this?

An approach
• It is clear that computing the sum of n/p
integers on a single processor requires n/p
time.

• Adding the p resulting partial sums can be performed in log p steps.

• Computation time = O(n/p + log p)
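
A sequential sketch of this two-phase algorithm (Python, simulating the p processors in a loop; illustrative only, not the MPI code you would write for an assignment, and it assumes p is a power of two that divides n):

def parallel_sum(nums, p):
    n = len(nums)
    chunk = n // p
    # Phase 1: each "processor" sums its n/p local numbers.
    partial = [sum(nums[i * chunk:(i + 1) * chunk]) for i in range(p)]
    # Phase 2: combine the p partial sums in log p pairwise steps.
    step = 1
    while step < p:
        for i in range(0, p, 2 * step):   # processor i receives from i + step
            partial[i] += partial[i + step]
        step *= 2
    return partial[0]

print(parallel_sum(list(range(32)), p=8))  # 496, matches sum(range(32))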

Communication time
• Communication time = O((τ + µ) log p)

• Spoiler: Can this be done on a hypercube?
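
Plugging the model into the algorithm above gives a rough time estimate (a sketch with placeholder τ and µ, not course data):

import math

def parallel_sum_time(n, p, tau, mu):
    # n/p local additions, plus ceil(log2 p) combining steps that each
    # send one word at cost (tau + mu).
    return n / p + math.ceil(math.log2(p)) * (tau + mu)

# With tau = 3000 and mu = 50 (flop units), p = 64:
print(parallel_sum_time(10**6, 64, 3000, 50))  # ~33925 vs. ~10**6 sequentially
print(parallel_sum_time(10**4, 64, 3000, 50))  # ~18456: slower than sequential

This also previews the next slides: asymptotically the algorithm is efficient once n is large enough, but the constants τ and µ decide where the crossover actually happens.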

How do we know if this is good?


• The sequential runtime of adding n
integers is Θ(n).

• Therefore, a parallel algorithm must take at least Ω(n/p) time; ours is O(n/p + log p)

• As a result, we are efficient as long as:
– n > p log p (in class), since the log p term is then dominated by the n/p term

But why?
• Note τ and µ are constants, and we are
using O notation here, not Θ

• Why isn't the algorithm's running time simply:
– max{f(n,p), g(n,p), h(n,p)}?

Older values of communication constants (relative to one flop)
Machine          τ        µ        τ/µ
IBM BG/L         3000     50-100   15-30
Intel Delta      4650     87       54
Intel Paragon    7800     9        867
Cray T3D         27000    9        3000
IBM SP1          28000    50       560
CM5              450      3        113

An example
• Suppose our algorithm achieves ideal speedup

• Further,
– f(n,p) = g(n,p) = h(n,p) = T(n, 1) / p

• If τ, however, is 1000, our speedup would be p/1000
– This implies 1000 processors would be as fast as 1

Rules of thumb
• We would like f(n,p) to be as close to
O(T(n,1)/p) as possible

• Ideally, h(n,p) is a slower-growing function than f(n,p), and g(n,p) grows even slower

