
CS 498 : Theory of Parallel Computing

Marc Snir

Lecture 13 Oct 7th

1
Switching Networks, Permutations and FFT Graphs

• A switching network (SN) is a DAG with N inputs and N outputs. Each
node has indegree = outdegree = d. A node is a switch that can connect
each input to each output (in an arbitrary 1-to-1 connection).

• The figure shows an SN with 2x2 switches; each switch has 2 possible
states (straight and crossed).

[Figure: a 2x2 switch in its straight and crossed states; a switching network with inputs 0-3 and outputs 0-3]

• The SN is configured to connect
0 → 1, 1 → 2, 2 → 3, 3 → 0.
• An SN is a permutation network if it can
realize any permutation of its inputs.
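
For very small networks, the definition of a permutation network can be checked by brute force: try every switch setting and collect the permutations that result. The sketch below is an illustration only. The helper names `apply_column` and `realizable`, and the two 4-input wirings, are assumptions made here and are not taken from the figure.

```python
from itertools import product

def apply_column(wires, states):
    """Apply one column of 2x2 switches; states[k] == 1 crosses wires 2k, 2k+1."""
    out = list(wires)
    for k, crossed in enumerate(states):
        if crossed:
            out[2 * k], out[2 * k + 1] = out[2 * k + 1], out[2 * k]
    return out

def realizable(n, columns, links):
    """Set of arrangements reachable over all switch settings.

    links[c][q] is the position in column c's output that feeds
    position q of column c + 1 (hypothetical wiring description).
    """
    per_col = n // 2
    perms = set()
    for setting in product([0, 1], repeat=columns * per_col):
        wires = list(range(n))            # wires[p] = input currently at position p
        for c in range(columns):
            wires = apply_column(wires, setting[c * per_col:(c + 1) * per_col])
            if c < columns - 1:
                wires = [wires[j] for j in links[c]]
        perms.add(tuple(wires))
    return perms

# Three columns, upper outputs to the top middle switch, lower to the bottom one.
three_cols = realizable(4, 3, [[0, 2, 1, 3], [0, 2, 1, 3]])
# Two columns only: 4 switches, hence at most 16 distinct settings.
two_cols = realizable(4, 2, [[0, 2, 1, 3]])
```

With these assumed wirings, the three-column network reaches all 4! = 24 permutations, while the two-column network has only 16 switch settings and therefore cannot be a permutation network.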

2
Benes Network

• A network with N inputs/outputs (N a power of 2) is built
recursively from two columns of 2x2 switches and two N/2 x N/2 Benes
networks.
• Each switch connects to the top and bottom subnetworks.

[Figure: input and output columns of 2x2 switches around two N/2 x N/2 Benes networks (top and bottom)]
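
The recursive construction fixes the size of the network. The following sketch unrolls the recursion and checks the closed forms 2 lg N − 1 columns and (N/2)(2 lg N − 1) switches. The function name `benes_size` and the base case (a 2-input network is a single 2x2 switch) are assumptions made for this illustration.

```python
def benes_size(n_inputs):
    """Return (columns, switches) of a Benes network built by the recursion."""
    if n_inputs == 2:              # assumed base case: one 2x2 switch
        return 1, 1
    cols, switches = benes_size(n_inputs // 2)
    # Two extra columns of N/2 switches wrap two half-size subnetworks.
    return cols + 2, 2 * switches + n_inputs

for k in range(1, 6):
    n = 2 ** k
    cols, switches = benes_size(n)
    assert cols == 2 * k - 1
    assert switches == (n // 2) * (2 * k - 1)
```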

3
Benes Networks Permute

• Construct a bipartite graph: the nodes are the left and right rows of
switches. Two nodes u, v are connected by an edge if an input at switch u
has to reach an output at switch v.
• Color the edges with two colors so that the two edges at each node get
different colors. This is always possible (every node has degree 2, and all
cycles have even length).
• Route connections of one color through the top half subnetwork and
connections of the other color through the bottom half subnetwork.
• Each switch then routes one connection to the top subnetwork and one
connection to the bottom subnetwork.
• Use the same approach recursively to set up each subnetwork.
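
The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the lecture's own implementation: input switch k is assumed to hold inputs 2k and 2k+1, output switch k to hold outputs 2k and 2k+1, and switch k of either column to attach to position k of both subnetworks. The names `two_color`, `benes_route` and `simulate` are hypothetical.

```python
from itertools import permutations

def two_color(perm):
    """Color each input 0 (top) or 1 (bottom) so that the two inputs of every
    input switch, and the two inputs bound for every output switch, differ."""
    n = len(perm)
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    color = [None] * n
    for start in range(n):
        i, c = start, 0
        while color[i] is None:          # walk one even cycle of the graph
            color[i] = c
            partner = inv[perm[i] ^ 1]   # input sharing i's output switch
            color[partner] = 1 - c
            i = partner ^ 1              # input sharing partner's input switch
    return color, inv

def benes_route(perm):
    """Return switch settings (0 = straight, 1 = crossed) realizing perm."""
    n = len(perm)
    if n == 2:
        return {"switch": int(perm[0] == 1)}
    color, inv = two_color(perm)
    top, bottom = [0] * (n // 2), [0] * (n // 2)
    for i, o in enumerate(perm):
        (bottom if color[i] else top)[i // 2] = o // 2
    return {"in": [color[2 * k] for k in range(n // 2)],
            "out": [color[inv[2 * k]] for k in range(n // 2)],
            "top": benes_route(top), "bottom": benes_route(bottom)}

def simulate(cfg, n):
    """Return the permutation realized by a configured network."""
    if n == 2:
        return [1, 0] if cfg["switch"] else [0, 1]
    top = simulate(cfg["top"], n // 2)
    bottom = simulate(cfg["bottom"], n // 2)
    out = [None] * n
    for i in range(n):
        k = i // 2
        goes_bottom = (i % 2) ^ cfg["in"][k]     # straight: upper input goes top
        j = (bottom if goes_bottom else top)[k]  # output switch that is reached
        out[i] = 2 * j + (goes_bottom ^ cfg["out"][j])
    return out

# Check every permutation of 4 inputs, then one permutation of 8 inputs.
for p in permutations(range(4)):
    assert simulate(benes_route(list(p)), 4) == list(p)
perm = [4, 2, 0, 7, 5, 3, 1, 6]
assert simulate(benes_route(perm), 8) == perm
```

The coloring walks each cycle of the bipartite graph once, alternating colors, which is why an odd cycle would be the only obstacle and why none can occur.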

4
Example
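• Permute: 0->4, 1->2, 2->0, 3->7, 4->5, 5->3, 6->1, 7->6
• Coloring: the same connections, each drawn in one of two colors: 0->4, 1->2, 2->0, 3->7, 4->5, 5->3, 6->1, 7->6

[Figure: two panels, "Coloring" (bipartite graph on switches 01, 23, 45, 67) and "Network Setup" (8-input Benes network with inputs and outputs 0-7)]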

5
FFT Network

The network with the topology of an FFT computation: $N = 2^n$
inputs and outputs, $n = \lg N$ stages, N/2 nodes at each stage.
Nodes at stage i are connected to inputs that differ in bit i of their
binary representation.

[Figure: FFT network for N = 8, lines labeled 000 through 111]
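
The connection rule can be written down directly. In the sketch below, the name `fft_stage_pairs` is hypothetical, and bits are assumed to be numbered from 0 at the least significant position, which the slide does not specify.

```python
def fft_stage_pairs(n, i):
    """Pairs of line indices joined by one node at stage i of a 2**n-line FFT."""
    bit = 1 << i
    return [(x, x | bit) for x in range(1 << n) if not x & bit]

n = 3
for i in range(n):
    pairs = fft_stage_pairs(n, i)
    assert len(pairs) == (1 << n) // 2                    # N/2 nodes per stage
    assert sorted(x for p in pairs for x in p) == list(range(1 << n))
```

Each stage touches every line exactly once, and the two lines of a pair differ only in bit i.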

6
Homework 1

• Prove: A Benes network consists of a reverse FFT network
followed by an FFT network (the middle stage is common).

7
Homework 2

• Prove: an FFT SN, with all switches straight, performs the bit
reversal permutation ($a_n \ldots a_1 \to a_1 \ldots a_n$).

[Figure: FFT network for N = 8, lines labeled 000 through 111]
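
For reference, the bit reversal permutation itself is easy to state in code. The sketch below only spells out the notation $a_n \ldots a_1 \to a_1 \ldots a_n$; it does not simulate the network or prove the claim. The name `bit_reverse` is chosen here.

```python
def bit_reverse(x, n):
    """Reverse the n-bit binary representation of x."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (x & 1)   # move the lowest bit of x to the low end of r
        x >>= 1
    return r

print([format(bit_reverse(x, 3), "03b") for x in range(8)])
# → ['000', '100', '010', '110', '001', '101', '011', '111']
```

Applying the permutation twice gives back the original address.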

8
Homework 3

• Prove: A reverse FFT graph is isomorphic to a regular FFT
graph; the isomorphism maps output $a_1 \ldots a_n$ of the reverse
FFT to input $a_n \ldots a_1$ of the regular FFT graph.

9
Conclusion

• An SN that consists of the concatenation of 3 FFT graphs is a
permutation network.
• The middle FFT graph is set with all switches straight. The
combination is isomorphic to a Benes network (if we glue the
outputs of the first FFT to the inputs of the third FFT that they are
connected to).
• An algorithm that evaluates the FFT graph takes at least
$\Omega\left(\frac{N \lg N}{P \max(B \lg M,\ \lg(N/P))}\right)$ steps.
• The algorithms can be modified to evaluate (pebble) 3 FFT
graphs in a row. The same communication steps can be used
to emulate a Benes network and, hence, to perform any
permutation.

10
Transpose

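• Specific, hard permutation: transpose of a $\sqrt{N} \times \sqrt{N}$ matrix.
• Assume $N = 2^n$, n even. Transpose is
$X_{a_n \ldots a_1} \leftrightarrow X_{a_{n/2} \ldots a_1 a_n \ldots a_{n/2+1}}$
(rotate binary address).
• Algorithm, for $B^2 < M$: processors read $B \times B$
submatrices, transpose them in memory and store them back.
• $T = N/(PB)$

[Figure: $\sqrt{N} \times \sqrt{N}$ matrix partitioned into $B \times B$ submatrices]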
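
Both the address rotation and the tiled algorithm can be sketched on a flat, row-major array. This is an illustration under assumptions made here: a single processor, B dividing the matrix side, and one "block" meaning B consecutive words of a row. The names `transpose_address` and `blocked_transpose` are hypothetical.

```python
def transpose_address(a, n):
    """Rotate the n-bit address a by n/2 bits (n even)."""
    half = n // 2
    mask = (1 << half) - 1
    return ((a & mask) << half) | (a >> half)

def blocked_transpose(x, side, B):
    """Transpose a side x side matrix tile by tile; also count block I/Os."""
    assert side % B == 0
    out = [None] * len(x)
    io = 0
    for R in range(0, side, B):
        for C in range(0, side, B):
            # Read the B x B tile: one block of B words per tile row.
            tile = [x[(R + r) * side + C:(R + r) * side + C + B] for r in range(B)]
            io += B
            # Transpose in memory and write one block per tile column.
            for c in range(B):
                start = (C + c) * side + R
                out[start:start + B] = [tile[r][c] for r in range(B)]
            io += B
    return out, io

side, B, n = 8, 2, 6                      # N = 64 = 2**6 words
x = list(range(side * side))
out, io = blocked_transpose(x, side, B)
assert all(out[transpose_address(a, n)] == x[a] for a in range(len(x)))
assert io == 2 * len(x) // B              # N/B reads plus N/B writes
```

The count of 2N/B block transfers matches T = N/(PB) up to the constant factor and the division of work among P processors.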

11
Cont.

• $B^2 > M$. Assume $M = 2^m$, $B = 2^b$.
• Basic operation: read $2^{m-b}$ aligned blocks of $2^b$ words,
permute in cache, write back. Can permute
$X_{a_n \ldots a_1} \leftrightarrow X_{a_n \ldots a_{n/2+m-b+1}\; a_{m-b} \ldots a_1\; a_{n/2} \ldots a_{m-b+1}\; a_{n/2+m-b} \ldots a_{n/2+1}}$
(permute $M/B \times B$ submatrices) in one pass, in time $N/(BP)$.
• The problem is essentially solved when each block (of size B)
contains the right set of elements; the blocks can then be moved to the
right place in one pass.
• Can complete the transposition in $b/(2(m-b))$ passes:
$T = O\left(\frac{N}{PB} \cdot \frac{\lg B}{\lg(M/B)}\right)$

12
Lower Bound

• Need to “gather” words from distinct lines into each line. Show
that this cannot be done too fast.
• “Step”: one I/O operation done by one processor.
• $x^t_{i,j}$ is the number of words in line i that have to go to line j at
the end of step t ($0 < i, j \le N/B$).
• $y^t_{i,j}$ is the number of words in cache i that have to go to line j
at the end of step t ($0 < i \le P$, $0 < j \le N/B$).
• $\Phi^t = \sum_{i,j} x^t_{i,j} \lg x^t_{i,j} + \sum_{i,j} y^t_{i,j} \lg y^t_{i,j}$ (entropy-like function)
• Initially, $x^0_{i,j} = 0$ or $x^0_{i,j} = 1$, and $y^0_{i,j} = 0$, so $\Phi^0 = 0$.
• Finally, $x^T_{i,i} = B$, $x^T_{i,j} = 0$ if $i \ne j$, and $y^T_{i,j} = 0$, so
$\Phi^T = (N/B) \cdot (B \lg B) = N \lg B$.
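
The two endpoint values of the potential can be checked numerically on a small transposition. The sketch assumes empty caches (so only the x terms contribute), a row-major layout, B dividing the matrix side, and the convention 0 lg 0 = 0. The names `potential` and `transpose_lines` are hypothetical.

```python
from math import log2
from collections import Counter

def potential(lines):
    """Phi with empty caches: lines[i] lists, for each word stored in line i,
    the index of the line that word must end up in."""
    total = 0.0
    for words in lines:
        for count in Counter(words).values():   # count is x_{i,j} > 0
            total += count * log2(count)
    return total

def transpose_lines(side, B):
    """Initial configuration for transposing a side x side row-major matrix."""
    def dest(a):                                 # transposed address of word a
        return (a % side) * side + a // side
    return [[dest(a) // B for a in range(i * B, (i + 1) * B)]
            for i in range(side * side // B)]

side, B = 8, 4
N = side * side
start = transpose_lines(side, B)
final = [[j] * B for j in range(N // B)]         # every word already in place
assert potential(start) == 0.0                   # each x is 0 or 1
assert potential(final) == N * log2(B)           # (N/B) * (B lg B)
```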

13
I/O
• Easy to check that a write does not increase Φ.
• Read by processor i from line k at step t + 1: let $y_j = y^t_{i,j}$ and
$x_j = x^t_{k,j}$. Then $\sum_j y_j = M - B$ and $\sum_j x_j = B$. The change in potential
is
$\nabla\Phi = \sum_j \left( (y_j + x_j) \lg(y_j + x_j) - y_j \lg y_j - x_j \lg x_j \right)$
• $\nabla\Phi$ is maximized when $x_1 = \cdots = x_{N/B} = B/(N/B)$ and
$y_1 = \cdots = y_{N/B} = (M - B)/(N/B)$. Thus
$\nabla\Phi \le \frac{N}{B} \left( \frac{M}{N/B} \lg \frac{M}{N/B} - \frac{M-B}{N/B} \lg \frac{M-B}{N/B} - \frac{B}{N/B} \lg \frac{B}{N/B} \right) = M\,H(B/M)$
• Here $H(\alpha) = -\alpha \lg \alpha - (1-\alpha) \lg(1-\alpha)$. For $\alpha < 0.5$,
$H(\alpha) = O(-\alpha \lg \alpha)$, so that $\nabla\Phi = O(B \lg(M/B))$. It follows
that the number of I/O steps is $\Omega((N \lg B)/(B \lg(M/B)))$ and
$T = \Omega\left(\frac{N}{PB} \cdot \frac{\lg B}{\lg(M/B)}\right)$
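
The last two steps can be written out as follows. This is a sketch of the intermediate algebra, assuming $M \ge 2B$ (so that $\alpha = B/M \le 0.5$) and that each of the P processors performs at most one I/O operation per time unit; it uses $\ln(1+z) \le z$.

```latex
\begin{align*}
M\,H(B/M) &= B \lg\frac{M}{B} + (M-B)\lg\Bigl(1 + \frac{B}{M-B}\Bigr)
           \le B \lg\frac{M}{B} + B \lg e
           = O\bigl(B \lg (M/B)\bigr), \\
\#\text{I/O steps} &\ge \frac{\Phi^T - \Phi^0}{\max \nabla\Phi}
           = \frac{N \lg B}{O\bigl(B \lg (M/B)\bigr)}
           = \Omega\!\left(\frac{N \lg B}{B \lg (M/B)}\right), \\
T &\ge \frac{\#\text{I/O steps}}{P}
           = \Omega\!\left(\frac{N}{PB}\cdot\frac{\lg B}{\lg (M/B)}\right).
\end{align*}
```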
14