Beruflich Dokumente
Kultur Dokumente
Optimal Communication
D. P. BERTSEKAS,
C. OZVEREN,G. D. STAMOULIS,
P. TSENG,ANDJ.N. TSITSIKLISt
Laboratoryfor Information and Decision Systems,Room 35-210, MassachusettsInstitute o/Technology, Cambridge,Massachusetts02139
263
0743-7315/91
$3.00
Copyright@ \99\ by Academic Press.Inc.
All rights of reproduction iin any form reserved.
7101
~
d22d-'
264
BERTSEKAS ET AL.
110
111
ro;;
001
000
1010//~""
1110
0110
0010~
-~1oij:::7I'1"
'-- ~1 V
0000
I~
~.
~~~~~~~~~~~
~1
IXXI 1100
0101
1101
1XX11
1001
FIG. 1. Construction of a 3-cubeand a 4-cube by connecting the correspondingnodesof two identicallower-dimensional cubes.A node belongs
to the first lower-dimensional cube or the seconddepending on whetherits
identity has a leading0 or a leading I.
Numberof
transmissions
r~l
d2d-
r~l
2d(2d_l)
for transmissionon
265
i=
266
BERTSEKAS ET AL
= {(OO- --O)} and Sj C {(OO- --O)} U (U~-:'\Ek). Furthermore, everynonzero node identity must belongto some
Ej. The set of all nodes together with the set of links
(Uf= I Aj) must form a subgraphthat contains a spanningtree
(see Fig. 2a); in fact, to minimize the number of packet
transmissions, the sets of links AI, A2, ..., Aq should be
2. MUL TINODE BROADCAST UNDER
disjoint and should form a spanningtree.
THE MLA ASSUMPTION
Consider now a d-bit string t representingthe identity
number of some node on the d-cube. For any node identity
We first note that in a multinode broadcast each node z, we denote by t e z the d-bit string obtained by performing
must receive a total of 2 d -1 packets over its d incident
modulo 2 addition of the jth bit of t and z for eachj = 1,2,
links, so r (2 d -1)/ dl is a lower bound for the time required
..., d. It can be seenti\at an algorithm for broadcastinga
by any multinode broadcastalgorithm under the MLA as- packet from the node with identity t can be specified by the
sumption. We obtain an algorithm that attains this lower sets
bound.
As a first steptoward constructing such an algorithm, we
Aj(t) = {(t0x,t0y)l(x,y)EAj},
,2,
,q,
represent any single node broadcast algorithm from node
(00. ..0) to all other nodes that takes q time units by a
sequenceof setsof directed links A I , Az, ..., Aq. Each Ai is where Ai(t) denotes the set of links on which transmission
the set of links on which transmission of the packet begins of the packet begins at time i-I
and ends at time i. The
at time i-I
and ends at time i. Naturally, the setsAi must proof of this is based on the fact that t e x and t e y differ
satisfy certain consistency requirements for accomplishing in a particular bit if and only if x and y differ in the same
the single node broadcast. In particular, if Si and Ei are the bit, so (t e x, t e y) is a link if and only if(x, y) is a link.
sets of identity numbers of the start nodes and end Figure 2 illustratesthe setsAi(t) correspondingto all possible
nodes, respectively, of the links in Ai, we must have S. t for the casewhere d = 3.
that the emergingstandard for high-speedcommunications,
the Asynchronous Transfer Mode (see,e.g., [13]), is based
on fixed packetlengthsas well as minimal packet processing
at the nodes, which favors neither splitting nor combining
packets.
A,
A2
AJ
(at
A,IOO1!AVOOll A3(OOll
?-.~~ ~O
001
101
001
011
010
110
100
1OO~l~:~~~~~~.
~ 1
'"~::~~~; v
'-~'~~~~~~~--~
~~l
111
010
001
100
000
010
110
010
(XX)
(b)
FIG. 2. Generation of a multinode broadcast algorithm for the d-cube, starting from a single node broadcastalgorithm. (a) The algorithm tha~
broadcastsa packet from the node with identity (00. ..0) to all other nodes is specified by a sequenceof setsof directed links AI , A2, ., ., Aq. Each Aj
is the set of links on which transmissionbegins at time i-I and ends at time i. (b) A corresponding broadcastalgorithm for eachroot node identity t is
specifiedby the setsof links Aj(t) = {(I e x, t e y) I (x, y) EAj}, where we denote by t e z the d-bit string obtained by performing modulo 2 addition
of the jth bit of t and z for j = 1, 2, ..., d. The multinode broadcastalgorithm is specifiedby the requirement that transmissionof the packet of node t
starts on each link in A;(t) at time i-I.
The figure showsthe construction for an examplewhere d = 3. Here the setA2 has two links of the same type
and the multinode broadcastcannot beexecutedin threetime units. However, if the link (000, 0 10)belongedto AI instead of A2, the requiredtime would
bethe optimal three time units.
...Rkl
...Rknk
.R(d-2)
I ...R(d-2)nd-2R(d-l)
I(
...1)
+ [(n(t)
-l)(mod
d)].
~
0)1.
(OO...O)R11R21
R11R21R22o
0 OR(d-l)1
267
is 1,2, ..., d, 1,2, ..., d, 1,2, ...(cf. Fig. 3 for the case
d = 4). We specifythe order of node identities within each
setRknas follows: the first element t in eachsetRknis chosen
so that the relation
the bit in position m(t) from the right is a
1)
n(t) ~ id},
define
EO={(OO""_,Eq = {t I (q -l)d
~ n(t) ~ 2d -
268
BERTSEKAS ET AL.
101
001
B
(al
No
r-"-"
N,
~"
N2
.,,'
N3
,~
N.
(KX)OI ((XXI1) (00101 (01001 (1(XX11(0011) (0110) (11001 (10011 (0101) (1010) (11011 (10111 (01111 (1110) (1111)
',
.'
R"
m((XXI1}=1
,
m(1001)=4
(~)..
m(OOIII=1
I m(101It=4
--0
.'
R31
m(IIIII=3
,0
m(OI0It=1
m(II00I-3
(::=~
A2
m(Olllt=1
-:""~
~::~~
m(1~
"
R22
m(11011=J
~
m(OI10)a2
A,
'
R21
m(10101s2
m(1110)=2
-~
A.
AJ
R"
N2
(00011) (00110) (011001 (11000j (10001) (001011 (010101 (101001 (010011 (10010)
R22
R21
N3
,
(00111) (011101 (111001 (110011 (10011)
R.,
~1
10001
01$X)1
11$X>1
01)01
11101
00911
1~10
1~11
11910
11911
~
~o
1<XXXJ
~o.-~~~
01100
11000
o-
01010
10100
01110
01011
11100
10110
11
01111
<>
11110
11111
(cl
FIG. 3.
~~~==~-6~~~~:
COMMUNICATION
269
With this rule, s startstransmitting its last packetto the subtree T j no later than time Nj -I, where Nj denotesthe number of nodes in Tj, and all nodes in Tj receive their packet
no later than time Nj. (To seethe latter, note that all packets
destined for the nodes in T j that are k links away from s are
sent no later than time Nj -k, and each of these packets
completesits journey in exactly k time units.) Therefore, all
packets are received at their respective destinations in
max {Nt, N2,
N,} time units. Hence, the above algorithm attains the optimal time if and only if T has the property that s has d neighbors in T and that eachsubtree T j, i
= I,..., d, contains at mostr(2d -I )/dl nodes.If Tis in
addition a shortestpath tree from s, then eachpackettravels
along the shortestpath to its destination and this algorithm
also attains the optimal number of packettransmissions.
3. SINGLE NODE SCAlTER UNDER THE
We assumewithout loss of generality that s = (00. ..0)
MLA ASSUMPTION
in what follows. To construct a spanning tree T with the
Considerthe d -cubeand the problem of singlenode scatter above two properties, let us considerthe equivalenceclasses
with root node s. Since 2d -I different packets must be Rknintroduced in Section2 in connectionwith the multinode
transmitted by the root node over its d incident links, any broadcastproblem. As in Section 2, we order the classesas
algorithm solving theseproblems requires at leastr(2d -1)/
dl time units under the MLA assumption.This time can be (00
RZn2 ...Rkl
...Rknk
O)R11R21'
achievedby modifying the correspondingoptimal multinode
.R(d-2)nd-2R(d-I)I(II..
.1
broadcastalgorithm of the previoussection,therebyjustifying
...R(d-2)
the entries of Table 1 for single node scatter.
The modified multinode broadcastalgorithm is not, how- and we considerthe numbersn (t) and m (t) for eachidentity
ever, optimal for the scatter problem with respectto the t, but for the moment, we leavethe choice of the first element
number of packet transmissions. To see this, note that a in eachclassRknunspecified.We denote by mknthe number
packetdestined for somenode must travel a number of links m( t) of the first elementt of Rknand we note that this number
at least equal to the Hamming distance betweenthat node dependsonly on Rknand not on the choice of the first element
and the root. Therefore, a lower bound for the optimal num- within Rkn.
We say that classR(k-l)n' is compatible with classRkn if
ber of packet transmissions is the sum of the Hamming
distances of all nodes to the root. There are (t) = d!/ R(k-l)n' has d elements (node identities) and there exist
(k!(d -k)!) nodes that are at distance k from the root, so identities t' E R(k-l)n' and t E Rkn such that t' is obtained
from t by changing some unity bit of t to a O. Since the
this bound is
elementsof R(k-l)n' and Rknare obtained by left shifting the
bits of t' and t, respectively,it is seenthat for every element
(2) x' of R(k-l)n' there is an element x of Rkn such that x' is
obtained from x by changingone of its unity bits to a O.The
reverseis also true, namely that for every element x of Rkn
The lower bound ofEq. (2) is much smaller than the 2d(2d
there is an elementx' of R(k-l)n' suchthat x is obtained from
-I) packettransmissionsrequiredby a multinode broadcast.
x' by changingone of its zero bits to unity.
While it is possibleto extract from the optimal multinode
An important fact for the subsequentspanningtree conbroadcast algorithm a single node scatter algorithm which
struction is that for everyclassRknwith 2 ~ k ~ d -1, there
attains the lower bound of Eq. (2), such an algorithm is
existsa compatible classR(k-l )n'. Sucha classcan beobtained
quite complex to visualize and to implement. The following
as follows: Take any identity t E Rknwhose rightmost bit is
alternative algorithm is much simpler.
a 1 and leftmost bit is a O. Let (1be a string of consecutive
For any spanningtree Tofthe d-cube, let,be the number
Oswith maximal number of bits and let t' be the identity
of neighbor nodes of the root node s in T, and let T i be the
obtained from t by changingto 0 the unity bit immediately
subtree of T rooted at the ith neighbor of s. Consider the to the right of (1. [For example, if t = (0010011), then t'
following rule for s to send packetsto eachsubtree T i:
= (00 10001)or t' = (0000011), and if t = (0010001), then
t'
= (00 10000).]Then the equivalenceclassoft' is compatible
Continuously send packetsto distinct nodesin the subwith Rkn,becauseit has delements [t' * (00- --0) and t'
tree (using only links in T), giving priority to nodes
contains a unique substringof consecutiveOswith maximal
furthest away from s (break ties arbitrarily).
270
ET AL.
(3)
Since eachnode has d incident links, at most d2d transmissions may take place at each time unit. Therefore, if T d is
the execution time of a total exchangealgorithm in the dcube, we have
Td~ 2d-l.
(4)
The construction of T is such that each node x
* (00' ..0) is in the subtree Tm(x).Since there are at most For an algorithm to achievethis lower bound, it is necessary
r(2d -1)/ dl nodesx having the samevalue of m(x), each that packets follow shortestpaths and that all links are busy
subtreecontains at most r ( 2 d -1) / dl nodes.Furthermore, (in both directions) during all of the 2d-1 time units. In what
the number of links on the path of T connecting any node follows, we present an algorithm for which Td = 2d-l. In
and (00. ..0) is the corresponding Hamming distance. light of the above, this algorithm is optimal with respectto
Hence, T is also a shortest path tree from (00. ..0), as both the time and the number of packettransmissionscriteria
desired.
and achieves 100%link utilization.
Note that for the spanning tree constructed above, each
We construct the algorithm recursively. We assumethat
of the subtreesTj, contains eitherl(2d -1)/ dJorr(2d -1)/
we have an optimal algorithm for total exchangein the dcube with certain properties to be statedshortly, and we use
this algorithm to perform an optimal total exchangein the
001
01I
III
mIll-!
(d + I )-cube. The construction is as follows: we decompose
QO
~~~~~IO
110
the
(d + I )-cube into two d-cubes, denoted C1 and C2 (cf.
000
m('1-2
the construction of Fig. I). Without loss of generality we
100
101
assumethat C1contains nodes0,. .., 2d -I, and that their
m(II-3
counterparts in C2are nodes 2d, ..., 2d+1-I, respectively.
The total exchangealgorithm for the (d + I )-cube consists
of three phases.In the first phase,there is a total exchange
m(tl-1
(using the optimal algorithm for the d -cube) within eachof
the cubes C1 and C2 (each node in C1 and C2 exchangesits
m("-2
packetswith the other nodes in C1and C2,respectively). In
the secondphase,eachnode transmitsto its counterpart node
m(tJ-3
in the opposite d-cube all of the 2dpackets that are destined
~~_.:-.~~--~,
"""...mltl-4
for the nodes of the opposite d-cube. In the third phase,
FIG. 4. Spanningtreeconstruction
foroptimalsinglenodescatterunder there is an optimal total exchangein eachof the two d-cubes
the MLA assumptionfor d = 3 andd = 4.
of the packetsreceived in phase2 (seeFig. 5). Phase3 must
IBERTSEKAS
~
FIG. 5. Recursive construction of a total exchangealgorithm for the dcube. Let the (d + 1 )-cube bedecomposed into two d-cubes denoted C.
and C2. The algorithm consistsof three phases.In the first phase,there is a
total exchangewithin each of the cubesC. and C2. In the second phase,
each node transmits to its counterpart node in the opposite d-cube all of
the 2 d packetsthat are destined for the nodes of the opposite d-cube. In the
third phase,there is a total exchange in each of the two d-cubes of the
packetsreceived in phase2.
271
2d - 1
""
- 1
vn -,...,
be carried out after phase I becauseduring phase I all the
links of the cubes C1 and C2 are continuously busy (since
the d-cube total exchange algorithm is assumedoptimal).
On the other hand, phase 2 may take place simultaneously
with both phase 1 and phase 3. In an algorithm presented
in [I], phase 3 starts after the end of phase2, resulting in an
execution time of 2 d -1 units. Here, we improve on this
(5)
2 d-t. ,1-,- 0
(6)
, 2d-
1.
272
BERTSEKAS ET AL.
+ n, Vn = 2d-1 +
Time
Link 1
0001
Time
Link 1
Link 2
0001
0011
, 2d.
0010
Vn =
,2d.
Since the choice of i was arbitrary, this implies that the inductive hypothesis (6) holds for the (d + I)-cube.
Implementation ofthe Optimal Algorithm
In what follows, we presentthe rules used by the nodes
of the d-cube for transmitting their own packetsand forwarding the packetsthey receivefrom other nodes,whenever
they require further transmission. We write the identity of
node i as (id, ..., il), where each ik, k = I, ..., d, is a 0
or a 1. Moreover, we denote by ekthe identity of node 2k-1 ,
for k = 1, ..., d. The link betweeni and i e ekis called the
kth link incident to i. Finally 4 denotesthe reverseof bit ik;
namely 4 = (ik + 1) mod 2.
We first describe the order in which an arbitrary node i
transmits its own packets. It can be seenthat during time
units 1, ..., 2k-l, node i transmits all its packetsdestined
for nodes
(id,...,ik+t,4,Xk-t,...,Xt),
where Xm= 0 or I, for m =
Nd+J(i,n)~2d+n-
,k-
Time
Link 1
0001
Link 2 Link 3
0011
0101
0010
0111
OliO
0100
Time
Link 1
0001
Link 2 Link 3
Link 4
0011
0101
1001
0010
0111
1011
0110
1101
0100
1010
1111
1110
1100
1000FIG.
5.
~=
Max {mlxm=lm}
l~m~k-1
The rules presentedabove follow from the recursivecon- Let m be the smallestpositive integer for which
struction of the algorithm. Our earlier analysis guarantees
2m=
(A3)
(mod d).
that packetsare always in time at the intermediate nodes(if
any) of the paths theyhaveto traverse.Note that the traveling
We claim that m < d. To seethis, note that the numbers
schedule of each packet may be locally determined at the
intermediate nodes of its path by examining the packet's
origin and destination,so packetsdo not haveto carry timing
information.
belong to {I, ..., d -I}. Since there are d such numbers,
some of them must repeat; i.e., for some integers rand s
CONCLUSIONS
with 1 ~ r < s ~ d,
2
mod
d ,
22
mod
"
...2d
mod
(mod d).
obtain
22m=2m=
(mod d).
we have
274
BERTSEKAS ET AL.
23m = 2 m =
(mod d),
(mod d).
(mod d).
REFERENCES
I. Bertsekas,D. P., and Tsitsiklis,J. N. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, EnglewoodOilfs, NJ, 1989.
2. Bhatt, S. N., and Ipsen, I. C. F. How to embedtrees in hypercubes.
Res. Rep. Y ALEU /DCS/RR-443, Department of Computer Science,
Yale Univel1!ity, 1985.
3. Dally, W. J., and Seitz,C. L. Deadlock-freemessagerouting in multiprocessorinterconnectionnetworks.IEEE Trans.Comput.C-36 (1987),
547-553.
4. Dekel, E., Nassimi, D., and Sahni, S. Parallel matrix and graph algorithms. SIAM J. Comput. 10 (1981),657-673.
5. Greenberg,A. G., and Hajek, B. Deflection routing in hypercubenetworks. Unpublished report, 1989.
6. Hedetniemi, S. M., Hedetniemi, S. T., and Liestman, A. L. A survey
of gossipingand broadcasting in communication networks. Networks
18 (1988),319-349.
7. Johnsson,S. L., and Ho, C. T. Optimum broadcastingand personalized
communication in hypercubes.IEEE Trans. Comput.38 (1989), 12491268.
8. Johnsson,S. L. Cyclic reduction on a binary tree.Comput. Phys.Comm.
37(1985),195-203.
9. Johnsson,S. L. Communication efficient basic linear algebra computationson hypercubearchitectures.J. Para/leiDistrib. Comput.4 ( 1987),
133-172.
10. Krumme, D. W., Venkataraman, K. N., and Cybenko, G. The token
exchangeproblem. Tech. Rep. 88-2, Tufts Univel1!ity, 1988.
11. Kermani, P., and Kleinrock, L. Virtual cut-through: A new computer
communicating switchingtechnique. Comput.Networks3 (1979), 267286.
12. McBryan, O. A., and Van de Velde, E. F. Hypercubealgorithms and
their implementations. SIAM J. Sci. Stat. Comput.8 (1987),227-287.
13. Minzer, S. E. Broadband ISDNand asynchronoustransfermode(ATM).
IEEEComm. Mag. (Sept. 1989),17-57.
14. Ozveren,C. Communication aspectsof parallelprocessing.Rep. LIDSP-1721, Laboratory for Information and Decision Systems,MIT, Cambridge, MA, 1987.
15. Saad,Y., and Schultz,M. H. Data communication in hypercubes.Res.
Rep. YALEU/DCS/RR-428, Yale Univel1!ity,Oct. 1985(revisedAug.
1987).
16. Saad Y., and Schultz,M. H. Data communication in parallel architectures. Yale University Report, Mar. 1986.
17. Saad Y., and Schultz,M. H. Paralleldirect methods for solvingbanded
linear systems.Linear Algebra Appl. 88/89 (1987),623-650.
18. SaadY., and Schultz,M. H. Topological propertiesof hypercubes.IEEE
Trans. Comput. 37(1988), 867-872.
19. Saad,Y ~Communication complexity of the Gaussianelimination algorithm on multiprocessors.Linear AlgebraAppl. 77 (1986),315-340.20.
Stamoulis, G. D., and Tsitsiklis, J. N. Efficient routing schemesfor
multiple broadcastsin hypercubes.Proc. 29th IEEEConf Decision and
.Control. Honolulu, H:awaii, 1990,pp. 1349-1354.
21. Stout,Q. F., and Wagar,B. Passingmessages
in link-bound hypercubes.
Proc. 1986 HypercubeConference.SIAM, Philadelphia, 1987,pp. 251-
257.
22. Topkis, D. M. Concurrent broadcast for information dissemination.
IEEE Trans. SoftwareEngrg. 13 (1983),207-231.
23. Varvarigos,E. A., and Bertsekas,D. P. Communication algorithms for
isotropic tasks in hypercubesand wraparound meshes.Rep. LIDS-P1972, Laboratory for Information and Decision Systems,MIT, Cambridge, MA,.Mar. 1990;submitted for publication.
24. Varvarigos,E. A. Optimal communicationalgorithms for multiprocessor
computers. M.S. thesis, Department of Electrical Engineering, MIT,
Cambridge, MA; Rep. CICS-TH-192, Center for Intelligent Control
Systems,MIT, Cambridge, MA, Jan. 1990.
275