A Survey of Parallel Algorithms in Numerical Linear Algebra

Carnegie Mellon University
Research Showcase
Computer Science Department School of Computer Science
1-1-1976
A survey of parallel algorithms in numerical linear
algebra
Don E. Heller
Carnegie Mellon University
Recommended Citation
Heller, Don E., "A survey of parallel algorithms in numerical linear algebra" (1976). Computer Science Department. Paper 1614.
http://repository.cmu.edu/compsci/1614

A SURVEY OF PARALLEL ALGORITHMS
IN NUMERICAL LINEAR ALGEBRA

Don Heller
February 1976

Department of Computer Science
Carnegie-Mellon University
Pittsburgh, Pa. 15213

This work has been supported by the National Science Foundation
under Grant MCS75-222-55 and the Office of Naval Research under
Contract N00014-76-C-0370, NR 044-422.

TABLE OF CONTENTS

1. Introduction
2. Parallel and Pipeline Computers
   a. Parallel Computers
   b. Pipeline Computers
   c. Consequences of the Models
3. Basic Algorithms
   a. General Expressions and Recurrences
   b. Inner Products and Related Computations
   c. The Fast Fourier Transform
4. Linear Systems
   a. General Dense Matrices
   b. Triangular Systems
   c. Tridiagonal Systems
   d. Block Tridiagonal and Band Systems
   e. Systems Arising from Differential Equations
   f. General Sparse Matrices
5. Eigenvalues
Acknowledgments
References

ABSTRACT

The existence of parallel and pipeline computers has inspired a new
approach to algorithmic analysis. Classical numerical methods are generally
unable to exploit multiple processors and powerful vector-oriented hardware.
Efficient parallel algorithms can be created by reformulating familiar
algorithms or by discovering new ones, and the results are often surprising. A
comprehensive survey of parallel techniques for problems in linear algebra
is given. Specific topics include: relevant computer models and their
consequences, evaluation of ubiquitous arithmetic expressions, solution of
linear systems of equations, and computation of eigenvalues.

Watson: "You have formed a theory, then?"

Holmes: "At least I have got a grip of the essential facts of the
         case. I shall enumerate them to you, for nothing clears
         up a case so much as stating it to another person, and I
         can hardly expect your cooperation if I do not show you
         the position from which we start."

         -- Sir Arthur Conan Doyle, Silver Blaze

1. INTRODUCTION

Numerical algorithms can be generally classified in various ways, such
as algebraic vs. analytic, finite vs. infinite, exact vs. approximate. Within
recent years a new classification has become important: sequential vs.
parallel, brought about by the development of parallel and pipeline computers.
These devices allow concurrent arithmetic processing, can easily handle large
volumes of information, and often provide hardware facilities for many
inherently parallel operations found in numerical linear algebra.

In previous surveys of numerical parallel algorithms, Miranker [71]
described early work in several areas, and Poole and Voigt [74] prepared a
general annotated bibliography. Our present intention is to provide a more
complete and up-to-date discussion of parallel methods for linear systems of
equations and eigenvalue problems, along with background information
concerning the computer models and fundamental techniques. Original material has
been included to create a unified treatment. We consider algorithms using
both exact and finite-precision arithmetic, and the inherent complexity of
computations. As will be seen, good algorithms for parallel and pipeline
computers may be strikingly different from good algorithms for ordinary
computers.

The idea behind parallel computers is that programs using P "central"
processors should run P times faster than otherwise identical programs using
only one processor, although experience and theory show that the actual
speedup is often much smaller. Examples of large parallel computers are
the Carnegie-Mellon University C.mmp, constructed from up to 16 asynchronous
minicomputer processors (Wulf and Bell [72]), and the University of Illinois
Illiac IV, with 64 synchronous processing elements under the direction of a
single control unit (Barnes et al. [68], Bouknight et al. [72]). The Illiac IV
was built by Burroughs Corporation and is now located at NASA Ames Research
Center. The Goodyear Aerospace STARAN, with an associative memory and up
to 32 processors that can perform serial-by-bit operations on 256 words
simultaneously, is a somewhat different approach to high-speed computation
(Rudolph [72]), and will not be considered here.

The idea behind pipeline computers is essentially that of an assembly
line: if the same arithmetic operation is going to be repeated many times,
throughput can be greatly increased by dividing the operation into a sequence
of sub-tasks and maintaining a flow of operand pairs in various states of
completion. This process is especially suited to vector computations, and
vectors are taken as fundamental data types. Thus pipeline computers are
also known as vector computers. Available pipeline computers are the Control
Data Corporation STAR-100, with two pipes (Hintz and Tate [72]), the Texas
Instruments Advanced Scientific Computer, with up to four pipes (Watson [72]),
and the recently announced CRAY-1 (Cray [75]). The IBM 2938 Array Processor
is also pipelined (Ruggiero and Coryell [69]).

Section 2 of the survey is devoted to a general discussion of parallel
and pipeline computers, and how certain architectural features can affect
the behavior of algorithms. We consider two realistic models and a
theoretical model in which an unlimited number of parallel processors are available.
These are not mathematical models of computation, but minimal descriptions
intended to allow brief and hopefully meaningful analyses. Of prime
importance are the observations that decreases in computation time depend on the
ability to move data quickly, and that data must be arranged to conform with
the architectural restrictions of the computer.

Section 3 concerns the evaluation of general arithmetic expressions and
some important special cases, such as recurrences, inner products and matrix
multiplication. The solution of a linear system of equations Ax = v is
discussed in Section 4. Csanky [75] has recently answered a major theoretical
question concerning bounds on the number of parallel computation steps required
to compute x, but the method displayed is unstable. Stable algorithms, such
as Gaussian elimination with pivoting, use considerably more time and
considerably fewer resources. If A has a regular pattern of zero and nonzero
elements then much faster stable algorithms can be derived, and these approach
the known theoretical lower bounds on computation time. Finally, parallel
eigenvalue calculations are considered in Section 5; the brevity of this
section reflects the relatively small number of results.

For simplicity of language we will refer to algorithms for parallel or
pipeline computers as parallel algorithms, and hope that no confusion results.
Actually, there are many common characteristics, especially the ability to
process vectors efficiently.

Parallel algorithms depend on one simple yet crucial observation:
independent computations may be executed simultaneously. A set of computations
is said to be independent if each result variable appears in only one
computation. For example, in vector addition the set of component sums is
independent. Thus two N-vectors may be added in a single step using N parallel
processors; the vector addition is said to exhibit inherent parallelism, as
does any algorithm with large sets of independent computations.

A parallel algorithm may be created by recognizing the inherent
parallelism of a sequential algorithm (i.e., an algorithm for a single-processor
computer). While many familiar algorithms really are sequential, standard
operations of linear algebra often have considerable inherent parallelism. It is
sometimes necessary to restructure an algorithm to increase this property.
This can involve reordering a linear system or reorganizing a computation to
spread operations across several processors. The latter technique leads to
the important theoretical result that the inner product of two N-vectors can
be computed in $\lceil \log N \rceil + 1$ steps using N processors,* and it can
be shown that there is no faster algorithm. It is also possible to use an
automatic search for parallelism in sequential programs (for example, see
Kuck [73]), but we will not consider this aspect. In fact, good sequential
programs sometimes obscure their parallelism precisely because they are
written for sequential computers.
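
As an illustration of the step count quoted above, the following Python sketch simulates the parallel inner product on an ordinary sequential machine: one step forms all N products, and each later step adds disjoint pairs of partial sums. The function name `parallel_inner_product` and the explicit step counter are assumptions made for this illustration; they do not come from the survey.

```python
import math

def parallel_inner_product(x, y):
    """Simulate an N-processor inner product; return (value, parallel_steps)."""
    assert len(x) == len(y) and len(x) > 0
    # Step 1: the N products are independent, so they count as one parallel step.
    partial = [a * b for a, b in zip(x, y)]
    steps = 1
    # Each further step adds disjoint pairs of partial sums simultaneously.
    while len(partial) > 1:
        paired = [partial[i] + partial[i + 1]
                  for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2 == 1:
            paired.append(partial[-1])  # an unpaired element waits for the next step
        partial = paired
        steps += 1
    return partial[0], steps

value, steps = parallel_inner_product([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
# The simulated step count agrees with ceil(log2 N) + 1 for N = 5.
assert steps == math.ceil(math.log2(5)) + 1
```

The sketch only counts steps; it says nothing about the data movement that a real parallel computer would need between steps.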

* All logarithms are base 2. $\lceil x \rceil$ is the unique integer
  satisfying $x \le \lceil x \rceil < x+1$.

We use a parenthesized list of indices to emphasize the inherent
parallelism of a set of computations in which the operations are identical and only
the operands differ. For example, if $C = AB$, $A \in R^{m \times n}$,
$B \in R^{n \times p}$, a sequential program to compute C is

    for i = 1 step 1 until m do
      for j = 1 step 1 until p do
        begin $s \leftarrow 0$; for k = 1 step 1 until n do
                $s \leftarrow s + a_{ik} b_{kj}$;
              $c_{ij} \leftarrow s$
        end.

In the parenthesis notation this becomes

    $s_{ij} \leftarrow 0$, $(1 \le i \le m;\ 1 \le j \le p)$;
    for k = 1 step 1 until n do
      $s_{ij} \leftarrow s_{ij} + a_{ik} b_{kj}$, $(1 \le i \le m;\ 1 \le j \le p)$;
    $c_{ij} \leftarrow s_{ij}$, $(1 \le i \le m;\ 1 \le j \le p)$

or even

    \[ c_{ij} \leftarrow \sum_{k=1}^{n} a_{ik} b_{kj}, \quad (1 \le i \le m;\ 1 \le j \le p) \]

when we do not want to be restricted to any particular method for computing
the summation.
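
The two forms of the matrix product can be compared in a short Python sketch. The function names `matmul_sequential` and `matmul_parenthesized` are hypothetical, and the second function only mimics the index-set notation on a sequential machine: each pass over k is written as one update of all m*p sums, which is where a parallel computer would act on every (i, j) at once.

```python
def matmul_sequential(A, B):
    """Direct transcription of the sequential program: one scalar update at a time."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = 0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_parenthesized(A, B):
    """Same result, organized as n updates of the whole index set (i, j)."""
    m, n, p = len(A), len(B), len(B[0])
    index_set = [(i, j) for i in range(m) for j in range(p)]
    S = {ij: 0 for ij in index_set}
    for k in range(n):
        # All m*p updates for this k are independent of one another, so a
        # parallel machine could perform them together (a multiply, then an add).
        S = {(i, j): S[(i, j)] + A[i][k] * B[k][j] for (i, j) in index_set}
    return [[S[(i, j)] for j in range(p)] for i in range(m)]

A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 2]]
assert matmul_sequential(A, B) == matmul_parenthesized(A, B)
```

Only the loop over k remains sequential in the second form, which is the point the parenthesis notation is meant to make visible.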

Once an algorithm has been given, we would like to know how good it is:
how much time and how many resources are needed, are these requirements minimal
or nearly so, can the algorithm be reconstructed to use fewer resources and
still have a respectable running time, is it numerically stable? There are
two viewpoints for this analysis. On the one hand, we want to know how
difficult a problem can be, so we ask how much work must be done by any algorithm
that solves the problem (complexity). On the other hand, we want to
discover algorithms that will lead to reliable, efficient programs (simplicity),
though it is always difficult to predict the effect of a specific idea on
the execution time of a program. The critical problem for any computation
model is to identify which parts of an algorithm are most important, and put
the most effort into optimizing those parts (see, e.g., Moler [72], Parlett
and Wang [75]).

For the practitioner, the hope is that parallelism will allow the
cost-effective and fast solution of larger and more complicated problems. For
the mathematician and computer scientist, there are many interesting
theoretical questions about the complexity and simplicity of computations.

2. PARALLEL AND PIPELINE COMPUTERS

In this section we describe certain features of parallel and pipeline
computers, and some of the ways in which these features affect parallel
algorithms. In many cases the choice of an algorithm will be dictated by
non-arithmetic considerations, such as storage and communication requirements,
and relative performances will be affected by widely varying operating
characteristics. Section 2.c is concerned with consequences of the models for
numerical algorithms, especially lower bounds on computation time.

For more information about programming and implementation, see Lawrie
et al. [75], Newell and Robertson [75], Sameh and Layman [74], Stevenson [75],
the technical reports listed by Poole and Voigt [74], and the March 1975 issue
of SIGPLAN Notices, which contains the Proceedings of a Conference on
Programming Languages and Compilers for Parallel and Vector Machines. Other
relevant surveys are Baer [73], Miller [73], Owens [73], and Stone [73b];
architectural issues are discussed by Stone [75b] and T. C. Chen [75].

2.a. Parallel Computers

Our model of parallel computation depends on the use of P identical
processors, although real computers like C.mmp allow some differences in the
processors. We regard the number of processors as the important parameter.
It is assumed that the standard rounding error hypotheses hold for finite
precision arithmetic, and that double precision accumulation of inner products
is possible. The basic operations (e.g., +, -, ×, /, max, min, and order
relations) take two inputs and produce one result in time $t_{op}$, which depends
only on the operation and the data types, and not on the number of processors.
It is possible to generalize to r inputs, but as long as r is fixed its value
is of minor importance.

We assume a large finite primary memory accessible by each processor,
and make the simple but very strong assumption that any processor can
obtain any piece of information in unit time. Register load/store costs and
I/O for secondary storage will be ignored.

In reality, all parallel algorithms must deal with the complex problems
of data manipulation, storage allocation, memory interference and
interprocessor communication presented by existing parallel computers. For example,
the chaotic relaxation methods (Section 4.e) are designed to handle delays
caused by conflicting requests to shared memory. To reduce these conflicts,
parallel computers provide each processor with a local memory, along with a
communication network to permit data transfers between processors. Since
it is far too costly to link each processor directly to all others, a
restricted network must be implemented, causing an accessing delay due to
increased path lengths between processors. It is possible for this delay to
seriously affect program execution times. The Illiac IV network arranges
the 64 processors as an 8×8 grid and connects each processor to its four
neighbors. This leads to skewed storage of matrices if it is desirable to
access rows and columns with equal ease (Kuck [68]).

Instructions for each processor come from the individual processor or
from a central control unit. In the first case, the instructions may differ
across the processors, yielding the Multiple Instruction Stream - Multiple
Data Stream model (MIMD); the processors need not operate synchronously. In
the second case, each processor executes the same instruction at the same
time, though using different data and depending on a local on/off switch (a
mask in the Illiac IV terminology). This is the Single Instruction Stream -
Multiple Data Stream model (SIMD). The terms MIMD, SIMD were first used
by Flynn [66]; this is not the only possible taxonomy, but it fits our
purposes. The greater part of the survey concerns algorithms for SIMD
computers, reflecting both present knowledge and the inherent simplicity
of the model.

For definiteness we assume that the processors are synchronous* and count
one step for each set of operations performed simultaneously. Thus the
important consideration is the number of steps, and not the total number of
operations performed by all the processors. Of course, any operations performed
in parallel must be independent.

Various models of parallel computation depend on the number of processors.
The practical model has a fixed number of processors, independent of the
application. The theoretical model allows unlimited parallelism ("sufficiently
many processors"), so that the number may vary with the application. We
distinguish two cases of unlimited parallelism. If a parallel algorithm solves
a problem of size N using P(N) processors, the number of processors is
(polynomially) bounded if $P(N) \le p(N)$ for some polynomial p and all N, and
essentially infinite otherwise. The number of memory locations may be similarly
treated.

Most of the algorithms considered in this survey involve bounded
parallelism and memory. Unlimited parallelism is considered for theoretical purposes,
but it can provide useful information for fixed parallelism if algorithms
can be transformed to reduce the resource requirements. It should be noticed
from the following discussion that bounded parallelism is preferred for this
purpose, since algorithms using an essentially infinite number of processors
perform an exponential number of operations per step.

* This has two interpretations: if several different instructions are executed
  in parallel, then each must be allowed to finish before the next round of
  instructions is started, or, all instructions executed in parallel are identical.

Suppose we are given an algorithm using $P_1(N)$ processors and $T_1(N)$
steps, and we want to construct a new algorithm using $P_2(N) < P_1(N)$
processors and $T_2(N)$ steps, where $T_2$ is not much larger than $T_1$. Two
principles, algorithm decomposition and problem decomposition (the names are due
to Hyafil and Kung [74b]) underlie the transformations to a smaller number
of processors. The first depends on a simulation argument: if $q_i$ operations
are performed in step i of algorithm 1, then this one step becomes
$\lceil q_i / P_2 \rceil$ steps in algorithm 2. Each new step consists of at
most $P_2$ operations performed in parallel, and (Brent [74])

    \[ T_2 = \sum_{i=1}^{T_1} \lceil q_i / P_2 \rceil
           \le \sum_{i=1}^{T_1} (q_i + P_2 - 1)/P_2
           = T_1 + (q - T_1)/P_2, \qquad q = \sum_{i=1}^{T_1} q_i. \]
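
The simulation argument can be checked numerically. The Python sketch below is a minimal illustration, assuming the per-step operation counts $q_i$ are known in advance; the function names `decomposed_steps` and `brent_bound` and the example counts are hypothetical.

```python
def decomposed_steps(q, p2):
    """T_2: steps needed when step i (q[i] operations) runs on only p2 processors."""
    return sum((qi + p2 - 1) // p2 for qi in q)   # integer form of ceil(q_i / P_2)

def brent_bound(q, p2):
    """Upper bound T_1 + (q - T_1) / P_2, where q is the total operation count."""
    t1 = len(q)
    return t1 + (sum(q) - t1) / p2

# Hypothetical example: tree summation of 16 numbers performs 8, 4, 2, 1
# additions in its four parallel steps.
q = [8, 4, 2, 1]
for p2 in (1, 2, 3, 8):
    assert decomposed_steps(q, p2) <= brent_bound(q, p2)
```

With a single processor the bound is attained exactly (15 steps for the 15 additions), which matches the sequential operation count.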

Problem decomposition depends on the observation that there is often an M such
that $P_1(M) < P_2(N)$. By partitioning the original problem (size N) into smaller
problems (size M) and applying algorithm 1 to the small problems, a new
"bootstrapped" algorithm may be obtained. The solution of a linear system by
block elimination is a familiar example of this principle.

Recursive doubling, a powerful method of generating parallel algorithms,
is related to problem decomposition. The idea is to repeatedly separate each
computation into two independent parts of equal complexity, which are then
computed in parallel. This is actually a special case of divide-and-conquer.
For example,

    \[ \sum_{i=1}^{N} a_i = \left( \sum_{i=1}^{n-1} a_i \right)
                          + \left( \sum_{i=n}^{N} a_i \right), \]

with n chosen so that the two parts are of equal complexity,
and by further applications of this splitting the summation can be computed
in $\lceil \log N \rceil$ steps using N/2 processors. The subproblems need not
be smaller versions of the original problem, but should exhibit associative
properties so that the partitioning can be continued. Recursive doubling has
been applied to several problems, notably in the work of Stone and Kogge on
recurrence relations. This is discussed briefly in Section 3.
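
The role of associativity can be seen in a small Python sketch. It is an illustration under stated assumptions, not the formulation used in Section 3: the helper names `recursive_doubling` and `compose` are hypothetical, the "depth" counts combination steps on an idealized machine with enough processors, and the recurrence example encodes each step $x_i = a_i x_{i-1} + b_i$ as an affine map so that composition supplies the associative operation.

```python
def recursive_doubling(items, op):
    """Combine items with an associative op; return (result, parallel_depth)."""
    if len(items) == 1:
        return items[0], 0
    mid = len(items) // 2
    # The two halves are independent, so they could be computed concurrently;
    # the depth is the larger of the two depths plus one combining step.
    left, d_left = recursive_doubling(items[:mid], op)
    right, d_right = recursive_doubling(items[mid:], op)
    return op(left, right), max(d_left, d_right) + 1

# Summation of N = 8 numbers: depth 3 = log2(8).
total, depth = recursive_doubling(list(range(1, 9)), lambda a, b: a + b)

def compose(f, g):
    """Affine maps stored as (a, b) meaning x -> a*x + b; return 'f, then g'."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)

# Recurrence x_i = a_i * x_{i-1} + b_i with hypothetical coefficients (a_i, b_i).
maps = [(2, 1), (3, 0), (1, 4), (2, 2)]
(a, b), depth_rec = recursive_doubling(maps, compose)
x0 = 1
x_final = a * x0 + b   # same value as iterating the recurrence four times
```

Each combining step for the recurrence costs a few arithmetic operations rather than one, so the depth here measures rounds of combination, not individual arithmetic steps.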

It is of considerable practical interest to be able to measure the
effectiveness of parallelism. For a given problem, parameterized by N, let
$T_1(N)$ be the running time of the best known sequential algorithm, and let
$T_P(N)$ be the running time of a parallel algorithm using P processors.
Speedup, defined as $S_P(N) = T_1(N)/T_P(N)$, measures the improvement in solution
time using parallelism, while efficiency, defined as $E_P(N) = S_P(N)/P$,
attempts to measure how well the processing power is being used. A simple
argument shows that $S_P \le P$ and $E_P \le 1$. Note that $E_1 = 1$, so we don't
necessarily want to choose the number of processors in order to maximize this
function.
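
To make the two measures concrete, the following Python sketch evaluates them for the summation of n numbers. The cost model in `sum_time` is an assumption introduced for this example (each of p processors first adds its own block, then the p partial sums are combined pairwise); it is not a timing taken from the survey.

```python
import math

def sum_time(n, p):
    """Assumed step count for adding n numbers on p processors (1 <= p <= n)."""
    local = math.ceil(n / p) - 1           # each processor adds its own block
    combine = math.ceil(math.log2(p))      # pairwise combination of p partial sums
    return local + combine

def speedup(n, p):
    """S_P(N) = T_1(N) / T_P(N)."""
    return sum_time(n, 1) / sum_time(n, p)

def efficiency(n, p):
    """E_P(N) = S_P(N) / P."""
    return speedup(n, p) / p

n = 1024
for p in (1, 4, 64, 512):
    print(p, sum_time(n, p), round(speedup(n, p), 2), round(efficiency(n, p), 3))
```

Under this model the speedup keeps growing as processors are added while the efficiency falls, which is the tension behind the remark that maximizing efficiency alone would always select a single processor.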

For a class of problems it is necessary to consider some different
measures. Let f(x) be the probability that we want to solve problem x taken
from class X. Kung and Traub [74] define speedup on the average as

    \[ SA_P(X, f) = \left( \sum_{x \in X} f(x)\, T_1(x) \right) \Big/
                    \left( \sum_{x \in X} f(x)\, T_P(x) \right) \]

and the average speedup as

    \[ AS_P(X, f) = \sum_{x \in X} f(x)\, T_1(x) / T_P(x). \]

Each of these functions has different characteristics and, depending on f,
can expose different features of an algorithm.
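
A two-problem example shows how the measures can disagree. The Python sketch below is illustrative only: the function names and the problem class (probabilities and running times) are hypothetical values chosen for the example.

```python
def speedup_on_the_average(problems):
    """SA_P: expected sequential time divided by expected parallel time."""
    num = sum(f * t1 for f, t1, tp in problems)
    den = sum(f * tp for f, t1, tp in problems)
    return num / den

def average_speedup(problems):
    """AS_P: expected value of the per-problem speedup T_1(x) / T_P(x)."""
    return sum(f * t1 / tp for f, t1, tp in problems)

# Hypothetical class X, each entry given as (f(x), T_1(x), T_P(x)).
problems = [
    (0.5, 100.0, 10.0),   # a large problem with speedup 10
    (0.5, 4.0, 4.0),      # a small problem with no speedup
]
sa = speedup_on_the_average(problems)   # weighted toward the long-running problem
av = average_speedup(problems)          # treats each problem's ratio equally
```

Here the speedup on the average is 52/7 while the average speedup is 5.5, because the first measure is dominated by the problem that consumes most of the running time.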

The goal is to construct algorithms exhibiting linear (in P) speedup and
hence utilizing the processors efficiently. That is, for problems of size N we
want an asymptotic speedup of the form $S_P(N) = cP - g(P, N)$, with $0 < c \le 1$,
$0 \le g(P, N) = o(1)$ as $N \to \infty$; c should be independent of P and close
to 1. It is suggestive to think of g as a penalty for the use of parallelism on
small problems, so for large problems the rewards should outweigh the penalties.

However, linear speedup is not always possible. There are certain
computations for which the maximal speedup is $S_P(N) < k$ for a constant k, and
such a computation clearly makes poor use of parallelism. For many important
problems in linear algebra the best speedup is $S_P(N) = cP/\log P - g(P, N)$,
which is acceptable though less than linear.

2.b. Pipeline Computers

We now consider a different approach to high-speed computation, the
class of pipeline or vector computers. These are SIMD machines, but their
speed is achieved primarily by a form of instruction lookahead in a single
processor rather than by use of multiple processors.

By partitioning floating point operations (+, ×, etc.) into more basic
sub-operations (exponent adjustment, mantissa arithmetic, etc.) an assembly
line structure or pipeline can be set up for repetitive calculations such
as componentwise vector operations and inner products. Successive completed
results leave the pipeline at a rate determined by the memory transfer rate
and the internal stage delay, and not by the total time required for each
arithmetic operation. To simplify memory transfers, vector operands on
the STAR must be contiguous blocks of memory locations. Vector operands on
the ASC and Cray-1 can be any sequence of locations in arithmetic
progression. Interleaving is used to increase transfer rates from relatively slow
core memories.

The execution time for a vector operation consists of two parts, an
initial delay (called the vector startup time) and the sequential appearance
of completed results. We denote this time as $\tau_{op} N + \sigma_{op}$, where
N is the length of the vectors involved. For scalar operations we continue to
denote the time as $t_{op}$. Typical values of $\tau$, $\sigma$ and $t$ for the
STAR are given in Table 1; the (core-to-core) vector times include load/store
costs while the (register-to-register) scalar times do not.

TABLE 1
Selected CDC STAR Instructions, 64 Bit Floating Point Operands (CDC [74])

                                                                                               N such that
operation                           $t$     $\tau$   $\sigma$   $\lceil \sigma/(t-\tau) \rceil$   $\tau N + \sigma = 1.5\,\tau N$
add, subtract                       13      .5       96         8                                 384
multiply                            17      1        156        10                                312
divide                              47      2        156        4                                 156
square root                         73      2        152        3                                 152
floor, ceiling                      11      .5       90         5                                 360
[operation illegible]               (13)    .5       94         (8)                               376
a [relation illegible] b (branch)   15-46   -        -          -                                 -
A - B (set condition code)          -       .5       95         (3-7)                             380
a < b (branch)                      15-46   -        -          -                                 -
maximum                             -       6        85         (3-10)                            29
summation                           (13)    4        98         (11)                              49
inner product                       (30)    4        100        (4)                               50

Times in the preceding table are given as multiples of the 40 nanosecond
cycle time, under certain assumptions. Deviations from these
assumptions can cause $\sigma$ to increase.

Two immediate consequences of the timing formula $\tau N + \sigma$ are the linear
dependence of execution time on the number of operations performed, and the
encouragement to use vector operations when $\tau$ is much smaller than $t$. For
$N > \sigma/(t - \tau)$ it is better to use one vector operation than N scalar
operations. Moreover, the use of long vectors is encouraged in order to minimize
the effects of $\sigma$, which is generally much larger than $\tau$. The last column
of Table 1 shows how long the vectors must be so that the actual time per
result is 50% greater than its asymptotic value $\tau$, that is,
$\tau N + \sigma = 1.5\,\tau N$.
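
The two derived columns of Table 1 follow directly from the timing formula, as the Python sketch below shows for three of the rows. The dictionary `STAR` copies the values of t, tau and sigma from the table; the function names are hypothetical and were chosen for this illustration.

```python
import math

# (t, tau, sigma) in multiples of the 40 ns cycle time, copied from Table 1.
STAR = {
    "add":      (13, 0.5, 96),
    "multiply": (17, 1.0, 156),
    "divide":   (47, 2.0, 156),
}

def vector_time(n, tau, sigma):
    """Time for one vector operation of length n: tau*n + sigma."""
    return tau * n + sigma

def break_even_length(t, tau, sigma):
    """Smallest integer n with tau*n + sigma <= t*n, i.e. ceil(sigma / (t - tau))."""
    return math.ceil(sigma / (t - tau))

def length_for_half_overhead(tau, sigma):
    """n at which tau*n + sigma = 1.5*tau*n, i.e. n = 2*sigma / tau."""
    return 2 * sigma / tau

for name, (t, tau, sigma) in STAR.items():
    n_star = break_even_length(t, tau, sigma)
    # At the break-even length the vector operation is no slower than n scalar ones.
    assert vector_time(n_star, tau, sigma) <= t * n_star
    print(name, n_star, length_for_half_overhead(tau, sigma))
```

For these three rows the computed values agree with the table: break-even lengths 8, 10 and 4, and half-overhead lengths 384, 312 and 156.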

The STAR also provides another set of instructions for use with sparse
vectors, each of which consists of a packed vector of nonzero components and
a bit vector describing the true position of these components. The general
timing formula for a sparse vector operation is $\tau n + \rho N + \sigma$, where
n is the number of nonzeros in the result vector and N is the number of bits in
the operand vectors. For a sparse addition, $\tau = 1$, $\rho = 1/8$, $\sigma = 183$.

Many of the problems in constructing programs for pipeline computers
involve the isolation of vector operations in a particular setting, and data
manipulation to change conceptual data structures into the vector operand
format. Consider, for example, the addition of two columns of an N × N
matrix that is stored by rows. Since a vector on the STAR is defined to be
a contiguous sequence of memory locations, a matrix column is not a vector,
contrary to the usual mathematical interpretation. A vector addition (O(N)
time) must therefore be preceded by a special extraction operation ($O(N^2)$ time)
to copy the required column elements into vector operand form. A better
data structure or N scalar operations will preserve the linear running time,
but these solutions are not always possible or pleasing. The more general
definition of a vector used by the ASC avoids this difficulty and the column
addition may be performed directly with a vector operation in O(N) time.
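
The addressing issue can be made concrete with a flat, row-major array. The Python sketch below is only an illustration of the storage layout: the names `column_addresses` and `is_contiguous` and the stored values are hypothetical, and the code does not model the cost of the STAR extraction operation.

```python
def column_addresses(n, j):
    """Addresses of column j of an n x n matrix stored by rows in a flat array."""
    return [i * n + j for i in range(n)]      # arithmetic progression, stride n

def row_addresses(n, i):
    """Addresses of row i of the same matrix."""
    return [i * n + j for j in range(n)]      # consecutive locations

def is_contiguous(addresses):
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))

n = 4
memory = list(range(100, 100 + n * n))        # hypothetical stored values
row = row_addresses(n, 1)
col = column_addresses(n, 2)

# A row is a contiguous block (usable as a vector under the STAR definition);
# a column is a strided sequence (usable under the more general ASC definition),
# so under the contiguous definition it must first be copied out:
gathered_column = [memory[a] for a in col]
```

Storing the matrix by columns instead would reverse the roles of rows and columns, which is one form of the "better data structure" mentioned above.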

A completely different data manipulation problem can occur when
secondary storage (i.e., discs) must be used (Lynch [74], Knight, Poole and Voigt
[75]). With current technology, the pipeline computation rate is so much
greater than the disc transfer rate that, for certain large linear systems,
the anticipated I/O time overwhelms the anticipated computation time. The
arithmetic unit will be idle for a significant period, simply waiting for
its operands. This situation is by no means unique to the pipeline
architecture, and will occur in any computer system with a mismatch between
computation and transfer rates. It is suggested that intermediate quantities
be recomputed rather than stored on disc, with the expectation that a large
increase in computation time will be offset by a greater decrease in the
I/O time.

We note one side effect of studying vector computers, which occurred
during the transition from the CDC 7600 to the STAR at Lawrence Livermore
Laboratory (Owens [73], Zwackenberg [75]). LRLTRAN, the local dialect of
Fortran, was extended to include vector operations reflecting the STAR
instructions, and existing LRLTRAN programs were rewritten using the extensions.
Prior to the arrival of the STAR, the new programs were run on the 7600, and
it was found that they ran 1.2 to 2.8 times faster than the original codes,
since the software vector instructions took greater advantage of the 7600's
lookahead and segmented functional units. This has become known as the
"vector 7600" effect.

To simplify further discussion, as we did with the model of parallel
computation, we will ignore the time required for data manipulation and
concentrate on the arithmetic time.
For a problem of size N, where the best scalar algorithm uses $R_s(N)$ operations, suppose we are considering an algorithm using the vector instructions and a total of $R_v(N)$ operations. If N is large we certainly want to avoid the situation where the use of vector operations is harmful rather than beneficial. Following Lambiotte and Voigt [75], we say that the vector algorithm is asymptotically consistent if $R_v(N) = O(R_s(N))$ as $N \to \infty$. An inconsistent vector algorithm, though not universally useful, may still be applicable for some values of N, and it is possible that a consistent algorithm may not be applicable at all because of a large asymptotic constant, but vector algorithms of interest will generally be consistent, take advantage of the increased result rate, and minimize the effects of the startup time $\sigma$.
2.c. Consequences of the Models
The fact that the parallel processing element uses only unary and binary operations allows the immediate conclusion of a simple lower bound on the computation time for a problem with N inputs and a single output. Clearly N-1 binary operations are necessary and may be sufficient to compute the result sequentially, but we are interested in knowing how many of these operations may be executed in parallel. For this purpose, consider the set BT(P) of rooted binary trees defined as follows:
1. the tree with one node (both the root and a leaf) is in BT(P) and has depth 0;
2. given a depth n tree in BT(P), if we replace at most P leaves with the three-node tree (a root with two leaf children) then the new tree is in BT(P) and has depth n+1;
3. all trees in BT(P) are constructed using 1. and 2.
A depth n tree in BT(P) corresponds to n steps of parallel computation with P processors. It is important to note that depth is different from height, which is defined as the maximum number of branches between the root and a leaf. While not equal in general, we always have height $\le$ depth. Now, let L(P,n) be the maximum number of leaves on a depth n tree in BT(P), and let m(P,N) be the minimum depth of any tree in BT(P) with N leaves. Clearly
$$L(P,0) = 1,$$
$$L(P,n+1) = \min(L(P,n)+P,\ 2L(P,n)),\ \text{and}$$
$$L(P,n-1) < N \le L(P,n)\ \text{implies}\ m(P,N) = n.$$
Letting $k = \lceil \log P \rceil$, we have $L(P,n) = 2^{\min(k,n)} + P\max(0, n-k)$, so
$$m(P,N) = \min(k, \lceil \log N \rceil) + \max(0, \lceil (N-2^k)/P \rceil).$$
In summary, at least m(P,N) steps are required to compute one result from N inputs using P processors.
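The recurrence for L(P,n) and the closed form for m(P,N) can be cross-checked numerically. The following is a minimal sketch, not part of the original report; the function names are illustrative, and logarithms are taken base 2 as throughout the text.

```python
def max_leaves(P, n):
    """L(P, n) from the recurrence L(P,0) = 1, L(P,n+1) = min(L + P, 2L)."""
    L = 1
    for _ in range(n):
        L = min(L + P, 2 * L)
    return L

def min_depth(P, N):
    """m(P, N): the smallest n with L(P, n) >= N, found by direct search."""
    n = 0
    while max_leaves(P, n) < N:
        n += 1
    return n

def ceil_log2(x):
    """ceil(log2(x)) for an integer x >= 1, using exact integer arithmetic."""
    return (x - 1).bit_length()

def min_depth_closed(P, N):
    """Closed form m(P,N) = min(k, ceil(log N)) + max(0, ceil((N - 2^k)/P))."""
    k = ceil_log2(P)
    return min(k, ceil_log2(N)) + max(0, -(-(N - 2 ** k) // P))
```

For example, `min_depth(3, 8)` and `min_depth_closed(3, 8)` both give 4, the depth of the tree described for N = 8, P = 3.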
Munro and Paterson [73] have derived a similar but more general lower bound on computation time. This result is important because it allows the translation of complexity theorems from sequential computation to parallel computation.
Theorem. If at least q operations are required to compute a single number Q, then any algorithm using P processors to compute Q must take at least m(P,q+1) steps.
Proof. The maximum number of operations that can be done in n steps with P processors to compute one result is L(P,n) - 1, which is just the maximum number of non-leaf nodes on a depth n tree in BT(P). Let t be the unique positive integer such that $L(P,t-1) - 1 < q \le L(P,t) - 1$, so that t = m(P,q+1). Fewer than t steps cannot compute Q, although we cannot conclude that t steps are sufficient.
Actually, we have described the class of optimal algorithms to compute $A_N = a_1 \circ a_2 \circ \cdots \circ a_N$, where $\circ$ is any associative operation. Each tree in BT(P) with N leaves and minimal depth represents an algorithm for P processors, which we will call an associative fan-in algorithm, and which uses m(P,N) steps and N-1 operations. Figure 1 shows one such tree for N = 8, P = 3. These methods are more familiarly known as log-sum or log-product algorithms for the original applications in which $\circ$ = + or $\times$, $P = \lfloor N/2 \rfloor$ and $m(P,N) = \lceil \log N \rceil$; we have already used the summation example to illustrate the recursive doubling technique.
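The fan-in idea can be simulated sequentially. The sketch below is an illustration rather than the report's own formulation: it greedily combines adjacent pairs, at most P per step, preserving operand order so that only associativity (not commutativity) is assumed. The name `fan_in` is hypothetical.

```python
import operator

def fan_in(values, op, P):
    """Combine values with an associative op, using at most P operations per step.

    Returns (result, number_of_parallel_steps). Adjacent pairs are combined
    left to right, so operand order is preserved.
    """
    items = list(values)
    steps = 0
    while len(items) > 1:
        pairs = min(P, len(items) // 2)          # operations performed this step
        combined = [op(items[2 * i], items[2 * i + 1]) for i in range(pairs)]
        items = combined + items[2 * pairs:]     # untouched operands wait for a later step
        steps += 1
    return items[0], steps
```

With eight operands and three processors, `fan_in(range(1, 9), operator.add, 3)` performs the sum in four steps, which agrees with m(3,8) = 4.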
In spite of their simplicity, the fundamental importance of the associative fan-in algorithms should not be underestimated. We note that m(P,N) is approximately N/P + log(P/2) for P < N, and $m(P,N) = \lceil \log N \rceil$ for $P \ge N$. Thus the obtainable speedups in computing $A_N$ are $S_P(N) = P - O(1/N)$ and $S_N(N) \approx N/\log N$. It is seen that although linear speedup is impossible for certain values of P, the fan-in algorithms are optimal in the sense of achieving minimal computation time for all values of P.
Figure 1. Computation of $a_1 \circ \cdots \circ a_8$ with 3 processors. Numbers next to a node denote the step in which the operation is performed.
As an allied question, we can ask how few processors are actually needed to compute $A_N$ in minimal time. Muraoka [71] showed that if $n = \lceil \log N \rceil$ and
$$P(N) = \begin{cases} \lceil (N - 2^{n-2})/2 \rceil & \text{if } 2^{n-1} < N \le 3 \cdot 2^{n-2}, \\ N - 2^{n-1} & \text{if } 3 \cdot 2^{n-2} < N \le 2^{n}, \end{cases}$$
then m(P(N),N) = n and m(P(N)-1,N) = n+1. Thus P(N) processors are necessary and sufficient to evaluate $A_N$ in n steps. Kogge [72c] examined the broader problem of evaluating $A_j$, $1 \le j \le N$, in n steps, and also considered constraints on the communication networks between processors. Clearly P(N) processors are still necessary, but the constraints and additional computations make the sufficiency question more difficult, and only partial results are available. One simple result that we will need later is that the powers $z^j$, $1 \le j \le N$, may be computed in n steps using $\max(2^{n-2}, N - 2^{n-1}) \le N/2$ processors.
Another useful observation is that if $P(N) = \lceil N/\log N \rceil$ then $m(P(N),N) \le 2\log N$ and $S_P(N) > P/2$. Thus we can reduce the number of processors somewhat and obtain both a logarithmic running time and linear speedup.
The error analysis for summations and products using the associative fan-in algorithm is not difficult. If h is the height of the computation tree then it may be shown that
$$fl\left(\sum_{i=1}^{N} a_i\right) = \sum_{i=1}^{N} a_i (1 + \delta_i), \quad |\delta_i| \le (1+\mu)^h - 1,$$
$$fl\left(\prod_{i=1}^{N} a_i\right) = a_1 \prod_{i=2}^{N} a_i (1 + \epsilon_i), \quad |\epsilon_i| \le \mu,$$
where $\mu$ is a small constant depending on the floating point number system (Babuska [68], Kogge [72b]). Babuska (also Viten'ko [68]) has observed that the fully branched log-sum algorithm ($h = \lceil \log N \rceil$) yields the best stability bounds of any algorithm using N-1 additions. This does not preclude use of pseudo-double precision summation methods comparable to the ones proposed for sequential computation (e.g., Kahan [71]), since these use more than N-1 additions.
Finally, lower bounds for pipeline computers may be compared to lower bounds for parallel computers with a fixed number of processors, for in each case the computation time is bounded below by a linear function of the number of operations actually performed. Indeed, we can estimate the time for a parallel vector operation as tN/P + s, where s includes overhead and the "wrap-up" time t log(P/2) for operations with a single result. This is in sharp contrast to the "sufficiently many processors" model, where vector additions can be done in a single step and summations in a logarithmic number of steps. It is important to be aware of the differences between a fixed and unlimited number of processors, and that algorithms executed under the two models will have quite different operating characteristics.
We observe that, for the limited model of arithmetic time and as an intuitive guide only, a good algorithm for a pipeline computer should also be a good algorithm for a SIMD computer with fixed parallelism, and vice versa. One of the flaws of this comparison is the relative importance of lower order terms, for $\sigma$ will typically be much larger than s, and we have to be careful in discussing algorithms that are expected to be fast "in the limit", the definition of which can change considerably.
3. BASIC ALGORITHMS
Thus far we have established that certain simple but useful computations can be performed optimally in parallel. In fact, arbitrary arithmetic expressions with N terms can be evaluated in O(m(P,N)) steps under the P processor MIMD model. Much effort has gone into improving the asymptotic constants and in consideration of special expressions such as polynomials and recurrences. Important cases for linear algebra are the vector inner product, matrix multiplication, linear combinations of vectors, vector norms, linear recurrences, and the fast Fourier transform. The results of this section mostly apply to parallel computers, for no general theory presently exists concerning the optimal evaluation of arbitrary expressions on a pipeline computer.
3.a. General Expressions and Recurrences
In deriving the associative fan-in algorithms we were able to use the associativity of $\circ$ in order to reduce the depth of the computation tree representing the sequential algorithm $A_1 = a_1$, $A_{j+1} = A_j \circ a_{j+1}$. The idea of restructuring a sequential computation has been successfully extended to general arithmetic expressions; Brent [73] summarizes early work and basic techniques. Since an upper bound is desired, it may be assumed that the expression's directed computation graph is actually a tree. Constants are treated as indeterminates and all indeterminates must be distinct. Tree manipulations with the flavor of recursive doubling are then applied, reducing the tree depth and hence the computation time. Under the MIMD model, Table 2 shows recent results for an expression with N indeterminates. The P processor cases are derived using the algorithm decomposition principle.
TABLE 2
Parallel Computation Times

Expressions with Division
# of processors     # of steps               source
$O(N^{1.44})$       $2.88\log N + O(1)$      Muller and Preparata [75]
$3N$                $4\log N + O(1)$         Brent [73]
$P$                 $5N/2P + O(\log N)$      Winograd [75]

Expressions without Division
# of processors     # of steps               source
$O(N^{1.82})$       $2.08\log N + O(1)$      Preparata and Muller [75]
$N$                 $4\log N + O(1)$         Brent [73]
$P$                 $3N/2P + O(\log N)$      Winograd [75]

Brent's algorithm for expressions without division is numerically stable, and an example due to W. Miller (Brent [74]) shows that the corresponding algorithm for expressions with division can be unstable. The stability of the other evaluation techniques has not yet been investigated. Hyafil and Kung [74a] show that in Winograd's scheme (no divisions) the constant 3/2 cannot be decreased by much. This is done by considering the first order linear recurrence $x_0 = a_0$, $x_i = b_i x_{i-1} + a_i$, and showing that any algorithm to evaluate $x_n$ in t steps must perform at least 3n - t/2 operations. This result quantifies part of the folk wisdom of parallel computation, that there is a tradeoff between speed and the amount of work actually performed. Using the fact that we must have $Pt \ge 3n - t/2$ (otherwise t steps could not be sufficient) we conclude that $t \ge 3n/(P + 1/2)$. A slightly better result can be obtained from the Munro-Paterson theorem, as $t \ge m(P, 3n - t/2 + 1)$ implies $t \ge (3n + P(\lceil \log P \rceil - 2) + 1)/(P + 1/2)$. The substitution N = 2n+1 gives the desired result.
Investigation of linear recurrences began with two different applications, polynomial evaluation and the solution of tridiagonal linear systems. Horner's rule, which uniquely minimizes the sequential computation time (Borodin [71]), is a familiar example of a first order linear recurrence, and was one of the first methods to be reconstructed for parallel computation. The situation is now quite well understood, and Munro and Paterson [73] have described some asymptotically optimal algorithms. With N processors an N-th degree polynomial can be evaluated in $\log N + (2\log N)^{1/2} + O(1)$ steps, and with P processors in m(P,2N+1) + O(1) steps.
Stone [73a] observed that an N x N tridiagonal linear system can be solved in O(log N) steps if the first N terms of first and second order linear recurrences can be evaluated in O(log N) steps using N processors. This is discussed in more detail in Section 4.c, and an algorithm to compute m-th order linear recurrences ($x_{-m+1}, \ldots, x_0$ given, $x_i = \sum_{j=1}^{m} a_{ij} x_{i-j} + b_i$, $1 \le i \le N$) is given in Section 4.b, although this is slightly different than the recursive doubling techniques used by Stone.
A series of papers by Kogge and Stone (Kogge [72a,b,c], [74], Kogge and Stone [73], cf. Trout [72] and Heller [74b]) dealt with various generalizations and improvements of Stone's original recurrence methods, although linear-like properties are still required. For example, if $x_0$ is given and
$$x_{i+1} = \frac{a_i x_i + b_i}{c_i x_i + d_i},$$
then $x_1, \ldots, x_N$ may be computed in O(log N) steps with N processors. Work by Kung [74] and Hyafil and Kung [75] shows that the linear recurrence is really very special, and that only a constant speedup is possible for general rational and nonlinear recurrences. That is, there can be no fast algorithms, regardless of how many processors are available.
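To make the recursive doubling idea concrete for the first order linear recurrence $x_i = b_i x_{i-1} + a_i$, the sketch below treats each step as an affine map and combines maps by composition, which is associative. This is an illustrative scan in the style of recursive doubling, not a transcription of any cited algorithm; the function names are hypothetical, and each "round" stands for one parallel step in which every position is updated simultaneously.

```python
def compose(later, earlier):
    """Compose affine maps x -> b*x + a, stored as (b, a): apply earlier, then later."""
    b2, a2 = later
    b1, a1 = earlier
    return (b2 * b1, b2 * a1 + a2)

def recurrence_by_doubling(x0, b, a):
    """Evaluate x_i = b[i-1]*x_{i-1} + a[i-1], i = 1..N, in ceil(log2 N) doubling rounds.

    Returns ([x_1, ..., x_N], rounds).
    """
    maps = list(zip(b, a))
    N = len(maps)
    d, rounds = 1, 0
    while d < N:
        # Every position reads the old list, as N processors would in one round.
        maps = [compose(maps[i], maps[i - d]) if i >= d else maps[i]
                for i in range(N)]
        d *= 2
        rounds += 1
    # maps[i] now sends x_0 directly to x_{i+1}.
    return [bb * x0 + aa for bb, aa in maps], rounds
```

Each composition costs a constant number of arithmetic operations, so the round count corresponds to O(log N) parallel steps at the price of more total work than the sequential loop.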
We note that, although the computation of N terms of a low order linear recurrence on a pipeline computer generally requires use of the scalar mode because of short vector lengths, Kogge [73] shows how to adapt results from the parallel computation of recurrences to the design of special purpose pipelines. By careful use of parallel computation networks and feedback loops, the maximal output rate of one result per cycle may be achieved.
3.b. Inner Products and Related Computations
One of the most important applications of the associative fan-in algorithm is the computation of inner products. Winograd [70] has shown that N multiplications and N-1 additions are required to compute $a^T b = \sum_{i=1}^{N} a_i b_i$, so with P processors it is necessary to use at least m(P,2N) steps. This lower bound is achievable on an MIMD machine by a slight alteration of the fan-in method. On a SIMD machine $\lceil N/P \rceil + m(P,N)$ steps are needed, since we cannot perform different operations in the same step. In either case, $a^T b$ requires $\lceil \log N \rceil + 1$ steps given N processors. As before, the error analysis is not difficult, for
$$fl\left(\sum_{i=1}^{N} a_i b_i\right) = \sum_{i=1}^{N} a_i b_i (1 + \delta_i), \quad |\delta_i| \le (1+\mu)^h - 1,$$
where $h \le m(P,2N) + 1$ is the height of the computation tree used. Double precision accumulation is possible as in sequential computation.
In considering the N processor evaluation of $a^T b$ it is important to observe that while half of the operations are multiplications, almost all of the computational steps involve performing the additions. In sequential computation it is often possible to estimate an algorithm's performance solely by counting multiplications, but this is definitely not the case for parallel computation. All operations must be counted.
On a vector computer the inner product is generally included as a hardware instruction taking time $\tau_{dot} N + \sigma$, where $\tau_{dot} \le \tau_{\times} + \tau_{sum}$. Even if $\tau_{dot} = \tau_{\times} + \tau_{sum}$ it is better to use the hardware inner product since it will save one vector startup time, which is not insignificant.
The vector p-norms can also be computed in parallel using the associative fan-in algorithm. Here $\|v\|_p = \left(\sum_{i=1}^{N} |v_i|^p\right)^{1/p}$, so for $1 \le p < \infty$ we require $\lceil N/P \rceil$ unary steps to compute $|v_i|^p$, m(P,N) addition steps for the summation, and one step for $\|v\|_p$. $\|v\|_\infty = \max_i |v_i|$ is directly computable in $\lceil N/P \rceil + m(P,N)$ steps since the maximum operation is associative. The most frequent cases are p = 1, 2, and $\infty$, and on a pipeline machine these may be computed using the absolute value, summation, inner product and maximum vector instructions. Similarly, the 1 and $\infty$ matrix norms are easily computable. In actual practice the choice of norm and computational method will be greatly affected by the particular application and computer, and the issues involved in this choice are far from being resolved.
An inner product is, of course, a special case of matrix multiplication, being the product of 1 x N and N x 1 matrices. More generally, the product of m x n and n x p matrices may be optimally computed in $\lceil \log n \rceil + 1$ steps with mnp processors, since each component of the m x p result is an inner product of n-vectors. If fewer than mnp processors are available then asymptotically fast algorithms may be developed using tradeoffs between the fast sequential algorithms (Strassen [69]) and the usual sequential algorithm for partitioned matrices. For example, if A and B are N x N and we have a fast sequential method using $cN^{\alpha}$ steps, $\alpha = \log 7 \approx 2.81$, then with 8 processors C = AB may be computed in $cN^{\alpha}/7 + N^2/8$ steps. If $A_{ij}$ and $B_{ij}$ are $\lceil N/2 \rceil \times \lceil N/2 \rceil$,
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$
then
$$AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}.$$
Each $A_{ik}B_{kj}$ matrix product is computed in one processor in $c(N/2)^{\alpha}$ steps, and the four matrix additions are each done with two processors in $(N/2)^2/2$ steps. Thus the speedup is $S_8(N) = 7 - O(N^{-.81})$, which is satisfactory.
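The speedup figure follows from a short calculation, sketched here under the text's assumptions (a sequential method costing $cN^{\alpha}$ steps with $2^{\alpha} = 7$, and eight processors):

```latex
\begin{align*}
c\,(N/2)^{\alpha} &= \frac{cN^{\alpha}}{2^{\alpha}} = \frac{cN^{\alpha}}{7},
\qquad \frac{(N/2)^2}{2} = \frac{N^2}{8}, \\
S_8(N) &= \frac{cN^{\alpha}}{cN^{\alpha}/7 + N^2/8}
        = \frac{7}{1 + \tfrac{7}{8c}\,N^{2-\alpha}}
        = 7 - O\!\left(N^{2-\alpha}\right)
        = 7 - O\!\left(N^{-0.81}\right).
\end{align*}
```

The eight block products run one per processor, so the multiplication phase costs a single block product; the additions contribute only the lower order $N^2/8$ term.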
A matrix-vector multiplication c = Ab is equivalent to the formation of a linear combination of the columns of A, and here we use the fact that the computations for the components of c are independent. Thus
$$c_i = \sum_{j=1}^{n} a_{ij} b_j, \quad (1 \le i \le N)$$
if A is N x n. Special cases of importance are n = 2, 3 or N, and it is seen that good speedups are obtainable for any values of n, N and P. The choice of a particular method for a pipeline computer would depend on the storage scheme used for A and the relative costs of an inner product and a vector multiply-and-add.
Muraoka and Kuck [73] have considered the evaluation of a conformable sequence of matrix products $A_1 A_2 \cdots A_n$ where $A_i$ is either 1 x N, N x N, or N x 1, using unlimited parallelism. It is necessary to associate the products correctly in order to minimize the computation time, and a minimal weight parsing algorithm is given.
The parallel evaluation of arbitrary matrix expressions is discussed by Maruyama [73] and Kuck and Maruyama [75], who generalize earlier results for scalar expressions. If, with unlimited parallelism, N x N matrices may be added, multiplied and inverted in $t_A$, $t_M$ and $t_I$ steps respectively, then any matrix expression involving n N x N matrices and no inversions may be evaluated in $2\lceil \log n \rceil (t_A + t_M)$ steps. If inversions are necessary then $\lceil \log n \rceil (2t_A + 3t_M + t_I) + t_I$ steps are sufficient.
3.c. The Fast Fourier Transform
The discrete Fourier transform of an N-vector $a = (a_0, \ldots, a_{N-1})$ is another N-vector b, where
$$b_i = \sum_{j=0}^{N-1} a_j \omega^{ij}, \quad 0 \le i \le N-1,$$
and $\omega$ is the principal N-th root of unity. We assume for this section only that all arithmetic is done with complex numbers. The transform is just a matrix-vector multiplication b = Fa, so with $N^2$ processors we only need to generate F ($\lceil \log N \rceil$ steps given $\omega$) and perform the multiplication ($\lceil \log N \rceil + 1$ steps). The fast Fourier transform (Cooley and Tukey [65]) allows us to compute b just as quickly but only using N processors, or in O(N log N/P) steps with P processors (Pease [68]).
For simplicity assume that $N = 2^{n+1}$, and for $0 \le r \le N-1$, $0 \le k \le n$, let
$$r = [r_0 r_1 \ldots r_n] = \sum_{j=0}^{n} r_j 2^{n-j}, \quad r_j = 0 \text{ or } 1,$$
$$f(r,k) = [r_0 \ldots r_{k-1}\, 0\, r_{k+1} \ldots r_n],$$
$$g(r,k) = [r_k r_{k-1} \ldots r_0\, 0 \ldots 0],$$
$$h(r,k) = [r_0 \ldots r_{k-1}\, 1\, r_{k+1} \ldots r_n],$$
$$rev(r) = [r_n \ldots r_0] = g(r,n).$$
The parallel FFT is performed as follows:
    z_i ← ω^i, (0 ≤ i ≤ N-1);
    c_i ← a_i, (0 ≤ i ≤ N-1);
    for k = 0 step 1 until n do
        c_i ← c_{f(i,k)} + z_{g(i,k)} c_{h(i,k)}, (0 ≤ i ≤ N-1);
    b_i ← c_{rev(i)}, (0 ≤ i ≤ N-1).
Except for the initial computation of $z_i$, which may be done in a variety of ways, the algorithm runs in $2\lceil N/P \rceil \lceil \log N \rceil$ complex arithmetic steps with P processors. In essence, F has been factored into $\lceil \log N \rceil$ very simple matrices, and the cumulative product is computed. However, the interprocessor data movements are just as important as the arithmetic costs, and it may be observed that either f(r,k) = r and $h(r,k) = r + 2^{n-k}$ or $f(r,k) = r - 2^{n-k}$ and h(r,k) = r, so the movements for c within the loop are well structured (Pease [68], Stone [71]).
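The index functions f, g, h and rev can be exercised directly. The sketch below simulates the parallel FFT sequentially, assuming $\omega = e^{2\pi i/N}$ as the principal root and $r_0$ as the most significant bit; the function name `parallel_fft` and the helper names are illustrative choices, not from the report. Within each value of k, all N updates read the old contents of c, as N processors operating in lock step would.

```python
import cmath

def parallel_fft(a):
    """Sequential simulation of the N-processor FFT; N = len(a) must be 2**(n+1)."""
    N = len(a)
    n = N.bit_length() - 2
    assert N >= 2 and N == 1 << (n + 1), "length must be a power of two"
    w = cmath.exp(2j * cmath.pi / N)           # assumed principal N-th root of unity
    z = [w ** i for i in range(N)]

    def bit(r, j):                             # r_j, with r_0 the most significant bit
        return (r >> (n - j)) & 1

    def f(r, k):                               # clear bit k
        return r & ~(1 << (n - k))

    def h(r, k):                               # set bit k
        return r | (1 << (n - k))

    def g(r, k):                               # [r_k r_{k-1} ... r_0 0 ... 0]
        return sum(bit(r, j) << (n - k + j) for j in range(k + 1))

    c = list(a)
    for k in range(n + 1):
        # One parallel step: build the new c entirely from the old c.
        c = [c[f(i, k)] + z[g(i, k)] * c[h(i, k)] for i in range(N)]
    return [c[g(i, n)] for i in range(N)]      # rev(i) = g(i, n)
```

The result can be compared against the defining sum $b_i = \sum_j a_j \omega^{ij}$ for small N; the two agree up to rounding error.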
4. LINEAR SYSTEMS
In this section we consider the parallel solution of arbitrary linear systems and some special systems of practical interest; most of the methods are direct rather than iterative. Lambiotte [75] covers many of these topics with respect to the CDC STAR, and there are a large number of technical reports dealing with implementation on Illiac IV (see Poole and Voigt [74]). Throughout this section A will denote a real nonsingular matrix, and we will attempt to solve Ax = v, $v \in R^N$, and AX = B, $B \in R^{N \times M}$.
4.a. General Dense Matrices
If $x = A^{-1}v$ then each component of x depends on the $N^2$ components of A and the N components of v, so at least $m(P, N^2+N)$ steps are required to compute x with P processors. This number is about 2 log N for large P. Supposing that the parallel processor uses w-bit floating point arithmetic, $w < \infty$, let S be the set of all N-vectors z such that z = fl(z); there are at most $2^w N$ vectors in S. Now, it may be shown that if $y \in S$ satisfies
$$\|Ay - v\| = \min_{z \in S} \|Az - v\|$$
then (A+E)y = v, where $\|E\|$ is small. Using $N^2|S|$ processors we can compute $\|Az - v\|$, for each $z \in S$, in $1 + \lceil \log(N+1) \rceil + \lceil \log N \rceil$ steps. The minimization over S may be done in $\lceil \log |S| \rceil$ steps, for a total of at most 3 log N + w + 4 steps. Nevertheless, we must reject this approach, for it requires an excessive number of processors and operations when we regard w as a problem parameter, hence giving no useful information for an actual parallel program, and also fails to give a clue for the treatment of real numbers ($w = \infty$).
Let T(P,N,M) be the minimum number of steps required to compute $A^{-1}B$ using P processors; the trivial lower bound $T(P,N,1) \ge m(P, N^2+N)$ has just been mentioned. It is clear that it is no more difficult to compute $A^{-1}B$ than $A^{-1}v$; simply compute, in parallel, $A^{-1}B_j$ where $B_j$ is the j-th column of B. That is, $T(MP,N,M) \le T(P,N,1)$. It is also clear that T is an increasing function of N for any value of P. No one has yet been able to advance much beyond this lower bound for general parallel algorithms.
For some time it was believed that the Gauss-Jordan elimination algorithm would prove to be the fastest way to compute $A^{-1}v$ given any number of processors. The running time of this method is linear in N when pivoting is not used, and a major open problem was to close the gap between the logarithmic lower bounds and the linear upper bounds for T(P,N,1). In fact it is not hard to give a plausibility argument (but not a proof!) that at least N steps are needed for any recursive algorithm such as elimination. We now know that $A^{-1}$ may be computed in $O(\log^2 N)$ steps using a bounded number of processors (Csanky [75]). All the mathematical tools involved in this result are classical, but its importance is drawn from the modern notions of sequential and parallel computation.
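Suppose that the eigenvalues of A are $\lambda_j$, $1 \le j \le N$, and its characteristic polynomial is
$$f(z) = \prod_{j=1}^{N} (z - \lambda_j) = \det(zI - A) = z^N + c_1 z^{N-1} + \cdots + c_{N-1} z + c_N.$$
Let $s_i = \mathrm{trace}(A^i) = \sum_{j=1}^{N} \lambda_j^i$, $1 \le i \le N$. By the Newton identities for the coefficients of a polynomial we have the triangular system
$$\begin{pmatrix} 1 & & & & \\ s_1 & 2 & & & \\ s_2 & s_1 & 3 & & \\ \vdots & & \ddots & \ddots & \\ s_{N-1} & \cdots & s_2 & s_1 & N \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_N \end{pmatrix} = - \begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ \vdots \\ s_N \end{pmatrix}.$$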
By the Cayley-Hamilton Theorem f(A) = 0, so we can write
$$A^{-1} = -(A^{N-1} + c_1 A^{N-2} + \cdots + c_{N-2} A + c_{N-1} I)/c_N.$$
Note that $c_N = (-1)^N \det(A) \ne 0$ since A is nonsingular. The method is to compute all the powers of A and the traces $s_i$, solve for the coefficients of the characteristic polynomial, and finally use the powers and coefficients to evaluate $A^{-1}$.
To determine the running time, recall that $z^i$, $1 \le i \le N$, can be computed in $n = \lceil \log N \rceil$ steps with N/2 processors. This is easily extended to show that $A^i$, $1 \le i \le N$, can be computed in n(n+1) steps with $N(N^3)/2$ processors. Since $\mathrm{trace}(A) = \sum_{i=1}^{N} a_{ii}$, we can compute $s_i$, $1 \le i \le N$, in n steps using $N^2$ processors. Later, in Section 4.b, we will show that a triangular linear system can be solved in (n+1)(n+2)/2 steps using $N^3/68 + O(N^2)$ processors. (This particular result is due to Chen and Kuck [75].) Finally, $A^{-1}$ may be computed in n+2 steps using $N^3$ processors. Since $x = A^{-1}v$ requires only n+1 more steps with $N^2$ processors, we have
$$2\log N \le T(N^4/2, N, 1) \le (3/2)\log^2 N + O(\log N).$$
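The algebra of the method (powers, traces, Newton identities, Cayley-Hamilton) can be checked with a short sequential sketch. It uses exact rational arithmetic so that the rounding problems of the method do not obscure the check, and it makes no attempt to reproduce the parallel step counts; the function names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    N = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def inverse_via_traces(A):
    """A^{-1} from traces of powers of A, Newton's identities and Cayley-Hamilton."""
    N = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    powers = [identity, A]                     # powers[i] = A^i
    for _ in range(2, N + 1):
        powers.append(matmul(powers[-1], A))
    s = [None] + [sum(powers[i][j][j] for j in range(N)) for i in range(1, N + 1)]
    # Newton identities: k*c_k + s_1*c_{k-1} + ... + s_{k-1}*c_1 + s_k = 0, with c_0 = 1.
    c = [Fraction(1)]
    for k in range(1, N + 1):
        c.append(-(s[k] + sum(c[i] * s[k - i] for i in range(1, k))) / k)
    # Cayley-Hamilton: A^{-1} = -(A^{N-1} + c_1 A^{N-2} + ... + c_{N-1} I) / c_N.
    return [[-sum(c[i] * powers[N - 1 - i][r][q] for i in range(N)) / c[N]
             for q in range(N)] for r in range(N)]
```

For the matrix with rows (2, 1) and (1, 1), the traces are 3 and 7, the coefficients are $c_1 = -3$ and $c_2 = 1$, and the computed inverse has rows (1, -1) and (-1, 2). A singular input makes $c_N$ zero and raises `ZeroDivisionError`.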
The number of processors required can be reduced somewhat while preserving the $O(\log^2 N)$ time by computing inner products in O(log N) steps using O(N/log N) processors. Thus if $P = O(N^4/\log N)$, $T(P,N,1) = O(\log^2 N)$. Note also that if A is M x N, $M \ge N$, and A has full column rank, then we can compute $A^{\dagger}$, the generalized inverse of A, since $A^T A$ is positive definite and $A^{\dagger} = (A^T A)^{-1} A^T$.
As for stability, the evaluation of $c_N$ is extremely sensitive to rounding errors committed in the evaluation of the traces $s_i$. In many cases severe cancellation will occur (see the discussion of Leverrier's method in Wilkinson [65]), and a very large number of figures must be carried in order to obtain a reasonable computed value of $A^{-1}$.
In summary, we have an excellent theoretical result, but it will be of little help in creating programs for real parallel computers. For this we must return to the standard elimination methods, which are known to be stable and which have enough inherent parallelism to allow efficient execution on parallel and pipeline computers. We suspect that there is a conservation law for linear systems, which states that if a stability criterion is to be met then a certain number of arithmetic steps must be performed. We believe that techniques developed by W. Miller [75] are worth pursuing for parallel computation.
We discuss four elimination methods: Gauss-Jordan, the LU and QR decompositions, and a method due to Pease [74]. Each has different merits according to the computing model and characteristics of A. In each case we identify B with columns N+1 through N+M of A, and compute $X = A^{-1}B$ by performing operations on whole rows of the augmented matrix. For notational convenience "row i" will refer to the i-th row of (A,B).
The Gauss-Jordan algorithm is the simplest to describe. Assuming pivoting is not necessary for stability, we have
Algorithm G-J:
    for j = 1 step 1 until N do
        row i ← row i - (a_ij / a_jj) row j, (1 ≤ i ≤ N, i ≠ j);
    x_ij ← a_ij / a_ii, (1 ≤ i ≤ N; N+1 ≤ j ≤ N+M).
Using (N-1)(N+M) processors this requires 3N+1 steps. Note that $A^{-1}$ can be computed in parallel with $A^{-1}B$, so that a product $Y = A^{-1}C$ is later obtainable by a matrix multiplication.
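Algorithm G-J translates almost line for line into a sequential sketch. The version below assumes, as the text does, that no pivoting is needed (a zero pivot raises `ZeroDivisionError`), and uses exact rationals so the result can be checked exactly; the function name is illustrative. The inner loop over i corresponds to the row operations that the parallel algorithm performs simultaneously.

```python
from fractions import Fraction

def gauss_jordan(A, B):
    """Solve AX = B by Gauss-Jordan elimination without pivoting.

    A is N x N and B is N x M, both given as lists of rows.
    """
    N, M = len(A), len(B[0])
    aug = [[Fraction(x) for x in list(ra) + list(rb)] for ra, rb in zip(A, B)]
    for j in range(N):
        pivot_row = aug[j][:]                  # row j is read by every other row
        for i in range(N):                     # independent updates: one per processor row
            if i != j:
                t = aug[i][j] / pivot_row[j]
                aug[i] = [x - t * y for x, y in zip(aug[i], pivot_row)]
    # A has been reduced to diagonal form; scale the right-hand columns.
    return [[aug[i][N + q] / aug[i][i] for q in range(M)] for i in range(N)]
```

For A with rows (2, 1) and (1, 1) and the single right-hand side (3, 2), the solution is x = (1, 1).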
If $a_{jj} = 0$ at some point in Algorithm G-J, then it is sufficient (mathematically at least) to search column j below the diagonal to discover a nonzero pivot before performing the elimination step. This adds an additional number of comparison steps equal to
$$\sum_{j=1}^{N-1} \lceil \log(N-j) \rceil = Nn - 2^n - n + 1, \quad n = \lceil \log(N-1) \rceil.$$
Thus the effort expended in partial pivoting can overwhelm the arithmetic effort, in contrast to the single processor case where pivoting does not radically change the running time.
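The closed form for the number of comparison steps is easy to confirm numerically. This small check is an illustration only; the function names are hypothetical and logarithms are base 2.

```python
def ceil_log2(x):
    """ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()

def pivot_search_steps(N):
    """Sum over j = 1..N-1 of ceil(log2(N - j)), the fan-in depth of each column search."""
    return sum(ceil_log2(N - j) for j in range(1, N))

def pivot_search_closed(N):
    """Closed form N*n - 2**n - n + 1 with n = ceil(log2(N - 1)), for N >= 2."""
    n = ceil_log2(N - 1)
    return N * n - 2 ** n - n + 1
```

For N = 9 both functions give 17 comparison steps, compared with the 3N+1 = 28 arithmetic steps of Algorithm G-J; the comparison count grows like N log N while the arithmetic count grows like N.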
Sameh and Kuck [75b] describe one way to overcome the pivoting problem. Using
the square-root-free Givens transformations (Gentleman [73]), a diagonal matrix
D, an upper triangular matrix R and an orthogonal matrix Q, such that
$QA = D^{1/2}R$, can be obtained in 8N-7 steps (only two of them involving square
roots) with $N^2$ processors. Q is computed implicitly as a product of simpler
matrices. For notational purposes we use a procedure Rotate(i,j) which applies a
root-free Givens transformation to rows i-1 and i in order to eliminate
$a_{ij}$, where 1 ≤ j < i ≤ N. Rotate takes care of managing the scale factors
in D, and the rows are interchanged if necessary for stability. The
triangularization algorithm is thus
    for k = 1 step 1 until N-1 do
        begin Rotate(N-2p, k-p),    (0 ≤ p ≤ min(k-1, N-k-1));
              Rotate(N-2p-1, k-p),  (0 ≤ p ≤ min(k-1, N-k-2))
        end.
Once A has been reduced to triangular form, any of the methods of Section 4.b
can be used to solve the resulting triangular system.
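
The order in which the rotations are issued is easier to see when the loop is unrolled. The following sketch only enumerates the index pairs produced by the loop above. It does not perform any Givens arithmetic, and the name `rotation_stages` is a hypothetical helper, not part of the cited work.

```python
def rotation_stages(N):
    """List the parallel stages of the triangularization loop.

    Each stage is a list of (i, j) pairs, one per call Rotate(i, j), i.e. a
    rotation of rows i-1 and i that eliminates a_ij (1-based indices).
    """
    stages = []
    for k in range(1, N):
        first = [(N - 2 * p, k - p)
                 for p in range(min(k - 1, N - k - 1) + 1)]
        second = [(N - 2 * p - 1, k - p)
                  for p in range(min(k - 1, N - k - 2) + 1)]
        # The second group is empty when k = N-1; empty stages are dropped.
        stages.extend(s for s in (first, second) if s)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(rotation_stages(5), start=1):
        print(number, stage)
```

Within one stage the rotations touch disjoint pairs of rows, which is what allows them to run simultaneously.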
If only N processors are available (cf. Pease [67]) the Gauss-Jordan algorithm
without pivoting uses $N^2 + 2NM + M$ arithmetic steps. Partial pivoting is not
costly since O(N log N) comparison steps still suffice.
Algorithm G-J (N processors):
    for j = 1 step 1 until N do
        begin t_i ← a_{ij}/a_{jj},  (1 ≤ i ≤ N, i ≠ j);
            for k = j+1 step 1 until N+M do
                a_{ik} ← a_{ik} - t_i a_{jk},  (1 ≤ i ≤ N, i ≠ j)
        end;
    for j = N+1 step 1 until N+M do
        x_{ij} ← a_{ij}/a_{ii},  (1 ≤ i ≤ N).
This is, of course, only one of many possibilities. It has the advantage that
the only data movements needed are to spread a column across the processors and
to broadcast one number to all processors. The modification of the algorithm,
by decomposition, to P processors is easily done.
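
As a rough illustration of the N-processor Gauss-Jordan scheme and its step count, the sketch below simulates the parallel steps sequentially in exact rational arithmetic. The function name and the step-counting convention (one step per arithmetic operation performed simultaneously over i) are assumptions made for the example.

```python
from fractions import Fraction


def gauss_jordan_n_processors(aug, N, M):
    """Solve A X = B without pivoting; aug is the N x (N+M) matrix (A, B).

    Returns (X, steps). One step is one arithmetic operation carried out
    by up to N processors at once.
    """
    a = [[Fraction(e) for e in row] for row in aug]
    steps = 0
    for j in range(N):
        # One parallel division step produces every multiplier t_i.
        t = [a[i][j] / a[j][j] if i != j else None for i in range(N)]
        steps += 1
        for k in range(j + 1, N + M):
            for i in range(N):              # conceptually simultaneous over i
                if i != j:
                    a[i][k] -= t[i] * a[j][k]
            steps += 2                      # one multiply step, one subtract step
    X = [[a[i][N + col] / a[i][i] for col in range(M)] for i in range(N)]
    steps += M                              # one division step per right-hand column
    return X, steps


if __name__ == "__main__":
    aug = [[4, 1, 0, 7, 12],
           [1, 4, 1, 18, 24],
           [0, 1, 4, 23, 28]]
    X, steps = gauss_jordan_n_processors(aug, 3, 2)
    print(X, steps)
```

Counted this way, the loop performs $N^2 + 2NM$ steps and the final divisions add M more.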
For a pipeline computer it is better to use Gaussian elimination, as this
minimizes the number of arithmetic operations when we are limited to operations
on whole rows and columns (Kluyuev and Kokovkin-Shcherbak [65]). The method due
to Strassen [69], which uses asymptotically fewer operations by performing block
elimination, requires some complicated addressing schemes that may be quite
difficult to implement. Because of the restrictive definition of vector operands
on the STAR, it is necessary to provide two sets of Gauss elimination routines,
depending on whether A is stored by rows or columns (Lambiotte [75]). In
addition, the symmetric factorization $A = LDL^T$ when A is positive definite
requires both routines, for if L is stored by rows (columns) then $L^T$ is
stored by columns (rows).
The preceding elimination algorithms are familiar ones from sequential
computation, and we have really only given parallel implementations of them.
Pease [74] recently presented a new algorithm for the solution of a general
system (cf. Pease [69]). Although the method would require $O(N^2 \log N)$ steps
using $N^2$ processors (as opposed to Gauss-Jordan's O(N) steps), it is
interesting in that any parallel computer with an interprocessor communication
network designed for the FFT can implement it reasonably well. In particular,
special purpose FFT devices could be modified to support the algorithm, and it
is expected that this will be the primary application; it is not recommended for
other situations.
We give a simple recursive definition of Pease's method, though this does not
illuminate relations with the FFT. Like other elimination algorithms it works by
premultiplying the augmented matrix. Suppose for simplicity that $N = 2^n$.
Algorithm P(n):
    if n = 0 then x ← A^{-1}v else
    begin
        let $A = \begin{pmatrix} A_1 & E_1 \\ E_2 & A_2 \end{pmatrix}$, $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$,
            where $A_1$ and $A_2$ are $2^{n-1} \times 2^{n-1}$;
        simultaneously solve $A_1(F_1, g_1) = (E_1, v_1)$
            and $A_2(F_2, g_2) = (E_2, v_2)$
            by Algorithm P(n-1);
        simultaneously solve $(I - F_1F_2)x_1 = (g_1 - F_1g_2)$
            and $(I - F_2F_1)x_2 = (g_2 - F_2g_1)$
            by Algorithm P(n-1)
    end.
In the first application of Algorithm P(n-1), (A, v) is transformed into
$$\begin{pmatrix} I & F_1 \\ F_2 & I \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} g_1 \\ g_2 \end{pmatrix}.$$
This system is premultiplied by
$$\begin{pmatrix} I & -F_1 \\ -F_2 & I \end{pmatrix}$$
to form
$$\begin{pmatrix} I - F_1F_2 & 0 \\ 0 & I - F_2F_1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} g_1 - F_1g_2 \\ g_2 - F_2g_1 \end{pmatrix}.$$
The last step is simply the cyclic reduction method originally suggested for
implicit use with large scale iterative methods (Varga [62]). The numerical
stability of Algorithm P has not yet been discussed thoroughly; a manageable
pivoting scheme must be given with a corresponding error analysis.
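
The recursive definition can be exercised directly on a small system. The sketch below is a sequential, exact-arithmetic rendering of the recursion and says nothing about the FFT-style data routing. The names `pease_solve` and `matmul` are hypothetical, and every subsystem met along the way is assumed to be nonsingular, since no pivoting is done.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def pease_solve(A, V):
    """Solve A X = V; A is N x N with N a power of two, V has any number of columns."""
    N = len(A)
    if N == 1:
        return [[Fraction(v) / A[0][0] for v in V[0]]]
    h = N // 2
    A1 = [row[:h] for row in A[:h]]
    E1 = [row[h:] for row in A[:h]]
    E2 = [row[:h] for row in A[h:]]
    A2 = [row[h:] for row in A[h:]]
    # First application: A1 (F1, G1) = (E1, V1) and A2 (F2, G2) = (E2, V2).
    S1 = pease_solve(A1, [e + v for e, v in zip(E1, V[:h])])
    S2 = pease_solve(A2, [e + v for e, v in zip(E2, V[h:])])
    F1, G1 = [r[:h] for r in S1], [r[h:] for r in S1]
    F2, G2 = [r[:h] for r in S2], [r[h:] for r in S2]

    def reduced(Fa, Fb, Ga, Gb):
        # Second application: (I - Fa Fb) X = Ga - Fa Gb.
        P = matmul(Fa, Fb)
        lhs = [[(1 if r == c else 0) - P[r][c] for c in range(h)] for r in range(h)]
        Q = matmul(Fa, Gb)
        rhs = [[g - q for g, q in zip(grow, qrow)] for grow, qrow in zip(Ga, Q)]
        return pease_solve(lhs, rhs)

    return reduced(F1, F2, G1, G2) + reduced(F2, F1, G2, G1)


if __name__ == "__main__":
    A = [[10, 1, 2, 1], [1, 10, 1, 2], [2, 1, 10, 1], [1, 2, 1, 10]]
    V = [[22], [32], [38], [48]]
    print(pease_solve(A, V))
```

The two calls in each "simultaneously solve" pair are independent of one another, which is where the parallelism of the method lies.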
4.b. Triangular Systems
We have seen that the solution of dense systems requires the ability to solve
more specialized systems quickly. Suppose now that A is a nonsingular triangular
matrix; without loss of generality we can assume A is lower triangular. It is
easily verified that the straightforward sequential solution of Ax = v requires
$N^2$ arithmetic operations. The parallel solution of Ax = v was first
considered by Heller [74a], who showed that x can be computed in $O(\log^2 N)$
steps with $O(N^4)$ processors. The original method was rather complicated,
involving the evaluation of lower Hessenberg determinants by recursive doubling,
and the result has since been improved by reducing the processor requirement to
$O(N^3)$. It is of some interest to consider three distinct but similar
algorithms for this problem. These algorithms succeed by use of recursion and
doubling, in that the number of completed computations (i.e., components of x)
doubles at each stage.
The first method (Chen and Kuck [75]) is a variation of Gauss-Jordan
elimination, and might be called elimination by diagonals. "Row i" again refers
to a row of the augmented matrix, where for notational simplicity we set
$a_{ij} = 0$ if i ≤ 0 or j ≤ 0.
Algorithm C-K:
    row i ← (row i)/a_{ii},  (1 ≤ i ≤ N);
    for j = 1 step j until N-1 do
        row i ← row i - $\sum_{k=j}^{2j-1}$ a_{i,i-k} (row i-k),  (j+1 ≤ i ≤ N);
Using less than $N^2(N+1)/2$ processors, for each j we can do all the
multiplications in the loop in parallel, followed by the log-sum addition of j+1
rows. The time is thus, for $n = \lceil \log N \rceil$,
$$1 + \sum_{k=0}^{n-1} \left( 1 + \lceil \log(2^k+1) \rceil \right) = (n^2+3n+2)/2 = O(\log^2 N).$$
By a closer analysis of the number of processors needed when only essential
operations are performed, $N^3/68 + O(N^2)$ processors suffice. Chen [75] shows
that if A is a Toeplitz matrix ($a_{ij} = a_{i-j}$) then $N^2/4$ processors are
sufficient. It is also seen that AX = B can be solved in $O(\log^2 N)$ steps
with no more than $N^2(N+M)$ processors.

The second method (Borodin and Munro [75]) uses partitioning to invert A. Let
$$A = \begin{pmatrix} A_1 & 0 \\ A_2 & A_3 \end{pmatrix},$$
where $A_1$ and $A_3$ are lower triangular and $A_1$ is $m \times m$, so that
$$A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ -A_3^{-1}A_2A_1^{-1} & A_3^{-1} \end{pmatrix}.$$
The algorithm proceeds in two stages, first simultaneously inverting $A_1$ and
$A_3$ and solving $A_3Y = A_2$, and then multiplying Y and $A_1^{-1}$. The total
time is
$$t(1) = 1, \qquad t(N) = \min_{1 \le m < N} \left( \max(t(m), t(N-m)) + 1 + \lceil \log m \rceil \right)$$
using $O(N^3)$ processors. The proper choice for m is $\lceil N/2 \rceil$; thus
$t(2^n) = (n^2+n)/2 + 1$ and in general $t(N) = O(\log^2 N)$.
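
Returning to the first method, Algorithm C-K, elimination by diagonals can be simulated sequentially to see the doubling at work. The sketch below uses exact rational arithmetic, and the function name is a hypothetical choice. Each pass of the outer loop corresponds to one parallel stage, and all rows are updated from the values held at the start of the stage.

```python
from fractions import Fraction


def solve_by_diagonals(A, v):
    """Solve the lower triangular system A x = v by elimination by diagonals."""
    N = len(A)
    # Augmented rows, scaled so that the diagonal is 1.
    rows = [[Fraction(A[i][c]) / A[i][i] for c in range(N)] + [Fraction(v[i]) / A[i][i]]
            for i in range(N)]
    j = 1
    while j <= N - 1:
        new_rows = [r[:] for r in rows]
        for i in range(j, N):                 # 0-based; the text's i runs from j+1 to N
            for k in range(j, 2 * j):
                if i - k < 0:                 # entries outside the matrix count as zero
                    break
                coef = rows[i][i - k]
                if coef != 0:
                    new_rows[i] = [x - coef * y for x, y in zip(new_rows[i], rows[i - k])]
        rows = new_rows
        j *= 2                                # subdiagonals 1 .. 2j-1 are now zero
    return [r[N] for r in rows]


if __name__ == "__main__":
    A = [[2, 0, 0, 0, 0],
         [1, 3, 0, 0, 0],
         [4, 5, 6, 0, 0],
         [1, 2, 3, 4, 0],
         [2, 1, 2, 1, 5]]
    x = [1, 2, 3, 4, 5]
    v = [sum(a * b for a, b in zip(row, x)) for row in A]
    print(solve_by_diagonals(A, v))
```

After the stage with a given j, the first 2j-1 subdiagonals have been cleared, so about log N stages reduce A to the identity.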
A third method was obtained independently by Heller [74b] and Orcutt [74].
Supposing that A has a unit diagonal, let A = I - L, where L is strictly lower
triangular. Since $L^N = 0$,
$$x = A^{-1}v = (I + L + L^2 + \cdots + L^{N-1})v = (I + L^{2^{n-1}})(I + L^{2^{n-2}}) \cdots (I + L)v$$
where again $n = \lceil \log N \rceil$. x is computed by repeatedly squaring L
and accumulating matrix-vector products according to the above formula. The time
required is at most $n^2 + n$ steps using at most $N^2(N+1)$ processors.
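
The product formula translates almost line for line into code. The following is a minimal sequential sketch with hypothetical helper names. It uses integer or rational entries so that the result is exact, and it makes no attempt to model the processor count.

```python
def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def solve_unit_lower(A, v):
    """Solve A x = v for unit lower triangular A via (I + L^(2^(n-1))) ... (I + L) v."""
    N = len(A)
    L = [[(1 if r == c else 0) - A[r][c] for c in range(N)] for r in range(N)]
    x = list(v)
    power = 1                       # L currently holds (I - A) raised to this power
    while power < N:
        Lx = matvec(L, x)
        x = [xi + yi for xi, yi in zip(x, Lx)]   # x <- (I + L^power) x
        L = matmul(L, L)                          # repeated squaring
        power *= 2
    return x


if __name__ == "__main__":
    A = [[1, 0, 0, 0, 0],
         [2, 1, 0, 0, 0],
         [-1, 3, 1, 0, 0],
         [4, 0, 2, 1, 0],
         [1, 1, 1, 1, 1]]
    v = matvec(A, [1, -2, 3, 0, 5])
    print(solve_unit_lower(A, v))
```

The loop runs once for each power 1, 2, 4, ... below N, so the number of factors applied is $\lceil \log N \rceil$.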
This technique also provides a simple algorithm for linear recurrences. Suppose
that $a_{ij} = 0$ for i - j > m, so the system Ax = v represents an $m$th order
linear recurrence with initial values. We assume that m ≪ N. Partition A into
$m \times m$ blocks $A_{ij}$, $1 \le i, j \le n = \lceil N/m \rceil$ ($A_{nn}$
may be smaller than $m \times m$); there are only two block diagonals that are
nonzero, namely $A_{ii}$ and $A_{i,i-1}$. Now consider
$$A^* = \mathrm{diag}(A_{11}^{-1}, \ldots, A_{nn}^{-1})\,A, \qquad v^* = \mathrm{diag}(A_{11}^{-1}, \ldots, A_{nn}^{-1})\,v,$$
and note that $A^*_{ii} = I$, $A^*_{i,i-1} = A_{ii}^{-1}A_{i,i-1}$. Using 2mN
processors, $A^*$ and $v^*$ may be computed in $O(\log^2 m)$ steps since
$A_{ii}$ is triangular. Moreover, powers of $L = I - A^*$ all have a single
block diagonal. To compute $L^{2^{k+1}}$ from $L^{2^k}$ requires the parallel
computation of $n - 2^{k+1}$ $m \times m$ matrix products, which may be done in
$1 + \lceil \log m \rceil$ steps. Collecting these results, x (the first N terms
of the recurrence) may be computed in O(log m log N) steps. It is not hard to
show that $O(m^2 \log N)$ steps suffice with N processors.
We next apply algorithm and problem decomposition in order to reduce the
processor requirements for triangular systems below $N^3$ and still obtain fast
algorithms. These results are due to Hyafil and Kung [74b]; similar ideas appear
in Chen [75].
First consider the following scheme to solve Ax = v.
    x_i ← v_i,  (1 ≤ i ≤ N);
    for j = 1 step 1 until N do
        begin x_j ← x_j/a_{jj};
              x_i ← x_i - a_{ij}x_j,  (j+1 ≤ i ≤ N)
        end.
Using P ≤ N processors this requires
$$\sum_{j=1}^{N} \left( 1 + \lceil (N-j)/P \rceil \right) < N^2/P + 2N$$
steps. Using vector instructions on a pipeline machine, the arithmetic cost
would be
$$(\tau_+ + \tau_\times)N^2/2 + (t_\div - \tau_+/2 - \tau_\times/2 + \sigma_+ + \sigma_\times)N.$$
These are both acceptable speedups, since the scalar time would be
$$(t_+ + t_\times)N^2/2 + (t_\div - t_+/2 - t_\times/2)N.$$
Now suppose that $P = N^r$ processors are available, 1 < r < 3. Let
$m = \lceil P^{1/3} \rceil$ and partition A into $m \times m$ blocks $A_{ij}$,
$1 \le i, j \le n = \lceil N/m \rceil$ ($A_{nn}$ may be smaller than
$m \times m$). Similarly partition x and v, and apply the following algorithm:
    x_i ← v_i,  (1 ≤ i ≤ n);
    for j = 1 step 1 until n do
        begin x_j ← A_{jj}^{-1}x_j;
              x_i ← x_i - A_{ij}x_j,  (j+1 ≤ i ≤ n)
        end.
The computation of $A_{jj}^{-1}x_j$ is done using the fast algorithm, since
$m^3$ processors are available. The total time required is, for
$k = \lceil \log m \rceil$,
$$\sum_{j=1}^{n} \left[ (k^2+3k+2)/2 + \lceil (n-j)/m \rceil (k+2) \right] = O(nk^2 + n^2k/m) = O(N^{1-r/3}\log^2 N + N^{2-r}\log N).$$
If 3/2 ≤ r < 3 the first term dominates, and if 1 < r < 3/2 the second term
dominates. Thus there is a tradeoff between the use of the fast algorithm and
the cost of combining the results of its application.
4.c. Tridiagonal Systems
Stone [73a] first discussed the solution of a tridiagonal system on a parallel
computer, relating the LDU decomposition of A to first and second order linear
recurrences. The fact that these recurrences are relevant was not new, but Stone
developed recursive doubling algorithms to compute the necessary terms in
O(log N) steps with N processors. Thus it is possible to solve tridiagonal
systems in nearly minimal time, since at least m(N, 4N-2) (about log N) steps
are necessary.
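For notational convenience we write a tridiagonal matrix as a triple of vectors,
so
$$A = \begin{pmatrix} b_1 & c_1 & & & \\ a_2 & b_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{N-1} & b_{N-1} & c_{N-1} \\ & & & a_N & b_N \end{pmatrix} = (a_j, b_j, c_j),$$
where we assume $a_1 = c_N = 0$. The LDU factorization expresses
$$A = LDU = (l_j, 1, 0)\,(0, d_j, 0)\,(0, 1, u_j)$$
and it is easily verified that
$$d_1 = b_1, \qquad d_j = b_j - a_jc_{j-1}/d_{j-1}, \quad l_j = a_j/d_{j-1}, \quad 2 \le j \le N, \qquad u_j = c_j/d_j, \quad 1 \le j \le N.$$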
Ax = v is then solved by solving Lw = v and $Ux = D^{-1}w$. The bidiagonal
systems represent first order linear recurrences, and $D^{-1}w$ is computable in
a single parallel division step using N processors. Since L and U are completely
determined by D, all that remains is a fast method of computing D. This is found
by defining $p_0 = 1$, $p_1 = b_1$, $p_j = b_jp_{j-1} - a_jc_{j-1}p_{j-2}$ and
observing that $d_j = p_j/p_{j-1}$. Thus the parallel computation of a second
order recurrence completes the algorithm.
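
The relation $d_j = p_j/p_{j-1}$ is easy to confirm on a small example. The sketch below evaluates both recurrences sequentially in exact arithmetic. It does not attempt recursive doubling, and the function names and the 0-based storage convention (with `a[0]` and `c[-1]` set to zero) are assumptions of the example.

```python
from fractions import Fraction


def pivots_sequential(a, b, c):
    """d_1 = b_1, d_j = b_j - a_j c_{j-1} / d_{j-1}: the usual first order form."""
    d = [Fraction(b[0])]
    for j in range(1, len(b)):
        d.append(b[j] - Fraction(a[j] * c[j - 1]) / d[j - 1])
    return d


def pivots_from_recurrence(a, b, c):
    """p_0 = 1, p_1 = b_1, p_j = b_j p_{j-1} - a_j c_{j-1} p_{j-2}; d_j = p_j / p_{j-1}."""
    p = [Fraction(1), Fraction(b[0])]
    for j in range(1, len(b)):
        p.append(b[j] * p[-1] - a[j] * c[j - 1] * p[-2])
    return [p[j + 1] / p[j] for j in range(len(b))]


if __name__ == "__main__":
    a = [0, 1, 2, 1]      # subdiagonal, a_1 = 0
    b = [4, 5, 6, 7]      # diagonal
    c = [1, 2, 3, 0]      # superdiagonal, c_N = 0
    print(pivots_sequential(a, b, c))
    print(pivots_from_recurrence(a, b, c))
```

The second form is linear in the $p_j$, which is what makes it amenable to the recursive doubling treatment of linear recurrences.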
Unfortunately this method will fail if pivoting is necessary in the sequential
factorization. One way to avoid this problem is to consider the QR factorization
of A, where Q is formed from the product of N-1 Givens rotations and R is upper
triangular with $r_{ij} = 0$ if j - i > 2. Sameh and Kuck [75b] show that by use
of linear recurrences Q and R may be computed in O(log N) steps with N
processors, so Ax = v may be solved in O(log N) steps. We delay the presentation
of the details of this method until Section 5, where the QR algorithm for the
eigenvalues of a symmetric matrix is discussed.
The methods of odd-even elimination and reduction are another class of parallel
algorithms with some quite different characteristics. We first discuss odd-even
elimination. In keeping with previous algorithms, we describe the method in
terms of row operations on the augmented matrix, although a computer
implementation would be in terms of the vectors describing A and v. It is
essential that pivoting not be used, so there are some restrictions on the
application of the algorithm. For notational convenience, let $a_{ij} = 0$ for
indices outside the ranges 1 ≤ i ≤ N, 1 ≤ j ≤ N+1.
Algorithm E:
    for k = 1 step k until N-1 do
        row i ← row i - a_{i,i-k}(row i-k)/a_{i-k,i-k}
                      - a_{i,i+k}(row i+k)/a_{i+k,i+k},  (1 ≤ i ≤ N);
    x_i ← a_{i,N+1}/a_{ii},  (1 ≤ i ≤ N).
This is not the only possible row operation, but others differ only by scaling.
The row elimination preserves the fact that A has only three diagonals, but as
the algorithm progresses the diagonals move further and further apart, until
only a diagonal matrix remains. Thus each execution of the loop body requires 13
steps with N processors, for a total of O(log N) steps to solve Ax = v. In
addition, it may be shown that if A is strictly diagonally dominant then the two
off-diagonals decrease in magnitude relative to the main diagonal, and the rate
of decrease is quadratic (Stone [75a], Jordan [74], Heller [74c]). If an
approximate solution is desired it is therefore possible to leave the loop
before it is completed and use the computed x as the approximation.

Algorithm E has an important variation, which was actually the original
formulation (Hockney [65]), namely odd-even (or cyclic) reduction. It is called
reduction because it implicitly generates a sequence of tridiagonal systems
$A^{(i)}x^{(i)} = v^{(i)}$, each half the size of the previous system and formed
by eliminating the odd-indexed variables and saving the even-indexed variables.
$x^{(i)}$ is then obtained by back substituting $x^{(i+1)}$, finally arriving at
$x^{(0)}$, which is the solution to the original problem. For simplicity, assume
that $N = 2^n - 1$, n ≥ 1, and let $x_0 = x_{N+1} = 0$.
Algorithm R:
    for k = 1 step k until N do
        row i ← row i - a_{i,i-k}(row i-k)/a_{i-k,i-k}
                      - a_{i,i+k}(row i+k)/a_{i+k,i+k},  (i = 2k, 4k, ..., 2^n - 2k);
    for k = 2^{n-1} step -k/2 until 1 do
        x_i ← (a_{i,N+1} - a_{i,i-k}x_{i-k} - a_{i,i+k}x_{i+k})/a_{ii},
                                                       (i = k, 3k, 5k, ..., 2^n - k).
The reduction algorithm has several advantages over the elimination algorithm,
despite the fact that it is slower when N processors are available. The first
observation is that it is equivalent to Gaussian elimination (in the usual
sequential sense, without pivoting) applied to $PAP^T$, P a particular
permutation matrix. In fact, this is just the nested dissection ordering
(Widlund [72], Birkhoff and George [73]). The concept of reordering a system to
increase the inherent parallelism of a sequential algorithm is a very important
one, as will be seen in later examples. The second observation is that O(N)
arithmetic operations are performed, as opposed to O(N log N) operations in
odd-even elimination and in the recurrence modifications of the LU and QR
factorizations. For large values of N, the reduction algorithm is therefore
preferred for implementation on pipeline and parallel computers with fixed
parallelism, where the operation count is as important as the step count. In the
terminology of Section 2.b, odd-even reduction is asymptotically consistent,
while the other methods are not.

Program timings for the CDC STAR are given by Lambiotte and Voigt [75],
comparing odd-even reduction, sequential Gaussian elimination, and a consistent
variation of Stone's method; Stone [75a] gives a different consistent variation.
For N smaller than about 100 the sequential algorithm is best, as the effect of
vector startups is felt by the other two methods. For N larger than about 100,
odd-even reduction is indeed the best method; for very large N (about
2^[exponent illegible]) it requires about 14% of the computation time for
sequential Gaussian elimination, and about 20% of the time for the consistent
recursive doubling algorithm. It is also possible to create a polyalgorithm by
using odd-even reduction until a reduced system is obtained where the sequential
method is faster.
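
The two odd-even schemes, Algorithms E and R, differ only in which rows are updated at each stage, and a small simulation makes the contrast concrete. The sketch below works on a dense augmented matrix in exact arithmetic for clarity. A practical code would store only the three diagonals. All function names are hypothetical, and nonzero pivots are assumed (for instance, a strictly diagonally dominant A).

```python
from fractions import Fraction


def _augmented(a, b, c, v):
    """Dense augmented matrix (A, v) for the tridiagonal triple (a_j, b_j, c_j)."""
    N = len(b)
    rows = [[Fraction(0)] * (N + 1) for _ in range(N)]
    for i in range(N):
        rows[i][i] = Fraction(b[i])
        if i > 0:
            rows[i][i - 1] = Fraction(a[i])
        if i < N - 1:
            rows[i][i + 1] = Fraction(c[i])
        rows[i][N] = Fraction(v[i])
    return rows


def _eliminate(rows, i, k):
    """Row i with its entries in columns i-k and i+k cancelled by rows i-k and i+k."""
    new = rows[i][:]
    for s in (i - k, i + k):
        if 0 <= s < len(rows) and rows[i][s] != 0:
            f = rows[i][s] / rows[s][s]
            new = [x - f * y for x, y in zip(new, rows[s])]
    return new


def odd_even_elimination(a, b, c, v):
    """Algorithm E: every row is updated at every stage."""
    rows = _augmented(a, b, c, v)
    N = len(rows)
    k = 1
    while k <= N - 1:
        rows = [_eliminate(rows, i, k) for i in range(N)]   # all rows at once
        k *= 2
    return [rows[i][N] / rows[i][i] for i in range(N)]


def odd_even_reduction(a, b, c, v):
    """Algorithm R: N is assumed to be 2**n - 1."""
    rows = _augmented(a, b, c, v)
    N = len(rows)
    k = 1
    while k <= N:
        for i in range(2 * k, N + 1, 2 * k):      # 1-based i = 2k, 4k, ...
            rows[i - 1] = _eliminate(rows, i - 1, k)
        k *= 2
    x = [Fraction(0)] * (N + 2)                   # x[0] = x[N+1] = 0
    k = (N + 1) // 2
    while k >= 1:
        for i in range(k, N + 1, 2 * k):          # 1-based i = k, 3k, 5k, ...
            r = rows[i - 1]
            left = r[i - k - 1] * x[i - k] if i - k >= 1 else 0
            right = r[i + k - 1] * x[i + k] if i + k <= N else 0
            x[i] = (r[N] - left - right) / r[i - 1]
        k //= 2
    return x[1:N + 1]
```

In the reduction routine only every second remaining row is touched at each stage, which is the source of the O(N) operation count, whereas the elimination routine updates all N rows at each of its O(log N) stages.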
If an approximate solution is satisfactory and previous computations yield good
initial approximations, an iterative method may be indicated. One obvious
technique is to reorder the equations according to the red-black scheme, so that
the reordered matrix has a 2 × 2 block form in which the off-diagonal blocks are
bidiagonal and the diagonal blocks are diagonal. Because of this decoupling, the
SOR-type methods have considerable inherent parallelism, and are easily adapted
to parallel computation (Lambiotte and Voigt [75]).
A second technique, proposed by Traub [73] and further developed by Heller,
Stevenson and Traub [74], changes the sequential LDU factorization into a
three-stage iteration. If
$$A = A_L + A_D + A_U, \qquad A_L = (a_j, 0, 0), \quad A_D = (0, b_j, 0), \quad A_U = (0, 0, c_j),$$
then $D = A_D - A_LD^{-1}A_U$ so a natural iteration is
$D^{(i)} = A_D - A_L(D^{(i-1)})^{-1}A_U$. Once this iteration has converged, L
and U are computed and Lw = v and $Ux = D^{-1}w$ are solved using the Jacobi
iteration, which is inherently parallel. It may be shown that all three
iterations converge linearly under weak conditions on A. Because three stages
are used and because inaccuracies in one stage limit the attainable accuracy in
later stages it is important to choose the number of iterations carefully.
However, unlike SOR no optimal parameters need to be calculated, so termination
is the only difficult issue.
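
Written out for the individual diagonal entries, the first stage reads $d_j^{(i)} = b_j - a_jc_{j-1}/d_{j-1}^{(i-1)}$. The sketch below applies a fixed number of such sweeps in exact arithmetic. The starting guess $d^{(0)} = b$ and the function name are assumptions made for the illustration, not choices prescribed by the cited papers.

```python
from fractions import Fraction


def d_iteration(a, b, c, sweeps):
    """Apply `sweeps` Jacobi-style updates d_j <- b_j - a_j c_{j-1} / d_{j-1}.

    Every entry of a sweep uses only values from the previous sweep, so all
    N updates could be done simultaneously.
    """
    d = [Fraction(x) for x in b]                       # assumed initial value
    for _ in range(sweeps):
        old = d
        d = [Fraction(b[0])] + [b[j] - Fraction(a[j] * c[j - 1]) / old[j - 1]
                                for j in range(1, len(b))]
    return d


if __name__ == "__main__":
    a = [0, 1, 2, 1]
    b = [4, 5, 6, 7]
    c = [1, 2, 3, 0]
    for sweeps in range(4):
        print(sweeps, [float(x) for x in d_iteration(a, b, c, sweeps)])
```

In exact arithmetic the first i+1 entries are correct after i sweeps, so N-1 sweeps always reproduce the sequential pivots. The interest of the iteration is that far fewer sweeps may already give acceptable accuracy.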
It is possible to alter the three iterations as given in order to improve the
anticipated performance on real computers. Suppose first that division is much
more expensive than addition or multiplication. On the Illiac IV, $t_+ = 7$,
$t_\times = 9$, $t_\div = 56$ cycles (Burroughs [72]). To avoid division,
compute $N = D^{-1}$ instead of D and use one step of the Newton iteration for
$z^{-1}$. The new iteration is thus
$$N^{(i)} = N^{(i-1)} \left( 2I - (A_D - A_LN^{(i-1)}A_U)\,N^{(i-1)} \right).$$
Here $N^{(i)}$ is taken to be an approximation to
$(A_D - A_LN^{(i-1)}A_U)^{-1}$. Convergence of the N iteration is slightly slower
than the D iteration initially, but improves as more accuracy is obtained. This
variation may be useful if $t_\div > 3t_\times + t_+$, as is the case for
Illiac IV.
For pipeline computers a different approach is suggested. If M iterations are
used in the first stage, which is
    d_j^{(0)} ← "initial value",  (1 ≤ j ≤ N);
    for i = 1 step 1 until "termination" do
        d_j^{(i)} ← b_j - a_jc_{j-1}/d_{j-1}^{(i-1)},  (2 ≤ j ≤ N),
the arithmetic cost is $M((\tau_\div + \tau_+)N + (\sigma_\div + \sigma_+))$, not
counting the cost of precomputing $-a_jc_{j-1}$. Now consider splitting the
loop:
    d_j^{(0)} ← "initial value",  (1 ≤ j ≤ N);
    for i = 1 step 1 until "termination" do
        begin d_j^{(i)} ← b_j - a_jc_{j-1}/d_{j-1}^{(i-1)},  (1 ≤ j ≤ N, j even);
              d_j^{(i)} ← b_j - a_jc_{j-1}/d_{j-1}^{(i)},    (1 ≤ j ≤ N, j odd)
        end.
The cost per iteration is now $(\tau_\div + \tau_+)N + 2(\sigma_\div + \sigma_+)$,
but only M/2 iterations are required because of the use of more recent
information. Thus the total arithmetic cost is
$M((\tau_\div + \tau_+)N/2 + (\sigma_\div + \sigma_+))$ and the iteration has
been sped up by nearly a factor of two. The same splitting technique may be
applied to the second and third stages, and this corresponds exactly to the
Gauss-Seidel iteration with the red-black ordering. Numerical experiments using
the STAR timing information show that the "accelerated" three stage iteration
will be faster than the red-black optimal SOR methods. It is also possible to
define more general splittings which could give further improvements.
4.d. Block Tridiagonal and Band Systems
We now discuss some extensions of the parallel algorithms for tridiagonal
matrices. A is a band matrix if $a_{ij} = 0$ when |i-j| > m for some m; if m = 1
then A is tridiagonal. A is a block tridiagonal matrix if it can be partitioned
as $(A_{ij})$ where $A_{ii}$ is square and $A_{ij} = 0$ if |i-j| > 1. We assume
for simplicity that all the blocks are the same size. Any band matrix is also a
block tridiagonal matrix, as is seen by taking $m \times m$ blocks, and any
block tridiagonal matrix is clearly also a band matrix. Thus for theoretical
purposes we can consider only one of the two cases, but for practical purposes
it is more efficient to distinguish between them.
Suppose that A is block tridiagonal with $n \times n$ blocks. As with the
tridiagonal case we write
$$A = \begin{pmatrix} b_1 & c_1 & & & \\ a_2 & b_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{N-1} & b_{N-1} & c_{N-1} \\ & & & a_N & b_N \end{pmatrix} = (a_j, b_j, c_j),$$
where $b_j \in R^{n \times n}$. x and v may be partitioned similarly. The study
of block tridiagonal systems is made easier by the fact that elimination methods
used for tridiagonal systems may be generalized to block elimination methods. As
a rule, if the blocks are not mutually commutative then the diagonal blocks must
be inverted, and this can create very serious storage problems when the blocks
are originally large and sparse.
Block elimination adds another kind of inherent parallelism by allowing the
parallel solution of block systems. For example, the block LU factorization
    d_1 ← b_1;  g_1 ← v_1;
    for j = 2 step 1 until N do
        begin solve d_{j-1}(u_{j-1}, f_{j-1}) = (c_{j-1}, g_{j-1});
              d_j ← b_j - a_ju_{j-1};  g_j ← v_j - a_jf_{j-1}
        end;
    x_N ← d_N^{-1}g_N;
    for j = N-1 step -1 until 1 do
        x_j ← f_j - u_jx_{j+1}
can use Gauss-Jordan elimination to compute $u_{j-1}$ and $f_{j-1}$, as well as
fast matrix multiplication in other computations. O(nN) steps are then used if
$n^2$ processors are available. It may be observed, however, that the
computation of the d sequence cannot be transformed into a second order linear
matrix recurrence without assuming that certain blocks commute. If, for example,
$a_j$ is nonsingular, 2 ≤ j ≤ N, then we can work with the new system
$A^*x = v^*$, where we take $a_1 = I$ and
$A^* = (I, a_j^{-1}b_j, a_j^{-1}c_j)$, $v^* = (a_j^{-1}v_j)$. The transformation
to a recurrence will now succeed, and an O(n) + O(log n log N) algorithm can be
formally defined.
The odd-even algorithms are also easily extended to block tridiagonal systems.
Heller [74c] discusses this in some detail, and shows that if A is strictly
diagonally dominant then the norms of the off-diagonal blocks relative to the
diagonal blocks decrease quadratically, just as in the tridiagonal case. An
important observation for the block case is that the additional storage required
by odd-even reduction is about 2N blocks, while A itself requires 3N blocks.
As an extension of the algorithms for tridiagonal matrices and $m$th order
linear recurrences, Hyafil and Kung [75] describe an algorithm for $O(m^2N)$
processors requiring O(m) + O(log m log N) steps for a system of bandwidth m.
Tewarson [68] describes another method using a matrix recurrence, and this also
shows that banded systems can be solved in this time. Vector algorithms for the
root-free band Cholesky, symmetric band Gaussian elimination and profile
elimination methods are discussed by Lambiotte [75].
4.e. Systems Arising from Differential Equations
The area of differential equations has exerted a great influence on parallel
computation, for it provides a range of difficult and important problems.
Because of these special applications, parallel computers have been designed to
support in hardware some of the operations naturally occurring in the solution
of differential equations. As examples, the interprocessor connections on the
Illiac IV are precisely those of the five point finite-difference molecule used
for two dimensional elliptic equations, and the CDC STAR provides vector
instructions for differencing and averaging.

Although our present emphasis is on the treatment of discrete systems derived
from continuous systems, the parallel solution of differential equations must be
taken as a total package. It is essential that the discretization be chosen with
parallelism in mind, so that the linear system may be generated and solved
efficiently. While many standard discretization techniques do yield inherent
parallelism in sequential algorithms, this can be greatly enhanced by making
some basic alterations, mostly with regard to boundary conditions. One of the
important and restrictive characteristics of SIMD and pipeline computers is the
necessity of processing data in a homogeneous manner. Boundary conditions form
an exceptional case in a discretization, and if not handled correctly will also
form an exceptional case in a program. The probable effect of this would be to
break up the flow of parallel operations with lengthy segments of sequential
code. It is not our purpose to discuss this topic in much more detail, so we
assume that the discretization is given and will look for parallel algorithms
based on properties of the linear system Ax = v.
Equations in one space dimension generally give rise to banded matrices with
small bandwidth, and these problems have already been discussed. Two-dimensional
elliptic equations and systems of one dimensional equations often yield block
tridiagonal linear systems; the blocks may be large and sparse or small and
dense. If the underlying geometry is, for instance, a rectangle, the system will
have a very regular pattern of nonzero elements, and it is the existence of this
pattern that makes the parallel solution feasible. General sparse matrices are
much more difficult to handle in parallel, although hardware facilities such as
the STAR's sparse vector instructions are designed to aid in this effort. We
will return to this later.
It is important to be able to solve some systems very well; Poisson's equation
on a rectangle is a prime example (Buzbee [73]). Assume a regular N × M grid
with the usual five point discretization, Dirichlet boundary conditions, the
nodes numbered by rows and from right to left within a row. The system Ax = v is
then block tridiagonal with constant block diagonals; that is, A = (-I, B, -I),
B = (-1, 4, -1). The eigenvalues and eigenvectors of the diagonal blocks are
known analytically, and in fact the matrix of eigenvectors represents a discrete
sine transformation. Using parallel FFT techniques, the transformation may be
applied simultaneously to each row of the grid in time O(log N). The original
system is now decoupled into N independent M × M tridiagonal systems, one for
each column of the grid. These may be solved in O(log M) steps using any of the
methods already described. An inverse sine transformation is then applied to
recover the solution. The net result is that using O(NM) processors the
(NM) × (NM) system may be solved in O(log NM) steps. The same decoupling
technique may be applied to separable equations of the form
[equation illegible] on a rectangle, and to the biharmonic equation (Sameh, Chen
and Kuck [74]). The biharmonic is a bit more difficult to handle, but with an
N × N grid it may be solved in O(N) steps using $O(N^3)$ processors, or in
O(N log N) steps with $O(N^2)$ processors.
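
The statement that the eigenvectors of B = (-1, 4, -1) form a discrete sine transformation can be checked numerically. In the sketch below the closed forms used for the eigenvectors, $\sin(jk\pi/(N+1))$, and eigenvalues, $4 - 2\cos(k\pi/(N+1))$, are the standard ones for this matrix and are supplied here rather than quoted from the text. The function names are hypothetical.

```python
import math


def sine_vector(k, N):
    """k-th discrete sine vector of length N (1-based k)."""
    return [math.sin(j * k * math.pi / (N + 1)) for j in range(1, N + 1)]


def apply_B(x):
    """Multiply by the N x N tridiagonal matrix B = (-1, 4, -1)."""
    N = len(x)
    return [4 * x[j]
            - (x[j - 1] if j > 0 else 0.0)
            - (x[j + 1] if j < N - 1 else 0.0)
            for j in range(N)]


def eigenvalue(k, N):
    """Eigenvalue of B belonging to the k-th sine vector."""
    return 4 - 2 * math.cos(k * math.pi / (N + 1))


if __name__ == "__main__":
    N = 8
    for k in range(1, N + 1):
        s = sine_vector(k, N)
        residual = max(abs(u - eigenvalue(k, N) * w) for u, w in zip(apply_B(s), s))
        print(k, round(eigenvalue(k, N), 6), residual)
```

Because the same sine vectors diagonalize every diagonal block, applying the transformation along each grid row turns the block tridiagonal system into independent scalar tridiagonal systems.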
The odd-even reduction algorithm is another attractive method for
Poisson's equation (Buzbee, Golub and Nielson [70], Ericksen [72]), although
because of excessive storage requirements it is not feasible to use the
block version of Algorithm R. Instead, assuming there are 2
n
-1 diagonal
blocks in A, the sequence of reduced block tridiagonal systems is
(- 1, B < , - I ) x < =v
(
,
each with 2
n 1
-1 diagonal blocks, and
B
<*> = ( B ^V - 21, B<> - B.
For numerical stability is represented by vectors p ^ and q ^ \ where
v
( 1 )
= (0, B
( i )
, 0)P<
+
Q
( 1 )
.
The inherent parallelism earlier demonstrated for odd-even reduction is still
present. The computation of the $p$ and $q$ vectors requires the solution of systems
involving $B^{(i)}$, but $B^{(i)}$ is a polynomial in $B$ of degree $2^i$, and this polynomial
can be factored analytically into its linear terms. Thus $B^{(i)}$ can be
represented by the polynomial itself, does not need to be stored directly,
and $(B^{(i)})^{-1}$ can be applied by solving $2^i$ tridiagonal systems. Sweet [74]
has extended the method to cover cases other than $2^n - 1$ diagonal blocks,
and Swarztrauber [74] gives another extension for separable equations
using the storage-efficient polynomial representation.
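To make the polynomial representation concrete, the sketch below applies the inverse of the reduced diagonal block through its linear factors. It assumes the recurrence $B^{(i+1)} = (B^{(i)})^2 - 2I$ with $B^{(0)} = B$, for which the standard Chebyshev identity $B^{(i)} = 2T_{2^i}(B/2)$ gives the roots $2\cos((2j-1)\pi/2^{i+1})$, $j = 1, \ldots, 2^i$. That closed form is a textbook fact supplied here for illustration, not quoted from the survey, and the function names are hypothetical.

```python
import numpy as np

def reduced_block(B, i):
    """Form B^(i) explicitly by the recurrence (for checking only; it fills in)."""
    Bi = np.array(B, dtype=float)
    I = np.eye(Bi.shape[0])
    for _ in range(i):
        Bi = Bi @ Bi - 2.0 * I
    return Bi

def solve_reduced_block(B, i, z):
    """Solve B^(i) y = z using 2^i shifted tridiagonal solves, never forming B^(i)."""
    m = 2 ** i
    I = np.eye(B.shape[0])
    y = np.array(z, dtype=float)
    for j in range(1, m + 1):
        theta = (2 * j - 1) * np.pi / (2 * m)
        # each factor B - 2 cos(theta) I is tridiagonal when B is
        y = np.linalg.solve(B - 2.0 * np.cos(theta) * I, y)
    return y
```

A dense solver stands in for a tridiagonal one to keep the sketch short; only the shifts matter for the illustration.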
For the Poisson equation we have used finite differences and a particular
ordering of the nodes of the grid. A more general approach for square
grids ($N \times N$) using a different ordering is given by Liu [74]. The only
assumption necessary is that $A$ is symmetric positive definite and that $a_{ij}$
can be nonzero only if nodes $x_i$ and $x_j$ are corners of the same elementary
square. This admits a large class of elliptic equations and finite difference
or finite element methods; in the rowwise ordering $A$ would be block
tridiagonal. Liu shows that the nested dissection ordering (George [73],
Birkhoff and George [73]) coupled with the $LDL^T$ decomposition allows the
solution of $Ax = v$ in $O(N)$ steps using $O(N^2)$ processors. We note that $O(N^3)$
multiplications are required to form the decomposition sequentially (Hoffman,
Martin and Rose [73]). Nested dissection partitions the square grid into
five disjoint subsets of grid points such that when $A$ is reordered to group
these subsets together we have
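
$$PAP^T = \begin{pmatrix} A_1 & & & & C_1^T \\ & A_2 & & & C_2^T \\ & & A_3 & & C_3^T \\ & & & A_4 & C_4^T \\ C_1 & C_2 & C_3 & C_4 & A_5 \end{pmatrix}.$$
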
The first four subsets are square, so the dissection may be repeated recursively;
the fifth is "+" shaped and separates the first four from each other.
It is clear that the inherent parallelism of the sequential elimination has
been greatly increased by use of this ordering. To reduce the probability
of memory interference without increasing the solution time, Liu suggests a
new ordering, called doubly nested dissection. Like nested dissection it
recursively uses "+" shaped separating sets, but the crosses are now double
width, providing more complete independence of the square subsets.
Lambiotte [75] reports on the implementation of nested dissection on
the CDC STAR. By taking advantage of the ability to predict the number and
location of nonzeros per row of $L^T$ it is possible to obtain an algorithm
consistent in both storage ($O(N^2 \log N)$ words) and time ($O(N^3)$). However, the
vectorization is complicated enough so that the standard band methods are
expected to be more efficient (both in terms of runtime and programming
costs) for moderate values of $N$.
From a theoretical standpoint direct methods are sufficient, and it
would appear that there is no need for iterative methods. However, from a
practical standpoint there will still be some cases, such as irregular domains,
nonseparable elliptic equations and three-dimensional problems, where an
iterative solution will be attractive. In addition, many iterative methods for
linear systems can be used to define iterative methods for nonlinear systems
(Ortega and Rheinboldt [70]), so we can also learn something about the
parallel solution of these more difficult problems.
A large class of iterations is defined by a splitting of the matrix $A$
(Varga [62]). Two matrices $A_1$ and $A_2$ are chosen such that $A = A_1 - A_2$ and
systems involving $A_1$ may be solved "easily". Given an estimate $x^{(i)}$, improved
estimates are generated by the rule $A_1 x^{(i+1)} = A_2 x^{(i)} + v$. A number of
techniques may be used to accelerate convergence of the basic method, but
the success of the iteration still depends crucially on the proper choice
of the structure of $A_1$.
One approach is to let $A_1$ be any of the special forms already discussed.
In the case of nonseparable elliptic equations a good choice for $A_1$ is a
close relative of the Poisson matrix (Concus and Golub [73]). Of course,
the simplest choice is $A_1 = \mathrm{diag}(A)$, yielding the Jacobi iteration, but
convergence is generally so slow that the method is not competitive. Gilmore
[71] suggests use of Jacobi overrelaxation with an associative processor, but
slow convergence is again a detracting factor, as well as the fact that for
a consistently ordered matrix the optimal relaxation factor is $\omega = 1$.
A second approach looks for efficient implementations of standard sequential
iterations. To illustrate, suppose we have a two-dimensional elliptic
equation and an $N \times N$ grid with a five point finite difference approximation.
It can be observed that a point SOR method with the rowwise ordering has
essentially no inherent parallelism. By choosing a different ordering of the
nodes this situation can be greatly improved without sacrificing the rate of
convergence. The basic schemes are the red-black ordering, the diagonal
ordering and subdomain partitioning. These ideas have appeared in many
independent publications, with several minor variations, so it is impossible to
give all credits, but Karp, Miller and Winograd [67] and Morice [72] are
good sources.
The red-black ordering (or any other 2-cyclic ordering (Varga [62]))
enables us to write $A$ in $2 \times 2$ block form with diagonal blocks that are square
and diagonal, so the SOR methods may be described wholly in terms of
vector operations. In fact, these vectors have average length $N^2/2$, which
is an attractive feature for pipeline computers. The diagonal ordering
decomposes the grid as $2N - 1$ diagonals, creating an independence of
computations that allows all the nodes of a diagonal to be updated simultaneously.
The subdomain approach is suitable for a parallel computer with a small
number of processors: each processor is assigned to a group of points,
which it updates sequentially.
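The red-black idea can be sketched with whole-array operations standing in for vector instructions. The example below is a minimal illustration, assuming the five point stencil $4u_{ij} - (\text{sum of four neighbours}) = f_{ij}$ on an $N \times N$ grid with zero boundary values; the function name `red_black_sor` and its parameters are hypothetical.

```python
import numpy as np

def red_black_sor(F, omega, sweeps):
    """Point SOR in the red-black ordering; each colour is updated as one vector."""
    n = F.shape[0]
    U = np.zeros((n + 2, n + 2))                 # unknowns plus a ring of zero boundary values
    Fp = np.zeros_like(U)
    Fp[1:-1, 1:-1] = F
    ii, jj = np.meshgrid(np.arange(n + 2), np.arange(n + 2), indexing="ij")
    interior = (ii >= 1) & (ii <= n) & (jj >= 1) & (jj <= n)
    colours = [interior & ((ii + jj) % 2 == c) for c in (0, 1)]
    for _ in range(sweeps):
        for mask in colours:
            nb = np.zeros_like(U)                # sum of the four neighbours of each interior node
            nb[1:-1, 1:-1] = U[:-2, 1:-1] + U[2:, 1:-1] + U[1:-1, :-2] + U[1:-1, 2:]
            gauss_seidel = (Fp + nb) / 4.0
            # every node of one colour depends only on nodes of the other colour
            U[mask] = (1.0 - omega) * U[mask] + omega * gauss_seidel[mask]
    return U[1:-1, 1:-1]
```

Each colour update touches roughly $N^2/2$ nodes at once, which matches the average vector length quoted in the text.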
These methods all use an ordering and partitioning of $A$ such that the
smaller systems arising in block SOR are actually diagonal. Of course,
the natural ordering could be used with line SOR, where only tridiagonal
systems need to be solved. The alternating direction implicit methods
similarly require only the solution of tridiagonal systems. The first half
step of the iteration simultaneously solves a tridiagonal system for each
row of the grid, and the second half step does the same for each column.
Lambiotte [75] presents a STAR implementation of ADI for Poisson's equation,
in which the grid is stored by columns, the row systems are solved by
simultaneous execution of the usual sequential algorithm, and the column systems
are solved as one $N^2 \times N^2$ tridiagonal system using odd-even reduction.
Hayes [74] reports on some tests conducted on the ASC to compare the
standard iterative schemes (not including ADI) for the solution of Laplace's
equation on a unit square. For mesh size $h = 1/80$ ($N = 79$) the best method is
the cyclic Chebyshev semi-iterative scheme, which is essentially SOR in the
red-black ordering but with better parameters. It is interesting to note,
however, that the symmetric SOR semi-iterative method (Young [72]) is the
closest competitor. This method uses the natural rowwise ordering and the
bulk of the computation has essentially no inherent parallelism, but it is
still a good method because it converges so quickly ($O(h^{-1/2})$ iterations
vs. $O(h^{-1})$ iterations for SOR and CCSI). Lambiotte [75] shows that it is
advantageous to consider SSOR-SI based on the diagonal ordering. The good
rate of convergence is preserved, and the diagonal ordering allows the use
of vector instructions. Although the vectors are only of length $N$, as
opposed to $N^2/2$ with the red-black ordering, for large values of $N$ this becomes
less important than the convergence rate.
To close this subsection, we consider some iterative methods for parallel
computers with a small number of asynchronous parallel processors. The
chaotic relaxation methods, originally developed for the iterative solution
of linear systems (Chazan and Miranker [69], Donelly [71]) and now extended
to nonlinear systems (Robert, Charnay and Musy [75]), take advantage of
asynchronous computation by randomly relaxing components in parallel. The
current values of the solution are stored in a common memory, and the
iteration is terminated when some condition is met; e.g., one processor is given
the task of checking for convergence. There must be some way of guaranteeing
that, if the parallel program were allowed to run forever, each component
would be updated infinitely often, for otherwise convergence could not occur.
Because synchronization is not part of the algorithm, it is expected that
programming will be considerably easier, especially since memory access
conflicts can be resolved by the computer system itself, and not by the chaotic
relaxation programs. Chazan and Miranker [69] show that convergence occurs
if $\rho(|B|) < 1$, $B = I - \mathrm{diag}(A)^{-1} A$, independent of the schedule of
computations; if $\rho(|B|) \ge 1$ then there exists a schedule for which convergence
does not occur, even when all components are updated infinitely often.
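The Chazan and Miranker condition and the flavour of a chaotic schedule can be illustrated with a small serial simulation. This is a sketch under stated assumptions, not an asynchronous program: a single random component is relaxed per step, and staleness is imitated by reading an iterate up to `max_delay` steps old. The names `chazan_miranker_radius` and `chaotic_relaxation` and the delay model are hypothetical.

```python
import numpy as np

def chazan_miranker_radius(A):
    """Spectral radius of |B|, where B = I - diag(A)^{-1} A."""
    B = np.eye(len(A)) - A / np.diag(A)[:, None]
    return np.max(np.abs(np.linalg.eigvals(np.abs(B))))

def chaotic_relaxation(A, v, steps, max_delay=3, seed=0):
    """Relax randomly chosen components using possibly out-of-date values."""
    rng = np.random.default_rng(seed)
    n = len(v)
    d = np.diag(A)
    history = [np.zeros(n)]                       # most recent iterates, newest last
    for _ in range(steps):
        i = rng.integers(n)                       # component chosen at random
        lag = rng.integers(0, min(max_delay, len(history) - 1) + 1)
        stale = history[-1 - lag]                 # values read from "common memory"
        x = history[-1].copy()
        x[i] = (v[i] - A[i] @ stale + d[i] * stale[i]) / d[i]
        history.append(x)
        history = history[-(max_delay + 1):]
    return history[-1]
```

For the matrix $(-1, 4, -1)$ the radius is below $1/2$, so the simulated iteration converges for any such schedule.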
A related class of iterations for the $P$ processor SIMD model was given
by Robert [70]. The system is partitioned into $P \times P$ blocks, and groups of
components are updated simultaneously and explicitly as in the Jacobi
iteration, rather than simultaneously and implicitly as in the usual block
iterations. With one processor the method reduces to Gauss-Seidel iteration, and
with $N$ processors it is the Jacobi iteration. If $A$ is an M-matrix it is
possible to compare the rates of convergence for different values of $P$: if
$P_1$ divides $P_2$ exactly, and $M_1$ and $M_2$ are the iteration matrices for Robert's
methods, then $\rho(M_1) \le \rho(M_2) < 1$. It is possible to show that for a given
value of $P$ the block Gauss-Seidel iteration is asymptotically faster, so
Robert's method will be less time consuming if the block systems are
difficult to solve.
4.f. General Sparse Matrices
At the 1968 IBM Sparse Matrix Conference the hope was expressed that
parallel machines would meet the special needs of this area (Wolfe [68]).
The sparse vector instructions on the CDC STAR, for example, were designed
for packed storage and faster execution times. However, as Lambiotte [75]
shows for nested dissection, the storage costs using sparse vectors can
actually increase over scalar methods. If we choose to represent the $N \times N$
matrix $A$ as a sparse vector of length $N^2$, then $N^2$ bits ($\lceil N^2/64 \rceil$ words) are
needed to define the structure of $A$, regardless of its sparsity. Moreover,
since the time for a sparse vector operation depends on the number of bits
in each order vector, the total running time may be adversely affected.
While most of the matrix elements are zero, the nonzeros need not be
distributed in a regular pattern, and this can complicate parallel solutions
using the LU decomposition. It may be possible to automatically reorder
the system to create a regular pattern. One criterion is to find a
permutation $P$ with the following property (Calahan [73]): if $A_0 = PAP^T$
is partitioned into blocks, then the blocks $D_i$ are diagonal, nonsingular,
and their average size is as large as possible. The permutation used by
odd-even reduction has this property.
The automatic generation of $P$ can be quite difficult; if the sparse system
is at the inner loop of Newton's method for a nonlinear system then the
preprocessing cost may be spread across a large number of outer iterations.
Another basic approach is to consider $A$ only as an operator to be
applied to a vector. It is assumed that there is some highly parallel
procedure to evaluate $Az$ and $z^T A$ given $z$; it is not even necessary to generate
$A$ explicitly. Any sequence of operations on whole vectors (e.g., linear
combinations and inner products) is naturally parallel so the method of
conjugate gradients may be used as either a direct or iterative method if $A$
is positive definite (Palmer [74], Reid [72]). If $A$ is symmetric but
indefinite the operator approach can also be used to derive direct methods
related to the Lanczos tridiagonalization (Paige and Saunders [75]). If $A$
is unsymmetric then an orthogonal bidiagonalization of $A$ can be found with
reasonable economy (Cline [74]). As this decomposition proceeds, minimal
residual approximations can be generated using current information. At
present these methods appear to be the most favorable for synchronous
parallelism; for asynchronous parallelism the chaotic relaxation schemes are the
natural approach.
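The operator viewpoint is easy to demonstrate with conjugate gradients, since the method only needs the product $Az$, inner products and linear combinations. The sketch below is a textbook form of the iteration, not taken from the cited papers; the operator `apply_laplacian` (the matrix $(-1, 2, -1)$ applied without ever being stored) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def apply_laplacian(z):
    """Return A z for A = (-1, 2, -1) without forming A."""
    out = 2.0 * z
    out[1:] -= z[:-1]
    out[:-1] -= z[1:]
    return out

def conjugate_gradient(apply_A, v, tol=1e-12, max_iter=None):
    """Solve A x = v for symmetric positive definite A given only z -> A z."""
    x = np.zeros_like(v)
    r = v.copy()                     # residual v - A x for x = 0
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter or len(v)):
        Ap = apply_A(p)              # the only place the operator is used
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x
```

Every statement in the loop is a whole-vector operation, which is the property the text relies on for parallel execution.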
5. EIGENVALUES
In this section we consider a rather different problem of linear algebra.
Unlike the solution of a linear system, where the exact answer can be
produced in a finite number of steps, the exact calculation of eigenvalues is
an infinite process. In actual computation the process is terminated at
some point. One fundamental technique is to construct a sequence of matrices
$A_i$, all similar to $A$ and converging to a matrix $T$ from which the eigenvalues
are more easily obtained. We address only the finite problem of generating
the next matrix in the sequence. A second technique is to convert $A$ into a
form where the characteristic polynomial $f(\lambda) = \det(A - \lambda I)$ is easily
evaluated, and to approximate the zeros of $f$ by iteration or bisection. If only
the largest eigenvalues and corresponding eigenvectors are desired,
variations of the power method are directly applicable, as we only need to
compute $Az$, inner products and linear combinations of vectors. However, we
will not consider this method, or the general problem of computing
eigenvectors.
First, suppose that $A$ is a dense, real symmetric matrix, and let $A_0 = A$.
In Jacobi's method $A_{i+1} = R_i A_i R_i^T$, where $R_i$ is a plane rotation matrix
designed to annihilate (for some $p < q$ depending on $i$) the off-diagonal
elements $a_{pq}^{(i)}$ and $a_{qp}^{(i)}$; that is, $a_{pq}^{(i+1)} = 0$. Each rotation reduces
the sum of squares of the off-diagonal elements, so $A_i$ converges to a diagonal
matrix, and the iteration is halted when the sum of squares is sufficiently
small. The premultiplication $R_i A_i$ replaces rows $p$ and $q$ with linear
combinations of themselves, and the postmultiplication is analogous. Thus we can
exploit parallelism by using the inherent parallelism of row and column
operations, but Sameh [71] (also Kuck and Sameh [71]) shows that further
improvements can be made. In this parallel variation we take $A_{i+1} = M_i A_i M_i^T$,
$M_i = T_i P_i$, $P_i$ a permutation matrix and, supposing $N = 2n$,

$$T_i = \mathrm{diag}(T_1^{(i)}, T_2^{(i)}, \ldots, T_n^{(i)}), \qquad T_j^{(i)} = \begin{pmatrix} \cos\varphi_j^{(i)} & \sin\varphi_j^{(i)} \\ -\sin\varphi_j^{(i)} & \cos\varphi_j^{(i)} \end{pmatrix}.$$

Thus $n$ rotations have been overlapped, and $N$ off-diagonal elements can be
annihilated simultaneously. The permutations ensure that all the
off-diagonal positions will be subjected to annihilation.
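A small sketch may help show how $n$ rotations can be overlapped. It assumes $N$ is even and uses a round-robin (tournament) schedule of disjoint index pairs in place of the permutations $P_i$; that schedule is a common choice made here for illustration and is not claimed to be the one used by Sameh. The rotation angle is written with the opposite sign convention to $T_j^{(i)}$, which does not affect the result. Function names are hypothetical.

```python
import numpy as np

def round_robin_rounds(N):
    """Yield N-1 rounds; each round is a list of N/2 disjoint index pairs (N even)."""
    order = list(range(N))
    for _ in range(N - 1):
        yield [tuple(sorted((order[k], order[N - 1 - k]))) for k in range(N // 2)]
        order = [order[0], order[-1]] + order[1:-1]   # hold one index fixed, rotate the rest

def parallel_jacobi_eigenvalues(A, sweeps=15):
    """Eigenvalues of a symmetric matrix by Jacobi steps with overlapped rotations."""
    A = np.array(A, dtype=float)
    N = A.shape[0]
    for _ in range(sweeps):
        for pairs in round_robin_rounds(N):
            M = np.eye(N)
            for p, q in pairs:                        # N/2 independent plane rotations
                b = A[p, q]
                if b == 0.0:
                    continue
                diff = A[q, q] - A[p, p]
                theta = 0.5 * np.arctan(2.0 * b / diff) if diff != 0.0 else np.pi / 4
                c, s = np.cos(theta), np.sin(theta)
                M[p, p], M[p, q] = c, -s
                M[q, p], M[q, q] = s, c
            A = M @ A @ M.T                           # annihilates every (p, q) of the round at once
    return np.sort(np.diag(A))
```

Because the pairs in one round share no index, each rotation sees exactly the $2 \times 2$ block it would see in the sequential method.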
If $A$ is not symmetric, then the same overlapping technique can be
applied to the Jacobi-like algorithm proposed by Eberlein [62]. A parallel
version is given by Sameh [71]. Here $A_{i+1} = M_i A_i M_i^{-1}$, $M_i = S_i T_i P_i$, $P_i$ and
$T_i$ as above, and

$$S_i = \mathrm{diag}(S_1^{(i)}, S_2^{(i)}, \ldots, S_n^{(i)}), \qquad S_j^{(i)} = \begin{pmatrix} \cosh\psi_j^{(i)} & \sinh\psi_j^{(i)} \\ \sinh\psi_j^{(i)} & \cosh\psi_j^{(i)} \end{pmatrix}.$$

The biorthogonalization algorithm (Hestenes [58]) can also benefit from
overlapping column operations (Hall [73]). In this method the columns of $A$ are
repeatedly orthogonalized in pairs so that, in the limit, $AW = Q$, where $W$ is
orthogonal and $Q^T Q = D^2$ is diagonal. A Jacobi sequence for $A^T A$ is implicitly
produced, and the singular value decomposition $A = USV^T$ is found by setting
$U = QD^{-1}$, $S = D$, $V = W$.
Many methods can be applied more efficiently if the dense matrix $A$ is
transformed into a simpler form $\tilde{A} = QAQ^T$ where $Q$ is orthogonal. If $A$ is
symmetric then $\tilde{A}$ is taken to be symmetric tridiagonal; otherwise $\tilde{A}$ is
upper Hessenberg. The outer product formulations given by Wilkinson [65]
(pp. 290-2, 347-9) show the inherent parallelism of the Householder
transformations. It must be observed, however, that the effective vector lengths
decrease at each step, and this may be a disadvantage in some parallel
implementations, despite the fact that it is always an advantage in sequential
computation.
Now, suppose that $A$ is symmetric and reduced to tridiagonal form. It
is well known that the characteristic polynomial may be evaluated by a
second order linear recurrence. Kuck and Sameh [71] suggest using a multiple
Sturm sequence and bisection algorithm for $N$ processors, evaluating $f$ at $N$
points simultaneously and operating with $N$ intervals. Because derivatives
of $f$ are also easy to evaluate, parallel root finding methods can be used.
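The idea of evaluating a Sturm sequence at many points simultaneously can be sketched as follows. This is an illustrative multisection variant, not the algorithm of Kuck and Sameh: the sign count uses the standard ratio form of the recurrence, all trial points are processed together as one vector, and the interval is narrowed around a single requested eigenvalue. The names `sturm_counts` and `kth_eigenvalue`, and the parameters `points` and `rounds`, are hypothetical.

```python
import numpy as np

def sturm_counts(a, b, xs):
    """Number of eigenvalues below each x in xs (diagonal a, off-diagonal b)."""
    xs = np.asarray(xs, dtype=float)
    q = a[0] - xs
    counts = (q < 0).astype(int)
    for j in range(1, len(a)):
        q = np.where(q == 0.0, 1e-300, q)         # guard against division by zero
        q = a[j] - xs - b[j - 1] ** 2 / q
        counts += q < 0
    return counts

def kth_eigenvalue(a, b, k, points=8, rounds=20):
    """k-th smallest eigenvalue (k = 0, 1, ...) by repeated multisection."""
    radius = np.abs(np.append(b, 0.0)) + np.abs(np.append(0.0, b))
    lo, hi = np.min(a - radius) - 1.0, np.max(a + radius) + 1.0   # padded Gershgorin bounds
    for _ in range(rounds):
        xs = np.linspace(lo, hi, points + 2)
        counts = sturm_counts(a, b, xs)           # every trial point evaluated together
        j = int(np.argmax(counts > k))            # first point with more than k eigenvalues below it
        lo, hi = xs[j - 1], xs[j]
    return 0.5 * (lo + hi)
```

With `points` trial values per round the bracketing interval shrinks by a factor of `points + 1` each round, rather than by 2 as in plain bisection.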
Of course, the algorithms to evaluate second order recurrences in $O(\log N)$
steps given $N$ processors are applicable. One place where these occur is in
the LR algorithm: $A_{i+1}$ is obtained by factoring $A_i - k_i I = L_i R_i$, where $L_i$
is unit lower bidiagonal and $R_i$ is upper bidiagonal, and setting
$A_{i+1} = R_i L_i + k_i I$.
The shift is necessary to accelerate convergence. Explicitly, suppose that
$A_i - k_i I = (a_j, b_j, c_j)$. If

$$g_1 = b_1, \qquad g_j = b_j - a_j c_{j-1} / g_{j-1}, \quad j = 2, \ldots, N,$$

then

$$A_{i+1} = (a_j g_j / g_{j-1},\; g_j + a_{j+1} c_j / g_j + k_i,\; c_j)$$

and convergence is to an upper bidiagonal matrix. As in Stone's method for
the solution of a tridiagonal system, the $g$ sequence can be computed in
$O(\log N)$ steps with $N$ processors, and the rest of the computations are
entirely parallel. Similar results hold for the $LL^T$ algorithm.
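The explicit form of the LR step can be checked with a short sketch. It assumes the tridiagonal matrix is stored as three arrays (subdiagonal `a`, diagonal `b` of $A_i$ before shifting, superdiagonal `c`) and computes the $g$ sequence with a plain sequential loop; the $O(\log N)$ parallel evaluation is not attempted here. The function name `lr_step` is hypothetical.

```python
import numpy as np

def lr_step(a, b, c, k):
    """One shifted LR step on the tridiagonal matrix with diagonals (a, b, c)."""
    N = len(b)
    d = b - k                                  # diagonal of A_i - k I
    g = np.empty(N)
    g[0] = d[0]
    for j in range(1, N):                      # the only sequential recurrence
        g[j] = d[j] - a[j - 1] * c[j - 1] / g[j - 1]
    new_a = a * g[1:] / g[:-1]                 # subdiagonal of R L
    new_b = g + k                              # diagonal of R L + k I ...
    new_b[:-1] += a * c / g[:-1]               # ... including the contribution of L
    return new_a, new_b, c.copy()
```

Once `g` is known, the remaining lines are elementwise vector operations, which is the sense in which the rest of the step is entirely parallel.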
For stability reasons it is often preferable to use the QR algorithm.
$A_i - k_i I$ is reduced to upper triangular form by $N-1$ plane rotations,
obtaining $A_i - k_i I = Q_i R_i$, $Q_i$ orthogonal and $R_i$ upper triangular. $A_{i+1}$ is
then computed from $R_i Q_i + k_i I$ as in the LR algorithm, although symmetry is
preserved and convergence is to a diagonal matrix. Sameh and Kuck [75a]
have recently shown that recurrence techniques are also applicable to this
method. The basic sequential algorithm is, for $A_i - k_i I = (a_{j-1}, b_j, a_j)$,
$A_{i+1} = (\alpha_{j-1}, \beta_j, \alpha_j)$,

    $p_1 \leftarrow b_1$; $c_0 \leftarrow 1$;
    for $j = 1$ step 1 until $N-1$ do
    begin $r_j \leftarrow (p_j^2 + a_j^2)^{1/2}$;
          $c_j \leftarrow p_j / r_j$; $s_j \leftarrow a_j / r_j$;
          $p_{j+1} \leftarrow c_j b_{j+1} - s_j c_{j-1} a_j$
    end;
    $q_j \leftarrow s_j b_{j+1} + c_j c_{j-1} a_j$, $(1 \le j \le N-1)$;
    $\beta_j \leftarrow p_j c_{j-1} + q_j s_j + k_i$, $(1 \le j \le N-1)$;
    $\beta_N \leftarrow p_N c_{N-1} + k_i$;
    $\alpha_j \leftarrow r_{j+1} s_j$, $(1 \le j \le N-1)$, where $r_N = p_N$.

As can be seen the major bottleneck is the sequential computation of the $r$,
$s$, $c$ and $p$ sequences. Sameh and Kuck introduce the following sequences:

    $w_0 = 1$, $w_1 = b_1$, $w_{j+1} = b_{j+1} w_j - a_j^2 w_{j-1}$, $j = 1, \ldots, N-1$,
    $z_0 = 1$, $z_j = a_j^2 z_{j-1} + w_j^2$, $1 \le j \le N$ (with $a_N = 0$).

By induction it can be shown that $w_j / w_{j-1} = p_j / c_{j-1}$, and $z_j / z_{j-1} = r_j^2$, so
the parallel QR step may be expressed as

    compute $w_j$, $0 \le j \le N$;
    compute $z_j$, $0 \le j \le N$;
    $r_j^2 \leftarrow z_j / z_{j-1}$, $(1 \le j \le N)$;
    $s_j^2 \leftarrow a_j^2 / r_j^2$, $(1 \le j \le N-1)$;
    $\beta_j \leftarrow s_j^2 b_{j+1} + w_j w_{j-1} (a_j^2 + r_j^2) / z_j + k_i$, $(1 \le j \le N-1)$;
    $\beta_N \leftarrow w_N w_{N-1} / z_{N-1} + k_i$;
    $\alpha_j^2 \leftarrow s_j^2 r_{j+1}^2$, $(1 \le j \le N-1)$.

Thus the QR iteration requires $O(\log N)$ steps with $N$ processors. Note that
the squares of the off-diagonal elements are used, and that only rational
operations are necessary (cf. Reinsch [71]). Despite possible
ill-conditioning of the linear systems implicit in the computation of $w_j$ and $z_j$,
Sameh and Kuck report that they have been able to obtain good results in
simulated execution on a sequential computer.
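The relation between the sequential QR step and its rational form in terms of the $w$ and $z$ sequences can be checked numerically. The sketch below is an illustration under stated assumptions: both recurrences are evaluated by ordinary loops (no $O(\log N)$ scheme is attempted), $a_N$ is taken to be zero so that $r_N^2 = p_N^2$, and the expressions used for $\beta_N$ and $\alpha_j^2$ are the ones that follow algebraically from the sequential algorithm. The function names are hypothetical.

```python
import numpy as np

def qr_step_sequential(diag, off, k):
    """Shifted QR step on a symmetric tridiagonal matrix; returns (beta, alpha)."""
    N = len(diag)
    b, a = diag - k, off
    p, r, s, beta = np.empty(N), np.empty(N), np.empty(N - 1), np.empty(N)
    p[0], c_prev = b[0], 1.0
    for j in range(N - 1):
        r[j] = np.hypot(p[j], a[j])
        c, s[j] = p[j] / r[j], a[j] / r[j]
        q = s[j] * b[j + 1] + c * c_prev * a[j]
        p[j + 1] = c * b[j + 1] - s[j] * c_prev * a[j]
        beta[j] = p[j] * c_prev + q * s[j] + k
        c_prev = c
    r[N - 1] = p[N - 1]
    beta[N - 1] = p[N - 1] * c_prev + k
    return beta, r[1:] * s

def qr_step_rational(diag, off, k):
    """Same step via the w and z sequences; returns (beta, alpha squared)."""
    N = len(diag)
    b = diag - k
    a2 = np.append(off ** 2, 0.0)                 # a_j^2, with a_N = 0 (assumption)
    w, z = np.empty(N + 1), np.empty(N + 1)
    w[0], w[1], z[0] = 1.0, b[0], 1.0
    for j in range(1, N):
        w[j + 1] = b[j] * w[j] - a2[j - 1] * w[j - 1]
    for j in range(1, N + 1):
        z[j] = a2[j - 1] * z[j - 1] + w[j] ** 2
    r2 = z[1:] / z[:-1]                           # r_j^2, j = 1..N
    s2 = a2[:-1] / r2[:-1]                        # s_j^2, j = 1..N-1
    beta = np.empty(N)
    beta[:-1] = s2 * b[1:] + w[1:N] * w[:N - 1] * (a2[:-1] + r2[:-1]) / z[1:N] + k
    beta[-1] = w[N] * w[N - 1] / z[N - 1] + k
    return beta, s2 * r2[1:]
```

Apart from the two linear recurrences for `w` and `z`, every line of the rational form is an elementwise vector operation and involves no square roots.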
We now consider the nonsymmetric case, and assume that $A$ has been
reduced to upper Hessenberg form by similarity transformation. The standard
sequential method for real matrices is the QR algorithm with double origin
shift, which avoids the need for complex arithmetic. The double shift is
actually performed by using a similarity transform $C_i = P_i A_i P_i^T$, where $P_i$
is a Householder transformation and $C_i$ is upper Hessenberg except for $c_{31}^{(i)}$,
$c_{41}^{(i)}$ and $c_{42}^{(i)}$, which are nonzero. $A_{i+2}$ is then obtained by reducing $C_i$ to
upper Hessenberg form, and we have

$$A_{i+2} = Q_i^T A_i Q_i, \qquad (A_i - k_i I)(A_i - k_{i+1} I) = Q_i R_i.$$

Either plane rotations or Householder transformations may be used in the
reduction; the latter is preferred in sequential computation since fewer
multiplications are used.
Unfortunately, the parallel implementation of the double QR step may
be adversely affected by some of the properties that make it so attractive
for sequential computers. Investigations by Kuck and Sameh [71] for the
Illiac IV and by Ward [76] for the CDC STAR show that the necessary use of
short vectors in either the pre- or postmultiplications and in deflation can
lead to inefficiencies. On the Illiac this is manifested by low processor
efficiency, and on the STAR by the fact that the time for the double step
is $\alpha N^2 + O(N)$, where $\alpha$ depends on the vector startup time $\sigma$. This is
clearly not a good situation, and additional work is needed to find better
parallel implementations.
As an alternative to the QR algorithm, Ward [76] suggests the use of
Laguerre iteration with Hyman's method of evaluating $f(\lambda)$ and its
derivatives (Wilkinson [65]). In fact, Hyman's method corresponds to the
solution of an upper triangular linear system, so this method is attractive when
parallel evaluation of $f$ is used.
Acknowledgments. Valuable comments on the manuscript were received from
H. T. Kung, D. K. Stevenson, J. F. Traub and R. G. Voigt. This work would
not have been possible without the continued support and encouragement of
J. F. Traub.
References
Babuska [68]. I. Babuska, "Numerical stability in mathematical analysis," IFIP Congress 1968, North-Holland, Amsterdam, 1969, vol. 1, pp. 11-23.
Baer [73]. J. L. Baer, "A survey of some theoretical aspects of multiprocessing," Computing Surveys, vol. 5, 1973, pp. 31-80.
Barnes et al. [68]. G. H. Barnes, R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick, R. A. Stokes, "The Illiac IV computer," IEEE Trans. on Comp., vol. C-17, 1968, pp. 746-757.
Birkhoff and George [73]. G. Birkhoff and A. George, "Elimination by nested dissection," in Complexity of Sequential and Parallel Numerical Algorithms, J. F. Traub, ed., Academic Press, N.Y., 1973, pp. 221-269.
Borodin [71]. A. Borodin, "Horner's rule is uniquely optimal," in Theory of Machines and Computations, Z. Kohavi and A. Paz, eds., Academic Press, N.Y., 1971, pp. 45-58.
Borodin and Munro [75]. A. Borodin and I. Munro, The Computational Complexity of Algebraic and Numeric Problems, American Elsevier, N.Y., 1975.
Bouknight et al. [72]. W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, D. L. Slotnick, "The Illiac IV system," Proc. IEEE, vol. 60, 1972, pp. 369-388.
Brent [73]. R. P. Brent, "The parallel evaluation of arithmetic expressions in logarithmic time," in Complexity of Sequential and Parallel Numerical Algorithms, J. F. Traub, ed., Academic Press, N.Y., 1973, pp. 83-102.
Brent [74]. R. P. Brent, "The parallel evaluation of general arithmetic expressions," J. ACM, vol. 21, 1974, pp. 201-206.
Burroughs [72]. Burroughs Corp., Illiac IV Systems Characteristics and Programming Manual, Paoli, Pa., May, 1972.
Buzbee [73]. B. L. Buzbee, "A fast Poisson solver amenable to parallel computation," IEEE Trans. on Comp., vol. C-22, 1973, pp. 793-796.
Buzbee, Golub and Nielson [70]. B. L. Buzbee, G. H. Golub, and C. W. Nielson, "On direct methods for solving Poisson's equations," SIAM J. Num. Anal., vol. 7, 1970, pp. 627-656.
Calahan [73]. D. Calahan, "Parallel solution of sparse simultaneous linear equations," Dept. of Elec. Eng., Univ. of Michigan, Ann Arbor, 1973.
CDC [74]. Control Data Corporation, CDC STAR-100 Instruction Execution Times, preliminary version 2, Arden Hills, Minn., January, 1974.
Chazan and Miranker [69]. D. Chazan and W. L. Miranker, "Chaotic relaxation," Lin. Alg. Appl., vol. 2, 1969, pp. 199-222.
Chen [75]. S. C. Chen, "Speedup of iterative programs in multiprocessing systems," Dissertation, Dept. of Comp. Sci., Univ. of Illinois, Urbana, January, 1975.
Chen and Kuck [75]. S. C. Chen and D. J. Kuck, "Time and parallel processor bounds for linear recurrence systems," IEEE Trans. on Comp., vol. C-24, 1975, pp. 701-717.
T. C. Chen [75]. T. C. Chen, "Overlap and pipeline processing," in Introduction to Computer Architecture, H. S. Stone, ed., Science Research Associates, Palo Alto, Calif., 1975, pp. 375-431.
Cline [74]. A. K. Cline, "A Lanczos-type method for the solution of large sparse systems of linear equations," contributed paper, Second Langley Conf. on Sci. Comp., Virginia Beach, October, 1974.
Concus and Golub [73]. P. Concus and G. H. Golub, "Use of fast direct methods for the efficient numerical solution of nonseparable elliptic equations," SIAM J. Num. Anal., vol. 10, 1973, pp. 1103-1120.
Cooley and Tukey [65]. J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp., vol. 19, 1965, pp. 297-301.
Cray [75]. Cray Research, Inc., "Cray-1 Computer," Chippewa Falls, Wis., 1975.
Csanky [75]. L. Csanky, "Fast parallel matrix inversion algorithms," contributed paper, 16th Ann. Symp. on Foundations of Computer Science (SWAT), Berkeley, October, 1975.
Donelly [71]. J. D. P. Donelly, "Periodic chaotic relaxation," Lin. Alg. Appl., vol. 4, 1971, pp. 117-128.
Eberlein [62]. P. J. Eberlein, "A Jacobi-like method for the automatic computation of eigenvalues and eigenvectors of an arbitrary matrix," J. SIAM, vol. 10, 1962, pp. 74-88.
Er i cksen [ 72] . J . H. Er i cksen, "I t er at i ve and di r ect met hods f or sol vi ng
Poi sson' s equat i on and t hei r adapt abi l i t y t o I LLI AC I V, " Cent er f or
Advanced Comput at i on, Uni ver si t y of I l l i noi s, Ur bana, 1972.
Fl ynn [ 66] . M. J . Fl ynn, "Ver y hi gh- speed comput i ng syst ems, " Pr oc. I EEE,
vol . 54, 1966, pp. 1901- 1909.
Gent l eman [ 73] , W. M. Gent l eman, "Least squar es comput at i ons by Gi vens
t r ansf or mat i ons wi t hout squar e r oot s, " J . I nst . Mat h. Appl i es. , vol . 12,
1973, pp. 329- 336.
Geor ge [ 73] . J . A. Geor ge, "Nest ed di ssect i on of a r egul ar f i ni t e el ement
mesh, " SI AM J . Numer . Anal . , vol . 10, 1973, pp. 345- 363.
Gi l mor e [ 71] . P. A. Gi l mor e, "Par al l el r el axat i on, " Goodyear Aer ospace
Cor p. , Akr on, Ohi o, J ul y, 1971.
Hal l [ 73] . J . C. Hal l , " Exami nat i on of numer i cal met hods f or si ngul ar val ue
decomposi t i ons, " M. S. t hesi s, Uni ver si t y of Col or ado, Boul der , 1973.
Hayes [ 74] . L. Hayes, " Compar at i ve anal ysi s of i t er at i ve t echni ques f or
sol vi ng Lapl ace' s equat i on on t he uni t squar e on a par al l el pr ocessor , "
M. S. t hesi s, Dept . of Mat h. , Uni ver si t y of Texas, Aust i n, 1974.
Hel l er [ 74a] . D. Hel l er , "A det er mi nant t heor emwi t h appl i cat i ons t o par al l el
al gor i t hms, " SI AM J . Num. Anal . , vol . 11, 1974, pp. 559- 568.
Hel l er [ 74b] . D. Hel l er , " On t he ef f i ci ent comput at i on of r ecur r ence r el a-
t i ons,
11
I CASE, Hampt on, Va
#
; Dept . of Comput er Sci ence, Car negi e- Mel l on
Uni ver si t y, J une, 1974.
Heller [74c]. D. Heller, "Some aspects of the cyclic reduction algorithm
for block tridiagonal linear systems," ICASE, Hampton, Va.; Dept. of
Computer Science, Carnegie-Mellon University, December, 1974.
Heller, Stevenson and Traub [74]. D. Heller, D. K. Stevenson, J. F. Traub,
"Accelerated iterative methods for the solution of tridiagonal linear
systems on parallel computers," Department of Computer Science,
Carnegie-Mellon University, December, 1974.
Hestenes [58]. M. R. Hestenes, "Inversion of matrices by biorthogonalization
and related results," J. SIAM, vol. 6, 1958, pp. 51-90.
Hintz and Tate [72]. R. G. Hintz and D. P. Tate, "Control Data STAR-100
Processor design," COMPCON-72 Digest of Papers, IEEE Comp. Soc., 1972,
pp. 1-4.
Hockney [65]. R. W. Hockney, "A fast direct solution of Poisson's equation
using Fourier analysis," J. ACM, vol. 12, 1965, pp. 95-113.
Hoffman, Martin and Rose [73]. A. J. Hoffman, M. S. Martin and D. J. Rose,
"Complexity bounds for regular finite difference and finite element
grids," SIAM J. Numer. Anal., vol. 10, 1973, pp. 364-369.
Hyafil and Kung [74a]. L. Hyafil and H. T. Kung, "The complexity of parallel
evaluation of linear recurrences," Proc. 7th Ann. ACM Symp. on Theory of
Computing, 1975, pp. 12-22.
Hyafil and Kung [74b]. L. Hyafil and H. T. Kung, "Parallel algorithms for
solving triangular linear systems with small parallelism," Department of
Computer Science, Carnegie-Mellon University, December, 1974.
Hyafil and Kung [75]. L. Hyafil and H. T. Kung, "Bounds on the speedups of
parallel evaluation of recurrences," Second USA-Japan Comp. Conf. Proc.,
August, 1975, pp. 178-182.
Jordan [74]. T. L. Jordan, "A new parallel algorithm for diagonally dominant
tridiagonal matrices," Los Alamos Sci. Lab., Los Alamos, N.M., 1974.
Kahan [71]. W. Kahan, "A survey of error analysis," IFIP Congress 1971,
North-Holland, Amsterdam, 1972, vol. 2, pp. 1214-1239.
Karp, Miller and Winograd [67]. R. M. Karp, R. E. Miller, S. Winograd, "The
organization of computations for uniform recurrence relations," J. ACM,
vol. 14, 1967, pp. 563-590.
Klyuyev and Kokovkin-Shcherbak [65]. V. V. Klyuyev and N. I. Kokovkin-Shcherbak,
"On the minimization of the number of arithmetic operations for the
solution of linear algebraic systems of equations," trans. by G. J. Tee,
Department of Computer Science, Stanford University, 1965.
Knight, Poole and Voigt [75]. J. C. Knight, W. G. Poole, Jr., and R. G.
Voigt, "System balance analysis for vector computers," ICASE, Hampton,
Virginia, March, 1975.
Kogge [72a]. P. M. Kogge, "Parallel algorithms for the efficient solution of
recurrence problems," Digital Systems Lab., Stanford University, September,
1972.
Kogge [72b]. P. M. Kogge, "The numerical stability of parallel algorithms for
solving recurrence problems," Digital Systems Lab., Stanford University,
September, 1972.
Kogge [72c]. P. M. Kogge, "Minimal parallelism in the solution of recurrence
problems," Digital Systems Lab., Stanford University, September, 1972.
Kogge [73]. P. M. Kogge, "Maximal rate pipeline solutions to recurrence
problems," Proc. First Ann. Symp. on Comp. Architecture, Gainesville, Florida,
1973, pp. 71-76.
Kogge [74]. P. M. Kogge, "Parallel solution of recurrence problems," IBM J.
Res. Dev., vol. 18, 1974, pp. 138-148.
Kogge and Stone [73]. P. M. Kogge and H. S. Stone, "A parallel algorithm for
the efficient solution of a general class of recurrence equations," IEEE
Trans. on Comp., vol. C-22, 1973, pp. 786-793.
Kuck [68]. D. J. Kuck, "Illiac IV software and application programming,"
IEEE Trans. on Comp., vol. C-17, 1968, pp. 758-770.
Kuck [73]. D. J. Kuck, "Multioperation machine computational complexity,"
in Complexity of Sequential and Parallel Numerical Algorithms, J. F.
Traub, ed., Academic Press, N.Y., 1973, pp. 17-47.
Kuck and Maruyama [75]. D. J. Kuck and K. Maruyama, "Time bounds on the
parallel evaluation of arithmetic expressions," SIAM J. Comput., vol. 4,
1975, pp. 147-162.
Kuck and Sameh [71]. D. J. Kuck and A. H. Sameh, "Parallel computation of
eigenvalues of real matrices," IFIP Congress 1971, North-Holland,
Amsterdam, 1972, vol. 2, pp. 1266-1272.
Kung [74]. H. T. Kung, "New algorithms and lower bounds for the parallel
evaluation of certain rational expressions," Proc. Sixth Ann. ACM Symp.
on Theory of Comput., pp. 323-333.
Kung and Traub [74]. H. T. Kung and J. F. Traub, "Methodologies for studying
the speedups gained from parallelism," contributed paper, Second Langley
Conf. on Sci. Comp., Virginia Beach, October, 1974.
Lambiotte [75]. J. J. Lambiotte, Jr., "The solution of linear systems of
equations on a vector computer," Dissertation, University of Virginia,
1975.
Lambiotte and Voigt [75]. J. J. Lambiotte, Jr. and R. G. Voigt, "The solution
of tridiagonal linear systems on the CDC STAR-100 computer," ACM Trans.
on Math. Software, vol. 1, 1975, pp. 308-329.
Lawrie et al. [75]. D. H. Lawrie, T. Layman, D. Baer and J. M. Randal,
"Glypnir, a programming language for Illiac IV," Comm. ACM, vol. 18,
1975, pp. 157-164.

Liu [74]. J. W. H. Liu, "The solution of mesh equations on a parallel
computer," Department of Computer Science, University of Waterloo, October,
1974.
Lynch [74]. W. C. Lynch, "How to stuff an array processor," Third Texas Conf.
on Comp. Systems, November, 1974.
Maruyama [73]. K. Maruyama, "The parallel evaluation of matrix expressions,"
IBM T. J. Watson Research Center, Yorktown Heights, N.Y., 1973.
Miller [73]. R. E. Miller, "A comparison of some theoretical models of
parallel computation," IEEE Trans. on Comp., vol. C-22, 1973, pp. 710-717.
W. Miller [75]. W. Miller, "Computational complexity and numerical stability,"
SIAM J. Comput., vol. 4, 1975, pp. 97-107.
Miranker [71]. W. L. Miranker, "A survey of parallelism in numerical analysis,"
SIAM Review, vol. 13, 1971, pp. 524-547.
Moler [72]. C. B. Moler, "Matrix computations with Fortran and paging," Comm.
ACM, vol. 15, 1972, pp. 268-270.
Morice [72]. Ph. Morice, "Calcul parallele et decomposition dans la
resolution d'equations aux derivees partielles de type elliptique," IRIA,
June, 1972.
Muller and Preparata [75]. D. E. Muller and F. P. Preparata, "Upper bound to
the time for parallel evaluation of arithmetic expressions," contributed
paper, Symposium on Analytic Computational Complexity, Carnegie-Mellon
University, April, 1975.
Munro and Paterson [73]. I. Munro and M. Paterson, "Optimal algorithms for
parallel polynomial evaluation," J. Comp. Syst. Sci., vol. 7, 1973,
pp. 189-198.
Muraoka [71]. Y. Muraoka, "Parallelism exposure and exploitation,"
Dissertation, Department of Computer Science, University of Illinois, Urbana, 1971.
Muraoka and Kuck [73]. Y. Muraoka and D. J. Kuck, "On the time required for a
sequence of matrix products," Comm. ACM, vol. 16, 1973, pp. 22-26.
Newell and Robertson [75]. A. Newell and G. Robertson, "Some issues in
programming multi-mini-processors," Department of Computer Science,
Carnegie-Mellon University, January, 1975.
Orcutt [74]. S. E. Orcutt, Jr., "Computer organization and algorithms for very
high-speed computations," Dissertation, Department of Electrical
Engineering, Stanford University, 1974.
Ortega and Rheinboldt [70]. J. M. Ortega and W. C. Rheinboldt, Iterative
Solution of Nonlinear Equations in Several Variables, Academic Press, N.Y.,
1970.
Owens [73]. J. L. Owens, "The influence of machine organization on algorithms,"
in Complexity of Sequential and Parallel Numerical Algorithms, J. F. Traub,
ed., Academic Press, N.Y., 1973, pp. 111-130.
Paige and Saunders [75]. C. C. Paige and M. A. Saunders, "Solution of sparse
indefinite systems of equations," SIAM J. Numer. Anal., vol. 12, 1975,
pp. 617-629.
Palmer [74]. J. Palmer, "Conjugate direction methods and parallel computing,"
Dissertation, Department of Computer Science, Stanford University, 1974.
Parlett and Wang [75]. B. N. Parlett and Y. Wang, "The influence of the
compiler on the cost of mathematical software - in particular on the cost
of triangular factorization," ACM Trans. on Math. Software, vol. 1, 1975,
pp. 35-46.
Pease [67]. M. C. Pease, "Matrix inversion using parallel processing," J. ACM,
vol. 14, 1967, pp. 757-764.
Pease [68]. M. C. Pease, "An adaptation of the fast Fourier transform for
parallel processing," J. ACM, vol. 15, 1968, pp. 252-264.
Pease [69]. M. C. Pease, "Inversion of matrices by partitioning," J. ACM,
vol. 16, 1969, pp. 302-314.
Pease [74]. M. C. Pease, "The C(2,m) algorithm for matrix inversion,"
Stanford Research Institute, Menlo Park, California, 1974.
Poole and Voigt [74]. W. G. Poole, Jr. and R. G. Voigt, "Numerical algorithms
for parallel and vector computers: An annotated bibliography," Computing
Reviews, vol. 15, 1974, pp. 379-388.
Preparata and Muller [75]. F. P. Preparata and D. E. Muller, "Parallel
evaluation of division-free expressions," contributed paper, Symposium on
Analytic Computational Complexity, Carnegie-Mellon University, April, 1975.
Reid [72]. J. K. Reid, "The use of conjugate gradients for systems of linear
equations possessing "Property A"," SIAM J. Numer. Anal., vol. 9, 1972,
pp. 325-332.
Reinsch [71]. C. H. Reinsch, "A stable, rational QR algorithm for the
computation of the eigenvalues of an Hermitian tridiagonal matrix," Math. Comp.,
vol. 25, 1971, pp. 591-597.
Robert [70]. F. Robert, "Methodes iteratives 'serie-parallele'," C. R. Acad.
Sci., Paris, vol. 271, 1970, pp. 847-850.
Robert, Charnay and Musy [75]. F. Robert, M. Charnay, and F. Musy, "Iterations
chaotiques serie-parallele pour des equations non-lineaires de point
fixe," Aplikace Matematiky, vol. 20, 1975, pp. 1-38.
Rudolph [72]. J. A. Rudolph, "A production implementation of an associative
array processor - STARAN," AFIPS Fall 1972, AFIPS Press, Montvale, N.J.,
vol. 41, pt. 1, pp. 229-241.
Ruggiero and Coryell [69]. J. F. Ruggiero and D. A. Coryell, "An auxiliary
processing system for array calculations," IBM Sys. J., vol. 8, 1969,
pp. 118-135.
Sameh [71]. A. H. Sameh, "On Jacobi and Jacobi-like algorithms for a parallel
computer," Math. Comp., vol. 25, 1971, pp. 579-590.
Sameh, Chen and Kuck [74]. A. H. Sameh, S. C. Chen and D. J. Kuck, "Parallel
direct Poisson and biharmonic solvers," Department of Computer Science,
University of Illinois, Urbana, July, 1974.
Sameh and Kuck [75a]. A. H. Sameh and D. J. Kuck, "A parallel QR algorithm
for tridiagonal symmetric matrices," Department of Computer Science,
University of Illinois, Urbana, February, 1975.
Sameh and Kuck [75b]. A. H. Sameh and D. J. Kuck, "Linear system solvers for
parallel computers," Department of Computer Science, University of
Illinois, Urbana, February, 1975.
Sameh and Layman [74]. A. H. Sameh and T. Layman, "Toward an Illiac IV
library," contributed paper, Second Langley Conference on Sci. Comp.,
Virginia Beach, October, 1974.
Stevenson [75]. D. K. Stevenson, "Programming the Illiac," Department of
Computer Science, Carnegie-Mellon University, to appear, 1975.
Stone [71]. H. S. Stone, "Parallel processing with the perfect shuffle,"
IEEE Trans. on Comp., vol. C-20, 1971, pp. 153-161.
Stone [73a]. H. S. Stone, "An efficient parallel algorithm for the solution
of a tridiagonal linear system of equations," J. ACM, vol. 20, 1973,
pp. 27-38.
Stone [73b]. H. S. Stone, "Problems of parallel computation," in Complexity
of Sequential and Parallel Numerical Algorithms, J. F. Traub, ed.,
Academic Press, N.Y., 1973, pp. 1-16.
Stone [75a]. H. S. Stone, "Parallel tridiagonal equation solvers," ACM Trans.
on Math. Software, vol. 1, 1975, pp. 289-307.
Stone [75b]. H. S. Stone, "Parallel computers," in Introduction to Computer
Architecture, H. S. Stone, ed., Science Research Associates, Palo Alto,
California, 1975, pp. 318-374.
Strassen [69]. V. Strassen, "Gaussian elimination is not optimal," Num. Math.,
vol. 13, 1969, pp. 354-356.
Swarztrauber [74]. P. N. Swarztrauber, "A direct method for the discrete
solution of separable elliptic equations," SIAM J. Num. Anal., vol. 11, 1974,
pp. 1136-1150.
Sweet [74]. R. A. Sweet, "A generalized cyclic reduction algorithm," SIAM J.
Num. Anal., vol. 11, 1974, pp. 506-520.
Tewarson [68]. R. P. Tewarson, "Solution of linear equations with coefficient
matrix in band form," BIT, vol. 8, 1968, pp. 53-58.
REPORT DOCUMENTATION PAGE
Security classification: UNCLASSIFIED
Title: A SURVEY OF PARALLEL ALGORITHMS IN NUMERICAL LINEAR ALGEBRA
Type of report: INTERIM
Author: DON HELLER
Contract or grant number: N00014-76-C-0370, NR 044-422
Performing organization: CARNEGIE-MELLON UNIVERSITY, COMPUTER SCIENCE DEPT.,
PITTSBURGH, PA 15213
Controlling office: OFFICE OF NAVAL RESEARCH, ARLINGTON, VA 22217
Report date: FEBRUARY 1976
Distribution statement: APPROVED FOR PUBLIC RELEASE, DISTRIBUTION UNLIMITED.
Abstract: The existence of parallel and pipeline computers has inspired a new
approach to algorithmic analysis. Classical numerical methods are generally
unable to exploit multiple processors and powerful vector-oriented hardware.
Efficient parallel algorithms can be created by reformulating familiar
algorithms or by discovering new ones, and the results are often surprising.
A comprehensive survey of parallel techniques for problems in linear algebra
is given. Specific topics include: relevant computer models and their
consequences, evaluation of ubiquitous arithmetic expressions, solution of
linear systems of equations, and computation of eigenvalues.
Traub [73]. J. F. Traub, "Iterative solution of tridiagonal systems on
parallel and vector computers," in Complexity of Sequential and Parallel
Numerical Algorithms, J. F. Traub, ed., Academic Press, New York,
1973, pp. 49-82.
Trout [72]. H. R. G. Trout, "Parallel techniques," Department of Computer
Science, University of Illinois, Urbana, October, 1972.
Varga [62]. R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood
Cliffs, New Jersey, 1962.
Viten'ko [68]. I. V. Viten'ko, "Optimum algorithms for adding and
multiplying on computers with a floating point," USSR Comput. Math. and Math.
Phys., vol. 8, 1968, pp. 183-195.
Ward [76]. R. C. Ward, "The QR algorithm and Hyman's method on vector
computers," Math. Comp., vol. 30, 1976, pp. 132-142.
Watson [72]. W. J. Watson, "The TI ASC, a highly modular and flexible
supercomputer architecture," AFIPS Fall 1972, AFIPS Press, Montvale, N.J.,
vol. 41, pt. 1, pp. 221-229.
Widlund [72]. O. B. Widlund, "On the use of fast methods for separable finite
difference equations for the solution of general elliptic problems," in
Sparse Matrices and their Applications, D. J. Rose and R. A. Willoughby,
eds., Plenum Press, N.Y., pp. 121-131.
Wilkinson [65]. J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford
University Press, London, 1965.
Winograd [70]. S. Winograd, "On the number of multiplications to compute
certain functions," Comm. Pure and Appl. Math., vol. 23, 1970, pp. 165-179.
Winograd [75]. S. Winograd, "On the parallel evaluation of certain arithmetic
expressions," J. ACM, vol. 22, 1975, pp. 477-492.
Wolfe [68]. P. Wolfe, chairman, "Panel discussion on new and needed work and
open questions," in Sparse Matrix Proceedings, R. A. Willoughby, ed.,
IBM T. J. Watson Research Center, Yorktown Heights, N.Y., September, 1968.
Wulf and Bell [72]. W. A. Wulf and C. G. Bell, "C.mmp, a multi-mini-processor,"
AFIPS Fall 1972, AFIPS Press, Montvale, N.J., vol. 41, pt. 2, pp. 765-777.
Young [72]. D. M. Young, "Second-degree iterative methods for the solution
of large linear systems," J. Approx. Th., vol. 5, 1972, pp. 137-148.
Zwackenberg [75]. R. G. Zwackenberg, "Vector extensions to LRLTRAN," SIGPLAN
Notices, vol. 10, no. 3, March, 1975, pp. 77-86.
