Journal of Research of the National Bureau of Standards, Vol. 49, No. 6, December 1952, Research Paper 2379

Methods of Conjugate Gradients for Solving Linear Systems

Magnus R. Hestenes and Eduard Stiefel

An iterative algorithm is given for solving a system $Ax = k$ of $n$ linear equations in $n$ unknowns. The solution is given in $n$ steps. It is shown that this method is a special case of a very general method which also includes Gaussian elimination. These general algorithms are essentially algorithms for finding an $n$-dimensional ellipsoid. Connections are made with the theory of orthogonal polynomials and continued fractions.

1. Introduction

One of the major problems in machine computations is to find an effective method of solving a system of $n$ simultaneous equations in $n$ unknowns, particularly if $n$ is large. There is, of course, no best method for all problems, because the goodness of a method depends to some extent upon the particular system to be solved. In judging the goodness of a method for machine computations, one should bear in mind that criteria for a good machine method may be different from those for a hand method. By a hand method, we shall mean one in which a desk calculator may be used. By a machine method, we shall mean one in which sequence-controlled machines are used.

A machine method should have the following properties:

(1) The method should be simple, composed of a repetition of elementary routines requiring a minimum of storage space.

(2) The method should ensure rapid convergence if the number of steps required for the solution is infinite. A method which, if no rounding-off errors occur, will yield the solution in a finite number of steps is to be preferred.

(3) The procedure should be stable with respect to rounding-off errors. If needed, a subroutine should be available to ensure this stability.
It should be possible to diminish rounding-off errors by a repetition of the same routine, starting with the previous result as the new estimate of the solution.

(4) Each step should give information about the solution and should yield a new and better estimate than the previous one.

(5) As many of the original data as possible should be used during each step of the routine. Special properties of the given linear system, such as having many vanishing coefficients, should be preserved. (For example, in the Gauss elimination special properties of this type may be destroyed.)

In our opinion there are two methods that best fit these criteria, namely, (a) the Gauss elimination method; (b) the conjugate gradient method presented in the present monograph.

There are many variations of the elimination method, just as there are many variations of the conjugate gradient method here presented. In the present paper it will be shown that both methods are special cases of a method that we call the method of conjugate directions. This enables one to compare the two methods from a theoretical point of view.

In our opinion, the conjugate gradient method is superior to the elimination method as a machine method. Our reasons can be stated as follows:

(a) Like the Gauss elimination method, the method of conjugate gradients gives the solution in $n$ steps if no rounding-off error occurs.

(b) The conjugate gradient method is simpler to code and requires less storage space.

(c) The given matrix is unaltered during the process, so that a maximum of the original data is used. The advantage of having many zeros in the matrix is preserved. The method is, therefore, especially suited to handle linear systems arising from difference equations approximating boundary value problems.

(d) At each step an estimate of the solution is given, which is an improvement over the one given in the preceding step.
(e) At any step one can start anew by a very simple device, keeping the estimate last obtained as the initial estimate.

In the present paper, the conjugate gradient routines are developed for the symmetric and nonsymmetric cases. The principal results are described in section 3. For most of the theoretical considerations, we restrict ourselves to the positive definite symmetric case. No generality is lost thereby. We deal only with real matrices. The extension to complex matrices is simple.

The method of conjugate gradients was developed independently by E. Stiefel of the Institute of Applied Mathematics at Zurich and by M. R. Hestenes with the cooperation of J. B. Rosser, G. Forsythe, and L. Paige of the Institute for Numerical Analysis, National Bureau of Standards. The present account was prepared jointly by M. R. Hestenes and E. Stiefel during the latter's stay at the National Bureau of Standards. The first papers on this method were given by E. Stiefel and by M. R. Hestenes. Reports on this method were given by E. Stiefel and J. B. Rosser at a Symposium on August 23-25, 1951. Recently, C. Lanczos developed a closely related routine based on his earlier paper on the eigenvalue problem. Examples and numerical tests of the method have been carried out by R. Hayes, U. Hochstrasser, and M. Stein.

2. Notations and Terminology

Throughout the following pages we shall be concerned with the problem of solving a system of linear equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= k_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= k_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= k_n. \end{aligned} \qquad (2:1)$$

These equations will be written in the vector form $Ax = k$. Here $A$ is the matrix of coefficients $(a_{ij})$, and $x$ and $k$ are the vectors $(x_1, \ldots, x_n)$ and $(k_1, \ldots, k_n)$. It is assumed that $A$ is nonsingular. Its inverse $A^{-1}$ therefore exists. We denote the transpose of $A$ by $A^*$.

Given two vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, their sum $x + y$ is the vector $(x_1 + y_1, \ldots, x_n + y_n)$, and $ax$ is the vector $(ax_1, \ldots, ax_n)$, where $a$ is a scalar. The sum $(x, y) = x_1y_1 + x_2y_2 + \cdots + x_ny_n$
is their scalar product. The length of $x$ will be denoted by

$$|x| = (x_1^2 + \cdots + x_n^2)^{1/2} = (x, x)^{1/2}.$$

The Cauchy-Schwarz inequality states that for all $x$, $y$

$$(x, y)^2 \le (x, x)(y, y) \quad\text{or}\quad |(x, y)| \le |x|\,|y|. \qquad (2:2)$$

The matrix $A$ and its transpose $A^*$ satisfy the relation

$$(x, Ay) = \sum_{i,j} a_{ij} x_i y_j = (A^*x, y).$$

If $a_{ij} = a_{ji}$, that is, if $A = A^*$, then $A$ is said to be symmetric. A matrix $A$ is said to be positive definite in case $(x, Ax) > 0$ whenever $x \ne 0$. If $(x, Ax) \ge 0$ for all $x$, then $A$ is said to be nonnegative. If $A$ is symmetric, then two vectors $x$ and $y$ are said to be conjugate or $A$-orthogonal if the relation $(x, Ay) = (Ax, y) = 0$ holds. This is an extension of the orthogonality relation $(x, y) = 0$.

By an eigenvalue of a matrix $A$ is meant a number $\lambda$ such that $Ay = \lambda y$ has a solution $y \ne 0$, and $y$ is called a corresponding eigenvector.

Unless otherwise expressly stated the matrix $A$, with which we are concerned, will be assumed to be symmetric and positive definite. Clearly no loss of generality is caused thereby from a theoretical point of view, because the system $Ax = k$ is equivalent to the system $A^*Ax = A^*k$. From a numerical point of view, the two systems are different, because of rounding-off errors that occur in forming the product $A^*A$. Our applications to the nonsymmetric case do not involve the computation of $A^*A$.

In the sequel we shall not have occasion to refer to a particular coordinate of a vector. Accordingly we may use subscripts to distinguish vectors instead of components. Thus $x_0$ will denote the vector $(x_{01}, \ldots, x_{0n})$ and $x_i$ the vector $(x_{i1}, \ldots, x_{in})$. In case a symbol is to be interpreted as a component, we shall call attention to this fact unless the interpretation is evident from the context.

The solution of the system $Ax = k$ will be denoted by $h$; that is, $Ah = k$. If $x$ is an estimate of $h$, the difference $r = k - Ax$ will be called the residual of $x$ as an estimate of $h$. The quantity $|r|^2$
will be called the squared residual. The vector $h - x$ will be called the error vector of $x$, as an estimate of $h$.

3. Method of Conjugate Gradients (cg-Method)

The present section will be devoted to a description of a method of solving a system of linear equations $Ax = k$. This method will be called the conjugate gradient method or, more briefly, the cg-method, for reasons which will unfold from the theory developed in later sections. For the moment, we shall limit ourselves to collecting in one place the basic formulas upon which the method is based and to describing briefly how these formulas are used.

The cg-method is an iterative method which terminates in at most $n$ steps if no rounding-off errors are encountered. Starting with an initial estimate $x_0$ of the solution $h$, one determines successively new estimates $x_0, x_1, x_2, \ldots$ of $h$, the estimate $x_i$ being closer to $h$ than $x_{i-1}$. At each step the residual $r_i = k - Ax_i$ is computed. Normally this vector can be used as a measure of the "goodness" of the estimate $x_i$. However, this measure is not a reliable one because, as will be seen in section 18, it is possible to construct cases in which the squared residual $|r_i|^2$ increases at each step (except for the last) while the length of the error vector $|h - x_i|$ decreases monotonically. If no rounding-off error is encountered, one will reach an estimate $x_m$ $(m \le n)$ at which $r_m = 0$. This estimate is the desired solution $h$. Normally, $m = n$. However, since rounding-off errors always occur except under very unusual circumstances, the estimate $x_n$ in general will not be the solution $h$ but will be a good approximation of $h$. If the residual $r_n$ is too large, one may continue with the iteration to obtain better estimates of $h$. Our experience indicates that frequently $x_{n+1}$
is considerably better than $x_n$. One should not continue too far beyond $x_n$ but should start anew with the last estimate obtained as the initial estimate, so as to diminish the effects of rounding-off errors. As a matter of fact one can start anew at any step one chooses. This flexibility is one of the principal advantages of the method.

In case the matrix $A$ is symmetric and positive definite, the following formulas are used in the conjugate gradient method:

$$p_0 = r_0 = k - Ax_0 \quad (x_0 \text{ arbitrary}) \qquad (3:1a)$$
$$a_i = \frac{|r_i|^2}{(p_i, Ap_i)} \qquad (3:1b)$$
$$x_{i+1} = x_i + a_i p_i \qquad (3:1c)$$
$$r_{i+1} = r_i - a_i A p_i \qquad (3:1d)$$
$$b_i = \frac{|r_{i+1}|^2}{|r_i|^2} \qquad (3:1e)$$
$$p_{i+1} = r_{i+1} + b_i p_i. \qquad (3:1f)$$

In place of the formulas (3:1b) and (3:1e) one may use

$$a_i = \frac{(p_i, r_i)}{(p_i, Ap_i)} \qquad (3:2a)$$
$$b_i = -\frac{(r_{i+1}, Ap_i)}{(p_i, Ap_i)}. \qquad (3:2b)$$

Although these formulas are slightly more complicated than those given in (3:1), they have the advantage that scale factors (introduced to increase accuracy) are more easily changed during the course of the computation.

The conjugate gradient method (cg-method) is given by the following steps:

Initial step: Select an estimate $x_0$ of $h$ and compute the residual $r_0$ and the direction $p_0$ by formulas (3:1a).

General routine: Having determined the estimate $x_i$ of $h$, the residual $r_i$, and the direction $p_i$, compute $x_{i+1}$, $r_{i+1}$, and $p_{i+1}$ by formulas (3:1b), ..., (3:1f) successively.

As will be seen in section 5, the residuals $r_0, r_1, \ldots$ are mutually orthogonal, and the direction vectors $p_0, p_1, \ldots$ are mutually conjugate, that is,

$$(r_i, r_j) = 0, \qquad (p_i, Ap_j) = 0 \qquad (i \ne j). \qquad (3:3)$$

These relations can be used as checks.

Once one has obtained the set of $n$ mutually conjugate vectors $p_0, \ldots, p_{n-1}$, the solution of

$$Ax = k' \qquad (3:4)$$

can be obtained by the formula

$$x = \sum_{i=0}^{n-1} \frac{(p_i, k')}{(p_i, Ap_i)}\, p_i. \qquad (3:5)$$

It follows that, if we denote by $p_{ij}$ the $j$th component of $p_i$, then

$$\sum_{i=0}^{n-1} \frac{p_{ij}\, p_{ik}}{(p_i, Ap_i)}$$

is the element in the $j$th row and $k$th column of the inverse $A^{-1}$ of $A$.

There are two objections to the use of formula (3:5). First, contrary to the procedure of the general routine (3:1), this would require the storage of the vectors $p_0, p_1, \ldots$. This is impractical, particularly in large systems. Second, the results obtained by this method are much more influenced by rounding-off errors than those obtained by the step-by-step routine (3:1).

In the cg-method the error vector $h - x$ is diminished in length at each step.
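The routine (3:1) and the claim about the error vector can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`dot`, `matvec`, `cg_estimates`, `error_length`) and the small 3 by 3 test system are assumptions introduced here, not part of the paper, and the right-hand side is built from a chosen solution $h$ so that the error vector can be measured directly.

```python
def dot(u, v):
    """Scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def cg_estimates(A, k, x0):
    """Run routine (3:1) for at most n steps; return x_0, x_1, ..."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]  # (3:1a): r_0 = k - A x_0
    p = list(r)                                       # (3:1a): p_0 = r_0
    estimates = [x]
    for _ in range(len(k)):
        rr = dot(r, r)
        if rr == 0.0:  # residual vanished, x is already the solution
            break
        Ap = matvec(A, p)
        a = rr / dot(p, Ap)                           # (3:1b)
        x = [xi + a * pi for xi, pi in zip(x, p)]     # (3:1c)
        r = [ri - a * qi for ri, qi in zip(r, Ap)]    # (3:1d)
        b = dot(r, r) / rr                            # (3:1e)
        p = [ri + b * pi for ri, pi in zip(r, p)]     # (3:1f)
        estimates.append(x)
    return estimates


def error_length(h, x):
    """Length |h - x| of the error vector."""
    y = [hi - xi for hi, xi in zip(h, x)]
    return dot(y, y) ** 0.5


# Hypothetical symmetric positive definite system with a known solution h.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
h = [1.0, 2.0, 3.0]
k = matvec(A, h)
xs = cg_estimates(A, k, [0.0, 0.0, 0.0])
errors = [error_length(h, x) for x in xs]
for i, e in enumerate(errors):
    print(i, e)
```

The matrix is never modified; each step needs only one product $Ap_i$ and a handful of scalar products, which is the storage argument made in the introduction.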
The quantity $f(x) = (h - x, A(h - x))$, called the error function, is also diminished at each step. But the squared residual $|r|^2 = |k - Ax|^2$ normally oscillates and may even increase. There is a modification of the cg-method where all three quantities diminish at each step. This modification is given in section 7. It has an advantage and a disadvantage. Its disadvantage is that the error vector in each step is longer than in the original method. Moreover, the computation is complicated, since it is a routine superimposed upon the original one. However, in the special case where the given linear equation system arises from a difference approximation of a boundary-value problem, it can be shown that the estimates are smoother in the modified method than in the original. This may be an advantage if the desired solution is to be differentiated afterwards.

Concurrently with the solution of a given linear system, characteristic roots of its matrix may be obtained: compute the values of the polynomials $R_0, R_1, \ldots$ and $P_0, P_1, \ldots$ at $\lambda$ by the iteration

$$R_0 = P_0 = 1, \qquad R_{i+1} = R_i - \lambda a_i P_i, \qquad P_{i+1} = R_{i+1} + b_i P_i. \qquad (3:6)$$

The last polynomial $R_m(\lambda)$ is a factor of the characteristic polynomial of $A$ and coincides with it when $m = n$. The characteristic roots, which are the zeros of $R_m(\lambda)$, can be found by Newton's method without actually computing the polynomial $R_m(\lambda)$ itself. One uses the formulas

$$\lambda_{k+1} = \lambda_k - \frac{R_m(\lambda_k)}{R_m'(\lambda_k)},$$

where $R_m(\lambda_k)$, $R_m'(\lambda_k)$ are determined by the iteration (3:6) and

$$R_0' = P_0' = 0, \qquad R_{i+1}' = R_i' - \lambda a_i P_i' - a_i P_i, \qquad P_{i+1}' = R_{i+1}' + b_i P_i'$$

with $\lambda = \lambda_k$. In this connection, it is of interest to observe that if $m = n$, the determinant of $A$ is given by the formula

$$\det(A) = \frac{1}{a_0 a_1 \cdots a_{n-1}}.$$

The cg-method can be extended to the case in which $A$ is a general nonsymmetric and nonsingular matrix. In this case one replaces eq (3:1) by the set

$$\begin{aligned} r_0 &= k - Ax_0, \qquad p_0 = A^* r_0, \\ a_i &= \frac{|A^* r_i|^2}{|A p_i|^2}, \\ x_{i+1} &= x_i + a_i p_i, \\ r_{i+1} &= r_i - a_i A p_i, \\ b_i &= \frac{|A^* r_{i+1}|^2}{|A^* r_i|^2}, \\ p_{i+1} &= A^* r_{i+1} + b_i p_i. \end{aligned} \qquad (3:7)$$

This system is discussed in section 10.

4. Method of Conjugate Directions (cd-Method)
The cg-method can be considered as a special case of a general method, which we shall call the method of conjugate directions or, more briefly, the cd-method. In this method, the vectors $p_0, p_1, \ldots$ are selected to be mutually conjugate but have no further restrictions. (This method was presented from a different point of view by Fox, Huskey, and Wilkinson in the Quarterly Journal of Mechanics and Applied Mathematics.) It consists of the following routine:

Initial step. Select an estimate $x_0$ of $h$ (the solution), compute the residual $r_0 = k - Ax_0$, and choose a direction $p_0 \ne 0$.

General routine. Having obtained the estimate $x_i$ of $h$, the residual $r_i = k - Ax_i$, and the direction $p_i$, compute the new estimate $x_{i+1}$ and its residual $r_{i+1}$ by the formulas

$$a_i = \frac{(p_i, r_i)}{(p_i, Ap_i)}, \qquad (4:1a)$$
$$x_{i+1} = x_i + a_i p_i, \qquad (4:1b)$$
$$r_{i+1} = r_i - a_i A p_i. \qquad (4:1c)$$

Next select a direction $p_{i+1} \ne 0$ conjugate to $p_0, \ldots, p_i$, that is, such that

$$(p_{i+1}, Ap_j) = 0 \qquad (j = 0, 1, \ldots, i). \qquad (4:2)$$

In a sense the cd-method is not precise, in that no formulas are given for the computation of the directions $p_0, p_1, \ldots$. Various formulas can be given, each leading to a special method. The formula (3:1f) leads to the cg-method. It will be seen in section 12 that the case in which the $p$'s are obtained by an $A$-orthogonalization of the basic vectors $(1, 0, \ldots, 0)$, $(0, 1, 0, \ldots)$, ... leads essentially to the Gauss elimination method.

The basic properties of the cd-method are given by the following theorems.

Theorem 4:1. The direction vectors $p_0, p_1, \ldots$ are mutually conjugate. The residual vector $r_i$ is orthogonal to $p_0, p_1, \ldots, p_{i-1}$. The inner product of $p_i$ with each of the residuals $r_0, r_1, \ldots, r_i$ is the same. That is,

$$(p_i, Ap_j) = 0 \qquad (i \ne j), \qquad (4:3a)$$
$$(p_i, r_j) = 0 \qquad (i < j), \qquad (4:3b)$$
$$(p_i, r_0) = (p_i, r_1) = \cdots = (p_i, r_i). \qquad (4:3c)$$

The scalar $a_i$ can be given by the formula

$$a_i = \frac{(p_i, r_0)}{(p_i, Ap_i)} \qquad (4:4)$$

in place of (4:1a).

Equation (4:3a) follows from (4:2). Using (4:1c), we find that

$$(p_j, r_{i+1}) = (p_j, r_i) - a_i (p_j, Ap_i).$$

If $j = i$ we have, by (4:1a), $(p_i, r_{i+1}) = 0$. Moreover, by (4:3a), $(p_j, r_{i+1}) = (p_j, r_i)$ $(j \ne i)$. Equations (4:3b) and (4:3c) follow from these relations. The formula (4:4) follows from (4:3c) and (4:1a).

As a
consequence of (4:4) the estimates $x_1, x_2, \ldots$ of $h$ can be computed without computing the residuals $r_0, r_1, \ldots$, provided that the choice of the direction vectors $p_0, p_1, \ldots$ is independent of these residuals.

Theorem 4:2. The cd-method is an $m$-step method $(m \le n)$ in the sense that at the $m$th step the estimate $x_m$ is the desired solution $h$.

For let $m$ be the first integer such that $y_0 = h - x_0$ is in the subspace spanned by $p_0, \ldots, p_{m-1}$. Clearly, $m \le n$, since the vectors $p_0, p_1, \ldots$ are linearly independent. We may, accordingly, choose scalars $\alpha_0, \ldots, \alpha_{m-1}$ such that

$$y_0 = \alpha_0 p_0 + \cdots + \alpha_{m-1} p_{m-1}.$$

Hence,

$$h = x_0 + \alpha_0 p_0 + \cdots + \alpha_{m-1} p_{m-1}.$$

Moreover,

$$r_0 = k - Ax_0 = A(h - x_0) = \alpha_0 Ap_0 + \cdots + \alpha_{m-1} Ap_{m-1}.$$

Computing the inner product $(p_i, r_0)$ we find by (4:3a) and (4:4) that $\alpha_i = a_i$, and hence that $h = x_m$, as was to be proved.

The cd-method can be looked upon as a relaxation method. In order to establish this result, we introduce the function

$$f(x) = (h - x, A(h - x)) = (x, Ax) - 2(x, k) + (h, k). \qquad (4:5)$$

Clearly, $f(x) \ge 0$ and $f(x) = 0$ if, and only if, $x = h$. The function $f(x)$ can be used as a measure of the "goodness" of $x$ as an estimate of $h$. Since it plays an important role in our considerations, it will be referred to as the error function. If $p$ is a direction vector, we have the useful relation

$$f(x + ap) = f(x) - 2a(p, r) + a^2 (p, Ap), \qquad (4:6)$$

where $r = k - Ax = A(h - x)$, as one readily verifies by substitution. Considered as a function of $a$, the function $f(x + ap)$ has a minimum value at $a = a'$, where

$$a' = \frac{(p, r)}{(p, Ap)}. \qquad (4:7)$$

This minimum value differs from $f(x)$ by the quantity

$$f(x) - f(x + a'p) = a'^2 (p, Ap) = \frac{(p, r)^2}{(p, Ap)}. \qquad (4:8)$$

Comparing (4:7) with (4:1a), we obtain the first two sentences of the following result:

Theorem 4:3. The point $x_i$ minimizes $f(x)$ on the line $x = x_{i-1} + a p_{i-1}$. At the $i$th step the error $f(x_{i-1})$ is relaxed by the amount

$$f(x_{i-1}) - f(x_i) = a_{i-1}^2 (p_{i-1}, Ap_{i-1}) = \frac{(p_{i-1}, r_{i-1})^2}{(p_{i-1}, Ap_{i-1})}. \qquad (4:9)$$

In fact, the point $x_i$ minimizes $f(x)$ on the $i$-dimensional plane $P_i$ of points

$$x = x_0 + \alpha_0 p_0 + \cdots + \alpha_{i-1} p_{i-1}, \qquad (4:10)$$

where $\alpha_0, \ldots, \alpha_{i-1}$ are parameters. This plane contains the points $x_0, x_1, \ldots, x_i$.

In view of this result the cd-method is a method of relaxation of the error function $f(x)$.
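The cd-routine (4:1) and the relaxation property of theorem 4:3 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names and the 3 by 3 test system are hypothetical, and the directions are obtained by $A$-orthogonalizing the basic vectors, which is only one of the admissible choices mentioned above (the one the text relates to Gauss elimination).

```python
def dot(u, v):
    """Scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def a_orthogonalize(A, vectors):
    """Gram-Schmidt in the product (u, A v): returns mutually conjugate vectors."""
    ps = []
    for v in vectors:
        p = list(v)
        for q in ps:
            Aq = matvec(A, q)
            c = dot(p, Aq) / dot(q, Aq)
            p = [pi - c * qi for pi, qi in zip(p, q)]
        ps.append(p)
    return ps


def cd_estimates(A, k, x0, directions):
    """Routine (4:1) with prescribed conjugate directions; returns x_0, x_1, ..."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]
    estimates = [x]
    for p in directions:
        Ap = matvec(A, p)
        a = dot(p, r) / dot(p, Ap)                   # (4:1a)
        x = [xi + a * pi for xi, pi in zip(x, p)]    # (4:1b)
        r = [ri - a * qi for ri, qi in zip(r, Ap)]   # (4:1c)
        estimates.append(x)
    return estimates


def error_function(A, h, x):
    """f(x) = (h - x, A(h - x)), formula (4:5)."""
    y = [hi - xi for hi, xi in zip(h, x)]
    return dot(y, matvec(A, y))


# Hypothetical symmetric positive definite system with a known solution h.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
h = [1.0, 2.0, 3.0]
k = matvec(A, h)
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ps = a_orthogonalize(A, basis)
xs = cd_estimates(A, k, [0.0, 0.0, 0.0], ps)
fs = [error_function(A, h, x) for x in xs]
print(fs)
```

Because the directions here do not depend on the residuals, the scalars $a_i$ could equally be computed from (4:4) using $r_0$ alone.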
An iteration of the routine may” accordingly be refered to as a relaxation, In order to prove the third sentence of the theorem observe that at the point (4:10) fa)=f\20—~ Fagard alin» An At the minimum point we have (urd rtp by (24). The minima point is ihe point 21, as was to be proved: and henee a =a, accord a3 Geometrically, the equation fiz)=const. defines an ellipsoid of dimension n—1. “The point at which f(a) has « minimum is the center of the ellipsoid and is the solution of r=. The i-timensional plane P,, described in the last theorem, cuts the ellipsoid F@)=flay) in an ellipsoid E, of dimension im unless £, is the point 2 itself. (In the eg-method. By is never degenerate, unless 29=h.) ‘The point 24 is ‘the center of £;. Hence we have the corollary: Corollary 1." The point x. is the center of the —L)-timensional ellipsoid in whieh the idimensional plane Py intersects the (n—1)-dimensional ellipsoid Fa\= fee). ‘Although the funetion f(a) is the fundamental error function which decreases at each step of the relaxation, one is, unable to compute fis) without Knowing the solution h we are secking.” In order to obtain ap estimate of the magnitude of (a) wwe may. use the following: ‘Theorem 4:4. The error recor y=h=z, the residual Gfigaln and the error function f() satisfy the relations shins att sles y where w(2) is the Rayleigh quotient wen ee. 12) The Rayleigh quotent of the error rector y does not exceed that of the residual r, that is, asain. asia) Moreocer, Bava. aay ‘The proof of this result is based on the Schwarzian quotients (Ae), (413) 5 (azAe)’ ‘The first of these follows from the inequality of Schwarz (ahs (p.PV(a.9) (416) by choosing Az, The second is obtained by selecting p=B2, q= 32, where B= A. Tn order to prove theorem 4:4 recall that if we sot y=h—y, then Ar fie) Ad=n=Ay ln by (4:5). Using the inequalities (4:15) with -=y. We see Unt ppl glucan or? lly. 1y) HN i Stuy) WFNS Ay. Ay The vikls (any and ts). 
Caine (16) with | Hence SOT AD = WS Ih | fa)=niu) us ir, | so that the second inquality in (4:14) holds, The first inequality is obtained from the relations wn SLs u e ethuod hus with verse A! of 2 As is to be experted. any 6 its routine a determination of th We have, in fact, the following: Theorem 4:5. "Let poy oy Paes be n, mutually cone Jjugate nonzero vectors and lt py be the 4th. component fpes The element in the j-th rou and k-th column of Se given by the sum {pek) Bip” for the solution h of | ‘his result follows fot the forma | | | | k, obtained by selecting We conclude this section with the following: Theorem 46. Let we be the (n= ielinensional plane through 2 conjugate to the rector Po Pro eso | Dinte, The plane x, contains the points J. ‘and intersects the (n—Diedimensianal ellipsoid f12) fiz) in an ellipsoid Fe of dimension (u-1—2), ‘The center of Ef isthe solution hef Arsk, ‘The point Zany ie the midpoint of the chord G, of Fi through 1. ‘thick is parallel to ps. In the egemethod the chord Cy is normal to ES at z,and hence is in the direction of the gradient of 2) at x in “The last statement will he established at the end of section 6.“The equntions of the plane x, is given hy the system (Apiarms, (70.1... i1) Sinee papi The points ryt a and his the center of Ey.” The chord (is defined hy the equation 7 Xs fap. where (sa parameter HecHtapy=Ne ‘Phe second endpoint of the chor Cy is the point 2ch2a,p, at which 1=2. ‘The mitpoint corresponds = = are conjugate 0 Pow «Pies, s0-also Pome taper (RDI I. are accordingly fand henee is the point s).; as was to be fs) Ti view of 1 of the ederout in whiels awe seek he Beginning with 2, we 4 fey) touigh 7 ane fi ‘The pl ‘trough 2, conjugate to C4 ners of all chords parallel to C. In the next step we re- Strict ourselves (0 x1 antl select an arbitrary chor Cho fav=fia) through 2, andl find its midpoint rand the plane x in m, conjugate to Gj (and henee o Ca). 
This process when repeated will yield the Imswer jn ab most-n steps. Tn the ez-method the chord C, of flz)—/le) it chosen to be the normal a 5. Basic Relations in the cg-Method Recall thot inthe eg-method the following frtmulas it is seen dag at each step, mality of the space x, Gray (ath) an Gandy Gele “bpe Gil Pari ald verify tha eal onky i Bie) and (5:10) hold for ‘The present section is devoted to consequences of these formulas. Asa first result, we have Theorem 5:1, The residuals ty ri... and the direction veclors po, Dis « « . generated by (5:1) satisfy the relations (rat) xn noclp=0 Gj) (pari=0 (i0, €q (5 (5:40), we use (5 Pcp dt Dea Perclp l= (reAp A 4k. (pep) It follows that (5:4) holds and henee that holds for the vectors re. 7) =

0) (5:6b) (redr=0 ijt (6:60) (ro Ard= (Pr Ap) bain APs) (V0). (5:6d) The vector r, is shorter thaw po an acute angle with p, ‘The relations (3:68) and (5:6b) follow readily from (Ste), elf), (6:2), and (5:3). Using (If) and 5:30), we sce that (ry Ar) The veetor p, makes i0). Pov Similary, the residuals ro, tis... satiefy the relations (5:80) faitire — (5:8b) here Ga) ais Equation (5:7b) is obtained by: climinati and r, from the equations poiiclpy Deane bobe Equation (5:74) follows similarly. Tn order to prove (5:8b). eliminate lp, and Ap,-) from the equations renal Ap Aree bdpin Here Piat Equation (5 Theorem 3:5. seteral formulas $a) holds since pyre The sealars'n, and be wre given by the 73) Ap.) Gr10) trowstrd are. ay Th seta ates th ation str< cn) 0), (612) shee ul) yt igh tet (12). The esi reat ay fee ten th smallest and largest ohare | ‘acteristic roots of ‘The formula (5:10) follows from (5:1b) and (5:3c), while (5:11) follows from (5:1e), Grif), (5:3b), and Gd). Since HS Pie Colo > (pe Apa by (5:6b) and (5:6d), we have (papa) Pa) te Dit 7S ‘The inequalities (5:12) accordingly hold. ‘The last | statement is immediate, since (3) lies between the | Smallest and largest characteristic roots of A. | 6. Properties of the Estimates x, of h in the | eg-Method | atmo so 2 rec be obtained by"applying’ the cee rab beth conaponine rue " Pa-t the direction vectors tse tht Séetion will be devoted to the study of the prop: trties of the points 7, 4. 5 fq. An first teste e estimates of h | worl, Letra. ry | Theorem 6:1. The estimates so, 44- + stm of hare distinet. The point a, minimizes the error” function f\=(h—s,Mh—2)) on the rtimensional plane P, ‘passing through the points fo, 44. = tu Inthe ith Mupaf he cesnuthod. f(a) Terdimintaked by the amount fit n\—flaiaae 7. (6:0 where u(2) ix the Rayleigh quotient (4:12). Hence. fad —fle) sayy tml >. jones 853, ever cor eM each step of the eg-algorithm the <1, is reduced in length. Tn fact. 
(ed fle) bss Cr , where (=) Te the Rayleigh quotient (4:12) 416 In order to establish (625) observe that, by (5:6a), (yt ri) (2 Pedic 4 Petes Hy alPa =layri Only) In view of (6:2) and (5:1b) this becomes (6:0) Setting z=2,. and j=m in (6:4), we obt by the use of (6:6) ind (6:1) ‘This result establishes the cg-method as a method of successive approximations and justifies the pro cedure of stopping the algorithm before the, final- sheit. If this is done, the estimate ob- e using. the results given n (6:5) Theorem B28. Let td « Ce ein imenstonal plane P, passing thio Bee The pointe tee ty vag ie ha sraight ine in the order aiven by thelr enumeration ‘The point a2) (esi) ts aieen by the formatos jh the points -) (mr). (678) cea Find fn) (eae In onder to prove this result, it is sufficient to establish (6:7)... To this end observe first that the vector ee tic with pci), and using (5:60). The result is [yet ‘The projection of the point ee ee in P, is necordingly F ne pert 1 (627). The points lie in the ftv the first part of Since ft Thearem 28, fen (6:8) isthe point in P, whose distance from the solution h is the least. It ies on the line x,t: beyond 2... Moreover, ror 4 Tae) Fey hed) oo) and fr2\— F(x, ed Be (6.10) sol grr to establish (6:0) and 6:10) we use the flee api fede (PerAPan which holds for all values of a in view of the fact that 2 thinimizes fl) on Py Setting a=f(%) seed" Jue ant Seo, An algebraic reduetion yields (6:9). Inasmuch as sve obtain (6:10) from (6:9) and (5:1b) “ist further result we have Theorem 6. Let x5, sn af the posnts ye. ae Doin 1 tothe sofuiion Fe be the projections on the ine joining the initial EO The Gints hat reeuszah icin the onder of enumeration ‘img, tis seen that we proceed towards the sohi- tion without: oscillation. ‘To prove tis faet we need onlyrebserve that eet y= Fells Pidd by G0), Htor,Fj). Let’ x, be the (n—i-cimens COM|MEALE 10 Po, Pree os Pin of points saiisiying the equation A similar result holds for the line joining. 
we through +, ists of the set (me n=0 Ga 0.t pos ‘This plane contains the points 4, hesabition f “Theorem t:2. The gradient ofthe function flo at 4 inthe plane x, ia sealar multiple of the rector pu ‘he graiont ot fe) at the secon “The geaclient g, of flr) at 7, in a, Is the orthogonal projec~ fon of 2, i the plane = duis of tie Torin De Feel os Wray a7 fay, - a-y ate chosen so that eis orthogonal to Apr. ip. Since snap, a aon. it is seen upon elimination of rr. sively that py is also a linear. com ‘Apia. Tnasmuch as p,is.con Pest orihogonnl to po «Ap P, accordingly is.a sealar multiple of the Of f(2) ab zz in xp.as was to be proved. Tn view of the result obtained in theorem 6:7 it is seon that the name “method of conjugate gradient is an appropriate name for the method given in sec- tion 3._ In the fist step the relavation is made in the direction py of the gradient of f(z) at 2, obtaining @ minimuin value of f(2) at 71. Since the solution h Ties in m,, it ig sufficient to restrict z to the Accordingly, in the next step, we relax in tl tion p, of the gradient of f(z) inn, at 2, obtaining the point x, at which f(2) is least. The problem is then reduced to relaxing f(2) in the plane :, conjugate to Pa and py. At the next step the gradient in 2 in x: Rused. and s0 on.. The dimensionality of the space in which the relaxation is to take place is reduced by Unity at each step. Accordingly, after at most 7 steps, the desired solution is attained. ‘The vecto grunt 7. Properties of the Estimates ; of A in the cg-Method there is a second set of estimates hate of A that can be computed. and that are’ of’ significance. in application to Tinear systems arising from difference equations approxi- mating boundary-ralve problems. In these appli tions, the function defined by 7: is smoother than that of z,, and from this point of view is a better approximation of the solution A. ‘The point Z, has its residual proportional to the eonjugate gradient py. ‘The points FF, 22. - . 
« can be computed by iteration (7:2) given in the following. Theorem 7:1. The conjugateqradient p,iseapressible in the form _ In the egzmethod paelk— AF), Gay where ¢, and F, are defined by the rrewrsion formulas ab eels bey (7:28) of rs Apo. | The sum ofthe cores oft (and ince af Pods 4h in (7 ‘The relation (7:1) ean be establ It holds for If it holds for i, then stb De tbe AMG HbR) Sculls | pe ‘The formula (73a) follows from (7:2a), (5:40) a1 (5:6b).- Formula (7:3b) is an easy consequence of (Z22b)._ To prove (7:3c) one can use (5:2) oF (7:3), as one wishes. The final statement is @ consequence of (73a). Theorem 7:2. The point F, given by (7:2) lies in the conres closure Sof the points fot, +90 -26 Mis the point 2 in the t-dimensional plane Py through 1 Sipe snp Ect which the squared residual k— Ar? has its minimum eatue, This minimum value is given by the formula (ly The squared residuals [rl ues « 4 diminish, mon tonically during the eg-method. At the ith step the squared residual is reduced by the amount Fst 73) ‘The first statewent follows from (723b), since the coefficients of 44, 41, + 41 are positive and have Unity as their cum. In’ order to show that. the Squared residual has @ mininnum on P, at Z, observe that a point s in P, differs from 7, by a vector =, of the form, apo Fai ily given by Az, Apes sacle Inasnmeh as, by (7:80), F= py ey we have FoApd=} ustp)=0 GCI. and 1 2 lAel > (rx? ,). n that the minimum value of 7 on By (7:4) and (7:2a) roof of theorem 721 The Rayleigh quotients of ty ts ‘This completes the Theorem 7 ‘and Fo, For» « = are connected by the formulas wr) _ a). wid) Te att ea wD wed wry ge The Rayleigh quotient of F(i>0) i smaller than that Of re that #8, wary order to prove this result we use (526d) and obtain Since |ndt=|pi72 and _u(p)—=uF)y Fields (7:60). The eq (726) follow’ from (7 The ast statement foll lows from (5:12). 
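The second set of estimates can be illustrated numerically. The sketch below assumes that the recursion (7:2) has the form $c_0 = 1$, $c_{i+1} = 1 + b_i c_i$, $\bar{x}_0 = x_0$, $\bar{x}_{i+1} = (x_{i+1} + b_i c_i \bar{x}_i)/c_{i+1}$; this is the form forced by requiring $p_i = c_i(k - A\bar{x}_i)$ together with $p_{i+1} = r_{i+1} + b_i p_i$, but it is a reconstruction, not a quotation. All function names and the test system are hypothetical.

```python
def dot(u, v):
    """Scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def squared_residual(A, k, x):
    """|k - A x|^2."""
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]
    return dot(r, r)


def cg_with_second_estimates(A, k, x0):
    """cg-routine that also carries c_i and the second estimate xbar_i."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]
    p = list(r)
    c = 1.0            # assumed c_0 = 1
    xbar = list(x)     # assumed xbar_0 = x_0
    rows = [(x, xbar, p, c)]
    for _ in range(len(k)):
        rr = dot(r, r)
        if rr == 0.0:
            break
        Ap = matvec(A, p)
        a = rr / dot(p, Ap)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * qi for ri, qi in zip(r, Ap)]
        b = dot(r, r) / rr
        p = [ri + b * pi for ri, pi in zip(r, p)]
        c_next = 1.0 + b * c                                   # assumed (7:2a)
        xbar = [(xi + b * c * zi) / c_next
                for xi, zi in zip(x, xbar)]                    # assumed (7:2b)
        c = c_next
        rows.append((x, xbar, p, c))
    return rows


# Hypothetical symmetric positive definite system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
k = [6.0, 10.0, 8.0]
rows = cg_with_second_estimates(A, k, [0.0, 0.0, 0.0])
plain = [squared_residual(A, k, x) for x, _, _, _ in rows]
bar = [squared_residual(A, k, xb) for _, xb, _, _ in rows]
for i, (s, t) in enumerate(zip(plain, bar)):
    print(i, s, t)
```

The two printed columns are the squared residuals of $x_i$ and of $\bar{x}_i$; under the stated assumption the second column should never exceed the first and should decrease from step to step.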
tt in te applications to linear systenis arising from difference equations approsimating boundary” value problems, str.) can be taken as a_meastre of the smoothness of x, The smaller u(r) is, the smoother Tris. Hence 34s smoother than, Theorem TA.” Mt the point % the error function f(r) has the value Aan) Ae — for) + Fur A and we have an fei in the eg method the following: mem * (Sia) a= nt | ac tea) Neer adpy (S:le} by Hew (S:1d) Pirin tbpe (S:te) a a comsguene, we lave the orthosis relations (ron=0, (Apepl=0 (xh), (8:2) Because of rounding-off errors during a numerical calculation (routine), these relations will not he satisfied exactly, As the difference 17 increases, the error in (S:2) may inerease so rapids that, will not beas good antestimate of has desited, ThiL | error ean he lessened in two ways: first, by inte | ducing a subsidiary calculation to reduee rounding: | off errs; anu second, by repeating the lerition so ts fo obtain a new tstinute: “The serdion will be Concerned with the frst of those and with a say of (the prupagition of the remuding-alferrane et {divide the seetion in four parts its ah followings 811. Basic propagation formulas In this part we derive simple formulas showing how errors in sealur products of the type (rerds — APeupe) (8:3) are propagated during the next step of the comput tion. From (8:1¢) follows (ruteg = (Dostear)— bes Perse) Inserting (S:Le) in both terms on the right yields (Hera Popul, +6 (rors) ACAD uPo) Y= Cur d—a(PoApd +birtLAPLrDd, (8:4) ¢ (S:1b) becomes. 
(rr) bet AP oor Po: (8:5) This is our fiest propagation formula Using (S:le) again, (Apap ied =Ap arias) +b(AP PI: Inserting (7:1¢) in the first term, (Apaped= a2 road (ryregsl+ bd App» (8:6) Applying (S:le) to the first and third terms gives | But in view of (8:1b) and (S:1d) | | b(Apupd- 7) ‘Therefore | 1 (Apopad= 7, Gores (s:8) | ‘This is our second propagation formula Putting (8:5) and (8:8) together}yields the third | and fourth propagation formulas besa, (rrr (rr) (8:98) CAP Die) =D AP Pd (s:9b) which ean be written in the alternate form | Coat ae Groat) (ss108 a PnP os10b) | Pe nePi-0) by virtue of (S:1b) and (Std). Each of these propa- ation formulas, and in particular the simple formuc fas (8:9), enn be used to cheek whether nonvanishing products (8:3) are due to normal rounding-off errors or to errors of the computer, ‘The formulas (8:10) dave the following meaning. If we build the sym: metric matrix P having the elements (lp, pe). th left side of (8:10b) is the ratio of two consecutive elements in the same line, one located in the main diagonal and one on its right hand side. - Th Formula (8:10b) gives the change of this ratio as we go down the main diagonal. 8.2. A Stability Condition Even if the scalar products (8:2) are not all zero, so that the vectors pu Phy + Das ate Hot exactiy conjugate, we may use’ these’ Vectors. for solving ‘dz=F in the following way. The solution h may be seritten in the form heatotaspot apts tay -per (BELLY ‘Taking the sealar product with Ap,, we obtain (ey Apd+ 3 (AP daL= (hyp) = (Ah, (hp Dllp pda (roo. (12) The system Ar—k may be, replaced by this linear system for, age ey, ‘Therefore, "because of rounding-off errors we have certainly not solved the given system exactly, but we have reached ore modest goal, namely, we have transformed the ven system into the system (8:12), which bas a dominating main diagonal if roundingcoff errors have not accumulated too fast. 
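The transformed system can be inspected directly on a small example. The sketch below assumes that (8:12) reads $\sum_j (Ap_i, p_j)\,\alpha_j = (r_0, p_i)$, which is what one obtains by applying $A$ to $h - x_0 = \sum_j \alpha_j p_j$ and taking the scalar product with $p_i$. It builds the matrix of scalar products $(Ap_i, p_j)$ from computed cg-directions, measures how strongly the main diagonal dominates, and compares the cg-scalars $a_i$ with the values $(r_0, p_i)/(Ap_i, p_i)$ obtained by ignoring the off-diagonal terms. Function names and the test system are hypothetical.

```python
def dot(u, v):
    """Scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def cg_directions(A, k, x0):
    """Run the cg-routine; return r_0, the directions p_i and the scalars a_i."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]
    r0 = list(r)
    p = list(r)
    ps, a_vals = [], []
    for _ in range(len(k)):
        rr = dot(r, r)
        if rr == 0.0:
            break
        Ap = matvec(A, p)
        a = rr / dot(p, Ap)
        ps.append(p)
        a_vals.append(a)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * qi for ri, qi in zip(r, Ap)]
        b = dot(r, r) / rr
        p = [ri + b * pi for ri, pi in zip(r, p)]
    return r0, ps, a_vals


# Hypothetical symmetric positive definite system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
k = [6.0, 10.0, 8.0]
r0, ps, a_vals = cg_directions(A, k, [0.0, 0.0, 0.0])

# Matrix of the transformed system: element (i, j) is (A p_i, p_j).
P = [[dot(matvec(A, pi), pj) for pj in ps] for pi in ps]

# Largest off-diagonal element relative to the diagonal element of its row.
off_diag = max(abs(P[i][j]) / P[i][i]
               for i in range(len(ps)) for j in range(len(ps)) if i != j)

# Diagonal-only approximation of alpha_i, to be compared with a_i.
alpha_diag = [dot(r0, p) / P[i][i] for i, p in enumerate(ps)]
print(off_diag)
print(list(zip(a_vals, alpha_diag)))
```

On a well-conditioned system of this size the off-diagonal ratios stay near rounding level; the quantity `off_diag` is the kind of check that would reveal accumulated rounding-off errors on larger or harder systems.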
The cg-algorithm gives an approximate solution

$$x_n = x_0 + a_0 p_0 + a_1 p_1 + \cdots + a_{n-1} p_{n-1}. \tag{8:13}$$

A comparison of (8:11) and (8:13) shows that the number $a_k$ computed during the cg-process is an approximate value of $\alpha_k$.

In order to have a dominating main diagonal in the matrix of the system (8:12) the quotients

$$\frac{(Ap_i, p_k)}{(Ap_i, p_i)} \qquad (i \neq k) \tag{8:14}$$

must be small. In particular this must be true for $k = i + 1$. In this special case we learn from (8:10b) that increasing numbers $a_0, a_1, \dots$ during the cg-process lead to accumulation of rounding-off errors, because then these quotients increase also. We have accordingly the following stability condition. The larger the ratios $a_{i+1}/a_i$, the more rapidly the rounding-off errors accumulate. A more elaborate discussion of the general quotient (8:14) gives essentially the same result.

… By theorem 5:5, the scalars $a_i$ lie in the range … and we have the result (8:20). This formula gives corrections of the $a_i$ if, because of rounding-off errors, the residual $r_n$ is not small enough.

8.4. Refinement of the cg-algorithm

In order to diminish rounding-off errors in the orthogonality of the residuals $r_i$ we refine our general routine (8:1). After the $i$th step in the routine we compute $(Ap_i, p_{i+1})$, which should be small. Going then to the $(i+1)$st step we replace $a_{i+1}$ by a slightly different quantity $\bar a_{i+1}$ chosen so that $(r_{i+1}, r_{i+2}) = 0$. In order to perform this, we may use (8:4), which must be written

$$(r_{i+1}, r_{i+2}) = (r_{i+1}, r_{i+1}) - \bar a_{i+1}\left[(p_{i+1}, Ap_{i+1}) - b_i (Ap_i, p_{i+1})\right] = 0,$$

yielding

$$\bar a_{i+1} = \frac{|r_{i+1}|^2}{(p_{i+1}, Ap_{i+1}) - b_i (Ap_i, p_{i+1})}$$

for use in the $(i+1)$th step.

The corrections just described can be incorporated automatically in the general routine by replacing the formulas (3:1) by the refinement (8:24), which starts from $p_0 = r_0 = k - Ax_0$, $d_0 = 1$.

Another but numerically more laborious method of refinement goes along the following lines. After finishing the $i$th step, compute a scalar product of the type $(Ap_i, p_k)$ …

… However, other normalizations can be made. In order to see how this normalization appears, we replace $p_i$ by $d_i p_i$ in eq (3:1), where $d_i$ is a scalar factor.
This factor $d_i$ is not the same as that given in section 8 but plays a similar role. The result is the system of formulas (9:1). The connections between $a_i$, $b_i$, $d_i$ are given by the equations (9:2), where $\mu(r)$ is the Rayleigh quotient (4:12).

In order to establish these relations we use the fact that $r_i$ and $r_{i+1}$ are orthogonal. This yields

$$|r_i|^2 = a_i (r_i, Ap_i).$$

From the connection between $p_i$ and $r_i$ we then find an expression for $(r_i, Ap_i)$ in terms of $(r_i, Ar_i)$. This yields (9:2) in case $i > 0$. The formula, when $i = 0$, follows similarly.

In the formulas (9:1) the scalar factor $d_i$ is an arbitrary positive number determining the length of $p_i$. The case $d_i = 1$ is discussed in sections 3 and 5. The following cases are of interest.

II. The vector $p_i$ can be chosen to be the residual vector $\bar r_i$ described in section 7. In this event we select $d_i$ according to (9:3). The formula (7:2b) … becomes (9:4). In this event the basic formulas (9:1) take the simple form (9:5). … In this case the formulas (9:3) are very simple and particularly adaptable to computation. It has the disadvantage that the vectors $p_i$ may grow considerably in length …. However, if "floating" operations are used, this should present no difficulty.

III. The vector $p_i$ can be chosen to be the correction to be added to $x_i$ in the $(i+1)$st relaxation. In this event, $a_i = 1$ and the formulas (9:1) take the form (9:6). These relations are obtained from (9:1) and (9:2) by setting $a_i = 1$.

IV. The vector $p_i$ can be chosen so that $a_i$ is the reciprocal of the Rayleigh quotient of $r_i$. The formulas for $a_i$, $b_i$ and $d_i$ in (9:1) then become …

These cases indicate the variety of choices that can be made for the scalar factor $d_i$. For purposes of computation the choice $d_i = 1$ appears to be the simplest, all things considered.

10.
Extensions of the cg-Method

In the preceding pages we have assumed that the matrix $A$ is a positive definite symmetric matrix. The algorithm (3:1) still holds when $A$ is nonnegative and symmetric. The routine will terminate when one of the following situations is met:

(1) The residual $r_m$ is zero. In this event, $x_m$ is a solution of $Ax = k$, and the problem is solved.

(2) The residual $r_m$ is different from zero but $(Ap_m, p_m) = 0$, and hence $Ap_m = 0$. Since $p_m = c_m \bar r_m$, it follows that $A\bar r_m = 0$, where $\bar r_m$ is the residual of the vector $\bar x_m$ defined in section 7. The point $\bar x_m$ is accordingly a point at which $|k - Ax|^2$ attains its minimum. In other words, $\bar x_m$ is a least-square solution. One should observe that $p_m \neq 0$ (and hence $\bar r_m \neq 0$). Otherwise, we would have $r_m = -b_{m-1} p_{m-1}$, contrary to the fact that $r_m$ is orthogonal to $p_{m-1}$. The point $x_m$ fails to minimize the function

$$g(x) = (x, Ax) - 2(x, k),$$

for in this event

$$g(x_m + t p_m) = g(x_m) - 2t\,|r_m|^2.$$

In fact, $g(x)$ fails to have a minimum value.

It remains to consider the case when $A$ is a general nonsingular matrix. In this event we observe that the matrix $A^*A$ is symmetric and positive definite and that the system $Ax = k$ is equivalent to the system

$$A^*Ax = A^*k. \tag{10:1}$$

Applying the eq (3:1) to this last system, we obtain the following iteration:

$$r_0 = k - Ax_0, \qquad p_0 = A^*r_0,$$

$$a_i = \frac{|A^*r_i|^2}{|Ap_i|^2}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i Ap_i, \tag{10:2}$$

$$b_i = \frac{|A^*r_{i+1}|^2}{|A^*r_i|^2}, \qquad p_{i+1} = A^*r_{i+1} + b_i p_i.$$

If one does not wish to use any properties of the cg-method in the computation of $a_i$ and $b_i$ besides the defining relations, since they may be disturbed by rounding-off errors, one should use the formulas

$$a_i = \frac{(p_i, A^*r_i)}{|Ap_i|^2}, \qquad b_i = -\frac{(A^*r_{i+1}, A^*Ap_i)}{|Ap_i|^2}.$$

In this case the error function $f(x)$ is the function $f(x) = |k - Ax|^2$, and hence is the squared residual. It is a simple matter to interpret the results given above for this new system.

It should be emphasized that, even though the use of the system (10:2) is equivalent from a theoretical point of view to applying the cg-algorithm to the system (10:1), the two methods are not equivalent from a numerical point of view. This follows because rounding-off errors in the two methods are not the same.
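The iteration (10:2) can be sketched in a few lines. The version below is only an illustration under stated assumptions: matrices are plain lists of rows, the function name `cg_normal_equations` and the small nonsymmetric test system are hypothetical, and no safeguard against rounding-off errors is included. Note that the product $A^*A$ is never formed; only $A$ and $A^*$ are applied to vectors.

```python
def dot(u, v):
    """Scalar product (u, v) of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]

def transpose(A):
    """Transpose A* of a real matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def cg_normal_equations(A, k, x0, steps):
    """Iteration (10:2) for a general nonsingular matrix A."""
    At = transpose(A)
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]   # r_0 = k - A x_0
    s = matvec(At, r)                                   # A* r_0
    p = list(s)                                         # p_0 = A* r_0
    for _ in range(steps):
        ss = dot(s, s)
        if ss == 0.0:                                   # A* r_i = 0: solved
            break
        Ap = matvec(A, p)
        a = ss / dot(Ap, Ap)                            # a_i = |A* r_i|^2 / |A p_i|^2
        x = [xi + a * pi for xi, pi in zip(x, p)]       # x_{i+1}
        r = [ri - a * ci for ri, ci in zip(r, Ap)]      # r_{i+1}
        s_next = matvec(At, r)
        b = dot(s_next, s_next) / ss                    # b_i
        p = [si + b * pi for si, pi in zip(s_next, p)]  # p_{i+1}
        s = s_next
    return x

# Hypothetical nonsymmetric test system with known solution h = (1, -2, 3).
A = [[4.0, 1.0, 0.0],
     [2.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
k = [2.0, -1.0, 4.0]
x = cg_normal_equations(A, k, [0.0, 0.0, 0.0], steps=3)
```

With $n = 3$ the routine is run for three steps, in line with the finite termination property of the cg-method in the absence of rounding-off errors.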
The system (10:2) is the better of the two, because at all times one uses the original matrix $A$ instead of the computed matrix $A^*A$, which will contain rounding-off errors.

There is a slight generalization of the system (10:2) that is worthy of note. This generalization consists of selecting a matrix $B$ such that $BA$ is positive definite and symmetric. The matrix $B$ is necessarily of the form $A^*H$, where $H$ is positive definite and symmetric. We can apply the cg-algorithm to the system

$$BAx = Bk. \tag{10:3}$$

In place of (10:2) one obtains the algorithm

$$r_0 = k - Ax_0, \qquad p_0 = Br_0,$$

$$a_i = \frac{|Br_i|^2}{(p_i, BAp_i)}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i Ap_i, \tag{10:4}$$

$$b_i = \frac{|Br_{i+1}|^2}{|Br_i|^2}, \qquad p_{i+1} = Br_{i+1} + b_i p_i.$$

Again the formulas for $a_i$ and $b_i$, which are given directly by the defining relations, are

$$a_i = \frac{(p_i, Br_i)}{(p_i, BAp_i)}, \qquad b_i = -\frac{(Br_{i+1}, BAp_i)}{(p_i, BAp_i)}.$$

When $B = A^*$, this system reduces to (10:2). If $A$ is symmetric and positive definite, the choice $B = I$ gives the original cg-algorithm.

There is a generalization of the cd-algorithm concerning which a few remarks should be made. In this method we select vectors $p_0, \dots, p_{n-1}$ and $q_0, \dots, q_{n-1}$ such that

$$(q_i, Ap_j) = 0 \quad (i \neq j), \qquad (q_i, Ap_i) > 0. \tag{10:5}$$

The solution can be obtained by the recursion formulas

$$r_0 = k - Ax_0, \qquad a_i = \frac{(q_i, r_i)}{(q_i, Ap_i)}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i Ap_i. \tag{10:6}$$

The problem is then reduced to finding the vectors $p_i$, $q_i$ such that (10:5) holds. We shall show in a moment that $q_i$ is of the form

$$q_i = B^*p_i, \tag{10:7}$$

where $B$ has the property that $BA$ is symmetric and positive definite. The algorithm (10:6) is accordingly equivalent to applying the cd-algorithm to (10:3).

To see that $q_i$ is of the form (10:7), let $P$ be the matrix whose column vectors are $p_0, \dots, p_{n-1}$ and $Q$ be the matrix whose column vectors are $q_0, \dots, q_{n-1}$. The condition (10:5) is equivalent to the statement that the matrix $D = Q^*AP$ is a diagonal matrix whose diagonal terms are positive. Select $B$ so that $Q = B^*P$. Then $D = P^*BAP$, from which we conclude that $BA$ is a positive definite symmetric matrix, as was to be proved.
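The relation between the algorithm (10:4) and the condition (10:5) can be checked numerically. The sketch below runs (10:4) with a matrix of the form $B = A^*H$, collects the directions $p_i$, sets $q_i = B^*p_i$ as in (10:7), and forms the matrix with elements $(q_i, Ap_j)$, which should be diagonal with positive diagonal terms up to rounding. The test matrices, the diagonal choice of $H$, and all function names are assumptions made for this example.

```python
import math

def dot(u, v):
    """Scalar product (u, v) of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]

def transpose(A):
    """Transpose of a real matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def generalized_cg(A, B, k, x0, steps):
    """Algorithm (10:4); returns the estimate and the directions p_i used."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]   # r_0 = k - A x_0
    g = matvec(B, r)                                    # B r_0
    p = list(g)                                         # p_0 = B r_0
    directions = []
    for _ in range(steps):
        gg = dot(g, g)
        if gg == 0.0:
            break
        Ap = matvec(A, p)
        a = gg / dot(p, matvec(B, Ap))                  # a_i = |B r_i|^2 / (p_i, B A p_i)
        directions.append(p)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * ci for ri, ci in zip(r, Ap)]
        g_next = matvec(B, r)
        b = dot(g_next, g_next) / gg                    # b_i
        p = [gi + b * pi for gi, pi in zip(g_next, p)]  # p_{i+1}
        g = g_next
    return x, directions

def biorthogonality_matrix(A, B, directions):
    """Matrix with elements (q_i, A p_j), where q_i = B* p_i as in (10:7)."""
    Bt = transpose(B)
    qs = [matvec(Bt, p) for p in directions]
    Aps = [matvec(A, p) for p in directions]
    return [[dot(q, Ap) for Ap in Aps] for q in qs]

# Hypothetical data: nonsymmetric A, diagonal H, and B = A* H.
A = [[4.0, 1.0, 0.0],
     [2.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
H_diag = [1.0, 2.0, 3.0]
B = [[entry * hj for entry, hj in zip(row, H_diag)] for row in transpose(A)]
k = [2.0, -1.0, 4.0]                                    # A h with h = (1, -2, 3)
x, P = generalized_cg(A, B, k, [0.0, 0.0, 0.0], steps=3)
D = biorthogonality_matrix(A, B, P)
```

Taking `H_diag = [1.0, 1.0, 1.0]` gives $B = A^*$, in which case the routine coincides with the iteration (10:2).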
In view of the results just obtained, we see that the algorithm (10:4) is the most general cg-algorithm for any linear system. Similarly, the most general cd-algorithm is obtained by (i) selecting a matrix $B$ such that $BA$ is symmetric and positive definite, (ii) selecting nonzero vectors $p_0, \dots, p_{n-1}$ such that

$$(p_i, BAp_j) = 0 \qquad (i \neq j),$$

and (iii) using the recursion formulas

$$r_0 = k - Ax_0, \qquad a_i = \frac{(p_i, Br_i)}{(p_i, BAp_i)}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i Ap_i.$$

11. Construction of Mutually Conjugate Systems

As was remarked in section 4 the cd-method is not complete until a method of constructing a set of mutually conjugate vectors $p_0, p_1, \dots$ has been given. In the cg-method the choice of the vector $p_i$ depended on the result obtained in the previous step. The vectors $p_0, p_1, \dots$ are accordingly determined by the starting point $x_0$ and vary with the point $x_0$.

Assume again that $A$ is a positive definite, symmetric matrix. In a cd-method the vectors $p_0, p_1, \dots$ can be chosen to be independent of the starting point. This can be done, for example, by starting with a set of $n$ linearly independent vectors $u_0, \dots, u_{n-1}$ and constructing conjugate vectors by a successive $A$-orthogonalization process. For example, we may use the formulas

$$p_0 = u_0, \qquad p_i = u_i - \sum_{j<i} a_{ij}\,p_j. \tag{11:1}$$

The coefficient $a_{ij}$ $(i > j)$ is to be chosen so that $p_i$ is conjugate to $p_j$. The formula for $a_{ij}$ is evidently

$$a_{ij} = \frac{(u_i, Ap_j)}{(p_j, Ap_j)} \qquad (i > j). \tag{11:2}$$

Observe that

$$(u_i, Ap_j) = 0 \quad (i < j), \qquad (u_i, Ap_i) = (p_i, Ap_i). \tag{11:3}$$

Using (11:3) we see that alternately

$$a_{ij} = \frac{(Au_i, p_j)}{(Au_j, p_j)}. \tag{11:4}$$

As described in section 4, the successive estimates of the solution are given by the recursion formula

$$x_0 = 0, \qquad x_{i+1} = x_i + a_i p_i, \tag{11:5}$$

where

$$a_i = \frac{(p_i, k)}{(p_i, Ap_i)}. \tag{11:6}$$

There is a second method of computing the vectors $p_0, p_1, \dots, p_{n-1}$, given by the recursion formulas (11:7a) to (11:7d). We have the relations (11:8a) and (11:8b). The eq (11:8a) hold when $k = j + 1$ by virtue of (11:7c) and (11:7d). That they hold for other values of $j$ …

… of the identity matrix has been replaced by $u_i^{(1)}$, the second row yielding the vector $p_2 = u_2^{(1)}$. Observe also that

$$a_{2i}^{(1)} = (Au_2^{(1)}, e_i) \quad (i = 2, \dots, n), \qquad k_2^{(1)} = (p_2, k).$$

Hence … is the next estimate of the solution.
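The $A$-orthogonalization (11:1), (11:2) and the recursion (11:5), (11:6) of section 11 can be sketched as follows. The choice of the coordinate unit vectors as the independent set $u_0, \dots, u_{n-1}$, the small test system, and the function names are assumptions made for this example; any other linearly independent set could be substituted.

```python
def dot(u, v):
    """Scalar product (u, v) of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]

def conjugate_directions(A, U):
    """Build mutually conjugate vectors p_i from independent u_i by (11:1), (11:2)."""
    P = []
    for u in U:
        p = list(u)
        for q in P:
            Aq = matvec(A, q)
            coeff = dot(u, Aq) / dot(q, Aq)              # a_ij of (11:2)
            p = [pi - coeff * qi for pi, qi in zip(p, q)]
        P.append(p)
    return P

def solve_cd(A, k, P):
    """Recursion (11:5), (11:6), starting from x_0 = 0."""
    x = [0.0] * len(k)
    for p in P:
        a = dot(p, k) / dot(p, matvec(A, p))             # a_i of (11:6)
        x = [xi + a * pi for xi, pi in zip(x, p)]
    return x

# Hypothetical symmetric positive definite system with solution h = (1, 2, 3).
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
k = [0.0, 0.0, 4.0]
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # unit vectors as u_i
P = conjugate_directions(A, U)
x = solve_cd(A, k, P)
```

Because the directions depend only on $A$ and the chosen $u_i$, the list `P` can be reused for several right-hand sides $k$.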
Moreover, …

Next multiply the second row of the preceding matrix by $a_{i2}^{(1)}/a_{22}^{(1)}$ and subtract the result from the $i$th row $(i = 3, \dots, n)$. We obtain a matrix whose first two columns vanish below the diagonal and whose first two rows contain the vectors $p_1$ and $p_2$. The vector $(0, 0, k_3^{(2)}, \dots, k_n^{(2)})$ is the residual of $x_2$. The elements $u_{i1}^{(2)}, \dots, u_{in}^{(2)}$ form the vector $u_i^{(2)}$, and $p_3 = u_3^{(2)}$. We have

$$a_{ij}^{(2)} = (Au_i^{(2)}, e_j).$$

Proceeding in this manner, we finally obtain a matrix of the form (12:4), whose $i$th row consists of the elements $0, \dots, 0, a_{ii}^{(i-1)}, \dots, a_{in}^{(i-1)}$, followed by $p_{i1}, \dots, p_{in}$ and $k_i^{(i-1)}$.

The elements $p_{i1}, \dots, p_{in}$ define a vector $p_i$. These are the mutually conjugate vectors $p_1, \dots, p_n$ defined by the iteration (12:1). At each stage … Moreover the estimate $x_i$ of the solution $h$ is given by the formula …

The vector $(0, \dots, 0, a_{ii}^{(i-1)}, \dots, a_{in}^{(i-1)})$ defined by the first $n$ elements in the $i$th row of (12:4) is the vector $Ap_i$. If we denote by $P$ the matrix whose column vectors are $p_1, p_2, \dots, p_n$, then the matrix (12:4) is the matrix

$$\| P^*A \quad P^* \quad P^*k \|.$$

The matrices $P^*A$ and $P$ are triangular matrices with zeros below the diagonal. The matrix $D = P^*AP$ is the diagonal matrix whose diagonal elements are $a_{11}, a_{22}^{(1)}, \dots, a_{nn}^{(n-1)}$. The determinant of $P$ is unity and the determinant of $A$ is the product

$$a_{11}\,a_{22}^{(1)} \cdots a_{nn}^{(n-1)}.$$

As was seen in section 4, if we let $f(x) = (h - x, A(h - x))$, the sequence

$$f(x_0),\ f(x_1),\ \dots,\ f(x_n) = 0$$

decreases monotonically. No general statement can be made for the sequence

$$|h - x_0|,\ |h - x_1|,\ \dots,\ |h - x_n| = 0$$

of lengths of the error vectors. In fact, we shall show that this sequence can increase monotonically, except for the last step. A situation of this type cannot arise when the cg-process is used.

If $A$ is nonsymmetric, the interpretation given above must be modified somewhat. An analysis of the method will show that one finds implicitly two triangular matrices $P$ and $Q$ such that $Q^*AP$ is a diagonal matrix. To carry out this process it may be necessary to interchange rows of $A$. By virtue of the remarks in section 10, the matrix $Q^*$ is of the form $P^*B$. The general procedure is therefore equivalent to application of the above process to the system $BAx = Bk$.

13. An Example

In the cg-method the estimates $x_0, x_1, \dots$ of the solution $h$ of $Ax = k$ have the property that the error vectors $y_0 = h - x_0$, $y_1 = h - x_1$, … are decreased in length at each step. This property is not enjoyed by every cd-method. In fact, we shall construct an example such that, for the estimates $x_0, x_1, \dots$ of the elimination method,

$$|y_0| < |y_1| < \cdots < |y_{n-1}|.$$

If the order of elimination is changed, this property may not be preserved.

The example we shall give is geometrical instead of numerical. Start with an $(n-1)$-dimensional ellipsoid $E_n$ with center $x_n = h$ and with axes of unequal lengths. Draw a chord $C_n$ through $x_n$ which is not orthogonal to an axis of $E_n$. Select a point $x_{n-1} \neq x_n$ on this chord inside $E_n$, and pass a hyperplane $P_{n-1}$ through $x_{n-1}$ conjugate to $C_n$, that is, parallel to the plane determined by the midpoints of the chords of $E_n$ parallel to $C_n$. Let $e_n$ be a unit vector normal to $P_{n-1}$. It is clear that $e_n$ is not parallel to $C_n$. The plane $P_{n-1}$ cuts $E_n$ in an $(n-2)$-dimensional ellipsoid $E_{n-1}$ with center at $x_{n-1}$ and with axes of unequal lengths.

Next draw a chord $C_{n-1}$ of $E_{n-1}$ through $x_{n-1}$ which is not orthogonal to an axis of $E_{n-1}$ and which is not perpendicular to $h - x_{n-1}$. One can then select a point $x_{n-2}$ on $C_{n-1}$ which is nearer to $h$ than $x_{n-1}$. Let $P_{n-2}$ be the hyperplane through $x_{n-2}$ conjugate to $C_{n-1}$. It intersects $E_{n-1}$ in an $(n-3)$-dimensional ellipsoid $E_{n-2}$ with center at $x_{n-2}$. The axes of $E_{n-2}$ can be shown to be of unequal lengths. Let $e_{n-1}$ be a unit vector in $P_{n-1}$ perpendicular to $P_{n-2}$.

We now repeat the construction made in the last paragraph. Select a chord $C_{n-2}$ of $E_{n-2}$ through $x_{n-2}$ that is not orthogonal to an axis of $E_{n-2}$ and that is not perpendicular to $h - x_{n-2}$. Select $x_{n-3}$ on $C_{n-2}$ nearer to $h$ than $x_{n-2}$, and let $P_{n-3}$ be a plane through $x_{n-3}$ conjugate to $C_{n-2}$. It cuts $E_{n-2}$ in an $(n-4)$-dimensional ellipsoid $E_{n-3}$ with center at $x_{n-3}$ and with axes of unequal lengths. Let $e_{n-2}$ be a unit vector in $P_{n-1}$ and $P_{n-2}$ perpendicular to $P_{n-3}$. Clearly, $e_n$, $e_{n-1}$, $e_{n-2}$ are mutually perpendicular.

Proceeding in this manner, we can construct

(1) Chords $C_n, C_{n-1}, \dots, C_1$, which are mutually conjugate.

(2) Planes $P_{n-1}, \dots, P_1$ such that $P_k$ is conjugate to $C_{k+1}, \dots, C_n$.

(3) The intersection of the planes $P_{n-1}, \dots, P_k$, which cuts $E_n$ in a $(k-1)$-dimensional ellipsoid $E_k$ with center $x_k$.

(4) The point $x_k$, which is closer to $h$ than $x_{k+1}$.

(5) The unit vectors $e_n, \dots$ …

14. A Duality Between Orthogonal Polynomials and n-Dimensional Geometry

Let $m(\lambda)$ be a nonnegative and nondecreasing function on the interval $0 \le \lambda \le l$. The (Riemann) Stieltjes integral

$$\int_0^l f(\lambda)\,dm(\lambda)$$

exists for any continuous function $f(\lambda)$ on $0 \le \lambda \le l$. We call $m(\lambda)$ a mass distribution on the positive $\lambda$-axis. The following two cases must be distinguished:

(a) The function $m(\lambda)$ has infinitely many points of increase on $0 \le \lambda \le l$.

(b) There are only a finite number $n$ of points of increase.

In both cases we may construct by orthogonalization of the successive powers $1, \lambda, \lambda^2, \dots, \lambda^n$ a set of $n + 1$ orthogonal polynomials

$$R_0(\lambda),\ R_1(\lambda),\ \dots,\ R_n(\lambda) \tag{14:1}$$

with respect to the mass distribution. One has

$$\int_0^l R_i(\lambda) R_k(\lambda)\,dm(\lambda) = 0 \qquad (i \neq k). \tag{14:2}$$

The polynomial $R_i$ is of degree $i$. In case (b) $R_n(\lambda)$ is a polynomial of degree $n$ having its zeros at the $n$ points of increase of $m(\lambda)$. In both cases the zeros of each of the polynomials (14:1) are real and distinct and located inside the interval $(0, l)$. Hence we may normalize the polynomials so that

$$R_i(0) = 1. \tag{14:3}$$

The polynomials (14:1) are then uniquely determined by the mass distribution.

… the number $l$ being any number greater than …

We want to emphasize the following property of our correspondence. If $A$ and $r_0$ are given, we are able to establish the correspondence without computing eigenvalues of $A$. This follows immediately from the basic relation (14:8). Moreover, we are able to compute integrals of the type

$$\int_0^l R(\lambda) R'(\lambda)\,dm(\lambda),$$

where $R$, $R'$ are polynomials of maximal degree $n - 1$, without constructing the mass distribution. Indeed, the integrals are equal to the corresponding scalar products $(R(A)r_0, R'(A)r_0)$ of the corresponding vectors, by virtue of theorems 14:1 and 14:2. Finally, for the construction of the orthogonal polynomials $R_0(\lambda), R_1(\lambda), \dots, R_n(\lambda)$ it is sufficient to compute integrals of this type. The corresponding vectors build an orthogonal basis of the Euclidean $n$-space.

15.
An Algorithm for Orthogonalization

In order to obtain the orthogonalization of polynomials, the following method can be used. For any three consecutive orthogonal polynomials the recurrence relation (15:1) holds, where $a_i$, $c_i$, $d_i$ are real numbers and $a_i \neq 0$. Taking into account the normalization (14:3), we obtain a simplified form of this relation. From this equation it is seen by induction that the functions $P_i(\lambda)$ defined in (15:3) are polynomials of degree $i$. Introducing the numbers (15:4), we have

$$P_i(\lambda) = R_i(\lambda) + b_{i-1} P_{i-1}(\lambda), \tag{15:5a}$$

$$R_{i+1}(\lambda) = R_i(\lambda) - a_i \lambda P_i(\lambda). \tag{15:5b}$$

Beginning with $R_0 = 1$, we are able to compute by (15:5) successively the polynomials

$$P_0 = R_0 = 1,\ R_1,\ P_1,\ R_2,\ P_2,\ \dots,$$

provided that we know the numbers $a_i$, $b_i$. In order to compute them, observe first the relation

$$\int_0^l P_i(\lambda)\,\lambda\,P_k(\lambda)\,dm(\lambda) = 0 \qquad (i \neq k). \tag{15:6}$$

Indeed, this integral is, up to a constant factor, … For $k$ …

18. Continued Fractions

Suppose that we have given a mass distribution of type (b) as described in section 14. The function $m(\lambda)$ is a step function with jumps at … we observe that the ratio of consecutive polynomials is a decreasing function of $\lambda$, as can be seen from (18:3) by induction. Using this result, it is not too difficult to show that the polynomials $R_0(\lambda), R_1(\lambda), \dots, R_n(\lambda)$ build a Sturmian sequence in the following sense. The number of zeros of $R_n(\lambda)$ in any interval a
