
Calculating the Singular Values and Pseudo-Inverse of a Matrix

Author(s): G. Golub and W. Kahan


Source: Journal of the Society for Industrial and Applied Mathematics: Series B, Numerical
Analysis, Vol. 2, No. 2 (1965), pp. 205-224
Published by: Society for Industrial and Applied Mathematics
Stable URL: http://www.jstor.org/stable/2949777


CALCULATING THE SINGULAR VALUES AND PSEUDO-INVERSE OF A MATRIX*

G. GOLUB† AND W. KAHAN‡


Abstract. A numerically stable and fairly fast scheme is described to compute the unitary matrices U and V which transform a given matrix A into diagonal form Σ = U*AV, thus exhibiting A's singular values on Σ's diagonal. The scheme first transforms A to a bidiagonal matrix J, then diagonalizes J. The scheme described here is complicated but does not suffer from the computational difficulties which occasionally afflict some previously known methods. Some applications are mentioned, in particular the use of the pseudo-inverse A† = VΣ†U* to solve least squares problems in a way which dampens spurious oscillation and cancellation.

1. Introduction. This paper is concerned with a numerically stable and fairly fast method for obtaining the following decomposition of a given rectangular matrix A:

(1.1)   A = UΣV*,

where U and V are unitary matrices and Σ is a rectangular diagonal matrix of the same size as A with nonnegative real diagonal elements. These diagonal elements are called the singular values of A; they are the nonnegative square roots of the eigenvalues of A*A.

Some applications of the decomposition (1.1) are mentioned in this paper. In particular, the pseudo-inverse of A can be exhibited in the form

(1.2)   A† = VΣ†U*,

where Σ† is obtained from Σ by replacing each positive diagonal element by its reciprocal. The properties and applications of the pseudo-inverse have been described by Greville [15], Penrose [25], [26], and Ben-Israel and Charnes [3]. The pseudo-inverse's main value, both conceptually and practically, is that it provides a solution for the following least-squares problem:

Of all the vectors x which minimize the sum of squares ‖b − Ax‖², which is the shortest (has the smallest ‖x‖² = x*x)?

The solution is x = A†b.

* Received by the editors July 14, 1964. This paper was presented at the Symposium on Matrix Computations at Gatlinburg, Tennessee.
† Computation Center, Stanford University, Stanford, California. The work of the first author was in part supported by a contract at Stanford University and by The Boeing Scientific Research Laboratories.
‡ University of Toronto, Toronto, Ontario. The second author wishes to thank the National Research Council (of Canada) for their support of this work at the Institute of Computer Sciences, University of Toronto.


If there were only one vector x which minimized ‖b − Ax‖ we would save a bit of work by using

A† = (A*A)⁻¹A*

instead of (1.2), and this is what we often try to do. But if A*A is singular then there will be infinitely many vectors x which minimize ‖b − Ax‖ and the last formula will have to be modified in a way which takes A's rank into account (cf. [4], [6], [7]). The methods described in this paper simplify the problem of assigning a rank to A.

In the past the conventional way to determine the rank of A was to convert A to a row-echelon form, e.g.,

$$\begin{pmatrix} x & x & x & x & x\\ 0 & x & x & x & x\\ 0 & 0 & x & x & x\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \qquad (\text{rank} = 3),$$

in which x's represent nonzero elements and 0's represent zeros. The transformation was accomplished by premultiplying A by a suitable sequence of elementary matrices (cf. [5]) or of unitary matrices (cf. [12], [17]) chosen to liquidate the subdiagonal elements of each column in turn. In order to obtain a simple picture like the one above it would have been necessary to perform column-interchanges to ensure that the largest elements were being left on the diagonal (cf. "complete pivoting" as described by Wilkinson [33]). It is certainly possible to arrange that in the row-echelon form of A each row will have its largest element on the diagonal. Consequently the rank of A is just the number r of consecutive nonzero elements on the diagonal of its row-echelon form; all rows after the rth vanish, and A, correspondingly, should have just r nonzero singular values. But in floating-point calculations it may not be so easy to decide whether some number is effectively zero or not. Rather, one will try to estimate A's rank r by observing whether all rows after the rth are negligible in comparison to the first r, with the expectation that the same will be true of the singular values. Even this criterion is hard to apply, as the following example shows:
$$\begin{pmatrix}
1 & -1 & -1 & \cdots & -1\\
 & 1 & -1 & \cdots & -1\\
 & & 1 & \ddots & \vdots\\
 & & & \ddots & -1\\
 & & & & 1
\end{pmatrix}.$$

If this matrix, already in row-echelon form, has a sufficiently large number


of rows and columns, then, although it may not appear to the naked eye to be deficient in rank, it is violently ill-conditioned (it has a very tiny singular value), as can be seen by applying the matrix to the column vector whose elements are, in turn,

1, 2^{-1}, 2^{-2}, 2^{-3}, · · · , 2^{-n}, · · · .

On the other hand, when all the −1's in the matrix are replaced by +1's then the resulting matrix is quite docile. Therefore, it would be very hard to tell, by looking at only the diagonal elements of the row-echelon form, whether or not the original matrix A had a singular value sufficiently small to be deleted during the calculation of A†. In other words, without looking explicitly at the singular values there seems to be no satisfactory way to assign a rank to A.
The singular values of a matrix A are the nonnegative square roots of the eigenvalues of A*A or AA*, whichever has fewer rows and columns (see [1]). But the calculation of A*A using ordinary floating point arithmetic does serious violence to the smaller singular values as well as to the corresponding eigenvectors which appear in U and V in (1.1). A discussion of these points can be found in a paper by Osborne [24], which also contains a nice proof of the existence of the decomposition (1.1). Since the columns of U are the eigenvectors of AA* and the columns of V are the eigenvectors of A*A, there is some possibility that a simple calculation of the decomposition (1.1) could be accomplished by using double-precision arithmetic to deal with A*A and AA* directly in some way. Such a scheme would be convenient with a machine like the IBM 7094 which has double-precision hardware. But for most other machines, and especially when a programming language deficient in double-precision facilities is used, the complicated scheme described in this paper seems to be the best we have.

Kogbetliantz [18], Hestenes [16], and Forsythe and Henrici [9] have proposed rotational or Jacobi-type methods for obtaining the decomposition (1.1). Kublanovskaja [19] has suggested a QR-type method. These methods are accurate but are slow in terms of the total number of operations. Our scheme is based upon an idea exploited by Lanczos [20]; the matrix

$$\tilde A = \begin{pmatrix} 0 & A\\ A^* & 0 \end{pmatrix}$$

has for its eigenvalues the singular values of A, each appearing with both a positive and a negative sign. The representation Ã could not be treated directly by a standard eigenvalue-vector program without dealing with the problems which we shall discuss in detail in what follows.
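The observation is easy to verify numerically; the following sketch (numpy, with an arbitrary 5 × 3 example matrix) is only an illustration, not part of the algorithm proper:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                   # arbitrary m x n example, m = 5, n = 3

# Augmented matrix [[0, A], [A^*, 0]]: its eigenvalues are +sigma_i and -sigma_i,
# together with m - n extra zeros.
Atilde = np.block([[np.zeros((5, 5)), A],
                   [A.T, np.zeros((3, 3))]])

eigs = np.sort(np.linalg.eigvalsh(Atilde))[::-1]  # descending
svals = np.linalg.svd(A, compute_uv=False)
print(eigs[:3])                                   # the three positive eigenvalues ...
print(svals)                                      # ... agree with the singular values of A
```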

2. A matrix decomposition. In order to facilitate the computation of the


singular values and the pseudo-inverse of the complex m X n matrix A, we


describe a convenient matrix decomposition. We assume throughout the following discussion that m ≥ n without any loss of generality.

THEOREM 1. Let A be any m × n matrix with complex elements. Then A can be decomposed as

A = PJQ*,

where P and Q are unitary matrices and J is an m × n bidiagonal matrix of the form

$$J = \begin{pmatrix}
\alpha_1 & \beta_1 & & & \\
 & \alpha_2 & \beta_2 & & \\
 & & \ddots & \ddots & \\
 & & & \alpha_{n-1} & \beta_{n-1}\\
 & & & & \alpha_n
\end{pmatrix}$$

bordered below by an (m − n) × n block of zeros.

Proof. The proof will be a constructive one in which Householder transformations (see [17], [21], [32]) are used. Let A = A^(1) and let A^(3/2), A^(2), · · · , A^(n), A^(n+1/2) be defined as follows:

A^(k+1/2) = P^(k) A^(k),   k = 1, 2, · · · , n,
A^(k+1) = A^(k+1/2) Q^(k),   k = 1, 2, · · · , n − 1.

P^(k) and Q^(k) are hermitian, unitary matrices of the form

P^(k) = I − 2 x^(k) x^(k)*,   with x^(k)* x^(k) = 1,
Q^(k) = I − 2 y^(k) y^(k)*,   with y^(k)* y^(k) = 1.

The unitary transformation P^(k) is determined so that

a^(k+1/2)_{i,k} = 0,   i = k + 1, · · · , m,

and Q^(k) is determined so that

a^(k+1)_{k,j} = 0,   j = k + 2, · · · , n,

and A^(k+1) has the following form: its first k rows and columns already carry the bidiagonal elements α_1, β_1, α_2, β_2, · · · , α_k, β_k, and all of its other nonzero elements lie in the trailing (m − k) × (n − k) block, which remains to be reduced.

We illustrate the derivation of the formula for P^(k). In order not to disturb those elements which have already been annihilated we set

x^(k)_i = 0,   i = 1, 2, · · · , k − 1.

Since P^(k) is a unitary transformation, length is preserved and consequently

(2.1)   |α_k|² = Σ_{i=k}^{m} |a^(k)_{i,k}|².

Also, since P^(k) is hermitian,

P^(k) A^(k+1/2) = A^(k),

so that

(1 − 2 |x^(k)_k|²) α_k = a^(k)_{k,k},
−2 x^(k)_i x̄^(k)_k α_k = a^(k)_{i,k},   i = k + 1, · · · , m,

and hence

(2.2)   |x^(k)_k|² = ½ (1 − a^(k)_{k,k}/α_k),

(2.3)   x^(k)_i = −a^(k)_{i,k} / (2 α_k x̄^(k)_k),   i = k + 1, · · · , m.

Equations (2.1), (2.2), and (2.3) define two possible vectors x^(k) to within scalar factors of modulus one. In the interest of numerical stability, let us choose sgn α_k so that x^(k)_k is as large as possible. Thus

α_k = −(a^(k)_{k,k}/|a^(k)_{k,k}|) (Σ_{i=k}^{m} |a^(k)_{i,k}|²)^{1/2}.

Summarizing, we have

A^(k+1/2) = A^(k) − x^(k) · 2 (x^(k)* A^(k)),

with

s_k = (Σ_{i=k}^{m} |a^(k)_{i,k}|²)^{1/2},

x^(k)_i = 0 for i < k,

x^(k)_k = [½ (1 + |a^(k)_{k,k}|/s_k)]^{1/2},

c_k = ā^(k)_{k,k} / (2 s_k |a^(k)_{k,k}| x^(k)_k),

and

x^(k)_i = c_k a^(k)_{i,k} for i > k.

(If s_k = 0, just set α_k = 0 and x^(k) = 0.) Similarly,

A^(k+1) = A^(k+1/2) − 2 (A^(k+1/2) y^(k)) y^(k)*,

with

t_k = (Σ_{j=k+1}^{n} |a^(k+1/2)_{k,j}|²)^{1/2}   (say),

β_k = −(a^(k+1/2)_{k,k+1}/|a^(k+1/2)_{k,k+1}|) t_k,

y^(k)_j = 0 for j ≤ k,

y^(k)_{k+1} = [½ (1 + |a^(k+1/2)_{k,k+1}|/t_k)]^{1/2},

d_k = a^(k+1/2)_{k,k+1} / (2 t_k |a^(k+1/2)_{k,k+1}| y^(k)_{k+1}),

and

y^(k)_j = d_k ā^(k+1/2)_{k,j} for j > k + 1.

(If t_k = 0, set β_k = 0 and y^(k) = 0.)


An alternative approach to bidiagonalizing A is to generate the columns
of P and Q sequentially as is done by the Lanczos algorithm for tridiagonalizing a symmetric matrix. The equations

AQ = PJ and P*A = JQ*


can be expanded in terms of the columns p_i of P and q_i of Q to yield

Aq_1 = α_1 p_1,
Aq_i = β_{i−1} p_{i−1} + α_i p_i,
p*_{i−1} A = α_{i−1} q*_{i−1} + β_{i−1} q*_i,      i = 2, 3, · · · , n,
p*_n A = α_n q*_n.

These lead to the following algorithm.

(2.4)  Choose q_1 arbitrarily with ‖q_1‖ = 1; then set w_1 = Aq_1, α_1 = ‖w_1‖, p_1 = (α_1)^{-1} w_1. Set z_i = p_i* A − α_i q_i*, β_i = ‖z_i‖, q*_{i+1} = (β_i)^{-1} z_i for i = 1, 2, · · · , n − 1; set w_i = Aq_i − β_{i−1} p_{i−1}, α_i = ‖w_i‖, p_i = (α_i)^{-1} w_i for i = 2, · · · , n.
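A direct transcription of (2.4) might read as follows (numpy sketch, assuming no α_i or β_i vanishes; the reorthogonalization discussed below is omitted):

```python
import numpy as np

def lanczos_bidiag(A, q1):
    """Recurrence (2.4): columns of P and Q and the bidiagonal elements
    alpha, beta, assuming no alpha_i or beta_i vanishes along the way."""
    m, n = A.shape
    P, Q = np.zeros((m, n)), np.zeros((n, n))
    alpha, beta = np.zeros(n), np.zeros(n - 1)

    Q[:, 0] = q1 / np.linalg.norm(q1)
    w = A @ Q[:, 0]
    alpha[0] = np.linalg.norm(w)
    P[:, 0] = w / alpha[0]
    for i in range(n - 1):
        z = A.T @ P[:, i] - alpha[i] * Q[:, i]
        beta[i] = np.linalg.norm(z)
        Q[:, i + 1] = z / beta[i]
        w = A @ Q[:, i + 1] - beta[i] * P[:, i]
        alpha[i + 1] = np.linalg.norm(w)
        P[:, i + 1] = w / alpha[i + 1]
    return P, Q, alpha, beta

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))
P, Q, alpha, beta = lanczos_bidiag(A, rng.standard_normal(4))
J = np.diag(alpha) + np.diag(beta, 1)      # the n x n bidiagonal part of J
print(np.allclose(P @ J @ Q.T, A))         # True, up to rounding
```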


Of course if α_k (β_k) equals zero, one must choose a new p_k (q_{k+1}) which is orthogonal to the previously computed p_i's (q_i's). It can be verified by an inductive proof that the p_i's and q_i's generated by (2.4) are the first n columns of the desired unitary matrices P and Q.

Unless an α_k or β_k vanishes, the vector q_1 will completely determine the rest of the vectors p_i and q_i. Consequently q_1 could be chosen so that the Lanczos-type algorithm would be mathematically identical to the Householder-type algorithm except for a diagonal unitary similarity transformation. But the Lanczos-type algorithm is unstable in the presence of rounding error unless reorthogonalization along the lines suggested by Wilkinson [30] is used. That is, one must restore the orthogonality of the computed vectors by using the Gram-Schmidt method to reorthogonalize each newly generated vector p_i or q_i to the previously generated p's or q's, respectively. With the extra work involved in this reorthogonalization the Lanczos-type algorithm is noticeably slower than the corresponding Householder algorithm except possibly if A is a sparse matrix.

3. Computation of the singular values. The singular values of A and of J are the same; they are the positive square roots of the eigenvalues of J*J and, arranged in order,

σ_1 ≥ σ_2 ≥ · · · ≥ σ_n ≥ 0.

These are the numbers which appear on the diagonal of the matrix Σ which was introduced in (1.1), i.e.,

$$\Sigma = \begin{pmatrix}
\sigma_1 & & & \\
 & \sigma_2 & & \\
 & & \ddots & \\
 & & & \sigma_n
\end{pmatrix}$$

bordered below by an (m − n) × n block of zeros.

Analogous to (1.1) is the decomposition

(3.1)   J = XΣY*,

in which X and Y are unitary matrices which, when calculated, will lead via Theorem 1, A = PJQ*, to the decomposition (1.1), namely,

A = UΣV*,   with U = PX, V = QY.


Evidently the last m − n rows of zeros in J do not affect J's singular values, nor do they have more than a trivial effect on the determination of X and Y. Therefore it is convenient to delete J's last m − n rows and write

$$J = \begin{pmatrix}
\alpha_1 & \beta_1 & & & \\
 & \alpha_2 & \beta_2 & & \\
 & & \alpha_3 & \ddots & \\
 & & & \ddots & \beta_{n-1}\\
 & & & & \alpha_n
\end{pmatrix}$$

without introducing any new notation to distinguish this n × n matrix J from the m × n matrix J. This can be done because the previous equations remain valid after the following process of "abbreviation":

(i) delete the last m − n rows of zeros in J and Σ;
(ii) delete the last m − n columns of P and U;
(iii) delete the last m − n rows and columns of X; these coincide with the last rows and columns of an m × m unit matrix.

In this section and the next we deal only with the abbreviated matrices.


The singular values σ_i of J are known (cf. [20]) to be related to the eigenvalues of the 2n × 2n matrix

$$\tilde J = \begin{pmatrix} 0 & J\\ J^* & 0 \end{pmatrix}$$

whose eigenvalues are just +σ_i and −σ_i for i = 1, 2, · · · , n. The calculation of the eigenvalues of J̃ is simplified conceptually by a transformation to tridiagonal form via a permutation similarity which will be exhibited now.

Consider the matrix equation

(3.2)   $\begin{pmatrix} 0 & J\\ J^* & 0 \end{pmatrix} \begin{pmatrix} x\\ y \end{pmatrix} = \sigma \begin{pmatrix} x\\ y \end{pmatrix}$,

which, when expanded, takes the form

Jy = σx,   J*x = σy,

that is, α_i y_i + β_i y_{i+1} = σ x_i, α_n y_n = σ x_n, ᾱ_1 x_1 = σ y_1, β̄_{i−1} x_{i−1} + ᾱ_i x_i = σ y_i.
Now the substitution

z_{2i} = x_i,   z_{2i−1} = y_i


leads to the equation

Tz = σz

in which T is a 2n × 2n tridiagonal matrix

$$T = \begin{pmatrix}
0 & \alpha_1 & & & & \\
\bar\alpha_1 & 0 & \beta_1 & & & \\
 & \bar\beta_1 & 0 & \alpha_2 & & \\
 & & \bar\alpha_2 & 0 & \ddots & \\
 & & & \ddots & \ddots & \alpha_n\\
 & & & & \bar\alpha_n & 0
\end{pmatrix}.$$

Clearly there exists a unitary diagonal matrix D such that the similarity transformation

(3.3)   DTD* = S

yields a tridiagonal matrix S whose off-diagonal elements

|α_i| and |β_i|

are all real and nonnegative.
There are a number of methods for obtaining the eigenvalues of a tridiagonal symmetric matrix. One of the most accurate and effective methods

is to use Sturm sequences; an ALGOL program is given by Wilkinson [35].


One can simplify the algorithm, of course, by taking advantage of the fact
that the diagonal elements of T are zero.
Another method of computing the singular values of J is to compute the eigenvalues of

$$J^*J = \begin{pmatrix}
|\alpha_1|^2 & \bar\alpha_1\beta_1 & & & \\
\alpha_1\bar\beta_1 & |\alpha_2|^2 + |\beta_1|^2 & \bar\alpha_2\beta_2 & & \\
 & \alpha_2\bar\beta_2 & |\alpha_3|^2 + |\beta_2|^2 & \ddots & \\
 & & \ddots & \ddots & \bar\alpha_{n-1}\beta_{n-1}\\
 & & & \alpha_{n-1}\bar\beta_{n-1} & |\alpha_n|^2 + |\beta_{n-1}|^2
\end{pmatrix}.$$

Note again that since J*J is a tridiagonal hermitian matrix there exists a diagonal unitary matrix Δ such that

$$\Delta (J^*J) \Delta^* = K = \begin{pmatrix}
s_1^2 & s_1 t_1 & & & \\
s_1 t_1 & s_2^2 + t_1^2 & s_2 t_2 & & \\
 & s_2 t_2 & s_3^2 + t_2^2 & \ddots & \\
 & & \ddots & \ddots & s_{n-1} t_{n-1}\\
 & & & s_{n-1} t_{n-1} & s_n^2 + t_{n-1}^2
\end{pmatrix}$$

where s_i = |α_i| and t_i = |β_i|. Hence K is a real, symmetric, positive semidefinite, tridiagonal matrix and its eigenvalues can be computed by the Sturm sequence algorithm.

Although the smaller eigenvalues of A*A are usually poorly determined, a simple error analysis shows that all the eigenvalues of K are as well-determined as those of T. The reason for this is that the computation of the Sturm sequences is algebraically the same for both T and K. Thus to use K is preferable since the total number of operations in calculating its eigenvalues is certainly less than in computing the eigenvalues of T.
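For illustration, a Sturm-sequence bisection applied to K might be sketched as follows (numpy, real case; the stopping tolerance and the Gershgorin-type upper bound are my own simple choices):

```python
import numpy as np

def count_below(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix with diagonal d
    and off-diagonal e that are less than x (Sturm sequence / pivot count)."""
    count, q = 0, 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -1e-300                      # crude guard against a zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_singular_value(alpha, beta, k, tol=1e-14):
    """k-th largest singular value of the bidiagonal J, by bisection on K = J^T J."""
    s, t = np.abs(alpha), np.abs(beta)
    d = np.concatenate(([s[0] ** 2], s[1:] ** 2 + t ** 2))   # diagonal of K
    e = s[:-1] * t                                           # off-diagonal of K
    n = len(alpha)
    lo, hi = 0.0, np.max(d) + 2.0 * np.max(e)                # K is PSD; Gershgorin bound
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if n - count_below(d, e, mid) >= k:  # at least k eigenvalues >= mid
            lo = mid
        else:
            hi = mid
    return np.sqrt(0.5 * (lo + hi))

alpha = np.array([2.0, 1.0, 0.5, 0.25])
beta = np.array([1.0, 1.0, 1.0])
J = np.diag(alpha) + np.diag(beta, 1)
print(kth_singular_value(alpha, beta, 1))          # largest singular value of J
print(np.linalg.svd(J, compute_uv=False)[0])       # agrees
```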
4. Orthogonal vectors properly paired. We consider now the calculation
of the unitary matrices X and Y which were introduced in (3.1):

J = XΣY*.
As pointed out in §3, J can be transformed into a real matrix by means of
unitary diagonal transformations, and we shall assume henceforth that this

has been done (cf. (3.3)).


To each singular value σ_i corresponds a column x_i of X and y_i of Y satisfying

(4.1)   J y_i = σ_i x_i,   J^T x_i = σ_i y_i.

Since J^T J y_i = σ_i² y_i one could, in principle, calculate y_i as a normalized eigenvector of J^T J corresponding to the eigenvalue σ_i², and x_i could be obtained from the vector J y_i either by dividing it by σ_i or by normalizing it. However, if σ_i is small but not quite negligible, then J y_i may be badly contaminated by the roundoff errors left over after cancellation, and the calculated x_i may well be neither normalized nor orthogonal to the previously calculated x's.

Another way to calculate X and Y might be to obtain the eigenvectors x_i and y_i of J J^T and J^T J independently, and then to order them according to the ordering of the corresponding singular values. But if some of the singular values are too close together then the equations (4.1) are unlikely to be satisfied.


A third way, which seems at first to be free from the drawbacks of the two preceding paragraphs, is to obtain the eigenvectors z_i of the tridiagonal matrix S of (3.3). Then the odd-numbered components of z_i would constitute a vector y_i and the even-numbered components a vector x_i which would satisfy (4.1). But in practice trouble shows up in two ways. First, the facts that (4.1) is very nearly satisfied and that z_i has been normalized so that z_i^T z_i = 2 do not, in practice, though they do in exact theory, ensure that x_i^T x_i = y_i^T y_i = 1. Fortunately, if σ_i is not negligible, one can normalize x_i and y_i separately without introducing much extra error. And if σ_i is negligible one can find x_i and y_i in such a way that they are normalized. The claims in the last two sentences can be justified, but there is no point in doing so because the second source of trouble is more drastic; if the z_i's are not orthogonal then neither will the x_i's be orthogonal, nor will the y_i's. The problem of ensuring that the z_i's are orthogonal is, in the present state of the art of computation, a serious one.

One way to ensure the orthogonality of calculated eigenvectors of a symmetric matrix is to use Jacobi's method [13], but this is slow. Another way is to reorthogonalize the calculated eigenvectors obtained via, say, inverse iteration with a tridiagonal matrix (cf. [30]); but the extra work is substantial and there is no guarantee that the vectors after orthogonalization will still be acceptable as eigenvectors. A third method, and one which seems most promising, involves the use of deflation to "remove" each eigenvector as it is obtained and thereby ensure orthogonality. We shall digress to discuss deflation methods suitable for use with symmetric tridiagonal matrices and then adapt them to our bidiagonal matrix.

In this digression let K be some real symmetric tridiagonal matrix

$$K = \begin{pmatrix}
a_1 & b_1 & & & \\
b_1 & a_2 & b_2 & & \\
 & b_2 & a_3 & \ddots & \\
 & & \ddots & \ddots & b_{n-1}\\
 & & & b_{n-1} & a_n
\end{pmatrix}$$

of which we already know an eigenvalue λ and its eigenvector v. Rutishauser [27] shows how, in principle, to construct an orthogonal Hessenberg matrix H from the vector v so that K_1 = H^T K H will have zero in place of b_{n−1}. After deleting the last row and column of the tridiagonal matrix K_1, another eigenvalue, eigenvector and deflation would be calculated, and so on. The eigenvectors of K would be the columns of an orthogonal matrix obtained by multiplying together all the H's. The orthogonality of the eigenvectors would be guaranteed (to within the limits of acceptable rounding


error) irrespective of the closeness of some eigenvalues to others. But Rutishauser's method needs some modification because, as Wilkinson has shown, the effect of rounding errors in the transformation can destroy K's tridiagonal form if v's first few components are abnormally small.

In Rutishauser's deflation the matrix H can be interpreted as a product of 2 × 2 Jacobi-like rotations applied in succession to the first and second, second and third, third and fourth, · · · , (n − 1)th and nth rows and columns of K. After the first rotation, each rotation is chosen to annihilate a spurious term which was introduced by the previous one. For example, an asterisk in the following figure marks the spurious term, lying just outside the tridiagonal band, which the third rotation must annihilate:

$$\begin{pmatrix}
\times & \times & & & & \\
\times & \times & \times & \ast & & \\
 & \times & \times & \times & & \\
 & \ast & \times & \times & \times & \\
 & & & \times & \times & \times\\
 & & & & \times & \times
\end{pmatrix}$$

The first rotation, which fixes all the subsequent ones, can be determined from the first two elements of K's eigenvector v as suggested by Rutishauser [28, p. 226] or else from the first two elements of K − λI. In effect, the deflation of the tridiagonal matrix K is equivalent to a QR-transformation applied to K − λI in the manner suggested by Ortega and Kaiser [22].


Unfortunately, this method also can be shown to be numerically unsatisfactory whenever v's last few components are abnormally small, because then the element in K_1 which replaces b_{n−1} in K remains too large, in general,

to be ignored. Wilkinson [34, p. 187] hints at another method analogous to


the one he described in [31, pp. 351-353]; we shall outline this method

briefly because we believe it to be an effective compromise between


Rutishauser's two schemes.

Having found an eigenvalue λ of K we calculate the corresponding eigenvector v and normalize it so that its largest component lies between ½ and 2, say. The calculation of v can be accomplished using the inverse iteration described by Wilkinson [30]; but since there is no way to prove that, in general, one of his inverse iterations will suffice to provide an adequately accurate v, we describe the following method whose properties can be established rigorously. We require that λ be chosen to be the algebraically greatest or else the least eigenvalue of K; this is no inconvenience since in the course of K's successive deflations each of its eigenvalues will be at some time the greatest or the smallest of the current matrix on hand. Next we apply

Gaussian elimination to K − λI without pivotal interchanges; there will be no trouble here (cf. [33, pp. 285-286]) provided floating point arithmetic is used and provided λ, if not exactly right, is larger than K's largest eigenvalue or smaller than K's smallest eigenvalue by perhaps a unit or so in its last place. The point here is that each nonzero pivot u_i in the elimination will then be of the same sign as (K − λI)'s diagonal elements. The effect of the elimination process is to express K − λI = LU, where

$$L = \begin{pmatrix}
1 & & & & \\
l_1 & 1 & & & \\
 & l_2 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & l_{n-1} & 1
\end{pmatrix}$$

and

$$U = \begin{pmatrix}
u_1 & b_1 & & & \\
 & u_2 & b_2 & & \\
 & & u_3 & \ddots & \\
 & & & \ddots & b_{n-1}\\
 & & & & u_n
\end{pmatrix}.$$

Here u_1 = a_1 − λ and l_i = b_i/u_i, u_{i+1} = a_{i+1} − λ − l_i b_i for i = 1, 2, · · · , n − 1. Next we attempt the solution of (K − λI)v = r using for r a vector whose elements all have the same magnitude but signs chosen to maximize the elements of v. The choice of sign is accomplished by first solving Ls = r as follows:

s_1 = +1,
s_{i+1} = (−l_i s_i) + sgn (−l_i s_i),   i = 1, 2, · · · , n − 1.


The solution of Uv ? s for v and the subsequent normalization of v complete the calculation. Provided no two pivots Ui have opposite signs one can
show that the elements of v each have the same signs as the corresponding
elements of the desired eigenvector despite the rounding errors committed
during v's calculation. Furthermore, the elements of r exhibit the same signs
as those of +v or ? v, depending upon the sign of the uSs. Consequently the
cosine of the angle between r and the correct eigenvector is at least iV~1/2 in
magnitude, and finally we conclude that Kv must differ from Xv by no more

than a few units in the last place (cf. the argument in [30]). Now even if v
is contaminated by components of the eigenvectors corresponding to other


eigenvalues pathologically close to λ, it will look enough like the desired eigenvector to permit the deflation process to proceed. This process for calculating v is simpler and a little faster than Wilkinson's.
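The whole eigenvector calculation is short; a sketch (numpy, real symmetric tridiagonal K with diagonal a and off-diagonal b, λ an extreme eigenvalue as required above; the zero-pivot guard is my own addition):

```python
import numpy as np

def extreme_eigenvector(a, b, lam):
    """Eigenvector of tridiag(b, a, b) belonging to an extreme eigenvalue lam,
    via one solve of (K - lam I) v = r with the signs of r chosen on the fly."""
    n = len(a)
    u, l = np.empty(n), np.empty(n - 1)
    u[0] = a[0] - lam                        # elimination: K - lam I = L U
    for i in range(n - 1):
        if u[i] == 0.0:
            u[i] = 1e-300                    # guard against an exactly zero pivot
        l[i] = b[i] / u[i]
        u[i + 1] = a[i + 1] - lam - l[i] * b[i]
    if u[-1] == 0.0:
        u[-1] = 1e-300

    s = np.empty(n)                          # forward substitution L s = r,
    s[0] = 1.0                               # choosing r_i = +-1 to maximize v
    for i in range(n - 1):
        s[i + 1] = -l[i] * s[i] + np.copysign(1.0, -l[i] * s[i])

    v = np.empty(n)                          # back substitution U v = s
    v[-1] = s[-1] / u[-1]
    for i in range(n - 2, -1, -1):
        v[i] = (s[i] - b[i] * v[i + 1]) / u[i]
    return v / np.linalg.norm(v)

a = np.array([2.0, 3.0, 4.0, 3.0, 2.0])
b = np.array([1.0, 1.0, 1.0, 1.0])
K = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
lam = np.linalg.eigvalsh(K)[-1]              # algebraically greatest eigenvalue
v = extreme_eigenvector(a, b, lam)
print(np.linalg.norm(K @ v - lam * v))       # residual of a few units in the last place
```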

Now that we have v we proceed to the deflation along the lines described by Wilkinson. Each 2 × 2 rotation is embedded in an n × n matrix

$$P_j = \begin{pmatrix}
I & & & \\
 & c_j & s_j & \\
 & -s_j & c_j & \\
 & & & I
\end{pmatrix},\qquad j = 1, 2, \cdots, n-1,$$

with c_j for its jth and (j + 1)th diagonal elements, where c_j² + s_j² = 1. Suppose that after j rotations the products

P_j P_{j−1} · · · P_1 (K − λI) P_1^T · · · P_{j−1}^T P_j^T   and   P_j P_{j−1} · · · P_1 v

have the following forms: the transformed matrix is tridiagonal apart from a single spurious element w_j in its jth row (and, by symmetry, its jth column), h_j being a neighboring element of the band, while the transformed vector has zeros in its first j positions and φ_{j+1} for its (j + 1)th element. At the start we can take

h_0 = a_1 − λ,   w_0 = b_1,   φ_1 = v_1.


To continue the deflation we must so determine P_{j+1} that its application will simultaneously annihilate the spurious element w_j in the jth row (and column) of the matrix as well as the vector's (j + 1)th element φ_{j+1}. But in practice the accumulation of rounding errors will prevent the exact annihilation of both elements; instead we shall have to be satisfied with a P_{j+1} which leaves negligible residuals in place of w_j and φ_{j+1}. Wilkinson, having normalized v so that its largest element lies between ½ and 2, would use whichever of the equations

w_j c_{j+1} = h_j s_{j+1},   φ_{j+1} c_{j+1} = −v_{j+2} s_{j+1}

contained the largest coefficient |w_j|, |h_j|, |φ_{j+1}|, or |v_{j+2}| to determine, in conjunction with c_{j+1}² + s_{j+1}² = 1, the values c_{j+1} and s_{j+1}. This method seems to be effective and we believe that it should always work, but since we cannot prove the method's infallibility, our work is incomplete.
Now we can show how to construct a deflation process for the bidiagonal matrix J. The first step is to obtain J's largest singular value σ; σ² is the largest eigenvalue of the tridiagonal matrix J^T J (see §3). The next step requires the corresponding vectors x and y which can be obtained either by solving J^T J y = σ²y for y and setting x = σ^{-1} J y, or by calculating σ's eigenvector z of S in (3.3) and hence obtaining x and y from z's even and odd components respectively. Both methods for getting x and y are numerically stable when performed in floating point. The deflation of J is accomplished by a sequence of 2 × 2 rotations applied in succession to its first and second columns, its first and second rows, its second and third columns, its second and third rows, its third and fourth columns, · · · , its (n − 1)th and nth rows. The ith rotation applied to rows i and i + 1 of J must simultaneously annihilate a spurious subdiagonal element, introduced into row i + 1 by the previous column rotation, and the ith element in the current x-vector. The ith column rotation, except for the first, must annihilate a spurious term introduced by the previous row rotation into the (i + 1)th column just above the first superdiagonal, and simultaneously the transpose of the ith column rotation must liquidate the ith element of the current y-vector. The first column rotation would, when applied to J^T J − σ²I, annihilate the element in its first row and second column. At the end of the deflation process J's element β_{n−1} should have been replaced by zero. Of course, rounding errors
will prevent the rotations from performing their roles exactly upon both the
matrix J and the vectors x and y, but just as in the deflation of a tridiagonal
matrix we are able so to determine the rotations that negligible residuals are

left behind in place of the elements we wished to liquidate.
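The elementary operation used throughout this deflation is a 2 × 2 rotation determined so as to annihilate one chosen element; the following minimal sketch (numpy, my own helper names) shows that primitive on its own, leaving aside the simultaneous near-annihilation of the vector elements described above:

```python
import numpy as np

def rotation_to_zero(f, g):
    """c, s with c^2 + s^2 = 1 such that [[c, s], [-s, c]] sends the pair (f, g) to (r, 0)."""
    if g == 0.0:
        return 1.0, 0.0
    r = np.hypot(f, g)
    return f / r, g / r

# Example: a row rotation that liquidates a spurious subdiagonal element of J.
J = np.array([[2.0, 1.0, 0.0],
              [0.3, 1.5, 0.8],               # 0.3 is the spurious element
              [0.0, 0.0, 0.7]])
c, s = rotation_to_zero(J[0, 0], J[1, 0])
G = np.array([[c, s], [-s, c]])
J[:2, :] = G @ J[:2, :]
print(np.round(J, 12))                       # the element below the (1, 1) entry is now zero
```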


After deflating J we delete its last row and column and repeat the process
until J is deflated to a 1 X 1 matrix or the deflated J becomes negligibly
small. At the end we multiply the rotations in reverse order to construct the


matrices X and Y which put J into the form

J = XΣY*.

(If J was complex, a unitary diagonal transformation should be incorporated here.) Finally the matrices P and Q of Theorem 1 are multiplied to form

U = PX,   V = QY,

to exhibit the decomposition (1.1):

A = UΣV*.

The two matrix multiplications PX and QY take most of the work.

5. Applications. The basic decomposition given by (1.1) has many applications in data analysis and applied mathematics. Suppose the matrix A arises from statistical observation, and we wish to replace A by another matrix Â (say) which has lower rank p and is the best approximation to A in some sense. If we use the Frobenius norm (i.e., ‖A‖² = trace (A*A)), then the problem has been solved [8] as follows.

THEOREM 2. Let A be an m × n matrix of rank r which has complex elements. Let S_p be the set of all m × n matrices of rank p < r. Then, for any B ∈ S_p,

‖A − Â‖ ≤ ‖A − B‖,

where

Â = UΣ̂V*

and Σ̂ is obtained from the Σ of (1.1) by setting to zero all but its p largest singular values σ_i.

Proof. Since A = UΣV* and the Frobenius norm is unitarily invariant,

‖A − B‖ = ‖Σ − U*BV‖.

Let U*BV = C. Then

$$\|\Sigma - C\|^2 = \sum_{i=1}^{n} |\sigma_i - c_{ii}|^2 + \sum_{i \neq j} |c_{ij}|^2 \ge \sum_{i=1}^{n} |\sigma_i - c_{ii}|^2.$$

Now σ_i ≥ σ_{i+1}, and since C must be of rank p it is convenient to choose c_{ii} = σ_i for i = 1, 2, · · · , p and c_{ii} = 0 otherwise. Thus

‖A − Â‖ = (σ_{p+1}² + · · · + σ_r²)^{1/2}.
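In computational terms Theorem 2 reads as follows (a numpy sketch; the example matrix and the rank p are arbitrary):

```python
import numpy as np

def best_rank_p(A, p):
    """Best rank-p approximation of A in the Frobenius norm (Theorem 2)."""
    U, sigma, Vh = np.linalg.svd(A, full_matrices=False)
    sigma_hat = np.where(np.arange(len(sigma)) < p, sigma, 0.0)
    return (U * sigma_hat) @ Vh

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
p = 2
A_hat = best_rank_p(A, p)
sigma = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A_hat, 'fro'))          # equals ...
print(np.sqrt(np.sum(sigma[p:] ** 2)))           # ... (sigma_{p+1}^2 + ... + sigma_r^2)^(1/2)
```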
Finding the vector x of shortest length which minimizes ‖b − Ax‖ is equivalent to finding the vector y of shortest length which minimizes


‖c − Jy‖, where c = P*b and y = Q*x. Here a natural question arises: is there any method which bypasses the complicated scheme in §4 for exhibiting J's singular values explicitly, and instead takes advantage of J's simple bidiagonal form to solve the least squares problem or compute A†? Such a method, if it exists, must retain provision for intentional perturbations designed to delete, in effect, negligible singular values without introducing too large a discrepancy in J or A. Unfortunately, J's simplicity is deceptive; even J's rank is hard to estimate without further calculation. For example, if J's rank r is less than n, then at least n − r of the α_i's, and possibly more, should vanish; but in practice none of the α_i's may vanish even though several may be very small compared with adjacent β_i's; in consequence, a few of J's singular values may be negligible.

Perhaps the recurrence described by Greville [15] can be modified by the introduction of pivoting and then applied to J to calculate A†. Until such a scheme is worked out, the best method we can suggest for solving the least squares problem together with controllable perturbations is the following: Compute explicitly the representation

A = UΣV*,

decide which of the singular values are small enough to ignore, replace the remaining singular values by their reciprocals to obtain Σ†, and use

A† = VΣ†U*

to obtain the least squares solution x = A†b. Once again, to ignore some singular values σ_{r+1}, σ_{r+2}, · · · , σ_n is equivalent to perturbing A by a matrix whose norm is $(\sum_{i=r+1}^{n} \sigma_i^2)^{1/2}$.
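The suggested procedure, in code (numpy sketch; the cutoff tol stands for the judgment of which singular values are "small enough to ignore" and is an assumption of this illustration):

```python
import numpy as np

def pinv_solve(A, b, tol):
    """Shortest least-squares solution x = A^dagger b, ignoring every singular
    value not exceeding tol (a controllable perturbation of A)."""
    U, sigma, Vh = np.linalg.svd(A, full_matrices=False)
    sigma_dagger = np.array([1.0 / s if s > tol else 0.0 for s in sigma])
    return Vh.T @ (sigma_dagger * (U.T @ b))

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))   # 8 x 5, rank 3
b = rng.standard_normal(8)
x = pinv_solve(A, b, tol=1e-10)
# The ignored singular values are at rounding-error level, so the implied
# perturbation of A has negligible norm.
print(np.linalg.norm(A.T @ (A @ x - b)))      # least-squares condition, essentially zero
print(np.linalg.norm(x))                      # length of the shortest such solution
```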
In some scientific calculations it is preferable that a given square matrix A be perturbed as little as possible (just rounding errors), but instead a perturbation δb in the right-hand side b of the equation Ax = b is permissible provided ‖δb‖ does not exceed a given tolerance ε. The substitution

y = V*x,   c = U*b,   δc = U*δb,

transforms the perturbed equation Ax = b + δb into an equivalent diagonal system

Σy = c + δc,

in which the permissible perturbation δc still satisfies

(5.1)   ‖δc‖ ≤ ε.
Subject to this constraint, δc may be chosen in whatever way best serves our purposes. For example, suppose we wish to minimize ‖y‖; then δc


should satisfy Σ²δc + λ(c + δc) = 0 with some nonnegative Lagrange multiplier λ sufficiently small that (5.1) is satisfied. For most practical purposes it is sufficient to estimate λ to within a factor of two so that δc = −λ(Σ² + λI)⁻¹c satisfies δc*δc ≤ ε². The use of such a technique in effect suppresses the violent oscillation and cancellation which would otherwise detract from the usefulness of the solution x.
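A sketch of this device (numpy; fixing λ to within a factor of two by doubling and halving is my own simple choice, and the example assumes ‖c‖ > ε so that the constraint can become active):

```python
import numpy as np

def damped_rhs(sigma, c, eps):
    """Perturbation delta_c = -lam (Sigma^2 + lam I)^{-1} c with the multiplier lam
    chosen, to within a factor of two, as large as allowed by ||delta_c|| <= eps."""
    def delta(lam):
        return -lam * c / (sigma ** 2 + lam)

    lam = 1.0
    while np.linalg.norm(delta(2.0 * lam)) <= eps:   # grow while the constraint is slack
        lam *= 2.0
    while np.linalg.norm(delta(lam)) > eps:          # shrink until it is satisfied
        lam *= 0.5
    return lam, delta(lam)

sigma = np.array([3.0, 1.0, 1e-4])        # one singular value is tiny
c = np.array([1.0, 1.0, 0.005])           # its coefficient in c = U^T b is at noise level
lam, dc = damped_rhs(sigma, c, eps=0.01)
y = (c + dc) / sigma                      # solve Sigma y = c + delta_c
print(np.linalg.norm(dc))                 # <= 0.01
print(y)                                  # last component ~ 0, instead of 0.005/1e-4 = 50
```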

A similar technique is valuable for the solution of those systems of linear equations which approximate integral equations of the first kind,

∫ A(i, j) x(j) dj = b(i).

Here the numerical treatment of the integral equation, and especially of its small singular values, is similar to the theoretical treatment given by Smithies [29], and the use of the decomposition A = UΣV* aids in the suppression of spurious oscillations in the function x.

We close with a warning; diagonal transformations can change A's singular values and A† in a nontrivial way. Therefore some sort of equilibration may be necessary to allow each row and column of A to communicate its proper significance to the calculation. Two useful forms of equilibration are:

(i) scale each row and column of A in such a way that all the rows have roughly the same norm and so have all the columns;
(ii) scale each row and column of A in such a way that the uncertainty in each element of A does not vary much from element to element.

On least squares problems such equilibration is accomplished by weighting each residual in the sum of squares (see [2], [10], [11], and [23] for equilibration algorithms, and [14]).
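A minimal sketch of equilibration in the sense of (i) (numpy; the alternating row/column scaling and the sweep count are my own simple choices, not a prescription from the text):

```python
import numpy as np

def equilibrate(A, sweeps=10):
    """Scale rows and columns of A toward equal Euclidean norms.
    Returns diagonal scalings D_r, D_c and B = D_r A D_c."""
    B = np.asarray(A, dtype=float).copy()
    dr, dc = np.ones(A.shape[0]), np.ones(A.shape[1])
    for _ in range(sweeps):
        r = np.linalg.norm(B, axis=1)       # row norms
        dr /= r
        B /= r[:, None]
        c = np.linalg.norm(B, axis=0)       # column norms
        dc /= c
        B /= c[None, :]
    return np.diag(dr), np.diag(dc), B

A = np.array([[1e6, 2.0],
              [3.0, 4e-6]])
Dr, Dc, B = equilibrate(A)
print(np.linalg.norm(B, axis=1), np.linalg.norm(B, axis=0))   # roughly equal norms
print(np.linalg.svd(A, compute_uv=False))   # the singular values of A and of
print(np.linalg.svd(B, compute_uv=False))   # B = D_r A D_c differ nontrivially
```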
REFERENCES

[1] A. R. Amir-Moez and A. L. Fass, Elements of Linear Spaces, Pergamon, New York, 1962, Chap. 12.
[2] F. L. Bauer, Optimally scaled matrices, Numer. Math., 5 (1963).
[3] A. Ben-Israel and A. Charnes, Contributions to the theory of generalized inverses, J. Soc. Indust. Appl. Math., 11 (1963), pp. 667-699.
[4] A. Ben-Israel and S. J. Wersan, An elimination method for computing the generalized inverse of an arbitrary complex matrix, J. Assoc. Comput. Mach., 10 (1963), pp. 532-537.
[5] G. Birkhoff and S. MacLane, A Survey of Modern Algebra, Macmillan, New York, 1953, Chap. VII, §6.
[6] J. W. Blattner, Bordered matrices, J. Soc. Indust. Appl. Math., 10 (1962), pp. 528-536.
[7] J. C. G. Boot, The computation of the generalized inverse of singular or rectangular matrices, Amer. Math. Monthly, 70 (1963), pp. 302-303.

[8] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218.
[9] G. E. Forsythe and P. Henrici, The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. Amer. Math. Soc., 94 (1960), pp. 1-23.

[10] G. E. Forsythe and E. G. Straus, On best conditioned matrices, Proc. Amer.


Math. Soc, 6 (1955), pp. 340-345.
[11] D. R. Fulkerson and P. Wolfe, An algorithm for scaling matrices, SIAM Rev.,
4 (1962), pp. 142-146.
[12] W. Givens, Numerical computation of the characteristic values of a real symmetric

matrix. Oak Ridge National Laboratory, Report No. 1574, 1954.


[13] H. H. Goldstine, F. J. Murray, and J. von Neumann, The Jacobi method for
real symmetric matrices, J. Assoc. Comput. Mach., 6 (1959), pp. 59-96.
[14] G. H. Golub, Comparison of the variance of minimum variance and weighted least
squares regression coefficients, Ann. Math. Statist., 34 (1963), pp. 984-991.
[15] T. N. E. Greville, Some applications of the pseudo-inverse of a matrix, SIAM
Rev., 2 (1960), pp. 15-22.
[16] M. R. Hestenes, Inversion of matrices by biorthogonalization and related results,
J. Soc. Indust. Appl. Math., 6 (1958), pp. 51-90.
[17] A. S. Householder, Unitary triangularization of a nonsymmetric matrix, J.
Assoc. Comput. Mach., 5 (1958), pp. 339-342.
[18] E. G. Kogbetliantz, Solution of linear equations by diagonalization of coefficients
matrix, Quart. Appl. Math., 13 (1955), pp. 123-132.
[19] V. N. Kublanovskaja, Some algorithms for the solution of the complete problem of eigenvalues, Ž. Vyčisl. Mat. i Mat. Fiz., 1 (1961), pp. 555-570.
[20] C. Lanczos, Linear Differential Operators, Van Nostrand, London, 1961, Chap. 3.
[21] D. D. Morrison, Remarks on the unitary triangularization of a nonsymmetric
matrix, J. Assoc. Comput. Mach., 7 (1960), pp. 185-186.
[22] J. M. Ortega and H. F. Kaiser, The LL^T and QR methods for symmetric tridiagonal matrices, Comput. J., 6 (1963), pp. 99-101.

[23] E. E. Osborne, On pre-conditioning of matrices, J. Assoc. Comput. Mach., 7


(1960), pp. 338-345.
[24] -, On least squares solutions of linear equations, Ibid., 8 (1961), pp. 628-636.
[25] R. Penrose, A generalized inverse for matrices, Proc. Cambridge Philos. Soc, 51
(1955), pp. 406-413.
[26]-, On best approximate solutions of linear matrix equations, Ibid., 52 (1956),
pp. 17-19.
[27] H. Rutishauser, Deflation bei Bandmatrizen, Z. Angew. Math. Phys., 10 (1959),
pp. 314-319.

[28] -, On Jacobi rotation patterns, Proc. Symp. Appl. Math. XV, Experimental
Arithmetic, High Speed Computing, and Mathematics, Amer. Math. Soc,
1963, pp. 219-240.
[29] F. Smithies, Integral Equations, Cambridge University Press, Cambridge, 1958,
Chap. VIII.
[30] J. H. Wilkinson, The calculation of the eigenvectors of codiagonal matrices,
Comput. J., 1 (1958), pp. 148-152.
[31] -, Stability of the reduction of a matrix to almost triangular and triangular
forms by elementary similarity transformations, J. Assoc. Comput. Mach., 6
(1959), pp. 336-359.


[32] -, Householder's method for the solution of the algebraic eigenproblem, Comput. J., 3 (1960), pp. 23-27.
[33] -, Error analysis of direct methods of matrix inversion, J. Assoc. Comput. Mach., 8 (1961), pp. 281-330.
[34] -, Error analysis of eigenvalue techniques based on orthogonal transformations, J. Soc. Indust. Appl. Math., 10 (1962), pp. 162-195.
[35] -, Calculation of the eigenvalues of a symmetric tridiagonal matrix by the method of bisection, Numer. Math., 4 (1962), pp. 362-367.
