
I. REVIEW OF LINEAR ALGEBRA

A. Equivalence

Definition A1. If A and B are two m x n matrices, then A is equivalent to B if we can obtain B from A by a finite sequence of elementary row or elementary column operations.

Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form

$$\begin{bmatrix} I_k & O_{k \times (n-k)} \\ O_{(m-k) \times k} & O_{(m-k) \times (n-k)} \end{bmatrix}$$

Theorem A.2. Two m x n matrices A and B are equivalent if and only if B = PAQ for some nonsingular matrices P and Q.

Theorem A.3. An n x n matrix A is nonsingular if and only if A is equivalent to In.

B. Rank

Definition B1. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

be an m x n matrix. The rows of A, considered as vectors in Rn, span a subspace of Rn called the row space of A. Similarly, the columns of A, considered as vectors in Rm, span a subspace of Rm called the column space of A.

Definition B2. The dimension of the row (column) space of A is called the row (column) rank of A.

Theorem B.1. The row rank and column rank of the matrix A = [aij] are equal.

Since the row and column ranks of a matrix are equal, we shall now merely refer to the rank of a matrix. Note that rank In = n. Theorem A.2 states that A is equivalent to B if and only if there exist nonsingular matrices P and Q such that B = PAQ. If A is equivalent to B, then rank A = rank B, for rank B = rank(PAQ) = rank(PA) = rank A.

We also recall from the previous section that if A is an m x n matrix, then A is equivalent to a matrix

$$C = \begin{bmatrix} I_k & O \\ O & O \end{bmatrix}.$$

Now rank A = rank C = k.
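As a small numerical illustration (not part of the original notes), the rank can be computed by counting singular values above a tolerance, which is exactly the practical definition used later in section II.B. A minimal numpy sketch with arbitrary matrix values:

import numpy as np

# A 3 x 4 matrix whose third row is the sum of the first two,
# so its row space has dimension 2 (illustrative values).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 2.],
              [1., 3., 1., 3.]])

k = np.linalg.matrix_rank(A)   # counts singular values above a tolerance
print(k)                       # 2: A is equivalent to a block matrix diag(I_2, O)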

C. Determinants

The minor |Mij| of the square (n x n) matrix A is the determinant of the (n-1) x (n-1) matrix Mij formed from A by deleting the ith row and jth column. The cofactor of element aij is

$$C_{ij} = (-1)^{i+j} \, |M_{ij}|$$


The Laplace expansion of the determinant is

$$|A| = \sum_{i=1}^{n} a_{ij} C_{ij} \quad \text{for constant } j,$$

or

$$|A| = \sum_{j=1}^{n} a_{ij} C_{ij} \quad \text{for constant } i.$$
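The expansion above translates directly into a recursive procedure. The sketch below is illustrative only (cofactor expansion costs O(n!) work and is never used for large matrices); it expands along the first row and checks the result against a library routine:

import numpy as np

def det_laplace(A):
    """Determinant by cofactor (Laplace) expansion along row 0.
    For illustration on small matrices only."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_0j: delete row 0 and column j.
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        C = (-1) ** j * det_laplace(M)      # cofactor C_0j
        total += A[0, j] * C
    return total

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
print(det_laplace(A), np.linalg.det(A))     # both approximately 8.0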

Theorem C.1. If A is a matrix, then |A| = |AT|.

Theorem C.2. If matrix B results from matrix A by interchanging two rows (columns) of A, then |B| = -|A|.

Theorem C.3. If two rows (columns) of A are equal, then |A| = 0.

Theorem C.4. If a row (column) of A consists entirely of zeros, then |A| = 0.

Theorem C.5. If B is obtained from A by multiplying a row (column) of A by a real number c, then |B| = c|A|.

Theorem C.6. If B = [bij] is obtained from A = [aij] by adding to each element of the rth row (column) c times the corresponding element of the sth row (column), r ≠ s, then |B| = |A|.

Theorem C.7. If a matrix A = [aij] is upper (lower) triangular, then |A| = a11 a22 ··· ann; that is, the determinant of a triangular matrix is the product of the elements on the main diagonal.

Theorem C.8. If A is an n x n matrix, then A is nonsingular if and only if |A| ≠ 0.

Theorem C.9. If A and B are n x n matrices, then |AB| = |A|·|B|.

D. Special Matrices

(1) Adjoint of A; Adj(A) = [Cji]
(2) Inverse of A; A-1 = Adj(A)/|A|
(3) Symmetric matrix; A = AT
(4) Skew-symmetric matrix; A = -AT for square, real A
(5) Associate matrix of A; (A*)T, the conjugate transpose
(6) Hermitian matrix; A equals its associate
(7) Involutory matrix; AA = I
(8) Orthogonal matrix; A-1 = AT

(9) Toeplitz matrix; a(i,j) = R(i-j)
(10) Autocorrelation matrix; A = E[x·xT] for an n by 1 vector x. A is Toeplitz if x represents evenly spaced samples from a wide-sense stationary random process.
(11) Autocovariance matrix; A = E[(x - E[x])(x - E[x])T] for an n by 1 vector x. A is Toeplitz if x represents evenly spaced samples from a wide-sense stationary random process.
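As a sketch of item (10), the snippet below estimates an autocorrelation sequence from samples of a hypothetical wide-sense stationary process and builds the Toeplitz matrix A(i,j) = R(i-j); the sample size and matrix order are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)          # stand-in for WSS process samples
n = 4                                  # order of the autocorrelation matrix

# Sample autocorrelation estimate R(k) over lags k = 0, ..., n-1
R = np.array([np.mean(x[:len(x) - k] * x[k:]) for k in range(n)])

# Toeplitz autocorrelation matrix: A[i, j] = R(|i - j|)
A = np.array([[R[abs(i - j)] for j in range(n)] for i in range(n)])
print(A)
# Symmetric, constant along diagonals, and (at least) positive semidefinite.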

E. Positive Definite Matrices

An n x n matrix C with the property that xTCx > 0 for any nonzero vector x in Rn is called positive definite. Such a matrix is nonsingular, for if C were singular, then the homogeneous system Cx = 0 would have a nontrivial solution xo. Then xoTCxo = 0, contradicting the requirement that xTCx > 0 for nonzero x. Conversely, if C = [cij] is any n x n symmetric matrix that is positive definite (that is, xTCx > 0 if x is a nonzero vector in Rn), then for α = a1α1 + a2α2 + ··· + anαn and β = b1α1 + b2α2 + ··· + bnαn in an n-dimensional vector space V with ordered basis {α1, ..., αn}, we define

$$(\alpha, \beta) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i \, c_{ij} \, b_j .$$

It is not difficult to show that this defines an inner product on V.

Theorem E.1. A real, symmetric matrix C is positive definite iff there exists a nonsingular matrix A such that C = ATA.

Theorem E.2. ATA is positive definite for nonsingular A, since xT(ATA)x = (Ax)T(Ax) = ‖Ax‖₂² > 0 for every nonzero n by 1 vector x.

Theorem E.3. A matrix C is positive definite (semidefinite) iff its eigenvalues are all positive (nonnegative). Similarly, C is negative definite (semidefinite) iff its eigenvalues are all negative (nonpositive). C is indefinite if it has both positive and negative eigenvalues.

Theorem E.4. A matrix C is positive definite iff every leading principal minor (upper left hand determinant) of C is positive. This is called Sylvester's test.
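Theorems E.2 through E.4 suggest equivalent numerical checks for positive definiteness. A minimal sketch (matrix values are arbitrary) comparing the eigenvalue test with Sylvester's test on a matrix of the form ATA:

import numpy as np

A = np.array([[2., 1.],
              [0., 1.],
              [1., 3.]])
C = A.T @ A                      # A^T A with A of full column rank (Thm. E.2)

# Eigenvalue test (Thm. E.3): all eigenvalues positive?
eigvals = np.linalg.eigvalsh(C)  # eigvalsh is intended for symmetric matrices
print(eigvals, np.all(eigvals > 0))

# Sylvester's test (Thm. E.4): all leading principal minors positive?
minors = [np.linalg.det(C[:k, :k]) for k in range(1, C.shape[0] + 1)]
print(minors, all(m > 0 for m in minors))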

Theorem E.5. The Toeplitz matrix formed from an autocorrelation function is at least positive semidefinite.

Theorem E.6. The non-Toeplitz autocorrelation matrix A = E[x·xT] is also positive semidefinite.

F. Orthogonal Matrices

Definition F1. A square matrix A is called orthogonal if A-1 = AT. Of course, we can also say that A is orthogonal if ATA = In.

Theorem F.1. All the roots of the characteristic polynomial of a real symmetric matrix are real numbers.

Theorem F.2. If A is a symmetric n x n matrix, then the eigenvectors that belong to distinct eigenvalues of A are orthogonal.

Theorem F.3. The n x n matrix A is orthogonal if and only if the columns (and rows) of A form an orthonormal set.

Theorem F.4. If A is a symmetric n x n matrix, then there exists an orthogonal matrix P such that P-1AP = PTAP = D, a diagonal matrix. The eigenvalues of A lie on the main diagonal of D.
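Theorem F.4 is easy to verify numerically; the sketch below (arbitrary symmetric matrix) uses numpy's symmetric eigensolver, whose eigenvector matrix plays the role of P:

import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])           # symmetric, arbitrary values

lam, P = np.linalg.eigh(A)             # eigenvalues and orthonormal eigenvectors
D = np.diag(lam)

print(np.allclose(P.T @ P, np.eye(3)))     # P is orthogonal: P^T P = I
print(np.allclose(P.T @ A @ P, D))         # P^T A P = D (Thm. F.4)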

G. Characteristic Equation, Eigenvalues, And Eigenvectors

The characteristic equation for an n x n matrix A is

$$P(\lambda) = |\lambda I - A| = 0,$$

or

$$P(\lambda) = \lambda^n + a_1\lambda^{n-1} + a_2\lambda^{n-2} + \cdots + a_{n-1}\lambda + a_n = 0,$$

or

$$P(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$

Now

$$P(0) = |-A| = (-1)^n |A| \quad \text{and} \quad P(0) = (-1)^n \lambda_1\lambda_2\cdots\lambda_n,$$

so

$$|A| = \lambda_1\lambda_2\cdots\lambda_n.$$

For each eigenvalue λi there exists an eigenvector xi which is a solution to [λi I - A] xi = 0. Each xi can also be found as any nonzero column of Adj[λi I - A]. The modal matrix is the matrix whose columns are the (scaled) eigenvectors ki xi.

Theorem G.1. The eigenvalues of a symmetric matrix A are real.

Theorem G.2. The eigenvectors corresponding to distinct eigenvalues of a symmetric matrix A form an orthogonal set (same as Thm. F.2).

Theorem G.3. If A has eigenvalues λi, then the matrix A^k has eigenvalues λi^k. This can be proved using Thm. F.4.

Theorem G.4. If the nonsingular matrix A has eigenvalues λi, then the matrix A-1 has eigenvalues 1/λi.

Theorem G.5. If A has eigenvalues λi, then the matrix kI + A has eigenvalues λi + k.

Theorem G.6. Similar matrices have the same eigenvalues.

If P is a nonsingular matrix and A = P·S·P-1, then A and S are similar, since if λ is an eigenvalue of S with S·x = λ·x, then A·(P·x) = P·S·P-1·(P·x) = P·S·x = λ·(P·x), so λ is also an eigenvalue of A, with eigenvector P·x.
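The facts that |A| equals the product of the eigenvalues and that similar matrices share eigenvalues can be checked numerically; a small sketch with randomly generated (hence hypothetical) matrices:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

lam = np.linalg.eigvals(A)
print(np.allclose(np.prod(lam), np.linalg.det(A)))   # |A| = product of eigenvalues

P = rng.standard_normal((4, 4))                      # nonsingular with probability 1
B = P @ A @ np.linalg.inv(P)                         # B is similar to A
# Same eigenvalues (sorted for comparison; assumes well-separated eigenvalues)
print(np.allclose(np.sort_complex(np.linalg.eigvals(B)),
                  np.sort_complex(lam)))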

H. Spectral Decomposition

Expanding on Thm. F.4, any arbitrary complex-valued n by n matrix A with n linearly independent eigenvectors can be decomposed as A = P·S·P-1, where S = diag(λ1, ..., λn), P = [p1, ..., pn], λi is the ith eigenvalue of A, and pi is the corresponding eigenvector. This is proved as follows. By the definition of eigenvectors and eigenvalues, A·pi = λi·pi, so AP = PS, from which we get A = P·S·P-1. For the general case, the eigenvalues can be positive or negative. If A is symmetric, then S is real. If A is positive semidefinite, then the diagonal elements of S (the eigenvalues of A) are nonnegative (from Thm. E.3).
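A minimal numerical sketch of the decomposition A = P·S·P-1 for a small symmetric matrix (values are arbitrary):

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                   # symmetric, so S will be real

lam, P = np.linalg.eig(A)                  # A P = P S
S = np.diag(lam)

print(np.allclose(A, P @ S @ np.linalg.inv(P)))   # A = P S P^-1
print(lam)                                        # eigenvalues 1 and 3, both nonnegative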

II. The Singular Value Decomposition (SVD)

A. Orthogonality And The SVD

The singular value decomposition (the SVD) is a powerful computational tool for analyzing matrices and problems involving matrices which has applications in many fields. In the remaining sections, we shall define the SVD, describe some of its applications, and present an algorithm for computing it. The algorithm is representative of algorithms currently used for various matrix eigenvalue problems and serves as an introduction to computational techniques for these problems as well. Although it is still not widely known, the singular value decomposition has a fairly long history. Much of the fundamental work was done by Gene Golub and his colleagues W. Kahan, Peter Businger, and Christian Reinsch. Our discussion will be based largely on a paper by Golub and Reinsch (1971). The underlying matrix eigenvalue algorithms have been developed by J.G.F. Francis, H. Rutishauser, and J.H. Wilkinson and are presented in Wilkinson's book (1965). Recent books by Lawson and Hanson (1974) and Stewart (1973) discuss the SVD as well as many related topics.

In elementary linear algebra, a set of vectors is defined to be independent if none of them can be expressed as a linear combination of the others. In computational linear algebra, it is very useful to have a quantitative notion of the "amount" of independence. We would like to define a quantity that reflects the fact that, for example, (1, 0, 0), (0, 1, 0), and (0, 0, 1) are very independent, whereas (1.01, 1.00, 1.00), (1.00, 1.01, 1.00), and (1.00, 1.00, 1.01) are almost dependent. Since two vectors are dependent if they are parallel, it is reasonable to regard them as very independent if they are perpendicular, or orthogonal. Using a superscript T to denote the transpose of a vector or matrix, two vectors u and v are orthogonal if their inner product is zero, that is, if uTv = 0. Moreover, a vector u has length 1 if uTu = 1. A square matrix is called orthogonal (see Def. F1) if its columns are mutually orthogonal vectors, each of length 1. Thus a matrix U is orthogonal if UTU = I, the identity matrix.

Note that an orthogonal matrix is automatically nonsingular, since U-1 = UT. In fact, we shall soon make precise the idea that an orthogonal matrix is very nonsingular and that its columns are very independent. The simplest examples of orthogonal matrices are planar rotations of the form

$$U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

If x is a vector in 2-space, then Ux is the same vector rotated through an angle θ. It is useful to associate orthogonal matrices with such rotations, even though in higher dimensions orthogonal matrices can be more complicated. For example,
$$U = \frac{1}{49}\begin{bmatrix} 24 & 41 & 12 \\ 36 & -12 & -31 \\ 23 & -24 & 36 \end{bmatrix}$$
is orthogonal but cannot be interpreted as a simple plane rotation. Multiplication by orthogonal matrices does not change such important geometrical quantities as the length of a vector or the angle between two vectors. Orthogonal matrices also have highly desirable computational properties because they do not magnify errors. For any matrix A and any two orthogonal matrices U and V, consider the matrix Σ defined by Σ = UTAV. If ui and vj denote the columns of U and V, respectively, then the individual components of Σ are σij = uiTAvj. The idea behind the singular value decomposition is that by proper choice of U and V it is possible to make most of the σij zero; in fact, it is possible to make Σ diagonal with nonnegative diagonal entries. Consequently, we make the following definition. A singular value decomposition of an m-by-n real matrix A is any factorization of the form A = UΣVT, where U is an m-by-m orthogonal matrix, V is an n-by-n orthogonal matrix, and Σ is an m-by-n

diagonal matrix with σij = 0 if i ≠ j and σii = σi ≥ 0. The quantities σi are called the singular values of A, and the columns of U and V are called the left and right singular vectors. Note the similarity to Thm. F.4 and section I.H. Readers familiar with matrix eigenvalues should note that the matrices AAT and ATA have the same nonzero eigenvalues and that the singular values of A are the positive square roots of these eigenvalues. Moreover, the left and right singular vectors are particular choices of the eigenvectors of AAT and ATA, respectively. (See theorems E.1, E.2, and E.3.) In the language of abstract linear algebra, the matrix A is the representation of some linear transformation in a particular coordinate system. By making one orthogonal change of coordinates in the domain of this transformation and a second orthogonal change of coordinates in the range, the representation becomes diagonal.
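In practice the factorization just defined is obtained from a library routine; the sketch below (arbitrary 3-by-2 matrix) reconstructs A from U, Σ, and VT and checks the eigenvalue connection mentioned above:

import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])                      # 3-by-2, arbitrary values

U, s, Vt = np.linalg.svd(A)                   # s holds the singular values sigma_i
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)          # embed sigma_i in an m-by-n diagonal matrix

print(np.allclose(A, U @ Sigma @ Vt))         # A = U Sigma V^T
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))
# singular values are the square roots of the eigenvalues of A^T A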

B. Rank And Condition Number

The notion of the rank of a matrix is fundamental to much of linear algebra (see section I.B.). The usual definition is the maximum number of independent columns or, equivalently, the order of the maximal nonzero subdeterminant in the matrix. Using such a definition, it is difficult to actually determine the rank of a general matrix in practice. However, if the matrix is diagonal, it is clear that its rank is the number of nonzero diagonal entries. If a set of independent vectors is multiplied by an orthogonal matrix, the resulting set is still independent. In other words, the rank of a general matrix A is equal to the rank of the diagonal matrix Σ in its SVD. Consequently, a practical definition of the rank of a matrix is the number of nonzero singular values. We shall use the letter k to denote rank. An m-by-n matrix with m ≥ n is said to be of full rank if k = n or rank deficient if k < n. For square matrices, the more common terms nonsingular and singular are often used for full rank and rank deficient, respectively. Since the rank of a matrix must always be an integer, it is necessarily a discontinuous function of the elements of the matrix. Arbitrarily small changes (such as roundoff errors) in a rank-deficient matrix can make all of its singular values nonzero and hence create a matrix which is technically of full rank. In practice, we work with the effective rank, the number of singular values greater than some prescribed tolerance which reflects the accuracy of the data. This is also a discontinuous function, but the discontinuities are much less numerous and troublesome than those of the theoretical rank. The great advantage of the use of the SVD in determining the rank of a matrix is that decisions need be made only about the negligibility of single numbers (the small singular

values) rather than vectors or sets of vectors. We can now precisely define the measure of independence mentioned earlier. The condition number of a matrix A of full rank is cond(A) = σmax/σmin, where σmax and σmin are the largest and smallest singular values of A. If A is rank deficient, then σmin = 0 and cond(A) is said to be infinite. Clearly, cond(A) ≥ 1. If cond(A) is close to 1, then the columns of A are very independent. If cond(A) is large, then the columns of A are nearly dependent. If A is square, then terms like nearly singular or far from singular can be given fairly precise meanings. A matrix A is considered to be more singular than a matrix B if cond(A) > cond(B). If A is orthogonal, then cond(A) = 1, and so the columns of an orthogonal matrix are as independent as possible. Conversely, if cond(A) = 1, then it turns out that A must be a scalar multiple of an orthogonal matrix. Two common norms for the m by n matrix A are
$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}$$

and

$$\|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{\|x\|_2 = 1} \|Ax\|_2,$$

which are respectively the Euclidean matrix norm, or Frobenius norm, and the spectral norm. Using the singular values σi, where σ1 is assumed to be the largest, the two norms are expressed respectively as

$$\|A\|_F = \left( \sum_{i=1}^{n} \sigma_i^2 \right)^{1/2}, \qquad \|A\|_2 = \sigma_1 .$$
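All of these quantities follow directly from the singular values. A short sketch using the "almost dependent" columns from section II.A (the tolerance for the effective rank is an arbitrary choice):

import numpy as np

# Columns are the "almost dependent" vectors mentioned in section II.A.
A = np.array([[1.01, 1.00, 1.00],
              [1.00, 1.01, 1.00],
              [1.00, 1.00, 1.01]])

s = np.linalg.svd(A, compute_uv=False)          # singular values, largest first

frobenius = np.sqrt(np.sum(s**2))               # equals np.linalg.norm(A, 'fro')
spectral  = s[0]                                # equals np.linalg.norm(A, 2)
tol = 1e-8                                      # illustrative tolerance
effective_rank = int(np.sum(s > tol))
cond = s[0] / s[-1]                             # large: columns nearly dependent

print(frobenius, spectral, effective_rank, cond)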


C. Evaluating Determinants and Finding Singular Matrices

Let A(z) be a square matrix which depends in a possibly complicated way on some parameter z. Consider finding a value, or values, of z for which A(z) is singular. Equivalently, find z for which the determinant of A(z) is zero. Using det to denote determinant, det(A) = det(U)·det(Σ)·det(VT). The determinant of an orthogonal matrix is ±1, and the determinant of a diagonal matrix is simply the product of its elements, so det(A) = ±σ1·σ2·...·σn. Computing determinants can be tricky because the value can vary over a huge range, and floating-point underflows and overflows are frequent. Many problems which are formulated in terms of determinants do not require the actual value of the determinant but simply some indication of when it is zero. It is theoretically possible to compute |det(A)| by taking the product of the singular values of A. But in most situations, it is sufficient to use only the smallest singular value and thereby avoid underflow/overflow and other numerical difficulties.
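A sketch of this idea on a hypothetical parameter-dependent matrix, singular at z = 2: the product of the singular values tracks |det A(z)|, while the smallest singular value alone signals singularity without underflow or overflow:

import numpy as np

def A_of_z(z):
    # Hypothetical parameter-dependent matrix; singular when z = 2.
    return np.array([[1., z],
                     [2., 4.]])

for z in (1.0, 1.9, 2.0):
    s = np.linalg.svd(A_of_z(z), compute_uv=False)
    print(z, np.prod(s), s[-1])   # |det A(z)| = product of sigma_i; s[-1] -> 0 at z = 2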

D. The Eigenvalue Problem

This is not really a practical application of the SVD but rather an indication of how the SVD is related to a larger class of important matrix problems. Let A be a square matrix. The eigenvalues of A are the numbers λ for which Ax = λx has a nonzero solution x. Since this is equivalent to requiring det(A - λI) = 0, the eigenvalues could theoretically be computed by finding the roots of this polynomial. However, consider


$$A = \begin{bmatrix} 1 & e \\ e & 1 \end{bmatrix}$$

for some small but not negligible number e. The true eigenvalues are λ1 = 1 + e and λ2 = 1 - e. The polynomial det(A - λI) is λ² - 2λ + (1 - e²). If e is small, then e² is much smaller, and it is necessary to have twice the precision in the coefficients of the polynomial than is present in either the matrix elements or the eigenvalues. This difficulty becomes even more pronounced for higher-order matrices. Consequently, modern methods for computing eigenvalues avoid the use of polynomials or polynomial root finders. The connection between the SVD and eigenvalues is simplest for matrices which are symmetric and positive semidefinite. Symmetry is easy to define and check: A is symmetric if aij = aji for all i and j. Positive semidefiniteness is much more elusive; positive semidefinite means that xTAx ≥ 0 for all x. Roughly this means that the diagonal elements of A are fairly large compared to the off-diagonal elements. In fact, a sufficient condition is that for each i

$$a_{ii} \ge \sum_{j \ne i} |a_{ij}|,$$

but this is not necessary (see section I.E.). It is not difficult to show that if A is real and symmetric, then all its eigenvalues are real, and if A is positive semidefinite, then all its eigenvalues are nonnegative. If A = UΣVT and A = AT, then A² = ATA = VΣTUTUΣVT = VΣ²VT. Thus, A²V = VΣ². Letting vj denote the columns of V and looking at the jth column of this equation, we find A²vj = σj²vj. Since the eigenvalues of A² are the squares of the eigenvalues of A (from Thm. G.3), and since the

eigenvalues of A are nonnegative, we conclude that the eigenvalues of a symmetric, positive semidefinite matrix (see Thm. E.3) are equal to its singular values and that the eigenvectors are the columns of V. Thus, V is a modal matrix for A. If A is symmetric but not positive semidefinite, then some of its eigenvalues are negative. It turns out that the absolute values of the eigenvalues are equal to the singular values and that the signs of the eigenvalues can be recovered by comparing the columns of U with those of V, but we shall not go into details. If A is not symmetric, there is no simple connection between its eigenvalues and its singular values. This shows that the SVD algorithm could be used to compute the eigenvalues of symmetric matrices. However, this is not particularly efficient and is not recommended. Our only point is that the SVD is closely related to matrix eigenvalue problems. Careful study of the SVD algorithm will help in understanding eigenvalue algorithms as well.
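A small numerical check of this connection for a symmetric positive semidefinite matrix (built as BTB with arbitrary B):

import numpy as np

B = np.array([[1., 2., 0.],
              [0., 1., 1.]])
A = B.T @ B                                     # symmetric positive semidefinite, rank 2

s = np.linalg.svd(A, compute_uv=False)          # singular values (descending)
lam = np.linalg.eigvalsh(A)[::-1]               # eigenvalues (descending)
print(np.allclose(s, lam))                      # equal for symmetric PSD matrices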

E. Linear Sets of Equations

Let A be a given m by n matrix, where m ≥ n, and let b be a given m-vector. We want to find all n-vectors x which solve

A·x = b    (1)

Note that this includes the cases where A is square and singular, and square and nonsingular. Important questions include: (1) Are the equations consistent? (2) Do any solutions exist? (3) Is the solution unique? (4) Do any nonzero solutions exist? (5) What is the general form of the solution? There are many methods for answering these questions, but the SVD is the only reliable method.

Using the SVD of A, equation (1) becomes

UΣVTx = b    (2)

and hence

Σz = d    (3)

where z = VTx and d = UTb. The system of equations in (3) is diagonal and is therefore easily studied. It breaks up into as many as three sets, depending on the values of the dimensions m and n and the rank k, the number of nonzero singular values. The three sets are:

σj zj = dj,  if j ≤ n and σj ≠ 0,    (4)
0·zj = dj,   if j ≤ n and σj = 0,    (5)
0 = dj,      if j > n.    (6)

Theorem E.1. The equations in (1) are consistent and a solution exists iff dj = 0 whenever σj = 0 or j > n.

Theorem E.2. If k < n, then the zj associated with a zero σj can be given an arbitrary value and still yield a solution to the system. When transformed back to the original coordinates by

x = V·z,    (7)

these arbitrary components of z serve to parameterize the space of all possible solutions x.
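A sketch of this procedure on a small rank-deficient but consistent system (values chosen only for illustration):

import numpy as np

A = np.array([[1., 1.],
              [2., 2.],
              [3., 3.]])            # rank 1 (columns are dependent)
b = np.array([2., 4., 6.])          # consistent: b lies in the column space of A

U, s, Vt = np.linalg.svd(A)
d = U.T @ b
tol = 1e-10                         # illustrative tolerance
k = int(np.sum(s > tol))            # rank

z = np.zeros(A.shape[1])
z[:k] = d[:k] / s[:k]               # eq. (4); z_j for sigma_j = 0 left at 0
x = Vt.T @ z                        # eq. (7): x = V z (minimum norm choice)

print(x, np.allclose(A @ x, b))
print(np.abs(d[k:]))                # ~0 where sigma_j = 0 or j > n, so the system is consistent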

The Linear Least-Squares Problem

This is an extension of the previous problem, but we now seek n-vectors x for which Ax is only approximately equal to b, in the sense that the length of the residual vector r = A·x - b is minimized. The problem is to pick an x which minimizes

$$\|r\|^2 = \sum_{i=1}^{m} r_i^2 \qquad (8)$$


If A has full rank, then the solution x is unique and may be found more efficiently by methods other than the SVD, such as the Toeplitz recursion. However, the SVD also handles the rank-deficient case. Since orthogonal matrices preserve the norm (see section II.A.), equation (8) can be rewritten as

‖r‖ = ‖UT(AVVTx - b)‖ = ‖Σz - d‖    (9)

using the SVD and the fact that minimizing the norm squared of r is equivalent to minimizing the norm of r. The vector z which minimizes ‖r‖ is given by

zj = dj/σj,    if σj ≠ 0,
zj = anything, if σj = 0.

Therefore, if the problem is rank deficient, the solution which minimizes ‖r‖ is not unique. In this rank-deficient case, it is often desirable to pick the minimum norm solution for z, and therefore for x, which gives us a unique solution. This solution is obtained by setting zj = 0 if σj = 0. Once z is determined, for either the full rank or the rank-deficient case, we find the solution vector x using equation (7). The error ‖r‖² is calculated in either case as

$$\|r\|^2 = \sum d_j^2 \qquad (10)$$

where the sum is over the values of j for which σj = 0 or j > n.
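A sketch of the rank-deficient least-squares recipe above, with numpy's lstsq (itself SVD-based) shown for comparison; the data are arbitrary:

import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [1., 1.]])               # rank deficient (k = 1)
b = np.array([1., 2., 4.])             # not in the column space of A

U, s, Vt = np.linalg.svd(A)
d = U.T @ b
tol = 1e-10
k = int(np.sum(s > tol))

z = np.zeros(A.shape[1])
z[:k] = d[:k] / s[:k]                  # z_j = d_j / sigma_j; remaining zeros give minimum norm
x = Vt.T @ z                           # x = V z

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]            # minimum norm least-squares solution
print(x, x_ref)
print(np.sum(d[k:]**2), np.linalg.norm(A @ x - b)**2)   # residual ||r||^2, eq. (10)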


F. The Discrete Karhunen-Loeve Transform (KLT)

In many statistical pattern recognition problems and maximum likelihood problems we are given a random vector x, of dimension n, which has a joint Gaussian density with parameters

E[x] = mx,   Cx = E[(x - mx)(x - mx)T]

mx and Cx are the mean vector and autocovariance matrix of x. Cx is Toeplitz if x represents evenly spaced samples from a stationary random process. (See (11) in section I.D.) We will assume that Cx is positive definite. The discrete Karhunen-Loeve Transform (KLT) is a transform

z = A·x    (1)

such that:
(1) The covariance matrix Cz = E[(z - mz)(z - mz)T] is diagonal, so that A is an orthogonal matrix and the elements z(i) of z are independent Gaussian random variables.
(2) The sequence σii = Cz(i,i) is non-increasing.
(3) No other transform satisfies (2) with larger σii values.

Define the SVD of Cx as

Cx = U·Σ·VT    (2)

Using equation (1) and

x = A-1z    (3)

we get

Cx = E[(A-1z - A-1mz)(A-1z - A-1mz)T] = E[A-1·(z - mz)(z - mz)T·A] = A-1·Cz·A    (4)

Since A is an orthogonal matrix and since the Cz(i,i) are nonnegative, equation (4) represents an SVD of Cx. Therefore, for the KLT, we take the SVD of Cx as in equation (2) and get

V = U,   A = UT = VT,   Cz = Σ,   mz = UTmx

Thus the transformation matrix A, or UT, diagonalizes the covariance matrix of x, so that the covariance matrix of z is diagonal. The rows of A form an orthonormal set of n basis vectors.
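A sketch of the KLT on synthetic data (the covariance values and sample size are illustrative): the transform is taken from the SVD of the sample covariance, and the transformed components come out approximately uncorrelated:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated Gaussian vectors x (n = 3) with an illustrative covariance.
Cx_true = np.array([[2.0, 1.0, 0.5],
                    [1.0, 2.0, 1.0],
                    [0.5, 1.0, 2.0]])
L = np.linalg.cholesky(Cx_true)
X = L @ rng.standard_normal((3, 10000))          # columns are samples of x

Cx = np.cov(X)                                   # sample covariance (approx. Cx_true)
U, g, Vt = np.linalg.svd(Cx)                     # Cx = U Sigma V^T, with U = V here

A = U.T                                          # KLT matrix
Z = A @ X                                        # transformed vectors z = A x
Cz = np.cov(Z)
print(np.round(Cz, 2))                           # approximately diag(sigma_1, ..., sigma_n)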

G. Summary

(1) A singular value decomposition of an m-by-n real matrix A is any factorization of the form A = UΣVT, where U is an m-by-m orthogonal matrix, V is an n-by-n orthogonal matrix, and Σ is an m-by-n diagonal matrix with σij = 0 if i ≠ j and σii = σi ≥ 0. The quantities σi are called the singular values of A, and the columns of U and V are called the left and right singular vectors.
(2) The rank of A is the number of nonzero singular values of A.
(3) The condition number of a matrix A of full rank is cond(A) = σmax/σmin, where σmax and σmin are the largest and smallest singular values of A.

(4) det(A) = ±σ1·σ2·...·σn.
(5) The columns of U and V are respectively eigenvectors of AAT and ATA. These sets of columns are called the left singular vectors (for U) and the right singular vectors (for V).
(6) The singular values are the positive square roots of the eigenvalues of ATA.
(7) If A is symmetric and positive semidefinite, then the singular values of A are the eigenvalues of A, and the SVD is a spectral decomposition of A. (See section I part H.) If A is an autocorrelation matrix, it satisfies this condition. (See section I part E.)
(8) The SVD can be used to solve linear sets of equations, including least-squares problems, even when the A matrix is rank deficient.
(9) The orthogonal KLT transformation matrix for a random vector x with covariance matrix Cx = U·Σ·VT is A = UT. The diagonal covariance matrix for the transformed vector z is Cz = Σ.

