
Linear Algebra: Brief Notes


Department of Mathematics, IIT Madras

Contents

1  Matrix Operations
   1.1  Examples of linear equations
   1.2  Basic matrix operations
   1.3  Transpose and adjoint
   1.4  Elementary row operations
   1.5  Row reduced echelon form
   1.6  Determinant
   1.7  Computing inverse of a matrix

2  Rank and Linear Equations
   2.1  Linear independence
   2.2  Determining linear independence
   2.3  Rank of a matrix
   2.4  Solvability of linear equations
   2.5  Gauss-Jordan elimination

3
   3.1  Subspace and span
   3.2  Basis and dimension
   3.3  Matrix as a linear map
   3.4  Change of basis
   3.5  Equivalence and Similarity

4  Orthogonalization
   4.1  Inner products
   4.2  Gram-Schmidt orthogonalization
   4.3  Best approximation
   4.4  QR factorization and least squares

5
   5.1  Eigenvalues
   5.2  Characteristic polynomial
   5.3  Special types of matrices

6  Canonical Forms
   6.1  Schur triangularization
   6.2  Diagonalizability
   6.3  Jordan form
   6.4  Singular value decomposition
   6.5  Polar decomposition

Short Bibliography

Index

1  Matrix Operations

1.1  Examples of linear equations

Linear equations are everywhere, starting from mental arithmetic problems to advanced defense applications. We start with an example. The system of linear equations

x1 + x2 = 3
x1 − x2 = 1

has a unique solution x1 = 2, x2 = 1. Substituting these values for the unknowns, we

see that the equations are satisfied; but why are there no other solutions? Well, we

have not merely guessed this solution; we have solved it! The details are as follows.

Suppose the pair (x1, x2) is a solution of the system. Subtracting the first equation from the second, we get another equation: −2x2 = −2. It implies x2 = 1. Then from either of the equations, we get x1 = 2. To proceed systematically, we would like to replace the

original system with the following:

x1 + x2 = 3

x2 = 1

Substituting x2 = 1 in the first equation of the new system, we get x1 = 2. In fact,

substituting these values of x1 and x2, we see that the original equations are satisfied.

Convinced? The only solution of the system is x1 = 2, x2 = 1. What about the

system

x1 + x2 = 3
x1 − x2 = 1
2x1 − x2 = 3

The first two equations have a unique solution and that satisfies the third. Hence this

system also has a unique solution x1 = 2, x2 = 1. So the extra equation does not put

any constraint on the solutions that we obtained earlier.

But what about our systematic solution method? We aim at eliminating the first

unknown from all but the first equation. We replace the second equation with the one

obtained by second minus the first. We also replace the third by third minus twice

the first. It results in

x1 + x2 = 3
  −2x2 = −2
  −3x2 = −3

Notice that the second and the third equations say the same thing, namely x2 = 1; hence the conclusion. We

give another twist. Consider the system

x1 + x2 = 3

x1 − x2 = 1

2x1 + x2 = 3

The first two equations again have the same solution x1 = 2, x2 = 1. But this time, the

third is not satisfied by these values of the unknowns. So, the system has no solution.

Also, by using our elimination method, we obtain the equations as:

x1 + x2 = 3
  −2x2 = −2
   −x2 = −3

The last two equations are not consistent. So, the original system has no solution.

Finally, instead of adding another equation, we drop one. Consider the linear

equation

x1 + x2 = 3

having only one equation. The old solution x1 = 2, x2 = 1 is still a solution of this

system. But x1 = 1, x2 = 2 is also a solution. Moreover, since x1 = 3 − x2, by assigning x2 any real number, we get a corresponding value for x1, which together give

a solution. Thus, it has infinitely many solutions. Notice that the same conclusion

holds if we have more equations, which are some multiple of the only given equation.

For example,

x1 + x2 = 3

2x1 + 2x2 = 6

3x1 + 3x2 = 9

We see that the number of equations really does not matter, but the number of

independent equations does matter.

Caution: the notion of independent equations is not yet clear; nonetheless we have

some working idea.
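These toy systems can also be checked numerically. The following is a minimal sketch of ours (not part of the notes), assuming NumPy is available: np.linalg.solve handles the square system with a unique solution, while np.linalg.lstsq reports a least-squares answer, the rank, and the residual for the rectangular ones.

```python
import numpy as np

# The 2x2 system: x1 + x2 = 3, x1 - x2 = 1.
A = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.array([3.0, 1.0])
print(np.linalg.solve(A, b))            # [2. 1.], the unique solution

# Adding the redundant equation 2*x1 - x2 = 3 changes nothing.
A3 = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, -1.0]])
b3 = np.array([3.0, 1.0, 3.0])
x, residual, rank, _ = np.linalg.lstsq(A3, b3, rcond=None)
print(x, rank)                          # still [2. 1.], rank 2

# Adding 2*x1 + x2 = 3 instead makes the system inconsistent.
A4 = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
b4 = np.array([3.0, 1.0, 3.0])
x, residual, rank, _ = np.linalg.lstsq(A4, b4, rcond=None)
print(residual)                         # nonzero residual: no exact solution
```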

It is also not very clear when a system of equations has a solution, a unique solution, infinitely many solutions, or no solution at all. And why can a system of equations not have more than one but finitely many solutions? How do we use our elimination method to obtain infinitely many solutions? To answer these questions, we will introduce matrices. Matrices will help us represent the problem in a compact way and will lead to definitive answers. We will also study the eigenvalue problem for matrices, which comes up often in applications. These concerns will allow us to represent matrices in elegant forms.

1.2  Basic matrix operations

As usual, R denotes the set of all real numbers and C denotes the set of all complex

numbers. We will write F for either R or C. The numbers in F will also be referred

to as scalars.

A matrix is a rectangular array of symbols. For us these symbols are real numbers

or, in general, complex numbers. The individual numbers in the array are called the

entries of the matrix. Each entry of a matrix is a scalar. The number of rows and the

number of columns in any matrix are necessarily positive integers. A matrix with m

rows and n columns is called an m × n matrix and it may be written as

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix},$$

or as A = [aij] for short, with aij ∈ F for i = 1, . . . , m and j = 1, . . . , n. The number aij, which occurs in the ith row and jth column, is referred to as the (i, j)th entry of the matrix [aij].

Any matrix with m rows and n columns will be referred to as an m × n matrix. The set of all m × n matrices with entries from F will be denoted by F^{m×n}.

A row vector of size n is a matrix in F^{1×n}. Similarly, a column vector of size n is a matrix in F^{n×1}. The vectors in F^{1×n} (row vectors) will be written as

[a1, . . . , an]   or as   [a1 · · · an]

for scalars a1, . . . , an; and the vectors in F^{n×1} (column vectors) will be written as

$$\begin{bmatrix} b_1\\ \vdots\\ b_n \end{bmatrix}$$

for scalars b1, . . . , bn. We will sometimes write such a column vector as [b1 · · · bn]^t, for saving vertical space.

We will write both F^{1×n} and F^{n×1} as F^n. Especially when a result is applicable to both row vectors and column vectors, this notation will come in handy. Also, we will write a typical vector in F^n as (a1, . . . , an).

When F^n is F^{1×n}, you should read (a1, . . . , an) as [a1, . . . , an], a row vector, and when F^n is F^{n×1}, you should read (a1, . . . , an) as [a1 · · · an]^t, a column vector.

Any matrix in F^{m×n} is said to have size m × n. If m = n, the rectangular array becomes a square array with m rows and m columns; and the matrix is called a square matrix of order m.

Naturally, two matrices of the same size are considered equal when their corresponding entries coincide, i.e., if A = [aij] and B = [bij] are in F^{m×n}, then

A = B   iff   aij = bij

for each i ∈ {1, . . . , m} and for each j ∈ {1, . . . , n}. Thus matrices of different sizes

are unequal.

The zero matrix is a matrix each entry of which is 0. We write 0 for all zero

matrices of all sizes. The size is to be understood from the context.

Let A = [aij] ∈ F^{n×n} be a square matrix of order n. The entries aii are called the diagonal entries of A. The diagonal of A consists of all diagonal entries; the first entry on the diagonal is a11, and the last diagonal entry is ann. The entries of A which are not on the diagonal are called the off-diagonal entries of A; they are aij for i ≠ j. The diagonal of the following matrix is shown in bold:

$$\begin{bmatrix} \mathbf{1} & 2 & 3\\ 2 & \mathbf{3} & 4\\ 3 & 4 & \mathbf{0} \end{bmatrix}.$$

Here, 1 is the first diagonal entry, 3 is the second diagonal entry, and 0 is the third and last diagonal entry.

The super-diagonal of a matrix consists of the entries just above the diagonal. That is, the entries a_{i,i+1} constitute the super-diagonal of an n × n matrix A = [aij]. Of course, i varies from 1 to n − 1 here. The super-diagonal of the following matrix is shown in bold:

$$\begin{bmatrix} 1 & \mathbf{2} & 3\\ 2 & 3 & \mathbf{4}\\ 3 & 4 & 0 \end{bmatrix}.$$

If all off-diagonal entries of A are 0, then A is said to be a diagonal matrix. Only

a square matrix can be a diagonal matrix. There is a way to generalize this notion to

any matrix, but we do not require it. Notice that the diagonal entries in a diagonal

matrix need not all be nonzero. For example, the zero matrix of order n is also a

diagonal matrix. The following is a diagonal matrix. We follow the convention of

not showing the 0 entries in a matrix.

$$\begin{bmatrix} 1 & & \\ & 3 & \\ & & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$


Thus the above diagonal matrix is also written as

diag (1, 3, 0).

The identity matrix is a square matrix of which each diagonal entry is 1 and each

off-diagonal entry is 0.

I = diag (1, . . . , 1).

When identity matrices of different orders are used in a context, we will use the

notation Im for the identity matrix of order m.

We write ei for a column vector whose ith component is 1 and all other components are 0. When we consider ei as a column vector in F^{n×1}, the jth component of ei is δij. Here,

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j\\ 0 & \text{if } i \neq j \end{cases}$$

is Kronecker's delta. Notice that the identity matrix is I = [δij].

There are then n distinct column vectors e1 , . . . , en . The list of column vectors

e1, . . . , en is called the standard basis for F^{n×1}, for reasons we will discuss later. Accordingly, the ei's are referred to as the standard basis vectors. These are the columns of the identity matrix of order n, in that order; that is, ei is the ith column of I. The transposes of these ei's are the rows of I. That is, the ith row of I is ei^t. Thus

$$I = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix} = \begin{bmatrix} e_1^t\\ \vdots\\ e_n^t \end{bmatrix}.$$

A scalar matrix is a matrix of which each diagonal entry is a scalar, the same

scalar, and each off-diagonal entry is 0. Each scalar matrix is a diagonal matrix with

same scalar on the diagonal. The following is a scalar matrix:

$$\begin{bmatrix} 3 & & & \\ & 3 & & \\ & & 3 & \\ & & & 3 \end{bmatrix}$$

It is also written as diag (3, 3, 3, 3). If A, B ∈ F^{m×m} and A is a scalar matrix, then AB = BA. Conversely, if A ∈ F^{m×m} is such that AB = BA for all B ∈ F^{m×m}, then A

must be a scalar matrix. This fact is not obvious, and its proof will require much

more than discussed until now.

A matrix A ∈ F^{m×n} is said to be upper triangular iff all its entries below the diagonal are zero. That is, A = [aij] is upper triangular when aij = 0 for i > j. In writing such a matrix, we simply do not show the zero entries below the diagonal. Similarly, a matrix is called lower triangular iff all its entries above the diagonal are zero. Both

upper triangular and lower triangular matrices are referred to as triangular matrices.

A diagonal matrix is both upper and lower triangular. The following are examples of

lower triangular matrix L and upper triangular matrix U, both of order 3.

$$L = \begin{bmatrix} 1 & & \\ 2 & 3 & \\ 3 & 4 & 5 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 2 & 3\\ & 3 & 4\\ & & 5 \end{bmatrix}.$$

Sum of two matrices of the same size is a matrix whose entries are obtained by

adding the corresponding entries in the given two matrices. That is, if A = [aij] and B = [bij] are in F^{m×n}, then

A + B = [aij + bij] ∈ F^{m×n}.

For example,

$$\begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} + \begin{bmatrix} 3 & 1 & 2\\ 2 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 3 & 5\\ 4 & 4 & 4 \end{bmatrix}.$$

We informally say that matrices are added entry-wise. Matrices of different sizes can

never be added.

It then follows that

A + B = B + A.

Similarly, matrices can be multiplied by a scalar entry-wise. If A = [aij] ∈ F^{m×n} and α ∈ F, then

αA = [α aij] ∈ F^{m×n}.

Therefore, a scalar matrix with α on the diagonal is written as αI. Notice that

A + 0 = 0 + A = A

for all matrices A ∈ F^{m×n}, with an implicit understanding that 0 ∈ F^{m×n}. For A = [aij], the matrix −A ∈ F^{m×n} is taken as the one whose (i, j)th entry is −aij. Thus

−A = (−1)A   and   A + (−A) = −A + A = 0.

For example,

$$3 \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} - \begin{bmatrix} 3 & 1 & 2\\ 2 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 5 & 7\\ 4 & 8 & 0 \end{bmatrix}.$$

The addition and scalar multiplication as defined above satisfy the following properties:

Let A, B, C ∈ F^{m×n}. Let α, β ∈ F.

1. A + B = B + A.

2. (A + B) + C = A + (B + C).

3. A + 0 = 0 + A = A.

4. A + (−A) = (−A) + A = 0.

5. α(βA) = (αβ)A.

6. α(A + B) = αA + αB.

7. (α + β)A = αA + βA.

8. 1 · A = A.

Notice that whatever we discuss here for matrices applies to row vectors and column vectors in particular. But remember that a row vector cannot be added to a column vector unless both are of size 1 × 1, in which case both are just numbers in F.

Another operation that we have on matrices is multiplication of matrices, which

is a bit involved. Let A = [aik] ∈ F^{m×n} and B = [bkj] ∈ F^{n×r}. Then their product AB is the matrix [cij] ∈ F^{m×r}, where the entries are

$$c_{ij} = a_{i1}b_{1j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}.$$

Notice that the matrix product AB is defined only when the number of columns in A

is equal to the number of rows in B.
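As a concrete rendering of this entry formula, here is a minimal sketch of ours in Python (the function name matmul and the use of plain nested lists are our own choices for illustration); the numbers used match the worked product that appears a little further below.

```python
def matmul(A, B):
    """Product of an m-by-n matrix A and an n-by-r matrix B, given as lists of
    rows; entry (i, j) is sum over k of A[i][k] * B[k][j]."""
    m, n, r = len(A), len(A[0]), len(B[0])
    assert len(B) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]

A = [[3, 5, -1], [4, 0, 2], [6, 3, -2]]
B = [[2, 2, 3, 1], [5, 0, 7, 8], [9, 4, 1, 1]]
print(matmul(A, B))   # [[22, 2, 43, 42], [26, 16, 14, 6], [9, 4, 37, 28]]
```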

A particular case might be helpful. Suppose A is a row vector in F^{1×n} and B is a column vector in F^{n×1}. Then their product AB ∈ F^{1×1}; it is a matrix of size 1 × 1. Often we will identify such matrices with numbers. The product now looks like:

$$\begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1\\ \vdots\\ b_n \end{bmatrix} = a_1 b_1 + \cdots + a_n b_n.$$

This is helpful in visualizing the general case, which looks like

$$\begin{bmatrix} a_{11} & \cdots & a_{1k} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{i1} & \cdots & a_{ik} & \cdots & a_{in}\\ \vdots & & \vdots & & \vdots\\ a_{m1} & \cdots & a_{mk} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1r}\\ \vdots & & \vdots & & \vdots\\ b_{\ell 1} & \cdots & b_{\ell j} & \cdots & b_{\ell r}\\ \vdots & & \vdots & & \vdots\\ b_{n1} & \cdots & b_{nj} & \cdots & b_{nr} \end{bmatrix} = \begin{bmatrix} c_{11} & \cdots & c_{1j} & \cdots & c_{1r}\\ \vdots & & \vdots & & \vdots\\ c_{i1} & \cdots & c_{ij} & \cdots & c_{ir}\\ \vdots & & \vdots & & \vdots\\ c_{m1} & \cdots & c_{mj} & \cdots & c_{mr} \end{bmatrix}$$

The ith row of A multiplied with the jth column of B gives the (i, j)th entry in AB.

Thus to get AB, you have to multiply all m rows of A with all r columns of B. Besides

writing a linear system in compact form, we will see later why matrix multiplication

is defined this way. For example,

$$\begin{bmatrix} 3 & 5 & -1\\ 4 & 0 & 2\\ 6 & 3 & -2 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 & 1\\ 5 & 0 & 7 & 8\\ 9 & 4 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 22 & 2 & 43 & 42\\ 26 & 16 & 14 & 6\\ 9 & 4 & 37 & 28 \end{bmatrix},$$

$$\begin{bmatrix} 3 & 6 & 1 \end{bmatrix} \begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix} = \begin{bmatrix} 19 \end{bmatrix}, \qquad \begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix} \begin{bmatrix} 3 & 6 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 1\\ 6 & 12 & 2\\ 12 & 24 & 4 \end{bmatrix}.$$

It shows clearly that matrix multiplication is not commutative. Commutativity can

break down due to various reasons. First of all when AB is defined, BA may not be

defined. Secondly, even when both AB and BA are defined, they may not be of the

same size; and thirdly, even when they are of the same size, they need not be equal.

For example,

$$\begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0 & 1\\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 7\\ 6 & 11 \end{bmatrix} \quad\text{but}\quad \begin{bmatrix} 0 & 1\\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 8 & 13 \end{bmatrix}.$$

It does not mean that AB is never equal to BA. There can be some particular matrices

A and B, both in F^{n×n}, such that AB = BA.

Observe that if A ∈ F^{m×n}, then AI_n = A and I_m A = A. Look at the columns of I_n in

this product. They say that

Ae j = the jth column of A

for j = 1, . . . , n.

Here, e j is the standard jth basis vector, the jth column of the identity matrix of order

n; its jth component is 1 and all other components are 0. Also, directly multiplying

A with e j we see that

$$Ae_j = \begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in}\\ \vdots & & \vdots & & \vdots\\ a_{m1} & \cdots & a_{mj} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{bmatrix} = \begin{bmatrix} a_{1j}\\ \vdots\\ a_{ij}\\ \vdots\\ a_{mj} \end{bmatrix} = \text{the } j\text{th column of } A.$$

Thus A can be written in block form as

A = [Ae_1 · · · Ae_j · · · Ae_n].

Unlike numbers, product of two nonzero matrices can be a zero matrix. For example,

$$\begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix}.$$

It is easy to verify the following properties of matrix multiplication:

1. If A ∈ F^{m×n}, B ∈ F^{n×r} and C ∈ F^{r×p}, then (AB)C = A(BC).

2. If A, B ∈ F^{m×n} and C ∈ F^{n×r}, then (A + B)C = AC + BC.

3. If A ∈ F^{m×n} and B, C ∈ F^{n×r}, then A(B + C) = AB + AC.

4. If α ∈ F, A ∈ F^{m×n} and B ∈ F^{n×r}, then α(AB) = (αA)B = A(αB).

You can see matrix multiplication in block form. Suppose A ∈ F^{m×n}. Write its ith row as A_{i⋆} and its kth column as A_{⋆k}. Then we can write A as a row of columns and also as a column of rows in the following manner:

$$A = \begin{bmatrix} A_{\star 1} & \cdots & A_{\star n} \end{bmatrix} = \begin{bmatrix} A_{1\star}\\ \vdots\\ A_{m\star} \end{bmatrix}.$$

Write B ∈ F^{n×r} similarly as

$$B = [b_{kj}] = \begin{bmatrix} B_{\star 1} & \cdots & B_{\star r} \end{bmatrix} = \begin{bmatrix} B_{1\star}\\ \vdots\\ B_{n\star} \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} AB_{\star 1} & \cdots & AB_{\star r} \end{bmatrix} = \begin{bmatrix} A_{1\star}B\\ \vdots\\ A_{m\star}B \end{bmatrix}.$$

When writing this way, we ignore the extra brackets [ and ].

Powers of square matrices can be defined inductively by taking

A^0 = I   and   A^n = A A^{n−1}   for n ∈ N.

Example 1.1

Let A = $\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{bmatrix}$. Show that $A^n = \begin{bmatrix} 1 & n & n(n-1)\\ 0 & 1 & 2n\\ 0 & 0 & 1 \end{bmatrix}$ for n ∈ N.

We use induction on n. The basis case n = 1 is obvious. Suppose A^n is as given. Now,

$$A^{n+1} = A\,A^n = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & n & n(n-1)\\ 0 & 1 & 2n\\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & n+1 & (n+1)n\\ 0 & 1 & 2(n+1)\\ 0 & 0 & 1 \end{bmatrix}.$$

Notice that taking n = 0 in the matrix A^n, we see that A^0 = I.
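A quick numeric spot-check of this closed form (our own illustration, not part of the notes), using NumPy's matrix_power:

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[1, 1, 0],
              [0, 1, 2],
              [0, 0, 1]])

def formula(n):
    # Claimed closed form for A**n from Example 1.1.
    return np.array([[1, n, n * (n - 1)],
                     [0, 1, 2 * n],
                     [0, 0, 1]])

for n in range(6):
    assert np.array_equal(matrix_power(A, n), formula(n)), n
print("formula verified for n = 0, ..., 5")
```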

A square matrix A of order m is called invertible iff there exists a matrix B of

order m such that

AB = I = BA.

Such a matrix B is called an inverse of A. If C is another inverse of A, then

C = CI = C(AB) = (CA)B = IB = B.

Therefore, an inverse of a matrix is unique and is denoted by A^{−1}. We talk of invertibility of square matrices only; and not all square matrices are invertible. For example, I is invertible but 0 is not. If AB = 0 for nonzero square matrices A and B, then neither A nor B is invertible.

If both A, B ∈ F^{n×n} are invertible, then (AB)^{−1} = B^{−1}A^{−1}. Reason:

B^{−1}A^{−1}AB = B^{−1}IB = I = AIA^{−1} = ABB^{−1}A^{−1}.

Invertible matrices play a crucial role in solving linear systems uniquely. We will

come back to the issue later.
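The identity (AB)^{−1} = B^{−1}A^{−1} is easy to check numerically; a small sketch of ours (assuming NumPy) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))   # random matrices are invertible almost surely

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))      # True
```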

1. Compute AB, CA, DC, DCAB, A², D² and A³B², where

1

2

3

2

1

2 3

4 1

0 .

A=

, B=

, C = 2 1 , D = 4 6

1 2

4

0

1

3

1 2 2

2. Let E_{ij} be the n × n matrix whose (i, j)th entry is 1 and all other entries are 0. Show that each A = [aij] ∈ C^{n×n} can be expressed as A = Σ_{i=1}^{n} Σ_{j=1}^{n} aij E_{ij}. Also show that E_{ij} E_{km} = 0 if j ≠ k, and E_{ij} E_{jm} = E_{im}.

3. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let B_1, . . . , B_p be the columns of B. Show that AB_1, . . . , AB_p are the columns of AB.

4. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let A_1, . . . , A_m be the rows of A. Show that A_1B, . . . , A_mB are the rows of AB.

5. Construct two 3 × 3 matrices A and B such that AB = 0 but BA ≠ 0.

1.3  Transpose and adjoint

The transpose of a matrix A = [aij] ∈ F^{m×n} is the matrix A^t ∈ F^{n×m} and is defined by

the (i, j)th entry of A^t = the (j, i)th entry of A.

That is, the ith column of A^t is the column vector [a_{i1}, · · · , a_{in}]^t. The rows of A are the columns of A^t, and the columns of A are the rows of A^t. In particular, if u = [a_1 · · · a_m] is a row vector, then its transpose is

$$u^t = \begin{bmatrix} a_1\\ \vdots\\ a_m \end{bmatrix},$$

which is a column vector. Similarly, the transpose of a column vector is a row vector. If you write A as a row of column vectors, then you can

express At as a column of row vectors, as in the following:

$$A = \begin{bmatrix} A_{\star 1} & \cdots & A_{\star n} \end{bmatrix} \;\Rightarrow\; A^t = \begin{bmatrix} A_{\star 1}^t\\ \vdots\\ A_{\star n}^t \end{bmatrix}; \qquad A = \begin{bmatrix} A_{1\star}\\ \vdots\\ A_{m\star} \end{bmatrix} \;\Rightarrow\; A^t = \begin{bmatrix} A_{1\star}^t & \cdots & A_{m\star}^t \end{bmatrix}.$$

For example,

$$A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} \;\Rightarrow\; A^t = \begin{bmatrix} 1 & 2\\ 2 & 3\\ 3 & 1 \end{bmatrix}.$$

It then follows that transpose of the transpose is the original matrix. The following

are some of the properties of this operation of transpose.

1. (A^t)^t = A.

2. (A + B)^t = A^t + B^t.

3. (αA)^t = αA^t.

4. (AB)^t = B^t A^t.

5. If A is invertible, then A^t is invertible, and (A^t)^{−1} = (A^{−1})^t.

In the above properties, we assume that the operations are allowed, that is, in (2), A

and B must be of the same size. Similarly, in (4), the number of columns in A must

be equal to the number of rows in B; and in (5), A must be a square matrix.

It is easy to see all the above properties, except perhaps the fourth one. For this,

let A ∈ F^{m×n} and B ∈ F^{n×r}. Now, the (j, i)th entry in (AB)^t is the (i, j)th entry in AB; and it is given by

a_{i1}b_{1j} + · · · + a_{in}b_{nj}.

On the other side, the (j, i)th entry in B^tA^t is obtained by multiplying the jth row of B^t with the ith column of A^t. This is the same as multiplying the entries in the jth column of B with the corresponding entries in the ith row of A, and then taking the sum. Thus it is

b_{1j}a_{i1} + · · · + b_{nj}a_{in}.

This is the same as computed earlier.

The fifth one follows from the fourth one and the fact that (AB)^{−1} = B^{−1}A^{−1}.

Observe that transpose of a lower triangular matrix is an upper triangular matrix,

and vice versa.


Close to the operation of transpose of a matrix is the adjoint. Let A = [aij] ∈ F^{m×n}. The adjoint of A is denoted as A∗, and is defined by

the (i, j)th entry of A∗ = the complex conjugate of the (j, i)th entry of A.

We write the complex conjugate of a scalar α as ᾱ; for real α and β, the conjugate of α + iβ is α − iβ. Thus, if aij ∈ R, then āij = aij; so when A has only real entries, A∗ = A^t. Also, the ith column of A∗ is the column vector (ā_{i1}, · · · , ā_{in})^t. For example,

$$A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} \;\Rightarrow\; A^{*} = \begin{bmatrix} 1 & 2\\ 2 & 3\\ 3 & 1 \end{bmatrix}; \qquad A = \begin{bmatrix} 1+i & 2 & 3\\ 2 & 3 & 1-i \end{bmatrix} \;\Rightarrow\; A^{*} = \begin{bmatrix} 1-i & 2\\ 2 & 3\\ 3 & 1+i \end{bmatrix}.$$

Similar to the transpose, the adjoint satisfies the following properties:

1. (A∗)∗ = A.

2. (A + B)∗ = A∗ + B∗.

3. (αA)∗ = ᾱA∗.

4. (AB)∗ = B∗A∗.

5. If A is invertible, then A∗ is invertible, and (A∗)^{−1} = (A^{−1})∗.

Here also, in (2), the matrices A and B must be of the same size, and in (4), the

number of columns in A must be equal to the number of rows in B. The adjoint of A

is also called the conjugate transpose of A. Notice that if A ∈ R^{m×n}, then A∗ = A^t.

Occasionally, we will write Ā for the matrix obtained from A by taking the complex conjugate of each entry. That is, the (i, j)th entry of Ā is the complex conjugate of the (i, j)th entry of A. Hence A∗ = (Ā)^t.
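In NumPy (an illustration of ours, not from the notes), the transpose is .T and the adjoint (conjugate transpose) can be formed as .conj().T:

```python
import numpy as np

A = np.array([[1 + 1j, 2, 3],
              [2, 3, 1 - 1j]])

A_t = A.T              # transpose
A_star = A.conj().T    # adjoint (conjugate transpose)
print(A_star)

# Property (AB)* = B* A* on a small example:
B = np.array([[1j, 0], [2, 1], [0, 3 - 1j]])
print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))   # True
```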

1. Determine A^t, Ā, A∗, A∗A and AA∗, where

1 2 + i 3 i

1 2 3 1

i 1 i

2i

(a) A = 2 1 0 3 (b) A =

1 + 3i

i 3

0 1 3 1

2

0 i

2. Let A ∈ C^{m×n}. Suppose AA∗ = I_m. Does it follow that A∗A = I_n?


1.4  Elementary row operations

Recall that while solving linear equations in two or three variables, you try to eliminate a variable from all but one equation by adding an equation to the other, or

even adding a constant times one equation to another. We do similar operations on

the rows of a matrix. These are achieved by multiplying a given matrix with some

special matrices, called elementary matrices.

Let e1, . . . , em ∈ F^{m×1} be the standard basis vectors. Let 1 ≤ i, j ≤ m. The product e_i e_j^t is an m × m matrix whose (i, j)th entry is 1 and all other entries are 0. We write such a matrix as E_{ij}. For instance, when m = 3, we have

$$e_2 e_3^t = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix} = E_{23}.$$

There are three kinds of elementary matrices:

1. E[i, j] := I − E_{ii} − E_{jj} + E_{ij} + E_{ji}, with i ≠ j.

2. E_α[i] := I − E_{ii} + αE_{ii}, where α is a nonzero scalar.

3. E_α[i, j] := I + αE_{ij}, where α is a nonzero scalar and i ≠ j.

Here, I is the identity matrix of order m. Similarly, the order of the elementary matrices will be understood from the context; we will not show that in our symbolism.

Example 1.2

The following are instances of elementary matrices of order 3.

$$E[1,2] = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}, \quad E_{-1}[2] = \begin{bmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{bmatrix}, \quad E_{2}[3,1] = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 2 & 0 & 1 \end{bmatrix}.$$

Let A ∈ F^{m×n} and let E[i, j], E_α[i], E_α[i, j] be elementary matrices of order m. Then the following hold:

1. E[i, j] A is the matrix obtained from A by exchanging its ith and jth rows.

2. E_α[i] A is the matrix obtained from A by replacing its ith row with α times the ith row.

3. E_α[i, j] A is the matrix obtained from A by replacing its ith row with the ith row plus α times the jth row.


We call these operations of pre-multiplying a matrix with an elementary matrix as elementary row operations. Thus there are three kinds of elementary row operations

as listed above. Sometimes, we will refer to them as of Type-1, 2, or 3, respectively.

Also, in computations, we will write

A  —E→  B

to mean that the matrix B has been obtained from A by an elementary row operation E, that is, B = EA.
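A small sketch of ours (using NumPy; the helper names are made up for illustration) that builds the three kinds of elementary matrices and checks their effect as row operations, as in Example 1.3 below:

```python
import numpy as np

def E_swap(m, i, j):
    """E[i, j]: identity with rows i and j exchanged (0-based indices)."""
    E = np.eye(m); E[[i, j]] = E[[j, i]]; return E

def E_scale(m, i, alpha):
    """E_alpha[i]: identity with the (i, i) entry replaced by alpha."""
    E = np.eye(m); E[i, i] = alpha; return E

def E_add(m, i, j, alpha):
    """E_alpha[i, j]: identity plus alpha at position (i, j)."""
    E = np.eye(m); E[i, j] = alpha; return E

A = np.array([[1., 1., 1.], [2., 2., 2.], [3., 3., 3.]])
# Apply E_{-3}[3,1] and then E_{-2}[2,1]: rows 2 and 3 become zero rows.
print(E_add(3, 1, 0, -2) @ (E_add(3, 2, 0, -3) @ A))
```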

Example 1.3

See the following applications of elementary row operations:

$$\begin{bmatrix} 1 & 1 & 1\\ 2 & 2 & 2\\ 3 & 3 & 3 \end{bmatrix} \xrightarrow{E_{-3}[3,1]} \begin{bmatrix} 1 & 1 & 1\\ 2 & 2 & 2\\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{E_{-2}[2,1]} \begin{bmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}$$

Often we will apply elementary row operations in a sequence. In this way, the above operations could be shown in one step as E_{−3}[3, 1], E_{−2}[2, 1]. However, remember that the result of applying this sequence of elementary row operations on a matrix A is E_{−2}[2, 1] E_{−3}[3, 1] A; the products are in reverse order.

Elementary row operations can be undone by other elementary row operations.

The reason: each elementary matrix is invertible. In fact, the inverses of the elementary matrices are as follows:

(E[i, j])^{−1} = E[i, j],   (E_α[i])^{−1} = E_{1/α}[i],   (E_α[i, j])^{−1} = E_{−α}[i, j].

Therefore, applying a sequence of elementary row operations on a matrix A amounts to pre-multiplying A with a suitable invertible matrix.

1. Compute E[2, 3]A, Ei [2]A, E1/2 [1, 3]A and Ei [1, 2]A, where A is given by

1

2 + i 3 i

1 2 3 1

i

1 i

2i

A = 2 1 0 3 (b) A =

1 + 3i i

3

0 1 3 1

2

0

i

2. Argue in general terms why the following are true:

(a) E[i, j] A is the matrix obtained from A by exchanging its ith and jth rows.

(b) E_α[i] A is the matrix obtained from A by replacing its ith row with α times the ith row.

(c) E_α[i, j] A is the matrix obtained from A by replacing its ith row with the ith row plus α times the jth row.

3. Describe A E[i, j], A E_α[i] and A E_α[i, j] as to how they are obtained from A.


1.5  Row reduced echelon form

Using elementary row operations, we intend to transform a matrix into a form having many zero entries. Recall that this corresponds to eliminating a variable from an

equation of a linear system. The first, from left, nonzero entry in a nonzero row of

a matrix is called a pivot. We denote a pivot in a row by putting a box around it. A

column where a pivot occurs is called a pivotal column. A row where a pivot occurs

is called a pivotal row.

A matrix A Fmn is said to be in row reduced echelon form (RREF) iff the

following conditions are satisfied:

(1) Each pivot is equal to 1.

(2) In a pivotal column, all entries other than the pivot are zero.

(3) The row index of each pivotal row is smaller than the row index of each zero

row.

(4) If the ith row and the (i + k)th row are pivotal rows, for i, k ≥ 1, then the column index

of the pivot in the (i + k)th row is greater than the column index of the pivot in

the ith row.

Example 1.4

The matrices

are in

0

0

0

0

0

0

,

0

0

echelon form. Whereas

0

0 1 3 1

0 0 0 1

0

0 0 0 0 ,

0

0 0 0 0

0

1 2 0

0 0 1

0 0 0

row reduced

1 3 0

0 0 2

,

0 0 0

0 0 0

0 ,

1

1

0

,

0

0

1

0

0

0

0

1

0

0

0 0 0 0 ,

0

0

0

1

0

0

1

0

0

0

i

1 0 0 0

3

0

0

0

0

1

1

0

Observe that a (single) column vector in row reduced echelon form is either the

zero vector or e1 . Similarly, a row vector in row reduced echelon form is either a

zero row or e_k^t for some k.

If a matrix in RREF has k pivotal columns, then those columns occur in the matrix

as e1 , . . . , ek , read from left to right, though there can be other columns inbetween

these pivotal columns.

Any matrix can be brought to a row reduced echelon form by using elementary

row operations. We give an algorithm to achieve this.


1. Set the work region R as the whole matrix A.

2. If all entries in R are 0, then stop.

3. If there are nonzero entries in R, then find the leftmost nonzero column. Mark

it as the pivotal column.

4. Find the topmost nonzero entry in the pivotal column. Box it; it is a pivot.

5. If the pivot is not on the top row of R, then exchange the row of A which

contains the top row of R with the row where the pivot is.

6. If the pivot, say α, is not equal to 1, then replace the top row of R in A by 1/α times that row.

7. Make all entries, except the pivot, in the pivotal column as zero by replacing

each row above and below the top row of R using elementary row operations

in A with that row and the top row of R.

8. Find the sub-matrix to the right and below the pivot. If no such sub-matrix

exists, then stop. Else, reset the work region R to this sub-matrix, and go to 2.

We will refer to the output of the above reduction algorithm as the row reduced

echelon form (the RREF) of a given matrix.
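The algorithm above translates almost line by line into code. Below is a minimal sketch of ours (not the notes' own program), assuming NumPy and using a small tolerance for "nonzero":

```python
import numpy as np

def rref(A, tol=1e-12):
    """Row reduced echelon form, following the steps above: pick the leftmost
    nonzero column of the work region, move the pivot row up, scale the pivot
    to 1, and clear the rest of the pivotal column."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    row = 0
    for col in range(n):
        if row == m:
            break
        nonzero = np.where(np.abs(R[row:, col]) > tol)[0]
        if nonzero.size == 0:
            continue                      # no pivot in this column
        p = row + nonzero[0]
        R[[row, p]] = R[[p, row]]         # row exchange
        R[row] = R[row] / R[row, col]     # make the pivot equal to 1
        for r in range(m):                # zero out the rest of the pivotal column
            if r != row:
                R[r] = R[r] - R[r, col] * R[row]
        row += 1
    return R

A = np.array([[1, 1, 2, 0], [3, 5, 7, 1], [1, 5, 4, 5], [2, 8, 7, 9]])
print(rref(A))   # matrix of Example 1.5; pivots appear in columns 1, 2 and 4
```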

Example 1.5

$$A = \begin{bmatrix} 1 & 1 & 2 & 0\\ 3 & 5 & 7 & 1\\ 1 & 5 & 4 & 5\\ 2 & 8 & 7 & 9 \end{bmatrix} \xrightarrow{R_1} \begin{bmatrix} 1 & 1 & 2 & 0\\ 0 & 2 & 1 & 1\\ 0 & 4 & 2 & 5\\ 0 & 6 & 3 & 9 \end{bmatrix} \xrightarrow{E_{1/2}[2]} \begin{bmatrix} 1 & 1 & 2 & 0\\ 0 & 1 & \frac12 & \frac12\\ 0 & 4 & 2 & 5\\ 0 & 6 & 3 & 9 \end{bmatrix} \xrightarrow{R_2} \begin{bmatrix} 1 & 0 & \frac32 & -\frac12\\ 0 & 1 & \frac12 & \frac12\\ 0 & 0 & 0 & 3\\ 0 & 0 & 0 & 6 \end{bmatrix}$$

$$\xrightarrow{E_{1/3}[3]} \begin{bmatrix} 1 & 0 & \frac32 & -\frac12\\ 0 & 1 & \frac12 & \frac12\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 6 \end{bmatrix} \xrightarrow{R_3} \begin{bmatrix} 1 & 0 & \frac32 & 0\\ 0 & 1 & \frac12 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix} = B$$

Here, R1 = E_{−3}[2, 1], E_{−1}[3, 1], E_{−2}[4, 1]; R2 = E_{−1}[1, 2], E_{−4}[3, 2], E_{−6}[4, 2]; and R3 = E_{1/2}[1, 3], E_{−1/2}[2, 3], E_{−6}[4, 3]. The matrix B is the RREF of A; and

B = E_{−6}[4, 3] E_{−1/2}[2, 3] E_{1/2}[1, 3] E_{1/3}[3] E_{−6}[4, 2] E_{−4}[3, 2] E_{−1}[1, 2] E_{1/2}[2] E_{−2}[4, 1] E_{−1}[3, 1] E_{−3}[2, 1] A.

The products are in reverse order.


Some information about a matrix can be derived from such a special form. Let A have the columns u1, . . . , un; these are column vectors from F^{m×1}. That is,

A = [u1 u2 · · · un].

Let B be the RREF of A obtained by applying a sequence of elementary row operations. Let E be the m × m invertible matrix (which is the product of the corresponding elementary matrices) so that

EA = E[u1 u2 · · · un] = B.

Suppose the number of pivots in B is r. Then the standard basis vectors e1, . . . , er of F^{m×1} occur as the pivotal columns in B. Denote the n − r non-pivotal columns in B as v1, . . . , v_{n−r}. In B, the columns e1, . . . , er, v1, . . . , v_{n−r} occur in some order. The following observations are immediate from the above equations.

Observation 1: If ei occurs as the jth column in B, then Eu j = ei .

Observation 2: If vi occurs as the jth column in B, then Eu j = vi .

Notice that in B, the vectors e1 , . . . , er occur in that order, though some other vectors vi may occur between them; look at Example 1.5. If vi occurs between e j and

e j+1 , then vi has zero entries beyond the jth position. We then observe the following.

Observation 3: In B, if a vector vi occurs between the standard basis vectors e_j and e_{j+1}, then vi = [a1 a2 · · · aj 0 · · · 0]^t = a1 e1 + · · · + aj e_j for some a1, . . . , aj ∈ F.

If e1 occurs as the k1th column, e2 occurs as the k2th column, and so on, then vi = a1 Eu_{k1} + · · · + aj Eu_{kj}. That is,

E^{−1} vi = a1 u_{k1} + · · · + aj u_{kj}.

However, E^{−1} vi is just the corresponding column of A. Thus we observe the following.

However, E 1 v j is the corresponding column of A. Thus we observe the following.

Observation 4: In B, if a vector vi = [a1 a2 · · · aj 0 · · · 0]^t occurs as the kth column, and prior to it occur the standard basis vectors e1, . . . , ej (and no others) in the columns k1, . . . , kj, respectively, then uk = a1 u_{k1} + · · · + aj u_{kj}.

It thus follows that each column of A can be written as b1 u_{k1} + · · · + br u_{kr} for some b1, . . . , br ∈ F, where k1, . . . , kr are all the column indices of the pivotal columns in B.

Thus we have the following observation.

Observation 5: If v is any vector expressible in the form v = α1 u1 + · · · + αn un, and k1, . . . , kr are all the column indices of the pivotal columns in B, then there are scalars β1, . . . , βr ∈ F such that v = β1 u_{k1} + · · · + βr u_{kr}. Moreover, u_{ki} is not expressible in the form β1 u_{k1} + · · · + β_{i−1} u_{k_{i−1}} + β_{i+1} u_{k_{i+1}} + · · · + βr u_{kr} for any βj ∈ F.

Notice that v = β1 u_{k1} + · · · + βr u_{kr} = β1 E^{−1} e1 + · · · + βr E^{−1} er. Since e_{r+1} is not expressible in the form γ1 e1 + · · · + γr er, we see that E^{−1} e_{r+1} is not expressible in the form α1 u1 + · · · + αn un.

Observation 6: If the number of pivots r in B is less than m, then the vector E^{−1} e_{r+k}, for 1 ≤ k ≤ m − r, is not expressible in the form α1 u1 + · · · + αn un.

In B, the m − r bottom rows are zero rows. They have been obtained from the pivotal rows by elementary row operations. Monitoring the row exchanges that have been applied on A to reach B, we see that the zero rows correspond to some m − r

rows of A. Therefore, similar to Observation 5, we find the following.

Observation 7: Let w_{k1}, . . . , w_{kr} be the rows of A which have become the pivotal rows in B. If w is any other row of A, then there exist scalars α1, . . . , αr such that w = α1 w_{k1} + · · · + αr w_{kr}. Moreover, w_{ki} ≠ β1 w_{k1} + · · · + β_{i−1} w_{k_{i−1}} + β_{i+1} w_{k_{i+1}} + · · · + βr w_{kr} for any βj ∈ F.

For vectors in F^n, we say that v is a linear combination of v1, . . . , vm if there exist scalars ai ∈ F such that v = a1 v1 + · · · + am vm. Suppose the number of pivots in the RREF of A ∈ F^{m×n} is r. Then Observations 5 and 7 imply that there exist exactly r columns in A such that each of the other n − r columns is a linear combination of these r columns, and none of these r columns is a linear combination of the other r − 1 such columns. These r columns correspond to the pivotal columns in the RREF of A. Similarly, there exist r rows of A such that each of the other m − r rows is a linear combination of these r rows, and none of these r rows is a linear combination of the other r − 1 such rows. Again, these r rows correspond to the nonzero rows in the RREF of A, monitoring the row exchanges.

The row reduced echelon form of a matrix is canonical, in the following sense.

Theorem 1.1

Let A ∈ F^{m×n}. There exists a unique matrix in F^{m×n} in row reduced echelon form obtained from A by elementary row operations.

Proof: Suppose B, C ∈ F^{m×n} are matrices in RREF such that each has been obtained from A by elementary row operations. Then B = E1 A and C = E2 A for some invertible matrices E1, E2 ∈ F^{m×m}. Now, B = E1 A = E1 (E2)^{−1} C. Write E = E1 (E2)^{−1} to have B = EC, where E is invertible.

Assume, on the contrary, that B ≠ C. Then there exists a column index, say k ≥ 1, such that the first k − 1 columns of B coincide with the first k − 1 columns of C, respectively; and the kth column of B is not equal to the kth column of C. Let u be the kth column of B, and let v be the kth column of C. We have u = Ev and u ≠ v.

Suppose the pivotal columns that appear within the first k − 1 columns in C, and also in B, are e1, . . . , ej. Since B = EC, we have

e1 = Ee1 = E^{−1}e1, . . . , ej = Eej = E^{−1}ej.

Since B is in RREF, either u = e_{j+1} or u = α1 e1 + · · · + αj ej for some scalars α1, . . . , αj. The latter case includes the possibility that u = 0. (If none of the first k columns in B is a pivotal column, we take u = 0.) Similarly, C is in RREF implies that either v = e_{j+1} or v = γ1 e1 + · · · + γj ej for some scalars γ1, . . . , γj. We consider the following exhaustive cases.

If u = e_{j+1} and v = e_{j+1}, then u = v.

If v = γ1 e1 + · · · + γj ej (and whether u = e_{j+1} or u = α1 e1 + · · · + αj ej), then u = Ev = γ1 Ee1 + · · · + γj Eej = γ1 e1 + · · · + γj ej = v.

If u = α1 e1 + · · · + αj ej and v = e_{j+1}, then v = E^{−1}u = α1 E^{−1}e1 + · · · + αj E^{−1}ej = α1 e1 + · · · + αj ej = u.

In each case, u = v; and this is a contradiction. Therefore, B = C.

Theorem 1.1 justifies our use of the term the RREF of a matrix. Given a matrix,

it does not matter whether you compute its RREF by following our algorithm or any

other algorithm; the end result is the same matrix in RREF.

1. Compute row reduced echelon forms of the following

matrices:

1 2 1 1

0 0 1

2 1 1 0

0 2 3 3

1 1 3 4

1 0 0

0 1 1 4

1 1 5 2

2. Argue why our algorithm for reducing a matrix to its RREF gives a unique

output.

3. In Example 1.5, let ui be the ith column and let w j be the jth row of A.

(a) Compute the matrix X so that XA is in RREF.

(b) Verify that Xu2 = e2 .

(c) Find a, b ∈ R such that u3 = a u1 + b u2, using the RREF of A.

(d) Determine a, b, c ∈ R such that w4 = a w1 + b w2 + c w3, using the RREF reduction of A.

4. Construct v ∈ R^4 which is not expressible as a v1 + b v2 + c v3 + d v4 for any a, b, c, d ∈ R, where v1 = (1, 2, 3, 4), v2 = (2, 0, 1, 1), v3 = (3, 2, 1, 2) and v4 = (1, 2, 2, 3). (Hint: take A = [v1^t v2^t v3^t v4^t]. Compute its RREF and use Observation 6.)

1.6  Determinant

There are two important quantities associated with a square matrix. One is the trace

and the other is the determinant.

The sum of all diagonal entries of a square matrix is called the trace of the matrix.

That is, if A = [aij] ∈ F^{m×m}, then

tr(A) = Σ_{k=1}^{m} a_{kk}.

The trace satisfies the following properties, for A, B ∈ F^{m×m} and α ∈ F:

1. tr(αA) = α tr(A) for each α ∈ F.

2. tr(A^t) = tr(A) and tr(A∗) = the complex conjugate of tr(A).

3. tr(A + B) = tr(A) + tr(B) and tr(AB) = tr(BA).

4. tr(A∗A) = 0 iff tr(AA∗) = 0 iff A = 0.

The first three properties are easy to verify. For the fourth, notice that tr(A∗A) = Σ_{i} Σ_{j} |aij|² = tr(AA∗). From this, (4) follows.

The second quantity, called the determinant of a square matrix A = [aij] ∈ F^{n×n}, written as det(A), is defined inductively as follows:

If n = 1, then det(A) = a11.

If n > 1, then det(A) = Σ_{j=1}^{n} (−1)^{1+j} a_{1j} det(A_{1j}),

where the matrix A_{1j} ∈ F^{(n−1)×(n−1)} is obtained from A by deleting the first row and the jth column of A.

When A = [ai j ] is written showing all its entries, we also write det(A) by replacing

the two big closing brackets [ and ] by two vertical bars | and |. For a 2 × 2 matrix, its determinant is seen as follows:

$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = (-1)^{1+1} a_{11}\det[a_{22}] + (-1)^{1+2} a_{12}\det[a_{21}] = a_{11}a_{22} - a_{12}a_{21}.$$

Similarly, for a 3 × 3 matrix, we need to compute three 2 × 2 determinants. For example,

$$\det\begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 3 & 1 & 2 \end{bmatrix} = (-1)^{1+1}\,1\begin{vmatrix} 3 & 1\\ 1 & 2 \end{vmatrix} + (-1)^{1+2}\,2\begin{vmatrix} 2 & 1\\ 3 & 2 \end{vmatrix} + (-1)^{1+3}\,3\begin{vmatrix} 2 & 3\\ 3 & 1 \end{vmatrix}$$

$$= 1\,(3\cdot 2 - 1\cdot 1) - 2\,(2\cdot 2 - 1\cdot 3) + 3\,(2\cdot 1 - 3\cdot 3) = 5 - 2\cdot 1 + 3\cdot(-7) = -18.$$
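The inductive definition can be typed in directly. Here is a small recursive sketch of ours (exponential in n, so only for illustration; NumPy's np.linalg.det is the practical tool):

```python
import numpy as np

def det(A):
    """Determinant by cofactor expansion along the first row, exactly as in the
    definition det(A) = sum over j of (-1)**(1+j) * a_1j * det(A_1j)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(A[1:, :], j, axis=1)   # delete first row and column j
        total += (-1) ** j * A[0, j] * det(minor)
    return total

A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
print(det(A), np.linalg.det(A))   # both are -18 (up to rounding)
```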


For a lower triangular matrix, we see that

$$\begin{vmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & & \\ a_{32} & a_{33} & \\ \vdots & & \ddots\\ a_{n2} & \cdots & a_{nn} \end{vmatrix} = \cdots = a_{11}\,a_{22}\cdots a_{nn}.$$

In general, the determinant of any triangular matrix (upper or lower) is the product of its diagonal entries. In particular, the determinant of a diagonal matrix is also the product of its diagonal entries. Thus, if I is the identity matrix of order n, then det(I) = 1 and det(−I) = (−1)^n.

Our definition of determinant expands the determinant in the first row. In fact, the

same result may be obtained by expanding it in any other row, or even any other

column. Along with this, some more properties of the determinant are listed in the

following.

Let A ∈ F^{n×n}. The sub-matrix of A obtained by deleting the ith row and the jth column is called the (i, j)th minor of A, and is denoted by A_{ij}. The (i, j)th co-factor of A is (−1)^{i+j} det(A_{ij}); it is denoted by C_{ij}(A). Sometimes, when the matrix A is fixed in a context, we write C_{ij}(A) as C_{ij}. The adjugate of A is the n × n matrix obtained by taking the transpose of the matrix whose (i, j)th entry is C_{ij}(A); it is denoted by adj(A). That is, adj(A) ∈ F^{n×n} is the matrix whose (i, j)th entry is the (j, i)th co-factor C_{ji}(A). Also, we write A_i(x) for the matrix obtained from A by replacing its ith row by a row vector x of appropriate size.

Let A ∈ F^{n×n}. Let i, j, k ∈ {1, . . . , n}. Let E[i, j], E_α[i] and E_α[i, j] be the elementary matrices of order n with 1 ≤ i ≠ j ≤ n and α ≠ 0, a scalar. Then the following statements are true.

1. det(E[i, j] A) = −det(A).

2. det(E_α[i] A) = α det(A).

3. det(E_α[i, j] A) = det(A).

4. If some row of A is the zero vector, then det(A) = 0.

5. If one row of A is a scalar multiple of another row, then det(A) = 0.

6. For any i ∈ {1, . . . , n}, det( A_i(x + y) ) = det( A_i(x) ) + det( A_i(y) ).

7. det(A^t) = det(A).

8. If A is a triangular matrix, then det(A) is equal to the product of the diagonal entries of A.

9. det(AB) = det(A) det(B) for any matrix B ∈ F^{n×n}.

11. A adj(A) = adj(A) A = det(A) I.

12. A is invertible iff det(A) ≠ 0.

Elementary column operations are operations similar to row operations, but with

columns instead of rows. Notice that since det(At ) = det(A), the facts concerning

elementary row operations also hold true if elementary column operations are used.

Using elementary operations, the computational complexity for evaluating a determinant can be reduced drastically. The trick is to bring a matrix to a triangular form

by using elementary row operations, so that the determinant of the triangular matrix

can be computed easily.

Example 1.6

1 0 0

1 1 0

1 1 1

1 1 1

1 0 0

1

1 R1 0 1 0

=

0 1 1

1

0 1 1

1

1

1

2 R2 0

=

0

2

0

2

0 0

1 0

0 1

0 1

1

1

2 R3 0

=

0

4

0

4

0

1

0

0

0

0

1

0

1

2

= 8.

4

8

Here, R1 = E1 [2, 1]; E1 [3, 1]; E1 [4, 1], R2 = E1 [3, 2]; E1 [4, 2], and R3 = E1 [4, 3].

Finally, the upper triangular matrix has the required determinant.

Example 1.7

See that the following is true, for verifying Property (6) as mentioned above:

3 1 2 4 1 0 0 1 2 1 2 3

1 1 0 1 1 1 0 1 1 1 0 1

+

=

1 1 1 1 1 1 1 1 1 1 1 1 .

1 1 1 1 1 1 1 1 1 1 1 1

1. Construct an n × n nonzero matrix where no row is a scalar multiple of another row, but its determinant is 0.

2. Let A ∈ C^{n×n}. Show that if tr(A∗A) = 0, then A = 0.

3. Let a1, . . . , an ∈ C. Let A be the n × n matrix whose first row has all entries as 1 and whose kth row has the entries a1^{k−1}, . . . , an^{k−1}, in that order. Show that det(A) = Π_{i<j} (a_j − a_i).

4. Let A be an n × n matrix with integer entries. Prove that if det(A) = 1, then A^{−1} has only integer entries.


5. Determine

A1

1 0 0

1 1 0

using the adj(A), where A =

1 1 1

1 1 1

1

.

..

1

1

1

.

1

1

where the anti-diagonal entries are all 1 and all other entries are 0.

1.7  Computing inverse of a matrix

The adjugate property of the determinant provides a way to compute the inverse

of a matrix, provided it is invertible. However, it is very inefficient. We may use

elementary row operations to compute the inverse. Our computation of the inverse

is based on the following fact.

Theorem 1.2

A square matrix is invertible iff it is a product of elementary matrices.

Proof: Each elementary matrix is invertible, since E[i, j] is its own inverse, E_{1/α}[i] is the inverse of E_α[i], and E_{−α}[i, j] is the inverse of E_α[i, j]. Therefore, any product of elementary matrices is invertible.

Conversely, suppose that A is an invertible matrix. Let EA^{−1} be the RREF of A^{−1}, where E is a product of elementary matrices. If EA^{−1} has a zero row, then EA^{−1}A also has a zero row. That is, E has a zero row. But E is a product of elementary matrices, which is invertible; it does not have a zero row. Therefore, EA^{−1} does not have a zero row. Then each row in the square matrix EA^{−1} has a pivot. But the only square matrix in RREF having a pivot in each row is the identity matrix. Therefore, EA^{−1} = I. That is, A = E, a product of elementary matrices.

The computation of inverse will be easier if we write the matrix A and the identity

matrix I side by side and apply the elementary operations on both of them simultaneously. For this purpose, we introduce the notion of an augmented matrix.

If A ∈ F^{m×n} and B ∈ F^{m×k}, then the matrix [A|B] ∈ F^{m×(n+k)} obtained from A and B by writing first all the columns of A and then the columns of B, in that order, is called an augmented matrix.


The vertical bar shows the separation of columns of A and of B, though, conceptually unnecessary.

For computing the inverse of a matrix, start with the augmented matrix [A|I]. Apply elementary row operations for reducing A to its row reduced echelon form, while

simultaneously applying the same operations on the entries of I. This means we premultiply the matrix [A|I] with a product B of elementary matrices. In block form, our

result is the augmented matrix [BA|BI]. If BA = I, then BI = A^{−1}. That is, the part that originally contained I will give the matrix A^{−1} after the elementary row operations have been applied. If, after the row reduction, it turns out that BA ≠ I, then A is not invertible; this information is a bonus.
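The [A | I] procedure is easy to code on top of the rref sketch given earlier; here is an illustration of ours (again assuming NumPy), not the notes' own program. For numerical stability it picks the largest available pivot in each column, a slight departure from the text's "topmost nonzero entry" rule.

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Row-reduce the augmented matrix [A | I]; if the left block becomes I,
    the right block is the inverse of A, otherwise A is not invertible."""
    A = np.array(A, dtype=float)
    m = A.shape[0]
    aug = np.hstack([A, np.eye(m)])
    row = 0
    for col in range(m):
        p = row + np.argmax(np.abs(aug[row:, col]))
        if abs(aug[p, col]) < tol:
            continue
        aug[[row, p]] = aug[[p, row]]
        aug[row] /= aug[row, col]
        for r in range(m):
            if r != row:
                aug[r] -= aug[r, col] * aug[row]
        row += 1
    left, right = aug[:, :m], aug[:, m:]
    if not np.allclose(left, np.eye(m)):
        return None                      # a zero row appeared: A is not invertible
    return right

A = [[1, 1], [1, -1]]
Ainv = inverse_by_row_reduction(A)
print(Ainv)                                          # [[0.5, 0.5], [0.5, -0.5]]
print(np.allclose(np.array(A) @ Ainv, np.eye(2)))    # True
```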

Example 1.8

For illustration, consider the following square matrices:

1 1 2 0

1 0 0 2

A=

2 1 1 2 ,

1 2 4 2

1 1 2 0

1 0 0 2

B=

2 1 1 2 .

0 2 0 2

We want to find the inverses of the matrices, if at all they are invertible.

Augment A with an identity matrix to get

1 1 2 0

1 0 0 2

2 1 1 2

1 2 4 2

1

0

0

0

0

1

0

0

0

0

1

0

0

0

.

0

1

To zero-out the other entries in the first column, we use the sequence of

elementary row operations E1 [2, 1], E2 [3, 1], E1 [4, 1] to obtain

1 1 2 0

1

0 1 2 2

1

0 3 5 2 2

0 1 2 2 1

0

1

0

0

0

0

1

0

0

0

.

0

1

1 1 2 0

1 0 0 0

0 1 2 2 1 1 0 0

0 3 5 2 2 0 1 0 .

0 1 2 2 1 0 0 1

Use E1 [1, 2], E3 [3, 2], E1 [4, 2] to zero-out all non-pivot entries in the pivotal


column to 0:

1

0 0 2

0 1

0 1 2 2 1 1

0 0 1

4

1 3

0 0 0 0 2 1

0

.

0

1

0

0

1

0

Since a zero row has appeared in the A portion of the augmented matrix,

we conclude that A is not invertible. You see that the second portion of the

augmented matrix has no meaning now. However, it records the elementary

row operations which were carried out in the reduction process. Verify that

this matrix is equal to

E1 [4, 2] E3 [3, 2] E1 [1, 2] E1 [2] E1 [4, 1] E2 [3, 1] E1 [2, 1]

and that the first portion is equal to this matrix times A.

For B, we proceed similarly. The augmented matrix [B|I] with the first pivot

looks like:

1 1 2 0

1 0 0 2

2 1 1 2

0 2 0 2

1

0

0

0

0

1

0

0

0

0

1

0

0

0

.

0

1

1 1 2 0

1 0

0 1 2 2

1 1

0 3 5 2 2 0

0 2 0 2

0 0

0

0

1

0

0

0

.

0

1

Next, the pivot is 1 in (2, 2) position. Use E1 [2] to get the pivot as 1.

1 1 2 0

1 0 0 0

0 1 2 2 1 1 0 0

0 3 5 2 2 0 1 0 .

0 0 0 1

0 2 0 2

1

0

0

0

0 0 2 0 1

1 2 2 1 1

0 1 4 1 3

0 4 2 2 2

0

0

1

0

0

0

.

0

1

1

0

0

0

0

1

0

0

0 2 0 1

0 6 1 5

1

4 1 3

0 14 2 10

0

2

1

4

0

0

0

1


Next pivot is 14 in (4, 4) position. Use [4; 1/14] to get the pivot as 1:

1

0

0

0

0

1

0

0

0 2

0 1

0

0

0 6

1

5

2

0

1

4

1

3

1

0

0 1 1/7 5/7 2/7 1/14

Use E2 [1, 4]; E6 [2, 4]; E4 [3, 4] to zero-out the entries in the pivotal column:

1

1

Thus B1 =

73

1

1

0

0

0

0

1

0

0

0

0

1

0

0

0

0

1

2/7

1/7

3/7

1/7

5/7 2/7 3/7

1/7 1/7 2/7

5/7 2/7 1/14

3 4 1

5 2 3

. Verify that B1 B = BB1 = I.

1 1 2

5 2 21

Observe that if a matrix is not invertible, then our algorithm for reduction to RREF

produces a pivot in the I portion of the augmented matrix.

1. Compute the inverses of the following matrices, if possible:

3 1 1 2

2 1 2

1 4 6

1 2 0 1

1 1 2 1

1 1 2

1 2 3

2 1 1 3

0 1 0

2. Let A = 0 0 1 , where b, c C. Show that A1 = bI + cA.

1 b c

3. Show that if a matrix A is upper triangular and invertible, then so is A^{−1}.

4. Show that if a matrix A is lower triangular and invertible, then so is A^{−1}.

5. Show that every n × n matrix can be written as a sum of two invertible matrices.

6. Show that every n × n invertible matrix can be written as a sum of two non-invertible matrices.

2  Rank and Linear Equations

2.1  Linear independence

In the reduction to RREF, why are some rows reduced to zero rows while others are not? To see what is going on in such a reduction, we need to introduce some more concepts. If A is an m × n matrix with entries from F, then its rows are vectors in F^{1×n} and its columns are vectors in F^{m×1}. Recall that, to talk about row and column vectors at the same time, we write F^n for both F^{1×n} and F^{n×1}. The elements of F^n are written as (a1, . . . , an). That is, such an n-tuple of numbers from F is interpreted as either a row vector with n components or a column vector with n components, as the case demands.

Let v1, . . . , vm be vectors in F^n. Let α1, . . . , αm ∈ F be scalars. Recall that the vector

α1 v1 + · · · + αm vm

is called a linear combination of v1, . . . , vm.

For example, in F^{1×2}, one linear combination of v1 = [1, 1] and v2 = [1, −1] is as follows:

2[1, 1] + 1[1, −1].

This linear combination evaluates to [3, 1]. Thus [3, 1] is a linear combination of v1, v2.

Is [4, −2] a linear combination of v1 and v2? Yes, since

[4, −2] = 1[1, 1] + 3[1, −1].

In fact, every vector in F^{1×2} is a linear combination of v1 and v2. Reason:

[a, b] = ((a + b)/2) [1, 1] + ((a − b)/2) [1, −1].

However, not every vector in F^{1×2} is a linear combination of [1, 1] and [2, 2]. Reason? Any linear combination of these two vectors is a multiple of [1, 1]. Then [1, 0] is not a linear combination of these two vectors.

Now, you see that a zero row in a row echelon form matrix is a linear combination

of earlier rows. Conversely, if a row is a linear combination of earlier rows in any

matrix, then in the RREF of the matrix, this row is reduced to a zero row. However,

during the reduction process, there can be row exchanges. In that case, instead of


talking about a linear combination of earlier rows, we may think of a linear combination of all other rows.

The vectors v1 , . . . , vm in Fn are called linearly dependent iff at least one of them

is a linear combination of others. The vectors are called linearly independent iff

none of them is a linear combination of others.

For example, [1, 1], [1, 1], [4, 1] are linearly dependent vectors whereas [1, 1],

[1, 1] are linearly independent vectors in F12 .

If 1 = = m = 0, then, the linear combination 1 v1 + + m vm evaluates

to 0. That is, the zero vector can always be written as a trivial linear combination.

Suppose the vectors v1 , . . . , vm are linearly dependent. Then one of them, say, vi is

a linear combination of others. That is,

vi = 1 v1 + + i1 vi1 + i+1 vi+1 + + m vm .

Then

1 v1 + + i1 vi1 + (1)vi + i+1 vi+1 + + m vm = 0.

Here, we see that a linear combination becomes zero, where at least one of the coefficients, that is, the ith one is nonzero.

Conversely, suppose that we have scalars α1, . . . , αm, not all zero, such that

α1 v1 + · · · + αm vm = 0.

Suppose that the kth scalar αk is nonzero. Then

vk = −(1/αk) (α1 v1 + · · · + α_{k−1} v_{k−1} + α_{k+1} v_{k+1} + · · · + αm vm).

Thus we have proved the following:

v1, . . . , vm are linearly dependent
iff α1 v1 + · · · + αm vm = 0 for some scalars α1, . . . , αm, not all zero;
iff the zero vector can be written as a non-trivial linear combination of v1, . . . , vm.

The same may be expressed in terms of linear independence.

Theorem 2.1

The vectors v1, . . . , vm ∈ F^n are linearly independent iff for all α1, . . . , αm ∈ F, α1 v1 + · · · + αm vm = 0 implies that α1 = · · · = αm = 0.

Theorem 2.1 provides a way to determine whether a finite number of vectors are

linearly independent or not. You start with a linear combination of the given vectors;

and equate it to 0. Then you must be able to derive that each coefficient in that linear

combination is 0. If this is the case, then the given vectors are linearly independent.


If it is not possible, then from its proof you must be able to find a way of expressing

one of the vectors as a linear combination of the others, showing that the vectors are

linearly dependent.

Example 2.1

Are the vectors [1, 1, 1], [2, 1, 1], [3, 1, 0] linearly independent?

We start with an arbitrary linear combination and equate it to the zero vector.

Solve the resulting linear equations to determine whether all the coefficients

are necessarily 0 or not.

So, let

a[1, 1, 1] + b[2, 1, 1] + c[3, 1, 0] = [0, 0, 0].

Comparing the components, we have

a + 2b + 3c = 0, a + b + c = 0, a + b = 0.

The last two equations imply that c = 0. Substituting in the first, we see that

a + 2b = 0.

This and the equation a + b = 0 give b = 0. Then it follows that a = 0.

We conclude that the given vectors are linearly independent.

Example 2.2

Are the vectors [1, 1, 1], [2, 1, 1], [3, 2, 2] linearly independent?

Clearly, the third one is the sum of the first two. So, the given vectors are

linearly dependent.

To illustrate our method, we start with an arbitrary linear combination and

equate it to the zero vector. We then solve the resulting linear equations to

determine whether all the coefficients are necessarily 0 or not.

So, as earlier, let

a[1, 1, 1] + b[2, 1, 1] + c[3, 2, 2] = [0, 0, 0].

Comparing the components, we have

a + 2b + 3c = 0, a + b + 2c = 0, a + b + 2c = 0.

The last equation is redundant. From the first and the second, we have

b + c = 0.

We may choose b = 1, c = −1 to satisfy this equation. Then from the second

equation, we have a = 1. Our starting equation says that the third vector is

the sum of the first two.
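The same check, namely setting a linear combination to zero and seeing whether only the trivial coefficients work, can be done mechanically: the vectors are independent exactly when the matrix having them as rows has rank equal to the number of vectors. A small NumPy illustration of ours (rank is discussed formally later in these notes):

```python
import numpy as np

def linearly_independent(vectors):
    """Vectors (given as rows) are independent iff the matrix they form
    has rank equal to the number of vectors."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

print(linearly_independent([[1, 1, 1], [2, 1, 1], [3, 1, 0]]))   # True  (Example 2.1)
print(linearly_independent([[1, 1, 1], [2, 1, 1], [3, 2, 2]]))   # False (Example 2.2)
```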


Be careful with the direction of implication here. Your work-out must be in the

form

α1 v1 + · · · + αm vm = 0  ⇒  α1 = · · · = αm = 0.

And that would prove linear independence.

To see how linear independence is used, consider the following system of linear equations:

x1 + 2x2 − 3x3 = 2
2x1 − x2 + 2x3 = 3
4x1 + 3x2 − 4x3 = 7

Here, we find that the third equation is redundant, since 2 times the first plus the

second gives the third. That is, the third one linearly depends on the first two. (You

can of course choose any other equation here as linearly depending on other two, but

that is not important.) Now, take the row vectors of coefficients of the unknowns as

in the following:

v1 = [1, 2, −3, 2],   v2 = [2, −1, 2, 3],   v3 = [4, 3, −4, 7].

We see that v3 = 2v1 + v2 , as it should be. We see that the vectors v1 , v2 , v3 are

linearly dependent. But the vectors v1 , v2 are linearly independent. Thus, solving the

given system of linear equations is the same thing as solving the system with only

first two equations. For solving linear systems, it is of primary importance to find out

which equations linearly depend on others. Once determined, such equations can be

thrown away, and the rest can be solved.

1. Check whether the given vectors are linearly independent, in each case:

(a) (1, 2, 6), (1, 3, 4), (1, 4, 2) in R^3.

(b) (1, 0, 2, 1), (1, 3, 2, 1), (4, 1, 2, 2) in C^4.

2. Suppose that u, v, w are linearly independent in C^5. Are the following lists of vectors linearly independent?

(a) u, v + αw, w, where α is a nonzero complex number.

(b) u + v, v + w, w + u.

(c) u − v, v − w, w − u.

3. Give three linearly dependent vectors in R^2 such that none of the three is a scalar multiple of another.

4. Suppose S is a set of vectors and some v ∈ S is not a linear combination of the other vectors in S. Is S linearly independent?

5. Prove that the nonzero vectors v1, . . . , vm ∈ F^n are linearly dependent iff there exists a vector vk which is a linear combination of v1, . . . , v_{k−1}.

2.2  Determining linear independence

We may use elementary row operations to check linear independence. Given m row vectors v1, . . . , vm ∈ F^{1×n}, we form a matrix A with its ith row as vi. Then, using elementary row operations, we bring it to its RREF. Observe that exchanging vi with vj in the list of vectors does not change linear independence of the vectors. Multiplying vi by a nonzero scalar does not affect linear independence. Also, replacing vi with vi + αvj does not alter linear independence.

To see the last one, suppose v1, . . . , vm are linearly independent. Let wi = vi + αvj, i ≠ j. To show the linear independence of v1, . . . , v_{i−1}, wi, v_{i+1}, . . . , vm, suppose that

β1 v1 + · · · + β_{i−1} v_{i−1} + βi wi + β_{i+1} v_{i+1} + · · · + βm vm = 0.

Then

β1 v1 + · · · + β_{i−1} v_{i−1} + βi (vi + αvj) + β_{i+1} v_{i+1} + · · · + βm vm = 0.

Simplifying, we have

β1 v1 + · · · + βi vi + · · · + (βj + αβi) vj + · · · + βm vm = 0.

Using the linear independence of v1, . . . , vm, we obtain

β1 = · · · = βi = · · · = βj + αβi = · · · = βm = 0.

This gives βj = βi = 0, and all other β's are zero. Thus v1, . . . , wi, . . . , vm are linearly independent. Similarly, the converse also holds.

Thus, we take these vectors as the rows of a matrix and apply our reduction to

RREF algorithm. From the RREF, we know that all rows where a pivot occurs are

linearly independent. If you want to determine exactly which vectors among these

are linearly independent, you must keep track of the row exchanges. A summary of

the discussion in terms of a matrix is as follows.

Theorem 2.2

Let v1, . . . , vm ∈ F^{1×n}. Let A ∈ F^{m×n} be the matrix whose jth row is vj. Then v1, . . . , vm are linearly independent iff the RREF of A has no zero row.

Example 2.3

To determine whether the vectors [1, 1, 0, 1], [0, 1, 1, 1] and [1, 3, 2, 1] are

linearly independent or not, we proceed as follows.


1

0

1

1

1

3

0

1

1 1

2 1

E1 [3,1]

R2

1

0

0

0

0

0

1

1

R1

1 1 0

2 2

0

0 1 0

1 1

0

0 0 1

1

1

2

0 1

2

1 1 1

0 0 4

Here, R1 = E1 [1, 2], E2 [3, 2] and R2 = E1/4 [3], E2 [1, 3], E1 [2, 3].

The last matrix is in RREF in which each row has a pivot. Therefore, the

original vectors are linearly independent.

Though we have formulated Theorem 2.2 for row vectors. It is applicable for

column vectors as well. All that we do is start with the transposes of the column

vectors and apply the theorem.

Example 2.4

Are the vectors [1, 1, 0, −1]^t, [0, 1, 1, 1]^t and [2, −1, −3, −5]^t linearly independent?

The vectors are in F^{4×1}. These are linearly independent iff their transposes are. Forming a matrix with the transposes of the given vectors as its rows, and reducing it to its RREF, we see that

$$\begin{bmatrix} 1 & 1 & 0 & -1\\ 0 & 1 & 1 & 1\\ 2 & -1 & -3 & -5 \end{bmatrix} \xrightarrow{E_{-2}[3,1]} \begin{bmatrix} 1 & 1 & 0 & -1\\ 0 & 1 & 1 & 1\\ 0 & -3 & -3 & -3 \end{bmatrix} \xrightarrow{R_1} \begin{bmatrix} 1 & 0 & -1 & -2\\ 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

Here, R1 = E_{−1}[1, 2], E_{3}[3, 2]. Since a zero row has appeared, the original vectors are linearly dependent. Also, notice that no row exchanges were carried out in the reduction process. Therefore, the third vector is a linear combination of the first two vectors, which are linearly independent.

Due to our observations on RREF, we may follow another alternative. Instead of

taking transposes of the given column vectors, we proceed with the vectors themselves; thus forming a matrix with the columns as given vectors. In the RREF of the

matrix, if we see that each column is a pivotal column, then the vectors are linearly

independent. Moreover, if a column remains non-pivotal, then such a non-pivotal column is a linear combination of the pivotal columns that precede it. The entries of that column in the RREF give the coefficients of such a linear combination.

We solve Example 2.4 once more for illustrating this point.

Example 2.5

To determine whether v1 = [1, −1, 0, 1]t, v2 = [0, 1, −1, 1]t and v3 = [2, 1, −3, 5]t are linearly independent or not, we form the matrix [v1 v2 v3] and then reduce it to its RREF. It is as follows.

    [ 1  0  2]          [1  0  2]          [1 0 2]
    [-1  1  1]  --R1->  [0  1  3]  --R2->  [0 1 3]
    [ 0 -1 -3]          [0 -1 -3]          [0 0 0]
    [ 1  1  5]          [0  1  3]          [0 0 0]

Here, R1 = E_1[2,1], E_{-1}[4,1] and R2 = E_1[3,2], E_{-1}[4,2]. The third column is non-pivotal. Thus, the corresponding vector v3 is a linear combination of the vectors that correspond to the pivotal columns, that is, of v1 and v2. Moreover, the entries in the non-pivotal column say that v3 = 2 v1 + 3 v2. You can easily verify it.
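For exact arithmetic one may use SymPy's rref, which also reports the pivot columns; the entries of a non-pivotal column of the RREF are precisely the coefficients that express the corresponding vector in terms of the pivotal ones. A small sketch, assuming SymPy is available:

    from sympy import Matrix

    v1, v2, v3 = Matrix([1, -1, 0, 1]), Matrix([0, 1, -1, 1]), Matrix([2, 1, -3, 5])
    A = Matrix.hstack(v1, v2, v3)      # columns are v1, v2, v3

    R, pivots = A.rref()
    print(pivots)              # (0, 1): the first and second columns are pivotal
    print(R[:, 2])             # Matrix([[2], [3], [0], [0]]), so v3 = 2*v1 + 3*v2
    print(2*v1 + 3*v2 == v3)   # True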

1. Using elementary row operations determine whether the given vectors are linearly dependent or independent in each of the following cases.

(a) [1, 0, 1, 2, 3], [2, 1, 2, 4, 1], [3, 0, 1, 1, 1], [2, 1, 1, 1, 2].

(b) [1, 0, 1, 2, 3], [2, 1, 2, 4, 1], [3, 0, 1, 1, 1], [2, 1, 0, 7, 3].

(c) [1, i, 1, 1 i], [i, 1, i, 1 + i], [2, 0, 1, i], [1 + i, 1 i, 1, i].

2. Let V = R3 ; A = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}. Determine whether A is linearly

dependent and if it is, express one of the vectors in A as a linear combination

of the remaining vectors.

2.3

Rank of a matrix

Suppose B is the RREF of a matrix A. Keeping track of the row exchanges, suppose

that the jth row of A has become a zero row in B. In that case, the jth row of A

is a linear combination of other rows. Conversely, if the jth row of A is a linear

combination of other rows, then in B, this row becomes a zero row. If B has r number

of pivots, then A has r number of linearly independent rows and other rows are linear

combinations of these r rows.

We define the rank of A as the number of pivots in the RREF of A, and denote it

by rank(A).

For an m × n matrix A, the number n − rank(A) is called the nullity of the matrix A. The nullity of A is the number of un-pivoted columns in the RREF of A. We will connect

the nullity of a matrix to the solutions of the homogeneous linear system Ax = 0 later.


Example 2.6

Let A be the matrix shown below. We compute its RREF as follows:

    [1 1 1 2 1]          [1  1 1  2 1]          [1 0 1  3 1]
    [1 2 1 1 1]  --R1->  [0  1 0 -1 0]  --R2->  [0 1 0 -1 0]
    [3 5 3 4 3]          [0  2 0 -2 0]          [0 0 0  0 0]
    [1 0 1 3 1]          [0 -1 0  1 0]          [0 0 0  0 0]

Here, R1 is E_{-1}[2,1], E_{-3}[3,1], E_{-1}[4,1] and R2 is E_{-1}[1,2], E_{-2}[3,2], E_1[4,2].

Thus rank(A) = 2.
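A quick numerical cross-check of this computation (a sketch, not part of the notes; NumPy is assumed):

    import numpy as np

    A = np.array([[1, 1, 1, 2, 1],
                  [1, 2, 1, 1, 1],
                  [3, 5, 3, 4, 3],
                  [1, 0, 1, 3, 1]], dtype=float)

    print(np.linalg.matrix_rank(A))   # 2, the number of pivots in the RREF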

Example 2.7

Determine the rank of the matrix A in Example 1.5, and point out which rows of A are linear combinations of other rows, and which columns are linear combinations of other columns, by reducing A to its RREF.

From Example 1.5, we have seen that

    A = [1 1 2 0]          [1 0 3/2 0]
        [3 5 7 1]  --E-->  [0 1 1/2 0]
        [1 5 4 5]          [0 0  0  1]
        [2 8 7 9]          [0 0  0  0]

where E denotes the sequence of elementary row operations used in Example 1.5.

We see that rank(A) = 3, the number of pivots in the RREF of A. In this reduction, no row exchanges have been used. Thus the first three rows of A are the required rows. The fourth row is a linear combination of these three rows. In fact,

    row(4) = 3 row(1) + (−1) row(2) + 2 row(3).

The RREF also says that the third column is a linear combination of the first and the second. Notice that the coefficients in such a linear combination are given by the entries of the third column in the RREF. We can easily check that

    col(3) = (3/2) col(1) + (1/2) col(2).

Let A Fmn . If rank(A) = r, then there are r number of linearly independent rows

in the RREF of A and other rows are linear combinations of these r rows. In A, the


corresponding r rows (with row exchanges taken care) are linearly independent and

the other rows are linear combinations of these r rows. Therefore, the maximum

number of linearly independent rows in A is r.

Looking at the columns in the RREF of A, we see that if rank(A) = r, then the

pivoted columns in the RREF are the standard basis vectors e1 , . . . , er . Thus, the unpivoted columns in the RREF are linear combinations of the pivoted ones. It shows

that in the RREF, there are r number of linearly independent columns and all other

columns are linear combinations of these r columns. Is this true in A also?

The RREF of A can be expressed as EA, where the matrix E is a product of elementary matrices; so E is invertible. If the jth column of A is vj, then the jth column of EA is Evj. Without loss of generality, suppose that v1, . . . , vr are linearly independent and each of v_{r+1}, . . . , vn is a linear combination of v1, . . . , vr. We claim that the columns of EA also have this property. To see this, suppose that

    α1 Ev1 + ··· + αr Evr = 0.

Since E is invertible, multiplying this equation by E^{−1}, we have

    α1 v1 + ··· + αr vr = 0.

Then the linear independence of v1, . . . , vr implies that α1 = ··· = αr = 0. That is, the vectors Ev1, . . . , Evr are linearly independent. Next, for j > r, if

    vj = α1 v1 + ··· + αr vr,

then it follows that

    Evj = α1 Ev1 + ··· + αr Evr.

That is, each of the vectors Ev_{r+1}, . . . , Evn is a linear combination of Ev1, . . . , Evr. This proves our claim.

It then follows that if k is the maximum number of linearly independent columns in A, then k is also the maximum number of linearly independent columns in EA. Conversely, if EA has a maximum of k linearly independent columns, then A = E^{−1}(EA) also has a maximum of k linearly independent columns.

Therefore, we conclude that the maximum number of linearly independent columns

in A is same as the maximum number of linearly independent columns in the RREF

of A, which is equal to rank(A).

The maximum number of linearly independent rows of a matrix is called the row

rank of the matrix. Similarly, the column rank of a matrix is the maximum number

of linearly independent columns. Using this terminology, we note down the above

discussion as our next theorem.

Theorem 2.3

Let A ∈ F^{m×n}. Then rank(A) is equal to the row rank of A, which is also equal to the column rank of A.

It then follows that rank(A∗) = rank(At) = rank(A). Moreover, our discussion also reveals that if P is any invertible matrix of order m, then rank(PA) = rank(A). Now, if Q is an invertible matrix of order n, then

    rank(AQ) = rank(Qt At) = rank(At) = rank(A).

We summarize the result as follows.

Theorem 2.4

Let A Fmn . Let P Fmm and Q Fnn be invertible matrices. Then

rank(PAQ) = rank(PA) = rank(AQ) = rank(A).

In general, if P ∈ F^{k×m} and Q ∈ F^{n×s}, then rank(PA) ≤ rank(A) and rank(AQ) ≤ rank(A). These are left as exercises for you.

Further, it follows that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n.
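Theorem 2.4 is easy to illustrate numerically. In the sketch below (not from the notes; NumPy assumed), the random square matrices are invertible with probability one, so all four ranks agree:

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1, 2, 1],
                  [2, 4, 2]], dtype=float)      # rank 1
    P = rng.standard_normal((2, 2))             # invertible almost surely
    Q = rng.standard_normal((3, 3))

    r = np.linalg.matrix_rank
    print(r(A), r(P @ A), r(A @ Q), r(P @ A @ Q))   # 1 1 1 1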

1. Determine the rank r of the matrix

    [1 2 1 1 1]
    [3 5 3 4 3]
    [1 1 1 2 1]
    [5 8 5 7 5]

Find out the r linearly independent rows, and also the r linearly independent columns of the matrix. Then express the remaining 4 − r rows as linear combinations of those r rows, and the remaining 5 − r columns as linear combinations of those r columns.

2. Let A ∈ F^{n×n}. Prove that A is invertible iff rank(A) = n iff det(A) ≠ 0.

3. Let A ∈ F^{m×n}. Let P ∈ F^{m×m}. Is it true that the RREF of PA is the same as the RREF of A?

4. Let A ∈ F^{m×n} and B ∈ F^{n×k}. Prove that rank(AB) ≤ min{rank(A), rank(B)}.

2.4 Solvability of linear equations

We can now use our knowledge about matrices to settle some issues regarding solvability of linear systems. A linear system with m equations in n unknowns looks like:

    a11 x1 + a12 x2 + ··· + a1n xn = b1
    a21 x1 + a22 x2 + ··· + a2n xn = b2
        ⋮
    am1 x1 + am2 x2 + ··· + amn xn = bm

with known scalars aij and bi. Using the abbreviations x = [x1, . . . , xn]t, b = [b1, . . . , bm]t and A = [aij], the system can be written in the compact form:

    Ax = b.

Here, A Fmn , x Fn1 and b Fm1 . We also say that the matrix A is the system

matrix of the linear system Ax = b. Observe that the matrix A is a linear transformation from Fn1 to Fm1 , where m is the number of equations and n is the number of

unknowns in the system.

There is a slight deviation from our accepted symbolism. In case of linear systems,

we write b as a column vector and xi are unknown scalars.

Let A Fmn and b Fm1 . A solution of the system Ax = b is any vector y Fn1

such that Ay = b. In such a case, if y = [a1 , . . . , an ]t , then ai is called as the value of

the unknown xi in the solution y. In this language a solution of the system is also

written informally as

    x1 = a1, . . . , xn = an.

The system Ax = b has a solution iff b ∈ R(A); and it has a unique solution iff b ∈ R(A) and A is a one-one map. Corresponding to the linear system Ax = b is the homogeneous system

    Ax = 0.

The homogeneous system always has a solution, since y := 0 is a solution. It has infinitely many solutions when it has a nonzero solution. For, if y is a nonzero solution of Ax = 0, then so is αy for any scalar α.

To study the non-homogeneous system, we use the augmented matrix [A|b] ∈ F^{m×(n+1)}, which has its first n columns as those of A in the same order, and whose (n+1)th column is b. For example,

    A = [1 2 3],   b = [4],   [A|b] = [1 2 3 | 4]
        [2 3 1]        [5]            [2 3 1 | 5].

Theorem 2.5

Let A Fmn and b Fm1 . Then the following statements are true.

(1) Ax = b has a solution iff rank([A|b]) = rank(A).

(2) If u is a particular solution of Ax = b, then each solution of Ax = b is given by u + y, where y is a solution of the homogeneous system Ax = 0.

(3) If [A′|b′] is obtained from [A|b] by a finite sequence of elementary row operations, then each solution of Ax = b is a solution of A′x = b′, and vice versa.

(4) If r = rank([A|b]) = rank(A) < n, then there are n − r unknowns which can take arbitrary values, and the other r unknowns are determined from the values of these n − r unknowns.

(5) If m < n, then the homogeneous system has infinitely many solutions.

(6) Ax = b has a unique solution iff rank([A|b]) = rank(A) = n.

(7) If m = n, then Ax = b has a unique solution iff det(A) 6= 0.

Proof (1) Ax = b has a solution iff b is a linear combination of columns

of A iff the column rank of [A|b] is equal to the column rank of A. By Theorem 2.3, this happens, if and only if rank([A|b]) = rank(A).

(2) Let u be a particular solution of Ax = b. Then Au = b. Now, y is a solution

of Ax = b iff Ay = b iff Ay = Au iff A(y u) = 0 iff y u is a solution of Ax = 0.

(3) If [A′|b′] has been obtained from [A|b] by a finite sequence of elementary row operations, then A′ = EA and b′ = Eb, where E is the product of the corresponding elementary matrices. The matrix E is invertible. Now, A′x = b′ iff EAx = Eb iff Ax = E^{−1}Eb = b.

(4) Due to (2), it is enough to consider the corresponding homogeneous system. Let rank(A) = r < n. Due to (3), assume that A is in RREF. There are r pivots in A and m − r zero rows. Omit all the zero rows; it does not affect the solutions. The n − r unknowns which do not correspond to pivots can take arbitrary values, and the unknowns corresponding to the pivots can then be expressed in terms of these n − r unknowns.

(5) If m < n, then r = rank(A) ≤ m < n. Consider the homogeneous system Ax = 0. By (4), there are n − r ≥ 1 unknowns which can take arbitrary values, and the other r unknowns are determined accordingly. Each such assignment of values to the n − r unknowns gives rise to a distinct solution, resulting in an infinite number of solutions of Ax = 0.

(6) By (1), the system has a solution iff rank([A|b]) = rank(A). By (4), such a solution is unique iff there are no free unknowns, that is, iff rank(A) = n.

(7) Notice that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n iff det(A) ≠ 0. Then the statement follows from (6).

A system of linear equations Ax = b is said to be consistent iff rank([A|b]) = rank(A).


Theorem 2.5(1) says that only consistent systems have solutions; conversely, if a system has a solution, then the system must be consistent. The statement in Theorem 2.5(4) is sometimes informally stated as follows:

    A consistent system has n − rank(A) linearly independent solutions.

The unknowns that correspond to the pivots are called the basic variables, and the unknowns which correspond to the un-pivoted columns are called the free variables. Thus there are rank(A) basic variables and n − rank(A) free variables, which are assigned arbitrary values. Therefore, the number of free variables is equal to the nullity of A.

To summarize, suppose that a linear homogeneous system Ax = 0 has m number

of equations and n number of unknowns. If m < n, then the system has a nonzero

solution; and hence, infinitely many solutions. If m > n, then the number of solutions

depends on the rank of the system matrix. In this case, if rank(A) < n, then Ax = 0

has infinitely many solutions; and if rank(A) = n, then it has a unique solution, which

is the trivial solution.

For non-homogeneous linear systems the same conclusions are drawn, provided that the system is consistent. To say it explicitly, let the linear system Ax = b have m equations and n unknowns, with rank(A) = r. Then it has no solution iff rank([A|b]) > r; it has a unique solution iff rank([A|b]) = r = n; and it has infinitely many solutions iff rank([A|b]) = r < n. Notice that the number m of equations plays no role; it is the number r, the number of linearly independent equations, that is important here.
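The criterion above turns directly into a test for the number of solutions. The following sketch (assuming NumPy; the helper name classify is ours, not the notes') implements the trichotomy of no solution, unique solution, infinitely many solutions:

    import numpy as np

    def classify(A, b):
        # Returns 'none', 'unique' or 'infinite', following Theorem 2.5.
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float).reshape(-1, 1)
        rA = np.linalg.matrix_rank(A)
        rAb = np.linalg.matrix_rank(np.hstack([A, b]))
        if rAb > rA:
            return "none"
        return "unique" if rA == A.shape[1] else "infinite"

    print(classify([[1, 1], [1, -1]], [2, 0]))   # unique
    print(classify([[1, 1], [2, 2]], [1, 3]))    # none
    print(classify([[1, 1], [2, 2]], [1, 2]))    # infinite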

1. Show that a linear system Ax = b is solvable iff b is a linear combination of

columns of A.

2. Consider the linear system Ax = b, where A Fmn and rank(A) = r. Write

explicit conditions on m, n, r so that the system has

(a) no solution

(b) unique solution

(c) infinite number of solutions

3. Let A Fnn . Prove that the following are equivalent:

(a) A is invertible.

(b) Ax = 0 has no non-trivial solution.

(c) Ax = b has a unique solution for some b Fn1 .

(d) Ax = b has at least one solution for each b Fn1 .

(e) Ax = ei has at least one solution for each i {1, . . . , n}.

(f) Ax = vi has at least one solution for each basis {v1 , . . . , vn } of Fn1 .

(g) Ax = b has at most one solution for each b Fn1 .

(h) Ax = b has a unique solution for each b Fn1 .

(i) rank(A) = n.


(j) The RREF of A is I.

(k) The rows of A are linearly independent.

(l) The columns of A are linearly independent.

(m) det(A) ≠ 0.

(n) For each B Cnn , AB = 0 implies that B = 0.

4. Let A, B Fmn be in RREF. Prove that Sol (A, 0) = Sol (B, 0) iff A = B.

2.5 Gauss-Jordan elimination

Gauss-Jordan elimination is essentially the method of reducing the augmented matrix to its row reduced echelon form for solving linear systems.

To determine whether a system of linear equations is consistent or not, we convert

the augmented matrix [A|b] to its RREF. In the RREF, if an entry in the b portion has

become a pivot, then the system is inconsistent; otherwise, the system is consistent.

Example 2.8

Is the following system of linear equations consistent?

    5x1 + 2x2 − 3x3 + x4 = −7
     x1 − 3x2 + 2x3 − 2x4 = −11
    3x1 + 8x2 − 7x3 + 5x4 = −8

We take the augmented matrix and reduce it to its row reduced echelon form by elementary row operations.

    [5  2 -3  1 |  -7]          [1   2/5  -3/5   1/5 |  -7/5]
    [1 -3  2 -2 | -11]  --R1->  [0 -17/5  13/5 -11/5 | -48/5]
    [3  8 -7  5 |  -8]          [0  34/5 -26/5  22/5 | -19/5]

            [1  0  -5/17  -1/17 | -43/17]
    --R2->  [0  1 -13/17  11/17 |  48/17]
            [0  0      0      0 |    -23]

Here, R1 = E_{1/5}[1], E_{-1}[2,1], E_{-3}[3,1] and R2 = E_{-5/17}[2], E_{-2/5}[1,2], E_{-34/5}[3,2]. A pivot has appeared in the b portion (the last row reads 0 = −23), so the system is inconsistent. In fact, you can verify that the third row in A is simply the first row minus twice the second row, whereas the third entry in b is not the first entry minus twice the second entry. Therefore, the system is inconsistent.

Example 2.9

We change the last equation in the previous example to make it consistent. We consider the new system

    5x1 + 2x2 − 3x3 + x4 = −7
     x1 − 3x2 + 2x3 − 2x4 = −11
    3x1 + 8x2 − 7x3 + 5x4 = 15

The reduction to row reduced echelon form is as follows:

    [5  2 -3  1 |  -7]          [1   2/5  -3/5   1/5 |  -7/5]
    [1 -3  2 -2 | -11]  --R1->  [0 -17/5  13/5 -11/5 | -48/5]
    [3  8 -7  5 |  15]          [0  34/5 -26/5  22/5 |  96/5]

            [1  0  -5/17  -1/17 | -43/17]
    --R2->  [0  1 -13/17  11/17 |  48/17]
            [0  0      0      0 |      0]

with R1 = E_{1/5}[1], E_{-1}[2,1], E_{-3}[3,1] and R2 = E_{-5/17}[2], E_{-2/5}[1,2], E_{-34/5}[3,2] as the row operations. This expresses the fact that the third equation is redundant. Now, solving the new system in row reduced echelon form is easier. Writing it as linear equations, we have

    x1 − (5/17) x3 − (1/17) x4 = −43/17
    x2 − (13/17) x3 + (11/17) x4 = 48/17

The unknowns corresponding to the pivots, that is, x1 and x2, are the basic variables and the other unknowns, x3 and x4, are the free variables. The number of basic variables is equal to the number of pivots, which is the rank of the system matrix. By assigning the free variables xi arbitrary values, say αi, the basic variables can be evaluated in terms of the αi.

We assign x3 to α and x4 to β. Then we have

    x1 = −43/17 + (5/17) α + (1/17) β,
    x2 = 48/17 + (13/17) α − (11/17) β.

Thus every solution is of the form

    y = [−43/17, 48/17, 0, 0]t + α [5/17, 13/17, 1, 0]t + β [1/17, −11/17, 0, 1]t,   for α, β ∈ F.

Here, the first vector is a particular solution of the original system. The two vectors

    [5/17, 13/17, 1, 0]t   and   [1/17, −11/17, 0, 1]t

are linearly independent solutions of the corresponding homogeneous system. There should be exactly two such linearly independent solutions of the homogeneous system, because the nullity of the system matrix is the number of unknowns minus its rank, which is 4 − 2 = 2.
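The particular solution and the two homogeneous solutions found above can be reproduced exactly with SymPy (a sketch, assuming a reasonably recent SymPy; not part of the notes):

    from sympy import Matrix

    A = Matrix([[5, 2, -3, 1],
                [1, -3, 2, -2],
                [3, 8, -7, 5]])
    b = Matrix([-7, -11, 15])

    sol, params = A.gauss_jordan_solve(b)   # general solution; free unknowns appear as parameters
    print(sol)
    print(A.nullspace())    # [5/17, 13/17, 1, 0]^t and [1/17, -11/17, 0, 1]^t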

It is easy to devise a mechanical way to write out the solution set of a consistent linear system using Gauss-Jordan elimination. One has to add the necessary number of zero rows, or delete some, so that the RREF becomes a square matrix. Then, identifying the free and basic variables, one can just write out the solution set by reversing the signs of the entries in the un-pivoted columns and changing the appropriate 0 to a 1. This is left as an exercise for you.

There are variations of Gauss-Jordan elimination. Instead of reducing the augmented matrix to its row reduced echelon form, if we reduce it to another intermediary form, called the row echelon form, then we obtain the method of Gaussian elimination. In the row echelon form, we do not require the entries above a pivot to be 0; also, the pivots need not be equal to 1. In that case, we require back-substitution in solving a linear system. To illustrate this process, we redo Example 2.9 starting with the augmented matrix, as follows:

    [5  2 -3  1 |  -7]          [5      2     -3      1 |    -7]
    [1 -3  2 -2 | -11]  --R1->  [0  -17/5   13/5  -11/5 | -48/5]
    [3  8 -7  5 |  15]          [0   34/5  -26/5   22/5 |  96/5]

                 [5      2     -3      1 |    -7]
    --E_2[3,2]-> [0  -17/5   13/5  -11/5 | -48/5]
                 [0      0      0      0 |     0]

Here, R1 = E_{-1/5}[2,1], E_{-3/5}[3,1]. The augmented matrix is now in row echelon form. It is a consistent system, since no entry in the b portion is a pivot. The pivots say that x1, x2 are basic variables and x3, x4 are free variables. We assign x3 to α and x4 to β. Writing the equations, we have

    x1 = (−7 − 2 x2 + 3α − β)/5,    x2 = 48/17 + (13/17) α − (11/17) β.

First we determine x2 and then back-substitute. We obtain

    x1 = −43/17 + (5/17) α + (1/17) β,
    x2 = 48/17 + (13/17) α − (11/17) β,
    x3 = α,
    x4 = β.

As you see, we end up with the same set of solutions as in Gauss-Jordan elimination.

1. Using Gauss-Jordan elimination, and also by Gaussian elimination, solve the

following linear systems:

(a) 3w + 2x + 2y z = 2, 2x + 3y + 4z = 2, y 6z = 6.

(b) w + 4x + y + 3z = 1, 2x + y + 3z = 0, w + 3x + y + 2z = 1, 2x + y + 6z = 0.

(c) w x + y z = 1, w + x y z = 1, w x y + z = 2, 4w 2x 2y = 1.

2. Show that the linear system x + y + kz = 1, x y z = 2, 2x + y 2z = 3 has no solution for k = 1, and has a unique solution for each k ≠ 1.

3

Subspace and Dimension

3.1 Subspace and span

Recall that F stands for either R or C; and Fn denotes either F1n or Fn1 . Also,

recall that a typical row vector in F1n is written as [a1 , . . . , an ] and a column vector in Fn1 is written as [a1 , . . . , an ]t . Both the row and column vectors are written

uniformly as (a1 , . . . , an ); these constitute the vectors in Fn . In Fn , we have a special vector, called the zero vector, which we denote by 0 := (0, . . . , 0). And if

x = (a1, . . . , an) ∈ Fn, then its additive inverse is −x := (−a1, . . . , −an).

The operations of addition and scalar multiplication in Fn enjoy the following

properties:

For u, v, w ∈ Fn and α, β ∈ F,

1. u + v = v + u.

2. (u + v) + w = u + (v + w).

3. u + 0 = 0 + u = u.

4. u + (−u) = −u + u = 0.

5. α(βu) = (αβ)u.

6. α(u + v) = αu + αv.

7. (α + β)u = αu + βu.

8. 1 · u = u.

9. (−1)u = −u.

10. If u + v = u + w, then v = w.

11. If αu = 0, then α = 0 or u = 0.

It so happens that the last three properties follow from the earlier ones. Any

nonempty set where the two operations of addition and scalar multiplication are defined, and which enjoy the first eight properties above, is called a vector space. In

this sense, both F1n and Fn1 are vector spaces. In such a general setting when


a nonempty subset of a vector space is closed under both the operations is called a

subspace. We may not need these general notions. However, we define a subspace

of our two specific vector spaces.

Let V be a nonempty subset of Fn . We say that V is a subspace of Fn iff the

following properties are satisfied:

1. For each u, v ∈ V, u + v ∈ V.

2. For each α ∈ F and for each v ∈ V, αv ∈ V.

Example 3.1

1. {0} and Fn are subspaces of Fn .

2. Let V = {(a, b, c) : 2a + 3b + 5c = 0, a, b, c ∈ F}. Clearly, (0, 0, 0) ∈ V. So, V ≠ ∅. If (a1, b1, c1), (a2, b2, c2) ∈ V, then

    2a1 + 3b1 + 5c1 = 0,   2a2 + 3b2 + 5c2 = 0.

Adding, 2(a1 + a2) + 3(b1 + b2) + 5(c1 + c2) = 0; so the sum (a1 + a2, b1 + b2, c1 + c2) ∈ V. If α ∈ F, then 2(αa1) + 3(αb1) + 5(αc1) = 0. So, α(a1, b1, c1) ∈ V. Therefore, V is a subspace of F^3.

3. Let V = {(a, b, c) : 2a + 3b + 5c = 1, a, b, c ∈ F}. Clearly, (1/2, 0, 0) ∈ V. So, V ≠ ∅. Also, (0, 1/3, 0) ∈ V.

We see that (1/2, 0, 0) + (0, 1/3, 0) = (1/2, 1/3, 0). And

    2 · 1/2 + 3 · 1/3 + 5 · 0 = 2 ≠ 1.

That is, (1/2, 0, 0) + (0, 1/3, 0) ∉ V. Therefore, V is not a subspace of F^3. Also, notice that 2 · (1/2, 0, 0) ∉ V.

4. Let α1, . . . , αn ∈ F. Let V = {[a1, . . . , an] : α1 a1 + ··· + αn an = 0}. It is easy to check that V is a subspace of F^{1×n}.

5. Let α1, . . . , αn ∈ F; β ∈ F, β ≠ 0; V = {[a1, . . . , an] : α1 a1 + ··· + αn an = β}. Then V is not a subspace of F^{1×n}. Why?

It is easy to verify that in a subspace V, all of the properties (1)-(8) above hold true. This is the reason we call such a nonempty subset a subspace.

A single nonzero vector does not form a subspace. For example, {(1, 1)} is not a subspace of F^2, but the set generated from it, namely

    {α(1, 1) : α ∈ F},

is a subspace of F^2. This is the set of all linear combinations of the vector (1, 1).

Recall that a linear combination of vectors v1, . . . , vm is any vector of the form

    α1 v1 + ··· + αm vm

for scalars α1, . . . , αm. We give a name to the set of all linear combinations of given vectors.

If S is any nonempty subset of Fn, we define span(S) as the set of all linear combinations of finitely many vectors from S. We read span(S) as the span of S. That is,

    span(S) = {α1 v1 + ··· + αm vm : v1, . . . , vm ∈ S, α1, . . . , αm ∈ F, m ∈ N}.

If S = ∅, then we define span(∅) = {0}.

When S = {v1, . . . , vm}, we also write span(S) as span{v1, . . . , vm}. We see that

    span{v1, . . . , vm} = {α1 v1 + ··· + αm vm : α1, . . . , αm ∈ F}.

For instance, v1 + v2 + ··· + vm and v1 + 5v2 are in span{v1, . . . , vm}. In the first case, each αi is equal to 1, whereas in the second case, α1 = 1, α2 = 5 and all other α's are 0.

Notice that S ⊆ span(S), since each u ∈ S is a linear combination of itself with coefficient 1. Similarly, 0 ∈ span(S), since 0 = 0 · u for any u ∈ S. However, this argument is valid provided there exists such a u in S; otherwise, by definition, span(∅) = {0}. Therefore, 0 ∈ span(S) for every subset S of Fn.

Suppose S ⊆ Fn. If u, v ∈ span(S), then both of them are linear combinations of vectors from S. Their sum u + v is also a linear combination of vectors from S. Hence, u + v ∈ span(S). Similarly, αu ∈ span(S) for any α ∈ F. Therefore, span(S) is a subspace of Fn. Moreover, span(span(S)) = span(S). In general, the span of any subspace is the subspace itself.

Let V be a subspace of Fn and let S V. We say that S is a spanning subset of V,

or that S spans V iff V = span (S). In this case, each vector in V can be expressed as

a linear combination of vectors from S. We also informally say that the vectors in S

span V whenever span (S) = V.

We just saw that

    span{[1, 1]t} = {α [1, 1]t : α ∈ F} ⫋ F^{2×1},   while   span{[1, 1]t, [1, −1]t} = F^{2×1}.

Notice that the vectors [1, 1]t, [1, −1]t, [4, 1]t also span F^{2×1}. In fact, since the first two vectors span F^{2×1}, any list of vectors containing these two will also span F^{2×1}.

Similarly, the vectors e1 , . . . , en in Fn1 span Fn1 , where ei is the column vector

in Fn1 whose ith component is 1 and all other components are 0.

In this terminology, vectors v1 , . . . , vn are linearly dependent iff one of the vectors

in this list is in the span of the rest. If no vector in the list is in the span of the rest,

then the vectors are linearly independent.
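Whether a given vector belongs to the span of some other vectors is, by Section 2.4, just a consistency (rank) test. A small sketch, assuming NumPy:

    import numpy as np

    def in_span(u, vectors):
        # u is in span(vectors) iff appending u as a column does not raise the rank.
        V = np.column_stack([np.asarray(v, dtype=float) for v in vectors])
        u = np.asarray(u, dtype=float)
        return np.linalg.matrix_rank(np.column_stack([V, u])) == np.linalg.matrix_rank(V)

    print(in_span([4, 1], [[1, 1], [1, -1]]))              # True
    print(in_span([1, 1, 1], [[1, 2, -3], [1, 0, -1]]))    # False: its entries do not sum to 0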


1. Let W be a vector space. Suppose U is a subspace of V and V is a subspace of

W. Is U a subspace of W ?

2. Let u, v1 , v2 , . . . , vn be n + 1 distinct vectors in Fn . Take S1 = {v1 , v2 , . . . , vn }

and S2 = {u, v1 , v2 , . . . , vn }. Prove that span (S1 ) = span (S2 ) iff u span (S1 ).

3. Let A, B be subsets of Fn . Prove or disprove the following:

(a) A is a subspace of Fn if and only if span (A) = A.

(b) If A B, then span (A) span (B).

(c) span(A ∪ B) = {u + v : u ∈ span(A), v ∈ span(B)}.

(d) span(A ∩ B) ⊆ span(A) ∩ span(B).

4. Let S be a subset of a vector space V. Prove that span (S) is the subspace of V

satisfying the following properties:

(a) S ⊆ span(S).

(b) If U is a subspace of V with S ⊆ U, then span(S) ⊆ U.

5. Let A and B be linearly independent subsets of Fn. Prove that span(A) ∩ span(B) = {0} iff A ∪ B is linearly independent.

3.2 Basis and dimension

We bring in some flexibility in using the phrases linearly dependent or independent. When the vectors v1 , . . . , vm in Fn are linearly independent, we say that the list

v1 , . . . , vm is linearly independent, and also the set {v1 , . . . , vm } is linearly independent. Similarly, the vectors v1 , . . . , vm are linearly dependent iff the list v1 , . . . , vm is

linearly dependent. If there are no repetitions of vectors, in this list, then it is also

equivalent to asserting that the set {v1 , . . . , vm } is linearly dependent.

Let V be a subspace of Fn . Let S be a subset of V. The subset S may or may not

span V. If it spans V, it is possible that it has a proper subset which also spans V. For

instance,

    S = {[1, 2, −3], [1, 0, −1], [2, −4, 2], [0, 2, −2]}

spans the subspace V = {[a, b, c] : a + b + c = 0} of F^{1×3}. Also, the subset

    {[1, 2, −3], [1, 0, −1], [2, −4, 2]}

spans the same subspace V. Notice that S is linearly dependent. Reason:

    [0, 2, −2] = [1, 2, −3] − [1, 0, −1].

On the other hand, the linearly independent set {[1, 2, −3]} does not span V. For instance,

    [1, 0, −1] ≠ α [1, 2, −3] for any α ∈ F.

That is, a spanning subset may be superfluous and a linearly independent set may

be deficient. A linearly independent set which also spans a subspace may be just

adequate in spanning the subspace.

Let V be a subspace of Fn . Let B be a list of vectors from V. We say that B is a

basis of V iff B is linearly independent and B spans V. We write a basis using the set

notation, though it is a list of vectors. However, we remember that a basis is a list, an

ordered set, where the ordering of the vectors is as they are written. For instance, if

{v1, v3, v2} is a basis for a subspace U, then we consider v1 as the first basis vector, v3 as the second basis vector, and v2 as the third basis vector.

Example 3.2

1. It is easy to check that B = {e1, . . . , en} is a basis of F^{n×1}. Similarly, E = {e1^t, . . . , en^t} is a basis of F^{1×n}.

2. We show that B = {[1, 2, −3], [1, 0, −1]} is a basis of

    V = {[a, b, c] : a + b + c = 0, a, b, c ∈ F}.

First, B ⊆ V. Second, any vector in V is of the form [a, b, −a − b] for a, b ∈ F. Now,

    [a, b, −a − b] = (b/2) [1, 2, −3] + (a − b/2) [1, 0, −1]

shows that span(B) = V. For linear independence, suppose

    α [1, 2, −3] + β [1, 0, −1] = [0, 0, 0].

Then α + β = 0, 2α = 0, −3α − β = 0. It implies that α = β = 0.

3. Also, E = {[1, −1, 0], [0, 1, −1]} is a basis for the subspace V in (2).

Let B be a basis of a subspace V of Fn . If C is any proper superset of B, then any

vector in C \ B is a linear combination of vectors from B. So, C is linearly dependent.

On the other hand, if D is any proper subset of B, then each vector in B \ D fails

to be a linear combination of vectors from D. For, otherwise, B would be linearly

dependent. We thus say that

A basis is a maximal linearly independent set.

A basis is a minimal spanning set.

The zero subspace {0} has a single basis, namely the empty set ∅. But other subspaces do not have a unique basis. For instance, the subspace V in Example 3.2 has at least two bases. However, something remains the same in all these bases. In that example, both the bases have

exactly two vectors. Is it true that all bases of a subspace have the same number of

vectors?

Theorem 3.1

If a subspace V of Fn has a basis of k vectors, then any list of vectors from V

having more than k vectors is linearly dependent.

Proof: Let B = {u1, . . . , uk} be a basis for a subspace V of Fn. Let E = {v1, . . . , vm}, where m > k. Each vj is a linear combination of the u's. So, we have scalars aij, for i = 1, . . . , k and j = 1, . . . , m, such that

    vj = a1j u1 + a2j u2 + ··· + akj uk = ∑_{i=1}^{k} aij ui,   for j = 1, 2, . . . , m.

Then, for any scalars β1, . . . , βm,

    ∑_{j=1}^{m} βj vj = ∑_{j=1}^{m} βj ∑_{i=1}^{k} aij ui = (∑_{j=1}^{m} a1j βj) u1 + ··· + (∑_{j=1}^{m} akj βj) uk.

So ∑_{j=1}^{m} βj vj = 0 whenever β1, . . . , βm satisfy the linear system

    a11 β1 + a12 β2 + ··· + a1m βm = 0
    a21 β1 + a22 β2 + ··· + a2m βm = 0
        ⋮
    ak1 β1 + ak2 β2 + ··· + akm βm = 0

This is a homogeneous linear system with k equations and m > k unknowns β1, . . . , βm. Thus it has a nonzero solution. Therefore, ∑_{j=1}^{m} βj vj = 0 with not all β's zero. That is, E is linearly dependent.

How does it answer our question? Well, suppose V is a subspace of Fn. Since Fn has the standard basis with n vectors, we cannot find more than n vectors from V which are linearly independent. Therefore, any basis of V will have at most n vectors. Now, suppose V has two bases B and E with k and m vectors, respectively. As B is a basis and E is linearly independent, m cannot be greater than k. Again, since B is linearly independent and E is a basis, k cannot be greater than m. Therefore, k = m.

Now we know that each basis of a subspace of Fn has a definite number of vectors in it. We give a name to this important number associated with a subspace.

Let V be a subspace of Fn . The number of vectors in some (or any) basis for V is

called the dimension of V. We write this number as dim (V ).


Since {e1 , . . . , en } is a basis for Fn1 , dim (Fn1 ) = n. Similarly, dim (F1n ) = n.

Remember that when we consider Cn1 or C1n , the scalars are complex numbers,

and for Rn1 or R1n , the scalars are real numbers.

Example 3.3

1. The dimension of the zero space is 0. That is, dim ({0}) = 0.

2. The subspace U := {[a, b, c, d] : a − 2b + 3c = 0 = d + a, a, b, c, d ∈ F} can be written as

    U = {[2b − 3c, b, c, −2b + 3c] : b, c ∈ F}
      = {b [2, 1, 0, −2] + c [−3, 0, 1, 3] : b, c ∈ F}.

The vectors [2, 1, 0, −2] and [−3, 0, 1, 3] are linearly independent. Therefore, U has a basis {[2, 1, 0, −2], [−3, 0, 1, 3]}. So, dim(U) = 2.

For any subset B of a subspace V of Fn , the following statements should then be

obvious.

1. If B has less vectors than dim (V ), then span (B) is a proper subspace of V.

2. If B has more vectors than dim (V ), then B is linearly dependent.

3. If B has dim (V ) number of vectors and span (B) = V, then B is a basis for V.

4. If B has dim (V ) number of vectors and B is linearly independent, then B is a

basis of V.

5. If B is a proper superset of a spanning set of V, then B is linearly dependent.

6. If B is a proper subset of a linearly independent subset of V, then B is linearly

independent and span (B) is a proper subspace of V.

7. If U is a subspace of V, then dim(U) ≤ dim(V) ≤ n.

8. If B is a spanning set of V, then it contains a basis for V.

9. If B is linearly independent, then there exists a superset of B which is a basis

of V.

To see (8), suppose B is a spanning subset of V. If B = {0}, then V = {0} and in

that case, B is a basis of V. Otherwise, choose a nonzero vector v1 from B. Take

C := {v1 }. If V = span (C), then C is a basis of V. Else, choose a (nonzero) vector

v2 from V \ span (C). Update C to C {v2 }. Notice that C is linearly independent.

Continue this process to obtain a basis C for V. This process terminates since V is

finite dimensional.


Incidentally the same process is applicable to prove (9) starting from the linearly

independent subset B of V. Observe that a linearly independent subset is a basis for

the span of the subset. Therefore, we conclude the following statement from (9).

Theorem 3.2

(Basis Extension Theorem) Let V be a subspace of Fn . Then each basis of a

subspace of V can be extended to a basis of V.

You can use the methods of last section, using elementary row operations to extract

a basis for a subspace which is given in the form of span of some finite number of

vectors. The trick is to throw away the vectors which are linear combinations of the

selected ones. That is, write the vectors as row vectors and form a matrix; convert the

matrix to its RREF; and then throw away the zero rows, or the rows corresponding

to the zero rows by monitoring row exchanges.

Example 3.4

Find a basis for the subspace U of F^4, where

    U = span{(1, 1, 1, 1), (2, 1, 0, 3), (1, 0, −1, 2), (0, 3, 2, 1)}.

We start with the matrix having these vectors as its rows and convert it to its RREF as follows.

    [1 1  1 1]          [1  1  1 1]          [1 0 -1  2]          [1 0 0  1]
    [2 1  0 3]  --R1->  [0 -1 -2 1]  --R2->  [0 1  2 -1]  --R3->  [0 1 0  1]
    [1 0 -1 2]          [0 -1 -2 1]          [0 0  0  0]          [0 0 1 -1]
    [0 3  2 1]          [0  3  2 1]          [0 0 -4  4]          [0 0 0  0]

Here, R1 is E_{-2}[2,1], E_{-1}[3,1]; R2 is E_{-1}[2], E_{-1}[1,2], E_1[3,2], E_{-3}[4,2]; and R3 is E[3,4], E_{-1/4}[3], E_1[1,3], E_{-2}[2,3].

Taking the pivoted rows, we see that {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, −1)} is a basis for the given subspace. Notice that only one row exchange has been done in this reduction process, which means that the third row in the RREF corresponds to the fourth original vector and the fourth row corresponds to the third. Thus the pivoted rows correspond to the first, second and the fourth vectors, originally. This says that a basis for the subspace is also given by

    {(1, 1, 1, 1), (2, 1, 0, 3), (0, 3, 2, 1)}.

The reduction process confirms that the third vector is a linear combination of the first, second, and the fourth.
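The same basis extraction can be done with SymPy's rref (a sketch, assuming SymPy is available; not part of the notes):

    from sympy import Matrix

    A = Matrix([(1, 1, 1, 1), (2, 1, 0, 3), (1, 0, -1, 2), (0, 3, 2, 1)])

    R, pivots = A.rref()
    basis = [R.row(i) for i in range(len(pivots))]   # the nonzero rows of the RREF
    print(basis)   # [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, -1]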


Suppose that the rows of a square matrix A Fnn are linearly independent. Then

the RREF of A has n number of pivots. That is, rank(A) = n. Consequently, A is

invertible. On the other hand, if a row of A is a linear combination of other rows,

then this row appears as a zero row in the RREF of A. That is, A is not invertible.

Considering At instead of A, we conclude that At is invertible iff the rows of At, that is, the columns of A, are linearly independent. However, A is invertible iff At is invertible. Therefore, A is invertible iff its columns are linearly independent. We note it down as our next result.

result.

Theorem 3.3

A square matrix is invertible iff its rows are linearly independent iff its columns

are linearly independent.

From Theorem 3.3 it follows that an n n matrix is invertible iff its rows form a

basis for F1n iff its columns form a basis for Fn1 .

1. Answer the following questions with justification:

(a) Is every subset of a linearly independent set linearly independent?

(b) Is every subset of a linearly dependent set linearly dependent?

(c) Is every superset of a linearly independent set linearly independent?

(d) Is every superset of a linearly dependent set linearly dependent?

(e) Is union of two linearly independent sets linearly independent?

(f) Is union of two linearly dependent sets linearly dependent?

(g) Is intersection of two linearly independent sets linearly independent?

(h) Is intersection of two linearly dependent sets linearly dependent?

2. Prove statements in (1)-(7) listed after Example 3.3.

3. Let {x, y, z} be a basis for a vector space V. Is {x + y, y + z, z + x} also a basis

for V ?

4. Find a basis for the subspace {(a, b, c) R3 : a + b 5c = 0} of R3 .

5. Find bases and dimensions of the following subspaces of R5 :

(a) {(a, b, c, d, e) R5 : a c d = 0}.

(b) {(a, b, c, d, e) R5 : b = c = d, a + e = 0}.

(c) span {(1, 1, 0, 2, 1), (2, 1, 2, 0, 0), (0, 3, 2, 4, 2), (3, 3, 4, 2, 1),

(5, 7, 3, 2, 0)}.


6. Extend the set {(1, 0, 1, 0), (1, 0, −1, 0)} to a basis of R4.

7. Prove that the only proper subspaces of R2 are the straight lines passing through

the origin.

3.3 Matrix as a linear map

Let A Fmn . We may view the matrix A as a function from Fn1 to Fm1 . It goes

as follows.

Let x ∈ F^{n×1}. Then define the matrix A as a function A : F^{n×1} → F^{m×1} by

    A(x) = Ax.

That is, the value of the function A at any vector x ∈ F^{n×1} is the vector Ax in F^{m×1}. Since the matrix product Ax is well defined, such a function is meaningful. We see that, due to the properties of the matrix product, the following are true:

1. A(u + v) = A(u) + A(v) for all u, v ∈ F^{n×1}.

2. A(αv) = αA(v) for all v ∈ F^{n×1} and for all α ∈ F.

In this manner a matrix is considered as a linear map. In fact, any function A from a

vector space to another (both over the same field) satisfying the above two properties

is called a linear transformation or a linear map.

To see the connection between the matrix as a rectangular array and as a function,

consider the values of the matrix A at the standard basis vectors e1 , . . . , en in Fn1 .

Recall that e j is a column vector in Fn1 where the jth entry is 1 and all other entries

are 0. Let A = [ai j ] Fmn . Then Ae j is the jth column of A. We thus observe the

following:

A matrix A Fmn is viewed as the linear map A : Fn1 Fm1 , where

A(e j ) is the jth column of A, and A(v) = Av for each v Fn1 .

The range of the matrix A (of the linear map A) is the set R(A) = {Ax : x Fn1 }.

Now, each vector x = [ 1 , . . . , n ]t Fn1 can be written as

x = 1 e1 + + n en .

If y R(A), then there exists an x Fn1 such that y = Ax. Such a y is written as

y = Ax = 1 Ae1 + + n Aen .

Conversely we see that each vector 1 Ae1 + + n Aen is in R(A). Since Ae j is the

jth column of A, we find that

R(A) = { 1 A1 + + n An : 1 , . . . , n F},

55

where A1 , . . . , An are the n columns of A. Thus, R(A) is the span of the columns of A;

and hence, it is a subspace of Fm1 . We thus refer to R(A) as the range space of A.

In this terminology, rank(A), which is the maximum number of linearly independent columns of A, is the maximum number of linearly independent vectors in R(A). That is,

rank(A) = dim (R(A)).

In the RREF of A, the un-pivoted columns are the linear combinations of the pivoted

ones. If rank(A) = r, then the pivoted columns are e1 , . . . , er . Thus, in A, the columns

that correspond to the pivoted ones form a basis of R(A).

The null space N(A) = {x ∈ F^{n×1} : Ax = 0} of A is simply the set of solutions of the homogeneous system Ax = 0. If u, v ∈ N(A), then A(u + v) = Au + Av = 0. For any scalar α, A(αu) = αAu = 0. Therefore, N(A) is a subspace of F^{n×1}; we refer to N(A) as the null space of A.

The nullity of A, denoted by null(A) := n − rank(A), is the maximum number of linearly independent vectors in N(A). That is,

    null(A) = dim(N(A)).

As we know from the properties of a linear system, null (A) gives the number of

linearly independent solutions of the homogeneous system Ax = 0. In fact, by GaussJordan elimination we can construct a basis for N(A).

Explicitly, if A has more rows than columns, then we neglect the last m − n zero rows in the RREF of A; and if A has fewer rows than columns, we put n − m zero rows at the bottom of the RREF of A, to get a square matrix. From the resulting square matrix B, we collect all un-pivoted columns; in each such column, we reverse the signs of all entries, and if the column had index j in B, we change its jth entry from 0 to 1. These modified un-pivoted columns form a basis of N(A).

Example 3.5

Consider the system matrix of Example 2.9. We reduce it to its RREF and mark the pivots as shown below:

    A = [5  2 -3  1]          [1  0  -5/17  -1/17]
        [1 -3  2 -2]  --E-->  [0  1 -13/17  11/17]  = RREF(A).
        [3  8 -7  5]          [0  0      0      0]

The first two columns in RREF(A) are the pivoted columns. So, the first two columns of A form a basis for R(A). That is,

    a basis for R(A) is {[5, 1, 3]t, [2, −3, 8]t}.

For a basis of N(A), we adjoin a zero row to the RREF to make it a square matrix and obtain

    B = [1  0  -5/17  -1/17]
        [0  1 -13/17  11/17]
        [0  0      0      0]
        [0  0      0      0].

The third and the fourth columns of B are un-pivoted. Reversing the signs of their entries, and changing the 3rd entry of the third column and the 4th entry of the fourth column from 0 to 1, we obtain a basis for N(A):

    {[5/17, 13/17, 1, 0]t, [1/17, −11/17, 0, 1]t}.

Check whether these vectors came up in writing the solution set of the system in Example 2.9.

It follows that dim(R(A)) + dim(N(A)) is equal to the dimension of the domain space of the linear map A, which is the same as the number of columns of A. This statement is referred to as the rank-nullity theorem.
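Bases for R(A) and N(A), and the rank-nullity identity, can be checked directly; a sketch using SymPy on the matrix of Example 3.5 (not part of the notes):

    from sympy import Matrix

    A = Matrix([[5, 2, -3, 1],
                [1, -3, 2, -2],
                [3, 8, -7, 5]])

    col_basis = A.columnspace()    # basis of R(A): the two pivotal columns of A
    null_basis = A.nullspace()     # basis of N(A)
    print(len(col_basis) + len(null_basis) == A.cols)   # True: rank + nullity = n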

1. Given any subspace U of Fn1 , does there exist a matrix in Fnn such that

U = N(A)?

2. Let A Fmn . Let {u1 , . . . , uk } be a basis for N(A). Extend this to {u1 , . . . , uk ,

v1 , . . . , vnk }, a basis for Fn1 . Then show that {Av1 , . . . , Avnk } is a basis for

R(A). This will give an alternate proof of the rank nullity theorem.

3. Determine the rank r of

    A = [1 2 1 1 1]
        [3 5 3 4 3]
        [1 1 1 2 1]
        [5 8 5 7 5].

Then express suitable 4 − r rows of A as linear combinations of the other r rows. Also, express suitable 5 − r columns of A as linear combinations of the other r columns.

4. If E Fmm is an elementary matrix and A Fmn , then show that the row

rank of EA is equal to the row rank of A.

5. If B Fmm is an invertible matrix and A Fmn , then show that the column

rank of BA is equal to the column rank of A.

6. From previous two exercises, conclude that an elementary row operation neither alters the row rank nor the column rank of a matrix.

7. Let A Fmn . Prove that the linear map A : Fn1 Fm1 given by A(x) := Ax

for each x Fn1 is one-one iff N(A) = {0}.

8. Let A Fnn . Prove that the linear map A : Fn1 Fn1 given by A(x) := Ax

is one-one and onto iff A maps any basis onto another (may be same) basis of

Fn1 .

9. Let A Fnn . Prove that the linear map A : Fn1 Fn1 given by A(x) := Ax

for each x Fn1 is one-one iff it is onto.


3.4

Change of basis

Let V be a subspace of Fn and let B = {v1, . . . , vm} be an ordered basis of V. Let v ∈ V. As B spans V, the vector v is a linear combination of vectors

from B. Can there be two distinct linear combinations? Suppose that there exist

scalars a1, . . . , am ∈ F and b1, . . . , bm ∈ F such that

    v = a1 v1 + ··· + am vm = b1 v1 + ··· + bm vm.

Then (a1 − b1)v1 + ··· + (am − bm)vm = 0. Due to the linear independence of B, we conclude that a1 = b1, . . . , am = bm. That is, such a linear combination is unique.

Conversely, suppose that each vector in V is written uniquely as a linear combination of v1, . . . , vm. To show linear independence of these vectors, suppose that

    α1 v1 + ··· + αm vm = 0

for scalars α1, . . . , αm. We also have

    0 · v1 + ··· + 0 · vm = 0.

From the uniqueness of writing the zero vector as a linear combination of v1, . . . , vm, we conclude that α1 = 0, . . . , αm = 0. Therefore, B is linearly independent.

We note down the result we have proved.

Theorem 3.4

Let B = {v1, . . . , vm} be an ordered subset of a subspace V of Fn. B is a basis of V iff for each v ∈ V, there exists a unique column vector [α1, . . . , αm]t ∈ F^{m×1} such that v = α1 v1 + ··· + αm vm.

Notice that in such a case, dim(V) = m. Once a basis B having m vectors is given for a subspace V of Fn, the unique column vector [α1, . . . , αm]t ∈ F^{m×1} is called the coordinate vector of v with respect to the basis B; and it is denoted by [v]B.

In this sense, we say that a basis provides a coordinate system in a subspace; it

co-ordinatizes the subspace.

Example 3.6

Let V = {[a, b, c]t : a + 2b + 3c = 0, a, b, c ∈ R}. This subspace has a basis

    B = {[0, 3, −2]t, [2, −1, 0]t}.

Let v = [3, 0, −1]t ∈ V. We find that

    v = [3, 0, −1]t = (1/2) [0, 3, −2]t + (3/2) [2, −1, 0]t.

Therefore, [v]B = [1/2, 3/2]t.

If w ∈ V has coordinate vector [w]B = [1, 1]t, then

    w = 1 · [0, 3, −2]t + 1 · [2, −1, 0]t = [2, 2, −2]t.

When we change the coordinate system, there are two questions to answer. How

do we obtain the new coordinate vectors from the old? And how does a matrix

change with respect to the change in coordinate system?

Let A ∈ F^{m×n}. We view A as a linear map from F^{n×1} to F^{m×1}. For a vector v ∈ F^{n×1}, we have a vector Av ∈ F^{m×1}. In writing the vectors v and Av, we had used the standard bases, by default. If we choose a different pair of bases, that is, a basis B for F^{n×1} and a basis C for F^{m×1}, then our second question is formulated as follows:

    What is the matrix M such that [Av]C = M [v]B?

Let B = {u1, . . . , un} be a basis for F^{n×1}. Suppose the jth standard basis vector ej of F^{n×1} has the coordinate vector

    [ej]B = [c1, . . . , cn]t,

that is, ej = c1 u1 + ··· + cn un. Then

    [u1 ··· un] [ej]B = [u1 ··· un] [c1, . . . , cn]t = c1 u1 + ··· + cn un = ej.

We thus observe the following.

Observation 1: Let B = {u1, . . . , un} be an ordered basis for F^{n×1}. Let ej denote the jth standard basis vector of F^{n×1}. Construct the matrix P = [u1 ··· un] by taking the column vector uk as its kth column. Then P [ej]B = ej.

We use this observation in the proof of the following result.

Theorem 3.5

Let B = {u1, . . . , un} and C = {v1, . . . , vm} be bases for F^{n×1} and F^{m×1}, respectively. Let A ∈ F^{m×n}. Construct the matrices

    P = [u1 ··· un],   Q = [v1 ··· vm].

Then, for each w ∈ F^{n×1}, [Aw]C = Q^{−1}AP [w]B.

Proof: Let w = [a1, . . . , an]t ∈ F^{n×1}. Write Aw = [b1, . . . , bm]t. Also, let e1, . . . , en be the standard basis vectors in F^{n×1}, and f1, . . . , fm the standard basis vectors in F^{m×1}. We know that, for i = 1, . . . , m and j = 1, . . . , n,

    P [ej]B = ej,   Q [fi]C = fi,   w = ∑_{j=1}^{n} aj ej,   Aw = ∑_{i=1}^{m} bi fi.

Then

    P [w]B = P [∑_j aj ej]B = ∑_j aj P [ej]B = ∑_j aj ej = w,
    Q [Aw]C = Q [∑_i bi fi]C = ∑_i bi Q [fi]C = ∑_i bi fi = Aw = AP [w]B.

Since the columns of Q ∈ F^{m×m} are linearly independent, Q is invertible. It then follows that [Aw]C = Q^{−1}AP [w]B.

Theorem 3.5 says that the matrix Q^{−1}AP, when multiplied with the coordinate vector of w with respect to the basis B, produces the coordinate vector of Aw with respect to the basis C. In this sense, the matrix Q^{−1}AP represents the same linear map A in the new coordinate system.

In particular, taking A as the identity matrix, we see that

    [u]C = Q^{−1}P [u]B.

Here, of course, both B and C are bases for the same space F^{n×1}. This formula shows how the coordinate vector changes when a basis changes, thus answering our first question. The matrix Q^{−1}P is called the change of basis matrix.

Example 3.7

Consider the following bases for R^3:

    O = {(1, 0, 1), (1, 1, 0), (0, 1, 1)},   N = {(1, −1, 1), (1, 1, −1), (−1, 1, 1)}.

Find the change of basis matrix M when the basis changes from O to N. Also find the matrix B that represents the linear map given by

    A = [ 1 1 1]
        [-1 0 1]
        [ 0 1 0]

when the basis changes from O to N, and verify that

    [(1, 2, 3)]N = M [(1, 2, 3)]O,   [A(1, 2, 3)t]N = B [(1, 2, 3)]O.

We consider the transposes of the basis vectors and work in R^{3×1}. As per the construction in Theorem 3.5, the change of basis matrix is given by

    M = Q^{−1}P = [ 1  1 -1]^{-1} [1 1 0]   [  1 1/2 1/2]
                  [-1  1  1]      [0 1 1] = [1/2   1 1/2]
                  [ 1 -1  1]      [1 0 1]   [1/2 1/2   1].

Now,

    (1, 2, 3) = 1 (1, 0, 1) + 0 (1, 1, 0) + 2 (0, 1, 1),
    (1, 2, 3) = 2 (1, −1, 1) + (3/2) (1, 1, −1) + (5/2) (−1, 1, 1).

Therefore, [(1, 2, 3)]O = [1, 0, 2]t and [(1, 2, 3)]N = [2, 3/2, 5/2]t. Then

    M [(1, 2, 3)]O = [  1 1/2 1/2] [1]   [  2]
                     [1/2   1 1/2] [0] = [3/2] = [(1, 2, 3)]N.
                     [1/2 1/2   1] [2]   [5/2]

According to Theorem 3.5,

    B = Q^{−1} A P = (1/2) [2 3 3]
                           [2 1 3]
                           [0 0 2].

As to the verification,

    A [1, 2, 3]t = [6, 2, 2]t = 4 (1, −1, 1)t + 4 (1, 1, −1)t + 2 (−1, 1, 1)t,

so that [A(1, 2, 3)t]N = [4, 4, 2]t. We find that

    B [(1, 2, 3)]O = B [1, 0, 2]t = [4, 4, 2]t = [A(1, 2, 3)t]N.

Observe that the matrix B is now a linear map from R^{3×1} with basis O to the space R^{3×1} with basis N.
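The computations of Example 3.7 are easily reproduced numerically (a sketch, assuming NumPy; P and Q carry the bases O and N as columns):

    import numpy as np

    P = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)      # basis O
    Q = np.array([[1, 1, -1], [-1, 1, 1], [1, -1, 1]], dtype=float)   # basis N
    A = np.array([[1, 1, 1], [-1, 0, 1], [0, 1, 0]], dtype=float)

    Qinv = np.linalg.inv(Q)
    M = Qinv @ P          # change of basis matrix from O to N
    B = Qinv @ A @ P      # A represented with respect to the pair (O, N)

    u_O = np.array([1.0, 0, 2])     # [(1,2,3)]_O
    print(M @ u_O)                  # [2, 1.5, 2.5] = [(1,2,3)]_N
    print(np.allclose(Q @ (B @ u_O), A @ np.array([1.0, 2, 3])))   # True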

1. Consider the subspace V = {[a, b, c]t : a + 2b + 3c = 0, a, b, c ∈ R} of F^{3×1}. V has bases B1 = {[0, 3, −2]t, [2, −1, 0]t} and B2 = {[1, 1, −1]t, [3, 0, −1]t}. Let

    A = [3 1 0]
        [0 1 3]
        [1 1 2].

(a) Show that A : V → V is a well defined map.

(b) Extend the basis B1 of V to a basis O for F31 .

(c) Extend the basis B2 of V to a basis N for F31 .

(d) Find the change of basis matrix M when basis changes from O to N.

(e) Verify that [v]N = M[v]O for v = [4, 1, 2]t .

(f) Find a matrix M that represents A obtained by changing the basis from O

to N.


(h) Find a matrix B such that [Av]N = B[v]N for any v F31 .

2. Given any subspace U of Fn1 , does there exist a matrix A in Fnn such that

U = R(A)?

3.5 Equivalence and Similarity

In view of Theorem 3.5, we say that two matrices A, B ∈ F^{m×n} are equivalent iff there exist invertible matrices P ∈ F^{n×n} and Q ∈ F^{m×m} such that B = Q^{−1}AP.

Observe that equivalent matrices represent the same linear map (matrix) with respect to possibly different pairs of bases. Therefore, the ranks of two equivalent matrices are the same.

We can construct a matrix of rank r relatively easily. Let r ≤ min{m, n}. The matrix Rr ∈ F^{m×n} whose first r columns are the standard basis vectors e1, . . . , er of F^{m×1} and whose remaining columns are zero columns, has rank r. That is, in block form,

    Rr = [Ir 0]
         [0  0].

Such a matrix is called a rank echelon matrix. For notational ease, we do not show the size of a rank echelon matrix; we rather specify it in different contexts. From Theorem 2.4, it follows that any matrix which is equivalent to Rr also has rank r.

Conversely, if a row of a matrix is a linear combination of other rows, then the

rank of the matrix is same as the matrix obtained by deleting such a row. Similarly,

deleting a column which is a linear combination of other columns does not change

the rank of the matrix. It is thus possible to perform elementary row and column

operations to bring a matrix of rank r to its rank echelon form Rr . We state this result

and give a rigorous proof.

Theorem 3.6

(Rank factorization) A matrix is of rank r iff it is equivalent to the rank echelon

matrix Rr of the same size.

Proof: Let A ∈ F^{m×n}. Suppose rank(A) = r. Convert A to its row reduced echelon form C := E1 A, where E1 is a suitable product of elementary matrices. Now, each non-pivotal column of C is a linear combination of the r pivotal columns. Consider the matrix C^t. The pivotal columns of C are now the pivotal rows e1^t, . . . , er^t in C^t; each other row of C^t is a linear combination of these pivotal rows. Exchange the rows of C^t so that the first r rows of the new C^t are e1^t, . . . , er^t in that order. Use suitable elementary row operations to zero-out all the remaining rows with the help of e1^t, . . . , er^t. Thus we obtain the matrix E2 C^t, where E2 is a suitable product of elementary matrices, such that E2 C^t has its first r rows equal to e1^t, . . . , er^t and all other rows zero. Then, taking the transpose, we see that C E2^t is a matrix whose first r columns are e1, . . . , er and all other columns are zero columns.

To summarize, we have obtained two matrices E1 and E2, products of elementary matrices, such that

    C E2^t = E1 A E2^t = [Ir 0; 0 0] = Rr.

With P = (E2^t)^{−1} and Q = E1, we see that A = Q^{−1} Rr P; that is, A is equivalent to Rr.

For the converse, suppose that A = Q^{−1} Rr P for some invertible matrices P and Q. By Theorem 2.4, rank(A) = rank(Rr) = r.

The rank factorization can be used to characterize equivalence of matrices. If A

and B are equivalent matrices, then clearly, they have the same rank. Conversely,

if two m n matrices have the same rank r, then both of them are equivalent to Rr .

Then, they are equivalent. Therefore, we have the rank theorem, which is stated as

follows:

Two matrices of the same size are equivalent iff they have the same rank.

Observe that

    Rr = [Ir 0] = [Ir] [Ir 0].
         [0  0]   [0 ]

Therefore, any matrix A of rank r can be written as

    A = BC,   with B = Q^{−1} [Ir; 0] and C = [Ir 0] P,

where P and Q are invertible matrices with A = Q^{−1} Rr P. Here, B ∈ F^{m×r} is of rank r and C ∈ F^{r×n} is of rank r as well. Such a factorization of a matrix is called a full rank factorization.

The notion of equivalence stems from the change of bases in both the domain and

the co-domain of a matrix viewed as a linear map. In case, the matrix is a square

matrix of order n, it is considered as a linear map on Fn1 . If we change the basis in

Fn1 , we would have a corresponding representation of the matrix in the new basis.

Let A ∈ F^{n×n}, a square matrix of order n. The matrix A is a map from F^{n×1} to F^{n×1}. Let E = {e1, . . . , en} be the standard basis of F^{n×1}. The matrix A acts in the usual way: Aej is the jth column of A. Suppose we change the basis of F^{n×1} to C = {v1, . . . , vn}; that is, in both the domain and the co-domain space, we take the new basis as C. Then the following equation holds for each v ∈ F^{n×1}:

    [Av]C = P^{−1}AP [v]C,   where P = [v1 ··· vn].

The matrix A as a linear map now takes the form P^{−1}AP, where the columns of P form a basis, or a new coordinate system, in F^{n×1}. This leads to similarity of two matrices.


We say that two matrices A, B ∈ F^{n×n} are similar iff B = P^{−1}AP for some invertible matrix P ∈ F^{n×n}.

We emphasize that if B = P^{−1}AP is a matrix similar to A, then the matrix A as a linear map on F^{n×1} with the standard basis, and the matrix B as a linear map on F^{n×1} with the ordered basis consisting of the columns of P, are the same linear map.

Let N be the ordered basis whose jth element is the jth column of P. We see that, for each vector v ∈ F^{n×1}, [Av]N = P^{−1}AP [v]N.

Example 3.8

Consider the basis N = {[1, −1, 1]t, [1, 1, −1]t, [−1, 1, 1]t} for R^{3×1}. To determine the matrix similar to

    A = [ 1 1 1]
        [-1 0 1]
        [ 0 1 0]

when the basis is changed from the standard basis to N, we construct the matrix P by taking the basis vectors of N as its columns:

    P = [ 1  1 -1]
        [-1  1  1]
        [ 1 -1  1].

Then the matrix similar to A is

    B = P^{−1}AP = (1/2) [1 0 1] [ 1 1 1] [ 1  1 -1]         [ 0  2 2]
                         [1 1 0] [-1 0 1] [-1  1  1] = (1/2) [ 1 -1 3]
                         [0 1 1] [ 0 1 0] [ 1 -1  1]         [-1 -1 3].

For u = [1, 2, 3]t, we have [u]N = [2, 3/2, 5/2]t, and

    B [u]N = (1/2) [ 0  2 2] [  2]   [4]
                   [ 1 -1 3] [3/2] = [4]
                   [-1 -1 3] [5/2]   [2].

Also, Au = [6, 2, 2]t = 4 [1, −1, 1]t + 4 [1, 1, −1]t + 2 [−1, 1, 1]t, so that [Au]N = [4, 4, 2]t. This verifies the condition [Au]N = B [u]N for the vector u = [1, 2, 3]t.

Though equivalence is easy to characterize by the rank, similarity is much more

difficult. And we postpone this to a later chapter.

1. Let A ∈ C^{m×n}. Define T : C^{1×m} → C^{1×n} by T(x) = xA for x ∈ C^{1×m}. Show that T is a linear map. Identify T(ej^t). Find a rank factorization of A.

2. Define T : R31 R21 by T ([a, b, c]t ) = [c, b + a]t . Show that T is a linear

map. Find a matrix A R23 such that T ([a, b, c]t ) = A[a, b, c]t . Determine a

full rank factorization of A.


3. Let T : R31 R31 be a linear map defined by

T ([a, b, c]t ) = [a + b, 2a b c, a + b + c]t .

Let A R33 be the matrix such that T (x) = Ax for x R31 . Find rank(A).

Then determine a rank factorization of A.

4. Using rank factorization, show that for any m k matrix A and k n matrix B,

rank(AB) min{rank(A), rank(B)}.

5. Let A, B ∈ F^{n×n}. Show that rank(A) + rank(B) − n ≤ rank(AB).

6. Using full rank factorization, show that rank(A + B) ≤ rank(A) + rank(B) for A, B ∈ F^{m×n}.

7. Which matrices are equivalent to the zero matrix?

8. Which matrices are similar to the zero matrix?

9. Which matrices are equivalent to the identity matrix?

11. Is a rank factorization of a matrix unique?

12. Is a full rank factorization of a matrix unique?

4

Orthogonalization

4.1

Inner products

The dot product in R3 is used to define length and angle. In particular, the dot

product is used to determine when two vectors become perpendicular to each other.

This notion can be generalized to Fn .

For vectors u, v ∈ F^{1×n}, we define their inner product as

    ⟨u, v⟩ = u v∗.

For example, if u = [1, 2, 3] and v = [2, 1, 3], then ⟨u, v⟩ = 1·2 + 2·1 + 3·3 = 13. Similarly, for x, y ∈ F^{n×1}, we define their inner product as

    ⟨x, y⟩ = y∗ x.

In case F = R, the conjugate transpose in the definition of the inner product becomes just the transpose. The inner product satisfies the following properties:

For x, y, z ∈ Fn and α ∈ F,

1. ⟨x, x⟩ ≥ 0.

2. ⟨x, x⟩ = 0 iff x = 0.

3. ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩.

4. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

5. ⟨z, x + y⟩ = ⟨z, x⟩ + ⟨z, y⟩.

6. ⟨αx, y⟩ = α⟨x, y⟩.

7. ⟨x, αy⟩ = ᾱ⟨x, y⟩.

Any vector space V together with a map ⟨·, ·⟩ : V × V → F that satisfies Properties (1)-(4) and (6) is called an inner product space. Properties (5) and (7) follow from the others.

The inner product gives rise to the length of a vector, as in the familiar case of R^{1×3}. We call the generalized version of length the norm.


For u ∈ Fn, we define its norm, denoted by ‖u‖, as the nonnegative square root of ⟨u, u⟩. That is,

    ‖u‖ = √⟨u, u⟩.

The norm satisfies the following properties:

For x, y ∈ Fn and α ∈ F,

1. ‖x‖ ≥ 0.

2. ‖x‖ = 0 iff x = 0.

3. ‖αx‖ = |α| ‖x‖.

4. |⟨x, y⟩| ≤ ‖x‖ ‖y‖. (Cauchy-Schwartz inequality)

5. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (Triangle inequality)

A proof of Cauchy-Schwartz inequality goes as follows:

If y = 0, then the inequality clearly holds. Else, hy, yi 6= 0. Write =

=

hx, yi

. Then

hy, yi

hy, xi

and hx, yi = | |2 kyk2 . Then

hy, yi

0 hx y, x yi = hx, xi hx, yi + hy, yi hy, xi

= kxk2 hx, yi = kxk2 | |2 kyk2 = kxk2

|hx, yi|2

kyk2 .

kyk4

kx + yk2 = hx + y, x + yi = kxk2 + kyk2 + hx, yi + hy, xi kxk2 + kyk2 + 2kxk kyk.

Using these properties, the acute (non-obtuse) angle between any two nonzero vectors can be defined. Let x, y F1n (or in Fn1 ). The angle between x and y,

denoted by (x, y) is defined by

cos (x, y) =

|hx, yi|

.

kxk kyk

and write it as x y, iff hx, yi = 0.

Notice that this definition allows x and y to be zero vectors. Also, the zero vector is

orthogonal to every vector. Also, if x y, then y x; thus whenever x is orthogonal

to y, we say that x and y are orthogonal vectors.

It follows that if x y, then kxk2 +kyk2 = kx +yk2 . This is referred to as Pythagoras law. The converse of Pythagoras law holds when F = R. For F = C, it does not

hold, in general.

We extend the notion of orthogonality to a set of vectors. A set of nonzero vectors

in Fn is called an orthogonal set in Fn iff each vector in the set is orthogonal to every

67

Orthogonalization

other vector in the set. An orthogonal set of vectors is called an orthonormal set if

the norm of each vector is 1. For example,

{[1, 2, 3]t , [2, 1, 0]t }

is an orthogonal set in F31 . And

1/ 14, 2/ 14, 3/ 14 t , 2/ 5,

1/ 5,

t

0

Fn . Orthogonal and orthonormal sets enjoy nice properties; some of them are listed

in the exercises.

1. In C, consider the inner product hx, yi = xy. Let x = 1 and y = i be two vectors

in C. Show that kxk2 + kyk2 = kx + yk2 but hx, yi 6= 0.

2. In Fn1 , show that the parallelogram law holds. That is, for all x, y Fn1 , we

have kx + yk2 + kx yk2 = 2(kxk2 + kyk2 ).

hu, vi

v + w for v 6= 0. Show that hv, wi = 0. Then use

kvk2

Pythagoras theorem to derive Cauchy-Schwartz inequality.

3. Write a vector u as u =

4. Is the set {[1, 2, 3, 1]t , [2, 1, 0, 0]t , [0, 0, 1, 3]t } an orthogonal set in F41 ?

Is it also a linearly independent set?

5. Prove that each orthogonal set in Fn is linearly independent.

6. Construct an orthonormal set from {[1, 2, 3, 1]t , [2, 1, 0, 0]t , [0, 0, 1, 3]t }.

7. If an orthogonal set is given, how do we construct an orthonormal set from it?

8. Let B = {v1 , . . . , vm } be an orthonormal set in Fn . Let V = span (B). Let x Fn .

Prove the following:

(a) Fourier Expansion: If x V, then x = mj=1 hx, v j iv j .

(b) Parsevals Identity: If x V, then kxk2 = mj=1 |hx, v j i|2 .

(c) Bessels Inequality: kxk2 mj=1 |hx, v j i|2 .

4.2

Gram-Schmidt orthogonalization

It is easy to see that if the nonzero vectors v1 , . . . , vn are orthogonal, then they are

also linearly independent. For, suppose v1 , . . . , vn are nonzero orthogonal vectors.

68

Assume that

1 v1 + + n vn = 0.

Let j {1, . . . , n}. Take inner product of the left hand side and the right hand side

of the above equation with v j . If i 6= j, then hvi , v j i = 0. So, we have j hv j , v j i = 0.

But v j 6= 0 implies that hv j , v j i 6= 0. Therefore, j = 0. That is,

1 = = n = 0.

Therefore, the vectors v1 , . . . , vn are linearly independent.

Conversely, given n linearly independent vectors v1 , . . . , vn (necessarily all nonzero),

we can orthogonalize them. If v1 , . . . , vk are linearly independent but v1 , . . . , vk , vk+1

are linearly dependent, then we will see that our orthogonalization process will yield

the (k + 1)th vector as the zero vector. We now discuss this method, called GramSchmidt orthogonalization.

Given two linearly independent vectors u1 , u2 on the plane how do we construct

two orthogonal vectors? Keep v1 = u1 . Take out the projection of u2 on u1 to get v2 .

Now, v2 v1 .

What is the projection of u2 on u1 ? Its length is hu2 , u1 i. Its direction is that of

hu2 , v1 i

v1 does the job. You can now verify

u1 . Thus taking v1 = u1 and v2 = u2

hv1 , v1 i

that hv2 , v1 i = 0. We may continue this process of taking away projections in Fn . It

results in the following process.

Theorem 4.1

(Gram-Schmidt orthogonalization) Let u1 , u2 , . . . , un be linearly independent vectors in Fn . Define

v1 = u1

v2 = u2

hu2 , v1 i

v1

hv1 , v1 i

..

.

vn+1 = un+1

hun+1 , v1 i

hun+1 , vn i

v1

vn

hv1 , v1 i

hvn , vn i

hu2 , v1 i

hu2 , v1 i

v1 , v1 i = hu2 , v1 i

hv1 , v1 i = 0.

hv1 , v1 i

hv1 , v1 i

Also, span {u1 , u2 } = span {v1 , v2 }. To complete the proof, use Induction.

Proof

hv2 , v1 i = hu2

dependent, then Gram-Schmidt process will compute nonzero orthogonal vectors

v1 , . . . , vk and it will give vk+1 as the zero vector.

69

Orthogonalization

Example 4.1

The vectors u1 = [1, 0, 0], u2 = [1, 1, 0], u3 = [1, 1, 1] are linearly independent

in R13 . Apply Gram-Schmidt Orthogonalization.

v1 = [1, 0, 0].

hu2 , v1 i

v2 = u2

v1 = [1, 1, 0] 1 [1, 0, 0] = [0, 1, 0].

hv1 , v1 i

hu3 , v2 i

hu3 , v1 i

v1

v2 = [1, 1, 1] [1, 0, 0] [0, 1, 0] = [0, 0, 1].

v3 = u3

hv1 , v1 i

hv2 , v2 i

Example 4.2

The vectors u1 = [1, 1, 0], u2 = [0, 1, 1], u3 = [1, 0, 1] form a basis for F13 . Apply

Gram-Schmidt Orthogonalization.

v1 = [1, 1, 0].

hu2 , v1 i

[0, 1, 1] [1, 1, 0]

v2 = u2

v1 = [0, 1, 1]

[1, 1, 0]

hv1 , v1 i

[1, 1, 0] [1, 1, 0]

h

= [0, 1, 1] 12 [1, 1, 0] = 12 , 12 , 1].

v3 = u3

hu3 , v1 i

hu3 , v2 i

v1

v2

hv1 , v1 i

hv2 , v2 i

= [1, 0, 1] 12 [1, 1, 0] 13 [ 12 , 12 , 1] = [ 32 , 32 , 23 ].

The set [1, 1, 0], [ 12 , 12 , 1], [ 32 , 32 , 23 ] is orthogonal.

Example 4.3

Apply Gram-Schmidt orthogonalization process on the vectors u1 = [1, 1, 0, 1],

u2 = [0, 1, 1, 1] and u3 = [1, 3, 2, 1].

70

v1 = [1, 1, 0, 1].

hu2 , v1 i

v1 = [0, 1, 1, 1] 0 [1, 1, 0, 1] = [0, 1, 1, 1].

v2 = u2

hv1 , v1 i

hu3 , v2 i

hu3 , v1 i

v1

v2

v3 = u3

hv1 , v1 i

hv2 , v2 i

h(1, 3, 2, 1), (1, 1, 0, 1)i

= (1, 3, 2, 1)

(1, 1, 0, 1)

h(1, 1, 0, 1), (1, 1, 0, 1)i

h(1, 3, 2, 1), (0, 1, 1, 1)i

(0, 1, 1, 1)

= [1, 3, 2, 1] [1, 1, 0, 1] 2[0, 1, 1, 1] = [0, 0, 0, 0].

Discarding v3 , which is the zero vector, we have only two linearly independent

vectors out of u1 , u2 , u3 . They are u1 and u2 ; and u3 is a linear combination of

these two. In fact, the process also revealed that u3 = u1 2u2 .

An orthogonal set can be made orthonormal by dividing each vector by its norm.

Also you can modify Gram-Schmidt orthogonalization process to directly output

orthonormal vectors.

Let V be a subspace of Fn . An orthogonal subset of V which is also a basis of V is

called an orthogonal basis of V. Similarly, when an orthonormal set is a basis of V,

the set is said to an orthonormal basis of V.

For example, the standard basis {e1 , . . . , en } of Fn1 is an orthonormal basis of

n1

F . Similarly, {et1 , . . . , etn } is an orthonormal basis of F1n .

Gram-Schmidt procedure constructs an orthogonal or an orthonormal basis from

a given basis of any subspace of Fn . It also shows that every subspace of Fn has an

orthogonal (orthonormal) basis.

In an orthonormal basis, the components of the coordinate vector of any vector can

be expressed through the inner product. Revisit Exercise 8(a) of Section 4.1. The

Fourier expansion, there, says that the coordinate vector of any vector x with respect

to an orthonormal basis has the jth component as the inner product of x with the jth

basis vector.

1. Using Gram-Schmidt process, orthonormalize the vectors [1, 1, 1], [1, 0, 1],

and [0, 1, 2].

2. Find u R13 so that [1/ 3, 1/ 3, 1/ 3], [1/ 2, 0, 1/ 2], u are orhtonormal. Form a matrix with the vectors as rows, in that order. Verify that the

columns of the matrix are also orthonormal.

71

Orthogonalization

R13 is orthogonal to both u and v. How to obtain this third vector as u v by

Gram-Schmidt process?

4. If U = span {v1 , . . . , vm } is a subspace of Fn , can we use Gram-Schmidt process to extract a basis for U using the vectors v1 , . . . , vm ?

5. How do we use Gram-Schmidt process to compute the rank of a matrix?

4.3

Best approximation

To find the point in a given plane closest to a point in space, we draw a perpendicular

from the point to the plane. The foot of the perpendicular is the closest point in the

plane.

Let U be a subspace of Fn . Let v Fn . A vector u U is a best approximation

of v iff kv uk kv xk for each x U.

We show that our intuition of taking a perpendicular can be used to compute a best

approximation.

Theorem 4.2

Let U be a subspace of Fn . A vector u U is a best approximation of v Fn iff

v u U. Moreover, a best approximation is unique.

Proof

rem,

Suppose v u U. Let x U. Now, u x U. By Pythagoras Theokv xk2 = k(v u) + (u x)k2 = kv uk2 + ku xk2 kv uk2 .

Conversely, suppose u is a best approximation of v. Then

kv uk kv xk for each x U.

(4.1)

let = hv u, yi/kyk2 . Then hv u, yi = | |2 kyk2 . Thus, h y, v ui = | |2 kyk2

also. From (4.1), we have

kv uk2 kv u yk2 = hv u y, v u yi

= kv uk2 hv u, yi h y, v ui + | |2 kyk2

= kv uk2 | |2 kyk2 .

Hence, | |2 kyk2 = 0. As y 6= 0, | |2 = 0. It follows that hv u, yi = 0.

72

approximations to v. Then kv uk kv wk and kv wk kv uk. So,

kv uk = kv wk.

Now, since v w w u, by Pythagoras theorem,

kv uk2 = k(v w) + (w u)k2 = kv wk2 + kw uk2 = kv uk2 + kw uk2 .

Thus, kw uk2 = 0. That is, w = u.

In the presence of an orthonormal basis, we can have an explicit expression for the

best approximation.

Theorem 4.3

Let {u1 , . . . , un } be an orthonormal basis for a subspace U of Fn . Let v Fn .

The unique best approximation of v from U is u = ni=1 hv, ui iui .

Proof Let u = ni=1 hv, ui iui . Let x U. We have scalars 1 , . . . , n such that

x = nj=1 j u j . Let j {1, . . . , n}. Then

n

n

hv u, u j i = v hv, ui iui , u j = hv, u j i hv, ui ihui , u j i = hv, u j i hv, u j i = 0.

i=1

i=1

v from U.

In case, {u1 , . . . , un } is any basis for U, we may orthonormalize it using GramSchmidt process, and then apply the formula given in Theorem 4.3 for computing

the best approximation. Alternatively, we may write u = nj=1 j u j . Our requirement

v u u j means determining j from hv nj=1 j u j , ui i = 0. Thus, we solve the

linear system

n

hu j , ui i j = hv, ui i.

j=1

Theorem 4.3 guarantees that this linear system has a unique solution. Notice that the

system matrix of this linear system is A = [ai j ], where ai j = hu j , ui i. Such a matrix

which results from a basis by taking the inner products of basis vectors is called a

Gram matrix. Our result shows that a Gram matrix is invertible. Can you prove

directly that a Gram matrix is invertible?

Example 4.4

Find the best approximation of v = (1, 0) R2 from U = {(a, a) : a R}.

We seek ( , ) so that (1, 0) ( , ) ( , ) for all . That is, to find

( , ) so that (1 , ) (1, 1) = 0. So, = 1/2. The best approximation

here is (1/2, 1/2).

73

Orthogonalization

1. Find the best approximation of x V from U where

(a) V = R3 , x = (1, 2, 1), U = span {(3, 1, 2), (1, 0, 1)}.

(b) V = R3 , x = (1, 2, 1), U = {( , , ) R3 : + + = 0}.

(c) V = R4 , x = (1, 0, 1, 1), U = span {(1, 0, 1, 1), (0, 0, 1, 1)}.

4.4

Notice that we have discussed two ways of extracting a basis for the span of a finite

number of vectors from Fn . One is the method of elementary row operations and

the other is Gram-Schmidt orthogonalization. The orthogonalization is a superior

tool though computationally difficult. We will now see one of its applications in

factorizing a matrix.

Let u1 , u2 , . . . , un be the columns of A Fmn , where m n. Suppose that the

columns are linearly independent. Using Gram-Schmidt process and then orthonormalizing we get the vectors v1 , v2 , . . . , vn . Since span {u1 , . . . , uk } = span {v1 , . . . , vk }

for any k = 1, . . . , n, there exist scalars ai j , 1 i j n such that

u1 = a11 v1

u2 = a12 v1 + a22 v2

..

.

un = a1n v1 + a2n v2 + + ann vn

We take ai j = 0 for i > j. Writing R = [ai j ] for i, j = 1, 2, . . . , n, Q = [v1 v2 vn ],

we see that

A = [u1 u2 un ] = QR.

Since columns of Q are orthonormal, Q Fmn , R Fnn , Q Q = I, and R is upper

triangular.

The QR factorization of a matrix A is the determination of a matrix Q with orthonormal columns and an upper triangular matrix R so that A = QR.

Recall that if Q Rmn , we have Qt Q = I. The above discussion boils down to

the following result.

Theorem 4.4

Any matrix A Fmn with linearly independent columns has a QR factorization. Moreover, R is invertible.

74

Example

4.5

1/ 2 0

1 1

Let A = 0 1 . Orthonormalization of the columns of A yields Q = 0 1 .

1/ 2 0

1 1

2

2

. It is easy to check

Since A = QR and Qt Q = I, we have R = Qt A =

0

1

that A = QR.

The QR factorization and best approximation together give an efficient procedure

in approximating a solution of a system of linear equations. In order to discuss this,

we first define the so called least squares approximation of a solution of a linear

system.

Let A Fmn . A vector u Fn1 is a called a least squares solution of the linear

system Ax = b iff kAu bk kAz bk for all z Fm1 .

Notice that a least squares solution of Ax = b is simply the best approximation of

a solution to Ax = b from R(A). Then Theorem 4.2 yields the following result.

Theorem 4.5

Let A Fmn , and let b Fm1 .

1. The linear system Ax = b has a least squares solution.

2. A vector u Fn1 is a least squares solution iff Au b R(A).

3. A least squares solution is unique iff N(A) = {0}.

Recall that the null space N(A) of A is the set of all solutions of the homogeneous

system Ax = 0. Thus the condition N(A) = {0} says that the homogeneous system

Ax = 0 does not have a nonzero solution.

Least squares solutions can be computed by solving a related linear system.

Theorem 4.6

Let A Fmn and let b Fm1 . A vector u Fn1 is a least squares solution of

Ax = b iff A Au = A b.

Proof The columns u1 , . . . , un of A span R(A). Due to Theorem 4.5,

u is a least squares solution of Ax = b

iff hAu b, ui i = 0, for i = 1, . . . , n

iff ui (Au b) = 0 for i = 1, . . . , n

iff A (Au b) = 0

iff A Au = A b.

Orthogonalization

75

satisfies At Au = At b.

Least squares solutions are helpful in those cases where some errors in data lead

to an inconsistent system.

Example

4.6

1 1

0

1

Let A =

, b=

, and u =

. We see that At Au = At b. Hence

0 0

1

1

u is a least squares solution of Ax = b.

Notice that Ax = b has no solution.

We may use QR factorization in computing a least squares solution of a linear

system.

Theorem 4.7

Let A Fmn have linearly independent columns. Let A = QR be the QR factorization of A. Then, the least squares solution of Ax = b is given by u = R1 Q b.

Proof Let u = R1 Q b. Now, A Au = R Q QRR1 Q b = R Q b = A b. That

is, u satisfies the equation A Ax = A b. Thus, u is a least squares solution of

Ax = b.

Moreover, since A has linearly independent columns, rank(A) = n. Then

null (A) = n n = 0. That is, N(A) = {0}. So, the homogeneous system Ax = 0

has no nonzero solution. Therefore, this least square solution u is the unique

least squares solution.

Why is u = R1 Q b not a solution of Ax = b? The reason is, Q has orthonormal

columns, but it need not have orthonormal rows. Consequently, QQ need not be

equal to I. Then Au = QRR1 Q b = QQ b need not be equal to b.

But, if a solution v exists for Ax = b, then Av = b. It implies QRv = b; and then

Rv = Q b. And finally, it yields v = u. That is, a solution of Ax = b must be equal to

the least squares solution.

Notice that u = R1 Q b leads to the linear system Ru = Q b, which is easy to

solve since R is upper triangular.

1. Find a QR-factorization of each of the following matrices:

1 1

2

0 1

1 0 2

0 1 1

1 1

0

0 1

1 2 0

0 0

1

76

2. Let A Rmn and b Rm1 . If columns of A are linearly independent, then

show that there exists a unique x Rn1 such that At Ax = At b.

3. Find the least squares solution of the system Ax = b, where

0

1 1 1

3

1

1

1

1 0 1

2 , b = 0

(a) A = 1

(b) A =

1 1 0 , b = 1 .

2 1

2

2

0 1 1

4. Let A Rnn . Let b Rn1 with b 6= 0. Show that if b is orthogonal to each

column of A, then Ax = b is inconsistent. What are least squares solutions of

Ax = b?

5

Eigenvalues and Eigenvectors

5.1

Eigenvalues

0 1

Let A =

. We view A as a linear transformation; A : R21 R21 . It trans1 0

forms straight lines to straight lines or points. Does there exist a straight line which

is transformed to itself?

x

0 1

x

y

A

=

=

.

y

1 0

y

x

Thus, the line {(x,x) :x R}

never

moves.

So also the line

{(x,

x) : x R}.

x

x

x

x

Observe that A

=1

and A

= (1)

.

x

x

x

x

Let A Fnn . A scalar F is called an eigenvalue of A iff there exists a nonzero vector v Fn1 such that Av = v. Such a vector v is called an eigenvector of

A for (or, associated with, or, corresponding to) the eigenvalue .

Example 5.1

1 1 1

Consider the matrix A = 0 1 1 . It has an eigenvector [0, 0, 1]t associated

0 0 1

with the eigenvalue 1. Is [0, 0, c]t also an eigenvector associated with the same

eigenvalue 1?

In fact, corresponding to an eigenvalue, there are infinitely many eigenvectors.

1. Suppose A Fnn , F, and b Fn1 are such that (A I)x = b has a

unique solution. Show that is not an eigenvalue of A.

2. Formulate a converse of the statement in Exercise 1 and prove it.

3. Let A Fnn . Show that A is invertible iff 0 is not an eigenvalue of A.

77

78

5.2

Characteristic polynomial

for the eigenvalue iff v is a nonzero solution of (A I)x = 0. Further, the linear

system (A I)x = 0 has a nonzero solution iff rank(A I) < n, where A is an

n n matrix. And this happens iff det(A I) = 0. Therefore, we have the following

result.

Theorem 5.1

Let A Cnn . A scalar C is an eigenvalue of A iff det(A I) = 0.

The polynomial det(A tI) is called the characteristic polynomial of the matrix

A. Each eigenvalue of A is a zero of the characteristic polynomial of A. Conversely,

each zero of the characteristic polynomial is said to be a complex eigenvalue of A.

If A is a matrix with real entries, some of the zeros of its characteristic polynomial may turn out to be complex numbers. Considering A as a linear transformation

from Rn1 to Rm1 , the scalars are now only real numbers. Thus each zero of the

characteristic polynomial may not be an eigenvalue; only the real zeros are.

If we regard A as a matrix with complex entries, then A is a linear transformation on Cn1 . Then each complex eigenvalue, that is, a zero of the characteristic

polynomial of A, is an eigenvalue of A.

Since the characteristic polynomial of a matrix A of order n is a polynomial of

degree n in t, it has exactly n, not necessarily distinct, zeros in C. And these are the

eigenvalues (complex eigenvalues) of A. Notice that, here, we are using the fundamental theorem of algebra which says that each polynomial of degree n with complex

coefficients can be factored into exactly n linear factors.

Caution: When is a complex eigenvalue of A Fnn , a corresponding eigenvector

x is a vector in Cn1 , in general.

For computing eigenvalues, a matrix in Rnn is considered as a matrix in Cnn .

Thus, the eigenvalues of A are complex eigenvalues, in general.

Example 5.2

Find the eigenvalues and corresponding eigenvectors of the matrix

1

A= 1

1

0

1

1

0

0 .

1

1t 0

0

1 1 t 0 = (1 t)3 .

1

1 1t

79

we solve A[a, b, c]t = [a, b, c]t or that

a = a, a + b = b, a + b + c = c.

It gives b = c = 0 and a F can be arbitrary. Since an eigenvector is nonzero,

all the eigenvectors are given by (a, 0, 0)t , for a 6= 0.

Example 5.3

For A R22 , given by

A=

0 1

,

1 0

no eigenvalue.

However, i and i are its complex eigenvalues. That is, the same matrix

A C22 has eigenvalues as i and i. The corresponding eigenvectors are

obtained by solving

A[a, b]t = i[a, b]t and A[a, b] = i[a, b]t .

For = i, we have b = ia, a = ib. Thus, [a, ia]t is an eigenvector for a 6= 0.

For the eigenvalue i, the eigenvectors are [a, ia] for a 6= 0.

We consider A as a matrix with complex entries; and it has (complex)

eigenvalues i and i.

The following theorem lists some important facts about eigenvalues.

Theorem 5.2

Let A Fnn . Then the following are true.

1. A and At have the same eigenvalues.

2. Similar matrices have the same eigenvalues.

3. If A is a diagonal or an upper triangular or a lower triangular matrix,

then its diagonal elements are precisely its eigenvalues.

4. The product of all eigenvalues of A is equal to det(A).

5. The sum of all eigenvalues of A is equal to tr(A).

6. (Caley-Hamilton) A satisfies its characteristic polynomial.

Proof

80

(4) Let 1 , . . . , n be the eigenvalues of A, not necessarily distinct. Now,

det(A tI) = ( 1 t) ( n t).

Put t = 0. It gives det(A) = 1 n .

(5) Expand det(A tI) on its first row. The first term is (a11 t)A11 , where

A11 is the minor corresponding to the (1, 1)th entry. All other terms are

polynomials of degree less than or equal to n 2. Continuing in a similar

fashion of expanding the minor A11 we see that

Coeff of t n1 in det(A tI) = Coeff of t n1 in (a11 t) A11

= = Coeff of t n1 in (a11 t) (a22 t) (ann t) = (1)n1 tr(A).

But Coeff of t n1 in det(A tI) = (1)n1 ( 1 + + n ).

(6) Let A Fnn . Its characteristic polynomial is

p(t) = (1)n det(A t I).

We show that p(A) = 0, the zero matrix. Recall that for any square matrix B,

we have B adj(B) = adj(B)B = det(B) I. Taking B = A tI, we have

p(t) I = (1)n det(A tI) I = (1)n (A tI) adj (A tI).

The entries in adj (A tI) are polynomials in t of degree at most n 1. Write

adj (A tI) := B0 + tB1 + + t n1 Bn1 ,

where B0 , . . . , Bn1 Fnn . Then

p(t)I = (1)n (A t I)(B0 + tB1 + t n1 Bn1 ).

Notice that this is an identity in polynomials, where the coefficients of t j are

matrices. Substituting t by any matrix of the same order will satisfy the

equation. In particular, substituting A for t we obtain p(A) = 0.

Cayley-Hamilton theorem helps us in computing powers of matrices and also the

inverse of a matrix if at all it is invertible. For instance, suppose that a matrix A has

the characteristic polynomial

a0 + a1t + + ant n .

By Cayaley-Hamilton theorem, we have

a0 I + a1 A + + an An = 0.

Then An = (a0 I + a1 A + + an1 An1 ). Thereby computation of An , An+1 , . . . can

be reduced to computing A, A2 , . . . , An1 .

81

For computing the inverse, suppose that A is invertible. Then det(A) 6= 0. Since

det(A) is the product of all eigenvalues of A, = 0 is not an eigenvalue of A. It

implies that (t ) is not a factor of the characteristic polynomial of A. Therefore,

the constant term a0 in the characteristic polynomial of A is nonzero. Then we can

rewrite the above equation as

a0 I + A(a1 I + an An1 ) = 0.

Multiplying A1 and simplifying, we obtain

A1 =

1

a1 I + a2 A + + an An1 .

a0

3

0

1. Find eigenvalues and corresponding eigenvectors of

0

0

0

2

0

0

0

0

0

0

.

0 2

2

0

A whose jth row has each entry j.

3. Determine all eigenvalues and their corresponding eigenvectors for the 5 5

matrix whose each row is [1 2 3 4 5].

4. Let A Fnn be a matrix such that the sum of all entries in any row is for

some F. Then show that is an eigenvalue of A.

5. Show that if rank of an n n matrix is 1, then its trace is one of its eigenvalues.

What are its other eigenvalues?

6. Let A Fnn . Let p(t) be a polynomial of degree n with coefficient of t n as

(1)n . If p(A) = 0, then does it follow that p(t) is the characteristic polynomial of A?

7. Let A, B, P Cnn be such that B = P1 AP. Let be an eigenvalue of A. Show

that a vector v is an eigenvector of B corresponding to the eigenvalue iff Pv

is an eigenvector of A corresponding to the same eigenvalue .

8. An n n matrix A is said to be idempotent if A2 = A. Show that the only

possible eigenvalues of an idempotent matrix are 0 or 1.

9. An n n matrix A is said to be nilpotent if Ak = 0 for some natural number k.

Show that 0 is the only eigenvalue of a nilpotent matrix.

10. Show that if each eigenvalue of A Fnn has absolute value less than 1, then

both I A and I + A are invertible.

82

5.3

entries satisfies At = A; and accordingly, such a matrix is called a real symmetric

matrix. In general, A is called a symmetric matrix iff At = A. And A is called skew

hermitian iff A = A; also, a matrix is called skew symmetric iff At = A. In

the following, B is symmetric, C is skew-symmetric, D is hermitian, and E is skewhermitian. B is also hermitian and C is also skew-hermitian.

1 2 3

0 2 3

i 2i 3

0 2+i 3

B = 2 3 4 , C = 2 0 4 , D = 2i 3 4 , E = 2 i i 4i

3 4 5

3 4 0

3 4 5

3 4i 0

Notice that a skew-symmetric matrix must have a zero diagonal, and the diagonal

entries of a skew-hermitian matrix must be 0 or purely imaginary. Reason:

aii = aii 2Re(aii ) = 0.

Let A be a square matrix. Since A + At is symmetric and A At is skew symmetric, every square matrix can be written as a sum of a symmetric matrix and a skew

symmetric matrix:

A = 21 (A + At ) + 12 (A At ).

Similar rewriting is possible with hermitian and skew hermitian matrices:

A = 21 (A + A ) + 21 (A A ).

A square matrix A is called unitary iff A A = I = AA . In addition, if A is real, then

it is called an orthogonal matrix. That is, an orthogonal matrix is a matrix with

real entries satisfying At A = I = AAt . Notice that a square matrix is unitary iff it is

invertible and its inverse is equal to its adjoint. Similarly, a real matrix is orthogonal

iff it is invertible and its inverse is its transpose. In the following, B is a unitary

matrix of order 2, and C is an orthogonal matrix (also unitary) of order 3:

2 1 2

1 1+i 1i

1

B=

, C = 2 2 1 .

2 1i 1+i

3

1 2 2

The following are examples of orthogonal 2 2 matrices.

cos sin

cos

sin

O1 :=

, O2 :=

.

sin

cos

sin cos

If A = [ai j ] is an orthogonal matrix of order 2, then At A = I implies

a211 + a221 = 1 = a212 + a222 , a11 a12 + a21 a22 = 0.

83

Thus, there exist , such that a11 = cos , a21 = sin , a12 = cos , a22 = sin

and cos( ) = 0. It then follows that A is in the form of either O1 or O2 .

Let (a, b) be the vector in the plane that starts at the origin and ends at the point

(a, b). Writing the point (a, b) as a column vector [a b]t , we see that the matrix

product O1 [a b]t is the end-point of the vector obtained by rotating the vector (a, b)

by an angle . Similarly, O2 [a b]t gives a point obtained by reflecting (a, b) along a

straight line that makes an angle /2 with the x-axis. Thus, O1 is said to be a rotation

by an angle and O2 is called a reflection along a line making an angle of /2 with

the x-axis.

If A Fmn , then A A = I is equivalent to asserting that the columns of A are

orthonormal; and AA = I is equivalent to the fact that the rows of A are orthonormal.

Unitary or orthogonal matrices preserve inner product and also the norm. This is the

reason unitary matrices are also called isometries. We prove these facts about unitary

matrices in the following theorem.

Theorem 5.3

Let A Fnn be a unitary or an orthogonal matrix.

1. For each pair of vectors x, y Fn1 , hAx, Ayi = hx, yi.

2. For each x Fn1 , kAxk = kxk.

3. The columns of A are orthonormal.

4. The rows of A are orthonormal.

5. |det(A)| = 1.

Proof

(3) Since A A = I, the ith row of A multiplied with the jth column of A gives

i j . However, this product is simply the inner product of the jth column of A

with the ith column of A.

(4) It follows from (2). Also, considering AA = I, we get this result.

(5) Notice that det(A ) = det(A) = det(A). Thus

det(A A) = det(A )det(A) = det(A)det(A) = |det(A)|2 .

However, det(A A) = det(I) = 1. Therefore, |det(A)| = 1.

It thus follows that the determinant of an orthogonal matrix is either 1 or 1.

We wish to see the nature of eigenvalues and eigenvectors of these special types

of matrices.

84

Theorem 5.4

Let A Fnn . Let be any complex eigenvalue of A.

1. If A is hermitian or real symmetric, then R. Moreover, there exists

a real eigenvector corresponding to .

2. If A is skew-hermitian or skew-symmetric, then is purely imaginary

or zero.

3. If A is unitary or orthogonal, then | | = 1.

Proof Let A Fnn . Let be any complex eigenvalue of A with an eigenvector v Cn1 . Now, Av = v. Pre-multiplying with v , we have v Av = v v C.

(1) If A is hermitian, then A = A . Now,

(v Av) = v A v = v Av and

(v v) = v v.

If v = x + iy Cn1 is an eigenvector corresponding to , with x, y Rn1 ,

then

A(x + iy) = (x + iy).

Comparing the real and imaginary parts, we have

Ax = x,

Ay = y.

a real eigenvector corresponding to the eigenvalue of A.

(2) When A is skew-hermitian, (v Av) = v Av. Then v Av = v v implies that

( v v) = (v v).

= . That is, 2Re( ) = 0. This shows

Since v 6= 0, v v 6= 0. Therefore, =

that is purely imaginary or zero.

. Then

(3) Suppose A A = I. Now, Av = v implies v A = ( v) = v

v v = | |2 v v.

v v = v Iv = v A Av =

Since v v 6= 0, | | = 1.

1. Construct an orthogonal 2 2 matrix whose determinant is 1.

2. Construct an orthogonal 2 2 matrix whose determinant is 1.

3. Construct a 3 3 hermitian matrix with no zero entry whose eigenvalues are

1, 2 and 3.

85

5. Show that if a matrix A is real symmetric and invertible, then so is A1 .

6. Show that if a matrix A is hermitian and invertible, then so is A1 .

7. Show that in the plane,

(a) a rotation following a rotation is a rotation;

(b) a rotation following a reflection is a reflection;

(c) a reflection following a rotation is a reflection; and

(d) a reflection following a reflection is a rotation.

8. Let A Fnn . Show that hAx, Ayi = hx, yi for all x, y Fn1 iff kAxk = kxk for

all x Fn1 .

9. Let A R22 be an orthogonal matrix. Suppose that A has a non-trivial fixed

point; that is, there exists a nonzero vector v R21 such that Av = v. Show

that with respect to any orthonormal basis B of R21 , the matrix [A]B is in the

form

cos

sin

.

sin cos

6

Canonical Forms

6.1

Schur triangularization

Eigenvalues and eigenvectors can be used to bring a matrix to nice forms using similarity transformations. A very general result in this direction is Schurs unitary triangularization. It says that using a suitable similarity transformation, we can represent

a square matrix by an upper triangular matrix. Thus, the diagonal entries of the upper

triangular matrix must be the eigenvalues of the given matrix. This information can

be used to construct the appropriate similarity transformation.

Theorem 6.1

(Schur Triangularization) Let A Cnn . Then there exists a unitary matrix

P Cnn such that P AP is upper triangular. Moreover, if A Rnn has only

real eigenvalues, then P can be chosen to be an orthogonal matrix.

Proof Our proof is by induction on n. If n = 1, then clearly A is an upper

triangular matrix, and we take P = [1], the identity matrix with a single entry

as 1, which is both unitary and orthogonal.

Assume that for all B Cmm , m 1, we have a unitary matrix Q Cmm

such that Q BQ is upper triangular. Let A C(m+1)(m+1) . Let C be an

eigenvalue of A with an associated eigenvector u. Consider C(m+1)1 as an inner

product space with the usual inner product hw, zi = z w. Let v = u/kuk, so that

v is an eigenvector of A of norm 1 associated with the eigenvalue . Extend the

set {v} to obtain an orthonormal ordered basis E = {v, v1 , . . . , vm } for C(m+1)1 .

Here, you may have to use an extension of a basis, and then Gram-Schmidt

orthonormalization process. Now, construct the matrix R C(m+1)(m+1) by

taking these basis vectors as its columns, in that order; that is, let

R = [v v1 vm ].

Since E is an orthonormal set, R is unitary. With respect to the basis E, the

matrix representation of A is R1 AR = R AR. The first column of R AR is

R ARe1 = R Av = R1 v = R1 v = R1 Re1 = e1 ,

87

88

Then R AR can be written in the following block form:

x

R AR =

,

0 C

where 0 Cm1 , C Cmm and x C1m . In fact, x = [v Av1 v Av2 v Avm ];

but that is not important for the purpose of the proof.

Notice that if m = 1, the proof is complete. For m > 1, by induction hypothesis, we have a matrix S Cmm such that SCS is upper triangular. Then

take

1 0

P=R

.

0 S

Since S is unitary, P P = PP = I; that is, P is unitary. Moreover,

1 0

1 0

1 0 x 1 0

P AP =

R AR

=

=

0

0 S

0 S

0 S 0 C 0 S

y

SCS

for some y C1m . Since SCS is upper triangular, the induction proof is

complete for the case A Cnn .

When A Rnn , and all the eigenvalues of A are real, we use the transpose

instead of the adjoint, every where in the above proof. Thus, P can be chosen

to be an orthogonal matrix.

To eradicate possible misunderstanding, we recall that A has only real eigenvalues

means that when we consider this A as a matrix in Cnn , all its complex eigenvalues

turn out to be real numbers. This again means that all zeros of the characteristic

polynomial of A are real.

Further, during the course of the proof of Schurs triangularization, once we obtain

a matrix similar to A in the form

y

,

0 SCS

we look for whether is still an eigenvalue of SCS. If so, we choose this eigenvalue

over others for further reduction. In the next step we obtain a matrix similar to A in

the form

y z

0 x ,

0 0 M

where M is an (n 2) (n 2) matrix. Continuing further this way, we see that a

Schur triangularization of A exists, where on the diagonal of the final upper triangular

matrix equal eigenvalues occur together. Of course, the construction allows to have

an upper triangular form, where eigenvalues can be chosen to occur on the diagonal

on any given order. However, this particular form, where equal eigenvalues occur

together will be helpful later.

89

Canonical Forms

Example 6.1

2

1 0

3 0 for Schur triangularization.

Consider the matrix A = 2

1 1 1

We find that A (t) = (1 t)2 (4 t). All eigenvalues of A are real; thus there

exists an orthogonal matrix P such that Pt AP is upper triangular. To determine such a matrix P, we take one of the eigenvalues, say 1. An associated

eigenvector of norm 1 is v = [0, 0, 1]t . We extend {v} to an orthonormal basis

for C31 . For convenience, we take the (ordered) orthonormal basis as

{[0, 0, 1]t , [1, 0, 0]t , [0, 1, 0]t }.

Taking the basis vectors as columns, we form the matrix R as

0 1 0

R = 0 0 1 .

1 0 0

We then find that

1 1 1

2

1 .

Rt AR = 0

0

2

3

2 1

Now, we try to triangularize the matrix C =

. It has eigenvalues 1

2 3

and 4. The eigenvector of unit norm associated with the eigenvalue 1 is

{[1/ 2,

1/ 2]t ,

[1/ 2, 1/ 2]t }

for C21 . Then we construct the matrix S by taking these basis vectors as its

columns, that is,

1/ 2

1/ 2

S = 1

.

/ 2 1/ 2

1 1

We find that St CS =

, which is an upper triangular matrix. Then

0

4

1/ 2

1/ 2

0 1 0 1

0

0

0

1 0

1/ 2 = 0 1/ 2

1/ 2 .

1/ 2

P=R

= 0 0 1 0

0 S

1 0 0 0 1/ 2 1/ 2

1

0

0

Computing Pt AP, we have

1

Pt AP = 0

0

0 2

1 1 ,

0

4

Since P = P1 , Shur triangularization is informally stated as

90

any square matrix is unitarily similar to an upper triangular matrix.

Further, there is nothing sacred about being upper triangular. For, given a matrix

A Cnn , consider using Schur triangularization of A . There exists a unitary matrix

P such that P A P is upper triangular. Then taking transpose, we have P AP is lower

triangular. That is,

any square matrix is unitarily similar to a lower triangular matrix.

Analogously, a real square matrix having only real eigenvalues is also orthogonally

similar to a lower triangular matrix. We remark that the lower triangular form of a

matrix need not be the transpose or the adjoint of its upper triangular form.

Moreover, neither the unitary matrix P nor the upper triangular matrix P AP in

Schur triangualrization is unique. That is, there can be unitary matrices P and Q

such that both P AP and Q AQ are upper triangular, and Q, P 6= Q, P AP 6= Q AQ.

The non-uniqueness stems from the choices involved in the associated eigenvectors

and in extending this to an orthonormal basis. For instance, in Example 6.1, if you

extend {[0, 0, 1]t } to the ordered orthonormal basis

{[0, 0, 1]t , [0, 1, 0]t , [1, 0, 0]t },

then you end up with

0

P = 0

1

1/ 2

1/ 2

1/ 2

1/ 2 ,

1

Pt AP = 0

0

0 2

1

1 .

0

4

1. Prove that if all eigenvalues of a square matrix are real, then it is orthogonally

similar to a lower triangular matrix.

2. Let B Fnn . Suppose that I + B is invertible. Let C = (I + B)1 (I B). Show

that C is unitary. Further, if B is unitary, then show that C is skew-hermitian.

1

2

1 0

0 2

3. Let A =

, B=

and C =

. Show that both B and

1 2

0 0

0 1

C are Schur triangularizations of A. This would prove that Schur triangularization is not unique.

1 2

0 1

0 2

4. Let A =

, B=

and C =

. Show that B is a Schur trian3 6

0 7

0 7

gularization of A but C is not.

5. Using Schur triangularization, prove Spectral mapping theorem, which states

that if is a complex eigenvalue of A, then p( ) is a complex eigenvalue of

p(A), for any polynomial p(t). Moreover, all complex eigenvalues of p(A) are

of this form.

6. Prove Cayley-Hamilton theorem using Schur triangularization.

91

Canonical Forms

6.2

Diagonalizability

As you see from Schur triangularization, each matrix with complex entries is similar

to an upper triangular matrix. Moreover, a matrix A with real entries is similar to

an upper triangular real matrix provided all zeros of the characteristic polynomial

are real. The upper triangular matrix similar to a given matrix A takes a better form

when A is hermitian.

Theorem 6.2

(Spectral theorem for hermitian matrices) Each hermitian matrix is orthogonally similar to a diagonal matrix.

Proof Let A Cnn be a hermitian matrix. Due to Schur triangularization,

we have a unitary matrix P such that D = P AP is upper triangular. Now,

D = P A P = P AP = D.

Since D is upper triangular and D = D, we see that D is a diagonal matrix.

Observe that the diagonal entries in the diagonal matrix D are the eigenvalues of A. Since A is hermitian, all of them are real numbers, and the associated

eigenvectors can always be chosen to be in Rn1 . Therefore, the unitary matrix

P, which consists of real eigenvectors of A is an orthogonal matrix.

It thus follows that each real symmetric matrix is orthogonally similar to a diagonal matrix. The spectrum of a matrix is the multi-set of its eigenvalues. Theorem 6.2

is called the spectral theorem for hermitian matrices since it explicitly uses the eigenvalues of the matrix.

A matrix A Fnn is called diagonalizable iff there exists an invertible matrix P

Fnn such that P1 AP is a diagonal matrix. When P1 AP is a diagonal matrix, we

say that A is diagonalized by P. In this language, the spectral theorem for hermitian

matrices may be stated as follows:

Every hermitian matrix is orthogonally diagonalizable.

To see how the eigenvalues and eigenvectors are involved in the diagonalization process, we proceed as follows.

Let A Fnn . Let 1 , . . . , n be all complex eigenvalues (not necessarily distinct)

of A. Let v1 , . . . , vn Fn1 be corresponding eigenvectors. Construct n n matrices

P := [v1 v2 vn ],

D = diag ( 1 , 2 , . . . , n ).

col(i) of AP = Avi = i vi = i Pei = P ( i ei ) = PDei = col(i) of PD.

92

That is,

AP = P D..

P1 AP

If P is invertible, then

= D, a diagonal matrix. That is, A is similar to a

diagonal matrix. Moreover, P is invertible iff its columns form a basis for Fn1 . We

thus obtain the following result.

Theorem 6.3

A matrix A Fnn is diagonalizable iff there exists a basis of Fn1 consisting

of eigenvectors of A.

The question is, when are there n linearly independent eigenvectors of A? The

spectral theorem provides a partial answer. We can generalize the spectral theorem

to the so called normal matrices.

A matrix A Cnn is called a normal matrix iff A A = AA . We observe the

following.

Theorem 6.4

Each upper triangular normal matrix is diagonal.

Proof Let U Cnn be an upper triangular matrix. If n = 1, then clearly

U is a diagonal matrix. Lay out the induction hypothesis that each upper

triangular normal matrix of order k is diagonal. Let U be an upper triangular

normal matrix of order k +1. Write U in a partitioned form, as in the following.

V u

U=

,

0 a

where V Ckk , u Ck1 , 0 is the zero row vector in C1k , and a C. Then

V V

V u

VV + uu a u

U U =

=

UU

=

u V u u + |a|2

a u

|a|2

implies that u u + |a|2 = |a|2 . That is, u = 0. Plugging u = 0 in the above

equation, we see that V V = VV . Since V is upper triangular, by the induction

hypothesis, V is a diagonal matrix. Then with u = 0, U is also a diagonal

matrix.

Using this result on upper triangular normal matrices, we can generalize the spectral theorem to normal matrices.

Theorem 6.5

(Spectral theorem for normal matrices) A square matrix is unitarily diagonalizable iff it is a normal matrix.

93

Canonical Forms

matrix D = diag ( 1 , . . . , n ) and a unitary matrix P such that A = PDP . Then

A A = PD DP and AA = PDD P . However, D D is a diagonal matrix with

diagonal entries as | |2 ; so is DD . That is, D D = DD . Therefore, A A = AA .

That is, A is a normal matrix.

Conversely if A is normal, then A A = AA . Due to Schur triangularization,

let Q be a unitary matrix such that Q AQ = U, an upper triangular matrix.

Since Q = Q1 , the condition A A = AA implies that U U = UU . By Theorem 6.4, U is a diagonal matrix.

There can be non-normal matrices which are diagonalizable. For example, with

1 0

0

0 1 0

5 2

A = 4 3 2 , P = 1

2 1

0

1

3 1

we see that A A 6= AA , P P 6= I but

1 1

2

1

0

0 4

P1 AP = 1

2

1 1

2

0

0

0 1

3 2 1

5

1

0

1

3

0

1

2 = 0

1

0

0

1

0

0

0 .

2

Another partial answer is provided by the following theorem.

Theorem 6.6

Eigenvectors associated with distinct eigenvalues of a matrix are linearly independent. In particular, if a matrix of order n has n distinct eigenvalues,

then it is diagonalizable.

Proof Let 1 , . . . , n be the distinct eigenvalues of A, and let v1 , . . . , vn be

corresponding eigenvectors. We use induction on k {1, . . . , n}.

For k = 1, since v1 6= 0, {v1 } is linearly independent. Lay out the induction

hypothesis: for k = m suppose {v1 , . . . , vm } is linearly independent. Now, for

k = m + 1, suppose

1 v1 + 2 v2 + + m vm + m+1 vm+1 = 0.

(6.1)

1 1 v1 + 2 2 v2 + + m m vm + m+1 m+1 vm+1 = 0.

Multiply (6.1) with m+1 and subtract from the last equation to get

1 ( 1 m+1 )v1 + + m ( m m+1 )vm = 0.

94

i 6= m+1 . Thus i = 0 for each such i. Then, (6.1) yields m+1 vm+1 = 0.

Since vm+1 6= 0, m+1 = 0. This completes the proof of linear independence of

eigenvectors associated with distinct eigenvalues.

For the second statement, suppose A Cnn has n distinct eigenvalues

1 , . . . , n . Then the associated eigenvectors v1 , . . . , vn are linearly independent, and thus form a basis for Cn1 . Therefore, A is diagonalizable. More

directly, taking

P = [v1 v2 vn ],

we see that P is invertible and P1 AP = diag ( 1 , . . . , n ).

As you have seen, the procedure for diagonalization of a matrix A Fnn goes

as follows. First, we find its eigenvalues and the associated eigenvectors. Observe

that if F = R, and there exists an eigenvalue with nonzero imaginary part, then A

cannot be diagonalized by a matrix with real entries. Suppose we have n number

of eigenvalues in F counting multiplicities. Then, we check whether n such linearly

independent eigenvectors exist or not. If not, we cannot diagonalize A. If yes, then,

we put the eigenvectors together as columns to form the matrix P; and P1 AP is a

diagonalization of A.

Example 6.2

1 1 1

1 1 is real symmetric. It has eigenvalues

The matrix A = 1

1 1

1

1, 2, 2, with associated eigenvectors (normalized)

1/ 3

1/ 6

1/ 2

1/3 , 1/2 , 1/6 .

1/ 3

2/ 6

0

They form an orthonormal basis for R3 . Taking

1/ 3 1/ 2 1/ 6

P = 1/ 3 1/ 2 1/ 6 ,

1/ 3

2/ 6

0

1 0

we see that P1 = Pt and P1 AP = Pt AP = 0 2

0 0

0

0 .

2

Au = u. This means u N(A I). Now, how many linearly independent solution

vectors u of Au = u are there? Obviously, it is dim (N(A I)). This number has

certain relation with diagonalizability of T. To use this further, we first give this

number a name.

95

Canonical Forms

is dim N(A I); and the algebraic multiplicity of is the number of times is a

zero of the characteristic polynomial of A.

Observe that if is an eigenvalue of A, then its geometric multiplicity is the maximum number of linearly independent vectors associated with .

Theorem 6.7

Let A Fnn . Let be an eigenvalue of A. Then the geometric multiplicity of

is less than or equal to its algebraic multiplicity. Further, A is diagonalizable

iff geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity

iff sum of geometric multiplicities of all eigenvalues is n.

Proof Let the geometric multiplicity of an eigenvalue be k. Then we

have k number of linearly independent eigenvectors of A associated with the

eigenvalue , and no more. Extend the set of these eigenvectors to an ordered

basis B of V. Let P be the matrix whose columns are the vectors in B. Then

Ik C

1

M := P AP =

,

0

D

where Ik is the identity matrix of order k and C Ck(nk) , D C(nk)(nk)

are some matrices. Since A is similar to M, they have the same characteristic

polynomial; and it is of the form

( t)k p(t)

for some polynomial p(t) of degree n k. Clearly, the algebraic multiplicity of

is at least k. This proves the first statement.

For the second statement, suppose A is diagonalizable. Then we have an

ordered basis E of V which consists of eigenvectors of A, with respect to which

the matrix of A is diagonal. If is an eigenvalue of A of algebraic multiplicity

m, then in this diagonal matrix there are exactly m number of entries equal to

. Then in the basis E there are exactly m number of eigenvectors associated

with . Therefore, the geometric multiplicity of is m.

Conversely, suppose that the geometric multiplicity of each eigenvalue is

equal to its algebraic multiplicity. Then corresponding to each eigenvalue ,

we have exactly that many linearly independent eigenvectors as its algebraic

multiplicity. Collecting together the eigenvectors associated with all eigenvalues, we get n linearly independent eigenvectors; which form a basis for Fn1 .

Therefore, A is diagonalizable.

The second iff statement follows since geometric multiplicity of each eigenvalue is at most its algebraic multiplicity.

Example

6.3

1 0

1

Let A =

and B =

0 1

0

1

. The characteristic polynomials of both

1

96

2 for both A and B.

For geometric multiplicities, we solve Ax = x and By = y.

Now, Ax = x gives x = x, which is satisfied by the linearly independent vectors [1, 0]t and [0, 1]t . Thus, N(A I) has dimension 2. Thus, the geometric

multiplicity of the only eigenvalue 1 of A is same as its algebraic multiplicity.

Also, we see that A is diagonalizable; in fact, it is already a diagonal matrix.

Proceeing similarly with the matrix B, we see that the linear system Bx = x

with x = [a, b]t gives a + b = a and b = b. That is, a = 0 and b can be any

complex number. For example, [0, 1]t . Now, dim (N(B I)) = 1. Geometric

multiplicity of the eigenvalue 1 of B is 1, which is not equal to its algebraic

multiplicity. Therefore, B is not diagonalizable.

1. In each of the following cases, determine whether the given

nalizable by a matrix with complex entries:

2 1 0

1 10 0

0 2 0

2 3

3 1

(a)

(b) 1

(c)

0 0 2

6 1

1

0 4

0 0 0

7 5 15

2. Diagonalize A = 6 4 15 . Then compute A6 .

0

0 1

matrix is diago

0

0

0

5

0 1 1

7 2

0

6 2 .

(a) 1 0 1

(b) 2

1 1 0

0 2

5

4. Check whether each of the following matrices is diagonalizable. If diagonalizable, find a basis of eigenvectors for the space C31 :

1 1 1

1 1 1

1 0 1

(a) 1 1 1

(b) 0 1 1

(c) 1 1 0

1 1 1

0 0 1

0 1 1

5. Show that each of the following matrices is diagonalizable with a matrix in

R33 . Also find a basis of eigenvectors for R31 .

3/2 1/2 0

3 1/2 3/2

2 1 0

2 0 .

(a) 1/2 3/2 0

(b) 1 3/2 3/2

(c) 1

1/2 1/2 1

1

5

1

/2

/2

2

2 3

6. Prove that if a normal matrix has only real eigenvalues, then it is hermitian.

Conclude that if a real normal matrix has only real eigenvalues, then it is symmetric.

97

Canonical Forms

6.3

Jordan form

eigenvalue, there may not be sufficient number of linearly independent eigenvectors.

Non-diagonalizability of a matrix A Fnn means that we cannot have a basis consisting of vectors v j for Fn1 so that A(v j ) = j v j for scalars j . In that case, we

would like to have a basis which would bring the matrix of the linear operator to a

nearly diagonal form.

In what follows, we will be using similarity transformations resulting out of elementary matrices. A similarity transformation that uses an elementary matrix E[i, j]

on a matrix A transforms A to (E[i, j])1 A E[i, j]. Since (E[i, j])1 = (E[i, j])t =

E[i, j], the net effect of this transformation is described as follows:

E[i, j]1 A E[i, j] = E[i, j] A E[i, j] exchanges the ith and jth rows, and

then exchanges the jth and ith columns of A.

We will refer to this type of similarity transformations by the name permutation

similarity.

Using the second type of elementary matrices, we have a similarity transformation (E [i])1 A E [i] for 6= 0. Since (E [i])1 = E1/ [i] and (E [i])t = E [i], this

similarity transformation has the following effect:

(E [i])1 A E [i] = E1/ [i] A E [i] multiplies all entries in the ith row

with 1/ and then multiplies all entries in the ith column with ; thus

keeping (i, i)th entry intact.

We will refer to this type of similarity as dilation similarity. In particular, if A is

such a matrix that its ith row has all entries 0 except the (i, i)th entry, and there is

another entry on the ith column which is 6= 0, then (E [i])1 AE [i] is the matrix

in which this changes to 1 and all other entries are as in A.

The third type of similarity transformation applied on A yields (E [i, j])1 AE [i, j].

Notice that (E [i, j])1 = E [i, j] and (E [ j, i])t = E [i, j]. This similarity transformation changes a matrix A as described below:

(E [i, j])1 AE [i, j] = E [i, j] A E [i, j] is obtained from A by replacing the ith row by the ith row minus times the jth row, and replacing

the jth column by the jth column plus times the ith column.

We name this type of similarity as a combination similarity.

Our goal is to prove that there exists an invertible matrix P such that P1 AP is in

the form

P1 AP = diag (J1 , J2 , . . . , Jk ),

where each Ji is a block diagonal matrix of the form

Ji = diag (J1 ( i ), J2 ( i ), . . . , Jsi ( i )),

98

for some si ; each matrix Jj ( i ) here has the form

i 1

i 1

.. ..

Jj ( i ) =

. .

1

i

The missing entries are all 0. Such a matrix Jj ( i ) is called a Jordan block with

diagonal entries i . Any matrix which is in the above block diagonal form is said

to be in Jordan form. We will see that the number of Jordan blocks with diagonal

entries i is the geometric multiplicity of the eigenvalue i .

Example 6.4

The following matrix is one in Jordan form:

1

1

2

1

2 1

2

It has three Jordan blocks for the eigenvalue 1 of which two are of size 11 and

one of size 2 2; and it has one block of size 3 3 for the eigenvalue 2. Notice

that the eigenvalue 1 has geometric multiplicity 3, algebraic multiplicity 4,

and 2 has geometric multiplicity 1 and algebraic multiplicity 3.

Theorem 6.8

(Jordan form) Each matrix A Cnn is similar to a matrix in Jordan form J,

where the diagonal entries are the eigenvalues of A. Moreover, if mk ( ) is the

number of Jordan blocks of order k with diagonal entry , then

mk ( ) = rank((A I)k1 )2 rank((A I)k )+rank((A I))k+1 for k = 1, . . . , n.

In particular, the Jordan form of A is unique up to a permutation of the blocks.

In the formula for mk we use the convention that for any matrix B of order n, B0 is

the identity matrix of order n.

Proof First, we will show the existence of a Jordan form, and then we will

come back to the formula mk , which will show the uniqueness of a Jordan form

up to a permutation of Jordan blocks.

99

Canonical Forms

matrix, where the eigenvalues of A occur on the diagonal, and equal eigenvalues occur together. If 1 , . . . , k are the distinct eigenvalues of A, then our

assumption means that A is an upper triangular matrix with diagonal entries

read from top left to bottom right appear as

1, . . . , 1; 2, . . . , 2; . . . ; k, . . . , k.

In this list suppose i occurs ni times. First, we show that by way of a

similarity transformation, A can be brought to the form

diag (A1 , A2 , . . . , Ak ),

where each Ai is an upper triangular matrix of size ni ni and each diagonal

entry of Ai is i . Our requirement is shown schematically as follows, where each

such element marked x that is not inside the blocks Ai need to be zeroed-out

by a similarity transformation.

x

0

A1

A1

A2

A2

..

..

.

.

Ak

Ak

If such an x occurs as the (r, s)th entry in A, then r < s. Moreover, the corresponding diagonal entries arr and ass are eigenvalues of A occurring in different

blocks Ai and A j . Thus arr 6= ass . Further, all entries below the diagonals of Ai

and of A j are 0. We use a combination similarity to obtain

E [r, s] A E [r, s] with

x

.

arr ass

This similarity transformation replaces the rth row with the rth row minus

times the sth row, and then replaces the sth column with sth column plus

times the rth column. Since r < s, it changes the entries of A in the rth

row to the right of the sth column, and the entries in the sth column above

the rth row. Thus the upper triangular nature of the matrix does not change.

Further, it replaces the (r, s)th entry x with

ars + (arr ass ) = x +

x

(arr ass ) = 0.

arr ass

We use a sequence of such similarity transformations starting from the last row

of Ak1 with least column index and ending in the first row with largest column

index. Observe that an entry beyond the blocks, which was 0 previously can

become nonzero after a single such similarity transformation. Such an entry

will eventually be zeroed-out. Finally, each position which is not inside any

100

up with a matrix

diag (A1 , A2 , . . . , Ak ).

In the second stage, we focus on bringing each block Ai to the Jordan form.

For notational convenience, write i as a. If ni = 1, then such an Ai is already

in Jordan form. We use induction on the order ni of A. Lay out the induction

hypothesis that each such matrix of order m 1 has a Jordan form. Suppose

Ai has order m. Look at Ai in the following partitioned form:

B u

,

Ai =

0 a

where B is the first (m 1) (m 1) block, 0 is the zero row vector in C1(m1) ,

and u is a column vector in C(m1)1 . Due to the induction hypothesis, there

exists an invertible matrix Q such that Q1 BQ is in Jordan form; it looks like

1

Q BQ =

B1

B2

..

.

B`

a 1

1

a

..

where each B j =

.

..

1

a

Then

Q

C :=

0

1

0

Q

Ai

1

0

1

0

Q BQ

=

1

0

a

..

1

Q u

.

=

a

..

b1

b2

.

bm2

a bm1

a

Here, the sequence of s on the super-diagonal read from top left to right

bottom comprise a block of 1s followed by a 0 and then a block of 1s followed

by a 0, and so on. The number of 1s depend on the sizes of B1 , B2 , etc. That

is, when B1 is complete and B1 starts, we have a 0. Also, we have shown Q1 u

as [b1 . . . bm1 ]t . Our goal is to zero-out all b j s to 0 except bm1 which may

be made a 0 or 1.

In the next sub-stage, call it the third stage, we apply similarity transformations to zero-out (all or except one of) the entries b1 , . . . , bm2 . In any row

of C the entry above the diagonal (the there) is either 0 or 1. The is a 0

at the last row of each block B j . We leave all such bs right now; they are to

be tackled separately. So, suppose in the rth row, br 6= 0 and the (r, r + 1)th

entry (the above the diagonal entry) is a 1. We wish to zero-out each such

Canonical Forms

101

similarity to transform C to

Ebr [r + 1, m]C (Ebr [r + 1, m])1 = Ebr [r + 1, m]C Ebr [r + 1, m].

Observe that this matrix is obtained from C by replacing the (r + 1)th row

with (r + 1)th row plus br times the last row, and then replacing the last

column with the last column minus br times the (r + 1)th column. Its net

result is replacing the (r, m)th entry by 0, and keeping all other entries intact.

Continuing this process of applying a suitable combination similarity transformation, each nonzero bi with a corresponding 1 on the super-diagonal in the same row is reduced to 0. We then obtain a matrix in which all such entries in the last column of C have been zeroed-out, without touching the entries at the last rows of the blocks Bj. Write these remaining entries as c1, . . . , cℓ. Thus at the end of the third stage, Ai has been brought to the following form by similarity transformations:

    F := [ B1              c1 ]
         [     B2          c2 ]
         [         ...     .. ]
         [             Bℓ  cℓ ]
         [                 a  ]

where cj sits in the last column, at the last row of Bj.

Notice that if Bj is a 1 × 1 block, then the corresponding entry on the last column is already 0. In the next sub-stage, call it the fourth stage, we keep the nonzero c corresponding to the last block (the c entry with the highest column index) and zero-out all other c's. Let Bq be the last block such that its corresponding c entry is cq ≠ 0, in the sth row. (It may not be cℓ; in that case, all of c_{q+1}, . . . , cℓ are already 0.) We first make cq a 1 by using a dilation similarity:

    G := E_{1/cq}[s] F E_{cq}[s].

In G, the earlier cq at the (s, m)th position is now 1. Let Bp be any block other than Bq with cp ≠ 0 in the rth row. Our goal in this sub-stage, call it the fifth stage, is to zero-out cp. We use two combination similarity transformations as shown below:

    H := E_{−cp}[r − 1, s − 1] E_{−cp}[r, s] G E_{cp}[r, s] E_{cp}[r − 1, s − 1].

This similarity transformation brings cp to 0 and keeps other entries intact. We do this for each such cp. Thus in the mth column of H, we have only one nonzero entry, a 1 at the (s, m)th position. If this happens to be at the last row, then we have obtained a Jordan form. Otherwise, call this sub-stage the sixth stage, we move this 1 to the (s, s + 1)th position by the following sequence of permutation similarities:

    K := E[m − 1, m] · · · E[s + 2, m] E[s + 1, m] H E[s + 1, m] E[s + 2, m] · · · E[m − 1, m].


This transformation exchanges the rows and columns beyond the sth so that

the 1 in (s, m)th position moves to (s, s + 1)th position making up a block; and

other entries remain as they were earlier.

Here ends the proof by induction that each block Ai can be brought to a

Jordan form by similarity transformations. From a similarity transformation

for Ai a similarity transformation can be constructed for the block diagonal

matrix

    Ã := diag(A1, A2, . . . , Ak)

by putting identity matrices of suitable order and the similarity transformation for Ai in a block form. As these transformations do not affect any other rows and columns of Ã, a sequence of such transformations brings Ã to its Jordan form, proving the existence part in the theorem.

For the formula for mk, let λ be an eigenvalue of A. Suppose k ∈ {1, . . . , n}. Observe that A − λI is similar to J − λI. Thus, rank((A − λI)^i) = rank((J − λI)^i) for each i. Therefore, it is enough to prove the formula for J instead of A. We use induction on n. In the basis case, J = [λ]. Here, k = 1 and mk = m1 = 1. On the right hand side, due to the convention,

    (J − λI)^{k−1} = I = [1],   (J − λI)^k = [0]^1 = [0],   (J − λI)^{k+1} = [0]^2 = [0],

so that rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1}) = 1 − 0 + 0 = 1 = m1. So, the formula holds for n = 1.

Lay out the induction hypothesis that for all matrices in Jordan form of

order less than n, the formula holds. Let J be a matrix of order n, which is in

Jordan form. We consider two cases.

Case 1: Let J have a single Jordan block corresponding to λ. That is, J is the n × n matrix with λ on the diagonal, 1 on the super-diagonal, and 0 elsewhere; thus J − λI has 1 on the super-diagonal and 0 elsewhere. Then (J − λI)² has 1 on the super-super-diagonal, and 0 elsewhere. Proceeding similarly for higher powers of J − λI, we see that their ranks are given by

    rank(J − λI) = n − 1,  rank((J − λI)²) = n − 2,  . . . ,  rank((J − λI)^i) = n − i,  . . . ,
    rank((J − λI)^n) = 0,  rank((J − λI)^{n+1}) = 0,  . . . .

Then for k < n,

    rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1})
        = (n − (k − 1)) − 2(n − k) + (n − k − 1) = 0.

And for k = n,

    rank((J − λI)^{n−1}) − 2 rank((J − λI)^n) + rank((J − λI)^{n+1})
        = (n − (n − 1)) − 2·0 + 0 = 1 = mn.


Case 2: Suppose J has more than one Jordan block corresponding to λ. Suppose that the first Jordan block in J corresponds to λ and has order r < n. Then J − λI can be written in block form as

    J − λI = [ C  0 ]
             [ 0  D ],

where C is the Jordan block of order r with diagonal entries 0, and D is the matrix of order n − r in Jordan form consisting of the other blocks of J − λI. Then, for any j,

    (J − λI)^j = [ C^j   0  ]
                 [  0   D^j ].

Therefore,

    rank((J − λI)^j) = rank(C^j) + rank(D^j).

Write mk(C) and mk(D) for the number of Jordan blocks of order k for the eigenvalue 0 that appear in C and in D, respectively. Then

    mk = mk(C) + mk(D).

By the induction hypothesis,

    mk(C) = rank(C^{k−1}) − 2 rank(C^k) + rank(C^{k+1}),
    mk(D) = rank(D^{k−1}) − 2 rank(D^k) + rank(D^{k+1}).

It then follows that

    mk = rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1}).

Since the number of Jordan blocks of order k corresponding to each eigenvalue of A is uniquely determined, the Jordan form of A is also uniquely

determined up to a permutation of blocks.

To obtain a Jordan form of a given matrix, we may use the construction of similarity transformations as used in the proof of Theorem 6.8, or we may use the formula

for mk as given there. We illustrate these methods in the following examples.

Example 6.5

2 1 0 0 0 1 0

2 0 0 0 3 0

2

2 1 0 0

2 0 2 0

2 0 0

Consider the upper triangular matrix A =

2 0

2 0

1

0

0

0

0

0

1

3

0

0

0

0

1

1

3


Following the proof of Theorem 6.8, we first zero-out the circled entries,

starting from the entry on the third row. Here, the row index is r = 3, the

column index is s = 7, the eigenvalues are arr = 2, ass = 3, and the entry to

be zeroed-out is x = 2. Thus, α = −2/(2 − 3) = 2. We use the combination similarity

    M1 = E_{−2}[3, 7] A E_{2}[3, 7].

That is, in A, we replace row(3) with row(3) − 2·row(7) and then replace col(7) with col(7) + 2·col(3) to obtain

2

2 1 0 0 0 1 0

0

2 0 0 0 3 0 0

1

2 1 0 0 0 2 0

2 0 2 0 0 0

2 0 0 0 0

M1 =

.

2

0

0

0

3 1 1

3 1

3

Notice that the similarity transformation made a previously 0 entry nonzero. It brought in a new nonzero entry, −2, in the (3, 8) position. We will zero it out before proceeding to the originally encircled ones. The suitable combination similarity is

    M2 = E_{2}[3, 8] M1 E_{−2}[3, 8],

which replaces row(3) with row(3) + 2·row(8) and then replaces col(8) with col(8) − 2·col(3). Verify that it zeroes-out the entry −2 but introduces 2 at the (3, 9) position. Once more, we use a combination similarity. This time we use

    M3 = E_{−2}[3, 9] M2 E_{2}[3, 9],

replacing row(3) with row(3) − 2·row(9) and then replacing col(9) with col(9) + 2·col(3). Now,

2 0

2 1 0 0 0 1 0

2 0 0 0 3 0 0

1

2

1

0

0

0

0

0

2 0 2 0 0 0

2 0 0 0 0

M3 =

2

0

0

0

3 1 1

3 1

3

Similar to the above, we use the combination similarities to reduce M3 to M4 ,

where

M4 = E1 [2, 9] M3 E1 [2, 9].


To zero-out the encircled 2, we use the combination similarity

    M5 = E_{−2}[1, 8] M4 E_{2}[1, 8].

Finally, to zero-out the remaining entry in the (1, 9) position, we use a suitable combination similarity to obtain

1

2 1 0 0 0

2 0 0 0 3

2

1

0

0

2

0

2

.

2

0

M6 = E2 [1, 9] M5 E2 [1, 9] =

3

1

1

3 1

3

Now, the matrix M6 is in block diagonal form. We focus on each of the

blocks, though we will be working with the whole matrix. We consider the

block corresponding to the eigenvalue 2 first. Since this step is inductive we

scan this block from the top left corner. The 2 × 2 principal sub-matrix of this block is already in Jordan form. The 3 × 3 principal sub-matrix is also in Jordan form. We see that the principal sub-matrices of sizes 4 × 4 and 5 × 5 are also in Jordan form, but the 6 × 6 sub-matrix, which is the block itself, is not in Jordan form. We wish to bring the sixth column to its proper shape.

Recall that our strategy is to zero out all those entries on the sixth column

which are opposite to a 1 on the super-diagonal of this block. There is only

one such entry, which is encircled in M6 above.

The row index of this entry is r = 1, its column index is m = 6, and the

entry itself is br = 1. We use the combination similarity

    M7 = E_{1}[2, 6] M6 E_{−1}[2, 6] =

    2 1 0 0 0 0 0 0 0
    0 2 0 0 0 5 0 0 0
    0 0 2 1 0 0 0 0 0
    0 0 0 2 0 2 0 0 0
    0 0 0 0 2 0 0 0 0
    0 0 0 0 0 2 0 0 0
    0 0 0 0 0 0 3 1 1
    0 0 0 0 0 0 0 3 1
    0 0 0 0 0 0 0 0 3

Next, among the nonzero entries 5 and 2 at the positions (2, 6) and (4, 6), we

wish to zero-out the 5 and keep 2 as the row index of 2 is higher. First, we

use a dilation similarity to make this entry 1 as in the following:

    M8 = E_{1/2}[4] M7 E_{2}[4].


It replaces row(4) with 1/2 times itself, and then replaces col(4) with 2 times

itself, thus making (4, 6)th entry 1 and keeping all other entries intact. Next,

we zero-out the 5 in the (2, 6) position by using two combination similarities. Here, cp = 5, r = 2, s = 4; thus

    M9 = E_{−5}[1, 3] E_{−5}[2, 4] M8 E_{5}[2, 4] E_{5}[1, 3] =

    2 1 0 0 0 0 0 0 0
    0 2 0 0 0 0 0 0 0
    0 0 2 1 0 0 0 0 0
    0 0 0 2 0 1 0 0 0
    0 0 0 0 2 0 0 0 0
    0 0 0 0 0 2 0 0 0
    0 0 0 0 0 0 3 1 1
    0 0 0 0 0 0 0 3 1
    0 0 0 0 0 0 0 0 3

Notice that M9 has been obtained from M8 by replacing row(2) with row(2) − 5·row(4), col(4) with col(4) + 5·col(2), row(1) with row(1) − 5·row(3), and then col(3) with col(3) + 5·col(1).

Next, we move this encircled 1 to (4, 5) position by similarity. Here, s =

4, m = 6. Thus the sequence of permutation similarities boils down to only one,

i.e., exchanging row(5) with row(6) and then exchanging col(6) with col(5).

Observe that we would have to use more number of permutation similarities

if the difference between m and s is more than 2. We thus obtain

    M10 = E[5, 6] M9 E[5, 6] =

    2 1 0 0 0 0 0 0 0
    0 2 0 0 0 0 0 0 0
    0 0 2 1 0 0 0 0 0
    0 0 0 2 1 0 0 0 0
    0 0 0 0 2 0 0 0 0
    0 0 0 0 0 2 0 0 0
    0 0 0 0 0 0 3 1 1
    0 0 0 0 0 0 0 3 1
    0 0 0 0 0 0 0 0 3

We focus on the other block, corresponding to 3. Here, the (7, 9)th entry, which contains a 1, is to be zeroed out. This entry is opposite to a 1 on the super-diagonal. Thus we use a combination similarity. Here, the row index is r = 7, the column index m = 9, and the entry is br = 1. Thus the similarity


transformation is

    M11 = E_{1}[8, 9] M10 E_{−1}[8, 9] =

    2 1 0 0 0 0 0 0 0
    0 2 0 0 0 0 0 0 0
    0 0 2 1 0 0 0 0 0
    0 0 0 2 1 0 0 0 0
    0 0 0 0 2 0 0 0 0
    0 0 0 0 0 2 0 0 0
    0 0 0 0 0 0 3 1 0
    0 0 0 0 0 0 0 3 1
    0 0 0 0 0 0 0 0 3

Now, M11 is in Jordan form.

Example 6.6

We consider the same matrix A of Example 6.5. Here, we compute the number mk of Jordan blocks of size k corresponding to each eigenvalue. For this purpose, we require the ranks of the matrices (A − λI)^k for successive k and for each eigenvalue λ of A. We see that A has two eigenvalues, 2 and 3. You may compute the successive powers of (A − 2I) and of (A − 3I) and their ranks using packages such as Matlab or Scilab. We find that for the eigenvalue 2,

    rank((A − 2I)^0) = rank(I) = 9,  rank(A − 2I) = 6,  rank((A − 2I)^2) = 4,
    rank((A − 2I)^{3+k}) = 3   for k = 0, 1, 2, . . . .

Similarly, for the eigenvalue 3,

    rank((A − 3I)^0) = rank(I) = 9,  rank(A − 3I) = 8,  rank((A − 3I)^2) = 7,
    rank((A − 3I)^{3+k}) = 6   for k = 0, 1, 2, . . . .

The formula for mk now gives

    m1(2) = 9 − 2·6 + 4 = 1,   m2(2) = 6 − 2·4 + 3 = 1,
    m3(2) = 4 − 2·3 + 3 = 1,   m_{3+k}(2) = 3 − 2·3 + 3 = 0,

    m1(3) = 9 − 2·8 + 7 = 0,   m2(3) = 8 − 2·7 + 6 = 0,
    m3(3) = 7 − 2·6 + 6 = 1,   m_{3+k}(3) = 6 − 2·6 + 6 = 0.

Therefore, in the Jordan form of A, there is one Jordan block of size 1, one of

size 2 and one of size 3 with eigenvalue 2, and one block of size 3 with eigenvalue 3. From this information we see that the Jordan form of A is uniquely

determined up to any rearrangement of the blocks. Check that M11 as obtained in Example 6.5 is one such Jordan form of A.
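The rank formula used above is easy to automate. Below is a small Python/numpy sketch (a minimal illustration; the test matrix is a made-up matrix already in Jordan form, not the 9 × 9 matrix of Example 6.5) computing mk(λ) = rank((A − λI)^{k−1}) − 2 rank((A − λI)^k) + rank((A − λI)^{k+1}).

    import numpy as np

    def num_jordan_blocks(A, lam, k):
        # m_k(lam) = rank((A - lam I)^(k-1)) - 2 rank((A - lam I)^k) + rank((A - lam I)^(k+1))
        B = A - lam * np.eye(A.shape[0])
        r = lambda j: np.linalg.matrix_rank(np.linalg.matrix_power(B, j))
        return r(k - 1) - 2 * r(k) + r(k + 1)

    # A made-up matrix already in Jordan form: for eigenvalue 2, blocks of sizes 2 and 1;
    # for eigenvalue 3, one block of size 3.
    J = np.diag([2.0, 2, 2, 3, 3, 3])
    J[0, 1] = 1.0
    J[3, 4] = J[4, 5] = 1.0
    print(num_jordan_blocks(J, 2.0, 1), num_jordan_blocks(J, 2.0, 2))   # 1 1
    print(num_jordan_blocks(J, 3.0, 3))                                 # 1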


Suppose that a matrix A ∈ C^{n×n} has a Jordan form J = P⁻¹AP, in which the first Jordan block is of size k with diagonal entries λ. Suppose P = [v1 · · · vn]. Then AP = PJ implies that

    A(v1) = λv1,   A(v2) = v1 + λv2,   . . . ,   A(vk) = v_{k−1} + λvk.

If the next Jordan block in J has diagonal entries μ (which may or may not be equal to λ), then we have Av_{k+1} = μv_{k+1}, Av_{k+2} = v_{k+1} + μv_{k+2}, . . . , and so on. The list of vectors v1, . . . , vk above is called a Jordan string that starts with v1 and ends with vk. The number k is called the length of the Jordan string. In such a Jordan string, we see that

    v1 ∈ N(A − λI),   v2 ∈ N((A − λI)^2),   . . . ,   vk ∈ N((A − λI)^k).

Any vector in N((A − λI)^j), for some j, is called a generalized eigenvector corresponding to the eigenvalue λ of A.

The columns of P are all generalized eigenvectors of A corresponding to some eigenvalue of A. Moreover, the columns of P can be constructed this way, looking at the subspaces N((A − λI)^j). One may start with linearly independent vectors satisfying (A − λI)v = 0. Corresponding to each solution v1 of this linear system, one determines linearly independent vectors satisfying (A − λI)v = v1. Next, corresponding to each solution v2 of this linear system, one solves (A − λI)v = v2, and so on. The process stops when n linearly independent vectors have been obtained this way. These vectors form the matrix P.
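For small matrices this chain construction can be carried out exactly with a computer algebra system. A minimal sketch using sympy is given below; sympy's Matrix.jordan_form returns a pair (P, J) with A = P J P⁻¹, and the 3 × 3 matrix used here is made up for illustration.

    from sympy import Matrix

    # A made-up matrix with single eigenvalue 2: one Jordan string of length 2 and one of length 1.
    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 2]])
    P, J = A.jordan_form()        # columns of P are generalized eigenvectors, J is the Jordan form
    print(J)
    print(A == P * J * P.inv())   # True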

In the first stage, if the geometric multiplicity of the eigenvalue λ is γ, then there are γ linearly independent eigenvectors associated with λ. These are possible candidates for v1. Thus there are γ Jordan strings associated with λ. These strings give rise to the Jordan blocks with diagonal entries λ. Thus, in J there are exactly γ Jordan blocks with diagonal entries λ.
You can prove this fact from J directly by first showing that the geometric multiplicity of the eigenvalue λ of A is the same as the geometric multiplicity of the eigenvalue λ of J.

The uniqueness of a Jordan form can be made exact by first ordering the eigenvalues of A and then arranging the blocks corresponding to each eigenvalue (which now

appear together on the diagonal) in some order, say in ascending order of their size.

In doing so, the Jordan form of any matrix becomes unique. Such a Jordan form is

called the Jordan canonical form of a matrix. It then follows that if two matrices

are similar, then they have the same Jordan canonical form. Moreover, uniqueness

also implies that two dissimilar matrices will have different Jordan canonical forms.

Therefore, Jordan form characterizes similarity of matrices.

As an application of the Jordan form, we will show that each matrix is similar to its transpose. Suppose J = P⁻¹AP. Now, J^t = P^t A^t (P⁻¹)^t = P^t A^t (P^t)⁻¹. That is, A^t is similar to J^t. Thus it is enough to show that J^t is similar to J. First, let us see it for a


single Jordan block. So, let Jλ be a single Jordan block, with λ on the diagonal and 1 on the super-diagonal, and let

    Q = [        1 ]
        [      1   ]
        [    .     ]
        [ 1        ],

where the entries on the anti-diagonal are all 1 and all other entries are 0. We see that Q² = I. Thus Q⁻¹ = Q. Further,

    Q⁻¹ Jλ Q = Q Jλ Q = (Jλ)^t.

Therefore, each Jordan block is similar to its transpose. Now, construct a matrix R

by putting such a matrix as its blocks matching the orders of each Jordan block in J.

Then it follows that R⁻¹ J R = J^t.
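A quick numerical check of this argument for a single Jordan block, assuming numpy, is sketched below; the order and the eigenvalue are arbitrary choices.

    import numpy as np

    m, lam = 4, 2.0
    J = lam * np.eye(m) + np.eye(m, k=1)   # one Jordan block of order m with eigenvalue lam
    Q = np.fliplr(np.eye(m))               # 1 on the anti-diagonal, 0 elsewhere; Q @ Q = I
    print(np.allclose(Q @ J @ Q, J.T))     # True: Q^{-1} J Q = J^t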

It also follows from the Jordan form that one can always choose m linearly independent generalized eigenvectors corresponding to the eigenvalue λ, where m is the algebraic multiplicity of λ. Further, it is guaranteed that

    if the linear system (A − λI)^k x = 0 has r < m linearly independent solutions, then (A − λI)^{k+1} x = 0 has at least r + 1 linearly independent solutions.

This result is often more useful in computing the exponential of a matrix rather than

using explicitly the Jordan form, which is comparatively difficult to compute.

1. Determine the Jordan forms of the following matrices:

0 0 0

2 1 3

3

3

(a) 1 0 0 (b) 4

2 1 0

2

1 1

2. Let A be a 7 × 7 matrix with characteristic polynomial (t − 2)^4 (3 − t)^3. In the Jordan form of A, the largest block for each of the eigenvalues is of size 2. Show that there are only two possible Jordan forms for A; and determine those Jordan forms.
3. Let A be a 5 × 5 matrix whose first two rows are [0, 1, 1, 0, 1] and [0, 0, 1, 1, 1]; all other rows are zero rows. What is the Jordan form of A?
4. Determine the matrix P ∈ C^{3×3} such that P⁻¹AP is in Jordan form, where A is the matrix in Exercise 1(b).

5. Let A ∈ C^{n×n} have an eigenvalue λ. Suppose the numbers mk for this eigenvalue are known for each k ∈ N. Show that for each j, both rank((A − λI)^j) and null((A − λI)^j) are uniquely determined.
6. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and for each eigenvalue λ and each k ∈ N, rank((A − λI)^k) = rank((B − λI)^k).
7. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and for each eigenvalue λ and each k ∈ N, null((A − λI)^k) = null((B − λI)^k).
8. Let λ be an eigenvalue of a matrix A ∈ C^{n×n} having algebraic multiplicity m. Then prove that null((A − λI)^m) = m.
9. Let J be a Jordan form of a matrix A ∈ C^{n×n}. Let λ be an eigenvalue of A. Show that the geometric multiplicity of λ as an eigenvalue of A is the same as the geometric multiplicity of λ as an eigenvalue of J.
10. Let J be a matrix in Jordan form. Let λ be an eigenvalue of J. Show that the geometric multiplicity of λ is equal to the number of Jordan blocks in J having λ as the diagonal entries.
11. Conclude from the previous two exercises that if λ is an eigenvalue of a matrix A and J is the Jordan form of A, then the number of Jordan blocks with diagonal entry λ in J is the geometric multiplicity of λ.

6.4 Singular value decomposition

Given an m × n matrix A with complex entries, there are two hermitian matrices that can be constructed naturally from it, namely, A*A and AA*. We wish to study the eigenvalues and eigenvectors of these matrices and their relations to certain parameters associated with A. We will see that these concerns yield a factorization of A.

The hermitian matrix A*A ∈ C^{n×n} has only real eigenvalues. If λ ∈ R is such an eigenvalue with an associated eigenvector v ∈ C^{n×1}, then A*Av = λv implies that

    λ‖v‖² = λ v*v = v*(λv) = v*A*Av = (Av)*(Av) = ‖Av‖².

Since ‖v‖ > 0, we see that λ ≥ 0. The eigenvalues of A*A can thus be arranged in a decreasing list

    λ1 ≥ λ2 ≥ · · · ≥ λr > λ_{r+1} = · · · = λn = 0


for some r, 0 ≤ r ≤ n. Notice that all of λ1, . . . , λr are positive and the rest are all equal to 0. Conventionally, each λi is written as si² for si ∈ R. Notice that in this notation, an si may be positive, negative or zero. We first give a name to the square roots of the eigenvalues of A*A and then relate the number r of positive eigenvalues of A*A to the rank of the matrix A. Of course, we could have started with the eigenvalues of AA* instead of A*A.

Let A ∈ C^{m×n}. Let s1² ≥ · · · ≥ sn² be the n eigenvalues of A*A. The non-negative real numbers s1, . . . , sn are called the singular values of A.

Theorem 6.9

Let A ∈ C^{m×n}. Then rank(A) = rank(A*A) = rank(AA*) = the number of positive singular values of A.

Proof: As linear transformations, A : C^{n×1} → C^{m×1}, A* : C^{m×1} → C^{n×1}, and R(A*A) is the image of R(A) ⊆ C^{m×1} under A*. By the rank nullity theorem,

    rank(A*A) = dim(R(A*A)) ≤ dim(R(A)) = rank(A).

For the other inequality, let v ∈ N(A*A). That is, A*Av = 0. Then v*A*Av = 0 implies that ‖Av‖² = 0, giving Av = 0. That is, N(A*A) ⊆ N(A). It implies that null(A*A) ≤ null(A). Notice that A*A ∈ C^{n×n}. Thus

    rank(A) = n − null(A) ≤ n − null(A*A) = rank(A*A).

Combining both the inequalities, we obtain rank(A*A) = rank(A).

Now, consider A* instead of A. What we just proved implies that

    rank(AA*) = rank((A*)*A*) = rank(A*).

But rank(A^t) = rank(A). Also, rank(Ā) = rank(A). Thus, rank(A*) = rank(A). Therefore,

    rank(AA*) = rank(A).

Notice that A*A is hermitian. So, it is similar to the diagonal matrix

    D = diag(s1², . . . , sr², s_{r+1}², . . . , sn²)

with s1² ≥ · · · ≥ sr² ≥ s_{r+1}² ≥ · · · ≥ sn². Since rank(A*A) = rank(D) = r, we see that s1 ≥ · · · ≥ sr > 0 and s_{r+1} = · · · = sn = 0. That is, A has exactly r positive singular values.
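Theorem 6.9 is easy to confirm numerically. The sketch below, assuming numpy, builds a made-up 5 × 3 matrix of rank 2 and compares rank(A), rank(A*A), rank(AA*) with the number of positive singular values.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # rank 2 by construction
    s = np.linalg.svd(A, compute_uv=False)                          # singular values, largest first
    tol = max(A.shape) * np.finfo(float).eps * s[0]
    print(np.linalg.matrix_rank(A),
          np.linalg.matrix_rank(A.conj().T @ A),
          np.linalg.matrix_rank(A @ A.conj().T),
          int(np.sum(s > tol)))                                     # all four numbers agree (here: 2)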

Not only is the number of positive eigenvalues of A*A and of AA* the same, but something more can be said about these eigenvalues.

    Suppose λ > 0 is an eigenvalue of A*A with an associated eigenvector v. Then λ is also an eigenvalue of AA* with an associated eigenvector Av.

Similarly, if μ > 0 is an eigenvalue of AA* with an associated eigenvector u, then μ is an eigenvalue of A*A with an associated eigenvector A*u. That is, a positive real number is an eigenvalue of A*A iff it is an eigenvalue of AA*.

It also follows that A and A* have the same r positive singular values, where r = rank(A) = rank(A*). Further, A has n − r zero singular values, whereas A* has m − r zero singular values. In addition, if A ∈ C^{n×n} is hermitian and has eigenvalues λ1, . . . , λn, then its singular values are |λ1|, . . . , |λn|.

During the course of the proof of Theorem 6.9, we have shown that there exists a unitary matrix P such that

    P*(A*A)P = diag(s1², . . . , sr², 0, . . . , 0),

where s1 ≥ · · · ≥ sr are the positive singular values of A. An analogous factorization for A itself also holds.

Theorem 6.10

(SVD) Let A ∈ C^{m×n} be of rank r. Let s1 ≥ · · · ≥ sr be the positive singular values of A. Let S := diag(s1, . . . , sr) ∈ C^{r×r}. Then there exist unitary matrices P ∈ C^{m×m} and Q ∈ C^{n×n} such that

    A = P Σ Q*,   Σ := [ S  0 ] ∈ C^{m×n}.
                       [ 0  0 ]

Further, the columns of P are eigenvectors of AA* that form an orthonormal basis of C^{m×1}, and the columns of Q are eigenvectors of A*A that form an orthonormal basis of C^{n×1}.

Proof: The matrix A has singular values s1 ≥ · · · ≥ sr > 0, 0, . . . , 0. Thus, the eigenvalues of AA* and of A*A are s1² ≥ · · · ≥ sr² > 0, 0, . . . , 0. In case of AA*, there are m − r zeros, and in case of A*A, there are n − r zeros. The matrix A*A is hermitian. There exists an orthonormal basis {v1, . . . , vn} for C^{n×1} such that

    A*A vi = si² vi   for i = 1, . . . , r;     A*A vj = 0   for j = r + 1, . . . , n.

For i = 1, . . . , r, set

    ui = (1/si) Avi.

Then,

    AA* ui = (1/si) AA*Avi = (1/si) si² Avi = si² ui.

That is, u1, . . . , ur are eigenvectors of AA* associated with the eigenvalues s1², . . . , sr², respectively. Further, since {v1, . . . , vr} is an orthonormal set, for i, j = 1, . . . , r, we have

    uj* ui = (1/(si sj)) (Avj)*(Avi) = (1/(si sj)) vj*(A*A vi) = (si²/(si sj)) vj* vi = (si/sj) vj* vi.

Thus {u1, . . . , ur} is an orthonormal set of eigenvectors of AA* in C^{m×1}. Also, the above equation shows that, for i, j = 1, . . . , r,

    ui* Avj = si  if i = j,   and   ui* Avj = 0  if i ≠ j.

Extend {u1, . . . , ur} to an orthonormal basis {u1, . . . , um} for C^{m×1}. Clearly, the above equations hold. Moreover, A*Avk = 0 for each k = r + 1, . . . , n. Now, ‖Avk‖² = vk*A*Avk = 0. Hence, Avk = 0 for k = r + 1, . . . , n. It follows that for j = r + 1, . . . , m and k = r + 1, . . . , n, we have uj*Avk = 0. In summary, for i = 1, . . . , m and j = 1, . . . , n, we obtain

    ui* Avj = si   if 1 ≤ i = j ≤ r,
            = 0    otherwise.

    [ u1* ]
    [  .  ] A [ v1 · · · vn ]  =  [ diag(s1, . . . , sr)   0 ]
    [ um* ]                       [          0             0 ]

Next, take P as the matrix whose ith column is ui, and Q as the matrix whose jth column is vj. We obtain

    P* A Q = [ S  0 ] ,   S := diag(s1, . . . , sr).
             [ 0  0 ]

Since P ∈ C^{m×m} has orthonormal columns, it is unitary. Similarly, Q is unitary. That is, P*P = PP* = I and Q*Q = QQ* = I. Multiplying by P on the left and by Q* on the right, we obtain the required factorization of A.

In the singular value decomposition of a matrix A, the columns ui of P are eigenvectors of AA*; these are called the left singular vectors of A. Analogously, the columns vj of Q are eigenvectors of A*A, and are called the right singular vectors of A.

Observe that the columns r + 1 onwards in the matrices P and Q in the product PΣQ* produce the zero blocks. Thus, taking

    P̃ = [ u1 · · · ur ],   Q̃ = [ v1 · · · vr ],

we see that a simplified decomposition of A can also be given. It is as follows:

    A = P̃ S Q̃*.

Such a decomposition is called the tight SVD of the matrix A.


In the tight SVD, A ∈ C^{m×n}, P̃ ∈ C^{m×r}, S ∈ C^{r×r} and Q̃* ∈ C^{r×n} are matrices, each of rank r. Write B = P̃S and C = SQ̃* to obtain

    A = B Q̃* = P̃ C,

where B ∈ C^{m×r} and C ∈ C^{r×n} have rank r. It shows that each m × n matrix of rank r can be written as a product of an m × r matrix of rank r and a matrix of size r × n, which is also of rank r. Recall that this factorization is named the full rank factorization of a matrix.
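These factorizations are straightforward to compute with a numerical library. The following Python/numpy sketch uses the 3 × 2 rank-one matrix that appears in Example 6.7 below; np.linalg.svd returns the factors of the full SVD, from which the tight SVD and a full rank factorization are read off.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [2.0, 1.0],
                  [4.0, 2.0]])
    U, s, Vh = np.linalg.svd(A)                # full SVD: A = U @ Sigma @ Vh
    r = int(np.sum(s > 1e-12))                 # numerical rank; here r = 1 and s[0] = sqrt(30)

    # Tight SVD: keep the first r columns of U and the first r rows of Vh.
    P_t, S, Qh_t = U[:, :r], np.diag(s[:r]), Vh[:r, :]
    print(np.allclose(A, P_t @ S @ Qh_t))      # True

    # Full rank factorization A = B C with B of size 3 x 1 and C of size 1 x 2.
    B, C = P_t @ S, Qh_t
    print(np.allclose(A, B @ C))               # True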

Example 6.7

To obtain the SVD, the tight SVD, and a full rank factorization of

    A = [ 2  1 ]
        [ 2  1 ]
        [ 4  2 ],

we consider

    A*A = [ 24  12 ]
          [ 12   6 ].

Its eigenvalues are 30 and 0; also, rank(A) = 1, as the first column of A is 2 times the second column. Solving the equations A*A [a, b]^t = 30 [a, b]^t, that is,

    24a + 12b = 30a,   12a + 6b = 30b,

we obtain a solution a = 2, b = 1. So, a unit eigenvector of A*A corresponding to the eigenvalue 30 is

    v1 = (1/√5) [ 2 ].
                [ 1 ]

For the eigenvalue λ2 = 0, the equations are

    24a + 12b = 0,   12a + 6b = 0.

Thus a unit eigenvector orthogonal to the earlier one is

    v2 = (1/√5) [ −1 ].
                [  2 ]

Then,

    u1 = (1/√30) Av1 = (1/√6) [ 1 ].
                              [ 1 ]
                              [ 2 ]

We extend {u1} to an orthonormal basis {u1, u2, u3} of C^{3×1} with

    u1 = (1/√6) [ 1 ],   u2 := (1/√2) [ −1 ],   u3 = (1/√3) [  1 ].
                [ 1 ]                 [  1 ]                [  1 ]
                [ 2 ]                 [  0 ]                [ −1 ]

With P = [u1 u2 u3] and Q = [v1 v2], we obtain the SVD of A as

    [ 2  1 ]   [ 1/√6  −1/√2   1/√3 ] [ √30  0 ]
    [ 2  1 ] = [ 1/√6   1/√2   1/√3 ] [  0   0 ] [  2/√5  1/√5 ]
    [ 4  2 ]   [ 2/√6     0   −1/√3 ] [  0   0 ] [ −1/√5  2/√5 ].

For the tight SVD, P̃ has as its columns the first r columns of P, Q̃ has as its columns the first r columns of Q, and S is the usual r × r block with the positive singular values of A as its diagonal entries. With r = rank(A) = 1, we thus have the tight SVD

    [ 2  1 ]   [ 1/√6 ]
    [ 2  1 ] = [ 1/√6 ] [ √30 ] [ 2/√5  1/√5 ].
    [ 4  2 ]   [ 2/√6 ]

In the tight SVD, using associativity of matrix product, we get the rank

factorizations as

    [ 2  1 ]   [  √5 ]                  [ 1/√6 ]
    [ 2  1 ] = [  √5 ] [ 2/√5  1/√5 ] = [ 1/√6 ] [ 2√6  √6 ].
    [ 4  2 ]   [ 2√5 ]                  [ 2/√6 ]

You should check that the columns of P are eigenvectors of AA*.

Like the tight SVD, another simplification can be made in the SVD. Let A ∈ C^{m×n} with m ≤ n. Suppose A = PΣQ* is an SVD of A. Let the ith row of Q* be denoted by vi ∈ C^{1×n}. Write

    P1 := P,   Q1 := [ v1 ]
                     [  . ]  ∈ C^{m×n},   Σ1 := diag(s1, . . . , sr, 0, . . . , 0) ∈ C^{m×m}.
                     [ vm ]

Notice that P1 is unitary and the m rows of Q1 are orthonormal. In block form, we have

    Q* = [ Q1 ],   Σ = [ Σ1  0 ],
         [ Q2 ]

where Q2 consists of the rows v_{m+1}, . . . , vn. Then

    A = PΣQ* = P1 [ Σ1  0 ] [ Q1 ]  =  P1 Σ1 Q1.
                            [ Q2 ]

Similarly, when m ≥ n, we may curtail P accordingly. That is, suppose the ith column of P is denoted by ui ∈ C^{m×1}. Write

    P2 := [ u1 · · · un ] ∈ C^{m×n},   Σ2 := diag(s1, . . . , sr, 0, . . . , 0) ∈ C^{n×n},   Q2 := Q.

Then

    A = PΣQ* = P2 Σ2 Q2*.

These two forms of SVD, one for m ≤ n and the other for m ≥ n, are called the thin SVD of A.
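In numpy the thin SVD is what np.linalg.svd computes when full_matrices=False is passed; a minimal sketch with a made-up 4 × 3 matrix:

    import numpy as np

    A = np.arange(12, dtype=float).reshape(4, 3)     # any 4 x 3 matrix, made up for illustration
    U1, s, V1h = np.linalg.svd(A, full_matrices=False)
    print(U1.shape, s.shape, V1h.shape)              # (4, 3) (3,) (3, 3): the thin SVD for m >= n
    print(np.allclose(A, U1 @ np.diag(s) @ V1h))     # True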

It is easy to see that a singular value decomposition of a matrix is not unique. For, the SVD depends on the choice of orthonormal bases; and we can always choose different orthonormal bases, for instance, just by multiplying −1 into an already constructed one. Also, it can be shown that when A ∈ R^{m×n}, the matrices P and Q can be chosen to have real entries.

Singular value decomposition is the most important result for scientists and engineers, perhaps, next to the theory of linear equations. It shows clearly the power

of eigenvalues and eigenvectors in a dramatic way. Observe that when we write

an m × n matrix A of rank r in its SVD form A = PΣQ*, the columns of P are eigenvectors of the matrix AA* associated with the eigenvalues s1², . . . , sr², 0, . . . , 0. Similarly, the columns of Q are eigenvectors of the matrix A*A associated with the same eigenvalues. In the former case, there are m − r zero eigenvalues and in the latter case, they are n − r in number. Writing the ith column of P as ui and the jth column of Q as vj, the SVD amounts to writing A as

    A = s1 u1 v1* + · · · + sr ur vr*.

Each matrix uk vk* here is of rank 1. This means that if we know the first r singular

values of A and we know their corresponding left and right singular vectors, we

know A completely. This is particularly useful when A is a very large matrix of low

rank. No wonder, SVD is used in image processing, various compression algorithms,

and in principal components analysis. We will see another application of SVD in

representing a matrix in a very useful and elegant manner.
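The rank-one expansion above is the basis of low-rank compression: keeping only the k largest terms gives a rank-k approximation (in fact the best one in the spectral and Frobenius norms). A short numpy sketch of this truncation; the matrix and the cut-off k are made up:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 40))                    # a made-up dense matrix
    U, s, Vh = np.linalg.svd(A, full_matrices=False)

    k = 10                                               # keep the k largest singular values
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]          # A_k = s1 u1 v1* + ... + sk uk vk*
    print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # relative error of the rank-k approximation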

1. Let A ∈ C^{m×n}. Let s1 ≥ · · · ≥ sr be the positive singular values of A. Show that the positive singular values of A* are also s1, . . . , sr.
2. Prove that if λ1, . . . , λn are the eigenvalues of an n × n hermitian matrix, then its singular values are |λ1|, . . . , |λn|.

3. Compute the singular value decomposition of the following matrices:

2 2

1 2

2

2

1

2

(a) 1 1 (b)

(c) 2 0 5 .

2 1 2

1

1

3 0

0

1 0

2 1

are similar but they have different

4. Show the matrices

and

1 1

1

0

singular values.


5. Show that a matrix A ∈ C^{m×n} is of rank 1 iff there exist nonzero vectors u ∈ C^{m×1} and v ∈ C^{1×n} such that A = uv.

6. Let A ∈ C^{m×n} be a matrix of rank r with positive singular values s1, . . . , sr. Suppose

       A = P [ S  0 ] Q*
             [ 0  0 ]

   is an SVD of A, where S = diag(s1, . . . , sr). Define

       A† = Q [ S⁻¹  0 ] P*.
              [  0   0 ]

   Prove that A† satisfies the following properties:

       (AA†)* = AA†,   (A†A)* = A†A,   AA†A = A,   A†AA† = A†.

   A† is called the generalized inverse of A. (A numerical sketch checking these identities follows this exercise set.)
7. Let A ∈ F^{m×n}. Prove that there exists a unique matrix A† ∈ F^{n×m} satisfying the four identities mentioned in the previous exercise.
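For Exercise 6, the matrix A† coincides with what numpy computes as the pseudo-inverse (numpy's pinv is built from the SVD in exactly this way). A minimal check of the four identities, assuming numpy and using the matrix of Example 6.7:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [2.0, 1.0],
                  [4.0, 2.0]])
    Ad = np.linalg.pinv(A)                       # the generalized inverse A† of Exercise 6
    print(np.allclose((A @ Ad).conj().T, A @ Ad),
          np.allclose((Ad @ A).conj().T, Ad @ A),
          np.allclose(A @ Ad @ A, A),
          np.allclose(Ad @ A @ Ad, Ad))          # True True True True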

6.5 Polar decomposition

Square matrices behave like complex numbers in many ways. One example is a powerful representation of square matrices using a stretch and a rotation. This mimics

the polar representation of a complex number as z = re^{iθ}. In this representation, r is a non-negative real number, thus it represents the stretch; and e^{iθ} is a rotation. Similarly, a square matrix can be written as a product of a positive semidefinite matrix and a unitary matrix. The positive semidefinite matrix is a stretch and the unitary matrix is a rotation. We slightly generalize the representation to any m × n matrix.

A hermitian matrix P ∈ F^{n×n} is called positive semidefinite iff x*Px ≥ 0 for each x ∈ F^{n×1}.

Recall that a matrix U has orthonormal rows iff UU* = I; it has orthonormal columns iff U*U = I; and it is unitary iff its rows are orthonormal and its columns are orthonormal, iff UU* = U*U = I.

Theorem 6.11

(Polar decomposition) Let A ∈ C^{m×n}. Then there exist positive semidefinite matrices P ∈ C^{m×m}, Q ∈ C^{n×n}, and a matrix U ∈ C^{m×n} such that

    A = PU = UQ,

where P² = AA*, Q² = A*A, and U satisfies the following:
1. If m = n, then the n × n matrix U is unitary.
2. If m < n, then the rows of U are orthonormal.
3. If m > n, then the columns of U are orthonormal.


Proof: Suppose A has rank r, with positive singular values s1 ≥ · · · ≥ sr. Let A = BDE* be an SVD of A, where B ∈ C^{m×m}, E ∈ C^{n×n} are unitary matrices, and D ∈ C^{m×n} has its first r diagonal entries as s1, . . . , sr, and all other entries 0.

(1) Suppose m = n. Then all the matrices A, B, D, E are of size n × n. Since B*B = BB* = E*E = EE* = I, we rewrite A as follows:

    A = BDE* = (BDB*)(BE*) = (BE*)(EDE*).

We take U := BE*, P := BDB* and Q := EDE*, so that A = PU = UQ. We must show that U is unitary and P, Q are positive semidefinite satisfying P² = AA* and Q² = A*A. Now,

    U*U = EB*BE* = EE* = I,   UU* = BE*EB* = BB* = I.

Thus, U is unitary.

Clearly, both P and Q are hermitian. For the other properties of P and Q, let x ∈ C^{n×1}. Then

    x*Px = x*BDB*x = (B*x)* D (B*x).

Write B*x := (a1, . . . , an)^t ∈ C^{n×1}. Then

    x*Px = |a1|²s1 + · · · + |ar|²sr ≥ 0.

Therefore, P is positive semidefinite. Also,

    P² = BDB*BDB* = BDDB* = BDE*EDB* = (BDE*)(BDE*)* = AA*.

Similarly, it follows that Q is positive semidefinite and Q² = A*A.

(2) Let m < n. Write the n × n matrix E in block form

    E = [ E1  E2 ],

where E1 ∈ C^{n×m} comprises the first m columns of E and E2 ∈ C^{n×(n−m)} comprises the rest of the columns. Since E is unitary, the columns of E1 are orthonormal, that is, E1*E1 = I. Notice that E1E1* need not be I. Further, write D in block form with D1 ∈ C^{m×m} as the matrix obtained from D by retaining the first m columns and deleting the next n − m columns. That is,

    D = [ D1  0 ],   D1 = diag(s1, . . . , sr, 0, . . . , 0).

Consequently,

    DE* = [ D1  0 ] [ E1* ]  =  D1E1*,      ED* = (DE*)* = (D1E1*)* = E1D1.
                    [ E2* ]


Set U := BE1* and Q := E1DE*. Now, U ∈ C^{m×n}, Q ∈ C^{n×n}; and

    A = BDE* = BE1*E1DE* = UQ.

We find that

    UU* = (BE1*)(BE1*)* = BE1*E1B* = BB* = I.

Clearly, Q is hermitian. Next, let x ∈ C^{n×1}. Write E*x := (a1, . . . , an)^t ∈ C^{n×1}; so E1*x = (a1, . . . , am)^t ∈ C^{m×1}. Then

    x*Qx = x*E1DE*x = x*E1D1E1*x = (E1*x)* D1 (E1*x) = |a1|²s1 + · · · + |ar|²sr ≥ 0.

That is, Q is positive semidefinite. Using DE* = D1E1*, E1D1 = ED*, E1*E1 = I and B*B = I, we have

    Q² = (E1DE*)² = E1DE*E1DE* = E1D1E1*E1DE* = E1D1DE*
       = ED*DE* = ED*B*BDE* = (BDE*)*(BDE*) = A*A.

To show that A can also be written in the form PU, consider the following:

    A = BDE* = BD1E1* = (BD1B*)(BE1*) = PU,   with P := BD1B*.

(3) Let m > n. Then A* ∈ C^{n×m} has fewer rows than columns. We then use (2) to obtain positive semidefinite matrices P̂ ∈ C^{n×n}, Q̂ ∈ C^{m×m} and a matrix Û ∈ C^{n×m} having orthonormal rows such that

    A* = P̂Û = ÛQ̂.

Taking adjoints, and writing P := Q̂, Q := P̂ and U := Û*, we obtain

    A = UQ = PU,

where U ∈ C^{m×n} has orthonormal columns, and P ∈ C^{m×m}, Q ∈ C^{n×n} satisfy P² = AA*, Q² = A*A.

Notice that the proof of Theorem 6.11(2) is also valid when m ≤ n. Thus, the proof of (1) is redundant, as it would follow from (2) and (3). Also, (2) can be proved more easily by using the thin SVD. Further, in the proof of (3) we have not constructed the matrices P and Q explicitly. To unfold the proof, we start with A = BDE*, which yields A* = ED*B*. It is in the form

    A* = B̂D̂Ê*,   with  B̂ = E,  D̂ = D*,  Ê = B.

Next, we follow the construction in (2) with m and n interchanged. It asks us to take B̂1 as the first n columns of B̂, D̂1 as the first n columns of D̂, and the first n columns of Ê as Ê1. Then A* = P̂Û = ÛQ̂ with

    Û = B̂Ê1*,   P̂ = B̂D̂1B̂*,   Q̂ = Ê1D̂Ê*.


We also write B1, E1 for the matrices formed by taking the first n columns of B, E, respectively, and D1 for the matrix formed by taking the first n rows of D. Now, taking adjoints, we have A = PU = UQ with

    U = Û* = Ê1B̂* = (first n columns of B) E* = B1E*,
    P = Q̂ = Ê1D̂Ê* = BDB1*,
    Q = P̂ = B̂D̂1B̂* = ED1E*.

With these U, P and Q, you can give a direct proof of (3) in Theorem 6.11.

The construction of a polar decomposition from the SVD may be summarized as follows:

    If A ∈ C^{m×n} has SVD A = BDE*, then A = PU = UQ, where

    m ≤ n :   U = BE1*,   P = BD1B*,   Q = E1DE*.
    m ≥ n :   U = B1E*,   P = BDB1*,   Q = ED1E*.

Here, for m ≤ n, E1 is constructed from E by taking its first m columns and D1 from D by taking its first m columns; for m ≥ n, B1 is constructed from B by taking its first n columns and D1 is constructed from D by taking its first n rows. In case m = n, the subscripts go away from B, D and E.
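The summary translates directly into a few lines of numpy. The sketch below follows the m ≥ n case for the matrix of Example 6.8 below and checks A = UQ = PU; it is only an illustration of the recipe above, not a general-purpose routine.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [2.0, 1.0],
                  [4.0, 2.0]])                   # m = 3 > n = 2
    m, n = A.shape
    B, s, Eh = np.linalg.svd(A)                  # A = B D E* with D = diag(s) padded to m x n
    B1 = B[:, :n]                                # first n columns of B
    D1 = np.diag(s)                              # first n rows of D (an n x n diagonal matrix)
    D = np.vstack([D1, np.zeros((m - n, n))])

    U = B1 @ Eh                                  # orthonormal columns
    P = B @ D @ B1.conj().T                      # positive semidefinite, P^2 = A A*
    Q = Eh.conj().T @ D1 @ Eh                    # positive semidefinite, Q^2 = A* A
    print(np.allclose(A, U @ Q), np.allclose(A, P @ U))   # True True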

Example 6.8

Consider the matrix A of Example 6.7, namely

    A = [ 2  1 ]
        [ 2  1 ]
        [ 4  2 ].

We had obtained its SVD as A = BDE*, where

    B = [ 1/√6  −1/√2   1/√3 ]       D = [ √30  0 ]       E = [ 2/√5  −1/√5 ]
        [ 1/√6   1/√2   1/√3 ] ,         [  0   0 ] ,         [ 1/√5   2/√5 ].
        [ 2/√6     0   −1/√3 ]           [  0   0 ]

We follow the notation used in the proof of Theorem 6.11. Here, A ∈ C^{3×2}. Thus Theorem 6.11(3) is applicable; see the discussion following the proof of the theorem. We construct the matrix B1 by taking the first two columns of B, and D1 by taking the first two rows of D, as in the following:

    B1 = [ 1/√6  −1/√2 ]       D1 = [ √30  0 ]
         [ 1/√6   1/√2 ] ,          [  0   0 ].
         [ 2/√6     0  ]

Then

    U = B1E* = (1/√30) [ 2 + √3   1 − 2√3 ]
                       [ 2 − √3   1 + 2√3 ] ,
                       [    4        2    ]

    P = BDB1* = √(5/6) [ 1  1  2 ]
                       [ 1  1  2 ] ,
                       [ 2  2  4 ]


    Q = ED1E* = √(6/5) [ 4  2 ].
                       [ 2  1 ]

One can verify that

    PU = √(5/6) [ 1  1  2 ] (1/√30) [ 2 + √3   1 − 2√3 ]   [ 2  1 ]
                [ 1  1  2 ]         [ 2 − √3   1 + 2√3 ] = [ 2  1 ] = A,
                [ 2  2  4 ]         [    4        2    ]   [ 4  2 ]

    UQ = (1/√30) [ 2 + √3   1 − 2√3 ] √(6/5) [ 4  2 ]   [ 2  1 ]
                 [ 2 − √3   1 + 2√3 ]        [ 2  1 ] = [ 2  1 ] = A.
                 [    4        2    ]                   [ 4  2 ]

In the polar decomposition, the matrices P and Q satisfy P² = AA* and Q² = A*A. If A ∈ C^{m×n}, then AA* ∈ C^{m×m} and A*A ∈ C^{n×n} are hermitian matrices with eigenvalues s1², . . . , sr², 0, . . . , 0. That is, if AA* = C diag(s1², . . . , sr², 0, . . . , 0) C*, then P = C diag(s1, . . . , sr, 0, . . . , 0) C*. Here, the matrix C consists of orthonormal eigenvectors of AA* corresponding to the eigenvalues s1², . . . , sr², 0, . . . , 0. Similarly, the matrix Q is equal to F diag(s1, . . . , sr, 0, . . . , 0) F*, where F consists of orthonormal eigenvectors of A*A corresponding to the eigenvalues s1², . . . , sr², 0, . . . , 0.
Finally, the U's can be computed by solving the linear systems A = PU and A = UQ. The U's in the two instances may differ, since they depend on the choices of orthonormal eigenvectors of AA* and A*A. In case A is invertible, you would end up with the same U.

1. Determine the polar decompositions of the matrix A of Example 6.8 by diagonalizing AA* and A*A as mentioned in the text.
2. Let A ∈ C^{m×n} with m < n. Prove that there exists a unitary matrix U ∈ C^{n×n} and a matrix P ∈ C^{m×n} such that A = PU.
3. Give a direct proof of Theorem 6.11(3) analogous to that of (2) there. You may have to partition B instead of E in the SVD of A as A = BDE*.

4. Prove Theorem 6.11(2-3) by using thin SVD.

5. Derive singular value decomposition from the polar decomposition.

Short Bibliography

[1] S. Axler, Linear Algebra Done Right, Springer Int. ed., Indian Reprint, 2013.

[2] R.A. Brualdi, The Jordan canonical form: an old proof, The American Mathematical Monthly, 94:3 (1987), 257-267.

[3] S. D. Conte, C. de Boor, Elementary Numerical Analysis: An algorithmic approach, McGraw-Hill Book Company, Int. Student Ed., 1981.

[4] J. W. Demmel, Numerical Linear Algebra, SIAM Pub., Philadelphia, 1996.

[5] A. F. Filippov, A short proof of the theorem on reduction of a matrix to Jordan

form, Vestnik Moskov. Univ. Ser. I Mat. Meh. 26:2 (1971), 18-19. MR 43 No.

4839.

[6] F. R. Gantmacher, Matrix Theory, Vol. 1-2, American Math. Soc., 2000.

[7] G. H. Golub, C. F. Van Loan, Matrix Computations, Hindustan Book Agency,

Texts and Readings in Math. - 43, New Delhi, 2007.

[8] P. R. Halmos, Finite Dimensional Vector Spaces, Springer Int. Ed., Indian

Reprint, 2013.

[9] J. Hefferon, Linear Algebra, http://joshua.smcvt.edu/linearalgebra, 2014.

[10] R. Horn, C. Johnson, Matrix Analysis, Cambridge University Press, New York,

1985.

[11] K. Janich, Linear Algebra, Undergraduate Texts in Math., Springer, 1994.

[12] S. Kumaresan, Linear Algebra: A geometric approach, PHI, 2000.

[13] S. Lang, Introduction to Linear Algebra, 2nd Ed., Springer-Verlag, 1986.

[14] D. Lewis, Matrix Theory, World Scientific, 1991.

[15] C. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM Pub., 2000.

[16] R. Piziak, P.L. Odell, Matrix Theory: From Generalized Inverses to Jordan

Form, Chapman and Hall / CRC, 2007.

[17] G. Strang, Linear Algebra and its Applications, 4th Ed., Cengage Learning,

2006.


Index

adjoint of a matrix, 12

adjugate, 21

algebraic multiplicity, 95

angle between vectors, 66

free variable, 39

full rank factorization, 62, 114

Gaussian elimination, 42

geometric multiplicity, 95

Gram-Schmidt orthogonalization, 68

Gram matrix, 72

basic variable, 39

basis, 49

best approximation, 71

Homogeneous system, 37

Cayley-Hamilton, 79

change of basis matrix, 59

characteristic polynomial, 78

co-factor, 21

column rank, 35

column vector, 3

combination similarity, 97

complex conjugate, 12

complex eigenvalue, 78

conjugate transpose, 12

consistent system, 38

coordinate vector, 57

identity matrix, 5

inner product, 65

Jordan block, 98

Jordan form, 98

least squares, 74

linearly dependent, 28

linearly independent, 28

linear combination, 18, 27

linear map, 54

Linear system, 36

Determinant, 20

diagonalizable, 91

diagonalized by, 91

diagonal entries, 4

diagonal matrix, 4

diagonal of a matrix, 4

dilation similarity, 97

dimension, 50

Matrix, 3

augmented, 23

entry, 3

hermitian, 82

inverse, 9

invertible, 9

lower triangular, 5

multiplication, 7

multiplication by scalar, 6

normal, 92

order, 4

orthogonal, 82

real symmetric, 82

size, 4

skew hermitian, 82

eigenvalue, 77

eigenvector, 77

elementary matrix, 13

elementary row operation, 14

equal matrices, 4

equivalent matrices, 61


skew symmetric, 82

sum, 6

symmetric, 82

trace, 20

unitary, 82

minor, 21

Spectral theorem, 92

standard basis, 5

standard basis vectors, 5

subspace, 46

super-diagonal, 4

system matrix, 37

norm, 66

null space, 55

tight SVD, 113

transpose of a matrix, 10

triangular matrix, 5

orthogonal basis, 70

orthogonal set, 66

orthogonal vectors, 66

orthonormal basis, 70

orthonormal set, 67

permutation similarity, 97

pivot, 15

pivotal column, 15

pivotal row, 15

positive semidefinite, 117

powers of matrices, 9

Pythagoras, 66

QR factorization, 73

range, 54

range space, 55

rank echelon matrix, 61

rank factorization, 61

rank nullity theorem, 56

Reduction

row reduced echelon form, 16

row rank, 35

Row reduced echelon form, 15

row vector, 3

scalars, 3

scalar matrix, 5

similar matrices, 63

singular values, 111

solution of linear system, 37

span, 47

spanning subset, 47

spans, 47

value of unknown, 37

zero matrix, 4