
Linear Equations and Matrices

Nishan Krikorian
Northeastern University

December 2009

Copyright © 2009 by Nishan Krikorian

PREFACE

These notes are intended to serve as an introduction to linear equations and
matrices, or as the subject is usually called, linear algebra. Linear algebra has two
distinct personalities. On the one hand it serves as a computational device, a prob-
lem solving tool indispensable in all quantitative disciplines. This is its algebraic
side. On the other hand its concepts can be seen, its relationships visualized. This
is its geometric side. The structure of these notes follows this basic dichotomy. In
Part 1 we focus on how matrices provide convenient tools for systematizing laborious
calculations by providing a compact notation for storing information and for describ-
ing relationships. In Part 2 we present those concepts of linear algebra that are
best understood geometrically, and we show how matrices describe transformations
of physical space.
The style of these notes is informal, meaning that the main consideration is
pedagogical clarity, not mathematical generality. The treatment of proofs varies.
Some proofs are given in full, some proofs are given only partially, some proofs are
given by example, and some proofs are omitted entirely. Whenever possible, ideas
are illustrated by computational examples and are given geometric interpretations.
Some of the important and essential applications of linear algebra are also presented,
including cubic splines, ODE’s, Markov matrices, and least squares.
These notes should be thought of as course notes or lecture notes, not as a course
text. The distinction is that we take a fairly direct path through the material with
certain specific goals in mind. These goals include the solution of linear systems, the
structure of linear transformations, least squares approximations, orthogonal trans-
formations, and the three basic matrix factorizations: LU, diagonal, and QR. There
is very little deviation from this path or extraneous material presented. The text is
lean, and Sections 6, 11, 12, 13, and 14 can be skipped without loss of continuity.
Almost all the exercises are important and help to develop subsequent material.
Linear algebra is a beautiful and elegant subject, but its practical side is equally
compelling. Stated in starkest terms, linear problems are solvable while nonlinear
problems are not.
TABLE OF CONTENTS

PART 1: Algebra
1. Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4. Row Exchanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5. Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6. Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7. Systems with Many Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9. Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
10. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11. Matrix Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
12. Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
13. The Complex Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
14. Difference Equations and Markov Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
PART 2: Geometry
15. Vector Spaces, Subspaces and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
16. Linear Independence, Basis, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
17. Dot Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
18. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
19. Row Space, Column Space, and Null Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
20. Least Squares and Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization . . . . . . . . . . . . . . . 126
22. Diagonalization of Symmetric and Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . 136
23. Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
24. Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Answers to Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

PART 1: ALGEBRA

1. GAUSSIAN ELIMINATION

The central problem of linear algebra is to find the solutions of systems of linear
equations. We begin with a simple system of three equations and three unknowns:

2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9 .

The problem is to find the unknown values of u, v, and w, which are themselves
called unknowns or variables. To do this we use Gaussian elimination.
The first step of Gaussian elimination is to use the coefficient 2 of u in the first
equation to eliminate the u from the second and third equations. To accomplish this,
subtract 2 times the first equation from the second equation and 3 times the first
equation from the third equation. The result is

2u + v − w = 5
− 4v + 2w = −10
− 10v + 4w = −24 .

This completes the first elimination step. The coefficient 2 of u in the first equation is
called the pivot for this step. Next use the coefficient −4 of v in the second equation to
eliminate the v from the third equation. Just subtract 2.5 times the second equation
from the third equation to get

2u + v − w = 5
− 4v + 2w = −10
− w= 1.

This completes the second elimination step. The coefficient −4 of v in the second
equation is the pivot for this step. The coefficient −1 of w in the third equation is
the pivot of the third elimination step, which did not have to be performed. The
elimination process is now complete. The resulting system is equivalent to the original
one, and its simple triangular form suggests an obvious method of solution: The third
equation gives w = −1; substituting this into the second equation −4v+2(−1) = −10
gives v = 2; and substituting both into the first equation 2u + (2) − (−1) = 5 gives
u = 1. This simple process is called back substitution.
How did we determine the multipliers 2 and 3 in the first step and 2.5 in the
second? Each is just the leading coefficient of the row being subtracted from, divided
by the pivot for that step. For example, in the second step, 2.5 equals the coefficient
−10 divided by the pivot −4.

We said that the triangular system obtained above is equivalent to the original
system, but what does this mean? It means simply that the two systems have the
same solution. This is clear since any solution of the original system must also be
a solution of each system obtained after each step of Gaussian elimination. This
is because Gaussian elimination amounts to nothing more than the subtraction of
equals from equals. Therefore any solution of the original system must also be a
solution of the final triangular system. And by reversing this argument we see that
any solution of the final triangular system must also be a solution of the original
system. Both systems must therefore have the same solutions.
We can simplify Gaussian elimination by noticing that there is no need to carry
the symbols for the unknowns u, v, w along in each step. We can instead represent
the system as an array:

[ 2   1  −1 |  5 ]
[ 4  −2   0 |  0 ] .
[ 6  −7   1 | −9 ]

The numbers multiplying the unknowns in the equations are called coefficients and
are determined by their position in the array. They are separated from the right-hand
sides of the equations by a vertical line. The first elimination step gives
 � 
2 1 −1 �� 5
 0 −4 2 �� −10  ,
0 −10 4 � −24

and the second gives

[ 2   1  −1 |   5 ]
[ 0  −4   2 | −10 ] .
[ 0   0  −1 |   1 ]

Note that the coefficient part of the array is now in triangular form with the pivots
on the diagonal. Back substitution gives the solution.
Can this process ever fail? It is clear that as long as the pivots are not zero
at each step, Gaussian elimination and back substitution will produce a solution.
But if a pivot is ever zero, Gaussian elimination will have to stop. This can happen
suddenly and unpredictably. In the example above, the second pivot was −4, but we
did not know this until we completed the first elimination step. It could have turned
out to be zero thereby stopping the process. In fact this would have happened if the
coefficient of v in the first equation were −1 instead of 1. In general, we don’t know
what the pivot for a particular elimination step is going to be until we complete the
previous step, so we don’t know ahead of time if the process is going to succeed. In
most cases the problem of a zero pivot can be fixed by exchanging two equations. In
some cases the zero pivot represents a true breakdown, meaning that there is either
no solution or infinitely many solutions. We will consider these possibilities later. For
now we assume our systems have only nonzero pivots and thus have unique solutions.

A comment on terminology: The official mathematical definition of a pivot re-


quires it to be nonzero. Therefore to say “nonzero pivot” is redundant, and to say
“zero pivot” is contradictory. For the latter we really should say “a zero in the pivot
position” or “a zero in the diagonal position.” But since it is simpler and clearer just
to say “nonzero pivot” or “zero pivot”, we will continue to do so. We will however
discuss this point further in Section 7 where the exact definition of a pivot will be
given.
We have seen how Gaussian elimination puts the coefficient part of the array into
triangular form so that back substitution will give the solution. But, instead of back
substitution, we can also use Gaussian elimination from the bottom up to get the
solution. For the example above, this is done as follows: Use Gaussian elimination
to get the array into triangular form as before:
 � 
2 1 −1 �� 5
 0 −4 2 � −10  .

0 0 −1 � 1

Next subtract −2 times the third row from the second row and 1 times the third row
from the first row to obtain
 � 
2 1 0 �� 4
 0 −4 0 � −8  ,

0 0 −1 � 1

and then subtract −.25 times the second row from the first to obtain
 � 
2 0 0 �� 2
 0 −4 0 �� −8  .
0 0 −1 � 1

Clearly the purpose of these steps is to introduce zeros above the diagonal entries.
The coefficient part of the array is now in diagonal form, and the solution u = 1,
v = 2, w = −1 is obvious. This method of using Gaussian elimination forwards
and backwards is called Gauss-Jordan elimination. It can be used for solving small
problems by hand, but it is inefficient for large problems. We will see later (Section 3)
that ordinary Gaussian elimination with back substitution requires fewer operations
and is therefore preferable.

EXERCISES

1. Solve the following systems using Gaussian elimination in array form.

(a) u − 6v = −8
3u − 2v = −8

(b) 5u − v = −1
−3u + 2v = −5

(c) 2u + v + 3w = −4
−2u + 5v + w = 18
4u + 2v + 4w = −6

(d) 4u − 2v + 4w = −24
2u + 3v − w = 17
−8u + 2v + 5w = −1

(e) 3u + 5v = 3
− 2v − 3w = −6
6w + 2x = 14
− w − 2x = −4

2. Solve the system below. When a zero pivot arises, exchange the equation with the
one below it and continue.
u + v + w = −2
3u + 3v − w = 6
u − v + w = −1

3. Try to solve the system below. Why won’t the trick in the previous problem work
here?
u + v + w = −2
3u + 3v − w = 6
u + v + w = −1

4. A farmer has two breeds of chickens, Rhode Island Red and Leghorn. In one year,
one Rhode Island Red hen will yield 10 dozen eggs and 4 pounds of meat, and one
Leghorn hen will yield 12 dozen eggs and 3 pounds of meat. The farmer has a market
for 2700 dozen eggs and 900 pounds of meat. How many hens of each breed should
he have to meet the demand of the market exactly?

5. Suppose a man wants to consume exactly his minimum daily requirements of


70.5 grams of protein and 300 grams of carbohydrates on a diet of bread and peanut
butter. How many grams of each should he eat if bread is 10% protein and 50%
carbohydrates and peanut butter is 25% protein and 20% carbohydrates?

6. A nutritionist determines her minimum daily needs for energy (1,800 kcal), protein
(92 g), and calcium (470 mg). She chooses three foods, pasta, chicken, and broccoli,
and she collects the following data on the nutritive value per serving of each.

             energy (kcal)   protein (g)   calcium (mg)
pasta             150              5             10
chicken           200             30             10
broccoli           25              3             90
She then asks how many servings per day of pasta, chicken, and broccoli must she
consume in order to satisfy her minimum daily needs for energy, protein, and calcium
exactly.

7. Find the cubic polynomial y = ax³ + bx² + cx + d that interpolates (that is, whose
graph passes through) the points (−1,5), (0,5), (1,1), (2,−1).

8. Find the cubic polynomial function f(x) = ax³ + bx² + cx + d such that f(0) = 2,
f′(0) = 1, f(1) = 1, f′(1) = 0. (This is called cubic Hermite interpolation.) Sketch
its graph.

2. MATRIX NOTATION

As is common in multidimensional calculus, points in space can be represented
by vectors. For example

    [  5 ]
b = [  0 ]
    [ −9 ]

is the column vector that represents the point (5, 0, −9) in three-dimensional space.
The basic operations on vectors are multiplication by scalars (real numbers for the
time being)

     [  15 ]
3b = [   0 ]
     [ −27 ]

and addition

[  5 ]   [ −4 ]   [  1 ]
[ −2 ] + [ −3 ] = [ −5 ] .
[  2 ]   [  4 ]   [  6 ]

Two vectors can be added together as long as they are the same size.
We define a matrix to be an array of column vectors of the same size. For
example

    [ 2   1  −1   5 ]
C = [ 4  −2   0   0 ]
    [ 6  −7   1  −9 ]

is a 3 × 4 matrix (read “three by four matrix”). It has three rows and four columns.
Two basic operations on matrices are multiplication by scalars

     [  6    3  −3   15 ]
3C = [ 12   −6   0    0 ]
     [ 18  −21   3  −27 ]

and addition

[  2   1 ]   [ −3   6 ]   [ −1   7 ]
[ −3   2 ]   [  4  −2 ]   [  1   0 ]
[  0   4 ] + [  4  −1 ] = [  4   3 ] .
[ −1   0 ]   [  3   0 ]   [  2   0 ]

Two matrices can be added together as long as they have the same dimensions.
How do we multiply matrices? We first answer the question for two special
matrices; namely, the product of a 1 × n matrix, which is a row vector, and an n
× 1 matrix, which is a column vector. Multiplication for these matrices is done as
follows:

          [ 3 ]
[ 4 1 3 ] [ 1 ] = [ 4 · 3 + 1 · 1 + 3 · 0 ] = [ 13 ] .
          [ 0 ]

This is just the familiar dot product of two vectors. To extend the definition to the
product of a matrix and a column vector, take the product of each row of the matrix
with the column vector and stack the results to form a new column vector:

[ 4  1  3 ]           [ 4·3 + 1·1 + 3·0 ]   [ 13 ]
[ 2  6  8 ]  [ 3 ]    [ 2·3 + 6·1 + 8·0 ]   [ 12 ]
[ 1  0  9 ]  [ 1 ]  = [ 1·3 + 0·1 + 9·0 ] = [  3 ] .
[ 2  2  1 ]  [ 0 ]    [ 2·3 + 2·1 + 1·0 ]   [  8 ]

Note that the number of columns of the matrix must equal the number of components
of the vector being multiplied. As an application, we note that the system of equations
considered in the previous section

2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9

can now be represented as a matrix multiplying an unknown vector so as to equal a
known one:

[ 2   1  −1 ] [ u ]   [  5 ]
[ 4  −2   0 ] [ v ] = [  0 ] .
[ 6  −7   1 ] [ w ]   [ −9 ]

This is an equation of the form Ax = b where the known matrix A, called the
coefficient matrix of the system, multiplies the unknown vector x and equals the
known vector b. The problem is to find x. The solution vector

    [ u ]   [  1 ]
x = [ v ] = [  2 ] ,
    [ w ]   [ −1 ]

obtained by Gaussian elimination, satisfies the equation

[ 2   1  −1 ] [  1 ]   [  5 ]
[ 4  −2   0 ] [  2 ] = [  0 ] .
[ 6  −7   1 ] [ −1 ]   [ −9 ]

Finally, to multiply two matrices, just multiply the left matrix times each column
of the right matrix and line up the resulting two vectors in a new matrix. For example

[ 4  1  3 ]              [ 13  23 ]
[ 2  6  8 ]  [ 3  5 ]    [ 12  18 ]
[ 1  0  9 ]  [ 1  0 ]  = [  3  14 ] .
[ 2  2  1 ]  [ 0  1 ]    [  8  11 ]

Note again that the number of columns of the left factor must equal the number of
rows of the right factor for this to make sense. In this example we multiplied a 4 × 3
matrix by a 3 × 2 matrix and obtained a 4 × 2 matrix. In general, if A is m × n and
B is n × p, then AB is m × p.
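As a concrete illustration, here is a minimal Python sketch of this row-times-column rule; the function name matmul and the list-of-rows storage are simply conventions chosen for the example, not anything fixed by the notes.

def matmul(A, B):
    """Multiply matrices stored as lists of rows; A is m x n, B is n x p."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must match rows of B"
    # entry (i, j) of the product is row i of A times column j of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[4, 1, 3],
     [2, 6, 8],
     [1, 0, 9],
     [2, 2, 1]]
B = [[3, 5],
     [1, 0],
     [0, 1]]
print(matmul(A, B))   # [[13, 23], [12, 18], [3, 14], [8, 11]], as in the example above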
Matrix multiplication satisfies the associative law (AB)C = A(BC) and the two
distributive laws A(B + C) = AB + AC and (B + C)D = BD + CD. (The proofs of
these properties are tedious and will be omitted.) It does not, however, satisfy the
commutative law. That is, in general AB ≠ BA. For example

[ 2  3 ] [ 0  1 ]    [ 0  1 ] [ 2  3 ]
[ 1  2 ] [ 1  1 ] ≠  [ 1  1 ] [ 1  2 ] .
In fact for many pairs of matrices AB is defined whereas BA is not. (See Exercise
2.)
For every n there is a special n × n matrix, which we call I, with ones down its
diagonal (also called its main diagonal) and zeros everywhere else. For example, in
the 3 × 3 case

    [ 1  0  0 ]
I = [ 0  1  0 ] .
    [ 0  0  1 ]

It is easy to see that for any 3 × 3 matrix A we have IA = AI = A and that this
property carries over to the n × n case. For this reason I is called the identity matrix.
The notation for a general matrix A with m rows and n columns is

    [ a11  a12  a13  ...  a1n ]
    [ a21  a22  a23  ...  a2n ]
A = [ a31  a32  a33  ...  a3n ]
    [  :    :    :         :  ]
    [ am1  am2  am3  ...  amn ]

where aij denotes the entry in the ith row and the jth column. Using this notation
we can define matrix multiplication as follows. Let A be m × n and B be n × p;
then C = AB is the m × p matrix with ijth coefficient

cij = ai1 b1j + ai2 b2j + · · · + ain bnj .

We will try to
avoid expressions like this, but it is important to understand them when writing
computer programs to perform matrix computations. In fact, we can write a very
simple program that uses Gaussian elimination and back substitution to solve an
arbitrary linear system of n equations and n unknowns. First express the system in
array form:
 � 
a11 a12 a13 . . . a1n � a1 n+1

 a21 a22 a23 . . . a2n � a2 n+1 
 � 
 a31 a32 a33 . . . a3n � a3 n+1  .
 . .. .. .. .. �� .. 
 .. . . . . � . 

an1 an2 an3 . . . ann an n+1

Then Gaussian elimination would look like


for k = 1 to n − 1 do
    if akk = 0 then signal failure and stop
    for i = k + 1 to n do
        m = aik /akk
        aik = 0
        for j = k + 1 to n + 1 do aij = aij − m akj .

(Note that the program stops when a zero pivot is encountered.) And back substi-
tution would look like

for k = n down to 1 do
    t = ak,n+1
    for j = k + 1 to n do t = t − akj xj
    xk = t/akk .
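For readers who want to run the two pieces of pseudocode above, here is one way they might look in Python (a rough sketch; the augmented array is stored as a list of rows with 0-based indices, so the right-hand side sits in column n).

def solve(aug):
    """Gaussian elimination + back substitution on an n x (n+1) augmented array.
    Mirrors the pseudocode above; stops if a zero pivot is encountered."""
    n = len(aug)
    a = [row[:] for row in aug]          # work on a copy
    # forward elimination
    for k in range(n - 1):
        if a[k][k] == 0:
            raise ValueError("zero pivot encountered")
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]        # multiplier
            a[i][k] = 0.0
            for j in range(k + 1, n + 1):
                a[i][j] -= m * a[k][j]
    # back substitution
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        t = a[k][n]
        for j in range(k + 1, n):
            t -= a[k][j] * x[j]
        x[k] = t / a[k][k]
    return x

# the system from Section 1
print(solve([[2, 1, -1, 5],
             [4, -2, 0, 0],
             [6, -7, 1, -9]]))           # [1.0, 2.0, -1.0]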

Finally, we summarize the algebraic laws satisfied by matrix addition and multi-
plication. (The following equalities assume that all indicated operations make sense.)

1. A + B = B + A (commutative law for addition)


2. A + (B + C) = (A + B) + C (associative law for addition)
3. r(sA) = (rs)A
4. r(A + B) = rA + rB
5. (−1)A = −A
6. A(BC) = (AB)C (associative law for multiplication)
7. A(B + C) = AB + AC (left-distributive law for multiplication)
8. (B + C)A = BA + CA (right-distributive law for multiplication)
9. r(AB) = (rA)B = A(rB)

EXERCISES

1. Compute the following:

        [ 5   7  −1 ]
(a)  2  [ 4  −2   0 ]

     [ 6  2 ]   [ 1  −2 ]
(b)  [ 7  1 ] + [ 3   6 ]
     [ 1  2 ]   [ 5  −7 ]

     [ 4   0  −1 ] [  3 ]
(c)  [ 0   1   0 ] [  4 ]
     [ 2  −2   1 ] [ −5 ]

                   [ 4   0  −1 ]
(d)  [ 3  4  −5 ]  [ 0   1   0 ]
                   [ 2  −2   1 ]

                 [ 4 ]
(e)  [ 1  2  3 ] [ 5 ]
                 [ 6 ]

     [ 4 ]
(f)  [ 5 ] [ 1  2  3 ]
     [ 6 ]

     [ 2  −1  3 ] [  0  1 ]
(g)  [ 5   0  7 ] [  2  1 ]
     [ 0  −1  0 ] [ −2  3 ]

     [ 4   0  −1 ] [ 2  −1  3 ]
(h)  [ 0   1   0 ] [ 5   0  7 ]
     [ 2  −2   1 ] [ 0  −1  0 ]

     [ 4   0  −1 ] [ 1  0  0 ]
(i)  [ 0   1   0 ] [ 0  1  0 ]
     [ 2  −2   1 ] [ 0  0  1 ]

     [ 2  0  0 ]⁵
(j)  [ 0  1  0 ]
     [ 0  0  3 ]

     [ 0  1  2 ]³
(k)  [ 0  0  1 ]
     [ 0  0  0 ]

2. Which of the expressions 2A, A+B, AB, and BA makes sense for the two matrices
below? Which do not?
    [ 5   7  −1 ]        [ 2  3 ]
A = [ 4  −2   0 ]    B = [ 1  2 ]

3. Give 3 × 3 matrices that are examples of the following.


(a) diagonal matrix: aij = 0 for all i ≠ j.
(b) symmetric matrix: aij = aji for all i and j.
(c) upper triangular matrix: aij = 0 for all i > j.

4. Show with a 3 × 3 example that the product of two upper triangular matrices is
upper triangular.

5. Find examples of 2 × 2 matrices such that


(a) A² = −I
(b) B² = 0 where no entry of B is zero.
(c) AB = AC but B ≠ C. (Zero matrices not allowed!)

6. For any matrix A, we define its transpose Aᵀ to be the matrix whose columns are
the corresponding rows of A.

                             [ 2  −1  3 ]
(a) What is the transpose of [ 5   0  7 ] ?
                             [ 0  −1  0 ]
(b) Illustrate the formula (A + B)ᵀ = Aᵀ + Bᵀ with a 2 × 2 example.
(c) The formula (AB)ᵀ = BᵀAᵀ holds as long as the product AB makes sense.
(This requires a proof, which we omit.) Illustrate this with a 2 × 2 example,
and use it to prove the formula (ABC)ᵀ = CᵀBᵀAᵀ.
(d) If a matrix satisfies Aᵀ = A, then what kind of matrix is it? (See Exercise 3
above.)
(e) Show that for any matrix C (not necessarily square) the matrix CᵀC is sym-
metric. (Use (c) and (d) above.)
(f) Show if A and B are square matrices and A is symmetric, then BᵀAB is sym-
metric.
(g) Show with a 2 × 2 example that the product of two symmetric matrices may not
be symmetric.

          [ 4   0  −1 ] [ c1 ]      [ 4 ]      [  0 ]      [ −1 ]
7. Verify [ 0   1   0 ] [ c2 ] = c1 [ 0 ] + c2 [  1 ] + c3 [  0 ] .
          [ 2  −2   1 ] [ c3 ]      [ 2 ]      [ −2 ]      [  1 ]

8. Verify

               [ 4   0  −1 ]
[ c1  c2  c3 ] [ 0   1   0 ] = c1 [ 4  0  −1 ] + c2 [ 0  1  0 ] + c3 [ 2  −2  1 ] .
               [ 2  −2   1 ]

9. The matrix (A + B)² is always equal to which of the following?

(a) A(A + B) + B(A + B)
(b) (A + B)A + (A + B)B
(c) A² + AB + BA + B²
(d) (B + A)²
(e) A(A + B) + (A + B)B
(f) A² + 2AB + B²

10. Convince yourself that the product AB of two matrices can be thought of as A
multiplying the columns of B to produce the columns of AB or

  [ :    :        :  ]   [  :     :          :   ]
A [ b1   b2  ···  bn ] = [ Ab1   Ab2   ···   Abn ] .
  [ :    :        :  ]   [  :     :          :   ]

11. Assuming the operations make sense, which are symmetric matrices?
(a) AᵀA
(b) AᵀAAᵀ
(c) Aᵀ + A

3. THE LU FACTORIZATION

If we run Gaussian elimination on the coefficient matrix of Section 1, we obtain

    [ 2   1  −1 ]
A = [ 4  −2   0 ]
    [ 6  −7   1 ]

    [ 2    1  −1 ]
    [ 0   −4   2 ]
    [ 0  −10   4 ]

    [ 2   1  −1 ]
U = [ 0  −4   2 ] .
    [ 0   0  −1 ]

We call the resulting upper triangular matrix U . The following equations describe
exactly how the Gaussian steps turn the rows of A into the rows of U .

row 1 of U = row 1 of A
row 2 of U = row 2 of A − 2(row 1 of U )
row 3 of U = row 3 of A − 3(row 1 of U ) − 2.5(row 2 of U )

Note that once a row is used as “pivotal row,” it never changes from then on. It
therefore can be considered as a row of U . We can solve these equations for the rows
of A to obtain

row 1 of A = 1(row 1 of U )
row 2 of A = 2(row 1 of U ) + 1(row 2 of U )
row 3 of A = 3(row 1 of U ) + 2.5(row 2 of U ) + 1(row 3 of U ) .

Using the property of matrix multiplication illustrated in Section 2 Exercise 8, this
is just an expression of the matrix equation

[ row 1 of A ]   [ 1    0   0 ] [ row 1 of U ]
[ row 2 of A ] = [ 2    1   0 ] [ row 2 of U ]
[ row 3 of A ]   [ 3  2.5   1 ] [ row 3 of U ]

or

[ 2   1  −1 ]   [ 1    0   0 ] [ 2   1  −1 ]
[ 4  −2   0 ] = [ 2    1   0 ] [ 0  −4   2 ] .
[ 6  −7   1 ]   [ 3  2.5   1 ] [ 0   0  −1 ]

We write this equation as A = LU and call the product on the right the LU factor-
ization of A. Note that L is the lower triangular matrix with ones down its diagonal,
with the multipliers 2 and 3 from the first Gaussian step in its first column, and
with the multiplier 2.5 from the second Gaussian step in its second column. The
pattern is the same for every matrix. Any square matrix can be factored by Gaussian
elimination into a product of a lower triangular L with ones down its diagonal and
an upper triangular U , under the proviso that all pivots are nonzero.
How can the LU factorization of A be used to solve the original system Ax = b?
First we replace A by LU in the system to get LU x = b. Then we note that this
system can be solved by solving the two systems Ly = b and U x = y in order. Letting

    [ r ]
y = [ s ] ,
    [ t ]

the first system Ly = b is

[ 1    0   0 ] [ r ]   [  5 ]
[ 2    1   0 ] [ s ] = [  0 ] ,
[ 3  2.5   1 ] [ t ]   [ −9 ]

which can be solved by forward substitution to get

[ r ]   [   5 ]
[ s ] = [ −10 ] .
[ t ]   [   1 ]

And letting

    [ u ]
x = [ v ] ,
    [ w ]

the second system U x = y is

[ 2   1  −1 ] [ u ]   [   5 ]
[ 0  −4   2 ] [ v ] = [ −10 ] ,
[ 0   0  −1 ] [ w ]   [   1 ]

which can be solved by back substitution to get

[ u ]   [  1 ]
[ v ] = [  2 ] .
[ w ]   [ −1 ]

Therefore, in the matrix form of Gaussian elimination, we use elimination to factor


A into LU and then solve Ly = b by forward substitution and U x = y by back
substitution.
If you have just one system Ax1 = b1 , then there is no advantage of this method
over the array form of Gaussian elimination presented in Section 1. In fact, it is
slightly harder since there is an extra forward substitution step. But now suppose you
have a second system Ax2 = b2 with a different right-hand side. The LU factorization
method would factor A into LU and then solve LU x1 = b1 and LU x2 = b2 both by
forward and back substitution. On the other hand, the array method would have to
run through the entire Gaussian elimination process twice, once for each system. So
if you have several systems to solve, all of which differ only in their right-hand sides,
then the LU factorization method is preferable.
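To make the reuse argument concrete, the following Python sketch factors A once and then reuses the factors for several right-hand sides. It assumes no row exchanges are needed, and the names lu_factor and lu_solve are simply labels chosen for this illustration.

def lu_factor(A):
    """Return (L, U) with A = LU, assuming all pivots are nonzero (no row exchanges)."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]     # multiplier; it becomes an entry of L
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def lu_solve(L, U, b):
    """Solve LUx = b: forward substitution for Ly = b, then back substitution for Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 1, -1], [4, -2, 0], [6, -7, 1]]
L, U = lu_factor(A)                       # factor once ...
print(lu_solve(L, U, [5, 0, -9]))         # ... then solve for each right-hand side: [1.0, 2.0, -1.0]
print(lu_solve(L, U, [2, 2, 0]))          # a second right-hand side reuses the same L and U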
By counting operations, we can compare the relative expense in computer time of
elimination versus forward and back substitution. We will count only multiplications
and divisions since they take much more time than addition and subtraction. In
the first elimination step for an n × n matrix, a multiplier (one division) times the
second through nth entries of the first row (n − 1 multiplications) is subtracted
from a row below the first. This results in n operations. Since there are n − 1
rows to be subtracted from, the total number of operations for the first step is
(n − 1)n = n2 − n. The second step is exactly like the first except that it is performed
on an (n − 1) × (n − 1) matrix and therefore requires (n − 1)2 − (n − 1) operations.
Continuing in this manner we see that the total number of operations required for
Gaussian elimination is

(1² − 1) + (2² − 2) + · · · + (n² − n) = (n³ − n)/3 ,

and since n is negligible compared to n³ for large n, we conclude the number of oper-
ations required to compute the LU factorization of an n × n matrix is approximately
n³/3. Back substitution is much faster since the number of operations required is
easily seen to be

1 + 2 + · · · + n = n(n + 1)/2 ,

which is approximately n²/2. Forward substitution is the same. A 50 × 50 matrix


would therefore require 41,666 operations for its LU factorization but only 1,275
for forward and back substitution. By a similar operation count, we can show that
Gauss-Jordan elimination requires n³/2 operations, which is 50% more than straight
Gaussian elimination with back substitution. Gauss-Jordan elimination on a 50 ×
50 matrix would therefore require 62,500 operations.

EXERCISES

1. Find the LU factorizations of


        [ 4  −6 ]
(a) A = [ 3   5 ]

        [  2  1  3 ]
(b) B = [ −2  5  1 ]
        [  4  2  4 ]

        [  1  3   2  −1 ]
(c) C = [  2  5   3   2 ]
        [ −3  2  −1   2 ]
        [  1  1   3   1 ]

        [ 2  1   0  0  0 ]
        [ 4  5   3  0  0 ]
(d) D = [ 0  3   4  1  0 ]
        [ 0  0  −1  1  1 ]
        [ 0  0   0  4  3 ]

2. Use the LU factorizations above and forward and back substitution to solve
         [ −8 ]
(a) Ax = [ 13 ]

         [ 12 ]
(b) Bx = [ −6 ]
         [ 18 ]

         [  0 ]
         [  4 ]
(c) Cx = [ −1 ]
         [  2 ]

         [  4 ]
         [  5 ]
(d) Dx = [ −4 ]
         [  2 ]
         [  3 ]

3. If your computer performs 10⁶ operations/sec and costs $500/hour to run, then
how large a linear system can you solve with a budget of $2? Of $200?

4. ROW EXCHANGES

We now return to the question of what happens when we run into zero pivots.

Example 1: We first consider the system

u + 2v + 3w = 1
2u + 4v + 9w = 5
2u + 6v + 7w = 4 .

Using Gaussian elimination on the corresponding array


 � 
1 2 3 �� 1
2 4 9�5

2 6 7�4

the first elimination step gives


 � 
1 2 3 �� 1
0 0 3 �� 3  .
0 2 1�2

A zero pivot has appeared. But note that there is a nonzero entry lower down in the
second column, in this case the 2 in the third row. The problem can therefore be
fixed by just exchanging the second and third rows:
 � 
1 2 3 �� 1
0 2 1�2.

0 0 3�3

This has the harmless effect of exchanging the second and third equations. In this
case we are done with elimination since the array is now ready for back substitution.

Example 2: Now let’s look at another system:

u + 2v + 3w = 1
2u + 4v + 9w = 5
3u + 6v + 7w = 5 .

Using Gaussian elimination on the corresponding array


 � 
1 2 3 �� 1
2 4 9�5

3 6 7�5

the first elimination step gives


 � 
1 2 3 �� 1
0 0 3 �� 3  .
0 0 −2 � 2
But now a row exchange will not produce a nonzero pivot in the second row. Gaussian
elimination breaks down through no fault of its own simply because this system has
no solution. The last two equations, 3w = 3 and −2w = 2, cannot be satisfied
simultaneously. We can also see this by extending Gaussian elimination a little. Use
the 3 in the second equation to eliminate the −2 in the third equation. This will
produce

[ 1  2  3 | 1 ]
[ 0  0  3 | 3 ] .
[ 0  0  0 | 4 ]

The third equation, 0 = 4, signals the impossibility of a solution.

Example 3: In the example above, suppose the right-hand side of the third equation
is equal to 1 instead of 5, then the elimination gives
 � 
1 2 3 �� 1
0 0 3�3.

0 0 0�0
What we really have here is two equations with three unknowns. Back substitution
breaks down since the first equation cannot determine both u and v by itself. In this
case there are infinitely many solutions to the original system. (See Section 7.)

We conclude that when we run into a zero pivot, we should look for a nonzero
entry in the column below the zero pivot. If we find one, we make a row exchange
and continue. If we don’t, then we must stop; a unique solution to the system does
not exist. A matrix for which Gaussian elimination possibly with row exchanges
produces a triangular system with nonzero pivots is called nonsingular. Otherwise
the matrix is called singular.
What happens to the LU factorization of A when there are row exchanges? The
answer is that the product of the L and U we obtain no longer equals the original
matrix A but equals A with row exchanges. Suppose we knew what row exchanges
would be necessary before we started. Then if we performed those exchanges on A
first, we would get the normal LU factorization of this altered A. The altered version
of A is realized by premultiplying A by a permutation matrix P , which is just the
identity matrix with some of its rows exchanged. We would then obtain the equation
P A = LU . For the first example of this section this looks like

[ 1  0  0 ] [ 1  2  3 ]   [ 1  0  0 ] [ 1  2  3 ]
[ 0  0  1 ] [ 2  4  9 ] = [ 2  1  0 ] [ 0  2  1 ] .
[ 0  1  0 ] [ 2  6  7 ]   [ 2  0  1 ] [ 0  0  3 ]

The P A = LU factorization can still be used to solve the system Ax = b as before.


Just apply P to both sides to get P Ax = P b. The LU factorization of P A gives
LU x = P b. Forward and back substitution then give the solution x.
Since row exchanges are unpleasant when factoring matrices, we will try to re-
strict our attention to nonsingular matrices that do not need them. In any case, we
restate the central fact (really a definition): For a linear system Ax = b whose coef-
ficient matrix A is nonsingular, Gaussian elimination, possibly with row exchanges,
will produce a triangular system with nonzero pivots on the diagonal, and back sub-
stitution will produce the unique solution. (From now on, we will take “Gaussian
elimination” to mean “Gaussian elimination possibly with row exchanges.”)

EXERCISES

1. Solve by the array method

[  1   4  2 ] [ u ]   [ −2 ]
[ −2  −8  3 ] [ v ] = [ 32 ]
[  0   1  1 ] [ w ]   [  1 ]

2. Which of the following matrices is singular? Why?

    [  1   4   2 ]
(a) [  6  −8   2 ]
    [ −2  −8  −4 ]

    [  1   4   2 ]
(b) [ −2  −8  −3 ]
    [ −1  −4   5 ]

    [ 1  3   2   0 ]
(c) [ 0  5   0   2 ]
    [ 0  0  10   2 ]
    [ 0  0   0  11 ]

    [ 1  3  2  −1 ]
(d) [ 0  5  3   2 ]
    [ 0  0  0   2 ]
    [ 0  0  0  10 ]

3. How many solutions does each of the following systems have?

    [ 0   1  −1 ] [ u ]   [ 2 ]
(a) [ 1  −1   0 ] [ v ] = [ 2 ]
    [ 1   0  −1 ] [ w ]   [ 2 ]

    [ 0   1  −1 ] [ u ]   [ 0 ]
(b) [ 1  −1   0 ] [ v ] = [ 0 ]
    [ 1   0  −1 ] [ w ]   [ 0 ]

4. Prove that if A is nonsingular, then the only solution of the system Ax = 0 is
x = 0. (A system of the form Ax = 0 is called homogeneous.)

5. INVERSES

A square matrix A is invertible if there is a matrix B of the same size such
that their product in either order is the identity matrix: AB = BA = I. If there
is such a B, then there is at most one. We write it as A⁻¹ and call it the inverse
of A. Therefore AA⁻¹ = A⁻¹A = I. We can easily prove that there cannot be
more than one inverse of a given matrix: If B and C are both inverses of A, then
B = BI = B(AC) = (BA)C = IC = C. As an example, the inverse of the matrix

[ 1  −1 ]      [  2/3   1/3 ]
[ 1   2 ]  is  [ −1/3   1/3 ]

since

[ 1  −1 ] [  2/3   1/3 ]   [ 1  0 ]
[ 1   2 ] [ −1/3   1/3 ] = [ 0  1 ] .

Some matrices do not have inverses. For example the matrix

[ 1  0 ]
[ 2  0 ]

cannot have an inverse since

[ 1  0 ] [ a  b ]   [  a   b ]
[ 2  0 ] [ c  d ] = [ 2a  2b ] ,

so there is no choice of a, b, c, d that will make the right-hand side equal to the identity
matrix.
How can we tell if a matrix has an inverse, and, if it does have an inverse, then
how do we compute it? We answer the second question first. Let’s try to find the
inverse of

    [ 2  −3  2 ]
A = [ 1  −1  1 ] .
    [ 3   2  2 ]

This means that we are looking for a matrix B such that AB = I or

[ 2  −3  2 ] [ b11  b12  b13 ]   [ 1  0  0 ]
[ 1  −1  1 ] [ b21  b22  b23 ] = [ 0  1  0 ] .
[ 3   2  2 ] [ b31  b32  b33 ]   [ 0  0  1 ]

Let B1, B2, B3 be the columns of B and I1, I2, I3 be the columns of I. Then we can
see that this is really the problem of solving the three separate linear systems

AB1 = I1 AB2 = I2 AB3 = I3 .

Since the coefficient matrix is the same for all three systems, we can just find the LU
factorization of A and then use forward and back substitution three times to find the
three solution vectors. These vectors, when lined up, will form the columns of B.

If we want to find the solution by hand, we can use the array method and a trick
to avoid running through Gaussian elimination three times. First set up the array
 � 
2 −3 2 �� 1 0 0
 1 −1 1 � 0 1 0 

3 2 2�0 0 1

and use Gaussian elimination to get

[ 2  −3   2 |   1    0  0 ]
[ 0  .5   0 | −.5    1  0 ] .
[ 0   0  −1 |   5  −13  1 ]

Now in this situation we would normally use back substitution three times. But we
could also use Gauss-Jordan elimination. That is, use the −1 in the third row to
eliminate the entries in the column above it by subtracting multiples of the third row
from the second (unnecessary since that entry is already zero) and from the first.
This gives

[ 2  −3   0 |  11  −26  2 ]
[ 0  .5   0 | −.5    1  0 ] .
[ 0   0  −1 |   5  −13  1 ]

Then use the .5 in the second row to eliminate the −3 in the first row.
 � 
2 0 0 �� 8 −20 2
 0 .5 0 � −.5 1 0.

0 0 −1 � 5 −13 1

Finally divide each row by its leading nonzero entry to get

[ 1  0  0 |  4  −10   1 ]
[ 0  1  0 | −1    2   0 ] .
[ 0  0  1 | −5   13  −1 ]

The three columns on the right are the solutions to the three linear systems, so

      [  4  −10   1 ]
A⁻¹ = [ −1    2   0 ] .
      [ −5   13  −1 ]

These two methods for finding inverses, (1) LU factorization and forward and
back substitution n times and (2) Gauss-Jordan elimination, each require n³ oper-
ations. Either method will work as long as A is nonsingular. Gauss-Jordan elimi-
nation does, however, present some organizational clarity when finding inverses by
hand. Furthermore, since Gauss-Jordan elimination is just the array method per-
formed several times at once, row exchanges can be made without affecting the final
answer.
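The hand computation above translates almost directly into code. The Python sketch below reduces the array [A | I] to [I | A⁻¹] by Gauss-Jordan elimination; it assumes no row exchanges are needed and, unlike the hand computation, it normalizes each pivot row as soon as the pivot is reached rather than at the end.

def gauss_jordan_inverse(A):
    """Invert A by reducing the array [A | I] to [I | inverse of A].
    Assumes A is nonsingular and that no row exchanges are needed."""
    n = len(A)
    # build the augmented array [A | I]
    a = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ValueError("zero pivot; a row exchange would be needed")
        # normalize the pivot row, then clear the rest of column k
        a[k] = [entry / pivot for entry in a[k]]
        for i in range(n):
            if i != k and a[i][k] != 0:
                m = a[i][k]
                a[i] = [a[i][j] - m * a[k][j] for j in range(2 * n)]
    return [row[n:] for row in a]   # right half of the array is the inverse

A = [[2, -3, 2], [1, -1, 1], [3, 2, 2]]
for row in gauss_jordan_inverse(A):
    print(row)
# the rows of the inverse found above: [4, -10, 1], [-1, 2, 0], [-5, 13, -1]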
Once we have the inverse of a matrix, what can we do with it? It might seem
at first glance that A⁻¹ can be used to solve the system Ax = b directly. Just
apply A⁻¹ to both sides to obtain x = A⁻¹b. This turns out to be much inferior to
ordinary Gaussian elimination with back substitution for two reasons: (1) It takes
n³ operations to find A⁻¹ as compared with n³/3 operations to solve Ax = b by
Gaussian elimination. (2) Computing inverses, by whatever method, is subject to
much more numerical instability and round-off error than is Gaussian elimination.
Inverses are valuable in theory and for conceptualization. In some areas of statistics
and linear programming it is occasionally necessary to actually compute an inverse.
But for most large-scale applications, the computation of matrix inverses can and
should be avoided.
We end this section with a major result, which we state and prove formally. It
basically says that for any matrix A the three questions, (1) does Gaussian elimination
work, (2) does A have an inverse, and (3) does Ax = b have a unique solution, all
have the same answer.

Theorem. For any square matrix A the following statements are equivalent (all are
true or all are false).
(a) A is nonsingular (that is, Gaussian elimination, possibly with row exchanges,
produces nonzero pivots)
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.

Proof: We show (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a).


(a) ⇒ (b): The point of this section has been to show that if A is nonsingular, then
Gaussian elimination can be used to find its inverse.
(b) ⇒ (c): Apply A⁻¹ to both sides of Ax = b to obtain a solution x = A⁻¹b. Let y
be a different solution. Then apply A⁻¹ to both sides of Ay = b to obtain y = A⁻¹b.
Therefore x = y and the solution is unique.
(c) ⇒ (d): Clearly x = 0 is a solution of Ax = 0, and by (c) it must be the only
solution.
(d) ⇒ (a): We prove this by assuming (a) is false, that is A is singular, and we show
that this implies (d) is false, that is Ax = 0 has nonzero solutions in x. (Recall
that the statement “(d) ⇒ (a)” is logically equivalent to the statement “not (a) ⇒
not (d).”) Consider the system Ax = 0. Since we are assuming A is singular, if we
apply Gaussian elimination, at some point we will run into a zero pivot. Using the
language and results of Section 7, we can immediately say that there must exist a
free variable and therefore conclude that there are nonzero solutions to Ax = 0. But
since we haven’t got to Section 7 yet, we’ll try to prove this directly. When we run
into the zero pivot, we will have a situation that looks something like

[ ∗  ∗  ∗  ∗  ∗ ] [ x1 ]   [ 0 ]
[ 0  ∗  ∗  ∗  ∗ ] [ x2 ]   [ 0 ]
[ 0  0  0  ∗  ∗ ] [ x3 ] = [ 0 ] .
[ 0  0  0  ∗  ∗ ] [ x4 ]   [ 0 ]
[ 0  0  0  ∗  ∗ ] [ x5 ]   [ 0 ]

But if we set x5 = x4 = 0 and x3 = 1 and solve for x2 and x1 by back substitution,
we will get a nonzero solution to Ax = 0. This shows that (d) is false. The pattern is
the same in all cases. If A is any singular matrix, then Gaussian elimination applied
to Ax = 0 will produce a system that in exactly the same way can be shown to have
nonzero solutions. This proves the theorem.

Note that the first statement of the theorem is equivalent to the fact that A has
a P A = LU factorization, which we can now write A = P⁻¹LU. (We skip the proof
that P⁻¹ exists.)

EXERCISES

1. Use Gauss-Jordan elimination to find the inverses of the following matrices.


    [ 1  4 ]
(a) [ 2  7 ]

    [ 2   0   0 ]
(b) [ 0  .1   0 ]
    [ 0   0  −5 ]

    [ 2  6  10 ]
(c) [ 0  2   5 ]
    [ 0  0   5 ]

    [ 1  1  1 ]
(d) [ 2  3  2 ]
    [ 3  8  2 ]

    [  1  1  1 ]
(e) [ −1  3  2 ]
    [  2  1  1 ]

    [ 1  2  3  1 ]
(f) [ 1  3  3  2 ]
    [ 2  4  3  3 ]
    [ 1  1  1  1 ]

    [ a  b ]
(g) [ c  d ]

2. From Exercises 1(b) and 1(c) what can you say about the inverse of a diagonal
matrix and of an upper triangular matrix?

3. Let A be the matrix of Exercise 1(e). Solve the system

     [ 11 ]
Ax = [ 23 ]
     [ 13 ]

by using A⁻¹.

4. Which of the following matrices is invertible? Why? (See Section 4 Exercise 2.)

    [  1   4   2 ]
(a) [  6  −8   2 ]
    [ −2  −8  −4 ]

    [  1   4   2 ]
(b) [ −2  −8  −3 ]
    [ −1  −4   5 ]

    [ 1  3   2   0 ]
(c) [ 0  5   0   2 ]
    [ 0  0  10   2 ]
    [ 0  0   0  11 ]

    [ 1  3  2  −1 ]
(d) [ 0  5  3   2 ]
    [ 0  0  0   2 ]
    [ 0  0  0  10 ]

5. If A, B, C are invertible, then prove


(a) (AB)⁻¹ = B⁻¹A⁻¹.
(b) (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹.
(c) (Aᵀ)⁻¹ = (A⁻¹)ᵀ.

6. Prove that if A and B are nonsingular, then so is AB.

7. Give 2 × 2 examples of the following.


(a) The sum of two invertible matrices may not be invertible.
(b) The sum of two noninvertible matrices may be invertible.

8. There is a slight hole in our proof of (a) ⇒ (b) in the theorem of this section.
To find the inverse of A, we applied Gauss-Jordan elimination to the array [A, I] to
obtain [I, B]. We then concluded AB = I so that B is a right-inverse of A. But how
do we know that B is also a left-inverse of A? Prove that it is, that is, prove that
BA = I by applying the reverse of the same Gauss-Jordan steps in reverse order to
the array [B,I] to obtain [I,A].

9. More generally, it is true that if a matrix has a one-sided inverse, then it must have
a two-sided inverse. Or more simply stated, AB = I ⇒ BA = I. To prove this, argue
as follows: AB = I ⇒ B is nonsingular ⇒ B is invertible ⇒ A = B⁻¹ ⇒ BA = I.
Fill in the details.

10. True or false?


(a) “Every nonsingular matrix has an LU factorization.”
(b) “If A is singular, then the homogeneous system Ax = 0 has nonzero solutions.”

6. TRIDIAGONAL MATRICES

When coefficient matrices arise in applications, they usually have special pat-
terns. In such cases Gaussian elimination often simplifies. We now illustrate this
by looking at tridiagonal matrices, which are the simplest kind of band matrices. A
matrix is tridiagonal if all of its nonzero elements are either on the main diagonal or
adjacent to the main diagonal. Here is an example (from Section 3 Exercise 1(d)):

[ 2  1   0  0  0 ]
[ 4  5   3  0  0 ]
[ 0  3   4  1  0 ]
[ 0  0  −1  1  1 ]
[ 0  0   0  4  3 ]

If we run Gaussian elimination on this matrix, we obtain

[ 2  1  0  0  0 ]
[ 0  3  3  0  0 ]
[ 0  0  1  1  0 ] .
[ 0  0  0  2  1 ]
[ 0  0  0  0  1 ]

This example reveals three properties of tridiagonal matrices and Gaussian elimina-
tion. (1) There is at most one nonzero multiplier in each Gaussian step. (2) The
superdiagonal entries (that is, the entries just above the main diagonal) don’t change.
And (3) the final upper triangular matrix has nonzero entries only on its diagonal
and superdiagonal. If we count the number of operations required to triangulate a
tridiagonal matrix, we find it is proportional to n instead of the usual n³/3. We conclude
that large systems involving tridiagonal matrices are very easy to solve. In fact, we
can write a quick and efficient program that will solve tridiagonal systems directly:

[ d1  c1              ] [ x1 ]   [ b1 ]
[ a2  d2  c2          ] [ x2 ]   [ b2 ]
[     a3  d3  c3      ] [ x3 ] = [ b3 ]
[          .  .  .    ] [ :  ]   [ :  ]
[             an  dn  ] [ xn ]   [ bn ]

for k = 2 to n do
    if dk−1 = 0 then signal failure and stop
    m = ak /dk−1
    dk = dk − m ck−1
    bk = bk − m bk−1
if dn = 0 then signal failure and stop
xn = bn /dn
for k = n − 1 down to 1 do xk = (bk − ck xk+1 )/dk
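A Python rendering of this algorithm might look as follows (a sketch; the lists are indexed 1 through n to match the pseudocode, so index 0 is unused, and d and b are overwritten just as above). The name solve_tridiagonal and the sample right-hand side are choices made for the illustration.

def solve_tridiagonal(a, d, c, b):
    """Solve a tridiagonal system: a[k] subdiagonal, d[k] diagonal,
    c[k] superdiagonal, b[k] right-hand side, all indexed 1..n.
    d and b are overwritten, exactly as in the pseudocode."""
    n = len(d) - 1
    for k in range(2, n + 1):
        if d[k - 1] == 0:
            raise ValueError("zero pivot")
        m = a[k] / d[k - 1]
        d[k] -= m * c[k - 1]
        b[k] -= m * b[k - 1]
    if d[n] == 0:
        raise ValueError("zero pivot")
    x = [0.0] * (n + 1)
    x[n] = b[n] / d[n]
    for k in range(n - 1, 0, -1):
        x[k] = (b[k] - c[k] * x[k + 1]) / d[k]
    return x[1:]

# the 5 x 5 example at the start of this section, with right-hand side (1, 2, 3, 4, 5)
a = [0, 0, 4, 3, -1, 4]      # subdiagonal  a[2..5]   (index 0 and 1 unused)
d = [0, 2, 5, 4, 1, 3]       # diagonal     d[1..5]
c = [0, 1, 3, 1, 1, 0]       # superdiag.   c[1..4]
b = [0, 1, 2, 3, 4, 5]
print(solve_tridiagonal(a, d, c, b))   # [-2.0, 5.0, -5.0, 8.0, -9.0]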

Tridiagonal matrices arise in many situations: electrical circuits, heat flow prob-
lems, the deflection of beams, and so on. Here we show how tridiagonal matrices are
used in cubic spline interpolation. We are given data (x0 , y0 ), (x1 , y1 ), · · · , (xn , yn ).
The points x1 , x2 , · · · , xn−1 are called interior nodes, and x0 and xn are called bound-
ary nodes. The problem is to find a cubic polynomial on each of the intervals
[x0 , x1 ], [x1 , x2 ], · · · , [xn−1 , xn ] such that, at each interior node, the cubic on the left
and the cubic on the right have the same heights, the same slopes, and the same
curvature (that is to say, the same second derivative). If we glue these cubics to-
gether we will obtain a cubic spline, which is a smooth curve passing through all the
data. To make the problem completely determined, we need conditions at the two
boundary nodes. Often these are taken to be the requirement that the spline has no
curvature (zero second derivatives) at the boundary nodes. The spline thus obtained
is called a natural spline. Splines have applications in CAD-CAM, font design, and
modeling.

[FIGURE 1: a cubic spline through the data points (x0, y0), (x1, y1), (x2, y2),
(x3, y3), (x4, y4), with slopes s0, s1, s2, s3, s4 at the nodes.]

How do we find the cubic polynomials that make up the spline? If we knew what
slopes the spline curve should have at its nodes, then we could find the cubic poly-
nomial on each interval using the method of Section 1 Exercise 8. Let s0 , s1 , · · · , sn
be the unknown slopes at the nodes. For simplicity assume that the data is equally
spaced, that is, x1 − x0 = x2 − x1 = · · · = xn − xn−1 = h. Then with some algebraic
effort it is possible to show that the conditions described above force the slopes to
satisfy the following linear system:

[ 2  1                   ] [ s0   ]           [ y1 − y0     ]
[ 1  4  1                ] [ s1   ]           [ y2 − y0     ]
[    1  4  1             ] [ s2   ]           [ y3 − y1     ]
[       1  4  1          ] [ s3   ]           [ y4 − y2     ]
[          .  .  .       ] [  :   ]  = (3/h)  [     :       ]
[            1  4  1     ] [ sn−2 ]           [ yn−1 − yn−3 ]
[               1  4  1  ] [ sn−1 ]           [ yn − yn−2   ]
[                  1  2  ] [ sn   ]           [ yn − yn−1   ]

The first and last equations come from the conditions at the boundary nodes. All
the other equations come from the conditions at the interior nodes. The system is
tridiagonal and therefore easy to solve, even when there is a large number of nodes.
Once the slopes s0 , s1 , · · · , sn are known, the cubic polynomial on each interval can
be found as a cubic Hermite interpolant. (See Section 1 Exercise 8.)
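As a small illustration, the following Python sketch builds the slope system displayed above for equally spaced data and solves it with the tridiagonal elimination of Section 6. It relies on the equal-spacing assumption, and the function name and the sample data in the call are made up for the example.

def natural_spline_slopes(x, y):
    """Slopes s[0..n] of the natural cubic spline through equally spaced data,
    taken from the tridiagonal slope system displayed above."""
    n = len(x) - 1
    h = x[1] - x[0]
    # diagonals of the (n+1) x (n+1) tridiagonal matrix
    d = [2.0] + [4.0] * (n - 1) + [2.0]      # main diagonal
    a = [1.0] * n                            # sub-diagonal
    c = [1.0] * n                            # super-diagonal
    # right-hand side, already multiplied by 3/h
    b = [3 * (y[1] - y[0]) / h]
    b += [3 * (y[i + 1] - y[i - 1]) / h for i in range(1, n)]
    b += [3 * (y[n] - y[n - 1]) / h]
    # tridiagonal elimination (as in Section 6), then back substitution
    for k in range(1, n + 1):
        m = a[k - 1] / d[k - 1]
        d[k] -= m * c[k - 1]
        b[k] -= m * b[k - 1]
    s = [0.0] * (n + 1)
    s[n] = b[n] / d[n]
    for k in range(n - 1, -1, -1):
        s[k] = (b[k] - c[k] * s[k + 1]) / d[k]
    return s

print(natural_spline_slopes([0, 1, 2, 3], [1, 2, 0, 2]))   # slopes s0, s1, s2, s3 at the four nodes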

EXERCISES

1. Write down the system that the slopes of the natural spline interpolant of the
data (0,0), (1,1), (2,4), (3,1), (4,0) must satisfy. Solve it. Sketch the resulting spline
curve.

2. Run Gaussian elimination on the tridiagonal (n + 1) × (n + 1) matrix of this section


and show that the pivots for rows 1,· · · , n − 1 are all > 3. This proves that not
only is this matrix nonsingular but also that no row exchanges are necessary. The
tridiagonal algorithm can therefore be used.

7. SYSTEMS WITH MANY SOLUTIONS

Consider the single linear equation in one unknown ax = b. At first glance we
would say that the solution is just x = b/a. But in fact there are three cases:
(1) If a ≠ 0, there is exactly one solution: x = b/a. For example, if 2x = 6, then
x = 6/2 = 3.
(2) If a = 0 and b ≠ 0, there is no solution. For example, 0x = 6 is not satisfied by
any x.
(3) If a = b = 0, there are infinitely many solutions because 0x = 0 is satisfied by
every x.
It is a striking fact that exactly the same three cases are the only possibilities that
exist for systems of equations. We first look at 2 × 2 examples.

Example 1: The system


u+v=2
u−v=0
in array form

[ 1   1 | 2 ]
[ 1  −1 | 0 ]

reduces by Gaussian elimination to

[ 1   1 |  2 ]
[ 0  −2 | −2 ] .

The unique solution is therefore v = 1 and u = 1. This is the nonsingular case we
have been considering in these notes up to now.

Example 2: The system


u+ v=2
2u + 2v = 0
in array form

[ 1  1 | 2 ]
[ 2  2 | 0 ]

reduces by Gaussian elimination to

[ 1  1 |  2 ]
[ 0  0 | −4 ] .

Clearly we cannot use back substitution. Even worse, the second equation, 0u + 0v =
−4, has no solution. This indicates that the entire system has no solution. The
coefficient matrix is of course singular, and the system is said to be inconsistent.

Example 3: The system


u+ v=2
2u + 2v = 4
in array form

[ 1  1 | 2 ]
[ 2  2 | 4 ]

reduces by Gaussian elimination to

[ 1  1 | 2 ]
[ 0  0 | 0 ] .

This time the second equation is trivially satisfied for all u and v. So we set v = c
where c is an arbitrary constant and try to continue with back substitution. The first
equation then gives u = 2 − c. The solution is therefore

u=2−c
v=c

or written in vector form is

[ u ]   [ 2 − c ]
[ v ] = [   c   ]

or alternatively

[ u ]   [ 2 ]     [ −1 ]
[ v ] = [ 0 ] + c [  1 ] .
We see that we have obtained an infinite number of solutions parametrized by an
arbitrary constant. The coefficient matrix is still singular as before, but the system
is said to be underdetermined.

In the following examples we present a systematic method for finding solutions of


more complicated systems. The method is an extension of Gauss-Jordan elimination.

Example 4: Suppose we have the 3 × 3 system


 � 
1 2 −1 �� 2
2 4 1 �� 7  .
3 6 −2 � 7

Gaussian elimination produces


 � 
1 2 −1 �� 2
0 0 3 �� 3  ,
0 0 0 �0

and Gauss-Jordan produces

[ 1  2  0 | 3 ]
[ 0  0  3 | 3 ] .
[ 0  0  0 | 0 ]

This is as far as we can go. There is no way to get rid of the 2 in the first equation.
The variables u, v, w now fall into two groups: leading variables, those that correspond
to columns that have a leading nonzero entry for some row, and free variables, those
that do not. In this case, u and w are leading variables and v is a free variable. Free
variables are set to arbitrary constants, so v = c. Leading variables are solved for in
terms of free variables. Working from the bottom up, we obtain the solution
u = 3 − 2c
v=c
w=1
or in vector form

[ u ]   [ 3 ]     [ −2 ]
[ v ] = [ 0 ] + c [  1 ] .
[ w ]   [ 1 ]     [  0 ]

Example 5: Suppose we have the 3 × 4 system


 � 
1 2 1 3 �� 2
0 0 0 1 �� 2  .
0 1 1 1�3
One step of Gaussian elimination, an exchange of the second and third rows, will
produce the staircase form  � 
1 2 1 3 �� 2
0 1 1 1�3.

0 0 0 1�2
Now apply two steps of Gauss-Jordan to obtain
 � 
1 0 −1 0 �� −6
0 1 1 0� 1 .

0 0 0 1� 2
(These Gauss-Jordan steps are really not necessary, but they usually make the answer
somewhat easier to write down.) The free variable is w. The solution is therefore
u = −6 + c
v = 1 − c
w = c
x = 2

or in vector form

[ u ]   [ −6 ]     [  1 ]
[ v ]   [  1 ]     [ −1 ]
[ w ] = [  0 ] + c [  1 ] .
[ x ]   [  2 ]     [  0 ]

Example 6: Suppose we have a 3 × 4 system that reduces to


 � 
1 2 0 3 �� 2
0 0 2 1 �� 2  .
0 0 0 0�0

There are two free variables, v and x. Each is set to a different arbitrary constant.
The solution is therefore
u = 2 − 3c − 2d
v = d
w = 1 − .5c
x = c

or in vector form

[ u ]   [ 2 ]     [ −3  ]     [ −2 ]
[ v ]   [ 0 ]     [  0  ]     [  1 ]
[ w ] = [ 1 ] + c [ −.5 ] + d [  0 ] .
[ x ]   [ 0 ]     [  1  ]     [  0 ]

This time we have an infinite number of solutions parametrized by two arbitrary
constants.

In general, Gaussian elimination will put the array into echelon form
 � 
• ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗�∗

0 • ∗ ∗ ∗ ∗ ∗ ∗ ∗�∗
 � 
0 0 0 • ∗ ∗ ∗ ∗ ∗�∗,
 � 
0 0 0 0 0 0 0 0 •�∗

0 0 0 0 0 0 0 0 0 ∗

and Gauss-Jordan elimination will put the array into row-reduced echelon form

[ •  0  ∗  0  ∗  ∗  ∗  ∗  0 | ∗ ]
[ 0  •  ∗  0  ∗  ∗  ∗  ∗  0 | ∗ ]
[ 0  0  0  •  ∗  ∗  ∗  ∗  0 | ∗ ] .
[ 0  0  0  0  0  0  0  0  • | ∗ ]
[ 0  0  0  0  0  0  0  0  0 | ∗ ]

In either case, we get a staircase pattern where the first nonzero entry in each row
(indicated by bullets above) is a pivot. This is the precise mathematical definition of
pivot. For square nonsingular matrices, all pivots occur on the main diagonal. For
singular matrices, at least one pivot occurs to the right of the main diagonal. (Up to
now we have been referring to this informally as the case of a “zero pivot.”)
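The staircase pattern can be produced mechanically. The Python sketch below reduces an augmented array to row-reduced echelon form and reports the pivot columns, which identify the leading variables; the remaining columns correspond to free variables. Unlike the displays above, it also scales each pivot to 1; the function name rref is just a label for the example.

def rref(aug):
    """Reduce an augmented array (list of rows) to row-reduced echelon form.
    Returns the reduced array and the list of pivot columns."""
    a = [list(map(float, row)) for row in aug]
    rows, cols = len(a), len(a[0])
    pivots = []
    r = 0
    for col in range(cols - 1):          # the last column is the right-hand side
        # find a row at or below r with a nonzero entry in this column
        pivot_row = next((i for i in range(r, rows) if a[i][col] != 0), None)
        if pivot_row is None:
            continue                      # no pivot here: a free-variable column
        a[r], a[pivot_row] = a[pivot_row], a[r]       # row exchange
        a[r] = [entry / a[r][col] for entry in a[r]]  # scale the pivot to 1
        for i in range(rows):
            if i != r and a[i][col] != 0:
                m = a[i][col]
                a[i] = [a[i][j] - m * a[r][j] for j in range(cols)]
        pivots.append(col)
        r += 1
        if r == rows:
            break
    return a, pivots

# Example 4 of this section
R, piv = rref([[1, 2, -1, 2],
               [2, 4, 1, 7],
               [3, 6, -2, 7]])
print(piv)        # [0, 2]: u and w are leading variables, v is free
for row in R:
    print(row)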

EXERCISES
1. Solutions can be written in many equivalent ways. Show that the following
expressions represent the same set of solutions.
    [ u ]   [ 2 ]     [ −1 ]       [ u ]   [ 2 ]     [  8 ]
(a) [ v ] = [ 0 ] + c [  1 ]  and  [ v ] = [ 0 ] + c [ −8 ]

    [ u ]   [ 3 ]     [ 0 ]     [ 0 ]       [ u ]   [ 3 ]     [ 0 ]     [ 0 ]
(b) [ v ] = [ 0 ] + c [ 1 ] + d [ 0 ]  and  [ v ] = [ 0 ] + c [ 1 ] + d [ 1 ]
    [ w ]   [ 0 ]     [ 0 ]     [ 1 ]       [ w ]   [ 0 ]     [ 0 ]     [ 1 ]

2. Find the solutions of


 � 
1 2 3 �� 3
(a)  1 4 5 �� 4 
1 0 1�2
 � 
2 2 2 �� 1
(b) 2 6 4 �� 6 
4 8 6 � 10
 � 
1 2 −1 �� 3
(c)  2 4 −2 �� 6 
−3 −6 3 � −9
 � 
1 2 1 8 �� 1
(d) 0 1 0 2 �� −1 
2 5 2 20 � 1
 � 
2 4 2 3 �� 1
(e) 1 2 2 2 �� −1 
4 8 0 4� 8
 � 
2 3 �� −9
(f) 4 6 �� −18 
4 3 � −3

3. Solve each of the following 2 × 2 systems. Then graph each equation as a line and
give a geometric reason for the number of solutions of each system.

    [ 1  3 | 2 ]
(a) [ 3  2 | 1 ]

    [  2   1 | −1 ]
(b) [ −6  −3 | −4 ]

    [  3  −1 |  2 ]
(c) [ −6   2 | −4 ]

4. Solve each of the following 3 × 3 systems. Then graph each equation as a plane
and give a geometric reason for the number of solutions of each system.
 � 
1 1 0 �� 1
(a)  1 −1 0 �� 0 
0 0 1�0
 � 
2 0 0 �� 2
(b) 0 0 3 �� 0 
0 0 3�6
 � 
1 0 0 �� 0
(c) 0 1 0 �� 0 
1 1 0�1
 � 
1 0 0 �� 1
(d) 0 1 0 �� 0 
1 1 0�1
 � 
1 1 1 �� 1
(e) 2 2 2 �� 2 
3 3 3�3

5. Explain why the following statements are true.


(a) If the system Ax = b has more unknowns than equations, then it has either no
solution or infinitely many solutions. (Hint: There must be some free variables.)
(b) If the homogeneous system Ax = 0 has more unknowns than equations, then it
has infinitely many solutions. (Hint: Why can’t the no solution case occur?)

6. If Ax = b has infinitely many solutions, then Ax = c (different right-hand side)


has how many possible solutions: none, one, or infinitely many?

7. Show with 3 × 2 examples that if a system Ax = b has more equations than


unknowns, then any one of the three cases of no solution, one solution, or infinitely
many solutions can occur. (Hint: Just expand the 2 × 2 examples at the beginning
of this section to 3 × 2 examples.)

8. A nutritious breakfast drink can be made by mixing whole egg, milk, and orange
juice in a blender. The food energy and protein for these ingredients are given below.
How much of each should be blended to produce a drink with 560 calories of energy
and 24 grams of protein?
                      energy (kcal)   protein (g)
1 egg                       80              6
1 cup milk                 180              9
1 cup orange juice         100              3

9. Consider the chemical reaction

a NO₂ + b H₂O = c HNO₂ + d HNO₃ .

The reaction must be balanced, that is, the number of atoms of each element must
be the same before and after the reaction. For oxygen, for example, this would mean
2a + b = 2c + 3d. While there are many possible choices for a, b, c, d that balance the
reaction, it is customary to use the smallest possible positive integers. Find such a
solution.

10. Find the equation of the circle in the form c1(x² + y²) + c2x + c3y + c4 = 0 that
passes through the points (2,6), (2,0), (5,3).

8. DETERMINANTS

Determinants have been known and studied for 300 years. Today, however, there
is far less emphasis on them than in the past. In modern mathematics, determinants
play an important but narrow role in theory and almost no role at all in computations.
We will make use of them in our study of eigenvalues in Section 9. The determinant
det(A) is a number associated with a square matrix A. For 2 × 2 and 3 × 3 matrices
it is defined as follows

    [ a11  a12 ]
det [ a21  a22 ] = a11 a22 − a21 a12

    [ a11  a12  a13 ]
det [ a21  a22  a23 ] = a11 a22 a33 + a12 a23 a31 + a13 a32 a21
    [ a31  a32  a33 ]                − a31 a22 a13 − a21 a12 a33 − a32 a23 a11 .

These are the familiar diagonal rules from high school. These rules cannot be extended
to larger matrices! For such matrices we must use the general definition:

det(A) = Σ_σ sign(σ) a1σ(1) a2σ(2) a3σ(3) · · · anσ(n) ,

where σ(1), · · · , σ(n) is a permutation or rearrangement of the numbers 1, 2, · · · , n.


This means the determinant is the sum of all possible products of n entries of A,
where each product consists of entries taken from unique rows and columns. In
particular, it is easy to see that this is true for the 3 × 3 case written out above. For
example, the second term in the high-school formula comes from

[  ∗   a12   ∗  ]
[  ∗    ∗   a23 ] .
[ a31   ∗    ∗  ]

The symbol sign(σ) is equal to +1 or −1 depending on how the rows and columns
are chosen. We intentionally leave this definition of the determinant vague since it is
hard to understand, difficult to motivate, and impossible to compute. It is important
to us only because from it the following properties of the determinant can be proved.
We will omit the proofs since in this section we want to get through the determinant
as quickly as possible. In a later section we present another approach that will make
clear where the mysterious determinant formula comes from and how the properties
are derived.

(1) The determinant of the identity matrix is 1.

    [ 1  0  0 ]
det [ 0  1  0 ] = 1
    [ 0  0  1 ]

(2) If A has a zero row or two equal rows or two rows that are multiples of each other,
then det(A) = 0.

    [ 1  4  2 ]           [ 1  4  2 ]           [ 1  4  2 ]
det [ 0  0  0 ] = 0   det [ 3  5  2 ] = 0   det [ 3  5  2 ] = 0
    [ 5  7  1 ]           [ 1  4  2 ]           [ 2  8  4 ]

(3) The determinant changes sign when two rows are exchanged.
\det\begin{pmatrix} 1 & 2 & 2 \\ 5 & 7 & 1 \\ 3 & 1 & 3 \end{pmatrix} = -\det\begin{pmatrix} 3 & 1 & 3 \\ 5 & 7 & 1 \\ 1 & 2 & 2 \end{pmatrix}

(4) The typical Gaussian elimination operation of subtracting a multiple of one row
from another leaves the determinant unchanged.
\det\begin{pmatrix} 1 & 2 & 2 \\ 3 & 1 & 3 \\ 5 & 7 & 1 \end{pmatrix} = \det\begin{pmatrix} 1 & 2 & 2 \\ 0 & -5 & -3 \\ 5 & 7 & 1 \end{pmatrix}

(5) If all the entries in a row have a common factor, then that factor can be taken
outside the determinant.
\det\begin{pmatrix} 6 & 3 & 12 \\ 5 & 7 & 1 \\ 2 & 5 & 2 \end{pmatrix} = 3\det\begin{pmatrix} 2 & 1 & 4 \\ 5 & 7 & 1 \\ 2 & 5 & 2 \end{pmatrix}

(6) The determinant of the transpose of a matrix is the same as the determinant of
the matrix itself: det(A^T) = det(A).
\det\begin{pmatrix} 1 & 2 & 2 \\ 5 & 5 & 5 \\ 3 & 1 & 3 \end{pmatrix} = \det\begin{pmatrix} 1 & 5 & 3 \\ 2 & 5 & 1 \\ 2 & 5 & 3 \end{pmatrix}

(7) The determinant of a (lower or upper) triangular matrix is the product of its
diagonal entries.
\det\begin{pmatrix} 2 & 3 & 7 \\ 0 & 5 & 2 \\ 0 & 0 & 3 \end{pmatrix} = 2 \cdot 5 \cdot 3 = 30

(8) The determinant of a product is the product of the determinants.

det(AB) = det(A) det(B)

(9) A is nonsingular if and only if det(A) ≠ 0.

Note that property 6 means that all the properties about rows also hold for
columns. Note also that property 9 can be added to the theorem of Section 5 to
obtain
Theorem. For any square matrix A the following statements are equivalent.
(a) A is nonsingular
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.
(e) det(A) ≠ 0.
Proof: We show (a) ⇔ (e). If A is nonsingular, then Gaussian elimination will produce
an upper triangular matrix with nonzero pivots. Since by properties 3 and 4 Gaussian
elimination changes at most the sign of the determinant, we have that det(A) ≠ 0.
If A is singular, then Gaussian elimination will produce an upper triangular matrix
with at least one zero pivot. By the same argument, det(A) = 0.

If the determinant is to have any practical value, there must be an efficient way to
compute it. We could try to use the formula in the definition of the determinant. But
as we saw, the formula consists of a sum of products of n entries of A, where in each
product each factor is an entry from a different row and a different column. Since the
first entry in a product can be chosen in n ways, the second in n − 1 ways, the third in
n − 2 ways, and so on, there are therefore n(n−1)(n−2)(n−3) · · · (2)(1) = n! different
products in the sum. This means there are n! products that must be summed up,
each of which requires n − 1 multiplications, resulting in (n − 1)n! multiplications
in all. For a 25 × 25 matrix there would be 24·25!, or about 3.7 × 10^26, multiplications. A
computer that can perform a million multiplications a second would take 10^13 years
to compute this determinant! This is clearly unacceptable.
An alternate approach is suggested by the proof of the theorem above. Use Gaussian
elimination to triangulate the matrix. Then the determinant is the product of the

diagonal entries (the pivots!) times +1 or −1 depending upon whether there was an
even or odd number of row exchanges. For example, the matrix at the beginning of
Section 4 was reduced to an upper triangular matrix by Gaussian elimination with
one row exchange.
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 9 \\ 2 & 6 & 7 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 3 \\ 0 & 2 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}
We therefore have
\det\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 9 \\ 2 & 6 & 7 \end{pmatrix} = -\det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix} = -1 \cdot 2 \cdot 3 = -6.

Since this method uses only Gaussian elimination, it requires about n^3/3 operations. For a
25 × 25 matrix this is only 5208 operations or only 0.005 seconds on our hypothetical
computer! The method above is an excellent way to compute the determinant, but it
takes just as many steps as Gaussian elimination. In fact, it is Gaussian elimination!
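To make this concrete, here is a short computational sketch. It is not part of the original notes; it assumes Python is available, and the function name det_by_elimination is of course arbitrary. It computes the determinant exactly as described: eliminate, track row exchanges, and multiply the pivots.

    def det_by_elimination(A):
        """Determinant via Gaussian elimination with row exchanges.
        Returns the product of the pivots times (-1)^(number of exchanges)."""
        n = len(A)
        A = [row[:] for row in A]        # work on a copy
        sign = 1.0
        for k in range(n):
            # choose the largest available pivot in column k (partial pivoting)
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            if A[p][k] == 0:
                return 0.0               # a zero pivot that cannot be cured: singular
            if p != k:
                A[k], A[p] = A[p], A[k]  # a row exchange flips the sign (property 3)
                sign = -sign
            # subtract multiples of row k from the rows below (property 4)
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
        prod = sign
        for k in range(n):
            prod *= A[k][k]              # product of the pivots (property 7)
        return prod

    print(det_by_elimination([[1, 2, 3], [2, 4, 9], [2, 6, 7]]))   # prints -6.0

The pivoting order may differ from the hand computation above, but the answer is the same −6.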
Why do we want to compute a determinant in the first place? What can it tell
us about a matrix? Whether or not the matrix is singular? But we can determine
that just by doing Gaussian elimination. If we run into a zero pivot that cannot
be cured by row exchanges, then we know the matrix is singular. Otherwise we get
its LU factorization. So do we ever need to compute a determinant in practice?
No! Determinants are rarely computed outside a classroom. They are important,
however, for theoretical developments as we will see in the next section.
The determinant can be evaluated in other ways. In particular, there is the
cofactor expansion of the determinant. It expresses the determinant of a matrix as a
sum of determinants of smaller matrices. Here we use it to find the determinant of
the matrix above:
\det\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 9 \\ 2 & 6 & 7 \end{pmatrix}
= 1\det\begin{pmatrix} 4 & 9 \\ 6 & 7 \end{pmatrix} - 2\det\begin{pmatrix} 2 & 9 \\ 2 & 7 \end{pmatrix} + 3\det\begin{pmatrix} 2 & 4 \\ 2 & 6 \end{pmatrix}
= 1(28 - 54) - 2(14 - 18) + 3(12 - 8)
= -26 + 8 + 12
= -6.

In words, the determinant of the matrix on the left is the sum of the entries of its
first row times the cofactors of its first row. A cofactor is the determinant of the
2 × 2 matrix obtained from the original matrix by crossing out a particular row and
a column, with an appropriate sign placed in front of the determinant. In particular,
the cofactor of the first entry is the determinant of the matrix obtained by crossing

out the first row and first column; the cofactor of the second entry is the determinant
of the matrix obtained by crossing out the first row and the second column with a
negative sign in front; and the cofactor of the third entry is the determinant of the
matrix obtained by crossing out the first row and the third column. Here is another
cofactor expansion of the same matrix:
\det\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 9 \\ 2 & 6 & 7 \end{pmatrix}
= -2\det\begin{pmatrix} 2 & 9 \\ 2 & 7 \end{pmatrix} + 4\det\begin{pmatrix} 1 & 3 \\ 2 & 7 \end{pmatrix} - 6\det\begin{pmatrix} 1 & 3 \\ 2 & 9 \end{pmatrix}
= -2(14 - 18) + 4(7 - 6) - 6(9 - 6)
= 8 + 4 - 18
= -6.

This time we expanded with respect to the second column. Note that the 2 × 2 ma-
trices arise in the same way, by crossing out the row and column of the corresponding
entry. Note also the signs. In general the signs in the definition of the cofactors form
a checkerboard pattern:
\begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}
Here’s an example of a cofactor expansion of the determinant of a 4 × 4 matrix:
\det\begin{pmatrix} 1 & 1 & 2 & 4 \\ 1 & 0 & 4 & 2 \\ 1 & -1 & 0 & 0 \\ 2 & 2 & 2 & 6 \end{pmatrix}
= 1\det\begin{pmatrix} 0 & 4 & 2 \\ -1 & 0 & 0 \\ 2 & 2 & 6 \end{pmatrix} - 1\det\begin{pmatrix} 1 & 4 & 2 \\ 1 & 0 & 0 \\ 2 & 2 & 6 \end{pmatrix} + 2\det\begin{pmatrix} 1 & 0 & 2 \\ 1 & -1 & 0 \\ 2 & 2 & 6 \end{pmatrix} - 4\det\begin{pmatrix} 1 & 0 & 4 \\ 1 & -1 & 0 \\ 2 & 2 & 2 \end{pmatrix}.
We expanded with respect to the first row. In this case we are now faced with
finding four 3 × 3 determinants. We could use either cofactor expansion or the high-
school formula on each of these smaller determinants. (Note that we should have
expanded with respect to the third row because then we would have had only two
3 × 3 determinants to evaluate.) It is becoming clear that the method of cofactor
expansion requires a great deal of computation. Just think about the 5 × 5 case! In
fact, it generally requires exactly the same number of multiplications as the formula
that defined the determinant in the first place. It is therefore extremely impractical.
It does, however, have some value in theoretical considerations and in the hand

computation of determinants of matrices that contain algebraic expressions. For


example, to compute
\det\begin{pmatrix} x & y & 1 \\ 2 & 8 & 1 \\ 4 & 7 & 1 \end{pmatrix}
(ignoring the fact that we have the high-school formula for this!) we would use
cofactor expansion with respect to the first row. We would definitely not want to use
Gaussian elimination here.
As long as we have come this far we might as well write down the general formula
for the cofactor expansion of the determinant of a matrix with respect to its ith row.
It is

\det A = a_{i1}[(-1)^{i+1}\det M_{i1}] + a_{i2}[(-1)^{i+2}\det M_{i2}] + \cdots + a_{in}[(-1)^{i+n}\det M_{in}]

where M_{ij} is the submatrix formed by deleting the ith row and jth column of A. (The
formula for expansion with respect to columns is similar.) Note that the cofactor is
officially defined as the entire quantity in brackets, that is, as the determinant of the
submatrix M_{ij} times (-1)^{i+j}. The formula is not very illuminating, and we make no
attempt to prove it.
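For completeness, here is a sketch of that formula as a small recursive program, expanding along the first row. It is not from the original notes and the function name is arbitrary; as the text says, it is fine for small matrices but hopeless for large ones.

    def det_by_cofactors(A):
        """Determinant by cofactor expansion along the first row (recursive)."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # M is the submatrix with row 0 and column j deleted
            M = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det_by_cofactors(M)
        return total

    print(det_by_cofactors([[1, 2, 3], [2, 4, 9], [2, 6, 7]]))   # prints -6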

EXERCISES

1. Compute the determinants by Gaussian elimination


 
1 3 1
(a)  1 1 4 
0 2 0
 
1 1 1
(b)  3 3 −1 
2 −2 2
 
2 1 3
(c)  −2 5 1 
4 2 4
 
1 1 2 4
1 0 4 2
(d)  
1 −1 0 0
2 2 2 6
 
1 2 3 1
1 3 3 2
(e)  
2 4 3 3
1 1 1 1
 
0 0 1 0
1 0 0 0
(f)   (Use property (3) for a quick solution.)
0 0 0 1
0 1 0 0
2. Give 2 × 2 examples of the following.
(a) A ≠ 0 and det(A) = 0.
(b) A ≠ B and det(A) = det(B)
(c) det(A + B) ≠ det(A) + det(B)

3. Prove the following.


(a) det(Ak ) = (det(A))k for any positive integer k.
(b) det(A−1 ) = 1/ det(A)
(c) det(BAB −1 ) = det(A)
(d) det(cA) = cn det(A) where A is n × n.

4. For the matrix in Exercise 1(a) find det(A−1 ) and det(AT ) without doing any
work.

5. Use cofactor expansions to evaluate the following determinants.


 
3 7 5 7
0 3 6 0
(a)  
1 1 7 2
0 0 1 0
 
1 1 2 4
1 0 4 2
(b)  
1 −1 0 0
2 2 2 6
 
x y 1
(c)  2 8 1 
4 7 1
 
2−x 2 2
(d)  1 2−x 0 
1 0 2−x
6. Suppose we have a square n × n matrix that looks like A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} where B,
C, and D are submatrices of sizes p × p, p × (n − p), and (n − p) × (n − p). (A is said
to be partitioned into blocks.) Show that det(A) = det(B) det(D). (Use Gaussian
elimination.)

7. True or false? “If det(A) = 0, then the homogeneous system Ax = 0 has nonzero
solutions.”

9. EIGENVALUES

There are many problems in engineering and science where, given a square matrix
A, it is necessary to know if there is a number λ (read “lambda”) and a nonzero vector
x such that Ax = λx. The number λ is called an eigenvalue of A and the vector x
is called an eigenvector associated with λ. (“Eigen” is a German word meaning “its
own” or “peculiar to it.”) For example
\begin{pmatrix} 5 & 4 & 4 \\ -7 & -3 & -1 \\ 7 & 4 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = 5\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}.
So 5 is an eigenvalue of the matrix above and \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} is an associated eigenvector. Note that any
multiple of this vector is also an eigenvector. That is, any vector of the form c\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
is an eigenvector associated with the eigenvalue 5. So what we actually have is an
infinite family of eigenvectors. Note also that this infinite family can be represented
in many other ways such as, for example, c'\begin{pmatrix} -2 \\ 2 \\ -2 \end{pmatrix}.
Suppose we want to find the eigenvalues of a matrix A. We start by rewriting
the equation Ax = λx as Ax=λIx or Ax − λIx=0 or (A − λI)x=0. We therefore
want to find those numbers λ for which the homogeneous system (A − λI)x = 0 has
nonzero solutions x. By the theorem of the previous section, this is equivalent to
asking for those numbers λ that make the matrix A − λI singular or, in other words,
for which det(A − λI) = 0. This equation is called the characteristic equation of A.
The left-hand side is a polynomial in λ and is called the characteristic polynomial of
A.

Example 1: Find the eigenvalues of the matrix


A = \begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix}.

First set
A - \lambda I = \begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} 4-\lambda & 2 \\ -1 & 1-\lambda \end{pmatrix}.
The characteristic equation of A is det(A − λI) = 0, which can be rewritten as follows:
\det\begin{pmatrix} 4-\lambda & 2 \\ -1 & 1-\lambda \end{pmatrix} = 0
(4 - \lambda)(1 - \lambda) - 2(-1) = 0
\lambda^2 - 5\lambda + 6 = 0
(\lambda - 2)(\lambda - 3) = 0.
The eigenvalues of A are therefore λ = 2 and λ = 3. We can go further and find the
associated eigenvectors. For the case λ = 2 we wish to find nonzero solutions of the
system (A − 2I)x = 0, which can be rewritten as
� �� � � �
4−2 2 u 0
= .
−1 1 − 2 v 0

We use Gaussian elimination in array form


� � �
2 2 �� 0
−1 −1 � 0

to get � � �
2 2 �� 0
.
0 0�0
The solution is v = c and u = −c, or in vector form
� � � �
u −1
=c .
v 1

The case λ = 3 is similar. Write (A − 3I)x = 0 as


� �� � � �
4−3 2 u 0
=
−1 1 − 3 v 0

and then solve � � �


1 2 �� 0
−1 −2 � 0
to get � � �
1 2 �� 0
.
0 0�0
The solution is v = c and u = −2c, or in vector form
� � � �
u −2
=c .
v 1

Therefore, for each of the two eigenvalues we have found an infinite family of eigen-
vectors parametrized by a single arbitrary constant.
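If numerical routines are available, the hand computation above can be checked directly. The sketch below is not part of the notes; it assumes Python with numpy. Note that numpy.linalg.eig returns unit-length eigenvectors, so they are scalar multiples of the ones found above.

    import numpy as np

    A = np.array([[4.0, 2.0],
                  [-1.0, 1.0]])

    lams, V = np.linalg.eig(A)        # columns of V are eigenvectors
    print(lams)                       # approximately [3. 2.] (order may vary)

    for lam, v in zip(lams, V.T):
        print(lam, v, np.allclose(A @ v, lam * v))   # checks A v = lambda v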

Example 2: Things can become more complicated as the size of the matrix increases.
Consider the matrix  
2 3 0
A = 4 3 0.
0 0 6
Proceeding as before we have the characteristic equation det(A − λI) = 0 rewritten
as
\det\begin{pmatrix} 2-\lambda & 3 & 0 \\ 4 & 3-\lambda & 0 \\ 0 & 0 & 6-\lambda \end{pmatrix} = 0,
(2 - \lambda)(3 - \lambda)(6 - \lambda) - 3 \cdot 4(6 - \lambda) = 0,
[(2 - \lambda)(3 - \lambda) - 3 \cdot 4](6 - \lambda) = 0,
[\lambda^2 - 5\lambda - 6](6 - \lambda) = 0,
-(\lambda - 6)^2(\lambda + 1) = 0.
Here we have two eigenvalues, λ = 6 and λ = −1. To find the eigenvectors for λ = −1,
solve (A − (−1)I)x = 0 or
    
2+1 3 0 u 0
 4 3+1 0   v = 0
 
0 0 6+1 w 0

by reducing  � 
3 3 0 �� 0
4 4 0 �� 0 
0 0 7�0
to  � 
3 3 0 �� 0
0 0 0 �� 0  .
0 0 7�0
The solution is w = 0, v = c, and u = −c, or in vector form
   
u −1
 v  = c 1 .
w 0

For the case λ = 6, solve (A − 6I)x = 0 or


    
2−6 3 0 u 0
 4 3−6 0   v = 0
 
0 0 6−6 w 0

by reducing  � 
−4 3 0 �� 0
 4 −3 0 �� 0 
0 0 0�0
to  � 
−4 3 0 �� 0
 0 0 0 �� 0  .
0 0 0�0

The solution is w = c, v = d, and u = (3/4)d, or in vector form
\begin{pmatrix} u \\ v \\ w \end{pmatrix} = c\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + d\begin{pmatrix} 3/4 \\ 1 \\ 0 \end{pmatrix}.

We have therefore obtained an infinite family of eigenvectors parametrized by two


arbitrary constants. Note that this infinite family can be represented in many other
ways such as, for example,
\begin{pmatrix} u \\ v \\ w \end{pmatrix} = c'\begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} + d'\begin{pmatrix} 3 \\ 4 \\ 1 \end{pmatrix}.

So in this example we have a 3 × 3 matrix with only two distinct eigenvalues. We


write λ = −1, 6, 6 to indicate that 6 is a repeated root of the characteristic equation,
and we say that 6 has multiplicity 2. For λ = 6 we found two linearly indepen-
dent eigenvectors such that arbitrary linear combinations of them generate all other
eigenvectors. Intuitively, “linearly independent” means “essentially different.” We
will not discuss the precise mathematical meaning of independence here (see Section
16), except to say that this does not always happen as in the following example.

Example 3: It is possible for a repeated eigenvalue to have only one independent


eigenvector. Consider the matrix
� �
2 1
A= ,
0 2

which is easily seen to have characteristic equation (λ − 2)2 = 0 and therefore the
repeated eigenvalue λ = 2, 2. But in solving the system (A − 2I)x = 0 we obtain
� �� � � �
2−2 1 u 0
=
0 2−2 v 0

or � � �
0 1 �� 0
.
0 0�0
So u = c and v = 0 or in vector form
� � � �
u 1
=c .
v 0

Therefore the eigenvalue λ = 2, 2 has only one independent eigenvector.

Example 4: Even worse, a matrix can have no (real) eigenvalues at all. For example
the matrix � �
0 1
A=
−1 0

has characteristic equation λ2 + 1 = 0 which has no real solutions.

In this section we have seen that, in order to understand eigenvalues, we have to know
something about determinants. In fact, the characteristic polynomial is defined as a
determinant. Because of this, in practice it is very difficult to compute characteristic
polynomials for large matrices. Even when this can be done, the problem of finding
the roots of a high degree polynomial is numerically unstable. For practical computa-
tions, a much more sophisticated algorithm called the QR method, which has nothing
to do with characteristic polynomials, is used to find eigenvalues and eigenvectors.
Although the characteristic polynomial is important in theory, in practice it is rarely,
if ever, computed.

EXERCISES

1. Find the eigenvalues and eigenvectors of the following matrices.


� �
1 1
(a)
0 2
 
5 0 2
(b)  0 1 0 
−4 0 −1
 
2 2 2
(c) 1 2 0
1 0 2
 
6 4 4
(d)  −7 −2 −1  Hint: Expand in cofactors of the first row.
7 4 3
 
0 2 2
(e)  2 0 −2 
2 −2 0
 
−2 0 0 0
 0 −2 5 −5 
(f)  
0 0 3 0
0 0 0 3

2. Suppose you and I are computing eigenvectors. We get the results below. Explain
in what sense we got the same answers, or not.
   
−3 4
(a) You get  9  and I get  −12 .
6 −8
       
1 0 1 1
(b) You get  1  ,  1  and I get  2  ,  0  .
0 1 1 −1
         
1 0 1 1 1
(c) You get 1 , 1 and I get 2 , 0 , 1  .
        
0 1 1 −1 0
       
1 0 1 2
(d) You get 1 , 1 and I get 2 , 4  .
      
0 1 1 2

3. Prove the following.


(a) A and AT have the same eigenvalues. (Hint: They have the same characteristic
polynomials.)
(b) A and BAB −1 have the same eigenvalues. (Hint: They have the same charac-
teristic polynomials.)
 
λ1
 λ2  −1
(c) A = S   . ..
 S has eigenvalues λ1 , λ2 , · · · , λn .

λn
(d) If Ax = λx, then A2 x = λ2 x, A3 x = λ3 x, · · ·.
(e) If Ax = λx and A is nonsingular, then A−1 x = λ1 x
(f) If A is singular, then λ = 0 must be an eigenvalue of A.

(g) If A is triangular, then its eigenvalues are its diagonal entries a11 , a22 , · · · , ann .
� �
B C
4. If A = is the matrix of Section 8 Exercise 6, then show that the
0 D
eigenvalues of A are the eigenvalues of B together with the eigenvalues of D. (Hint:
Show det(A − λI) = det(B − λI) det(D − λI).)

5. Findthe eigenvalues
 and associated eigenvectors of each of the following matrices.
2 0 0
(a)  0 2 0 
0 0 2
 
2 1 0
(b) 0 2 0
0 0 2
 
2 1 0
(c) 0 2 1
0 0 2

10. DIAGONALIZATION

Example 1: Let’s look back at Example 1 of the previous section


� �
4 2
A=
−1 1

which had two eigenvalues, λ = 2 and λ = 3. If we write the two equations Ax1 = 2x1
and Ax2 = 3x2 , where x1 and x2 are the associated eigenvectors, we obtain
� �� � � � � �� � � �
4 2 −1 −1 4 2 −2 −2
=2 and =3 .
−1 1 1 1 −1 1 1 1

The two eigenvectors can be lined up to form the columns of a matrix S so that the
two equations above can be combined into one matrix equation AS = SD where D
is the diagonal matrix of eigenvalues:
\begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} -1 & -2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.

This equation can be rewritten as A = SDS^{-1}:

\begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} -1 & -2 \\ 1 & 1 \end{pmatrix}^{-1}.

What just happened in this example is so important that we will illustrate it


for the general case. Suppose the n × n matrix A has eigenvalues \lambda_1, \lambda_2, \cdots, \lambda_n
with linearly independent associated eigenvectors v_1, v_2, \cdots, v_n; then the equations
Av_1 = \lambda_1 v_1, Av_2 = \lambda_2 v_2, \cdots, Av_n = \lambda_n v_n can be written in matrix form as
A\begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}
= \begin{pmatrix} \vdots & \vdots & & \vdots \\ Av_1 & Av_2 & \cdots & Av_n \\ \vdots & \vdots & & \vdots \end{pmatrix}
= \begin{pmatrix} \vdots & \vdots & & \vdots \\ \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}
= \begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}

This matrix equation is of the form AS = SD. By multiplying on the right by S^{-1}
we obtain A = SDS^{-1} or
A = \begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}\begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}^{-1}.

This last step is possible only if S is invertible. S will in fact be invertible if its
columns, which are the eigenvectors v1 , v2 · · · , vn , are linearly independent. This of
course leaves a giant gap in our discussion since at this point we still don’t know what
“linear independent” means. We will fill this gap in Sections 16 and 19. Our method
for finding eigenvectors, which is to solve (A − λI)x = 0 by Gaussian elimination,
does in fact produce linearly independent eigenvectors, one for each free variable.
The only question is are there enough linearly independent eigenvectors to form a
square matrix S? If the answer is yes, then A can be factored into A = SDS −1 where
S is invertible and D is diagonal, and A is called diagonalizable. If the answer is no,
then A is not diagonalizable.

Example 2: The matrix of Example 2 of the previous section is diagonalizable. Just


line up its eigenvectors to form the columns of S and write
\begin{pmatrix} 2 & 3 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 6 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 3/4 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{pmatrix}\begin{pmatrix} -1 & 0 & 3/4 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}^{-1}.

Note that the diagonal factorization of a matrix is not completely unique. For exam-
ple,
\begin{pmatrix} 2 & 3 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 0 & 4 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{pmatrix}\begin{pmatrix} 1 & 0 & 3 \\ -1 & 0 & 4 \\ 0 & 1 & 0 \end{pmatrix}^{-1}
is an equally valid factorization.
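A quick numerical check of a factorization like this is easy to set up. The sketch below is not part of the notes; it assumes Python with numpy and uses the eigenvectors found in Section 9, Example 2.

    import numpy as np

    A = np.array([[2.0, 3.0, 0.0],
                  [4.0, 3.0, 0.0],
                  [0.0, 0.0, 6.0]])

    S = np.array([[-1.0, 0.0, 0.75],    # eigenvector columns for -1, 6, 6
                  [ 1.0, 0.0, 1.0 ],
                  [ 0.0, 1.0, 0.0 ]])
    D = np.diag([-1.0, 6.0, 6.0])

    print(np.allclose(A, S @ D @ np.linalg.inv(S)))   # True: A = S D S^(-1)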

Whether or not a matrix can be diagonalized has important consequences for


the matrix and what we can do with it. It is one of the paramount questions in linear
algebra. We now give some conditions that insure that a matrix can be diagonalized.
1. An n × n matrix is diagonalizable if and only if it has n linearly independent
eigenvectors. In Example 2 above, even though there are only two eigenvalues, there
are three independent eigenvectors, and they are used to form the columns of S. If
a matrix does not have enough independent eigenvectors, as in Section 9 Examples
3 and 4, then it is not diagonalizable. Such matrices are called defective.
2. An n × n matrix is diagonalizable if it has n real and distinct eigenvalues. In Ex-
ample 1 above, there are two distinct eigenvalues, each eigenvalue has an associated

eigenvector, and these eigenvectors can be used to form the columns of S since they
are independent. But why do distinct eigenvalues insure diagonalizability in general?
This follows from the fact, to be proved later, that eigenvectors associated with dis-
tinct eigenvalues are always independent. (See Section 22.)
3. It would be helpful if we could decide if a matrix is diagonalizable just by looking
at it, without having to go through the tedious process of determining if it has enough
independent eigenvectors. Unfortunately there is no simple way to do this. But there
is an important class of matrices that are automatically diagonalizable. These are
the symmetric matrices. A deep theorem in linear algebra, called The Spectral The-
orem, says in part that all symmetric matrices are diagonalizable. (See Section 22.)
A nonsymmetric matrix may or may not be diagonalizable, but, fortunately, many
of the matrices that arise in physics and engineering are symmetric and are therefore
diagonalizable.

EXERCISES

1. Write diagonal factorizations for each of the matrices in Section 9 Exercise 1.

2. If A = SDS −1 , then show An = SDn S −1 .

3. Decide which of the following matrices are diagonalizable just by looking at them.
 
0 −2 2
(a)  −2 0 −2 
2 2 2
 
0 2 2
(b)  2 0 −2 
2 −2 0
 
0 2 2
(c)  −2 0 2
2 −2 0

4. If A is 2 × 2 with eigenvalues \lambda_1 = 6 and \lambda_2 = 7 and associated eigenvectors
v_1 = \begin{pmatrix} 5 \\ 9 \end{pmatrix} and v_2 = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, then find the following.
(a) The characteristic polynomial of A.
(b) det(A)
(c) A
(d) The eigenvalues of A2 .
(e) det(A2 )

11. MATRIX EXPONENTIAL

So far we have developed a simple algebra for square matrices. We can add,
subtract, and multiply them, and therefore expressions like I + 2A − 3A2 + A3 make
sense. Of course we cannot divide matrices, but A−1 can be thought of as the
reciprocal of a matrix (defined only if A is nonsingular).
√ Is it possible for us to go
further and give meaning to expressions like A, e , ln(A), sin(A), cos(A), . . .?
A

Under certain conditions we can, but, because of its importance in applications, we


will focus only on the matrix exponential e^A. To define it we use the Taylor series
for the real exponential function:
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \qquad \text{for } -\infty < x < \infty.

This infinite series converges to e^x for any value of x and therefore can be taken as
the definition of e^x. We use it as the starting point for the matrix exponential by
simply defining
e^A = \sum_{n=0}^{\infty} \frac{1}{n!}A^n = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots

for a square matrix A. Does this make sense? Let’s try an example:
\exp\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{2!}\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}^2 + \cdots = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

(Note that e^A is also written as exp(A).) The exponential of the zero matrix is
therefore the identity matrix. Let’s try another example:
\exp\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} + \frac{1}{2!}\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}^2 + \frac{1}{3!}\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}^3 + \cdots
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} + \begin{pmatrix} \frac{2^2}{2!} & 0 \\ 0 & \frac{3^2}{2!} \end{pmatrix} + \begin{pmatrix} \frac{2^3}{3!} & 0 \\ 0 & \frac{3^3}{3!} \end{pmatrix} + \cdots
= \begin{pmatrix} \sum_{n=0}^{\infty}\frac{2^n}{n!} & 0 \\ 0 & \sum_{n=0}^{\infty}\frac{3^n}{n!} \end{pmatrix}
= \begin{pmatrix} e^2 & 0 \\ 0 & e^3 \end{pmatrix}.

It is clear that to exponentiate a diagonal matrix you just exponentiate its diagonal
entries. Note that in both computations above the infinite series of matrices con-
verged (trivially in the first example). Does this always happen? Yes! It can be

shown that the infinite series for eA converges for any square matrix A whatever.
(We omit the proof.) Therefore eA exists for any square matrix A.
Accepting this, we still have the problem of how to compute eA for more compli-
cated matrices than those in the two previous examples. We can use two properties of
the matrix exponential to help us. The first is that if AB = BA then eA+B = eA eB .
(We omit the proof.) This just says that, if A and B commute, then for these matri-
ces the matrix exponential satisfies the familiar law of exponents. We use this fact
to compute the following:
�� �� �� � � ��
2 3 2 0 0 3
exp = exp +
0 2 0 2 0 0
�� �� �� ��
2 0 0 3
= exp exp
0 2 0 0
� 2 � �� � � � � �2 �
e 0 1 0 0 3 1 0 3
= + + + ···
0 e2 0 1 0 0 2! 0 0
� 2 � �� � � � � � �
e 0 1 0 0 3 1 0 0
= + + + ···
0 e2 0 1 0 0 2! 0 0
� 2 �� �
e 0 1 3
=
0 e2 0 1
� 2 �
e 3e2
= .
0 e2

(Don’t forget to first show the two matrices above commute in order to justify the
use of the law of exponents.)
The second helpful property of matrix exponentials is that if A = SDS^{-1} then
e^A = Se^D S^{-1}. The proof is so simple we exhibit it here:
e^A = \sum_{n=0}^{\infty} \frac{1}{n!}(SDS^{-1})^n
= \sum_{n=0}^{\infty} \frac{1}{n!}SD^n S^{-1} \qquad \text{(See Section 10 Exercise 2.)}
= S\left(\sum_{n=0}^{\infty} \frac{1}{n!}D^n\right)S^{-1}
= Se^D S^{-1}

Given the diagonal factorization
\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}^{-1}

we can therefore immediately write down


\exp\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} e^{-1} & 0 \\ 0 & e^2 \end{pmatrix}\begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}^{-1}.

We could multiply out the right-hand side, or we might just want to leave it in this
form. If A is defective, that is, if A doesn’t have a diagonalization factorization, then
there are more sophisticated ways to compute eA . We will not pursue them here. In
applications to ODE’s we will need to compute matrix exponentials of the form eAt .
But this is easy for diagonalizable matrices like the one above since
\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}t = \begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} -t & 0 \\ 0 & 2t \end{pmatrix}\begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}^{-1}

and therefore
\exp\left(\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}t\right) = \begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} e^{-t} & 0 \\ 0 & e^{2t} \end{pmatrix}\begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix}^{-1}.

There is one more property of matrix exponentials that we will need in applications.
It is analogous to the derivative formula \frac{d}{dt}e^{at} = ae^{at}. For the matrix
exponential it is just \frac{d}{dt}e^{At} = Ae^{At}. The proof follows:

\frac{d}{dt}e^{At} = \frac{d}{dt}\sum_{n=0}^{\infty} \frac{1}{n!}(At)^n
= \frac{d}{dt}\sum_{n=0}^{\infty} \frac{1}{n!}A^n t^n
= \sum_{n=1}^{\infty} \frac{1}{n!}A^n n t^{n-1}
= A\sum_{n=1}^{\infty} \frac{1}{(n-1)!}A^{n-1} t^{n-1}
= A\sum_{n=1}^{\infty} \frac{1}{(n-1)!}(At)^{n-1}
= A\sum_{n=0}^{\infty} \frac{1}{n!}(At)^n
= Ae^{At}.

EXERCISES

1. Find eA where A is equal to each of the matrices of Section 9 Exercise 1.

2. Find eAt where A is equal to each of the matrices of Section 9 Exercise 1.


3. Show \exp\left(\begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix}t\right) = \begin{pmatrix} e^{2t} & 3te^{2t} \\ 0 & e^{2t} \end{pmatrix}.

d
4. Verify the formula eAt = AeAt where A is equal to the following matrices.
dt
� �
2 0
(a)
0 3
� �
2 3
(b)
0 2

5. Prove the following equalities.


(a) \exp\begin{pmatrix} 0 & \beta \\ -\beta & 0 \end{pmatrix} = \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} (Use the series definition of the matrix exponential.)
(b) \exp\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix} = \begin{pmatrix} e^{\alpha}\cos\beta & e^{\alpha}\sin\beta \\ -e^{\alpha}\sin\beta & e^{\alpha}\cos\beta \end{pmatrix} (Use the law of exponents.)

6. If Av = λv, then show e^A v = e^{\lambda} v.

7. Prove (e^A)^{-1} = e^{-A} and conclude that e^A is nonsingular for any square matrix
A.

12. DIFFERENTIAL EQUATIONS

We recall the differential equation ẏ = ay that governs exponential growth and
decay. The general solution is y(t) = Ce^{at}. This fact will serve as a model for all that
follows.

Example 1: Suppose we want to solve the following linear system of first-order ordi-
nary differential equations with initial conditions:.

ẋ = 4x − 5y x(0) = 8
ẏ = 2x − 3y y(0) = 5

We can write this system in matrix notation as


\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} 8 \\ 5 \end{pmatrix}
or letting u(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} we can write it as
\dot{u} = \begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}u \qquad u(0) = \begin{pmatrix} 8 \\ 5 \end{pmatrix}.

If we let A be the matrix defined above (called the coefficient matrix), then the
system becomes simply u̇ = Au. The solution of the system u̇ = Au with initial
condition u(0) is u(t) = e^{At}u(0). This fact follows immediately from the computations
\frac{d}{dt}(e^{At}u(0)) = A(e^{At}u(0)) and e^{A0}u(0) = Iu(0) = u(0). For the example above, the
solution would just be
u(t) = \exp\left(\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}t\right)\begin{pmatrix} 8 \\ 5 \end{pmatrix}.
Since the coefficient matrix has the diagonal factorization
� � � �� �� �−1
4 −5 1 5 −1 0 1 5
= ,
2 −3 1 2 0 2 1 2

we have � �� �� �−1 � �
1 5 e−t 0 1 5 8
u(t) = .
1 2 0 e2t 1 2 5
To find the final solution it looks like we are going to have to compute an inverse.
But in fact this can be avoided by writing
� �−1 � � � �
1 5 8 c
= 1
1 2 5 c2

as � �� � � �
1 5 c1 8
= ,
1 2 c2 5
which is just a linear system. Solving by Gaussian elimination we obtain
� � � �
c1 3
= .
c2 1

And putting this back into u(t) we get


� �� �� �
1 5 e−t 0 3
u(t) =
1 2 0 e2t 1
� �� �
1 5 3e−t
=
1 2 e2t
� −t � .
3e + 5e2t
=
3e−t + 2e2t
� � � �
−t 1 2t 5
= 3e +e
1 2

The solution in terms of the individual functions x and y is

x(t) = 3e−t + 5e2t


y(t) = 3e−t + 2e2t .

If no initial conditions are given, then c1 and c2 would have to be carried through to
the end. The solution would then look like
� �� �� �
1 5 e−t 0 c1
u(t) =
1 2 0 e 2t
c2
� �
c e−t + 5c2 e2t
= 1 −t
c1 e + 2c2 e2t
� � � �
−t 1 2t 5
= c1 e + c2 e .
1 2

We have expressed the solution in matrix form and in vector form. Note that the
vector form is a linear combination of exponentials involving the eigenvalues times
the associated eigenvectors. In fact if we set t = 0 in the vector form, then from the
initial conditions we obtain
� � � � � �
1 5 8
c1 + c2 =
1 2 5

or � �� � � �
1 5 c1 8
= ,
1 2 c2 5
which is the same system for the c’s that we obtained above. So the vector form of
the solution carries all the information we need. This suggests that we really don’t
need the matrix factorization at all. To find the solution to u̇ = Au, just find the
eigenvalues and eigenvectors of A, and, assuming there are enough eigenvectors, write
down the solution in vector form.
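The following sketch carries out exactly that recipe numerically for Example 1. It is not part of the notes; it assumes Python with numpy and scipy, and the helper name u is arbitrary.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[4.0, -5.0],
                  [2.0, -3.0]])
    u0 = np.array([8.0, 5.0])

    lams, V = np.linalg.eig(A)            # eigenvalues -1 and 2 (in some order)
    c = np.linalg.solve(V, u0)            # solve S c = u(0) instead of inverting S

    def u(t):
        # u(t) = c1 e^(lambda1 t) v1 + c2 e^(lambda2 t) v2
        return sum(c[i] * np.exp(lams[i] * t) * V[:, i] for i in range(2))

    t = 1.0
    print(u(t))              # matches [3e^-t + 5e^2t, 3e^-t + 2e^2t] at t = 1
    print(expm(A * t) @ u0)  # the same numbers via the matrix exponential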

Example 2: Let’s try another system:


    
ẋ 2 3 0 x
 ẏ  =  4 3 0  y .
ż 0 0 6 z
Since the coefficient matrix has the diagonal factorization
     −1
2 3 0 −1 0 3 −1 0 0 −1 0 3
4 3  
0 = 1 0 4  0 6 0   1 0 4
0 0 6 0 1 0 0 0 6 0 1 0
we can immediately write down the solution as
    −t  −1  
x −1 0 3 e 0 0 −1 0 3 x(0)
 y  =  1 0 4  0 e6t 0  1 0 4   y(0) 
z 0 1 0 0 0 e6t 0 1 0 z(0)
  −t  
−1 0 3 e 0 0 c1

= 1 0 4  0 e 6t
0  c2 
0 1 0 0 0 e 6t
c3
  
−1 0 3 c1 e−t
=  1 0 4  c2 e6t 
0 1 0 c3 e6t
     
−1 0 3
= c1 e−t  1  + c2 e6t  0  + c3 e6t  4 .
0 1 0
Since no initial conditions were given, we have arbitrary constants in the solution.

Note that once we recognize the general form of the solution, we can just write it
down without going through the matrix exponential at all. In general, it is clear that
if A is diagonalizable, that is, if it has eigenvalues λ1 , λ2 , · · · , λn and independent
eigenvectors v1 , v2 , · · · , vn , then the solution to u̇ = Au has the form

u(t) = c_1 e^{\lambda_1 t}v_1 + c_2 e^{\lambda_2 t}v_2 + \cdots + c_n e^{\lambda_n t}v_n.



It is also clear that the eigenvalues decide how the solutions behave as t → ∞. If all
the eigenvalues are negative, then all the solutions consist only of linear combinations
of dying exponentials, and therefore u(t) → 0 as t → ∞. In this case the matrix A
is called stable. If at least one eigenvalue is positive, then there are solutions u(t)
containing at least one growing exponential and therefore those u(t) → ∞ as t → ∞.
In this case the matrix A is called unstable . This is the situation with both systems
above. There is also a third possibility. If all the eigenvalues are negative or zero with
at least one actually equal to zero, then the solutions consist of linear combinations
of dying exponentials and at least one constant function, and therefore all solutions
stay bounded as t → ∞. In this case the matrix A is called neutrally stable. The
eigenvalues therefore determine the qualitative nature of the solution.
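Since the test is just a property of the eigenvalues, it is easy to automate. The sketch below is not from the notes; it assumes Python with numpy, and it uses the real parts of the eigenvalues, which reduces to the sign of the eigenvalues themselves in the real case discussed here and carries over to the complex case treated later.

    import numpy as np

    def stability(A, tol=1e-12):
        """Classify A by the real parts of its eigenvalues."""
        re = np.linalg.eigvals(A).real
        if np.all(re < -tol):
            return "stable"            # every solution decays to 0
        if np.any(re > tol):
            return "unstable"          # some solution grows without bound
        return "neutrally stable"      # nothing grows, something does not decay

    print(stability(np.array([[4.0, -5.0],
                              [2.0, -3.0]])))   # unstable (eigenvalue 2 > 0)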
All this is clear enough for diagonalizable matrices, but what about defective
matrices? Consider the following example:
� � � �� �
ẋ 2 3 x
= .
ẏ 0 2 y

The solution is
� � �� � �� � � 2t �� �
x 2 3 x(0) e 3te2t x(0)
= exp t = .
y 0 2 y(0) 0 e2t y(0)

(See Section 11 Exercise 3.) A term of the form te2t has appeared. This is typical of
defective systems. Note that this term does not change the qualitative nature of the
solution u(t) as t → ∞. In general, terms of the form tn eλt arise, but they tend to
zero or infinity as t → ∞ depending on whether λ is negative or positive. The factor
tn ultimately has no effect. It can be shown that this behavior holds for all defective
matrices. That is, the definitions of stable, unstable, and neutrally stable and their
implications about the long-term behavior of solutions hold for these matrices also.
(Actually a more precise statement has to be made in the case that zero is a multiple
eigenvalue, but we will ignore this possibility.) All of this will become clearer when
we consider the Jordan form of a matrix in a later section.

EXERCISES

1. Find the general solution of u̇ = Au where A is equal to each of the matrices in


Section 9 Exercise 1.

2. Find the solutions of the systems above with the initial conditions below.
       
� � � � x(0) 1 x(0) 0
x(0) 3    
(a) = (b) y(0) = 2 (c) y(0) = 1 
  
y(0) 2
z(0) −3 z(0) 3

           
x(0) 2
x(0) 0 x(0) 4
 y(0)   2 
(d)  y(0)  =  0  (e)  y(0)  =  3  (f)  = 
z(0) 1
z(0) 1 z(0) 4
w(0) 2

3. Decide the stability properties of the following matrices.


� �
44 −28
(a)
77 −49
� �
47 −30
(b)
75 −48
� �
8 −6
(c)
15 −11

4. Here is another way to derive the general form of the solution of the system
u̇ = Au, assuming the diagonal factorization A = SDS −1 . Make the change of
variables w = S −1 u, and show that the system then becomes ẇ = Dw. This is just a
simple system of n individual of ODE’s of the form ẇ1 = λ1 w1 , ẇ2 = λ2 w2 , · · · , ẇn =
λn wn . These equations are well-known to have solutions w1 (t) = c1 eλ1 t , w2 (t) =
c2 eλ2 t , · · · , wn (t) = cn eλn t . Write this as
 
c1 eλ1 t
 c2 eλ2 t 
w(t) = 
 ... 

cn eλn t

and conclude that the solution of the original system is

u(t) = Sw(t) = c1 eλ1 t v1 + c2 eλ2 t v2 + · · · + cn eλn t vn ,

where the v’s are the columns of S, that is, the eigenvectors of A. This alternate
approach avoids the matrix exponential, but it does not generalize so easily to the
complex case or the case of defective matrices.

13. THE COMPLEX CASE

We can no longer avoid complex numbers. When we considered real systems


Ax = b, the solution x was automatically real. There was no need to consider
complex numbers. But in the eigenvalue problem we have seen that there are real
matrices whose characteristic equations have complex roots. Does this mean that we
have to consider complex eigenvalues, complex eigenvectors, and complex diagonal
factorizations? The answer is yes, and not just for theoretical reasons. The com-
plex case is essential in solving linear systems of differential equations that describe
oscillations.
First we give a brief review of the most basic facts about complex numbers.
Recall that a complex number has the form a + ib where a and b are real numbers and
i is a quantity that satisfies the equation i^2 = -1. (You can think of i as denoting
\sqrt{-1}, but don’t try to give any metaphysical meaning to it!) If z = a + ib, then a is
the real part of z and b is the imaginary part of z. Two complex numbers are equal
if and only if their real and imaginary parts are equal. Complex numbers are added
and multiplied much like real numbers, but you must keep in mind that i2 = −1.
For example:
(2 + i) + (3 − i2) = 5 − i
(2 + i)(3 − i2) = 6 − i4 + i3 + 2 = 8 − i
Dividing complex numbers is a little more troublesome. First we take the reciprocal
of a complex number:
\frac{1}{3 - i2} = \frac{1}{3 - i2}\cdot\frac{(3 + i2)}{(3 + i2)} = \frac{3 + i2}{9 + 4} = \frac{3}{13} + i\frac{2}{13}
We just multiplied the numerator and denominator by 3 + i2. We use the same trick
to divide two complex numbers:
\frac{2 + i}{3 - i2} = \frac{2 + i}{3 - i2}\cdot\frac{(3 + i2)}{(3 + i2)} = \frac{6 + i4 + i3 - 2}{9 + 4} = \frac{4 + i7}{13} = \frac{4}{13} + i\frac{7}{13}
We say that the complex conjugate of a complex number a + ib is a - ib and write
\overline{a + ib} = a - ib. In both cases above we multiplied the numerator and denominator
by the complex conjugate of the denominator. Complex conjugation commutes with
multiplication, that is, \overline{wz} = \overline{w}\,\overline{z} (Exercise 1).
We can define complex matrices in the same way as real matrices. It is possible,
but tedious, to show that the algebra of matrices, Gaussian elimination, inverses,
determinants, eigenvalues and eigenvectors, diagonalization, and so on carry over to
complex matrices.
Now we can go to work. Let’s consider the eigenvalue problem for the matrix
� �
3 −2
A= .
1 1

We compute the characteristic equation in the usual way and obtain λ2 − 4λ + 5 = 0.


The roots are 2 + i and 2 − i, a complex conjugate pair. (In fact, all the complex
roots of any real polynomial equation occur in complex conjugate pairs.) Since we
are now in the complex world, we can consider these two complex numbers as the
eigenvalues of A. Now let’s look for the eigenvectors. First we take the eigenvalue
2 + i and, as usual, use Gaussian elimination to solve the system (A − (2 + i)I)x = 0:
� � �
3 − (2 + i) −2 �0

1 1 − (2 + i) � 0
� � �
1−i −2 �� 0
1 −1 − i � 0
� � �
1 −1 − i �� 0
1−i −2 � 0
� � �
1 −1 − i �� 0
.
0 0 �0
(In the second step we exchanged� the �two rows to avoid a complex division.) Solving
1+i
this we obtain the eigenvector . For the eigenvalue 2 − i the computation is
1 � �
1−i
almost the same, and we obtain the eigenvector . (Note that this vector is
1
the complex conjugate of the previous eigenvector. See Exercise 2.) Now we simply
line up these vectors in the usual way and obtain
� �� � � �� �
3 −2 1+i 1−i 1+i 1−i 2+i 0
= ,
1 1 1 1 1 1 0 2−i

and therefore we have the complex diagonal factorization


� � � �� �� �−1
3 −2 1+i 1−i 2+i 0 1+i 1−i
= .
1 1 1 1 0 2−i 1 1

Everything worked exactly as in the real case. Of course, complex arithmetic is


involved, so this isn’t something we would want to do for large systems, but at least
the same principles hold. There is, however, something about this factorization that
is troubling. The three matrices on the right are complex and the matrix on the left
is real. How can this be? Somehow or other, when the three matrices on the right are
multiplied out, all the imaginary parts of the complex numbers appearing in them
must cancel out! From this we might suspect that it shouldn’t really be necessary to
introduce complex numbers in order to obtain a useful factorization of a real matrix.
It turns out that it is possible to transform the complex diagonal factorization
into one which is real and almost diagonal. To describe this we introduce some

notation. We write the first eigenvalue and associated eigenvector as \lambda = 2 + i and
v = \begin{pmatrix} 1+i \\ 1 \end{pmatrix}. Then the second eigenvalue and associated eigenvector are \overline{\lambda} = 2 - i
and \overline{v} = \begin{pmatrix} 1-i \\ 1 \end{pmatrix}. Clearly they are just complex conjugates of the first eigenvalue
and eigenvector and therefore don’t add any new information. We can ignore them.
Now identify the real and imaginary parts of \lambda and v as \lambda = \alpha + i\beta = 2 + i and
v = x + iy = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + i\begin{pmatrix} 1 \\ 0 \end{pmatrix}. Then the basic equation Av = \lambda v can be written A(x + iy) =
(α + iβ)(x + iy). When multiplied out it becomes Ax + iAy = (αx − βy) + i(βx + αy).
Since complex numbers are equal if and only if their real and imaginary parts are
equal, this equation implies that Ax = αx − βy and Ay = βx + αy. These two
equations can be written simultaneously in matrix form as
A\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}
or
A = \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}^{-1}.
Therefore for the matrix of our example we obtain
\begin{pmatrix} 3 & -2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{-1}.

This is our desired factorization. Everything on the right side is real. The middle factor
is no longer diagonal, but it exhibits the real and imaginary parts of the eigenvalue
in a nice pattern. (The question of the independence of the vectors x and y will be
settled in Section 16 Exercise 7.)
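The same construction can be done numerically from whatever complex eigenvector a library returns. The sketch below is not part of the notes; it assumes Python with numpy, and the names M and K are arbitrary labels for the eigenvector matrix and the real block.

    import numpy as np

    A = np.array([[3.0, -2.0],
                  [1.0,  1.0]])

    lams, V = np.linalg.eig(A)              # complex conjugate pair 2 + i, 2 - i
    k = np.argmax(lams.imag)                # pick the eigenvalue with beta > 0
    lam, v = lams[k], V[:, k]
    alpha, beta = lam.real, lam.imag
    x, y = v.real, v.imag                   # v = x + i y

    M = np.column_stack([x, y])             # real matrix with columns x and y
    K = np.array([[alpha, beta],
                  [-beta, alpha]])

    print(np.allclose(A, M @ K @ np.linalg.inv(M)))   # True: A = M K M^(-1)

The eigenvector numpy returns is a scalar multiple of the one found by hand, but the factorization identity holds all the same.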
Let’s look at another example. Let
 
−2 −2 −2 −2
 1 0 −2 −1 
B= .
0 0 1 −2
0 0 1 3

The eigenvalues of B are 2 + i, 2 − i, −1 + i, −1 − i. (These are not so easy to compute


by hand since the characteristic polynomial of B is of fourth degree.) First we find
the eigenvector associated with 2 + i by solving the system (B − (2 + i)I)x = 0. In
array form � 

−4 − i −2 −2 −2 � 0

 1 −2 − i −2 −1 � 0 
 � 
0 0 −1 − i −2 � 0

0 0 1 1−i 0

by Gaussian elimination becomes


 � 
1 0 0 0 �0

0 1 0 i �0
 � 
0 0 1 1 − i�0

0 0 0 0 0

0
 −i 
which gives the eigenvector  . Similarly, we find the eigenvector associated
−1 + i
1
with −1 + i by solving the system (B − (−1 + i)I)x = 0. In array form
 � 
−1 − i −2 −2 −2 � 0

 1 1 − i −2 −1 � 0 
 � 
0 0 2 − i −2 � 0

0 0 1 4−i 0

by Gaussian elimination becomes


 � 
1 1−i 0 0�0

0 0 1 0�0
 � 
0 0 0 1�0

0 0 0 0 0

−1 + i
 1 
which gives the eigenvector  . We are essentially done. All we have to do
0
0
now is write down the answers. The complex diagonal factorization is

 
0 0 −1 + i −1 − i
 −i i 1 1 
B= 
−1 + i −1 − i 0 0
1 1 0 0
 
2+i 0 0 0
 0 2−i 0 0 
 
0 0 −1 + i 0
0 0 0 −1 − i
 −1
0 0 −1 + i −1 − i
 −i i 1 1 
  .
−1 + i −1 − i 0 0
1 1 0 0

And the corresponding real diagonal-like factorization is


   −1
0 0 −1 1 2 1 0 0 0 0 −1 1
 0 −1 1 0   −1 2 0 0   0 −1 1 0 
B=    .
−1 1 0 0 0 0 −1 1 −1 1 0 0
1 0 0 0 0 0 −1 −1 1 0 0 0
Note how each complex conjugate pair of eigenvalues \begin{pmatrix} \alpha+i\beta & 0 \\ 0 & \alpha-i\beta \end{pmatrix} in the
diagonal matrix of the first factorization expands to a 2 × 2 block \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix} in the
diagonal-like matrix of the second factorization. From now on we will call such
diagonal-like matrices block diagonal matrices.
Now we apply all this to solving differential equations. Suppose we have the
following system:
ẋ = 3x − 2y
ẏ = x + y
The coefficient matrix is just A of the first example above. To solve the system we
have to compute eAt . Using the real block-diagonal factorization of A computed
above and the result of Section 11 Exercise 5(b), we get
� � �� � �� �
x(t) 3 −2 x(0)
= exp t
y(t) 1 1 y(0)
� � �� � �� �−1 � �
1 1 2 1 1 1 x(0)
= exp t
1 0 −1 2 1 0 y(0)
� �� 2t � � �
1 1 e cos t e2t sin t c1
= .
1 0 −e sin t e cos t
2t 2t
c2
� �� �
1 1 c1 e2t cos t + c2 e2t sin t
=
1 0 −c1 e2t sin t + c2 e2t cos t
� � � �
1 1
= e (c1 cos t + c2 sin t)
2t
+ e (−c1 sin t + c2 cos t)
2t
1 0

Now consider the larger system


ẇ = − 2w − 2x − 2y − 2z
ẋ = w − 2y − z
ẏ = y − 2z
ż = y + 3z.

The coefficient matrix is just B of the second example. We solve the system in the
same way as above using the real block-diagonal factorization of B and obtain

     
w(t) −2 −2 −2 −2 w(0)
 x(t)   1 0 −2 −1    x(0) 
  = exp   t  
y(t) 0 0 1 −2 y(0)
z(t) 0 0 1 3 z(0)
 
0 0 −1 1
 0 −1 1 0 
= 
−1 1 0 0
1 0 0 0
  
2 1 0 0
 −1 2 0 0  
exp   t
0 0 −1 1
0 0 −1 −1
 −1  
0 0 −1 1 w(0)
 0 −1 1 0   x(0) 
   
−1 1 0 0 y(0)
1 0 0 0 z(0)
   2t  
0 0 −1 1 e cos t e2t sin t 0 0 c1
 0 −1 1 0   −e sin t e cos t 2t 2t
0 0   c2 
=   
−1 1 0 0 0 0 e−t cos t e−t sin t c3
1 0 0 0 0 0 −e sin t e cos t
−t −t
c4
  
0 0 −1 1 c1 e2t cos t + c2 e2t sin t
 0 −1 1 0   −c1 e2t sin t + c2 e2t cos t 
=  
−1 1 0 0 c3 e−t cos t + c4 e−t sin t
1 0 0 0 −c3 e−t sin t + c4 e−t cos t
   
0 0
 0   −1 
= (c1 e2t cos t + c2 e2t sin t)   + (−c1 e2t sin t + c2 e2t cos t)  
−1 1
1 0
   
−1 1
 1  0
+(c3 e−t cos t + c4 e−t sin t)   + (−c3 e−t sin t + c4 e−t cos t)   .
0 0
0 0
(The third equality requires a slight generalization of Section 11 Exercise 5(b).)
Now we can see the pattern. If λ = α + iβ, v = x + iy is a complex eigenvalue-
eigenvector pair for the coefficient matrix, then so is λ = α − iβ, v = x − iy, and they
together will contribute terms like

· · · + (c1 eαt cos βt + c2 eαt sin βt)x + (−c1 eαt sin βt + c2 eαt cos βt)y + · · ·

to the solution. When t = 0 these terms become · · · c1 x + c2 y + · · · and are equated to


the initial conditions. Terms of the form eαt cos βt and eαt sin βt describe oscillations.

The imaginary part β of the eigenvalue controls the frequency of the oscillations. The
real part α of the eigenvalue determines whether the oscillations grow without bound
or die out. We can therefore extend the language of the real case and say that a
matrix is stable if all of its eigenvalues have negative real parts, is unstable if one of
its eigenvalues has positive real part, and is neutrally stable if all of its eigenvalues
have nonpositive real parts with at least one with real part actually equal to zero.
What about defective matrices? These are matrices with repeated complex
eigenvalues that do not provide enough independent eigenvectors with which to con-
struct a diagonalization. It is still possible by more general kinds of factorizations to
compute exponentials of such matrices. In systems of differential equations such ma-
trices will produce solutions containing terms of the form tn eαt cos βt and tn eαt sin βt.
Just as in the real case, the factor of tn doesn’t have any effect on the long-term qual-
itative behavior of such solutions. Stability or instability and the oscillatory behavior
of the solutions is still determined by the eigenvalues. Therefore, if you know the
eigenvalues of a system of differential equations, you know a lot about the behavior
of the solutions of that system without actually solving it.
Finally we present an application that describes vibrations in mechanical and
electrical systems. In modeling mass-spring systems, Newton’s second law of motion
and Hooke’s law lead to the second-order differential equation mẍ(t) + kx(t) = 0,
where m = the mass, k = the spring constant, and x(t) = the displacement of the
mass as a function of time. For simplicity, divide by m and let \omega^2 = k/m, so the
equation becomes \ddot{x} + \omega^2 x = 0. In order to use the machinery that we have built up,
we have to cast this second-order equation into a first-order system. To do this let
y1 = x and y2 = ẋ. We then obtain the system
\dot{y}_1 = y_2
\dot{y}_2 = -\omega^2 y_1
or in matrix form
\begin{pmatrix} \dot{y}_1 \\ \dot{y}_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -\omega^2 & 0 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.
To solve the system we have to diagonalize the coefficient matrix. The eigenvalues
are λ = ±iω. Using Gaussian elimination to solve (A − iωI)x = 0
� � �
−iω 1 �� 0
−ω 2 −iω � 0
� � �
−iω 1 �� 0
0 0�0
we obtain the eigenvector \begin{pmatrix} 1 \\ i\omega \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + i\begin{pmatrix} 0 \\ \omega \end{pmatrix}. The solution of the system is
therefore
\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} = (c_1\cos\omega t + c_2\sin\omega t)\begin{pmatrix} 1 \\ 0 \end{pmatrix} + (-c_1\sin\omega t + c_2\cos\omega t)\begin{pmatrix} 0 \\ \omega \end{pmatrix}.

It follows that the solution of the original problem is x(t) = y1 (t) = c1 cos ωt +
c2 sin ωt. This is the mathematical representation of simple harmonic motion.
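As a cross-check, the machinery above can be verified numerically. This is a sketch, not part of the notes, assuming Python with numpy and scipy; it uses an arbitrary value of omega and the initial condition x(0) = 1, ẋ(0) = 0, for which the closed-form solution is x(t) = cos ωt.

    import numpy as np
    from scipy.linalg import expm

    omega = 2.0
    A = np.array([[0.0, 1.0],
                  [-omega**2, 0.0]])

    print(np.linalg.eigvals(A))        # approximately [0.+2.j, 0.-2.j], i.e. +/- i omega

    u0 = np.array([1.0, 0.0])          # x(0) = 1, xdot(0) = 0
    for t in (0.3, 1.0, 2.5):
        u = expm(A * t) @ u0
        print(u[0], np.cos(omega * t)) # the two values agree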

EXERCISES
1. Verify \frac{1}{a + ib} = \frac{a}{a^2 + b^2} + i\frac{-b}{a^2 + b^2} and \overline{(a + ib)(c + id)} = \overline{(a + ib)}\;\overline{(c + id)}.

2. Show if A is real and Av = \lambda v, then A\overline{v} = \overline{\lambda}\,\overline{v}. Conclude that if \lambda, v is a complex
eigenvalue-eigenvector pair for A, then so is \overline{\lambda}, \overline{v}.
3. Find the eigenvalues of the matrix \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}.

4. Find the complex diagonal factorizations, the real block-diagonal factorizations,


and the stability of the following matrices.
� �
9 −10
(a)
4 −3
 
−1 0 3
(b)  −5 1 1 
−3 0 −1

5. Find the general solutions of the following systems of differential equations.

(a) ẋ = 9x − 10y
ẏ = 4x − 3y

(b) ẋ = − x + 3z
ẏ = − 5x + y + z
ż = − 3x − z

6. Find the solutions of the systems in Exercise 5 with the following initial conditions.
� � � �
x(0) 3
(a) =
y(0) 1
   
x(0) −2
(b)  y(0)  =  −1 
z(0) 3

14. DIFFERENCE EQUATIONS AND MARKOV MATRICES

In this section we investigate how eigenvalues can be used to solve difference


equations. Difference equations are discrete analogues of differential equations. They
occur in a wide variety of applications and are used to describe relationships in
physics, chemistry, engineering, biology, ecology, and demographics.
Let A be an n × n matrix and u0 be an n × 1 column vector, then the following
infinite sequence of column vectors can be generated:

u1 = Au0
u2 = Au1
u3 = Au2
..
.

The general relationship between consecutive terms of this sequence is expressed as


a difference equation:

uk = Auk−1 .

The basic challenge posed by a difference equation is to describe the behavior of the
sequence u0 , u1 , u2 , u3 , · · ·. Specifically, (1) determine if the sequence has a limit and
if so then find it, and (2) find an explicit formula for uk in terms of u0 . To this end
we observe that

u1 = Au0
u2 = Au1 = A(Au0 ) = A2 u0
u3 = Au2 = A(A2 u0 ) = A3 u0
..
.

so the sequence becomes u0 , Au0 , A2 u0 , A3 u0 , · · ·. The problem of finding the solu-


tion uk = Ak u0 of the difference equation at the kth stage then reduces to computing
the matrix Ak and determining its behavior as k becomes large. Suppose A has the
diagonal factorization A = SDS −1 , then we can use the fact that Ak = SDk S −1
(Section 10 Exercise 2). Let A have eigenvalues λ1 , λ2 , · · · , λn and associated eigen-

vectors v_1, v_2, \cdots, v_n, and let c = S^{-1}u_0; then
u_k = A^k u_0 = SD^k S^{-1}u_0 = SD^k c
= \begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}\begin{pmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}
= \begin{pmatrix} \vdots & \vdots & & \vdots \\ v_1 & v_2 & \cdots & v_n \\ \vdots & \vdots & & \vdots \end{pmatrix}\begin{pmatrix} c_1\lambda_1^k \\ c_2\lambda_2^k \\ \vdots \\ c_n\lambda_n^k \end{pmatrix}
= c_1\lambda_1^k v_1 + c_2\lambda_2^k v_2 + \cdots + c_n\lambda_n^k v_n

This is then the general solution of the difference equation. (Note its similarity to
the general solution of a system of ODE’s in Section 12.) The c’s are determined by
the equation c = S −1 u0 . We can avoid the taking of an inverse by multiplying this
equation by S to obtain the linear system Sc = u0 , which can be solved by Gaussian
elimination. This can also be seen by letting k = 0 in the general solution to obtain
u0 = c1 v1 + c2 v2 + · · · + cn vn , which is again Sc = u0 .
To determine the long-term behavior of uk , let the eigenvalues be ordered so
that |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn |. Then from the general solution uk = c1 λk1 v1 + c2 λk2 v2 +
· · · + cn λkn vn it is clear that the behavior of uk as k → ∞ is determined by the size
of λ1 . To be specific,

|λ1 | < 1 ⇒ uk → 0
|λ1 | = 1 ⇒ uk bounded, may have a limit
|λ1 | > 1 ⇒ uk blows up.

(We are assuming that c1 �= 0. In general, the long-term behavior of uk is determined


by the largest λi for which ci �= 0.) We now illustrate these ideas with the following
examples.
Example 1: Find u_k = A^k u_0 where A = \begin{pmatrix} 0 & 2 \\ -.5 & 2.5 \end{pmatrix} and u_0 = \begin{pmatrix} 2 \\ 5 \end{pmatrix}. Since A has the

diagonal factorization
� � � �� �� �−1
0 2 2 4 2 0 2 4
A= = ,
−.5 2.5 2 1 0 .5 2 1

we have � �k � �
0 2 2
uk =
−.5 2.5 5
� �� �k � �−1 � �
2 4 2 0 2 4 2
=
2 1 0 .5 2 1 5
� �� k �� �
2 4 2 0 c1
=
2 1 0 .5k c2
� � � �
2 4
= c1 (2)k + c2 (.5)k .
2 1
(Of course, we could have written down the solution in this form as soon as we knew
the eigenvalues and eigenvectors. We really didn’t need the diagonal factorization.
We only have to make sure that there are enough independent eigenvectors to insure
that the diagonal factorization exists.) And since the system \begin{pmatrix} 2 & 4 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \end{pmatrix}
has the solution c1 = 3, c2 = −1, we obtain
� � � � � �
k 2 k 4 6(2)k − 4(.5)k
uk = 3(2) + (−1)(.5) = .
2 1 6(2)k − (.5)k

It is also clear that uk becomes unbounded as k → ∞.
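A short numerical sketch confirms both the closed form and the blow-up. It is not part of the notes and assumes Python with numpy.

    import numpy as np

    A = np.array([[0.0, 2.0],
                  [-0.5, 2.5]])
    u = np.array([2.0, 5.0])

    for k in range(1, 11):
        u = A @ u                                   # u_k = A u_(k-1)
        closed = np.array([6 * 2**k - 4 * 0.5**k,   # formula found above
                           6 * 2**k - 1 * 0.5**k])
        assert np.allclose(u, closed)

    print(u)    # after 10 steps: roughly [6143.996, 6143.999], growing like 2^k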

Example 2: Each year 2/10 of the people in California move out and 1/10 of the
people outside California move in. Let Ik and Ok be the numbers of people inside
and outside California in the kth year. The initial populations are I0 = 20 million
and O0 = 202 million. The relationship between the populations in successive years
is given by
� � � �� �
Ik+1 = .8Ik + .1Ok Ik+1 .8 .1 Ik
or = .
Ok+1 = .2Ik + .9Ok Ok+1 .2 .9 Ok


Ik
The problem is to find the population distribution uk = and to determine if
Ok
it tends to a stable limit. This is, of course, the problem of solving the difference
equation uk = Auk−1 where A is the matrix above. As usual we find the diagonal
factorization of A
� � � �� �� �−1
.8 .1 1 1 1 0 1 1
=
.2 .9 2 −1 0 .7 2 −1

and solve the system � �� � � �


1 1 c1 20
=
2 −1 c2 202
to obtain c1 = 74 and c2 = −54. We can then write the solution as
� � � �
1 1
uk = 74(1) k
− 54(.7)k
2 −1
� �
74 − 54(.7)k
= .
148 + 54(.7)k

This is then the population distribution


� � for any year. Note that as k → ∞ the
74
population distribution tends to .
148

This example exhibits two essential properties that hold in many chemical, bio-
logical, and economic processes: (1) the total quantity in question is always constant,
and (2) the individual quantities are never negative. As a consequence of these two
properties, note that the columns of the matrix A above are nonnegative and add to
one. This can be interpreted as saying that each year all the people inside California
have to either remain inside or move out (⇒ the first column adds to one), and all the
people outside California have to either move in or remain outside (⇒ the second
column adds to one). Any matrix with nonnegative entries whose columns add to
one is called a Markov matrix and the process it describes is called a Markov process.
Markov matrices have several important properties, which we state but do not prove
in the following theorem.
Theorem. Any Markov matrix A has the following properties.
(a) All the eigenvalues of A satisfy |λ| ≤ 1.
(b) λ = 1 is always an eigenvalue and there exists an associated eigenvector v1 with
all entries ≥ 0.
(c) If any power of A has all entries positive, then multiples of v1 are the only
eigenvectors associated with λ = 1 and Ak u0 → c1 v1 for any u0 .
We cannot prove this theorem completely with the tools we have developed so far,
but we can make parts of it plausible. First, since the columns of A sum to one, we
have AT v = v where v is the column vector consisting only of one’s. This means
that one is an eigenvalue of AT and therefore of A also since both matrices have
the same eigenvalues (Section 9 Problem 3(a)). Second, assume A has a diagonal
factorization and λ2 , · · · , λn all have absolute value < 1. Then as usual we have
Ak u0 = c1 (1)k v1 +c2 λk2 v2 +· · ·+cn λkn vn , so that clearly Ak u0 → c1 v1 . This is exactly
what happened in the example above. Note also that, since the limiting vector c1 v1
is a multiple of the eigenvector associated with λ = 1, we have A(c1 v1 ) = c1 v1 . We
therefore say c1 v1 is a stable distribution or it represents a steady state. In terms of
the population example this means \begin{pmatrix} .8 & .1 \\ .2 & .9 \end{pmatrix}\begin{pmatrix} 74 \\ 148 \end{pmatrix} = \begin{pmatrix} 74 \\ 148 \end{pmatrix}. In other words, if the
initial population distribution is \begin{pmatrix} 74 \\ 148 \end{pmatrix}, it will remain as such forever. And if the
initial population distribution is something else, it will tend to \begin{pmatrix} 74 \\ 148 \end{pmatrix} in the long
run.
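Both statements are easy to see numerically. The sketch below is not part of the notes; it assumes Python with numpy, iterates the Markov process, and also recovers the steady state directly as the eigenvector for λ = 1.

    import numpy as np

    A = np.array([[0.8, 0.1],
                  [0.2, 0.9]])
    u = np.array([20.0, 202.0])       # initial populations (millions)

    for k in range(100):
        u = A @ u                     # one year of migration
    print(u)                          # approaches [74. 148.]

    # The steady state is the eigenvector for lambda = 1, scaled to total 222.
    lams, V = np.linalg.eig(A)
    v = V[:, np.argmax(lams.real)].real
    print(222 * v / v.sum())          # [74. 148.]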

EXERCISES

1. For the difference equation uk = Auk−1 where the matrix A and the starting
vector u0 �are as given
� below, compute
� � uk and comment upon its behavior as k → ∞.
.5 .25 128
(a) A = and u0 = .
� .5 .75 � 64
� �
−2.5 4.5 18
(b) A = and u0 = .
� −1 � 2 � � 10
1 4 −1
(c) A = and u0 = .
1 1 2

2. Suppose multinational companies in the U.S., Japan, and Europe have total assets
of $4 trillion. Initially the distribution of assets is $2 trillion in the U.S., $0 in Japan,
and $2 trillion in Europe. Each year the distribution changes according to
    
U Sk+1 .5 .5 .5 U Sk
 Jk+1  =  .25 .5 0   Jk  .
Ek+1 .25 0 .5 Ek

(We are implicitly making the completely false assumption that the world economy
is a zero-sum game!)
(a) Find the diagonal factorization of A.
(b) Find the distribution of assets in year k.
(c) Find the limiting distribution of assets.
(d) Show the limiting distribution is stable.

3. A truck rental company has centers in New York, Los Angeles, and Chicago.
Every month half of the trucks in New York and Los Angeles go to Chicago, the
other half stay where they are, and the trucks in Chicago are split evenly between
New York and Los Angeles. Initially the distribution of trucks is 90, 30, and 30 in
New York, Los Angeles, and Chicago respectively.
\[
\begin{bmatrix} NY_{k+1} \\ LA_{k+1} \\ C_{k+1} \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} NY_k \\ LA_k \\ C_k \end{bmatrix}.
\]

(a) Find the Markov matrix A that describes this process.


(b) Find the diagonal factorization of A.
(c) Find the distribution of trucks in month k.
(d) Find the limiting distribution of trucks.
(e) Show the limiting distribution is stable.

4. Suppose there is an epidemic in which every month half of those who are well become
sick, a quarter of those who are sick get well, and another quarter of those who are
sick die. Find the corresponding Markov matrix and find its stable distribution.
\[
\begin{bmatrix} D_{k+1} \\ S_{k+1} \\ W_{k+1} \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} D_k \\ S_k \\ W_k \end{bmatrix}
\]

5. In species that reproduce sexually, the characteristics of an offspring are deter-


mined by a pair of genes, one inherited from each parent. The genes of a particular
trait (say eye color) are of two types, the dominant G (brown eyes) and the recessive
g (blue eyes). Offspring with genotype GG or Gg exhibit the dominant trait, whereas
those of type gg exhibit the recessive trait. Now suppose we allow only males of type
gg to reproduce. Let the initial distribution of genotypes be $u_1 = \begin{bmatrix} p \\ q \\ r \end{bmatrix}$. The entries
$p$, $q$, and $r$ respectively represent the proportions of GG, Gg, and gg genotypes in
the initial generation. (They must be nonnegative and sum to one.) Show that the
Markov matrix
\[
A = \begin{bmatrix} 0 & 0 & 0 \\ 1 & .5 & 0 \\ 0 & .5 & 1 \end{bmatrix}
\]

represents how the distribution of genotypes in one generation transforms to the next
under our restrictive mating policy (that is, only blue-eyed males can reproduce).
What is the limiting distribution?

6. Suppose in the setup of the previous problem we allow males of all genotypes to
reproduce. Let G and g respectively represent the proportion of G genes and g genes
in the initial generation. (They also must be nonnegative and sum to one.) Show
that $G = p + q/2$ and $g = r + q/2$. Show that the Markov matrix
\[
A = \begin{bmatrix} G & .5G & 0 \\ g & .5 & G \\ 0 & .5g & g \end{bmatrix}
\]
represents how the distribution of genotypes in the first generation transforms to the
second. Show that $u_2 = \begin{bmatrix} G^2 \\ 2Gg \\ g^2 \end{bmatrix}$. Show that $G$ and $g$ again respectively represent the
proportion of G genes and g genes in the second generation. The matrix A therefore
represents how the distribution of genotypes in the second generation transforms
to the third. Show that u3 = u2 . Genetic equilibrium is therefore reached after
only one generation. (What does this say in the important special case where p =
r?) This result is the Hardy-Weinberg law and is at the foundation of the modern
science of population genetics. It says that in a large, random-mating population,
the distribution of genotypes and the proportion of dominant and recessive genes
tend to remain constant from generation to generation, unless outside forces such
as selection, mutation, or migration come into play. In this way, even the rarest of
genes, which one would expect to disappear, are preserved.

PART 2: GEOMETRY

15. VECTOR SPACES, SUBSPACES, AND SPAN

The presentation so far has been entirely algebraic. Matrices have been added
and multiplied, equations have been solved, but nothing of a geometric nature has
been considered. Yet there is a natural geometric approach to matrices that is at
least as important as the algebraic approach. The mechanics of Gaussian elimination
has produced for us one kind of understanding of linear systems, but for a different
and deeper understanding we must look to geometry.
We will assume some familiarity with lines, planes, and geometrical vectors
in two and three dimensional physical space. We now want to examine what is
really at the heart of these concepts. To do this, we define an abstract model of
a vector space and then show how this idea can be used to develop concepts and
properties that are valid in all concrete instances of vector spaces. A vector space V
is a collection of objects, called vectors, on which two operations are defined, addition
and multiplication by scalars (numbers). If the scalars are real numbers, the vector
space is a real vector space, and if the scalars are complex numbers, the vector space
is a complex vector space. V must be closed under addition and scalar multiplication.
This means that if x and y are vectors in V and if a is a scalar, then x + y and ax
are also vectors in V . The operations must also satisfy the following rules:

1. x+y =y+x
2. x + (y + z) = (x + y) + z
3. There is a “zero” vector 0 such that x + 0 = x for all x.
4. For each vector x, there is a unique vector −x such that x + (−x) = 0.
5. 1x = x
6. (ab)x = a(bx)
7. a(x + y) = ax + ay
8. (a + b)x = ax + bx

To put meat on this abstract definition we need some examples. For us the most
important vector spaces are the real Euclidean spaces R1 , R2 , R3 , . . .. The space Rn
consists of all n×1 column matrices with the familiar definitions of addition and scalar
multiplication of matrices. (We have been calling such matrices column vectors all
along.) That these spaces are vector spaces follows directly from the properties of
matrices. The first three spaces can be identified with familiar geometric objects: R1
is represented by the real line, R2 by the real plane, and R3 by physical 3-space. The
representationsareclear. For example, the point (x1 , x2 , x3 ) in 3-space corresponds
x1
to the vector x2  in R3 . Likewise, a vector in a higher dimensional Euclidean

x3

space is completely determined by its components, even though the geometry is hard
to visualize.
FIGURE 2: the point $(x_1, x_2, x_3)$ and the corresponding vector in 3-space.
If we take column vectors whose components we allow to be complex numbers, we
obtain the complex Euclidean spaces: C 1 , C 2 , C 3 , · · ·. (We were actually in the world
of complex spaces in Section 13.) Even more abstract vector spaces that cannot be
visualized as any kind of Euclidean space are function spaces. A particular example
is C 0 [0, 1], the collection of all real valued functions defined and continuous on [0,1].
It is easy to see that C 0 [0, 1] is a real vector space, but it is impossible to see it
geometrically. For now, since we want to keep things as concrete as possible, we will
concentrate on real Euclidean spaces.
One nice thing about the first three Euclidean spaces R1 , R2 , and R3 is that for
them addition and scalar multiplication have simple geometric interpretations: The
sum x + y is the diagonal of the parallelogram with sides formed by x and y. The
difference x − y is the other side of the parallelogram with one side y and diagonal
x. (Note that the line segment from y to x is not the vector x − y and in fact is not
a vector at all!) The product ax is the vector obtained from x by multiplying its
length by a. And the vector −x has the same length as x but points in the opposite
direction. This geometric description even extends to higher dimensional Euclidean
spaces.
FIGURE 3: the parallelogram constructions of the sum $x + y$ and the difference $x - y$.

It turns out that the vector spaces that we will need most occur inside the
standard spaces Rn . We formalize this idea by saying that a subset S of a vector
space V is a subspace of V if S has the following properties:
1. S contains the zero vector.
2. If x and y are vectors in S, then x + y is also a vector in S.
3. If x is a vector in S and a is any scalar, then ax is also a vector in S.
Since addition and scalar multiplication in S follow the rules of the host space V ,
there is no need to verify the rules for a vector space for S. It is automatically a
vector space in its own right. We now look at some examples of subspaces of Rn .


x1
Example 1: Consider all vectors in R2 whose components satisfy the equation
x2
x1 + 2x2 = 0. Clearly they are represented by points in R2 that lie on a line through
the origin. These vectors form a subspace of R2 since sums and scalar products of
vectors that satisfy the equation must also satisfy the equation. (We will prove this
using matrix notation later.) Furthermore, we can find all such vectors explicitly.
We just write the equation in matrix form
\[
[\,1 \;\; 2\,] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = [\,0\,]
\]
and solve as usual; that is, we write the array $[\,1 \;\; 2 \;|\; 0\,]$, run Gaussian elimination
(unnecessary here of course), assign leading and free variables, and express the solu-
tion in vector form $c \begin{bmatrix} -2 \\ 1 \end{bmatrix}$. We get all multiples of one vector, clearly a line through
the origin. It is easy to show that such vectors are closed under addition and scalar
multiplication (proved in greater generality later), thereby giving another verification
that we have a subspace.
FIGURE 4: the line through the origin in $R^2$ generated by the vector $\begin{bmatrix} -2 \\ 1 \end{bmatrix}$.
If we change the equation to x1 + 2x2 = 2 we still have a line. Vectors that
satisfy this equation, however, cannot form a subspace since the sum of two such

vectors does not satisfy the equation. If we solve the equation we obtain vectors of
the form $\begin{bmatrix} 2 \\ 0 \end{bmatrix} + c \begin{bmatrix} -2 \\ 1 \end{bmatrix}$. Again we see that we do not have a subspace because these
vectors are not closed under addition. We can also see this geometrically by adding
two vectors that point to the line and noting that the result no longer points to the
line. Even more simply, the line does not pass through the origin, so the zero vector
is not even included.

Example 2: Consider all vectors $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ in $R^3$ whose components satisfy the equation
x1 − x2 + x3 = 0. This equation defines a plane in R3 passing through the origin.
Vectors that satisfy this equation are closed under addition and scalar multiplication,
and the plane is therefore a subspace. We can find all such vectors by writing the
equation in matrix form
\[
[\,1 \;\; -1 \;\; 1\,] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = [\,0\,]
\]
and solving. We use the array $[\,1 \;\; -1 \;\; 1 \;|\; 0\,]$ to obtain the solution
\[
c \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + d \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}
\]

in vector form. This is the vector representation of the plane. Again, vectors of
this form are closed under addition and scalar multiplication and therefore form a
subspace.
FIGURE 5: the plane $x_1 - x_2 + x_3 = 0$ in $R^3$ spanned by the vectors $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$.

As in Example 1, if we change the equation to x1 − x2 + x3 = 2, we still get a plane,


but for the same reasons as before it is no longer a subspace.

Example 3: This time we want all vectors in R3 that satisfy the two equations

x1 − x2 = 0
x2 − x3 = 0

simultaneously. Again, vectors that satisfy both equations are closed under addition
and scalar multiplication. To find all such vectors we write the equations in matrix
form
\[
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
and solve to obtain $c \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. All multiples of this single vector generate a line in $R^3$
passing through the origin. This makes sense since each equation defines a plane in
R3 and their intersection must be a line. This also suggests the general fact that the
intersection of any number of subspaces of a vector space is itself a subspace. The
conditions of closure under addition and scalar multiplication are easily verified.

Example 4: Finally, consider all vectors in R4 whose components satisfy the equation
x1 + x2 − x3 + x4 = 0. We might expect that this equation defines some kind of
geometric plane passing through the origin. If we solve it we obtain
\[
a \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + b \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]

These vectors do form a subspace, but it is hard to visualize. Later we will give
precise meaning to the notion that this subspace is a “three dimensional hyperplane
in four space.”

All of the examples above have the same form, which can be expressed more
simply in matrix notation. Each defines a set S as the collection of all vectors x
in Rn that satisfy a system of equations Ax = 0. The problem is to show that S
is a subspace. We can do this directly as follows: If Ax = 0 and Ay = 0, then
A(x + y) = Ax + Ay = 0 + 0 = 0 and A(cx) = c(Ax) = c(0) = 0. Thus vectors that
satisfy the system Ax = 0 are closed under addition and scalar multiplication. The
second way to verify that S is a subspace is to solve the system Ax = 0 as we did
in the examples. The solution in vector form will look like a1 v1 + a2 v2 + . . . + an vn ,

where the a’s are arbitrary constants and the v’s are vectors. Vectors of this form
are closed under addition since

(a1 v1 + a2 v2 + . . . + an vn ) + (b1 v1 + b2 v2 + . . . + bn vn ) =

(a1 + b1 )v1 + (a2 + b2 )v2 + . . . + (an + bn )vn


and under scalar multiplication since

c(a1 v1 + a2 v2 + . . . + an vn ) = (ca1 )v1 + (ca2 )v2 + . . . + (can )vn .

So again we see that S is a subspace.


Vectors of the form a1 v1 + a2 v2 + . . . + an vn are said to be linear combinations of
the vectors v1 , v2 , . . . , vn . The subspace S of all linear combinations of v1 , v2 , . . . , vn
is called the span of v1 , v2 , . . . , vn . We also say the vectors v1 , v2 , . . . , vn span or
generate the subspace S, and we write S = span{v1 , v2 , . . . , vn }.
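As an optional computational aside (not part of the original notes), the spanning vectors of Example 4 can be produced with SymPy by computing the null space of the coefficient matrix; `nullspace()` returns one vector per free variable, exactly as in the hand computation above.

```python
from sympy import Matrix

# The single equation x1 + x2 - x3 + x4 = 0 of Example 4, written as Ax = 0.
A = Matrix([[1, 1, -1, 1]])

# nullspace() returns one vector per free variable; together they span
# (and, as Section 16 will show, form a basis for) the solution subspace.
for v in A.nullspace():
    print(v.T)
# Expected: [-1, 1, 0, 0], [1, 0, 1, 0], [-1, 0, 0, 1]
```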

EXERCISES

1. Show that C 0 [0, 1] is a real vector space. Show that C 1 [0, 1], which is the set of all
functions that are continuous and have continuous derivatives on [0,1], is a subspace
of C 0 [0, 1].

2. Show directly that


 the following are subspaces of R .
3

x1
(a) All vectors x2  that satisfy the equation x1 − x2 + x3 = 0

x3    
1 0
(b) All vectors of the form c  1  + d  0 
0 1
� �
x1
3. None of the following subsets of vectors in R2 is a subspace. Why?
x2
(a) All vectors where x1 = 1.
(b) All vectors where x1 = 0 or x2 = 0.
(c) All vectors where x1 ≥ 0.
(d) All vectors where x1 and x2 are both ≥ 0 or both ≤ 0.
(e) All vectors where x1 and x2 are both integers.

4. Describe geometrically the subspace of $R^3$ spanned by the following vectors.
(a) $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$
(b) $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$
(c) $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
(d) $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$

5. Find examples of subspaces of R4 that satisfy the following conditions.


(a) Two “two dimensional planes” that intersect only at the origin.
(b) A line and a “three dimensional hyperplane” that intersect only at the origin.

6. Find vector representations for the following geometric objects, or said another
way, find spanning sets of vectors for each of the following subspaces.
(a) 3x1 − x2 = 0 in R2 .
(b) x1 + x2 + x3 = 0 in R3 .
(c) x1 + x2 + x3 = 0 and x1 − x2 + x3 = 0 in R3 .
(d) x1 − 2x2 + 3x3 − 4x4 = 0 in R4 .
(e) x1 + 2x2 − x3 = 0, x1 − 2x2 + x4 = 0, x2 − x5 = 0 in R5 .

7. Find vector representations for the following geometric objects and describe them.
(a) 3x1 − x2 = 3 in R2 .
(b) x1 + x2 + x3 = 1 in R3 .

16. LINEAR INDEPENDENCE, BASIS, AND DIMENSION

It is possible for different sets of vectors to span the same subspace. For example,
it is easy to see geometrically that the two sets of vectors
\[
\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\quad\text{and}\quad
\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\]

generate the plane x1 − x2 = 0 in R3 . The mathematical reason for this is that the
second vector in the first set can be written as a linear combination of the first and
third vectors:
\[
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]
Since the second vector can be regenerated from the other two, it is really not needed
and therefore can be dropped from the spanning set. The question arises of how, in
general, we can reduce a spanning set to one of minimal size and still have it span the
same subspace.
FIGURE 6: the plane $x_1 - x_2 = 0$ in $R^3$ together with the spanning vectors $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

Example 1: Suppose S = span{v1 , v2 , v3 , v4 , v5 , v6 }, and suppose we discover among


the spanning vectors the linear relationship v2 − v3 − 3v4 + 2v6 = 0. If we solve this
equation for v2 we obtain v2 = v3 + 3v4 − 2v6 . Since v2 can be regenerated from
the other vectors in the spanning set, it can be removed from the spanning set. The
remaining vectors will still generate the same subspace S = span{v1 , v3 , v4 , v5 , v6 }.
Of course, we could have solved the equation for v3 or for v4 or for v6 , so any one of

those vectors could have been the one to have been dropped. Now suppose among the
remaining vectors we find another linear relationship, say 2v1 +v4 −4v5 = 0. Then we
can solve for v1 or for v4 or for v5 and can therefore drop any one of these vectors from
the spanning set. Suppose we drop v4 . We then obtain S = span{v1 , v3 , v5 , v6 }. At
this point suppose there does not exist any linear relationship between the remaining
vectors. Then this process of shrinking the spanning set will have to stop.

On the basis of these observations we make some definitions. We say that a


collection of vectors v1 , v2 , . . . , vn is linearly dependent if there exists a linear combi-
nation of them that equals zero

a1 v1 + a2 v2 + . . . + an vn = 0

where at least some of the coefficients a1 , a2 , . . . , an are not zero, and we say that
they are linearly independent if the only linear combination of them that equals zero
is the trivial one
0v1 + 0v2 + . . . + 0vn = 0.
If a set of vectors v1 , v2 , . . . , vn is (1) linearly independent and (2) spans a subspace
S, then we say these vectors form a basis for S. (We state these definitions for
subspaces, but, since any vector space is a subspace of itself, they also hold for vector
spaces.)

v2 v3
v1
v2 v2

v1 v1

two dependent two independent three dependent


vectors vectors vectors

FIGURE 7

The process described in the example above can now be expressed in the lan-
guage of linear independence and basis as follows: Suppose a set of vectors spans a
subspace S. If these vectors are linearly dependent, then there is a nontrivial linear
combination of them that equals zero. In this case one of the vectors can be dropped
from the spanning set. (Any vector that appears in the linear combination with a
nonzero coefficient can be chosen.) The remaining vectors will still span S. This
process of successively dropping dependent vectors can be continued until the set of
spanning vectors is linearly independent. The resulting spanning set is therefore a
basis for S. Although this process of successively dropping vectors from a spanning set
is not a practical way to actually find a basis for a subspace, it does prove that every
subspace has a basis.
The importance of a basis to a subspace lies in the fact that not only can every
vector in a subspace be represented as a linear combination of the vectors in its basis,
but, even further, that this representation is unique. If for a basis v1 , v2 , . . . , vn we
have v = a1 v1 +a2 v2 +. . .+an vn and also v = b1 v1 +b2 v2 +. . .+bn vn , then subtraction
gives 0 = (a1 − b1 )v1 + (a2 − b2 )v2 + . . . + (an − bn )vn . But since v1 , v2 , . . . , vn are
linearly independent, all the coefficients (ai − bi ) = 0, and therefore ai = bi . We
conclude that there is only one way to write a vector as a linear combination of basis
vectors.

Example 2: The following three vectors
\[
\begin{bmatrix} 1 \\ 3 \\ -1 \\ -1 \end{bmatrix},\quad \begin{bmatrix} 2 \\ 6 \\ 0 \\ 4 \end{bmatrix},\quad \begin{bmatrix} 1 \\ 3 \\ 1 \\ 5 \end{bmatrix}
\]

generate a subspace S of R4 . Our goal is to find a basis for S. We accomplish this


by forming the matrix
\[
A = \begin{bmatrix} 1 & 3 & -1 & -1 \\ 2 & 6 & 0 & 4 \\ 1 & 3 & 1 & 5 \end{bmatrix},
\]
whose rows consist of these three vectors, and running Gaussian elimination (or
Gauss-Jordan elimination as we do here) to obtain
\[
U = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

The two nonzero rows of U, when made into column vectors, will form a basis for S:
\[
\begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \end{bmatrix}
\]

Why does this work? First note that, because of the nature of Gaussian operations,
every row of U is a linear combination of the rows of A. Furthermore, since A can
be reconstructed from U by reversing the sequence of Gaussian operations, every
row of A is a linear combination of the rows of U . We can now draw a number of
conclusions. First, the rows of U must span the same subspace as the rows of A.
Second, since there is some linear combination of the rows of A that results in the

third row of U , which is the zero vector, the rows of A must therefore be linearly
dependent. And finally, because of the echelon form of U , the nonzero rows of U are
automatically linearly independent. (See Exercise 4.)
Now that we have a basis for S, we can express any vector in S as a unique
linear combination of the basis vectors. For example, to express the vector
\[
\begin{bmatrix} 2 \\ 6 \\ -3 \\ -5 \end{bmatrix}
\]

in terms of the basis we must solve the equation
\[
a \begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix} + b \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \\ -3 \\ -5 \end{bmatrix}.
\]

If the given vector is in S, then as we have seen there will be exactly one solution,
otherwise there will be no solution. We make the extremely important observation
that this equation is equivalent to the linear system
\[
\begin{bmatrix} 1 & 0 \\ 3 & 0 \\ 0 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \\ -3 \\ -5 \end{bmatrix}
\]

(See Section 2 Exercise 7), which we can solve by Gaussian elimination. In this case
we obtain the solution $a = 2$ and $b = -3$ so that
\[
2 \begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix} - 3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \\ -3 \\ -5 \end{bmatrix}.
\]
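Here is an optional SymPy sketch (not in the original notes) that reproduces this example: Gauss-Jordan elimination gives the basis, and row reducing an augmented matrix recovers the coefficients $a = 2$, $b = -3$.

```python
from sympy import Matrix

# Rows of A are the three given vectors in R^4.
A = Matrix([[1, 3, -1, -1],
            [2, 6,  0,  4],
            [1, 3,  1,  5]])

U, pivots = A.rref()
print(U)   # nonzero rows (1, 3, 0, 2) and (0, 0, 1, 3) give a basis for S

# Express (2, 6, -3, -5) in this basis: row reduce the augmented matrix [B | v].
B = Matrix([[1, 0], [3, 0], [0, 1], [2, 3]])
v = Matrix([2, 6, -3, -5])
print(B.row_join(v).rref()[0])   # last column reads a = 2, b = -3
```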

Example 3: Find a basis for the subspace S of all vectors in R4 whose components
satisfy the equation x1 + x2 − x3 + x4 = 0. This was Example 4 of the previous
section. There we found S consisted of all vectors of the form
\[
a \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + b \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]

The three column vectors clearly span S, and in fact they are also linearly indepen-
dent. This is true because if
\[
a \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + b \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -a + b - c \\ a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\]

then clearly a = b = c = 0. These three vectors therefore form a basis for S. This
holds in general. That is, if we solve a homogeneous system Ax = 0 by Gaussian
elimination, set the free variables equal to arbitrary constants, and write the solution
in vector form, then we obtain a linear combination of independent vectors, one for
each free variable. Therefore, in all the examples of the previous section, we were
actually finding not just spanning sets but bases! Furthermore, the comment in
Section 10,“ Our method for finding eigenvectors, which is to solve (A − λI)x = 0
by Gaussian elimination, does in fact produce linearly independent eigenvectors, one
for each free variable,” is justified.
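As a quick numerical check of this claim (a sketch, not part of the notes), place the three solution vectors as columns of a matrix and compute its rank with NumPy; rank 3 confirms their independence, and multiplying by the coefficient row confirms that each column solves the defining equation.

```python
import numpy as np

# The three solution vectors of x1 + x2 - x3 + x4 = 0, one per free variable,
# placed as the columns of a matrix.
V = np.array([[-1,  1, -1],
              [ 1,  0,  0],
              [ 0,  1,  0],
              [ 0,  0,  1]])

print(np.linalg.matrix_rank(V))     # 3, so the columns are linearly independent
print(np.array([1, 1, -1, 1]) @ V)  # [0 0 0]: each column satisfies the equation
```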

There is no unique choice of a basis for a subspace. In fact, there are infinitely
many possibilities. For example, each of the three sets of vectors
\[
\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},\quad
\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\},\quad
\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\]
is a basis for the plane $x_1 - x_2 = 0$ in $R^3$. You can no doubt think of many more.
For the Euclidean spaces $R^n$, however, there is the following natural choice of basis:
\[
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{bmatrix},\quad
\ldots,\quad
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
\]

These are the vectors that point along the coordinate axes, so we will call them
coordinate vectors. They clearly span and are linearly independent and therefore
form a basis for Rn .
Even though the set of vectors in a basis is not unique, it is true that the number
of vectors in a basis is unique. This number we define to be the dimension of the
subspace. Clearly the Euclidean space Rn has dimension n. It now makes sense to
talk about things like “a three dimensional hyperplane passing through the origin in
four space.” We state this important property of bases formally as:

Theorem. Any two bases for a subspace contain the same number of vectors.
Proof: It is enough to show that in a subspace S the number of vectors in any linearly
independent set must be less than or equal to the number of vectors in any spanning
set. Since a basis is both linearly independent and spans, this means that any two
bases must contain exactly the same number of vectors. We now illustrate the proof
in a special case. The general case will then be clear. Suppose v1 , v2 , v3 span the
subspace S and w1 , w2 , w3 , w4 is some larger set of vectors in S. We show that the
w’s must be linearly dependent. Since the v’s span, each w can be written as a linear
combination of the v’s:
w1 = a11 v1 + a12 v2 + a13 v3
w2 = a21 v1 + a22 v2 + a23 v3
w3 = a31 v1 + a32 v2 + a33 v3
w4 = a41 v1 + a42 v2 + a43 v3 .
In matrix terms this is
\[
\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ w_1 & w_2 & w_3 & w_4 \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}
=
\begin{bmatrix} \vdots & \vdots & \vdots \\ v_1 & v_2 & v_3 \\ \vdots & \vdots & \vdots \end{bmatrix}
\begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{12} & a_{22} & a_{32} & a_{42} \\ a_{13} & a_{23} & a_{33} & a_{43} \end{bmatrix},
\]
which we write as W = V A. Since A has fewer rows than columns, there are nontrivial
solutions to the homogeneous system Ax = 0 (see Section 7 Exercise 5(b)), that is,
there is a nonzero vector c such that Ac = 0. We then have W c = (V A)c = V (Ac) =
V 0 = 0. But the equation W c = 0 when written out is just c1 w1 + c2 w2 + c3 w3 +
c4 w4 = 0 and is therefore a nontrivial linear combination of the w’s. The w’s are
therefore linearly dependent and we are done.

We see that a basis is a maximal independent set of vectors in the sense that it
cannot be made larger without losing independence. It is also a minimal spanning
set of vectors since it cannot be made smaller and still span the space. Note that we
have been implicitly assuming that the number of vectors in a basis is finite. It is
possible to extend the discussion above to the infinite dimensional case, but we will
not do this.

EXERCISES

1. Decide the dependence or independence of the following sets of vectors.
(a) $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$
(b) $\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 3 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 6 \\ 5 \end{bmatrix}$
(c) $\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 2 \\ 4 \end{bmatrix}$
(d) $\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix}$
(e) $\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 3 \\ 3 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix}$

2. Find bases for the subspaces spanned by the sets of vectors in Exercise 1 above.
In each case indicate the dimension.

3. Find bases for the subspaces defined by the equations in Section 15 Exercise 6. In
each case indicate the dimension.
 
4. Show directly from the definition that the nonzero rows of $\begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ are
linearly independent.

5. Express each vector as a linear combination of the vectors in the indicated sets.
(a) $\begin{bmatrix} 5 \\ -1 \\ 4 \end{bmatrix}$; $\left\{ \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} \right\}$
(b) $\begin{bmatrix} -3 \\ 1 \\ 4 \end{bmatrix}$; $\left\{ \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} \right\}$
(c) $\begin{bmatrix} 10 \\ -2 \\ 8 \end{bmatrix}$; $\left\{ \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \right\}$
(d) $\begin{bmatrix} 8 \\ 13 \end{bmatrix}$; $\left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}$  For this case draw a picture!

6. Suppose we have three sets of vectors, U = {u1 , . . . , u4 }, V = {v1 , . . . , v5 }, W =


{w1 , . . . , w6 }, in R5 . For each set answer the following.
(a) The set (is) (is not) (might be) linearly independent.
(b) The set (does) (does not) (might) span R5 .

(c) The set (is) (is not) (might be) a basis for R5 .

7. If the complex vectors $v$ and $\bar{v}$ are linearly independent over the complex numbers
and if $v = x + iy$, then show that the real vectors $x$ and $y$ are linearly independent
over the complex numbers. (Hint: Assume $ax + by = 0$ and use $x = \frac{v + \bar{v}}{2}$ and
$y = \frac{v - \bar{v}}{2i}$ to show $a = b = 0$.) This settles a technical question about complex
vectors from Section 13.

17. DOT PRODUCT AND ORTHOGONALITY

So far, in our discussion of vector spaces, there has been no mention of “length”
or “angle.” This is because the definition of a vector space does not require such con-
cepts. For many vector spaces however, especially for Euclidean spaces, there is a nat-
ural way to establish these notions that is often quite useful. In two-dimensional space
the physical length of the vector $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ is by the Pythagorean Theorem equal to
$\sqrt{x_1^2 + x_2^2}$, and in three-dimensional space the physical length of the vector $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$
is by two applications of the Pythagorean Theorem equal to $\sqrt{x_1^2 + x_2^2 + x_3^2}$. It seems
reasonable therefore to define the length or norm of a vector $x$ in $R^n$, which we denote
as $\|x\|$, in the following way:
\[
\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
\]

(There are situations and applications where other measures of length are more ap-
propriate. But this one will be adequate for our purposes.) Note that since our
vectors are column vectors, the length of a vector can also be written in matrix nota-
tion as $\|x\| = \sqrt{x^T x}$. It is easy to see that the length function satisfies the following
two properties:

1. $\|ax\| = |a|\,\|x\|$
2. $\|x\| \geq 0$, and $\|x\| = 0 \Leftrightarrow x = 0$.

Note also that if we multiply any vector $x$ by the reciprocal of its length, we get
$\frac{1}{\|x\|}x$, which is a vector of length one. We say this is the unit vector in the direction
of $x$. With this notion of length we can immediately define the distance between two
points $x$ and $y$ in $R^n$ as $\|x - y\|$. This corresponds to the usual physical distance
between points in two and three-dimensional space.
How can we decide if two vectors are perpendicular? In order to help us do this,
we define the dot product x · y of two vectors x and y in Rn as the number

x · y = x1 y1 + x2 y2 + · · · + xn yn .

In matrix notation we can also write x · y = xT y. The dot product satisfies the
following properties:

1. x·y =y·x
2. (ax + by) · z = ax · z + by · z
3. z · (ax + by) = az · x + bz · y
4. $x \cdot x = \|x\|^2$.

They can be verified by direct computation. The second and third properties follow
from the distributivity of matrix multiplication. Other terms for dot product are
scalar product and inner product.
Now we will see how to determine if two vectors x and y in Rn are perpendicular.
First note that, assuming they are independent, they span a two-dimensional sub-
space of $R^n$. When endowed with the length function $\|\cdot\|$, this subspace satisfies all
the axioms of the Euclidean plane. We therefore have all the constructs of Euclidean
geometry in this plane including lines, circles, lengths, and angles. In particular, we
have the Pythagorean Theorem, which says that the sides of a triangle are in the
relation a2 + b2 = c2 if and only if the angle opposite side c is a right angle. (It goes
both directions; check your Euclid!)

FIGURE 8: the right triangle with sides $\|x\|$ and $\|y\|$ and hypotenuse $\|y - x\|$.

If we write this equation for the triangle formed by the two vectors x and y in vector
notation and use the properties of the dot product, we have
\[
\begin{aligned}
\|x\|^2 + \|y\|^2 = \|x - y\|^2 &= (x - y) \cdot (x - y) \\
&= x \cdot x - x \cdot y - y \cdot x + y \cdot y \\
&= \|x\|^2 - 2\, x \cdot y + \|y\|^2.
\end{aligned}
\]

Canceling, we obtain 0 = −2x · y or x · y = 0. We therefore conclude that the vectors


x and y are perpendicular if and only if their dot product x · y = 0. Another term
for perpendicular is orthogonal. In mathematical shorthand we write the statement
“x is orthogonal to y” as x ⊥ y. Therefore the result above can be written as
x ⊥ y ⇔ x · y = 0.
Example 1: The vectors $x = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}$ are orthogonal because $x \cdot y = 0$.
Each has length $\sqrt{4 + 4 + 1} = 3$. The unit vector in the direction of $x$ is $\frac{1}{\|x\|}x = \begin{bmatrix} 2/3 \\ 2/3 \\ 1/3 \end{bmatrix}$.
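The arithmetic of this example is easy to verify; the following NumPy sketch (not part of the notes) computes the dot product, the lengths, and the unit vector.

```python
import numpy as np

x = np.array([2.0, 2.0, 1.0])
y = np.array([-2.0, 1.0, 2.0])

print(np.dot(x, y))           # 0.0, so x and y are orthogonal
print(np.linalg.norm(x))      # 3.0
print(np.linalg.norm(y))      # 3.0
print(x / np.linalg.norm(x))  # the unit vector [2/3, 2/3, 1/3]
```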

Even though it is not necessary for linear algebra, the dot product can also tell us
the angle between any two vectors, orthogonal or not. For this we need the Law of
Cosines, which also appears in Euclid and which says that the sides of any triangle
are in the relation a2 + b2 = c2 + 2ab cos θ where θ is the angle opposite side c. Again
writing this equation for the triangle formed by the two vectors x and y in vector
notation �x�2 + �y�2 = �x − y�2 + 2�x��y� cos θ and computing (Exercise 9) we
obtain x · y = �x��y� cos θ or
x·y
cos θ = .
�x��y�

FIGURE 9: a triangle with sides $\|x\|$, $\|y\|$, and $\|y - x\|$, with angle $\theta$ between $x$ and $y$.

Example 2: The angle between the vectors $\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$ is determined by $\cos\theta =
\frac{10}{\sqrt{9}\sqrt{14}} = 0.89087$, so $\theta = \arccos(0.89087) = 27.02^\circ$.

We are now in a position to compute the projection of one vector onto another.
Suppose we wish to find the vector p which is the geometrically perpendicular pro-
jection of the vector y onto the vector x. To be precise, we should say that we are
seeking the projection p of the vector y onto the direction defined by x or onto the line

generated by x. Since we can do geometry in the plane defined by the two vectors x
and y, we immediately see from the figure below that p must have the property that
x ⊥ (y − p), so 0 = x · (y − p) = x · y − x · p or x · p = x · y. Also, since p lies on the
line generated by x, it must be some constant multiple of x, so p = cx. Substituting
this into the previous equation we obtain c(x · x) = x · y or c = (x · y)/(x · x). The
final result is therefore
\[
p = \frac{x \cdot y}{\|x\|^2}\, x.
\]
We should think of the vector p as the component of y in the direction of x. In fact,
if we write y = p + (y − p), we have resolved y into the sum of its component in the
direction of x and its component perpendicular to x.
FIGURE 10: the vector $y$, its projection $p$ onto the direction of $x$, and the perpendicular component $y - p$.

Example 3: To resolve $y = \begin{bmatrix} 5 \\ 5 \\ -2 \end{bmatrix}$ into its components in the direction of and
perpendicular to $\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$, just compute $p = \frac{18}{9} \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix}$ and obtain
\[
y = p + (y - p) = \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix} + \left( \begin{bmatrix} 5 \\ 5 \\ -2 \end{bmatrix} - \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ -4 \end{bmatrix}
\]
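Here is an optional NumPy sketch (not from the notes) of the projection formula applied to Example 3; the final dot product confirms that the two components are orthogonal.

```python
import numpy as np

x = np.array([2.0, 2.0, 1.0])
y = np.array([5.0, 5.0, -2.0])

# Projection of y onto the direction of x: p = (x . y / ||x||^2) x
p = (np.dot(x, y) / np.dot(x, x)) * x
print(p)                 # [4. 4. 2.]
print(y - p)             # [ 1.  1. -4.], the component perpendicular to x
print(np.dot(x, y - p))  # 0.0 (up to roundoff): the two parts are orthogonal
```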

Having completed our discussion of orthogonality of vectors, we now turn to


subspaces. We say that two subspaces V and W are orthogonal subspaces if every
vector in V is orthogonal to every vector in W . For example, the z-axis is orthogonal
to the xy-plane in R3 . But note that xz-plane and the xy-plane are not orthogonal.
That is, a wall of a room is not perpendicular to the floor! This is because the $x$-
coordinate vector $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ is in both subspaces but is not orthogonal to itself. It is easy

to check the orthogonality of subspaces if we have spanning sets for each subspace.
Just verify that every vector in one spanning set is orthogonal to every vector in
the other. For example, if V = span{v1 , v2 } and W = span{w1 , w2 } and the v’s are
orthogonal to the w’s, then any vector in V is orthogonal to any vector in W , because
(a1 v1 + a2 v2 ) · (b1 w1 + b2 w2 ) = a1 b1 v1 · w1 + a2 b1 v2 · w1 + a1 b2 v1 · w2 + a2 b2 v2 · w2 = 0
We make one more definition. The set W of all vectors perpendicular to a
subspace V is called the orthogonal complement of V and is written as W = V ⊥ . It is
easy to see that W is in fact a subspace (Exercise 12). It also follows automatically,
but not so easily, that V is the perpendicular complement of W or V = W ⊥ (Exercise
13). In other words, the relationship is symmetric, and we are justified in saying that
V and W are orthogonal complements of each other. For example, the xy-plane
and the z-axis are orthogonal complements, but the x-axis and the y-axis are not.
Orthogonal complements are easy to compute.

Example 4: Find the orthogonal complement of the line generated by the vector
$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, and find the equations of the line. Here the first problem is to find all vectors
$y$ orthogonal to the given generating vector, that is, to find all vectors $y$ whose dot
product with the given vector is zero. Expressed in matrix notation this is just
\[
[\,1 \;\; 2 \;\; 3\,] \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = 0.
\]
We solve this linear system and obtain
\[
y = c \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + d \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}.
\]
The two vectors above therefore span the plane that is the orthogonal complement
of the given line. In fact, these two vectors are a basis for that plane. Now to find
the equations of the line itself, note that a vector x lies in the line if and only if x is
orthogonal to the plane we just found. In other words, the dot product of x with each
of the two vectors that generate that plane must be zero. Therefore x must satisfy
the equations −2x1 + x2 = 0 and −3x1 + x3 = 0. These are then the equations that
define the given line.

Example 5: Find the equations of the plane generated by the two vectors $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and
$\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$. Again we look for all vectors orthogonal to the generating vectors. We
therefore set up the linear system
\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
and get the solution
\[
y = c \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},
\]
which generates the line orthogonal to the given plane. Now to find the equation
form of the plane, note that any vector x in the plane must be orthogonal to the
orthogonal complement of the plane, that is, to the line just obtained. This means
that the dot product of x with the vector that generates the orthogonal line must be
zero. Therefore −x1 + x3 = 0 is the equation of the given plane.

Note that in Section 15 we learned how to go from the equation form of a subspace
to its vector form. We now know how to go in the reverse direction, that is, from its
vector form to its equation form.
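Both directions of this computation amount to a null-space calculation. The following SymPy sketch (not part of the notes) redoes Example 4: the orthogonal complement of the line is the null space of $[\,1\;\;2\;\;3\,]$, and the equations of the line come from the null space of the matrix whose rows are the two spanning vectors just found.

```python
from sympy import Matrix

# Example 4: orthogonal complement of the line generated by (1, 2, 3).
A = Matrix([[1, 2, 3]])
for v in A.nullspace():
    print(v.T)   # [-2, 1, 0] and [-3, 0, 1]: a basis for the orthogonal plane

# Going back: the line is the null space of the matrix whose rows are
# the two vectors just found.
B = Matrix([[-2, 1, 0],
            [-3, 0, 1]])
print(B.nullspace()[0].T)   # a multiple of (1, 2, 3), recovering the line
```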

EXERCISES

1. For the two vectors $x = \begin{bmatrix} 1 \\ 2 \\ -2 \\ -4 \end{bmatrix}$ and $y = \begin{bmatrix} -6 \\ -2 \\ 2 \\ 9 \end{bmatrix}$
(a) Find their lengths.
(b) Find the unit vectors in the directions they define.
(c) Find the angle between them.
(d) Find the projection of y onto x.
(e) Resolve y into components in the direction of and perpendicular to x.

2. In $R^2$ find the point on the line generated by the vector $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$ closest to the point
$(8, \frac{11}{2})$.

3. Find all vectors orthogonal to $\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ in $R^2$.

4. Show that the line generated by the vector $\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$ is orthogonal to the plane gener-
ated by the two vectors $\begin{bmatrix} 1 \\ 1 \\ -4 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 0 \\ -4 \end{bmatrix}$.

5. Find the orthogonal complements of the subspaces generated by the following
vectors.
(a) $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
(b) $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$
(c) $\begin{bmatrix} 1 \\ 1 \\ 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix}$
(d) $\begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 0 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 2 \\ 0 \\ 3 \end{bmatrix}$

6. Find equations defining the subspaces in Exercise 5 above.

7. True or false?
(a) If two subspaces V and W are orthogonal, then so are their orthogonal comple-
ments.
(b) If U is orthogonal to V and V is orthogonal to W , then U is orthogonal to W .

8. Show that the length of $\frac{1}{\|x\|}x$ is one.

9. Derive $x \cdot y = \|x\|\|y\|\cos\theta$ from the Law of Cosines.

10. Show $x \cdot y = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right)$.
11. Show that if the vectors v1 , v2 , v3 are all orthogonal to one another, then they
must be linearly independent. (Hint: Write c1 v1 + c2 v2 + c3 v3 = 0 and show the c’s
are all zero by dotting both sides with each of the v’s.) Of course this result extends
to arbitrary numbers of vectors v1 , v2 , . . . , vn .

12. If V is a subspace, then show W = V ⊥ is also a subspace, that is, show W is


closed under addition and scalar multiplication.

13. Let V be a subspace of R8 and W = V ⊥ . We wish to show that W ⊥ = V , or,


what is the same thing, (V ⊥ )⊥ = V.

(a) Suppose V has a basis $v_1, v_2, v_3$. Let
\[
A = \begin{bmatrix} \cdots & v_1 & \cdots \\ \cdots & v_2 & \cdots \\ \cdots & v_3 & \cdots \end{bmatrix},
\]

and by counting leading and free variables in the system Ax = 0 show that
V ⊥ = W has a basis w1 , w2 , w3 , w4 , w5 .
(b) Let
\[
B = \begin{bmatrix} \cdots & w_1 & \cdots \\ \cdots & w_2 & \cdots \\ \cdots & w_3 & \cdots \\ \cdots & w_4 & \cdots \\ \cdots & w_5 & \cdots \end{bmatrix},
\]
and by counting leading and free variables in the system Bx = 0 show that W ⊥
has dimension 3.
(c) Observe that each of the three vectors v1 , v2 , v3 satisfy Bx = 0 and therefore are
in W ⊥ . Since they are also independent, conclude that W ⊥ = span{v1 , v2 , v3 } =
V.

14. Show if v · w = ±||v||||w||, then v = cw for some constant c. Hint: Expand


||v − cw||2 and show that it equals zero if c = ±||v||/||w||. Interpret this as saying
that if the angle between two vectors is 0 or π, then one vector is a multiple of the
other.

18. LINEAR TRANSFORMATIONS

Many problems in the physical sciences involve transformations, that is, the
way in which input data is changed into output data. It often happens that the
transformations in question are linear. In this section we present some of the ba-
sic terminology and facts about linear transformations. As usual we consider only
Euclidean spaces.
We define a transformation to be a function that takes points in Rn as input and
produces points in Rm as output, or, in other words, maps points in Rn to points in
$R^m$. For example, $S(x_1, x_2) = (x_1^2, x_2 + 1)$ is a transformation that maps $R^2$ to $R^2$.
Instead of mapping points to points, we can think of transformations as mapping
vectors to vectors. We can therefore write $S$ as $S\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1^2 \\ x_2 + 1 \end{bmatrix}$. This is the
view we will take from now on. The picture we should keep in mind is that in general
a transformation T maps the vector x in Rn to the vector T (x) in Rm .
FIGURE 11: a transformation $T$ maps the vector $x$ in $R^n$ to the vector $T(x)$ in $R^m$.

We further define a transformation T from Rn to Rm to be a linear transforma-


tion if for all vectors x and y and constants c it satisfies the properties:

1. T (x + y) = T (x) + T (y)
2. T (cx) = cT (x)

Note that if we take c = 0 in property 2 we have T (0) = 0. A linear transformation


must therefore take the origin to the origin. (The transformation S above is therefore
not linear.) Let’s try to view these two properties geometrically. Property 1 says
that under the map T the images of x and y when added together should be the same
as the image of x + y, and property 2 says that the image of x when multiplied by
c should be the same as the image of cx. We can think of property 1 as saying that
T must take the vertices of the parallelogram defined by x and y into the vertices of
the parallelogram defined by T (x) and T (y).

FIGURE 12: a linear transformation takes the parallelogram defined by $x$ and $y$ to the parallelogram defined by $T(x)$ and $T(y)$.
It is an immediate consequence of the definition that a linear transformation takes
subspaces to subspaces. In other words, if S is a subspace of Rn , then T (S), which is
the set of all vectors of the form T (x), is a subspace of Rm . It is a further consequence
of the definition that every linear transformation must have a certain special form.
We now determine what that form must be.
First, we can create linear transformations by using matrices. Suppose A is
an m × n matrix. Then we can define the transformation T (x) = Ax. Because of
the way matrix multiplication works, the input vector x is in Rn and the output
vector Ax is in Rm . This transformation is linear because T (x + y) = A(x + y) =
Ax + Ay = T (x) + T (y) and T (cx) = A(cx) = cAx = cT (x), which both follow
from the properties of matrix multiplication. Therefore every m × n matrix induces
a linear transformation from Rn to Rm .
Second, every linear transformation is induced by some matrix. Suppose T is a
linear transformation that maps from Rn to Rm . Then we can write
\[
\begin{aligned}
T\left( \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right)
&= T\left( x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right) \\
&= x_1 T\left( \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \right) + x_2 T\left( \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} \right) + \cdots + x_n T\left( \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right) \\
&= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \\
&= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\end{aligned}
\]

(The second equality follows from the linearity of T . The fourth equality follows
from Section 2 Exercise 7.) Therefore every linear transformation T has a matrix
representation as T (x) = Ax.
Note also that
\[
T\left( \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right) = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}.
\]
So every linear transformation must have this form. From now on, we will forget
about the formal linear transformation T and instead just consider the matrix A as
a transformation from one Euclidean space to another. Note that A is completely
determined by what it does to the coordinate vectors. This follows either from
the computation above or just from matrix multiplication. For example, if
\[
A = \begin{bmatrix} 3 & -1 & 1 \\ 1 & 5 & 2 \end{bmatrix},
\quad\text{then}\quad
A \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix},\quad
A \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \end{bmatrix},\quad
A \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
\]
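This observation is easy to test numerically. The NumPy sketch below (not part of the notes) applies the $2 \times 3$ example above to each coordinate vector and prints the corresponding column of $A$.

```python
import numpy as np

# The 2x3 example above: A maps R^3 to R^2.
A = np.array([[3, -1, 1],
              [1,  5, 2]])

# The image of the j-th coordinate vector is exactly the j-th column of A.
for j in range(3):
    e = np.zeros(3)
    e[j] = 1
    print(A @ e)   # prints [3, 1], then [-1, 5], then [1, 2]
```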
Let S be a linear transformation from $R^n$ to $R^q$ and T be a linear transformation
from $R^q$ to $R^m$. Then the composition $T \circ S$ is defined to be the transformation
$(T \circ S)(x) = T(S(x))$ that takes $R^n$ to $R^m$. It is a linear transformation since
$T(S(x + y)) = T(S(x) + S(y)) = T(S(x)) + T(S(y))$ and $T(S(cx)) = T(cS(x)) =
cT(S(x))$. If S has matrix A and T has matrix B, then the question arises, what is

the matrix for the composition T ◦ S? If we compute T (S(x)) = B(Ax) = (BA)x,


we see immediately that the answer is the product matrix BA. The key to
this observation is the relation B(Ax) = (BA)x, which follows from the associativity
of matrix multiplication.
Since this result is so important, we will again compute the matrix of the com-
position, but this time directly. To find the jth column of the matrix for T ◦ S we
know that all we have to do is see what it does to the jth coordinate vector (the input
vector below has its 1 in the jth position).
\[
\begin{aligned}
T\left( S\left( \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \right) \right)
&= T\left( \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{qj} \end{bmatrix} \right) \\
&= a_{1j} T\left( \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \right) + a_{2j} T\left( \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} \right) + \cdots + a_{qj} T\left( \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right) \\
&= a_{1j} \begin{bmatrix} b_{11} \\ b_{21} \\ \vdots \\ b_{m1} \end{bmatrix} + a_{2j} \begin{bmatrix} b_{12} \\ b_{22} \\ \vdots \\ b_{m2} \end{bmatrix} + \cdots + a_{qj} \begin{bmatrix} b_{1q} \\ b_{2q} \\ \vdots \\ b_{mq} \end{bmatrix} \\
&= \begin{bmatrix} b_{11}a_{1j} + b_{12}a_{2j} + \cdots + b_{1q}a_{qj} \\ b_{21}a_{1j} + b_{22}a_{2j} + \cdots + b_{2q}a_{qj} \\ \vdots \\ b_{m1}a_{1j} + b_{m2}a_{2j} + \cdots + b_{mq}a_{qj} \end{bmatrix}
\end{aligned}
\]

This is exactly the jth column of the product matrix BA.


Now we investigate the geometry of several specific linear transformations in
order to build up our intuition. In all of the examples below, the matrix is square
and is therefore a map between Euclidean spaces of the same dimension. It can
therefore also be thought of as a map from one Euclidean space to itself.

Example 1: Let $A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$, then $A \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \end{bmatrix} = 2 \begin{bmatrix} x \\ y \end{bmatrix}$. The effect of this
matrix is to stretch every vector by a factor of 2.

Example 2: Let $A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$, then $A \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x \\ 3y \end{bmatrix}$. This matrix stretches in the
$x$-direction by a factor of 2 and in the $y$-direction by a factor of 3.
Example 3: Let $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, then $A \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$. This matrix reflects the plane
$R^2$ across the $x$-axis.

Example 4: Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, then $A \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}$. This matrix perpendicularly
projects the plane $R^2$ onto the $x$-axis.

Example 5: Let $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then $A \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $A \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$. Clearly A
rotates the coordinate vectors by 90◦, but does this mean that it rotates every vector
by this amount? Yes, as we will see in the next example.

Example 6: Let’s consider the transformation that rotates the plane R2 by an angle
θ. The first thing we must do is to show that this transformation is linear. Since any
rotation T takes the parallelogram defined by x and y to the congruent parallelogram
defined by T (x) and T (y), it takes the vertex x + y to the vertex T (x) + T (y).
Therefore it satisfies the property T (x) + T (y) = T (x + y), which is Property 1 for
linear transformations. Property 2 can be verified in the same way.

FIGURE 13: a rotation takes the parallelogram defined by $x$ and $y$ to the congruent parallelogram defined by $T(x)$ and $T(y)$.

We conclude that a rotation is a linear transformation. We are therefore justified in
asking for its matrix representation A. To find A all we have to do is to compute
where the coordinate vectors go. Clearly
\[
A \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\quad\text{and}\quad
A \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix},
\]
and therefore
\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\]

FIGURE 14: the rotation by angle $\theta$ takes $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$.
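Here is a small NumPy sketch (not in the original notes) that builds the rotation matrix from this formula; the helper function `rotation` is introduced only for illustration. Taking $\theta = 90^\circ$ reproduces Example 5, and the final check illustrates that a rotation preserves lengths.

```python
import numpy as np

def rotation(theta):
    """Matrix of the rotation of R^2 by the angle theta (in radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A = rotation(np.pi / 2)                          # 90 degrees, as in Example 5
print(np.round(A @ np.array([1.0, 0.0]), 10))    # [0. 1.]
print(np.round(A @ np.array([0.0, 1.0]), 10))    # [-1. 0.]

# A rotation preserves lengths: ||Ax|| = ||x|| for any x.
x = np.array([3.0, 4.0])
print(np.linalg.norm(A @ x), np.linalg.norm(x))  # both 5.0
```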
Example 7: Now consider reflection across an arbitrary line through the origin. A
reflection clearly takes the parallelogram defined by x and y to the congruent par-
allelogram defined by T (x) and T (y) and therefore satisfies Property 1. Property 2
can be verified in the same way.
T(x + y)

T(y)

x+y
T(x)

FIGURE 15
A reflection is therefore a linear transformation and so has a matrix representation
determined by where it takes the coordinate vectors. For example, if A reflects
$R^2$ across the line $y = x$, then $A \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $A \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and therefore
$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

Example 8: To show that a perpendicular projection of R2 onto an arbitrary line


through the origin is a linear transformation is a little more difficult. The parallelo-
gram defined by x and y is projected perpendicularly onto the line. By the congru-
ence of the two shaded triangles in the figure below we see that �T (x)� + �T (y)� =
18. Linear Transformations 107

�T (x + y)�, and since these vectors all lie on the same line and point in the same
direction, we conclude that T (x) + T (y) = T (x + y). The other two cases when the
line passes through the parallelogram or when x and y project to opposite sides of
the origin are similar. Property 2 can be verified in the same way.

FIGURE 16: the parallelogram defined by $x$ and $y$ projected perpendicularly onto a line.
A projection is therefore a linear transformation and so has a matrix representation
determined by where it takes the coordinate vectors. For example, if A is the matrix
of the projection of $R^2$ onto the line $y = x$, then $A \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$ and $A \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$,
and therefore $A = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$.

Example 9: Let $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. In this case, even though we know where the coordinate
vectors go, it is still not easy to see what the transformation does. But if we fix $y = c$,
then $A \begin{bmatrix} x \\ c \end{bmatrix} = \begin{bmatrix} x + 2c \\ c \end{bmatrix}$ shows us that the horizontal line at level $c$ is shifted $2c$ units
to the right (if $c$ is positive, to the left otherwise). This is a horizontal shear.

FIGURE 17
Example 10: Let $A = \begin{bmatrix} 4 & 2 \\ -1 & 1 \end{bmatrix}$. Again the images of the coordinate vectors do not
tell us much. It turns out that to see the geometrical effect of this matrix we will
need to compute its diagonal factorization. We will take up this approach in Section
22. Most matrices are in fact like this one or worse, requiring even more sophisticated
factorizations.

Example 11: First rotate the plane R2 by 90◦ and then reflect across the 45◦ line. This
is a typical example of the composition of two linear transformations. The rotation is
$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ (Example 5) and the reflection is $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ (Example 7). To apply
them in the correct order to an arbitrary vector x we must write B(A(x)), which by
the associativity of matrix multiplication is the same as (BA)x. So we just compute
the product
\[
BA = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},
\]
which is a reflection across the x-axis. Note that it is extremely important to perform
the multiplication in the correct order. The reverse order would result in
\[
AB = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},
\]

which is a reflection across the y-axis. This is incorrect!
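The order-of-composition point is easy to demonstrate numerically; this sketch (not part of the notes) forms both products and applies them to a sample vector.

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])   # rotation by 90 degrees (Example 5)
B = np.array([[0, 1],
              [1, 0]])    # reflection across the line y = x (Example 7)

# "Rotate first, then reflect" is B(A(x)), i.e. the matrix BA.
print(B @ A)   # [[ 1  0] [ 0 -1]]: reflection across the x-axis
print(A @ B)   # [[-1  0] [ 0  1]]: reflection across the y-axis (wrong order)

# Check on a sample vector: B(A(x)) and (BA)x agree.
x = np.array([1, 2])
print(B @ (A @ x), (B @ A) @ x)   # both [ 1 -2]
```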

EXERCISES

1. Prove that linear transformations take subspaces to subspaces.

2. Describe the geometrical effect of each of the following transformations (where
$\alpha^2 + \beta^2 = 1$ in (g) and (l)).
(a) $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$ (b) $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ (c) $\begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix}$ (d) $\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$
� 1 √ � � √ �  
3 1
− 2 3 � � 0 −1 0
2 2 2 α −β
(e) √ (f) √ (g) (h)  1 0 0 
− 2 3 1
− 3
− 1 β α
2 2 2 0 0 1
       
(i) $\begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ (j) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (k) $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}$ (l) $\begin{bmatrix} \alpha & -\beta & 0 \\ \beta & \alpha & 0 \\ 0 & 0 & -1 \end{bmatrix}$

3. Find the 3 × 3 matrix that



(a) reverses the direction of every vector.


(b) projects R3 onto the xz-plane.
(c) reflects R3 across the plane x = y.
(d) rotates R3 around the x-axis by 45◦ .

4. Find the image of the unit circle $x^2 + y^2 = 1$ under transformations induced by
the two matrices below. What are the image curves? (Hint: Let $(\bar{x}, \bar{y})$ be the image
of $(x, y)$ where $x^2 + y^2 = 1$, and find an equation satisfied by $(\bar{x}, \bar{y})$.)
(a) $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$
(b) $\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$

5. Describe how the following two matrices transform the grid consisting of horizontal
and vertical lines at each integral point of the $x$ and $y$-axes.
(a) $\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$
(b) $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

6. The matrix $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ maps $R^2$ onto the $x$-axis but is not a projection. Why?

7. In each case below find the matrix that represents the resulting transformation
and describe it geometrically.
(a) Transform R2 by first rotating by −90◦ and then reflecting in the line x + y = 0.
(b) Transform R2 by first rotating by 30◦ , then reflecting across the 135◦ line, and
then rotating by −60◦ .
(c) Transform R3 by first rotating the xy-plane, then the xz-plane, then the yz-
plane, all through 90◦ .

8. Interpret the equality


� �� � � �
cos β − sin β cos α − sin α cos (α + β) − sin (α + β)
=
sin β cos β sin α cos α sin (α + β) cos (α + β)

geometrically. Obtain the trigonometric equalities

cos (α + β) = cos α cos β − sin α sin β


sin (α + β) = sin α cos β + cos α sin β.

9. Show that the matrix that reflects $R^2$ across the line through the origin that
makes an angle $\theta$ with the $x$-axis is $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$. (Hint: Compute where the
coordinate vectors go.)

10. Show that the matrix that projects $R^2$ onto the line through the origin that
makes an angle $\theta$ with the $x$-axis is $\begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}$. (Hint: Compute where
the coordinate vectors go.)

11. Interpret the equality
\[
\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
=
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]
geometrically. Conclude that any rotation can be written as the product of two reflections.

12. Prove the converse of the result of the previous exercise, that is, prove the product
of any two reflections is a rotation. (Use the results of Exercises 8 and 9.)

13. Find the matrix that represents the linear transformation $T(x_1, x_2, x_3, x_4) =
(x_2, x_4 + 2x_3, x_1 + x_3, 2x_3)$.

14. If $T\left( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$, $T\left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ -2 \end{bmatrix}$, $T\left( \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$, then find the matrix of T.

15. If $T\left( \begin{bmatrix} 5 \\ 4 \end{bmatrix} \right) = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$ and $T\left( \begin{bmatrix} 3 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$, then find its matrix.

16. If T rotates R2 by 30◦ and dilates it by a factor of 5, then find its matrix.

17. If T reflects $R^3$ in the $xy$-plane and dilates it by a factor of $\frac{1}{2}$, then find its
matrix.

19. ROW SPACE, COLUMN SPACE, NULL SPACE

In the previous section we considered matrices as linear transformations. All of


the examples we looked at were square matrices. Now we consider rectangular ma-
trices and try to understand the geometry of the linear transformations they induce.
To do this, we define three fundamental subspaces associated with any matrix. Let
A be an m × n matrix. We view A as a map from Rn to Rm and make the following
definitions. The subspace of Rn spanned by the rows of A (thought of as column
vectors) is called the row space of A and is written row(A). The subspace of Rm
spanned by the columns of A is called the column space of A and is written col(A).
The set of vectors x in Rn such that Ax = 0 is called the null space of A and is
written null(A). In fact, null(A) is a subspace of Rn . (This follows from Section 15
where it was shown that the set of solutions to Ax = 0 is closed under addition and
scalar multiplication.)
FIGURE 18: the three fundamental subspaces: row(A) and null(A) in $R^n$, and col(A) in $R^m$.

Now we will show how to compute each of these subspaces for any given ma-
trix. By “compute these subspaces”, we mean “find bases for these subspaces.” To
illustrate, we will use the example
\[
A = \begin{bmatrix} 1 & 2 & 0 & 4 & 1 \\ 0 & 0 & 0 & 2 & 2 \\ 1 & 2 & 0 & 6 & 3 \end{bmatrix}.
\]

1. row(A): To find a basis for row(A), we use the method of Section 16 Example
2. Recall that to find a basis for a subspace spanned by a set of vectors we just
write them as rows of a matrix and then do Gaussian elimination. In this case, the
spanning vectors are already the rows of a matrix, so running Gaussian elimination
(actually Gauss-Jordan elimination) on A we obtain
\[
U = \begin{bmatrix} 1 & 2 & 0 & 0 & -3 \\ 0 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Since row(A) = row(U ), the two nonzero independent rows of U form a basis for
row(A), so
\[
\text{row}(A) \text{ has basis } \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \\ -3 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 2 \\ 2 \end{bmatrix} \right\}.
\]
2. col(A): We have just seen that A and U have the same row spaces. Do they
also have the same column spaces? No, this is not true! What is true is that the
columns of A that form a basis for col(A) are exactly those columns that correspond
to the columns of U that form a basis for col(U ). In this example they are columns
1 and 4. The reason for this is as follows: The two systems Ac = 0 and U c = 0
have exactly the same solutions. Furthermore, linear combinations of the columns
of A can be written as Ac and of U as U c. This implies that independence and
dependence relations between the columns of U correspond to independence and
dependence relations between the corresponding columns of A. Therefore, since the
pivot columns of U are linearly independent (because no such vector is a linear
combination of the vectors that precede it), the same is true of the pivot columns
of A. And likewise, since every nonpivot column of U is a linear combination of the
pivot columns, the same is true of A. That is, for the U of our example, columns 1
and 4 are independent, and any other columns are dependent on these two (Exercise
8). Therefore the same can be said of A. We conclude that

col(A) has basis \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix} \right\}.

3. null(A): We want to find a basis for all solutions of Ax = 0. But we have done
this before (Section 16 Example 3). We just solve U x = 0 and obtain

x = a \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c \begin{pmatrix} 3 \\ 0 \\ 0 \\ -1 \\ 1 \end{pmatrix}.

We conclude that

null(A) has basis \left\{ \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 0 \\ -1 \\ 1 \end{pmatrix} \right\}.
FIGURE 19: A as a map from R5 to R3: null(A) and row(A) (with their basis vectors) sit in R5, and A carries row(A) onto col(A) in R3.
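These three bases can be checked with a short computation. The following is a minimal sketch using SymPy (not part of the original notes); the basis vectors it returns may differ from ours by order or scaling, but they span the same subspaces.

    from sympy import Matrix

    # The 3 x 5 example matrix A.
    A = Matrix([[1, 2, 0, 4, 1],
                [0, 0, 0, 2, 2],
                [1, 2, 0, 6, 3]])

    U, pivots = A.rref()       # Gauss-Jordan form U and the pivot columns
    print(U)                   # the nonzero rows of U span row(A)
    print(pivots)              # pivot columns (indices 0 and 3, i.e. columns 1 and 4)
    print(A.columnspace())     # basis for col(A) taken from the pivot columns of A
    print(A.nullspace())       # basis for null(A); note 2 + 3 = 5 = number of columns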

We make a series of observations about these three fundamental subspaces.


1. From the example above, we immediately see that the number of leading variables
in U , which is called the rank of A, determines the number of vectors in the bases of
both row(A) and col(A). We therefore have

dim(col(A)) = dim(row(A)) = rank(A).

2. The number of free variables in U determines the number of vectors in the basis of
null(A). Since (the number of leading variables) + (the number of free variables) =
n, we have
dim(row(A)) + dim(null(A)) = n.
3. If x is any vector in null(A), then Ax = 0, which when written out looks like

\begin{pmatrix} \text{row 1 of } A \\ \text{row 2 of } A \\ \vdots \\ \text{row } m \text{ of } A \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Because of the way matrix multiplication works, this means that x is orthogonal
to each row of A and therefore to row(A). Therefore null(A) is the orthogonal

complement of row(A). We write row(A)⊥ = null(A) and conclude that null(A) and
row(A) are orthogonal complements of each other. (See Section 17.) This is the

reason that Figure 18 was drawn the way that it was, that is, with the line null(A)
perpendicular to the plane row(A).
4. As we have seen many times before, the equation Ax = b can be written as

x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix} = b.
This immediately says that the system Ax = b has a solution if and only if b is in
col(A). Another way of saying this is that col(A) consists of all those vectors b for
which there exists a vector x such that Ax = b, or in other words col(A) is the image
of Rn under the transformation A.
5. If x0 is a solution of the system Ax = b, then any other solution can be written
as x0 + w where w is any vector in null(A). For suppose y is another solution, then
A(y − x0 ) = Ay − Ax0 = b − b = 0 ⇒ y − x0 = w where w is some vector in null(A),
so we have y = x0 + w. Note that when we solve Ax = b by Gaussian elimination,
we get all solutions expressed in this form automatically.
6. Suppose null(A) = {0}, that is, the null space of A consists of only the zero
vector. (In this case we say that the null space is trivial, not empty. A null space can
never be empty. It must always contain at least the zero vector.) Then A has several
important properties which we summarize in a theorem:
Theorem. For any matrix A the following statements are equivalent.
(a) null(A) = {0}
(b) A is one-one (that is, A takes distinct vectors to distinct vectors).
(c) If Ax = b has a solution x, it must be unique.
(d) A takes linearly independent sets to linearly independent sets.
(e) The columns of A are linearly independent.
Proof: We prove (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a)
(a) ⇒ (b): x �= y ⇒ x − y �= 0 ⇒ Ax − Ay = A(x − y) �= 0 ⇒ Ax �= Ay.
(b) ⇒ (c): Suppose Ax = b and Ay = b, then Ax = Ay ⇒ x = y.
(c) ⇒ (d): If v1 , v2 , . . . , vn are linearly independent, then c1 Av1 + c2 Av2 + · · · +
cn Avn = 0 ⇒ A(c1 v1 + c2 v2 + · · · + cn vn ) = 0 = A0 ⇒ c1 v1 + c2 v2 + · · · + cn vn =
0 (by the uniqueness in (c)) ⇒ c1 = c2 = · · · = cn = 0.
(d) ⇒ (e): A maps the set of coordinate vectors, which are independent, to the set
of its own columns, which therefore must also be independent.
(e) ⇒ (a): The equation Ax = 0 can be interpreted as a linear combination of the
columns of A equaling zero. Since the columns of A are independent, this can happen
only if x = 0. This ends the proof.

If A is a square matrix, this theorem can be combined with the theorem of Section
9 as follows.

Theorem. For an n × n matrix A the following statements are equivalent.


(a) A is nonsingular
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) null(A) = {0}
(e) det(A) �= 0.
(f) A has rank n.
(g) The columns of A are linearly independent.
(h) The rows of A are linearly independent.
Proof: From the theorem of Section 8 we have the equivalence of (a), (b), (c), (d),
and (e). Then (d) ⇔ (g) follows from the previous theorem, and (f) ⇔ (g) ⇔ (h) is
obvious from dim(row(A)) = dim(col(A)) = rank(A).

EXERCISES

1. For each matrix below find bases for the row, column, and null spaces and fill in the
blanks in the sentence “As a linear transformation, A maps from dimensional
Euclidean space to dimensional Euclidean space and has rank equal to .”
� �
1 2
(a)
2 4
� �
1 2
(b)
2 3
 
2 4 2
(c)  0 4 2 
2 8 4
 
3 2 −1
 6 3 5 
(d)  
−3 −1 8
0 −1 7
 
1 2 −1 −4 1
(e)  2 4 −1 −3 5 
3 6 −3 −12 3
 
2 8 4 0 0
 2 7 2 1 −2 
(f)  
−2 −6 0 −1 6
0 2 4 −2 4
 
1
2. The 3 × 3 matrix A has null space generated by the vector 1  and column space

1
equal to the xy-plane.
   
−3 −3
(a) Is  −3  in null(A)? What does A  −3  equal?
−3 −3
 
−3
(b) Is 13  in col(A)? Is it in the image of A?

0
 
−5
(c) Is Ax =  −5  solvable?
2
 
−4
(d) Is  6  in row(A)?
−2
 
1
3. The 2 × 3 matrix A has row space generated by the vector 2  and column space

� � 9
2
generated by the vector
  −1
−2
(a) Is  −4  in row(A)?
−8
 
−2
(b) Is  −1  in null(A)?
2
(c) Find a basis for null(A).
� �
−3
(d) Is in col(A)?
3
� �
−4
(e) Is Ax = solvable?
2

4. Describe the row, column, and null spaces of the following kinds of transformations
of R2 .
(a) rotations
(b) reflections
(c) projections

5. For each case below explain why it is not possible for a matrix to exist with the
stated properties.  
1
(a) Row space and null space both contain the vector 2 .

  3  
3 1
(b) Column space has basis  2  and null space has basis  3 .
1 1
(c) Column space = R4 and row space = R3 .

6. Show that if null(A) = {0}, then A takes subspaces into subspaces of the same
dimension. In particular, A takes all of Rn into an n-dimensional subspace of Rm .

7. Prove the following assertions for an m × n matrix A.


(a) rank(A) ≤ n and m.
(b) If rank(A) = n, then n ≤ m (A is tall and skinny) and A is one-one.
(c) If rank(A) = m, then n ≥ m (A is short and fat) and Ax = b has at least one
solution for any b.
 
1 2 0 0 −3
8. Show directly from the definition that columns 1 and 4 of  0 0 0 2 2 
0 0 0 0 0
are linearly independent, while columns 1, 4, and any other columns are linearly
dependent.

9. Write down all possible row echelon forms for 2 × 3 matrices.

10. Give examples of matrices A such that
(a) null(A) = span{(1, 2, 3)}
(b) null(A) = span{(1, 2, 3)}⊥
(c) col(A) = span{(1, 2, 3)}
(d) A is 4 × 5 and dim(null(A)) = 3

20. LEAST SQUARES AND PROJECTIONS

When a scientist wants to fit a mathematical model to data, he often samples a


greater number of data points than the number of unknowns in the model. The result
is an overdetermined inconsistent system (one with more equations than unknowns
and no solution). We illustrate this situation with the following two examples.

Example 1: We want to fit a straight line y = c + dx to the data (0, 1), (1, 4), (2, 2),
(3, 5). This means we must find the c and d that satisfy the equations

c+d·0=1
c+d·1=4
c+d·2=2
c+d·3=5

or the system

\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 2 \\ 5 \end{pmatrix}.
This is an example of a curve fitting problem.
FIGURE 20: The data points (0, 1), (1, 4), (2, 2), (3, 5) and the line y = c + dx.

Example 2: Suppose we have experimentally determined the molecular weights of


the following six oxides of nitrogen:

NO N2 O NO2 N2 O3 N2 O5 N2 O4
30.006 44.013 46.006 76.012 108.010 92.011

We want to use this information to compute the atomic weights of nitrogen and
oxygen as accurately as possible. This means that we must find the N and O that
satisfy the equations
1 · N + 1 · O = 30.006
2 · N + 1 · O = 44.013
1 · N + 2 · O = 46.006
2 · N + 3 · O = 76.012
2 · N + 5 · O = 108.010
2 · N + 4 · O = 92.011
or the system

\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 2 \\ 2 & 3 \\ 2 & 5 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} N \\ O \end{pmatrix} = \begin{pmatrix} 30.006 \\ 44.013 \\ 46.006 \\ 76.012 \\ 108.010 \\ 92.011 \end{pmatrix}.

Each of these problems requires the solution of an overdetermined system Ax =


b. We know that a system can have no solution, one solution, or infinitely many
solutions. But in practice, when a system with more equations than unknowns arises
from experimental data, it is extremely unlikely that the second or third cases will
occur. We are therefore faced with the problem of “solving” an overdetermined
inconsistent system of equations – an impossibility!
Since there is no hope of finding a solution to the system in the normal sense,
the only thing we can do is to find x’s that satisfy Ax ≈ b. The “best” x would be the
one that makes this approximate equality as close to an exact equality as possible.
To give meaning to this last statement, we rewrite the system as Ax − b ≈ 0. The
left-hand side of this “equation” is a vector. Our goal then is to find an x that makes
this vector as close to zero as possible, or, in other words, as small as possible. Since
we measure the size of a vector by its length, we come to a formulation of the least
squares problem for Ax = b: Find the vector x that makes �Ax − b� as small as
possible. The vector x that does this is called the least squares solution to Ax = b.
If we write out ‖Ax − b‖ for Example 1 above we get

\left\| \begin{pmatrix} c + 0d - 1 \\ c + 1d - 4 \\ c + 2d - 2 \\ c + 3d - 5 \end{pmatrix} \right\| = \sqrt{(c + 0d - 1)^2 + (c + 1d - 4)^2 + (c + 2d - 2)^2 + (c + 3d - 5)^2}.
Each term under the square root can be interpreted as the square of the vertical
distance by which the line y = c + dx misses each data point. Our goal is to minimize
the sum of the squares of these errors. This is why such problems are called least
squares problems. (In statistics they are called linear regression problems.)

How do we find the x that minimizes ‖Ax − b‖? First we view A as a map from Rn to Rm . Then b and col(A) both lie in Rm . Note that b does not lie in col(A), otherwise Ax = b would be solvable exactly. The matrix A takes vectors x to vectors Ax in col(A).
FIGURE 21: A maps x in Rn to Ax in col(A) inside Rm; the error vector Ax − b runs from Ax to b.

Our problem is to find the Ax that makes Ax − b as short as possible, or said another
way, to find a vector of the form Ax that is as close to b as possible. Intuitively
this occurs when Ax − b is orthogonal to col(A). (For a proof see Exercise 10.) And
this holds if and only if Ax − b is orthogonal to the columns of A, that is, if the
dot product of Ax − b with each column of A is zero. If we write the columns of A
horizontally, we can express these conditions all at once as

\begin{pmatrix} \text{col 1 of } A \\ \text{col 2 of } A \\ \vdots \\ \text{col } n \text{ of } A \end{pmatrix} (Ax - b) = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

This is just AT (Ax − b) = 0, which can be rewritten as AT Ax − AT b = 0 or as

AT Ax = AT b.

These are called the normal equations for the least squares problem Ax = b. They
form an n × n linear system that can be solved by Gaussian elimination. We sum-
marize: The least squares solution to the overdetermined inconsistent linear system
Ax ≈ b is defined to be that vector x that minimizes the length of the vector Ax − b.
It is found as the exact solution to the normal equations AT Ax = AT b. We can now
solve the two problems at the beginning of this section.

Example 1 again: The normal equations for this problem are

\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 4 \\ 2 \\ 5 \end{pmatrix}

or multiplied out are

\begin{pmatrix} 4 & 6 \\ 6 & 14 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 12 \\ 23 \end{pmatrix}

and the solution by Gaussian elimination is

\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 1.5 \\ 1 \end{pmatrix}.

So the best fit line in the least squares sense is y = 1.5 + x.
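As a numerical check, the normal equations for this example can be set up and solved directly. This is a minimal NumPy sketch (not part of the original notes).

    import numpy as np

    A = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
    b = np.array([1., 4., 2., 5.])

    # Normal equations A^T A x = A^T b.
    x = np.linalg.solve(A.T @ A, A.T @ b)
    print(x)                                    # [1.5, 1.0]: the line y = 1.5 + x

    # The library least squares routine solves the same problem directly.
    print(np.linalg.lstsq(A, b, rcond=None)[0])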

Example 2 again: The normal equations for this problem are

\begin{pmatrix} 1 & 2 & 1 & 2 & 2 & 2 \\ 1 & 1 & 2 & 3 & 5 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 2 \\ 2 & 3 \\ 2 & 5 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} N \\ O \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 2 & 2 & 2 \\ 1 & 1 & 2 & 3 & 5 & 4 \end{pmatrix} \begin{pmatrix} 30.006 \\ 44.013 \\ 46.006 \\ 76.012 \\ 108.010 \\ 92.011 \end{pmatrix}

or multiplied out are

\begin{pmatrix} 18 & 29 \\ 29 & 56 \end{pmatrix} \begin{pmatrix} N \\ O \end{pmatrix} = \begin{pmatrix} 716.104 \\ 1302.161 \end{pmatrix}

and the solution by Gaussian elimination is

\begin{pmatrix} N \\ O \end{pmatrix} = \begin{pmatrix} 14.0069 \\ 15.9993 \end{pmatrix}.

It is clear that the matrix AT A is square and symmetric (see Section 2 Exercise
6(e)). But when we said that the least squares solution is the solution of the normal
equations, we were implicitly assuming that the normal equations could be solved,
that is, that AT A is nonsingular. This is true if the columns of A are independent,
because in that case we have AT Ax = 0 ⇒ xT AT Ax = 0 ⇒ (Ax)T (Ax) = 0 ⇒
�Ax�2 = 0 ⇒ Ax = 0 ⇒ x = 0. But if the columns of A are not independent, then
AT A will be singular. In fact, for large scale problems AT A is usually singular, or is
so close to being singular that Gaussian elimination tends to give very inaccurate an-
swers. For such problems it is necessary to use more numerically stable methods such
as the QR factorization (see the next section) or the singular value decomposition.

In solving the least squares problem, we have inadvertently found the solution
to a seemingly unrelated problem: the computation of projection matrices. From
our geometrical considerations, the vector p = Ax is the orthogonal projection of the
vector b onto the subspace col(A). Solving the normal equations for x we obtain x =
(AT A)−1 AT b, and putting this expression back into p we obtain p = A(AT A)−1 AT b.
Therefore, to find the projection of any vector b onto col(A), we simply multiply b
by the matrix P = A(AT A)−1 AT . We conclude that

P = A(AT A)−1 AT

is the matrix that projects Rm onto the subspace col(A).

Example 3: Find the matrix that projects R3 onto the plane spanned by the vectors (1, 0, 1) and (2, 1, 1). First line up the two vectors (in any order) to form the matrix

A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix},

and then compute

P = A(A^T A)^{-1} A^T = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \left( \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \right)^{-1} \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \end{pmatrix}

= \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 3 & 6 \end{pmatrix}^{-1} \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 2/3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \end{pmatrix}

= \begin{pmatrix} 2/3 & 1/3 & 1/3 \\ 1/3 & 2/3 & -1/3 \\ 1/3 & -1/3 & 2/3 \end{pmatrix}.

Just as in the case of least squares, the columns of A must be independent for this
to work; that is, the two given vectors must form a basis for the subspace to be
projected onto.
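The computation of Example 3 is easy to reproduce numerically. A minimal NumPy sketch (not part of the original notes):

    import numpy as np

    A = np.array([[1., 2.], [0., 1.], [1., 1.]])       # columns span the plane
    P = A @ np.linalg.inv(A.T @ A) @ A.T                # P = A (A^T A)^{-1} A^T
    print(P)             # [[2/3, 1/3, 1/3], [1/3, 2/3, -1/3], [1/3, -1/3, 2/3]]

    b = np.array([1., 1., 1.])
    print(P @ b)                                        # projection of b onto the plane
    print(np.allclose(P @ P, P), np.allclose(P.T, P))   # P^2 = P and P^T = P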

Note that P in the example above is symmetric. It turns out that this is true of any
projection matrix (Exercise 9(a)). Furthermore, projection matrices also satisfy the
property P 2 = P (Exercise 9(a)). These observations also go in the other direction;

that is, any matrix P that satisfies P T = P and P 2 = P is the projection matrix
of Rm onto col(P ). We need only verify that P x − x is orthogonal to col(P ) for
any vector x. We check all the required dot products at once with the computation
P T (P x − x) = P (P x − x) = P 2 x − P x = P x − P x = 0.
Projection matrices can be used to compute reflection matrices. First we have
to precisely define what we mean by a reflection. Let S be a subspace of Rm .
Any vector x can be written as x = P x + (x − P x) where P x is the projection
of x onto S and x − P x is the component of x orthogonal to S. If we reverse the
direction of x − P x we get a new vector y = P x − (x − P x) which we define to
be the reflection of x across the subspace S. Note that y can then be written as
y = P x − x + P x = 2P x − x = (2P − I)x, and therefore the matrix R = 2P − I
reflects Rm across the subspace S.

FIGURE 22: The projection Px of x onto S, the orthogonal component x − Px, and the reflection Px − (x − Px) of x across S.
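A minimal sketch (not from the notes) of the reflection R = 2P − I across the plane of Example 3, reusing the projection matrix computed above:

    import numpy as np

    A = np.array([[1., 2.], [0., 1.], [1., 1.]])
    P = A @ np.linalg.inv(A.T @ A) @ A.T
    R = 2 * P - np.eye(3)                  # reflects R^3 across the plane col(A)

    x = np.array([1., 1., 1.])
    print(R @ x)                           # the mirror image of x across the plane
    print(np.allclose(R @ R, np.eye(3)))   # reflecting twice gives the identity
    print(np.allclose(R.T, R))             # R is symmetric (compare Exercise 9 below)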

The equation x = P x + (x − P x) above also shows that any vector x can be re-
solved into a component in S and a component in S ⊥ . Furthermore, since orthogonal
vectors are linearly independent (Section 17 Exercise 11), this resolution is unique.
From this we can see more precisely how any matrix A behaves as a linear transfor-
mation from one Euclidean space to another. Let S = null(A), so that S ⊥ = row(A).
Then any vector x can be expressed uniquely as x = n + r, where n is in null(A) and
r is in row(A). Applying A to x we obtain Ax = An + Ar = 0 + Ar. This shows that
A essentially projects x onto r in row(A) and then maps r to a unique vector Ar
in col(A). Any matrix can therefore be visualized as a projection onto its row space
followed by a one-one linear transformation of its row space onto its column space.

EXERCISES

1. Solve Ax = b in the least squares sense for the two cases below.
   
1 0 5
0 1 4
(a) A =   and b =  
1 1 6
1 2 4

   
1 4 −1 −1
2 3 1   2 
(b) A =   and b =  
0 3 1 −1
1 2 −1 1

2. For each case below find the line or surface of the indicated type that best fits the
given data in the least squares sense.
(a) y = ax: (1, 5), (2, 3), (−1, 3), (3, 4), (0, 1)
(b) y = a + bx: (0, 0), (1, 1), (3, 2), (4, 5)
(c) z = a + bx + cy: (0, 1, 6), (1, 0, 5), (0, 0, 1), (1, 1, 6)
(d) z = a + bx2 + cy 2 : (0, 1, 10), (0, 2, 5), (−1, 1, 20), (1, 0, 15)
(e) y = a + bt + ct2 : (1, 5), (0, −6), (2, 8), (−1, 5)
(f) y = a + b cos t + c sin t: (0, 3), ( π2 , 5), (− π2 , 3), (π, −3)

3. We want to use the following molecular weights of sulfides of copper and iron to
compute the atomic weights of copper, iron, and sulfur.
Cu2 S CuS FeS Fe3 S4 Fe2 S3 FeS2
159.15 95.61 87.92 295.81 207.90 119.98
Express this problem as an overdetermined linear system. Write down the normal
equations. Do not solve them!

4. Find the projection matrices for the indicated subspaces below.


1
(a) R2 onto the line generated by .
3
 
2
(b) R3 onto the line generated by  1 .
2
   
1 −2
(c) R3 onto the plane spanned by  1  ,  1 .
1 1
   
1 1
(d) R3 onto the plane spanned by  0  ,  1 .
1 1
   
1 0
1 0
(e) R4 onto the plane spanned by   ,  .
0 1
1 0
 
1
5. Find the projection of the vector 2  onto the plane in Exercise 4(c) above.

3

6. Find the reflection matrix of R3 across the plane in Exercise 4(c) above.

7. Find the projection matrices for the indicated subspaces below.


(a) R2 onto the line y = 2x.
(b) R3 onto the plane x − y − 2z = 0.

8. Show that as transformations the matrices below have the following geometric
interpretations.
� �
−1 0
(a) (i) Reflection through the origin, (ii) rotation by π radians, (iii)
0 −1
reflection across the x-axis and reflection across the y-axis.
 
−1 0 0
(b)  0 −1 0  (i) Reflection across the z-axis and (ii) rotation by π radians
0 0 1
around the z-axis.
 
−1 0 0
(c)  0 −1 0  (i) Reflection through the origin and (ii) rotation by π radians
0 0 −1
around the z-axis and reflection across the xy-plane.

9. Use matrix algebra to prove


(a) if P = A(AT A)−1 AT , then P T = P and P 2 = P .
(b) if R = 2P − I, then RT = R and R2 = I.

10. If S is a subspace of Rn , b is a vector not in S, and w is a vector in S such that


b − w is orthogonal to S, then show �b − w� ≤ �b − z� where z is any other vector in
S. (Use the Pythagorean Theorem on the right triangle with sides b − w and z − w.)
Conclude that w is the unique point in S closest to b.
FIGURE 23: The vector w in S closest to b; the difference b − w is orthogonal to S.

21. ORTHOGONAL MATRICES, GRAM-SCHMIDT, AND QR FACTORIZATION

A set of vectors q1 , q2 , · · · , qn is orthogonal if every pair of vectors in the set is


orthogonal, that is, qi · qj = 0 for i �= j. Furthermore the set is orthonormal if all
the vectors in the set are unit vectors, that is, �q1 � = �q2 � = · · · = �qn � = 1. We
know that such a set of vectors is linearly independent (Section 17 Exercise 11). We
say that it forms an orthogonal or orthonormal basis (whichever the case) for the
subspace that it spans.
Example 1: In R2 the coordinate vectors (1, 0) and (0, 1) form an orthonormal basis, while the vectors (3, 4) and (4, −3) form an orthogonal basis. If we divide the second two vectors by their lengths to make them unit vectors (this is called normalizing the vectors), we obtain the orthonormal basis (3/5, 4/5) and (4/5, −3/5). Since we have a basis, we should be able to express any vector in R2 as a linear combination of these two vectors. Suppose, for example, we want to write (2, 7) = c (3/5, 4/5) + d (4/5, −3/5). As we have done many times before, we rewrite this as

\begin{pmatrix} 3/5 & 4/5 \\ 4/5 & -3/5 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 2 \\ 7 \end{pmatrix}

and solve by Gaussian elimination. But this time the coefficient matrix has a special form: its columns are orthonormal. We will see in a moment that this fact will enable us to solve the system much more easily than by using Gaussian elimination.

We say that a square matrix Q is an orthogonal matrix if its columns are orthonormal.
(It is not called an orthonormal matrix even though that might make more sense.)
Clearly the columns of Q are orthonormal if and only if QT Q = I, which can therefore
be taken as the defining condition for a matrix to be orthogonal.

Example 2: Here are some orthogonal matrices. These are especially nice ones be-
cause they don’t involve square roots.
 2 2   3 6 
�3 4
� 3 3 − 13 −7 2
7 7
5 5  2 − 1 2   6 3 2 
4 3  3 3 3   7 7 7 
5 −5 1 2 2 2
−3 3 3 7 − 7 37
6

 1 
4 8 1   10 10 5  2 − 12 − 12 − 12
−9 −9  −1
7
9
4 4   10
15 15
11
15
2   2
1
2 − 12 − 12 

9 9 − 9   15 − 15 15   1 1 
4 1 8 5 2 14  − 2 − 12 1
2 −2 
9 9 9 − 15 − 15 15
− 12 − 12 − 12 1
2

Now we make a series of observations about orthogonal matrices.


1. From the defining condition for an orthogonal matrix QT Q = I we immediately
have Q−1 = QT . This suggests that to solve a system Qx = b with an orthogonal
coefficient matrix like that in Example 1 above, we just multiply both sides by QT to
obtain QT Qx = QT b or x = QT b. Thus a linear system with an orthogonal coefficient
matrix can be solved by a simple matrix multiplication. (A short computational sketch of this appears after these observations.)
2. Since QT is the inverse of Q, we also have QQT = QQ−1 = I. This immediately says that the rows of an orthogonal matrix are orthonormal as well as the columns!
3. The matrix Q = \begin{pmatrix} 2/3 & 2/3 \\ 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix} has orthonormal columns but is not an orthogonal matrix because it is not square. Note that QT Q = I but QQT ≠ I. Check it!
4. As a transformation an orthogonal matrix Q preserves length, distance, dot prod-
uct, and angles. Let’s consider each separately.
(a) length: �Qx�2 = (Qx)T (Qx) = xT QT Qx = xT x = �x�2 ⇒ �Qx� = �x�.
(b) distance: From (a) and �Qx − Qy� = �Q(x − y)� = �x − y�.
(c) dot product: Qx · Qy = (Qx)T (Qy) = xT QT Qy = xT y = x · y.
(d) angles: The angle between Qx and Qy is given by arccos((Qx · Qy)/(�Qx��Qy�))
which from (a) and (c) equals arccos((x · y)/(�x��y�)) which is the angle between x
and y.
5. If a matrix Q preserves length, it must be orthogonal. This is the converse of 4(a)
above. Since Q preserves length, it preserves distance (as in 4(b) above). By the
SSS congruence theorem of Euclidean geometry this implies that Q takes triangles
into congruent triangles and therefore preserves angles. Another way to prove this
is to show that Q must preserve dot products and, since angles can be expressed in
terms of the dot product, must preserve angles also. (See Exercise 18 where even
more is proved.) Since Q preserves lengths and angles, it takes orthonormal sets
into orthonormal sets. In particular Q takes the coordinate vectors of Rn into an or-
thonormal set, but this set consists of the columns of Q. Therefore Q has orthonormal
columns and so is an orthogonal matrix.
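To illustrate observation 1, here is a minimal sketch (not part of the notes) that solves the system of Example 1 by a single multiplication by QT.

    import numpy as np

    Q = np.array([[3/5,  4/5],
                  [4/5, -3/5]])          # orthogonal: its columns are orthonormal
    b = np.array([2., 7.])

    x = Q.T @ b                          # x = Q^T b solves Qx = b, since Q^T Q = I
    print(x)                             # the coefficients c and d of Example 1
    print(np.allclose(Q @ x, b))         # check the solution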
We leave orthogonal matrices for a moment and consider a seemingly unrelated
problem: Given a basis v1 , v2 , . . . , vn of a subspace V , find an orthonormal basis
q1 , q2 , . . . , qn for V . We will use an example to illustrate a method for doing this.
For simplicity, instead of a subspace, we will take all of R3 . Suppose we are given
the following basis for R3 :

v_1 = \begin{pmatrix} -2 \\ -2 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 2 \\ 8 \\ 2 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 7 \\ 7 \\ 1 \end{pmatrix}.

We will first find an orthogonal basis p1 , p2 , p3 , and then normalize it to get the

orthonormal basis q1 , q2 , q3 . The first step is to set p1 = v1 :


 
−2
p1 =  −2  .
1
The second step is to find a vector p2 that is orthogonal to p1 and such that
span{p1 , p2 } = span{v1 , v2 }. We can accomplish this by defining p2 to be the com-
ponent of v2 orthogonal to p1 . Just subtract from v2 its projection onto p1 :
v2 · p1
p2 = v2 − p1
p 1 · p1
   
2 −2
−18 
= 8 − −2 
9
2 1
 
−2
= 4 .

4

FIGURE 24: The Gram-Schmidt process: v1 = p1, and p2 and p3 are the components of v2 and v3 orthogonal to the preceding p’s.

The third step is to find a vector p3 that is orthogonal to p1 and p2 and such that
span{p1 , p2 , p3 } = span{v1 , v2 , v3 }. We can accomplish this by defining p3 to be the

component of v3 orthogonal to span{p1 , p2 }. Just subtract from v3 its projection


onto span{p1 , p2 }. To find this projection we don’t have to compute a projection
matrix as might be expected. All we have to do is to subtract off the projection of
v3 onto p1 and p2 separately. (This works because p1 and p2 are orthogonal. See
Exercise 12.)

p_3 = v_3 - \frac{v_3 \cdot p_1}{p_1 \cdot p_1}\, p_1 - \frac{v_3 \cdot p_2}{p_2 \cdot p_2}\, p_2 = \begin{pmatrix} 7 \\ 7 \\ 1 \end{pmatrix} - \frac{-27}{9} \begin{pmatrix} -2 \\ -2 \\ 1 \end{pmatrix} - \frac{18}{36} \begin{pmatrix} -2 \\ 4 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}.

At each stage the p’s and the v’s are just linear combinations of each other, so we have span{p1 } = span{v1 }, span{p1 , p2 } = span{v1 , v2 }, and span{p1 , p2 , p3 } = span{v1 , v2 , v3 }. Finally we normalize the p’s to obtain the orthonormal q’s:

q_1 = \begin{pmatrix} -2/3 \\ -2/3 \\ 1/3 \end{pmatrix}, \quad q_2 = \begin{pmatrix} -1/3 \\ 2/3 \\ 2/3 \end{pmatrix}, \quad q_3 = \begin{pmatrix} 2/3 \\ -1/3 \\ 2/3 \end{pmatrix}.
The method that we have just illustrated is called the Gram-Schmidt process. It
should be clear how to extend it to larger numbers of vectors.
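The process is also easy to express as a short program. The following is a minimal NumPy sketch (not from the notes); it subtracts the projections one at a time, and it reproduces the q’s found above, possibly up to sign.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors."""
        qs = []
        for v in vectors:
            p = v.astype(float).copy()
            for q in qs:
                p -= (p @ q) * q          # remove the component along each earlier q
            qs.append(p / np.linalg.norm(p))
        return qs

    v1 = np.array([-2., -2., 1.])
    v2 = np.array([ 2.,  8., 2.])
    v3 = np.array([ 7.,  7., 1.])
    for q in gram_schmidt([v1, v2, v3]):
        print(q)      # (-2/3, -2/3, 1/3), (-1/3, 2/3, 2/3), (2/3, -1/3, 2/3)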
We can also express the result of the Gram-Schmidt process in terms of matrices.
First note that

v1 is in span{q1 }
v2 is in span{q1 , q2 }
v3 is in span{q1 , q2 , q3 }.

Using matrices this can be written as


 . .. ..   .. .. ..   
.. . . . . . ∗ ∗ ∗
   
 v1 v2 v3  =  q1 q2 q3   0 ∗ ∗ 
.. .. .. .. .. .. 0 0 ∗
. . . . . .
If we define A to be the matrix with columns v1 , v2 , v3 , Q to be the matrix with
columns q1 , q2 , q3 , and R to be the appropriate upper triangular matrix, then we
have A = QR. We can interpret this as a factorization of the matrix A into an
orthogonal matrix times an upper triangular matrix. For our example this looks like

\begin{pmatrix} -2 & 2 & 7 \\ -2 & 8 & 7 \\ 1 & 2 & 1 \end{pmatrix} = \begin{pmatrix} -2/3 & -1/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \\ 1/3 & 2/3 & 2/3 \end{pmatrix} \begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{pmatrix}.

It is easy to find R. Just multiply the equation A = QR by QT on the left to obtain


R = QT A:

\begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{pmatrix} = \begin{pmatrix} \cdots & q_1 & \cdots \\ \cdots & q_2 & \cdots \\ \cdots & q_3 & \cdots \end{pmatrix} \begin{pmatrix} | & | & | \\ v_1 & v_2 & v_3 \\ | & | & | \end{pmatrix} = \begin{pmatrix} -2/3 & -2/3 & 1/3 \\ -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \end{pmatrix} \begin{pmatrix} -2 & 2 & 7 \\ -2 & 8 & 7 \\ 1 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 3 & -6 & -9 \\ 0 & 6 & 3 \\ 0 & 0 & 3 \end{pmatrix}.
We finally have

\begin{pmatrix} -2 & 2 & 7 \\ -2 & 8 & 7 \\ 1 & 2 & 1 \end{pmatrix} = \begin{pmatrix} -2/3 & -1/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \\ 1/3 & 2/3 & 2/3 \end{pmatrix} \begin{pmatrix} 3 & -6 & -9 \\ 0 & 6 & 3 \\ 0 & 0 & 3 \end{pmatrix}.

This shows that any square matrix A with independent columns has a factorization
A = QR into an orthogonal Q and an upper triangular R. In fact, we can make an
even more general statement. Suppose that we had started with the matrix

B = \begin{pmatrix} -2 & 2 \\ -2 & 8 \\ 1 & 2 \end{pmatrix}.

Then we would have had the factorization

\begin{pmatrix} -2 & 2 \\ -2 & 8 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} -2/3 & -1/3 \\ -2/3 & 2/3 \\ 1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 3 & -6 \\ 0 & 6 \end{pmatrix}.

We see that B = QR where now Q has orthonormal columns but is not orthogonal!
Fortunately QT Q = I is still true so the method above to find R still works. We
conclude that any matrix A with independent columns has a factorization of the
form A = QR where Q has orthonormal columns and R is upper triangular. This is
called the QR factorization and is the third great matrix factorization that we have
seen (after the LU and diagonal factorizations). Actually, it is possible to obtain
a QR-like factorization for any matrix whatever, but we will stop here. Note that
the Gram-Schmidt process, on which all this is based, is the first truly new computational
technique we have had since we first introduced Gaussian elimination! In fact, there
are efficient algorithms that can perform Gram-Schmidt in 2n^3/3 operations, which makes it competitive with Gaussian elimination in many situations.
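Library routines compute this factorization directly. A minimal NumPy sketch (not part of the notes); np.linalg.qr may return Q and R with the signs of some columns and rows flipped, which is still a valid QR factorization.

    import numpy as np

    A = np.array([[-2., 2., 7.],
                  [-2., 8., 7.],
                  [ 1., 2., 1.]])

    Q, R = np.linalg.qr(A)
    print(Q)                               # orthonormal columns (signs may differ from the text)
    print(R)                               # upper triangular
    print(np.allclose(Q @ R, A))           # A = QR
    print(np.allclose(Q.T @ Q, np.eye(3))) # Q^T Q = I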
The QR factorization has a wide range of applications. We mention two. For the
first, recall an overdetermined inconsistent system Ax = b has a least squares solution
given by the normal equations AT Ax = AT b. Suppose we have the QR factorization
A = QR. Then plugging into the normal equations we obtain (QR)T QRx = (QR)T b
or RT QT QRx = RT QT b or RT Rx = RT QT b. Since RT is nonsingular (it’s triangular
with nonzeros down its diagonal), we can multiply through by (RT )−1 to obtain

Rx = QT b.

This equation is another matrix expression of the normal equations. Since R is upper
triangular, it can be solved simply by back substitution. Of course, most of the work
was done in finding the QR factorization of A in the first place. In practice the QR
method is preferable to solving the normal equations directly since the Gram-Schmidt
process for finding the QR factorization is more numerically stable than Gaussian
elimination.

Example 3: Recall the system

\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 2 \\ 5 \end{pmatrix}

from the line fitting problem of Section 20. We find the QR factorization of the coefficient matrix:

\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 1/2 & -3/\sqrt{20} \\ 1/2 & -1/\sqrt{20} \\ 1/2 & 1/\sqrt{20} \\ 1/2 & 3/\sqrt{20} \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & \sqrt{5} \end{pmatrix}.

This gives the normal equations in the form

\begin{pmatrix} 2 & 3 \\ 0 & \sqrt{5} \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ -3/\sqrt{20} & -1/\sqrt{20} & 1/\sqrt{20} & 3/\sqrt{20} \end{pmatrix} \begin{pmatrix} 1 \\ 4 \\ 2 \\ 5 \end{pmatrix} = \begin{pmatrix} 6 \\ \sqrt{5} \end{pmatrix}.

The solution is c = 1.5 and d = 1 as before.
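A minimal sketch (not from the notes) of this QR approach to the line-fitting problem:

    import numpy as np

    A = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
    b = np.array([1., 4., 2., 5.])

    Q, R = np.linalg.qr(A)            # reduced QR: Q is 4x2 with orthonormal columns
    x = np.linalg.solve(R, Q.T @ b)   # solve Rx = Q^T b (R is upper triangular, so in
                                      # practice plain back substitution is enough)
    print(x)                          # [1.5, 1.0] as before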



The second application of the QR factorization is to computing projection ma-


trices. If we have the QR factorization A = QR, the projection matrix P of Rm onto
col(A) becomes
P = A(AT A)−1 AT
= QR((QR)T (QR))−1 (QR)T
= QR(RT QT QR)−1 RT QT
= QR(RT R)−1 RT QT
= QRR−1 (RT )−1 RT QT
= QQT .
So the projection matrix assumes a very simple form: P = QQT . Of course, again
all the work has been done earlier in finding the QR factorization of A.

Example 4: Suppose
 we
 want
 the projection matrix P of R3 onto the subspace
1 2
spanned by 0 and 1 . (This is Section 20 Example 3.) We construct the
  
1 1
matrix A with these two vectors as its columns and find its QR factorization:
 √1 √1

  √ 
1 2 
2 6
 2 √3
2
0 1 =  0 √2   √ .
 6 
0 √3
1 1 √1 − √1 2
2 6

Then the projection matrix is


 √1 √1

� √1 �

2 6
 0 √1
P = 0 √2
 2 2
 6  √1 √2 − √16
√1 6 6
2
− √16
 2 1 1 
3 3 3
 1 2 1 
= 3 3 − 3 .
1 1 2
3 −3 3

EXERCISES

1. Use the Gram-Schmidt process to orthonormalize the following sets of vectors.


� � � �
5 −22
(a) ,
12 −19
    
−3 1 1
(b)   
6 , −9   5 
2 4 11
     
1 1 0 1
 −1   −1   −2   0 
(c)  ,   
−1 1 0 0
−1 1 −2 −1
   
−10 20
(d)  11  ,  −7 
2 26
    
1 2 3
 −1   −2   0 
(e)  ,  
−1 −1 −1
−1 −1 2

2. Find the QR factorizations of the following matrices.


� �
5 −22
(a)
12 −19
 
−3 1 1
(b)  6 −9 5 
2 4 11
 
1 1 0 1
 −1 −1 −2 0 
(c)  
−1 1 0 0
−1 1 −2 −1
 
−10 20
(d)  11 −7 
2 26
 
1 2 3
 −1 −2 0 
(e)  
−1 −1 −1
−1 −1 2

   2  2   1
3 3 3 −3
 2   1   2 
3. Express  9  as a linear combination of the vectors  3 , − 3 ,  3 .
3 − 13 2
3
2
3
  4
− 89 9
 4  7
4. Extend the orthonormal set  9 ,  9  to a basis of R3 , or, what is the same
1 4
9 9
 
− 89 4
9 ∗
 4 7
∗
thing, find a third column that makes the matrix  9 9  orthogonal.
1 4
9 9 ∗

   
1 0 � � 5
0 1 x 4
5. Use the QR factorization to find the least squares solution of   =  .
1 1 y 6
1 2 4

6. Use the QR factorization to find the projection matrix of R4 onto the plane
   
1 0
 −1   0 
spanned by the vectors   and  .
−1 −2
1 −2

7. Show that if Q is an orthogonal matrix then det(Q) = ±1.

8. Show that if Q1 and Q2 are orthogonal matrices, then so is Q1 Q2 .

9. Show that if Q is an orthogonal matrix, then QT AQ has the same eigenvalues as


A.

10. Which of the following transformations are orthogonal: rotations, reflections, or


projections?
11. Let Q = \begin{pmatrix} \alpha & * \\ \beta & * \end{pmatrix} be an orthogonal matrix.
(a) Show that the only unit vectors that are orthogonal to (α, β) are (−β, α) and (β, −α). Conclude that Q = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} or \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix}.
(b) Show that Q must be a rotation by arctan(β/α) or a reflection in the line that makes an angle of (1/2) arctan(β/α) with the x-axis.
(c) Conclude that any orthogonal transformation of R2 must be a rotation or a reflection.

12. If p1 , p2 , · · · , pm is an orthogonal basis for a subspace S of Rn , v is a vector outside S, and

w = \frac{v \cdot p_1}{p_1 \cdot p_1} p_1 + \frac{v \cdot p_2}{p_2 \cdot p_2} p_2 + \cdots + \frac{v \cdot p_m}{p_m \cdot p_m} p_m ,

then show v − w ⊥ S. (Hint: Verify v − w ⊥ pi for all i.) Conclude that w is the orthogonal projection of v onto S.

13. How would you extend an orthonormal basis v1 , v2 , · · · , vp of a subspace V of Rn


to an orthonormal basis v1 , v2 , · · · , vp , vp+1 , · · · , vn of all of Rn ?

14. If P is a projection, show (2P − I)T (2P − I) = I. Conclude that any reflection
is an orthogonal transformation.

15. If Q is orthogonal, then Q−1 = ?


16. If T (1, 3) = (−3, 1) and T (1, 2) = (−2, −1), then is T orthogonal?

17. If A is n × n and Q is n × n orthogonal, then is


(a) AAT symmetric?
(b) AAT invertible?
(c) AAT orthogonal?
(d) QT symmetric?
(e) QT invertible?
(f) QT orthogonal?

18. If T is any transformation of Rn to itself that preserves distance and such that
T (0) = 0, then T is linear and can be represented as T (x) = Qx where Q is an
orthogonal matrix. This can be proved in the following way. (1) T preserves distance
and the origin ⇒ ||T (x)|| = ||x||, ||T (y)|| = ||y||, and ||T (x) − T (y)||2 = ||x − y||2 .
Expand this to show that T (x) · T (y) = x · y. (2) Expand ||cT (x) − T (cx)||2 and use
(1) to show that it equals zero. (3) Expand ||T (x + y) − T (x) − T (y)||2 and use (1)
to show that it equals zero. Conclude that T is linear and preserves dot products.
Interpret this as saying that any transformation that preserves length and the origin
must be linear and can be represented by an orthogonal matrix.

22. DIAGONALIZATION OF SYMMETRIC AND ORTHOGONAL MATRICES

In Sections 9 and 10 we learned how to find eigenvalues, eigenvectors, and diag-


onal factorizations. Our point of view was purely algebraic. Now we consider these
concepts geometrically. The first thing to mention is that all eigenvectors v associ-
ated with a particular eigenvalue λ of a matrix A form a subspace that we call the
eigenspace of A for the eigenvalue λ. (See Exercise 1)

Example 1: We now illustrate the geometry of diagonalization with the matrix

A = \begin{pmatrix} 11/5 & -3/5 \\ 2/5 & 4/5 \end{pmatrix},

which has eigenvalues λ = 2 and λ = 1 with associated eigenvectors (3, 1) and (1, 2). If we think in terms of how this matrix operates on its eigenvectors we have

\begin{pmatrix} 11/5 & -3/5 \\ 2/5 & 4/5 \end{pmatrix} \begin{pmatrix} 3 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 3 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 11/5 & -3/5 \\ 2/5 & 4/5 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = 1 \begin{pmatrix} 1 \\ 2 \end{pmatrix}.

In this case the eigenspaces are the two lines generated by the two eigenvectors. A maps each line to itself but stretches one by a factor of 2 and the other by a factor of 1. All other vectors are moved in more complicated ways. We can see how they are moved by observing that, since the two eigenvectors form a basis for R2 , any vector in R2 can be written as a (3, 1) + b (1, 2). The numbers a and b are the coordinates of the vector with respect to the skewed coordinate system defined by the two eigenvectors. Since A maps a (3, 1) + b (1, 2) to 2a (3, 1) + b (1, 2), we see that the effect of A is very
simple when viewed in this new coordinate system.

FIGURE 25: The eigenvector directions (3, 1) and (1, 2) and the skewed coordinate system they define.
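The eigenvalues, eigenvectors, and the factorization A = SDS −1 of this example can be checked numerically. A minimal NumPy sketch (not part of the notes); np.linalg.eig scales the eigenvectors to unit length, so they come back as multiples of (3, 1) and (1, 2).

    import numpy as np

    A = np.array([[11/5, -3/5],
                  [ 2/5,  4/5]])

    evals, S = np.linalg.eig(A)       # columns of S are eigenvectors
    print(evals)                      # eigenvalues 2 and 1 (possibly in either order)
    D = np.diag(evals)
    print(np.allclose(S @ D @ np.linalg.inv(S), A))   # A = S D S^{-1}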

The diagonal factorization A = SDS −1 , which in this case looks like

\begin{pmatrix} 11/5 & -3/5 \\ 2/5 & 4/5 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}^{-1},

also has a geometric interpretation illustrated by the diagram below. The diagram means that a vector can be mapped horizontally by A (transcontinental railroad) or around the horn by SDS −1 (clipper ship). In either case it will arrive at the same destination. In particular we can watch how the eigenvectors are mapped. Since

S \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \quad\text{and}\quad S \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix},

we have

S^{-1} \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad S^{-1} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.

Therefore we see that the two eigenvectors are first taken to the two coordinate vectors, then stretched by factors of 2 and 1, and finally sent back to stretched versions of the original two eigenvectors.

FIGURE 26: The diagram for A = SDS −1 : S −1 carries the eigenvectors to the coordinate vectors, D stretches them by 2 and 1, and S carries them back.

Now we cover some points that were skipped over in Section 11.
1. To construct the diagonal factorization A = SDS −1 we need n linearly indepen-
dent eigenvectors to serve as the columns of S. The independence of the columns

will insure that S −1 exists (see Section 19). The problem of diagonalization therefore
reduces to the question of whether there are enough independent eigenvectors.
2. Eigenvectors that are associated with distinct eigenvalues are linearly independent.
In other words, if v1 , v2 , · · · , vn are eigenvectors for A with associated eigenvalues
λ1 , λ2 , · · · , λn where λi �= λj for all i �= j, then all the v’s are linearly independent.
To see this, assume it is not true and find the first vector vi (reading from left to
right) that can be written as a linear combination of the v’s to its left. Suppose this
vector is v5 . Then we know that v1 , v2 , v3 , v4 are linearly independent, and therefore
we have an equation of the form v5 = c1 v1 + c2 v2 + c3 v3 + c4 v4 . Multiply one copy of
this equation by A to obtain λ5 v5 = c1 λ1 v1 + c2 λ2 v2 + c3 λ3 v3 + c4 λ4 v4 and another
copy by λ5 to obtain λ5 v5 = c1 λ5 v1 + c2 λ5 v2 + c3 λ5 v3 + c4 λ5 v4 . Subtracting one from
the other gives 0 = c1 (λ1 − λ5 )v1 + c2 (λ2 − λ5 )v2 + c3 (λ3 − λ5 )v3 + c4 (λ4 − λ5 )v4 . Since
v1 , v2 , v3 , v4 are independent, all the coefficients in this equation must equal zero. But
since all the λ’s are different, the only way this can happen is if c1 = c2 = c3 = c4 = 0.
But this means that v5 = 0, a contradiction. From this result we see that an n × n
matrix is diagonalizable if there are n real and distinct eigenvalues.
3. Unfortunately there are many interesting matrices that have repeated eigenvalues. For example the shear matrix \begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix} and the diagonal matrix \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} both have
For example the shear matrix and the diagonal matrix both have
0 2 0 2
eigenvalues λ = 2, 2 (meaning that the eigenvalue is repeated), but the shear matrix
has only one independent eigenvector whereas the diagonal matrix has two. What
is the relationship in general between the number of independent eigenvectors asso-
ciated with a particular eigenvalue λ0 of a matrix A and the number of times λ0 is
repeated as a root of the characteristic polynomial of A? If we define the first number
to be the geometric multiplicity of λ0 and the second to be the algebraic multiplicity
of λ0 , then we can state the answer to this question formally as follows.
Theorem. For any eigenvalue, geometric multiplicity ≤ algebraic multiplicity.
Proof: Suppose λ0 has geometric multiplicity p, meaning that there are p inde-
pendent eigenvectors v1 , v2 , · · · , vp for λ0 . Expand this set of vectors to a basis
v1 , v2 , · · · , vp , · · · , vn for Rn . Then we have

A \begin{pmatrix} v_1 & \cdots & v_p & \cdots & v_n \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & v_p & \cdots & v_n \end{pmatrix} \begin{pmatrix} C & D \\ 0 & E \end{pmatrix},

where C is the p × p diagonal matrix with λ0 down its diagonal. This can be written A = SBS −1 where S is the matrix of column vectors and B is the block matrix on the extreme right. Then the characteristic polynomial det(A − λI) = det(B − λI) = det(C − λI) det(E − λI) = (λ0 − λ)^p det(E − λI) (see Section 9 Exercise 3), meaning that the algebraic
multiplicity of λ0 is at least p. This ends the proof.

There are important classes of matrices that always have diagonal factorizations.
In particular we will now investigate symmetric and orthogonal matrices and show
that they always have especially nice diagonal, or at least diagonal-like, factorizations.
Example 2: Consider the symmetric matrix A = \begin{pmatrix} 41 & -12 \\ -12 & 34 \end{pmatrix}. As usual, we compute the eigenvalues 25 and 50, the corresponding eigenvectors (3, 4) and (4, −3), and set up the factorization

A = \begin{pmatrix} 3 & 4 \\ 4 & -3 \end{pmatrix} \begin{pmatrix} 25 & 0 \\ 0 & 50 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ 4 & -3 \end{pmatrix}^{-1}.

But note that the two eigenvectors have a very special property: they are orthogonal. We can therefore normalize them so that the factorization becomes

A = \begin{pmatrix} 3/5 & 4/5 \\ 4/5 & -3/5 \end{pmatrix} \begin{pmatrix} 25 & 0 \\ 0 & 50 \end{pmatrix} \begin{pmatrix} 3/5 & 4/5 \\ 4/5 & -3/5 \end{pmatrix}^{-1},

which has the form A = QDQ−1 where Q is an orthogonal matrix. Because Q is orthogonal, we also have Q−1 = QT , so we can write the factorization as A = QDQT or as

A = \begin{pmatrix} 3/5 & 4/5 \\ 4/5 & -3/5 \end{pmatrix} \begin{pmatrix} 25 & 0 \\ 0 & 50 \end{pmatrix} \begin{pmatrix} 3/5 & 4/5 \\ 4/5 & -3/5 \end{pmatrix}^T.

As in Example 1 the eigenvectors set up a coordinate system with respect to which the action of A is very simple. The difference is that this time the coordinate system is rectangular.
Example 3: Consider the symmetric matrix

A = \begin{pmatrix} 4 & -2 & -2 \\ -2 & 4 & 2 \\ -2 & 2 & 4 \end{pmatrix}.

We compute the eigenvalues λ = 8, 2, 2 and the corresponding eigenvectors

\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.

The first vector is orthogonal to the second and third, but those two are not orthogonal to each other. They are however both associated with the eigenvalue 2, so they generate the eigenspace, in this case a plane, of the eigenvalue 2. If we run the Gram-Schmidt process on these two eigenvectors, we will stay within the eigenspace and generate the two orthonormal eigenvectors

\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}.

If we normalize the first eigenvector and assemble all the pieces, we obtain the factorization

A = \begin{pmatrix} -1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & -1/\sqrt{6} \\ 1/\sqrt{3} & 0 & 2/\sqrt{6} \end{pmatrix} \begin{pmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} -1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & -1/\sqrt{6} \\ 1/\sqrt{3} & 0 & 2/\sqrt{6} \end{pmatrix}^T.

Again it has the form QDQT where Q is an orthogonal matrix.
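For symmetric matrices the routine np.linalg.eigh returns the eigenvalues together with an orthonormal set of eigenvectors, which gives this factorization directly. A minimal sketch (not from the notes); the eigenvalues come back in ascending order, so here D = diag(2, 2, 8).

    import numpy as np

    A = np.array([[ 4., -2., -2.],
                  [-2.,  4.,  2.],
                  [-2.,  2.,  4.]])

    evals, Q = np.linalg.eigh(A)      # columns of Q are orthonormal eigenvectors
    D = np.diag(evals)
    print(evals)                              # [2., 2., 8.]
    print(np.allclose(Q @ D @ Q.T, A))        # A = Q D Q^T
    print(np.allclose(Q.T @ Q, np.eye(3)))    # Q is orthogonal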

In the previous two examples, eigenvectors that come from different eigenvalues
seemed to be automatically orthogonal. This is in fact true for any symmetric matrix
A. We prove this by letting Av = λv and Aw = µw where λ �= µ and noting that
λv · w = wT λv = wT Av = (wT Av)T = v T Aw = v T µw = µv · w ⇒ (λ − µ)v · w = 0 ⇒
v · w = 0. (Justify each step.)
Can every symmetric matrix be factored as in the previous two examples? That
is, does every symmetric matrix have a diagonal factorization through orthogonal matrices,
or said another way, does every symmetric matrix have an orthonormal basis of eigen-
vectors? The answer is yes, and such a factorization is called a spectral factorization.
We state this formally in the following theorem, which is one of the most important
results of linear algebra.
The Spectral Theorem. If A is a symmetric n × n matrix, then A has n real
eigenvalues (counting multiplicities) λ1 , λ2 , · · · , λn and its corresponding eigenvectors
form an orthonormal basis with respect to which A takes the form

\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}

or, in other words, A can be expressed as A = QDQT where Q is orthogonal and D


as above.
Proof: We have to temporarily view A as a transformation of complex n-dimensional
space C n . Since the characteristic equation det(A − λI) = an λn + an−1 λn−1 + · · · +

a0 = 0 involves a polynomial of degree n, the Fundamental Theorem of Algebra


tells us that there are n (possibly complex) roots. If λ0 is one such root, then it
is an eigenvalue of A, so there is a vector v such that Av = λ0 v. If λ0 and v are
complex, then taking complex conjugates we have Av̄ = λ̄0 v̄ (Section 13 Exercise 2), so that λ̄0 is also an eigenvalue with eigenvector v̄. We therefore have the equality λ0 v̄ T v = v̄ T Av = (v̄ T Av)T = v T Av̄ = λ̄0 v T v̄ = λ̄0 v̄ T v. Canceling v̄ T v (justified by Exercise 9) we get λ0 = λ̄0 . Since λ0 equals its own conjugate, it must be real. (The eigenvector
v may not be real, but the fact that λ0 is a real eigenvalue ⇒ det(A − λ0 I) = 0 ⇒ the
real matrix A − λ0 I is singular ⇒ there is some real eigenvector for λ0 .) Therefore
every symmetric n × n matrix has n real eigenvalues (counting multiplicities).
The rest of the proof takes place in the real world and proceeds in steps. To
illustrate the proof, we let A be a 4 × 4 matrix. A has an eigenvalue λ1 (which
could, in the worst case, be repeated four times) with eigenvector v1 . Normalize v1
and expand it to an orthonormal basis of R4 . Let Q1 be the orthogonal matrix with
these vectors as its columns. (The first column is v1 .) Then we have
 
λ1 ∗ ∗ ∗
 0 ∗ ∗ ∗
AQ1 = Q1  .
0 ∗ ∗ ∗
0 ∗ ∗ ∗

But since QT1 AQ1 is symmetric (see Section 2 Exercise 6(f)), we can conclude that
 
λ1 0 0 0
 0 ∗ ∗ ∗
AQ1 = Q1  .
0 ∗ ∗ ∗
0 ∗ ∗ ∗
Let A2 be the 3 × 3 matrix in the lower right corner of the last factor on the right.
Then A2 is symmetric and, except for λ1 , has the same eigenvalues as A (see Section
9 Exercise 3). This ends step one.
Since A2 is symmetric, it has an eigenvalue λ2 with eigenvector v2 . Normalize
v2 and expand it to an orthonormal basis of R3 . Let U2 be the orthogonal matrix
with these vectors as its columns. (The first column is v2 .) Then as above we have
 
λ2 0 0
A2 U2 = U2  0 ∗ ∗  .
0 ∗ ∗

Putting this together with the result of step one, we have


 T  
1 0 0 0 1 0 0 0
0  T 0 
  Q1 AQ1  
0 U2 0 U2
0 0

 T   
1 0 0 0 λ1 0 0 0 1 0 0 0
0   0 0 
=    
0 U2 0 A2 0 U2
0 0 0
 
λ1 0 0 0
 0 λ2 0 0
= 
0 0 ∗ ∗
0 0 ∗ ∗
or letting Q2 equal the product of Q1 and the matrix containing U2 we have
 
λ1 0 0 0
 0 λ2 0 0 
QT2 AQ2 =  
0 0 ∗ ∗
0 0 ∗ ∗
Q2 is the product of orthogonal matrices and is therefore orthogonal (Section 21
Exercise 8). Let A3 be the 2 × 2 matrix in the lower right corner of the last factor on
the right. Then A3 is symmetric and, except for λ1 and λ2 , has the same eigenvalues
as A. This ends step two. In general, we continue in this manner until we obtain
 
λ1
 λ2 
QT AQ =  . ..


λn
This proves the Spectral Theorem.

The Spectral Theorem has many applications, which we will not pursue here.
Instead we will end with a spectral-like factorization for orthogonal matrices. Of
course, orthogonal matrices are not necessarily symmetric, so the Spectral Theorem
does not apply. In fact, most orthogonal
� �matrices are not diagonalizable at all as
0 −1
in the case of the rotation matrix . But let’s push ahead anyway with the
1 0
following example.

Example 4: We consider the orthogonal matrix

A = \begin{pmatrix} 2/3 & 2/3 & -1/3 \\ -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \end{pmatrix}.

The characteristic equation for A is x^3 − 2x^2 + 2x − 1 = 0. We find its roots and use Gaussian elimination with complex arithmetic as in Section 13 to obtain the following three eigenvalue-eigenvector pairs:

1, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \qquad \frac{1}{2} + i\frac{\sqrt{3}}{2}, \begin{pmatrix} \sqrt{3} + i \\ -\sqrt{3} + i \\ -2i \end{pmatrix}; \qquad \frac{1}{2} - i\frac{\sqrt{3}}{2}, \begin{pmatrix} \sqrt{3} - i \\ -\sqrt{3} - i \\ 2i \end{pmatrix}.

We put all this together to obtain the complex diagonal factorization

A = \begin{pmatrix} 1 & \sqrt{3}+i & \sqrt{3}-i \\ 1 & -\sqrt{3}+i & -\sqrt{3}-i \\ 1 & -2i & 2i \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2}+i\frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & \frac{1}{2}-i\frac{\sqrt{3}}{2} \end{pmatrix} \begin{pmatrix} 1 & \sqrt{3}+i & \sqrt{3}-i \\ 1 & -\sqrt{3}+i & -\sqrt{3}-i \\ 1 & -2i & 2i \end{pmatrix}^{-1}.

The equations for the second and third eigenvalue-eigenvector pairs can be written as Av = λv and Av̄ = λ̄v̄. Just as in Section 13, we can therefore rewrite the factorization in real form. Recall from that section that the equation Av = λv can be written as A(x + iy) = (α + iβ)(x + iy), which when multiplied out becomes Ax + iAy = (αx − βy) + i(βx + αy). Equating real and imaginary parts we obtain Ax = αx − βy and Ay = βx + αy. This gives us the real block-diagonal factorization

A = \begin{pmatrix} 1 & \sqrt{3} & 1 \\ 1 & -\sqrt{3} & 1 \\ 1 & 0 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{\sqrt{3}}{2} \\ 0 & -\frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & \sqrt{3} & 1 \\ 1 & -\sqrt{3} & 1 \\ 1 & 0 & -2 \end{pmatrix}^{-1}.

Note that the columns of the first factor on the right are orthogonal, so that if we normalize each column, we will have an orthogonal matrix. But we must be careful that when we divide by lengths, the equations Ax = αx − βy and Ay = βx + αy remain true. This can only be done if we divide x and y by the same number. In our case, fortunately, both the second and third columns, which correspond to x and y, have length √6. Therefore we are justified in writing

A = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{\sqrt{3}}{2} \\ 0 & -\frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \end{pmatrix}^T.

We have a factorization of the form A = QDQT where Q is orthogonal and D is


block-diagonal. We can now see the geometrical effect of A as a transformation of
R3 . The three columns of Q define an orthonormal basis, and A rotates R3 around
the axis defined by the first eigenvector by an angle of −π/3.

The kind of factorization we have just obtained can be realized for any orthogonal
matrix. We call it a real block-diagonal factorization .

Theorem. If A is an orthogonal matrix, then there is an orthonormal basis with


respect to which A takes the form
 
α1 β1
 −β1 α1 
 
 .. 
 . 
 
 αp βp 
 
 −βp αp 
 
 −1 
 .. 
 . 
 
 
 −1 
 
 1 
 . 
 .. 
1

or, in other words, A = QDQT where Q is orthogonal and D is as above.


Proof: First we investigate the nature of the eigenvalues. If λ is a possibly complex eigenvalue of A, then Av = λv and Av̄ = λ̄v̄. From the computation v̄ T v = v̄ T AT Av = (Av̄)T (Av) = λ̄λ v̄ T v, cancelling v̄ T v we obtain λ̄λ = 1 or |λ| = 1. There-
fore A has n eigenvalues each of which is either ±1 or a complex number and its
complex conjugate both of length 1.
We will just give a sketch of the rest of the proof since it is very similar to that
of the Spectral Theorem. The proof proceeds in steps, and each step consists of two
cases. First, suppose A has eigenvalue λ = ±1 with eigenvector v. Normalize v and
expand it to an orthonormal basis of Rn , and let Q be the orthogonal matrix with
these vectors as its columns. Then we have
 
±1 ∗ ∗ ∗
 0 ∗ ∗ ∗
AQ= Q  .
0 ∗ ∗ ∗
0 ∗ ∗ ∗

But since QT AQ is orthogonal, we can conclude


 
±1 0 0 0
 0 ∗ ∗ ∗
AQ = Q  .
0 ∗ ∗ ∗
0 ∗ ∗ ∗

Let A2 be the matrix in the lower right corner of the last factor on the right. Then
A2 is orthogonal and, except for λ, has the same eigenvalues as A.
The second possibility is that λ is complex. Let x and y be the real and imaginary
parts of the eigenvector v. Assume for a moment that �x� = �y� and x · y = 0. Then

we can normalize x and y and still maintain the equations Ax = αx − βy and


Ay = βx + αy. Expand x and y into an orthonormal basis and let Q be the matrix
with these vectors as its columns. Then we have
 
α β ∗ ∗
 −β α ∗ ∗ 
AQ= Q  .
0 0 ∗ ∗
0 0 ∗ ∗

But since QT AQ is orthogonal, we can conclude (Exercise 10)


 
α β 0 0
 −β α 0 0
AQ = Q  .
0 0 ∗ ∗
0 0 ∗ ∗

Let A2 be as above, then A2 is orthogonal and, except for λ and λ, has the same
eigenvalues as A. This ends the first step. Continue in the obvious way as in the
Spectral Theorem.
We still have to prove �x� = �y� and x · y = 0. It is enough to show v T v = 0,
since then we would have v T v = (x+iy)T (x+iy) = x·x−y ·y +i2x·y = 0 ⇒ x·y = 0
and x · x = y · y or �x� = �y�. To show v T v = 0 we compute v T v = v T AT Av =
(Av)T (Av) = λ2 v T v. If v T v �= 0, then we could cancel it from both sides obtaining
λ2 = 1. But the only solutions to the equation λ2 = 1 are λ = ±1 (Exercise 11),
contradicting the assumption that λ is complex. Therefore v T v = 0. This ends the
proof.

Note that each consecutive pair of −1’s on the diagonal can be considered as a
plane rotation of π radians, and therefore they can be placed in the sequence of αβ
blocks. The block-diagonal matrix D then assumes the form
 
α1 β1
 −β1 α1 
 
 .. 
 . 
 
 αq βq 
 
 −βq αq .
 
 ±1 
 
 1 
 .. 
 . 
1

So we can say that an orthogonal transformation in Rn produces a rotation through


a certain angle in each of q mutually orthogonal planes and at most one reflection

that reverses one direction orthogonal to these planes. In R3 the only possibilities
are

\begin{pmatrix} \alpha & \beta & 0 \\ -\beta & \alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} \alpha & \beta & 0 \\ -\beta & \alpha & 0 \\ 0 & 0 & -1 \end{pmatrix},
that is, a pure rotation, a pure reflection, or a rotation and reflection perpendicular
to the plane of rotation.
Finally we leave symmetric and orthogonal matrices and consider two important
scalar functions of arbitrary square matrices. They are the determinant and the
trace. The determinant of a matrix we already know something about. The trace of
a matrix A is defined as the sum of its diagonal elements

tr(A) = a11 + a22 + · · · + ann .

They both have simple and useful expressions in terms of the eigenvalues of A, which
are summarized in the following.
Theorem. The determinant of a matrix is equal to the product of its eigenvalues,
and the trace of a matrix is equal to the sum of its eigenvalues, both taken over the
complex numbers.
Proof: Consider the characteristic polynomial det(A − λI) of A.
 
a11 − λ a12 ··· a1n
 a21 a22 − λ ··· a2n 
det 
 .. .. .. 

. . .
an1 an2 ··· ann − λ
= (a11 − λ)(a22 − λ) · · · (ann − λ) + expressions in λn−2 , λn−3 , · · · , λ + constants
= (−λ)n + tr(A)(−λ)n−1 + · · · + det(A)

The first equality follows from the determinant formula. Note that the first term
contains all expressions involving λn and λn−1 . The second equality follows by simple
computation and the fact that det(A − 0I) = det(A). If λ1 , λ2 , · · · , λn are all the
eigenvalues of A, then the characteristic polynomial can also be written in factored
form as
det(A − λI) = C(λ1 − λ)(λ2 − λ) · · · (λn − λ)
= C[(−λ)n + (λ1 + λ2 + · · · + λn )(−λ)n−1 + · · · + λ1 λ2 · · · λn ]

Equating the two forms of the characteristic polynomial, we see that C = 1 and
therefore det(A) = λ1 λ2 · · · λn and tr(A) = λ1 + λ2 + · · · + λn .

These facts are useful in analyzing orthogonal transformations of R3 . Suppose


A is 3 × 3 orthogonal, so det(A) = ±1. From the considerations above A is a pure

rotation if and only if det(A) = 1. In this case tr(A) = 1 + 2α. Since α = cos θ where
θ is the angle of rotation, we have

cos θ = (tr(A) − 1)/2.
This means that the angle of rotation can be computed without finding eigenvalues.
In particular, for the matrix

A = \begin{pmatrix} 2/3 & 2/3 & -1/3 \\ -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \end{pmatrix}

of the earlier example, we have det(A) = 1, so A is a pure rotation such that cos θ =
(6/3 − 1)/2 = 1/2 and therefore θ = π/3. To find the axis and direction of the
rotation, it is still necessary to compute the eigenvectors.
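A minimal numerical check of this (not part of the notes): the angle comes from the trace, and the axis is an eigenvector for the eigenvalue 1.

    import numpy as np

    A = np.array([[ 2/3,  2/3, -1/3],
                  [-1/3,  2/3,  2/3],
                  [ 2/3, -1/3,  2/3]])

    print(np.isclose(np.linalg.det(A), 1.0))        # det = 1, so A is a pure rotation
    theta = np.arccos((np.trace(A) - 1) / 2)
    print(theta, np.pi / 3)                         # angle of rotation is pi/3

    evals, vecs = np.linalg.eig(A)
    axis = np.real(vecs[:, np.isclose(evals, 1)])   # eigenvector for lambda = 1 spans the axis
    print(axis.ravel())                             # a multiple of (1, 1, 1)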

EXERCISES

1. Show that an eigenspace of a matrix is a subspace.

2. Describe the eigenspaces of the following matrix and how the matrix acts on each.
What are the algebraic and geometric multiplicities of the eigenvalues?
     −1
2 3 0 −1 0 3
4 −1 0 0 −1 0 3
4

A= 4 3  
0 = 1 0 1   0 6 0   1 0 1
0 0 6 0 1 0 0 0 6 0 1 0

3. Find the diagonal factorizations of the following matrices and sketch a diagram
that geometrically describes the effect of each.
� �
1 4
(a)
1 −2
� �
2 −2
(b)
−2 −1
 
2 1 0
(c)  0 3 0 
0 0 3

4. Find the spectral factorizations of the following symmetric matrices.


� �
2 −2
(a)
−2 −1
 
3 −2 0
(b)  −2 0 0 
0 0 1
 
4 0 −2
(c)  0 5 0 
−2 0 1
 
0 2 2
(d)  2 0 −2 
2 −2 0

5. Find the spectral factorizations of the following transformations and reconstruct their matrices.
(a) Projection of R2 onto the line defined by (3, 1).
(b) Reflection of R2 across the line defined by (3, 1).

6. Find the real block-diagonal factorizations of the following orthogonal matrices
and describe geometrically the transformations they define.
(a) $\begin{bmatrix} \tfrac{1}{3} & -\tfrac{2}{3} & -\tfrac{2}{3} \\ -\tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\ -\tfrac{2}{3} & -\tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}$
(b) $\begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{2}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} \end{bmatrix}$
(c) $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}$

7. Construct the orthogonal matrix that rotates R^3 around the axis defined by the
vector $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$ by 90° by writing down the block-diagonal factorization of the matrix and
multiplying it out.

8. If Ax = αx − βy and Ay = βx + αy, then
$$
A\begin{bmatrix} \vdots & \vdots \\ x & y \\ \vdots & \vdots \end{bmatrix}
= \begin{bmatrix} \vdots & \vdots \\ x & y \\ \vdots & \vdots \end{bmatrix}
\begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}
\qquad \text{or} \qquad
A\begin{bmatrix} \vdots & \vdots \\ y & x \\ \vdots & \vdots \end{bmatrix}
= \begin{bmatrix} \vdots & \vdots \\ y & x \\ \vdots & \vdots \end{bmatrix}
\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}.
$$
What does each equation say about the direction of the rotation of the plane spanned
by x and y? (Of course, they must say the same thing.)

9. If v is a nonzero (possibly complex) vector, then show that $\bar{v}^T v \neq 0$.


     
10. Show that if a vector $\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$ is orthogonal to the two vectors $\begin{bmatrix} \alpha \\ -\beta \\ 0 \end{bmatrix}$ and $\begin{bmatrix} \beta \\ \alpha \\ 0 \end{bmatrix}$,
then c_1 = c_2 = 0.

11. Show that, even in the world of complex numbers, the only solutions to the
equation λ^2 = 1 are λ = ±1. (Hint: Let λ = α + iβ and reach a contradiction.)

12. If Q is an orthogonal matrix such that det Q = −1, then what can you say about
Q as a transformation?

13. Fix the center of a basketball and choose n axes v1 , v2 , · · · , vn and angles
θ1 , θ2 , · · · , θn . Rotate the basketball around v1 by an angle θ1 , around v2 by an
angle θ2 , · · ·, and around vn by an angle θn . You could have achieved the same result
with one rotation around a certain axis and by a certain angle. Discuss why this is
true and how you could find the one axis and angle that will do the job. This is The
Larry Bird Theorem.

14. State one significant fact about the eigenvalues of


(a) a symmetric matrix.
(b) an orthogonal matrix.
(c) a stable matrix.
(d) a defective matrix.
(e) a singular matrix.
(f) a projection matrix.
(g) a reflection matrix.

15. For each matrix below decide if it is symmetric, orthogonal, invertible, a projec-
tion, or diagonalizable.
   
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \qquad B = \frac{1}{4}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

Find their eigenvalues.

16. Show tr(A) + tr(B) = tr(A + B), tr(AB) = tr(BA), and tr(B^{-1}AB) = tr(A).

17. Show that A = SBS^{-1} ⇒ A and B have the same trace, determinant, eigen-
values, characteristic polynomial, and rank. Find a counterexample for the converse
(⇐). Hint: Try $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.

23. QUADRATIC FORMS

After linear functions, which we have already studied extensively in the form
of linear equations and linear transformations, quadratic functions are next in level
of complexity. Such functions arise in diverse applications, including geometry, me-
chanical vibrations, statistics, and electrical engineering, but matrix methods allow a
unified study of their properties. A quadratic equation in two variables is an equation
of the form
ax^2 + bxy + cy^2 + dx + ey + f = 0

where at least one of the coefficients a, b, c is not zero. From analytic geometry, we
know that the graph of a quadratic equation is a conic section, that is, a circle, a
parabola, an ellipse, a hyperbola, a pair of lines, a single line, a point, or the empty
set. A quadratic equation may be expressed with matrices as
$$[\,x\ \ y\,]\begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + [\,d\ \ e\,]\begin{bmatrix} x \\ y \end{bmatrix} + f = 0.$$

The second degree terms
$$ax^2 + bxy + cy^2 = [\,x\ \ y\,]\begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

determine the type of conic section that the equation represents and are called the
quadratic form associated with the equation. Note that although the matrix above
is symmetric, the same quadratic form can be generated by many other different
matrices such as $\begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$ and $\begin{bmatrix} a & 3b \\ -2b & c \end{bmatrix}$. A quadratic equation in three variables is
an equation of the form

ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j = 0

or
$$[\,x\ \ y\ \ z\,]\begin{bmatrix} a & d/2 & e/2 \\ d/2 & b & f/2 \\ e/2 & f/2 & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} + [\,g\ \ h\ \ i\,]\begin{bmatrix} x \\ y \\ z \end{bmatrix} + j = 0$$

where at least one of the coefficients a, b, c, d, e, f is not zero. The graphs of such equa-
tions are quadric surfaces, which include ellipsoids, hyperboloids, and paraboloids of
various types. Again the terms of second degree constitute the quadratic form asso-
ciated with the equation.

In general, a quadratic form in n variables is an expression of the form
$$
\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j
= [\,x_1\ \ x_2\ \cdots\ x_n\,]
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= x^T A x,
$$
and a quadratic equation in n variables has the representation x^T Ax + b^T x + c = 0.


We were able to express the quadratic forms in two and three variables above by
means of symmetric matrices. Can this always be done? Yes, since x^T Ax = x^T A^T x
(Exercise 1), we have x^T Ax = (1/2)(x^T Ax + x^T A^T x) = x^T((1/2)(A + A^T))x, and the
matrix (1/2)(A + A^T) is symmetric (Exercise 2). This just amounts to replacing the
off-diagonal elements a_ij and a_ji by (1/2)(a_ij + a_ji). We can therefore always assume A is
symmetric. If A is also nonsingular, the quadratic form x^T Ax is called nondegenerate.
We now turn to the question of how to recognize the graph of a quadratic equation.
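As a small illustration (not from the notes; numpy is assumed), replacing a nonsymmetric A by (1/2)(A + A^T) leaves the value of the quadratic form unchanged:

    # Symmetrizing the matrix of a quadratic form does not change the form.
    import numpy as np

    A = np.array([[1.0, 3.0],
                  [-1.0, 2.0]])           # a nonsymmetric matrix
    A_sym = (A + A.T) / 2                 # the symmetric matrix for the same form

    x = np.array([2.0, -5.0])
    print(x @ A @ x, x @ A_sym @ x)       # the two values are equal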

Example 1: Suppose we have the quadratic equation 41x_1^2 − 24x_1x_2 + 34x_2^2 = 1. We
can write this equation in the form x^T Ax = 1 or
$$[\,x_1\ \ x_2\,]\begin{bmatrix} 41 & -12 \\ -12 & 34 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 1.$$

Since A is symmetric, it has a spectral factorization A = QDQ^T, which from Section
22 is
$$A = \begin{bmatrix} \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{4}{5} & -\tfrac{3}{5} \end{bmatrix}\begin{bmatrix} 25 & 0 \\ 0 & 50 \end{bmatrix}\begin{bmatrix} \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{4}{5} & -\tfrac{3}{5} \end{bmatrix}^T.$$

If we substitute this into x^T Ax we obtain x^T QDQ^T x = (Q^T x)^T D(Q^T x) = y^T Dy
where y = Q^T x, or
$$
[\,x_1\ \ x_2\,]\begin{bmatrix} \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{4}{5} & -\tfrac{3}{5} \end{bmatrix}\begin{bmatrix} 25 & 0 \\ 0 & 50 \end{bmatrix}\begin{bmatrix} \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{4}{5} & -\tfrac{3}{5} \end{bmatrix}^T\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= [\,\tfrac{3}{5}x_1 + \tfrac{4}{5}x_2\ \ \ \tfrac{4}{5}x_1 - \tfrac{3}{5}x_2\,]\begin{bmatrix} 25 & 0 \\ 0 & 50 \end{bmatrix}\begin{bmatrix} \tfrac{3}{5}x_1 + \tfrac{4}{5}x_2 \\ \tfrac{4}{5}x_1 - \tfrac{3}{5}x_2 \end{bmatrix}
$$
$$
= [\,y_1\ \ y_2\,]\begin{bmatrix} 25 & 0 \\ 0 & 50 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = 25y_1^2 + 50y_2^2.
$$

The y-coordinates are therefore y_1 = (3/5)x_1 + (4/5)x_2 and y_2 = (4/5)x_1 − (3/5)x_2, and the quadratic
equation expressed in these coordinates becomes 25y_1^2 + 50y_2^2 = 1, which is just an
ellipse. The x-coordinates and the y-coordinates are related by Q, which provides an
orthogonal transformation from y-space to x-space. Since orthogonal transformations
preserve distance, angle, and therefore congruence, the original quadratic equation
must also represent an ellipse. Furthermore Q takes the coordinate vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$
in y-space to the eigenvectors $\begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}$, $\begin{bmatrix} 4/5 \\ -3/5 \end{bmatrix}$ in x-space, which just amounts to a simple
rotation. Therefore 41x_1^2 − 24x_1x_2 + 34x_2^2 = 1 is a rotated ellipse with major and
minor axes along the eigenvectors of A.
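A numerical companion to Example 1 (a sketch only, assuming numpy; not part of the original notes): numpy's eigh recovers the eigenvalues 25 and 50, and the change of variables y = Q^T x reproduces the value 25y_1^2 + 50y_2^2 at any test point. The columns of the computed Q may differ from the ones above by sign or order, but the identity holds regardless.

    # Checking Example 1 numerically.
    import numpy as np

    A = np.array([[41.0, -12.0],
                  [-12.0, 34.0]])
    lam, Q = np.linalg.eigh(A)            # eigh sorts eigenvalues ascending: [25., 50.]
    print(lam)

    x = np.array([0.3, -0.7])             # an arbitrary test point
    y = Q.T @ x
    print(x @ A @ x, lam @ y**2)          # same value: 25*y1^2 + 50*y2^2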

Example 2: To find the graph of the quadratic equation 4x_1x_2 + 4x_1x_3 − 4x_2x_3 = 1
we first write it as
$$[\,x_1\ \ x_2\ \ x_3\,]\begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & -2 \\ 2 & -2 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 1.$$

From Section 22 Exercise 4(d) the spectral factorization A = QDQ^T for this matrix
looks like
$$
\begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & -2 \\ 2 & -2 & 0 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -4 \end{bmatrix}
\begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}^T.
$$

Therefore setting y = Q^T x, so that
$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}^T
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{\sqrt{2}}x_1 + \tfrac{1}{\sqrt{2}}x_3 \\ \tfrac{1}{\sqrt{6}}x_1 + \tfrac{2}{\sqrt{6}}x_2 - \tfrac{1}{\sqrt{6}}x_3 \\ -\tfrac{1}{\sqrt{3}}x_1 + \tfrac{1}{\sqrt{3}}x_2 + \tfrac{1}{\sqrt{3}}x_3 \end{bmatrix},
$$

the quadratic equation in terms of the y-coordinates takes the form 2y_1^2 + 2y_2^2 − 4y_3^2 =
1. This is a hyperboloid of revolution around the y_3 axis, and therefore the quadratic
equation 4x_1x_2 + 4x_1x_3 − 4x_2x_3 = 1 describes a hyperboloid of revolution around
the axis defined by the third column of Q.

The method just illustrated obviously works in general. We therefore have that for
any quadratic form x^T Ax there is an orthogonal change of variables y = Q^T x with
respect to which the quadratic form becomes λ_1y_1^2 + λ_2y_2^2 + · · · + λ_ny_n^2. (A is symmetric
with eigenvalues λ_1, λ_2, · · ·, λ_n and Q is orthogonal.) This is called the Principal Axis
Theorem. It is really just the Spectral Theorem in another form.
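In computational terms the Principal Axis Theorem is one call to a symmetric eigensolver. The sketch below (assuming numpy; not part of the original notes) applies it to the matrix of Example 2: the signs of the eigenvalues classify the quadric, and the eigenvector for the negative eigenvalue gives the axis of the hyperboloid.

    # Principal axes of the quadratic form of Example 2.
    import numpy as np

    A = np.array([[0.0,  2.0,  2.0],
                  [2.0,  0.0, -2.0],
                  [2.0, -2.0,  0.0]])

    lam, Q = np.linalg.eigh(A)            # lam is [-4., 2., 2.] (ascending order)
    print(lam)

    # Two positive and one negative eigenvalue (with the form set equal to 1): a
    # hyperboloid of revolution around the eigenvector for the negative eigenvalue.
    axis = Q[:, np.argmin(lam)]
    print(axis)                           # a multiple of (-1, 1, 1)/sqrt(3)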

EXERCISES

1. Show that xT Ax = xT AT x. (Hint: Since xT Ax is a 1 × 1 matrix, it must equal


its own transpose.)

2. Show that A + AT is symmetric for any square matrix A.

3. For each of the following quadratic equations, find a rotation of the coordinates
so that the resulting quadratic form is in standard form, and identify and sketch the
curve or surface.
(a) x_1^2 + x_1x_2 + x_2^2 = 6
(b) 7x_1^2 + 7x_2^2 − 5x_3^2 − 32x_1x_2 − 16x_1x_3 + 16x_2x_3 = 1 (Hint: The eigenvalues are
−9, −9, 27.)

4. For the quadratic equation 6x_1^2 − 6x_1x_2 + 14x_2^2 − 2x_1 + x_2 = 0, (a) find a rotation of
the coordinates so that the resulting quadratic form is in standard form, (b) eliminate
the linear terms by completing the square in each variable and making a translation
of the coordinates, and (c) identify and sketch the curve.

5. Identify the following conics.


(a) 14x^2 − 16xy + 5y^2 = 6
(b) 2x^2 + 4xy + 2y^2 + x − 3y = 1

6. Identify the following quadrics.


(a) 2x^2 + 2y^2 + 3z^2 + 4yz = 3
(b) 2x^2 + 2y^2 + z^2 + 4xz = 4

24. POSITIVE DEFINITE MATRICES

Now we investigate how quadratic forms arise in the problem of maximizing and
minimizing functions of several variables. Suppose we want to determine the nature
of the critical points of a real valued function z = f (x, y). Assume for simplicity
a critical point occurs at (0, 0) and f (x, y) can be expanded in a Taylor series in a
neighborhood of that point. Then we have f (x, y) =

$$f(0,0) + f_x(0,0)x + f_y(0,0)y + \frac{1}{2!}\left(f_{xx}(0,0)x^2 + 2f_{xy}(0,0)xy + f_{yy}(0,0)y^2\right) + \cdots.$$
Since (0, 0) is a critical point, we must have fx (0, 0) = fy (0, 0) = 0. Putting this
back into the Taylor series and rewriting the second order terms, we have

f(x, y) − f(0, 0) = ax^2 + bxy + cy^2 + higher order terms.

This means that f(x, y) behaves near (0, 0) like its second order terms ax^2 + bxy + cy^2.
That is to say, if the quadratic form ax^2 + bxy + cy^2 is positive for every nonzero
choice of (x, y) then f(x, y) has a minimum at (0, 0), and if ax^2 + bxy + cy^2 is
negative for every nonzero choice of (x, y) then f(x, y) has a maximum at (0, 0). In
general, an arbitrary quadratic form ax^2 + bxy + cy^2 will assume positive, negative,
and zero values for various values of (x, y). But there are cases like 2x^2 + 3y^2 and
x^2 − 2xy + 2y^2 = (x − y)^2 + y^2 that are positive for all nonzero values of (x, y), or
like −x^2 − 6y^2 and −x^2 + 4xy − 4y^2 = −(x − 2y)^2 that are negative for all nonzero
values of (x, y).
We are therefore led to the following definition. A symmetric matrix A is positive
definite if its associated quadratic form x^T Ax > 0 for every x ≠ 0. We also say A is
negative definite if −A is positive definite, that is if x^T Ax < 0 for every x ≠ 0. How
can we tell if a symmetric matrix is positive definite? There are five ways to answer
this question, and we present them all in the following theorem. Its proof is long but
instructive. First we need a definition: For any square matrix
 
$$A = \begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{bmatrix},$$
we define the leading principal submatrices of A to be
$$A_1 = [\,a_{11}\,] \qquad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \qquad A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \qquad \cdots$$

Now for the characterization of positive definite matrices.



Theorem. For any symmetric n × n matrix A the following statements are equiva-
lent.
(a) A is positive definite.
(b) All the eigenvalues of A are positive.
(c) All the leading principal submatrices A1 , A2 , · · · , An of A have positive determi-
nants.
(d) A can be reduced to upper triangular form with all pivots positive by using only
the Gaussian operation of multiplying one row by a scalar and subtracting from
another row (no row exchanges or scalar multiplications of rows are necessary).
(e) There is a matrix R (not necessarily square) with independent columns such
that A = RT R.

Proof: We show (a) ⇔ (b), (a) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).


(a) ⇒ (b): If A is positive definite and Ax = λx, then 0 < x^T Ax = x^T λx = λ‖x‖^2
and therefore 0 < λ.
(b) ⇒ (a): By the Principal Axis Theorem x^T Ax = λ_1y_1^2 + λ_2y_2^2 + · · · + λ_ny_n^2 where
y = Q^T x and Q is orthogonal. Therefore, if all the eigenvalues λ_1, λ_2, · · ·, λ_n are
positive, then x^T Ax > 0 for any x ≠ 0.
(a) ⇒ (c): Since A is positive definite, then so are all the leading principal submatrices
A_1, A_2, · · ·, A_n. This follows for A_2, for example, from the equality
$$
[\,x_1\ \ x_2\,]\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= [\,x_1\ \ x_2\ \ 0\ \cdots\ 0\,]
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} > 0.
$$

There are similar equalities for all the other leading principal submatrices. Therefore,
since det(Ai ) equals the product of its eigenvalues (by the symmetry of Ai and Section
22 Exercise 6), which are all positive by (b) above, we have det(Ai ) > 0.
(c) ⇒ (d): We first note that the Gaussian step of multiplying one row by a scalar and
subtracting from another row has no effect on the determinant of a matrix or on the
determinant of its leading principal submatrices. We now illustrate the implication
of this for the 4 × 4 case. Initially A looks like
$$\begin{bmatrix} p_{11} & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix}$$

and we have p_11 = det(A_1) > 0. We run one Gaussian step and obtain
$$\begin{bmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{bmatrix}.$$
Then p_11 p_22 = det(A_2) > 0 ⇒ p_22 > 0. We run another Gaussian step and obtain
$$\begin{bmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & 0 & p_{33} & * \\ 0 & 0 & * & * \end{bmatrix}.$$
Then p_11 p_22 p_33 = det(A_3) > 0 ⇒ p_33 > 0. Finally we run one more Gaussian step
and obtain
$$\begin{bmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & 0 & p_{33} & * \\ 0 & 0 & 0 & p_{44} \end{bmatrix}.$$
Then p_11 p_22 p_33 p_44 = det(A_4) > 0 ⇒ p_44 > 0. Note that no row exchanges are
necessary. The general case is now clear.
(d) ⇒ (e): This is the hard one! We need a preliminary result: If A is symmetric and
has an LU-factorization A = LU, then it has a factorization of the form A = LDL^T
where D is diagonal. We quickly indicate the proof. If we divide each row of U by its
pivot and place the pivots into a diagonal matrix D, we immediately have A = LDM
where M is upper triangular with ones down its diagonal. Our goal is to show
L^T = M, or L^T M^{-1} = I. Since A is symmetric, A^T = A ⇒ M^T DL^T = LDM ⇒
L^T M^{-1} = D^{-1}(M^T)^{-1}LD. In the last equation, the left side is upper triangular since
it is a product of upper triangular matrices, and the right side is lower triangular since
it is a product of lower triangular and diagonal matrices. Both sides are therefore
diagonal. Furthermore, since L^T and M^{-1} are each upper triangular with ones down
their diagonals, the same is true of L^T M^{-1} (Exercise 1). We conclude that
L^T M^{-1} = I. Now we use this result. Since A is symmetric with positive pivots, we
have A = LDL^T where the diagonal entries of D are all positive. We can therefore
define √D to be the diagonal matrix with diagonal entries equal to the square roots
of the corresponding diagonal entries of D. We then have A = (L√D)(√D L^T),
which has the form A = R^T R.
(e) ⇒ (a): Since R has independent columns, Rx = 0 ⇔ x = 0. Therefore x ≠ 0 ⇒
x^T Ax = x^T R^T Rx = (Rx)^T(Rx) = ‖Rx‖^2 > 0. This ends the proof.
The factorization A = (L√D)(√D L^T) is called the Cholesky factorization of the
symmetric positive definite matrix A. It is useful in numerical applications and can
be computed by a simple variant of Gaussian elimination.
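The following Python sketch (not from the notes; numpy is assumed) shows the "variant of Gaussian elimination" idea: eliminate to get the pivots, scale by their square roots, and compare with numpy's built-in np.linalg.cholesky. The function name cholesky_from_ldlt is our own, no pivoting is attempted, and the test matrix is the one from Example 3 below, whose pivots 2, 3/2, 4/3 appear as the diagonal of D.

    # Cholesky factor R with A = R^T R, built from the LDL^T factorization.
    import numpy as np

    def cholesky_from_ldlt(A):
        """Return R = sqrt(D) L^T for symmetric positive definite A (no pivoting)."""
        n = A.shape[0]
        L = np.eye(n)
        U = A.astype(float).copy()
        for j in range(n):
            for i in range(j + 1, n):
                L[i, j] = U[i, j] / U[j, j]      # the multiplier; the pivot U[j, j] is positive
                U[i, :] -= L[i, j] * U[j, :]     # subtract a multiple of the pivot row
        d = np.diag(U)                           # the pivots form the diagonal of D
        return (L * np.sqrt(d)).T                # L * sqrt(d) scales the columns: L sqrt(D)

    A = np.array([[ 2.0, -1.0, -1.0],
                  [-1.0,  2.0,  1.0],
                  [-1.0,  1.0,  2.0]])
    R = cholesky_from_ldlt(A)
    print(np.allclose(R.T @ R, A))                  # True
    print(np.allclose(np.linalg.cholesky(A), R.T))  # True: the library returns the lower factor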

From this theorem we can also characterize negative definite matrices. The
equivalent statements are (a) A is negative definite, (b) all the eigenvalues of A are
negative, (c) det(A1 ) < 0, det(A2 ) > 0, det(A3 ) < 0, · · · (Exercise 2), (d) all the
pivots of A are negative, and (e) A = −RT R for some matrix R with independent
columns.

Example 3: Let’s check each of the conditions above for the quadratic form 2x_1^2 +
2x_2^2 + 2x_3^2 − 2x_1x_2 − 2x_1x_3 + 2x_2x_3. First we write it in the form x^T Ax where
$$A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}.$$
The spectral factorization of A is
$$
A = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}^T.
$$

All the eigenvalues are positive, and therefore A is positive definite. The leading
principal submatrices have determinants det(A_1) = 2, det(A_2) = 3, det(A_3) = 4 and
are therefore all positive as they should be. The LU factorization of A is
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{1}{2} & 1 & 0 \\ -\tfrac{1}{2} & \tfrac{1}{3} & 1 \end{bmatrix}\begin{bmatrix} 2 & -1 & -1 \\ 0 & \tfrac{3}{2} & \tfrac{1}{2} \\ 0 & 0 & \tfrac{4}{3} \end{bmatrix}.$$

The pivots are all positive, so we have the factorization A = LDL^T or
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{1}{2} & 1 & 0 \\ -\tfrac{1}{2} & \tfrac{1}{3} & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & \tfrac{3}{2} & 0 \\ 0 & 0 & \tfrac{4}{3} \end{bmatrix}\begin{bmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 1 \end{bmatrix}.$$
We therefore can write A = (L√D)(√D L^T) = (√D L^T)^T(√D L^T) or
$$A = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ -\tfrac{1}{\sqrt{2}} & \sqrt{\tfrac{3}{2}} & 0 \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \end{bmatrix}\begin{bmatrix} \sqrt{2} & -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ 0 & \sqrt{\tfrac{3}{2}} & \tfrac{1}{\sqrt{6}} \\ 0 & 0 & \tfrac{2}{\sqrt{3}} \end{bmatrix},$$

which has the form A = R^T R. There is nothing unique about R. For example, we
can also take the square root of the diagonal matrix in the spectral factorization of
A to obtain A = (Q√D)(√D Q^T) = (√D Q^T)^T(√D Q^T) or
$$A = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \end{bmatrix}\begin{bmatrix} \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} & -\tfrac{1}{\sqrt{6}} \\ -\tfrac{2}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} \end{bmatrix},$$

which also has the form A = R^T R. There are many other such R’s, not even
necessarily square, for example
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}.$$

In fact, the product RT R should look familiar. It appears in the normal equations
AT Ax = AT b. We conclude that least squares problems invariably lead to positive
definite matrices.
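A one-line numerical illustration of this remark (assuming numpy; not part of the notes): for a matrix R with independent columns, R^T R is symmetric with positive eigenvalues.

    # R^T R is symmetric positive definite when R has independent columns.
    import numpy as np

    rng = np.random.default_rng(1)
    R = rng.standard_normal((6, 3))       # a tall matrix; its columns are independent
                                          # (with probability one)
    A = R.T @ R

    print(np.allclose(A, A.T))            # symmetric
    print(np.linalg.eigvalsh(A))          # all eigenvalues are positive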
Now let’s return to the problem of maximizing or minimizing a function of two
variables. We have seen that the question comes down to the positive or negative
definiteness of the quadratic form

$$\frac{1}{2!}\left(f_{xx}(0,0)x^2 + 2f_{xy}(0,0)xy + f_{yy}(0,0)y^2\right)$$
or of the matrix
$$\begin{bmatrix} f_{xx}(0,0) & f_{xy}(0,0) \\ f_{xy}(0,0) & f_{yy}(0,0) \end{bmatrix}.$$
From the characterization of positive and negative definite matrices in terms of the
signs of the determinants of their leading principal submatrices, we immediately
obtain that (0, 0) is
a minimum point if f_xx(0, 0) > 0 and f_xx(0, 0)f_yy(0, 0) − (f_xy(0, 0))^2 > 0,
a maximum point if f_xx(0, 0) < 0 and f_xx(0, 0)f_yy(0, 0) − (f_xy(0, 0))^2 > 0.
This is just the second derivative test from the calculus of several variables. In the
n-variable case, if a function f (x1 , x2 , · · · , xn ) has a critical point at (0, 0, · · · , 0), then
fx1 (0, 0, · · · , 0) = fx2 (0, 0, · · · , 0) = · · · = fxn (0, 0, · · · , 0) = 0 and locally we have

$$
f(x_1, x_2, \cdots, x_n) = f(0, 0, \cdots, 0)
+ \frac{1}{2!}\,[\,x_1\ \ x_2\ \cdots\ x_n\,]
\begin{bmatrix}
f_{x_1x_1} & f_{x_1x_2} & \cdots & f_{x_1x_n} \\
f_{x_2x_1} & f_{x_2x_2} & \cdots & f_{x_2x_n} \\
\vdots & \vdots & & \vdots \\
f_{x_nx_1} & f_{x_nx_2} & \cdots & f_{x_nx_n}
\end{bmatrix}_{(0,0,\cdots,0)}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+ \text{higher order terms}
$$

The matrix of second derivatives is called the Hessian of f (x1 , x2 , · · · , xn ). If the


Hessian evaluated at (0, 0, · · · , 0) is positive or negative definite, then a minimum or
maximum, respectively, occurs at (0, 0, · · · , 0). To determine if a large matrix is positive definite, it

is obviously not efficient to use the determinant test as we did for the 2×2 case above.
It is much better to check the signs of the pivots, because they are easily found by
Gaussian elimination. So we have come full circle. Gauss reigns supreme here as in
every other domain of linear algebra. That is the paramount and overriding principle
of the subject and of these notes.
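As a final sketch (not in the notes; numpy assumed), here is the pivot test applied to a Hessian. The function f(x, y) = x^2 + 2xy + 3y^2 + x^4 is our own example; it has a critical point at (0, 0), where its Hessian is [[2, 2], [2, 6]].

    # The pivot test on a Hessian at a critical point.
    import numpy as np

    def pivots(A):
        """Pivots of A from Gaussian elimination without row exchanges."""
        U = A.astype(float).copy()
        n = U.shape[0]
        for j in range(n):
            for i in range(j + 1, n):
                U[i, :] -= (U[i, j] / U[j, j]) * U[j, :]
        return np.diag(U).copy()

    H = np.array([[2.0, 2.0],
                  [2.0, 6.0]])            # Hessian of x^2 + 2xy + 3y^2 + x^4 at (0, 0)
    print(pivots(H))                      # [2. 4.] -- all positive, so (0, 0) is a minimum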

EXERCISES

1. Show by example that the set of upper triangular matrices with ones down their
diagonals is closed under multiplication and inverse.

2. Why does the determinant test for negative definiteness look like det(A1 ) <
0, det(A2 ) > 0, det(A3 ) < 0, · · ·?

3. Let A and B be symmetric positive definite, C be nonsingular, E be nonsingular


and symmetric, and F just symmetric. Prove that
(a) A + B is positive definite. (Use the definition.)
(b) A is nonsingular and A−1 is positive definite. (Use the eigenvalue test and the
Spectral Theorem.)
(c) C T AC is positive definite. (Use the definition.)
(d) E 2 is positive definite. (Use the eigenvalue test and the Spectral Theorem.)
(e) eF is positive definite. (Use the eigenvalue test and the Spectral Theorem.)
(f) The diagonal elements aii of A are all positive. (Take x to be a coordinate vector
in the definition.)

4. Show by an example that the product of two positive definite symmetric matrices
may not define a positive definite quadratic form.

5. Write the quadratic form 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 + 4x_2x_3 in the form x^T Ax
and verify all the statements in the theorem on positive definite matrices. That is,
show A has all eigenvalues positive and all pivots positive and obtain two different
factorizations of the form A = R^T R, one from A = QDQ^T and the other from
A = LDL^T. Describe the quadric surface 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 + 4x_2x_3 = 16.
(Hint: λ = 1, 4, 7)

6. For positive definite matrices A, make a reasonable definition of √A, and compute
it for $A = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 4 & 2 \\ 0 & 2 & 5 \end{bmatrix}$. (See Exercise 5 above.)

7. Decide if each of the indicated critical points is a maximum or minimum.


(a) f(x, y) = −1 + 4(e^x − x) − 5x sin y + 6y^2 at the point (0, 0).

(b) f(x, y) = (x^2 − 2x) cos y at the point (1, π).

8. Test the following matrix for positive definiteness the easiest way you can.
 
$$\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 \\ 1 & 1 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix}$$

9. A symmetric matrix A is positive semidefinite if its associated quadratic form
x^T Ax ≥ 0 for every x ≠ 0. Characterize positive semidefinite matrices in terms of
their eigenvalues.

ANSWERS TO EXERCISES

SECTION 1
     
� � � � 1
−2 −.5
−2 −1 0
1. (a) (b) (c)  3  (d)  5  (e)  
1 −4 2
−1 −3
1
 
1.5
2.  −.5 
−3
4. 150, 100
5. 580, 50
6. 10 servings of pasta, 1 serving of chicken, 4 servings of broccoli
7. y = x3 − 2x2 − 3x + 5
8. y = 3x3 − 5x2 + x + 2

SECTION 2    
� � 7 0 17
10 14 −2 
1. (a) (b) 10 7  (c)  4  (d) [ 2 14 −8 ]
8 −4 0
6 −5 −7
     
4 8 12 −8 10 8 −3 12
(e) [ 32 ] (f) 5 10 15 
 (g)  −14 26  (h)  5 0 7 
6 12 18 −2 −1 −6 −3 −8
     
4 0 −1 32 0 0 0 0 0

(i) 0 1 0  
(j) 0 1 0  
(k) 0 0 0
2 −2 1 0 0 243 0 0 0
 
2 5 0
6. (a) −1 0 −1 

3 7 0
9. All but the last two.

SECTION 3   
� �� � 1 0 0 2 1 3
1 0 4 −6
1. (a) (b)  −1 1 0  0 6 4 
.75 1 0 9.5
2 0 1 0 0 −2
  
1 0 0 0 1 3 2 −1
 2 1 0 0  0 −1 −1 4 
(c)   
−3 −11 1 0 0 0 −6 43
1 2 −.5 1 0 0 0 15.5
  
1 0 0 0 0 2 1 0 0 0
2 1 0 0 0  0 3 3 0 0
  
(d)  0 1 1 0 0  0 0 1 1 0
  
0 0 −1 1 0 0 0 0 2 1
0 0 0 2 1 0 0 0 0 1
 
    2
� � 1
2  0 
1 0  
2. (a) (b)  −1  (c)   (d)  −1 
2 0  
3 0
1
1
3. 350, 1628

SECTION
 4
2
1.  −3 
4
2. all except (c)
3. (a) none, (b) infinitely many

SECTION 5    
� � .5 0 0 .5 −1.5 .5
−7 4
1. (a) (b)  0 10 0  (c)  0 .5 −.5 
2 −1
0 0 −.2 0 0 .2
     
1 −2 1 0
10 −6 1 −1 0 1
 1 −2 2 −3 
(d)  −2 1 0  (e)  −5 1 3  (f)  
0 1 −1 1
−7 5 −1 7 −1 −4
−2 3 −2 3
� �
1 d −b
(g)
ad − bc −c a
 
2
3.  7 
2
4. only (c)
10. (a) False. (b) True.

SECTION 6
    
2 1 0 0 0 s0 3
1 4 1 0 0   s1   12 
    
1.  0 1 4 1 0   s2  =  0  s0 = s2 = s4 = 0, s1 = 3, s3 = −3
    
0 0 1 4 1 s3 −12
0 0 0 1 2 s4 −3

SECTION  7  
2 −1
2. (a)  .5  + c −1 
0 1
(b) no solution
     
3 1 −2
(c)  0  + c 0  + d 1 
0 1 0
   
3 −1
 −1   0 
(d)   + c 
0 1
0 0
     
2 −1 −2
 0   0   1 
(e)   + c  + d 
−1.5 −.5 0
0 1 0
� �
3
(f)
−5
3. (a) two intersecting lines
(b) two parallel lines
(c) one line
4. (a) three planes intersecting in a point
(b) one plane intersecting two parallel planes
(c) three nonparallel planes with no intersection
(d) a line of intersection
(e) a plane of intersection
8. eggs = −2 + c, milk = 4 − c, orangejuice = c where 2 ≤ c ≤ 4
9. a = 2, b = c = d = 1
10. x2 + y 2 − 4x − 6y + 4 = 0

SECTION 8
1. (a) −6 (b) −16 (c) −24 (d) −12 (e) −1 (f) −1
1
4. − , −6
6
5. (a) 3 (b) −12 (c) x + 2y − 18 (d) −x3 + 6x2 − 8x
7. True.
SECTION 9 � � � �
1 1
1. (a) for λ = 1: , for λ = 2:
0 1
     
0 1 −1
(b) for λ = 1:  1  and  0  , for λ = 3:  0 
0 −2 1
     
−2 0 2
(c) for λ = 0:  1  , for λ = 2:  −1  , for λ = 4:  1 
1 1 1
     
0 1 1
(d) for λ = −1 :  −1  , for λ = 2:  −2  , for λ = 6:  −1 
1 1 1
     
1 1 −1
   
(e) for λ = 2: 0 and 1 , for λ = −4:  1 
1 0 1
       
1 1 0 0
0 1 1 0
(f) for λ = −2:   and  , for λ = 3:   and  
0 0 1 1
0 0 0 1
     
1 0 0
5. (a)  0 ,  1 ,  0 
0 0 1
  
1 0
(b) 0   0
0 1
 
0
(c)  0 
1

SECTION 10
� � � �� �� �−1
1 1 1 1 1 0 1 1
1. (a) =
0 2 0 1 0 2 0 1
     −1
5 0 2 0 1 −1 1 0 0 0 1 −1
(b)  0 1 0  =  1 0 0  0 1 0 1 0 0 
−4 0 −1 0 −2 1 0 0 3 0 −2 1
     −1
2 2 2 −2 0 2 0 0 0 −2 0 2
(c)  1 2 0  =  1 −1 1   0 2 0   1 −1 1 
1 0 2 1 1 1 0 0 4 1 1 1
     −1
6 4 4 0 1 1 −1 0 0 0 1 1
(d)  −7 −2 −1  =  −1 −2 −1   0 2 0   −1 −2 −1 
7 4 3 1 1 1 0 0 6 1 1 1

     −1
0 2 2 1 1 −1 2 0 0 1 1 −1
(e)  2 0 −2  =  0 1 1   0 2 0   0 1 1 
2 −2 0 1 0 1 0 0 −4 1 0 1
     −1
−2 0 0 0 1 1 0 0 −2 0 0 0 1 1 0 0
 0 −2 5 −5   0 1 1 0   0 −2 0 0 0 1 1 0
(f)  =   
0 0 3 0 0 0 1 1 0 0 3 0 0 0 1 1
0 0 0 3 0 0 0 1 0 0 0 3 0 0 0 1
3. (a) Maybe. (In fact it is.) (b) Yes, since symmetric. (c) Maybe. (In fact it is
not.)

SECTION 11
�� �� � �� �� �−1
1 2 1 1 e 0 1 1
1. (a) exp =
0 2 0 1 0 e2 0 1
     −1
5 0 2 0 1 −1 e 0 0 0 1 −1
(b) exp  0 1 0  =  1 0 0  0 e 0  1 0 0 
−4 0 −1 0 −2 1 0 0 e 3
0 −2 1
     −1
2 2 2 −2 0 2 1 0 0 −2 0 2
(c) exp  1 2 0  =  1 −1 1   0 e2 0   1 −1 1 
1 0 2 1 1 1 0 0 e4 1 1 1
 
6 4 4
(d) exp  −7 −2 −1 
7 4 3
   −1  −1
0 1 1 e 0 0 0 1 1
=  −1 −2 −1   0 e2 0   −1 −2 −1 
1 1 1 0 0 e6 1 1 1
    2  −1
0 2 2 1 1 −1 e 0 0 1 1 −1
(e) exp  2 0 −2  =  0 1 1   0 e2 0  0 1 1 
2 −2 0 1 0 1 0 0 e −4
1 0 1
 
−2 0 0 0
 0 −2 5 −5 
(f) exp  
0 0 3 0
0 0 0 3
  −2  −1
1 1 0 0 e 0 0 0 1 1 0 0
 0 1 1 0  0 e−2 0 0  0 1 1 0 
=   
0 0 1 1 0 0 e3 0 0 0 1 1
0 0 0 1 0 0 0 e3 0 0 0 1
�� � � � �� t �� �−1
1 2 1 1 e 0 1 1
2. (a) exp t =
0 2 0 1 0 e 2t
0 1
  
5 0 2
(b) exp  0 1 0  t
−4 0 −1
  t  −1
0 1 −1 e 0 0 0 1 −1
= 1 0 0   0 et 0   1 0 0 
0 −2 1 0 0 e 3t
0 −2 1
      −1
2 2 2 −2 0 2 1 0 0 −2 0 2
(c) exp  1 2 0  t =  1 −1 1   0 e2t 0   1 −1 1 
1 0 2 1 1 1 0 0 e4t 1 1 1
  
6 4 4
(d) exp  −7 −2 −1  t
7 4 3
   −t  −1
0 1 1 e 0 0 0 1 1
=  −1 −2 −1   0 e2t 0   −1 −2 −1 
1 1 1 0 0 e6t 1 1 1
      2t  −1
0 2 2 1 1 −1 e 0 0 1 1 −1
(e) exp  2 0 −2  t =  0 1 1   0 e2t 0 0 1 1 
2 −2 0 1 0 1 0 0 e −4t
1 0 1
  
−2 0 0 0
 0 −2 5 −5  
(f) exp   t
0 0 3 0
0 0 0 3
  −2t  −1
1 1 0 0 e 0 0 0 1 1 0 0
 0 1 1 0  0 e−2t 0 0  0 1 1 0 
=   
0 0 1 1 0 0 e 3t
0 0 0 1 1
0 0 0 1 0 0 0 e3t 0 0 0 1

SECTION 12
� � � �
t 1 2t 1
1. (a) c1 e + c2 e
0 1
     
0 1 −1
(b) c1 et  1  + c2 et  0  + c3 e3t  0 
0 −2 1
     
−2 0 2

(c) c1 1 + c2 e 2t  
−1 + c3 e4t  
1
1 1 1
     
0 1 1
(d) c1 e−t  −1  + c2 e2t  −2  + c3 e6t  −1 
1 1 1
     
1 1 −1
(e) c1 e2t  0  + c2 e2t  1  + c3 e−4t  1 
1 0 1
       
1 1 0 0
 0 1
  1
  0
(f) c1 e−2t   + c2 e−2t   + c3 e3t   + c4 e3t  
0 0 1 1
0 0 0 1
� � � �
1 1
2. (a) et + 2e2t
0 1
     
0 1 −1
(b) 2et  1  + 2et  0  + e3t  0 
0 −2 1
     
−2 0 2
(c)  1 +e 2t  
−1 + e 4t  
1
1 1 1
     
0 1 1
(d) e−t  −1  − e2t  −2  + e6t  −1 
1 1 1
     
1 1 −1
(e) 3e2t  0  + 2e2t  1  + 1e−4t  1 
1 0 1
       
1 1 0 0
 0   1   1   0
(f) e−2t   + e−2t   + e3t   + 2e3t  
0 0 1 1
0 0 0 1
3. (a) neutrally stable (b) unstable (c) stable

SECTION 13
3. α ± iβ
� �� �� �−1
3+i 3−i 3 + i2 0 3+i 3−i
4. (a)
2 2 0 3 − i2 2 2
� �� �� �−1
3 1 3 2 3 1
=
2 0 −2 3 2 0
   −1
−i i 0 −1 + i3 0 0 −i i 0
(b)  1 − i 1 + i 1   0 −1 − i3 0   1 − i 1 + i 1 
1 1 0 0 0 1 1 1 0
   −1
0 −1 0 −1 3 0 0 −1 0

= 1 −1 1   −3 −1 0   1 −1 1 
1 0 0 0 0 1 1 0 0
� � � �
3 1
5. (a) (c1 e cos 2t + c2 e sin 2t)
3t 3t
+ (−c1 e sin 2t + c2 e cos 2t)
3t 3t
2 0
 
0
(b) (c1 e−t cos 3t + c2 e−t sin 3t)  1  +
1
   
−1 0
(−c1 e−t sin 3t + c2 e−t cos 3t)  −1  + c3 et  1 
0 0
6. (a) c1 = 2, c2 = −3 (b) c1 = 1, c2 = 2, c3 = 3

SECTION 14 � � � � � � � � � �
1 k −1 1 + (.25)k 1 64
1. (a) uk = 64(1) k
− 64(.25) = 64 → 64 =
2 1 2 − (.25)k 2 128
� � � � � �
6 6 6(−1)k + 12(.5)k
(b) uk = 1(−1)k + 2(.5)k = , bounded, no limit
2 4 2(−1)k + 8(.5)k
� � � � � �
3 k 2 5 k −2 1 6(3)k − 10(−1)k
(c) uk = (3) + (−1) = , blows up
4 1 4 1 4 3(3)k + 5(−1)k
     −1
.5 .5 .5 2 2 0 1 0 0 2 2 0
2.  .25 .5 0  =  1 −1 −1   0 0 0   1 −1 −1  ,
.25 0 .5 1 −1 1 0 0 .5 1 −1 1
       
2 0 2 2
k  k   k
1(1) 1 + (.5) −1 = 1 − (.5) → 1

1 1 1 + (.5) k
1
     −1
.5 0 .5 −1 1 −1 .5 0 0 −1 1 −1
3.  0 .5 .5  =  1 1 −1   0 1 0   1 1 −1  ,
.5 .5 0 0 1 2 0 0 −.5 0 1 2
       
−1 1 −1 50
−30(.5)k  1  + 50(1)k  1  − 10(−.5)k  −1  →  50 
0 1 2 50
   
1 .25 0 1
4.  0 .5 .5 ,  0 , everyone dies!
0 .25 .5 0
5. Everyone has blue eyes!

SECTION 15
3. (a) not closed under addition or scalar multiplication
(b) not closed under addition
(c) not closed under scalar multiplication
(d) not closed under addition

(e) not closed under scalar multiplication


4. All span the plane of x1 − x2 = 0 in R3 .
� �
1
6. (a) c
3
   
−1 −1
(b) c  0  + d  1 
1 0
 
−1
(c) c 0 

1
     
4 −3 2
0  0  1
(d) c   + d   + e 
0 1 0
1 0 0
   
2 −1
1  0 
   
(e) c  4  + d  −1 
   
0 1
1 0
� � �1�
1
7. (a) +c 3
0 1
     
1 −1 −1
(b)  0  + c  0  + d  1 
0 1 0

SECTION 16
1. (a) independent
(b) independent
(c) dependent
(d) independent
(e) dependent
� � � �
1 0
2. (a) ,
0 1
     
1 0 0
(b)  0 , 1 , 0
   
0 0 1
   
1 0
(c)  0 , 1
 
0 2
     
1 0 0
0 1 0
(d)   ,   ,  
1 0 0
0 0 1
   
3 0
0 3
(e)   ,  
3 0
1 1
3. Same answers
  as for Section 15 Exercise 6.
3 2
5. (a) 3  1  − 2  2 
2 1
(b) no solution
     
3 2 −1
(c) (6 + c)  1  + (−4 − c)  2  + c  1  many solutions
2 1 −1
� � � �
2 1
(d) ) +6
1 2
6. (a) U and V might be, W is not.
(b) U does not, V and W might.
(c) U and W are not, V might be.

SECTION 17

1. (a) �x� = 5, �y� = 5 5
 √ 
 1  −565
5  √ 
 2   −525 
 5   
(b)  2  ,  2 

 5  √ 
 5 5 
− 45 9

5 5
(c) 153.43◦
 
−2
 −4 
(d)  
4
8
   
−2 −4
 −4   2 
(e)  + 
4 −2
8 1
2. (5, 15/2)
� �
−β
3. c
α
   
−1 −1
5. (a) c  1  + d  0 
0 1
 
2
(b) c  −3 
1
   
1 0
 −3   −1 
(c) c   + d 
0 1
1 0
 
−1
 1 
(d) c  
2
0
6. (a) −x1 + x2 = 0, −x1 + x3 = 0
(b) 2x1 − 3x2 + x3 = 0
(c) x1 − 3x2 + x4 = 0, −x2 + x3 = 0
(d) −x1 + x2 + 2x3 = 0
7. (a) False. (b) False.

SECTION 18
2. (a) Reflection of R2 in 135◦ line.
(b) Projection of R2 onto y-axis.
(c) Projection of R2 onto 135◦ line.
(d) Rotation of R2 by 45◦ .
(e) Rotation of R2 by −60◦ .
(f) Reflection of R2 in 150◦ line.
� �
β
(g) Rotation of R2 by arctan .
α
(h) Rotation of R3 around z-axis by 90◦ .
(i) Rotation of R3 around y-axis by −90◦ .
(j) Projection of R3 onto xy-plane.
(k) Rotation of R3 around z-axis by 90◦ and� reflection
� in xy-plane.
β
(l) Rotation of R3 around z-axis by arctan and reflection in xy-plane.
α
 
−1 0 0
3. (a)  0 −1 0 
0 0 −1
 
1 0 0
(b)  0 0 0 
0 0 1
 
0 1 0
(c) 1 0 0
0 0 1
 
1 0 0
(d)  0 √12 − √12 
0 √12 √1
2
4. (a) x2 + y 2 = 4, a circle of radius 2.
� �2 � �2
x y
(b) + = 1, an ellipse.
2 3
� �
1 0
7. (a) , reflects in the x-axis.
0 −1
� �
−1 0
(b) , reflects in the y-axis.
0 1
   
0 0 1 1
(c)  0 −1 0 , rotates by 180◦ around the line defined by the vector  0 .
1 0 0 1
 
0 1 0 0
0 0 2 1
13.  
1 0 1 0
0 0 2 0

SECTION 19
� � � � � �
1 1 −2
1. (a) ; ; ; R2 → R2 ; rank = 1
2 2 1
� � � � � � � � � �
1 0 1 2 0
(b) , ; , ; ; R2 → R2 ; rank = 2
0 1 2 3 0
         
1 0 2 4 0
(c)  0 ,  2 ;  0 ,  4 ;  1 ; R3 → R3 ; rank = 2
0 1 2 8 −2
       3   2   −1   
1 0 0 0
       6   3   5    3
(d) 0 , 1 , 0 ;  ,  ,  ; 0 ; R → R4 ; rank = 3
−3 −1 8
0 0 1 0
0 −1 7
         
1 0     −4 −1 −2
2 0 1 −1  0   0   1 
            
(e)  0 ,  1 ; 2 , −1 ;  −3 ,  −5 ,  0 ; R5 → R3 ; rank = 2
         
1 5 3 −3 0 1 0
4 3 1 0 0

         
1 0 0       16 6
2 8 0
 0  1 0  −4   −2 
       2   7   1     
(f)  −6 ,  2 ,  0 ;  ,  ,  ;  0 ,  1 ; R5 → R4 ;
      −2 −6 −1    
0 0 1 −2 0
0 2 −2
−16 4 2 1 0
rank = 3 
0
2. (a) Yes,  0 . (b) Yes, yes. (c) No. (d) Yes.
0
   
−9 −2
3. (a) No. (b) No. (c)  0 ,  1 . (d) No. (e) Yes.
1 0
5. (a) Since row(A) ⊥ null(A).
(b) Since dim(row(A)) + dim(null(A)) = 3 and dim(col(A)) = dim(row(A)).
(c) Since dim(col(A)) = dim(row(A)).
6. None or infinitely many.

SECTION�20�
4
1. (a)
1
 5 
3
(b)  −1 
2
1
4
2. (a) y = 43 x
(b) y = −.2 + 1.1x
(c) z = 2 + 2x + 3y
(d) z = 10 + 8x2 − y 2
(e) y = − 32 − 32 t + 72 t2
(f) y = 2 + 3 cos t + sin t
        
5 0 3 Cu 413.91 Cu 63.543
3.  0 15 21  F e  =  1511.13  The solution is  F e  =  55.851 !
3 21 32 S 2389.58 S 32.065
� �
.1 .3
4. (a)
.3 .9
4 2 4
9 9 9
2 1 2 
(b) 9 9 9 
4 2 4
9 9 9
 
1 0 0
0 1 1 
(c)  2 2 
0 1
2
1
2
1 
2 0 1
2
0 1 0
(d)  
1
2 0 1
2
1 
3
1
3 0 1
3
 1 1
0 1
 3 3 3
(e)  
0 0 1 0
1
3
1
3 0 1
3
 
1
5
5. 2
5
2
 
1 00
6. 0 01
0 10
� �
.2 .4
7. (a)
.4 .8
5 1 1 
6 6 3
 1 5
− 13 
(b)  6 6 
1
3 − 13 1
3

SECTION 21
� 5 � � �
13 − 12
13
1. (a) 12 , 5
13 13
    6
− 37 − 27 7
 6   −3   2 
(b)  7 ,
  7 7,
2 6 3
7 7 7
 1   1     
2 2 − 12 1
2
 −1   −1   −1   1 
 2   2   2  2 
(c)  1 ,  1 ,  1 ,  1 
 −2   2   2   2 
− 12 1
2 − 12 − 12
   
− 23 1
3
 11   2 
(d)  15 ,  15 
2 14
15 15

 1   1   1 
2 2 2
 −1   −1   1 
 2   2   2 
(e)  1 ,  1 ,  1 
 −2   2   −2 
− 12 1
2
1
2
� 5
�� �
13 − 12
13 13 −26
2. (a) 12
13
5
13 0 13
   
− 37 − 27 6
7 7 −7 7
 6
− 37 2 
(b)  7 7  0 7 7
2
7
6
7
3
7
0 0 7
 1 1 1 
− 12  
2 2 2 2 0 2 1
 −1 − 12 −2 1 1 
 2 2 0 2 0 0
(c)  1 1  
2  0 0 2 0
1 1
 −2 2 2
− 12 1
− 12 − 12 0 0 0 1
2
 2 1 
−3 3 � �
 11 2  15 −15
(d)  15 15 
2 14
0 30
15 15
 1 1 1 
2 2 2  
 −1 1 1  2 3 1
 2 −2 2 
(e)  1 1 1  0 1 2
 −2 − 2  0 0
2 3
− 12 1
2
1
2
 
7
3. 1
7
 1   
9 − 19
 4   4
4.  9  or  − 9 
− 89 8
9
� �
4
5.
1
 1 
4 − 14 − 14 1
4
 −1 1 1
− 14 
 4 4 4 
6.  1 1 3 1 
 −4 4 4 4 
1
4 − 14 1
4
3
4

SECTION 22
� �� �� �−1
4 −1 2 0 4 −1
3. (a)
1 1 0 −3 1 1
� �� �� �−1
−2 1 3 0 −2 1
(b)
1 2 0 −2 1 2
   −1
1 1 0 2 0 0 1 1 0
(c)  0 1 0   0 3 0   0 1 0 
0 0 1 0 0 3 0 0 1
� √2 � � � − √2 �T
− 5 √15 � √1
3 0 5 5
4. (a)
√1 √2 0 −2 √1 √2
5 5 5 5
    − √2 T
− √25 √1 0  √1 0
5 4 0 0 5 5
    √1 
(b)  √1
5
√2
5
0  0 −1 0  5 √2
5
0
0 0 1 0 0 1 0 0 1
   0 T
0 √2 √1 √2 √1
5 5 5 0 0 5 5
 −1 0 0     −1 0 0 
(c)   0 5 0  
0 − √15 √2 0 0 0 0 − √15 √2
5 5
 √1 √1
  √1 T
− √13   √1 − √13
 2 0 0  2
2 6 6
 
(d)  0 √2 √1 0 2 0  0 √2 √1 
 6 3   6 3 
√1 − 16
√ √1 0 0 −4 √1 − 16
√ √1
2 3 2 3
� 3
√ 1

�� �� 3
√ √1 �T � �
2 5 2 5 1 0 2 5 2 5 .9 .3
5. (a) =
1
√ −3
√ 0 0 1
√ −3
√ .3 .1
2 5 2 5 2 5 2 5
� 3
√ √1 �� �� 3
√ 1

�T � �
2 5 2 5 1 0 2 5 2 5 .8 .6
(b) =
1
√ −3
√ 0 −1 1
√ −3
√ .6 −.8
2 5 2 5 2 5 2 5
 √1 √1 √1
 √1
 √1 √1
T
 
 1 0 0
2 6 3 2 6 3
 √1 √1 √1
 √1 √1 √1

6. (a) − 2 0 1 0 − 2 
 6 3   6 3 
0 − √26 √1 0 0 −1 0 − √26 √1
3 3
  √  T
− √16 √1
2
√1
3
1 3
0 − √16 √1
2
√1
3
2 2
 √1  √   √1 
(b) − 6 − √12 √1
 − 3 1
0 − 6 − √12 √1

 3  2 2  3 
√2
6
0 √1
3
0 0 1 √2
6
0 √1
3

   T
0 −1 0 0 0 1 0 0 0 −1 0 0
0 0 0 1   −1 0 0 0   0 0 0 1
(c)    
0 0 1 0 0 0 0 1 0 0 1 0
1 0 0 0 0 0 −1 0 1 0 0 0
   1 
− √12 √1 0    − √1 √1 0
T
2 − √12 − 12
2 1 0 0 2 2
 
 0 0 1  0 1
7.    0 0 −1   0  =

√1
2
0 √1
2


√1 √1 0 0 1 0 √1 √1 0 − 12 − √12 1
2 2 2 2 2

15. Eigenvalues for A are ±1, ±i; for B are 0, 0, 0, 1.
