Nishan Krikorian
Northeastern University
December 2009
PART 1: Algebra
1. Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4. Row Exchanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5. Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6. Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7. Systems with Many Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9. Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
10. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11. Matrix Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
12. Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
13. The Complex Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
14. Difference Equations and Markov Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
PART 2: Geometry
15. Vector Spaces, Subspaces and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
16. Linear Independence, Basis, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
17. Dot Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
18. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
19. Row Space, Column Space, and Null Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
20. Least Squares and Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization . . . . . . . . . . . . . . . 126
22. Diagonalization of Symmetric and Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . 136
23. Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
24. Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Answers to Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
PART 1: ALGEBRA
1. GAUSSIAN ELIMINATION
The central problem of linear algebra is to find the solutions of systems of linear
equations. We begin with a simple system of three equations and three unknowns:
2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9 .
The problem is to find the values of u, v, and w, which are called the unknowns or
variables of the system. To do this we use Gaussian elimination.
The first step of Gaussian elimination is to use the coefficient 2 of u in the first
equation to eliminate the u from the second and third equations. To accomplish this,
subtract 2 times the first equation from the second equation and 3 times the first
equation from the third equation. The result is
2u + v − w = 5
− 4v + 2w = −10
− 10v + 4w = −24 .
This completes the first elimination step. The coefficient 2 of u in the first equation is
called the pivot for this step. Next use the coefficient −4 of v in the second equation to
eliminate the v from the third equation. Just subtract 2.5 times the second equation
from the third equation to get
2u + v − w = 5
− 4v + 2w = −10
− w = 1 .
This completes the second elimination step. The coefficient −4 of v in the second
equation is the pivot for this step. The coefficient −1 of w in the third equation is
the pivot of the third elimination step, which did not have to be performed. The
elimination process is now complete. The resulting system is equivalent to the original
one, and its simple triangular form suggests an obvious method of solution: The third
equation gives w = −1; substituting this into the second equation −4v+2(−1) = −10
gives v = 2; and substituting both into the first equation 2u + (2) − (−1) = 5 gives
u = 1. This simple process is called back substitution.
How did we determine the multipliers 2 and 3 in the first step and 2.5 in the
second? Each is just the leading coefficient of the row being subtracted from, divided
by the pivot for that step. For example, in the second step, 2.5 equals the coefficient
−10 divided by the pivot −4.
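The elimination and back substitution just described can be sketched in a few lines of Python (this sketch is ours, not part of the text; it assumes all pivots are nonzero):

```python
def gaussian_eliminate(A, b):
    """Reduce the system A x = b to upper triangular form in place."""
    n = len(A)
    for j in range(n - 1):            # j indexes the pivot column
        for i in range(j + 1, n):     # rows below the pivot
            m = A[i][j] / A[j][j]     # multiplier = leading coefficient / pivot
            for k in range(j, n):
                A[i][k] -= m * A[j][k]
            b[i] -= m * b[j]

def back_substitute(A, b):
    """Solve the triangular system produced by gaussian_eliminate."""
    n = len(A)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[2.0, 1.0, -1.0], [4.0, -2.0, 0.0], [6.0, -7.0, 1.0]]
b = [5.0, 0.0, -9.0]
gaussian_eliminate(A, b)
u, v, w = back_substitute(A, b)   # u = 1, v = 2, w = -1
```

Running it on the system above reproduces the solution u = 1, v = 2, w = −1 found by hand.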
We said that the triangular system obtained above is equivalent to the original
system, but what does this mean? It means simply that the two systems have the
same solution. This is clear since any solution of the original system must also be
a solution of each system obtained after each step of Gaussian elimination. This
is because Gaussian elimination amounts to nothing more than the subtraction of
equals from equals. Therefore any solution of the original system must also be a
solution of the final triangular system. And by reversing this argument we see that
any solution of the final triangular system must also be a solution of the original
system. Both systems must therefore have the same solutions.
We can simplify Gaussian elimination by noticing that there is no need to carry
the symbols for the unknowns u, v, w along in each step. We can instead represent
the system as an array:

[ 2  1 −1 |  5 ]
[ 4 −2  0 |  0 ]
[ 6 −7  1 | −9 ]
The numbers multiplying the unknowns in the equations are called coefficients and
are determined by their position in the array. They are separated from the right-hand
sides of the equations by a vertical line. The first elimination step gives

[ 2   1 −1 |   5 ]
[ 0  −4  2 | −10 ]
[ 0 −10  4 | −24 ]

and the second elimination step gives the triangular array

[ 2  1 −1 |   5 ]
[ 0 −4  2 | −10 ]
[ 0  0 −1 |   1 ]

Now work backwards: subtract −2 times the third row from the second row and 1
times the third row from the first row to obtain
[ 2  1  0 |  4 ]
[ 0 −4  0 | −8 ]
[ 0  0 −1 |  1 ]
and then subtract −.25 times the second row from the first to obtain
[ 2  0  0 |  2 ]
[ 0 −4  0 | −8 ]
[ 0  0 −1 |  1 ]
Clearly the purpose of these steps is to introduce zeros above the diagonal entries.
The coefficient part of the array is now in diagonal form, and the solution u = 1,
v = 2, w = −1 is obvious. This method of using Gaussian elimination forwards
and backwards is called Gauss-Jordan elimination. It can be used for solving small
problems by hand, but it is inefficient for large problems. We will see later (Section 3)
that ordinary Gaussian elimination with back substitution requires fewer operations
and is therefore preferable.
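As an illustration of Gauss-Jordan elimination, here is a short sketch in Python (our code, not the author's; it assumes nonzero pivots so no row exchanges are needed):

```python
def gauss_jordan(aug):
    """Solve an n x (n+1) augmented array by Gauss-Jordan elimination."""
    n = len(aug)
    for j in range(n):                        # forward: zeros below pivot j
        for i in range(j + 1, n):
            m = aug[i][j] / aug[j][j]
            for k in range(j, n + 1):
                aug[i][k] -= m * aug[j][k]
    for j in range(n - 1, -1, -1):            # backward: zeros above pivot j
        for i in range(j):
            m = aug[i][j] / aug[j][j]
            for k in range(j, n + 1):
                aug[i][k] -= m * aug[j][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]   # divide by the pivots

aug = [[2.0, 1.0, -1.0, 5.0],
       [4.0, -2.0, 0.0, 0.0],
       [6.0, -7.0, 1.0, -9.0]]
x = gauss_jordan(aug)   # [1.0, 2.0, -1.0]
```

The array passes through exactly the stages shown above: triangular, then diagonal, then the solution.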
EXERCISES
1. Solve the following systems using Gaussian elimination and back substitution:
(a) u − 6v = −8
3u − 2v = −8
(b) 5u − v = −1
−3u + 2v = −5
(c) 2u + v + 3w = −4
−2u + 5v + w = 18
4u + 2v + 4w = −6
(d) 4u − 2v + 4w = −24
2u + 3v − w = 17
−8u + 2v + 5w = −1
(e) 3u + 5v = 3
− 2v − 3w = −6
6w + 2x = 14
− w − 2x = −4
2. Solve the system below. When a zero pivot arises, exchange the equation with the
one below it and continue.
u + v + w = −2
3u + 3v − w = 6
u − v + w = −1
3. Try to solve the system below. Why won’t the trick in the previous problem work
here?
u + v + w = −2
3u + 3v − w = 6
u + v + w = −1
4. A farmer has two breeds of chickens, Rhode Island Red and Leghorn. In one year,
one Rhode Island Red hen will yield 10 dozen eggs and 4 pounds of meat, and one
Leghorn hen will yield 12 dozen eggs and 3 pounds of meat. The farmer has a market
for 2700 dozen eggs and 900 pounds of meat. How many hens of each breed should
he have to meet the demand of the market exactly?
6. A nutritionist determines her minimum daily needs for energy (1,800 kcal), protein
(92 g), and calcium (470 mg). She chooses three foods, pasta, chicken, and broccoli,
and she collects the following data on the nutritive value per serving of each.
7. Find the cubic polynomial y = ax³ + bx² + cx + d that interpolates (that is, whose
graph passes through) the points (−1, 5), (0, 5), (1, 1), (2, −1).
8. Find the cubic polynomial function f(x) = ax³ + bx² + cx + d such that f(0) = 2,
f′(0) = 1, f(1) = 1, f′(1) = 0. (This is called cubic Hermite interpolation.) Sketch
its graph.
2. MATRIX NOTATION

The product of a row vector and a column vector with the same number of entries
is formed by multiplying corresponding entries and adding the results.
This is just the familiar dot product of two vectors. To extend the definition to the
product of a matrix and a column vector, take the product of each row of the matrix
with the column vector and stack the results to form a new column vector:
[ 4 1 3 ]           [ 4·3 + 1·1 + 3·0 ]   [ 13 ]
[ 2 6 8 ] [ 3 ]     [ 2·3 + 6·1 + 8·0 ]   [ 12 ]
[ 1 0 9 ] [ 1 ]  =  [ 1·3 + 0·1 + 9·0 ] = [  3 ]
[ 2 2 1 ] [ 0 ]     [ 2·3 + 2·1 + 1·0 ]   [  8 ]
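The row-times-column recipe can be sketched in plain Python (our illustration, not part of the text):

```python
def mat_vec(A, x):
    """Multiply a matrix (a list of rows) by a column vector (a list)."""
    # each row of A must be as long as x
    assert all(len(row) == len(x) for row in A)
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[4, 1, 3], [2, 6, 8], [1, 0, 9], [2, 2, 1]]
print(mat_vec(A, [3, 1, 0]))   # [13, 12, 3, 8]
```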
Note that the number of columns of the matrix must equal the number of components
of the vector being multiplied. As an application, we note that the system of equations
considered in the previous section,

2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9 ,

can be written compactly as the single matrix equation

[ 2  1 −1 ] [ u ]   [  5 ]
[ 4 −2  0 ] [ v ] = [  0 ] .
[ 6 −7  1 ] [ w ]   [ −9 ]
Finally, to multiply two matrices, just multiply the left matrix times each column
of the right matrix and line up the resulting two vectors in a new matrix. For example
[ 4 1 3 ] [ 3 5 ]   [ 13 23 ]
[ 2 6 8 ] [ 1 0 ]   [ 12 18 ]
[ 1 0 9 ] [ 0 1 ] = [  3 14 ]
[ 2 2 1 ]           [  8 11 ]
Note again that the number of columns of the left factor must equal the number of
rows of the right factor for this to make sense. In this example we multiplied a 4 × 3
matrix by a 3 × 2 matrix and obtained a 4 × 2 matrix. In general, if A is m × n and
B is n × p, then AB is m × p.
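The column-by-column description of the product, with the dimension rule as a check, can be sketched as follows (our code, not the author's):

```python
def mat_mul(A, B):
    """Multiply an m x n matrix by an n x p matrix; the result is m x p."""
    m, n = len(A), len(A[0])
    assert len(B) == n            # columns of A must equal rows of B
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[4, 1, 3], [2, 6, 8], [1, 0, 9], [2, 2, 1]]   # 4 x 3
B = [[3, 5], [1, 0], [0, 1]]                        # 3 x 2
print(mat_mul(A, B))   # [[13, 23], [12, 18], [3, 14], [8, 11]]
```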
Matrix multiplication satisfies the associative law (AB)C = A(BC) and the two
distributive laws A(B + C) = AB + AC and (B + C)D = BD + CD. (The proofs of
these properties are tedious and will be omitted.) It does not, however, satisfy the
commutative law. That is, in general AB ≠ BA. For example

[ 2 3 ] [ 0 1 ]   [ 0 1 ] [ 2 3 ]
[ 1 2 ] [ 1 1 ] ≠ [ 1 1 ] [ 1 2 ] .
In fact for many pairs of matrices AB is defined whereas BA is not. (See Exercise
2.)
For every n there is a special n × n matrix, which we call I, with ones down its
diagonal (also called its main diagonal) and zeros everywhere else. For example, in
the 3 × 3 case
    [ 1 0 0 ]
I = [ 0 1 0 ] .
    [ 0 0 1 ]
It is easy to see that for any 3 × 3 matrix A we have IA = AI = A and that this
property carries over to the n × n case. For this reason I is called the identity matrix.
The notation for a general matrix A with m rows and n columns is
    [ a11  a12  a13  . . .  a1n ]
    [ a21  a22  a23  . . .  a2n ]
A = [ a31  a32  a33  . . .  a3n ]
    [  .    .    .           .  ]
    [ am1  am2  am3  . . .  amn ]
where aij denotes the entry in the ith row and the jth column. Using this notation
we can define matrix multiplication as follows. Let A be m × n and B be n × p;
then C = AB is the m × p matrix with ijth coefficient

              n
       cij =  Σ  aik bkj .
             k=1

We will try to
avoid expressions like this, but it is important to understand them when writing
computer programs to perform matrix computations. In fact, we can write a very
simple program that uses Gaussian elimination and back substitution to solve an
arbitrary linear system of n equations and n unknowns. First express the system in
array form:
[ a11  a12  a13  . . .  a1n | a1,n+1 ]
[ a21  a22  a23  . . .  a2n | a2,n+1 ]
[ a31  a32  a33  . . .  a3n | a3,n+1 ]
[  .    .    .           .  |   .    ]
[ an1  an2  an3  . . .  ann | an,n+1 ]
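A sketch of such a program in Python (ours, not the author's; row exchanges and zero-pivot checks are omitted for clarity):

```python
def solve_array(aug):
    """Gaussian elimination with back substitution on an n x (n+1) array."""
    n = len(aug)
    for j in range(n - 1):                    # forward elimination
        for i in range(j + 1, n):
            m = aug[i][j] / aug[j][j]
            for k in range(j, n + 1):
                aug[i][k] -= m * aug[j][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):            # back substitution
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

x = solve_array([[2.0, 1.0, -1.0, 5.0],
                 [4.0, -2.0, 0.0, 0.0],
                 [6.0, -7.0, 1.0, -9.0]])   # [1.0, 2.0, -1.0]
```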
Finally, we summarize the algebraic laws satisfied by matrix addition and multi-
plication. (The following equalities assume that all indicated operations make sense.)

A + B = B + A                  (A + B) + C = A + (B + C)
(AB)C = A(BC)                  A(B + C) = AB + AC
(B + C)D = BD + CD             IA = AI = A
EXERCISES
1. Compute the following:

(a) 2 [ 5  7 −1 ]
      [ 4 −2  0 ]

(b) [ 6 2 ]   [ 1 −2 ]
    [ 7 1 ] + [ 3  6 ]
    [ 1 2 ]   [ 5 −7 ]

(c) [ 4  0 −1 ] [  3 ]
    [ 0  1  0 ] [  4 ]
    [ 2 −2  1 ] [ −5 ]

(d) [ 3 4 −5 ] [ 4  0 −1 ]
               [ 0  1  0 ]
               [ 2 −2  1 ]

(e) [ 1 2 3 ] [ 4 ]
              [ 5 ]
              [ 6 ]

(f) [ 4 ]
    [ 5 ] [ 1 2 3 ]
    [ 6 ]

(g) [ 2 −1 3 ] [  0 1 ]
    [ 5  0 7 ] [  2 1 ]
    [ 0 −1 0 ] [ −2 3 ]

(h) [ 4  0 −1 ] [ 2 −1 3 ]
    [ 0  1  0 ] [ 5  0 7 ]
    [ 2 −2  1 ] [ 0 −1 0 ]

(i) [ 4  0 −1 ] [ 1 0 0 ]
    [ 0  1  0 ] [ 0 1 0 ]
    [ 2 −2  1 ] [ 0 0 1 ]

(j) [ 2 0 0 ]^5
    [ 0 1 0 ]
    [ 0 0 3 ]

(k) [ 0 1 2 ]^3
    [ 0 0 1 ]
    [ 0 0 0 ]
2. Which of the expressions 2A, A+B, AB, and BA makes sense for the two matrices
below? Which do not?
A = [ 5  7 −1 ]        B = [ 2 3 ]
    [ 4 −2  0 ]            [ 1 2 ]
4. Show with a 3 × 3 example that the product of two upper triangular matrices is
upper triangular.
6. For any matrix A, we define its transpose Aᵀ to be the matrix whose columns are
the corresponding rows of A.
(a) What is the transpose of [ 2 −1 3 ]
                             [ 5  0 7 ] ?
                             [ 0 −1 0 ]
(b) Illustrate the formula (A + B)ᵀ = Aᵀ + Bᵀ with a 2 × 2 example.
(c) The formula (AB)ᵀ = BᵀAᵀ holds as long as the product AB makes sense.
    (This requires a proof, which we omit.) Illustrate this with a 2 × 2 example,
    and use it to prove the formula (ABC)ᵀ = CᵀBᵀAᵀ.
(d) If a matrix satisfies Aᵀ = A, then what kind of matrix is it? (See Exercise 3
    above.)
(e) Show that for any matrix C (not necessarily square) the matrix CᵀC is sym-
    metric. (Use (c) and (d) above.)
(f) Show that if A and B are square matrices and A is symmetric, then BᵀAB is
    symmetric.
(g) Show with a 2 × 2 example that the product of two symmetric matrices may not
be symmetric.
7. Verify

[ 4  0 −1 ] [ c1 ]      [ 4 ]      [  0 ]      [ −1 ]
[ 0  1  0 ] [ c2 ] = c1 [ 0 ] + c2 [  1 ] + c3 [  0 ] .
[ 2 −2  1 ] [ c3 ]      [ 2 ]      [ −2 ]      [  1 ]
8. Verify

             [ 4  0 −1 ]
[ c1 c2 c3 ] [ 0  1  0 ] = c1 [ 4 0 −1 ] + c2 [ 0 1 0 ] + c3 [ 2 −2 1 ] .
             [ 2 −2  1 ]
(f) A² + 2AB + B²
10. Convince yourself that the product AB of two matrices can be thought of as A
multiplying the columns of B to produce the columns of AB or
A [ b1  b2  · · ·  bn ] = [ Ab1  Ab2  · · ·  Abn ] ,

where b1 , b2 , . . . , bn are the columns of B.
11. Assuming the operations make sense, which are symmetric matrices?
(a) AᵀA
(b) AᵀAAᵀ
(c) Aᵀ + A
3. THE LU FACTORIZATION
When Gaussian elimination is run on the coefficient matrix A of Section 1, the result
is an upper triangular matrix, which we call U . The following equations describe
exactly how the Gaussian steps turn the rows of A into the rows of U .
row 1 of U = row 1 of A
row 2 of U = row 2 of A − 2(row 1 of U )
row 3 of U = row 3 of A − 3(row 1 of U ) − 2.5(row 2 of U )
Note that once a row is used as “pivotal row,” it never changes from then on. It
therefore can be considered as a row of U . We can solve these equations for the rows
of A to obtain
row 1 of A = 1(row 1 of U )
row 2 of A = 2(row 1 of U ) + 1(row 2 of U )
row 3 of A = 3(row 1 of U ) + 2.5(row 2 of U ) + 1(row 3 of U ) .
or
[ 2  1 −1 ]   [ 1  0   0 ] [ 2  1 −1 ]
[ 4 −2  0 ] = [ 2  1   0 ] [ 0 −4  2 ] .
[ 6 −7  1 ]   [ 3 2.5  1 ] [ 0  0 −1 ]
We write this equation as A = LU and call the product on the right the LU factor-
ization of A. Note that L is the lower triangular matrix with ones down its diagonal,
with the multipliers 2 and 3 from the first Gaussian step in its first column, and
with the multiplier 2.5 from the second Gaussian step in its second column. The
pattern is the same for every matrix. Any square matrix can be factored by Gaussian
elimination into a product of a lower triangular L with ones down its diagonal and
an upper triangular U , under the proviso that all pivots are nonzero.
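A sketch of the factorization in Python (our code; no row exchanges, and all pivots are assumed nonzero):

```python
def lu_factor(A):
    """Factor A into unit lower triangular L and upper triangular U."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]            # U starts as a copy of A
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]        # multiplier for this Gaussian step
            L[i][j] = m                  # multipliers fill in L below its diagonal
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return L, U

L, U = lu_factor([[2.0, 1.0, -1.0], [4.0, -2.0, 0.0], [6.0, -7.0, 1.0]])
# L = [[1, 0, 0], [2, 1, 0], [3, 2.5, 1]]
# U = [[2, 1, -1], [0, -4, 2], [0, 0, -1]]
```

The computed L and U are exactly the factors displayed above.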
How can the LU factorization of A be used to solve the original system Ax = b?
First we replace A by LU in the system to get LU x = b. Then we note that this
system can be solved by solving the two systems Ly = b and U x = y in order.
Letting y be the vector with components r, s, and t, the first system Ly = b is

[ 1  0   0 ] [ r ]   [  5 ]
[ 2  1   0 ] [ s ] = [  0 ] ,
[ 3 2.5  1 ] [ t ]   [ −9 ]
Solving this by forward substitution gives r = 5, s = −10, t = 1. The second system
U x = y is then just the triangular system of Section 1 and is solved by back
substitution. Compared with the array method, this is only slightly harder since
there is an extra forward substitution step. But now suppose you
have a second system Ax2 = b2 with a different right-hand side. The LU factorization
method would factor A into LU and then solve LU x1 = b1 and LU x2 = b2 both by
forward and back substitution. On the other hand, the array method would have to
run through the entire Gaussian elimination process twice, once for each system. So
if you have several systems to solve, all of which differ only in their right-hand sides,
then the LU factorization method is preferable.
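The two substitution passes can be sketched in Python (our code; L is assumed to be unit lower triangular, as produced by Gaussian elimination):

```python
def lu_solve(L, U, b):
    """Solve L U x = b: forward substitution for Ly = b, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                    # forward substitution (L has unit diagonal)
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):        # back substitution
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

L = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 2.5, 1.0]]
U = [[2.0, 1.0, -1.0], [0.0, -4.0, 2.0], [0.0, 0.0, -1.0]]
print(lu_solve(L, U, [5.0, 0.0, -9.0]))   # [1.0, 2.0, -1.0]
```

Once L and U are in hand, each new right-hand side costs only one call to this routine.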
By counting operations, we can compare the relative expense in computer time of
elimination versus forward and back substitution. We will count only multiplications
and divisions since they take much more time than addition and subtraction. In
the first elimination step for an n × n matrix, a multiplier (one division) times the
second through nth entries of the first row (n − 1 multiplications) is subtracted
from a row below the first. This results in n operations. Since there are n − 1
rows to be subtracted from, the total number of operations for the first step is
(n − 1)n = n² − n. The second step is exactly like the first except that it is performed
on an (n − 1) × (n − 1) matrix and therefore requires (n − 1)² − (n − 1) operations.
Continuing in this manner we see that the total number of operations required for
Gaussian elimination is

   n
   Σ (k² − k) = (n³ − n)/3 ,
  k=1
and since n is negligible compared to n3 for large n, we conclude the number of oper-
ations required to compute the LU factorization of an n × n matrix is approximately
n³/3. Back substitution is much faster since the number of operations required is
easily seen to be
   n
   Σ k = n(n + 1)/2 ,
  k=1

or about n²/2 for large n.
EXERCISES
2. Use the LU factorizations above and forward and back substitution to solve
(a) Ax = [ −8 ]
         [ 13 ]

(b) Bx = [ 12 ]
         [ −6 ]
         [ 18 ]

(c) Cx = [  0 ]
         [  4 ]
         [ −1 ]
         [  2 ]

(d) Dx = [  4 ]
         [  5 ]
         [ −4 ]
         [  2 ]
         [  3 ]
3. If your computer performs 10⁶ operations/sec and costs $500/hour to run, then
how large a linear system can you solve with a budget of $2? Of $200?
4. ROW EXCHANGES
We now return to the question of what happens when we run into zero pivots.
Example 1: Consider the system

u + 2v + 3w = 1
2u + 4v + 9w = 5
2u + 6v + 7w = 4 .

Subtracting 2 times the first equation from the second and third gives the array

[ 1 2 3 | 1 ]
[ 0 0 3 | 3 ]
[ 0 2 1 | 2 ]

A zero pivot has appeared. But note that there is a nonzero entry lower down in the
second column, in this case the 2 in the third row. The problem can therefore be
fixed by just exchanging the second and third rows:
[ 1 2 3 | 1 ]
[ 0 2 1 | 2 ]
[ 0 0 3 | 3 ]
This has the harmless effect of exchanging the second and third equations. In this
case we are done with elimination since the array is now ready for back substitution.
Example 2: Now consider the system

u + 2v + 3w = 1
2u + 4v + 9w = 5
3u + 6v + 7w = 5 .

Here the first elimination step produces a zero pivot in the second column with only
zeros below it, so no row exchange can help. Continuing the elimination with the
third column gives

[ 1 2 3 | 1 ]
[ 0 0 3 | 3 ]
[ 0 0 0 | 4 ]

The last row reads 0 = 4, so the system has no solution at all.
Example 3: In the example above, suppose the right-hand side of the third equation
is equal to 1 instead of 5, then the elimination gives
[ 1 2 3 | 1 ]
[ 0 0 3 | 3 ]
[ 0 0 0 | 0 ]
What we really have here is two equations with three unknowns. Back substitution
breaks down since the first equation cannot determine both u and v by itself. In this
case there are infinitely many solutions to the original system. (See Section 7.)
We conclude that when we run into a zero pivot, we should look for a nonzero
entry in the column below the zero pivot. If we find one, we make a row exchange
and continue. If we don’t, then we must stop; a unique solution to the system does
not exist. A matrix for which Gaussian elimination, possibly with row exchanges,
produces a triangular system with nonzero pivots is called nonsingular. Otherwise
the matrix is called singular.
What happens to the LU factorization of A when there are row exchanges? The
answer is that the product of the L and U we obtain no longer equals the original
matrix A but equals A with row exchanges. Suppose we knew what row exchanges
would be necessary before we started. Then if we performed those exchanges on A
first, we would get the normal LU factorization of this altered A. The altered version
of A is realized by premultiplying A by a permutation matrix P , which is just the
identity matrix with some of its rows exchanged. We would then obtain the equation
P A = LU . For the first example of this section this looks like
[ 1 0 0 ] [ 1 2 3 ]   [ 1 0 0 ] [ 1 2 3 ]
[ 0 0 1 ] [ 2 4 9 ] = [ 2 1 0 ] [ 0 2 1 ] .
[ 0 1 0 ] [ 2 6 7 ]   [ 2 0 1 ] [ 0 0 3 ]
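The equation P A = LU for this example can be checked numerically (a sketch of ours, using the obvious triple-loop product):

```python
def mat_mul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]   # permutation: exchange rows 2 and 3
A = [[1, 2, 3], [2, 4, 9], [2, 6, 7]]
L = [[1, 0, 0], [2, 1, 0], [2, 0, 1]]
U = [[1, 2, 3], [0, 2, 1], [0, 0, 3]]
print(mat_mul(P, A) == mat_mul(L, U))   # True
```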
EXERCISES
1. Solve by the array method:

[  1  4 2 ] [ u ]   [ −2 ]
[ −2 −8 3 ] [ v ] = [ 32 ]
[  0  1 1 ] [ w ]   [  1 ]
5. INVERSES

A square matrix A has an inverse if there is a matrix, written A⁻¹, such that
A A⁻¹ = A⁻¹A = I. For example,

[ 1 −1 ] [  2/3 1/3 ]   [ 1 0 ]
[ 1  2 ] [ −1/3 1/3 ] = [ 0 1 ] .
Some matrices do not have inverses. For example the matrix

[ 1 0 ]
[ 2 0 ]

cannot have an inverse since

[ 1 0 ] [ a b ]   [  a  b ]
[ 2 0 ] [ c d ] = [ 2a 2b ] ,
so there is no choice of a, b, c, d that will make the right-hand side equal to the identity
matrix.
How can we tell if a matrix has an inverse, and, if it does have an inverse, then
how do we compute it? We answer the second question first. Let’s try to find the
inverse of
    [ 2 −3 2 ]
A = [ 1 −1 1 ] .
    [ 3  2 2 ]
This means that we are looking for a matrix B such that AB = I or
[ 2 −3 2 ] [ b11 b12 b13 ]   [ 1 0 0 ]
[ 1 −1 1 ] [ b21 b22 b23 ] = [ 0 1 0 ] .
[ 3  2 2 ] [ b31 b32 b33 ]   [ 0 0 1 ]
This amounts to three linear systems, one for each column of B. Since the coefficient
matrix is the same for all three systems, we can just find the LU
factorization of A and then use forward and back substitution three times to find the
three solution vectors. These vectors, when lined up, will form the columns of B.
If we want to find the solution by hand, we can use the array method and a trick
to avoid running through Gaussian elimination three times. First set up the array

[ 2 −3 2 | 1 0 0 ]
[ 1 −1 1 | 0 1 0 ]
[ 3  2 2 | 0 0 1 ]

Running Gaussian elimination forward (with multipliers .5, 1.5, and 13) gives

[ 2 −3  2 |   1   0  0 ]
[ 0 .5  0 | −.5   1  0 ]
[ 0  0 −1 |   5 −13  1 ]

Now in this situation we would normally use back substitution three times. But we
could also use Gauss-Jordan elimination. That is, use the −1 in the third row to
eliminate the entries in the column above it by subtracting multiples of the third row
from the second (unnecessary since that entry is already zero) and from the first.
This gives

[ 2 −3  0 |  11 −26  2 ]
[ 0 .5  0 | −.5   1  0 ]
[ 0  0 −1 |   5 −13  1 ]
Then use the .5 in the second row to eliminate the −3 in the first row.
[ 2  0  0 |   8 −20  2 ]
[ 0 .5  0 | −.5   1  0 ]
[ 0  0 −1 |   5 −13  1 ]
Finally divide each row by its pivot. The three columns on the right are then the
solutions to the three linear systems, so

       [  4 −10  1 ]
A⁻¹ =  [ −1   2  0 ] .
       [ −5  13 −1 ]
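The whole computation can be sketched in Python (our code, assuming nonzero pivots and no row exchanges):

```python
def inverse(A):
    """Invert A by Gauss-Jordan elimination on the array [A | I]."""
    n = len(A)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for j in range(n):                        # eliminate below each pivot
        for i in range(j + 1, n):
            m = aug[i][j] / aug[j][j]
            for k in range(2 * n):
                aug[i][k] -= m * aug[j][k]
    for j in range(n - 1, -1, -1):            # eliminate above each pivot
        for i in range(j):
            m = aug[i][j] / aug[j][j]
            for k in range(2 * n):
                aug[i][k] -= m * aug[j][k]
    for i in range(n):                        # divide each row by its pivot
        p = aug[i][i]
        aug[i] = [a / p for a in aug[i]]
    return [row[n:] for row in aug]           # right half is the inverse

Ainv = inverse([[2.0, -3.0, 2.0], [1.0, -1.0, 1.0], [3.0, 2.0, 2.0]])
# Ainv = [[4, -10, 1], [-1, 2, 0], [-5, 13, -1]]
```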
These two methods for finding inverses, (1) LU factorization and forward and
back substitution n times and (2) Gauss-Jordan elimination, each require about n³
operations. Either method will work as long as A is nonsingular. Gauss-Jordan elimi-
nation does, however, provide some organizational clarity when finding inverses by
hand. Furthermore, since Gauss-Jordan elimination is just the array method per-
formed several times at once, row exchanges can be made without affecting the final
answer.
Once we have the inverse of a matrix, what can we do with it? It might seem
at first glance that A⁻¹ can be used to solve the system Ax = b directly. Just
apply A⁻¹ to both sides to obtain x = A⁻¹b. This turns out to be much inferior to
ordinary Gaussian elimination with back substitution for two reasons: (1) It takes
n³ operations to find A⁻¹ as compared with n³/3 operations to solve Ax = b by
Gaussian elimination. (2) Computing inverses, by whatever method, is subject to
much more numerical instability and round-off error than is Gaussian elimination.
Inverses are valuable in theory and for conceptualization. In some areas of statistics
and linear programming it is occasionally necessary to actually compute an inverse.
But for most large-scale applications, the computation of matrix inverses can and
should be avoided.
We end this section with a major result, which we state and prove formally. It
basically says that for any matrix A the three questions, (1) does Gaussian elimination
work, (2) does A have an inverse, and (3) does Ax = b have a unique solution, all
have the same answer.
Theorem. For any square matrix A the following statements are equivalent (all are
true or all are false).
(a) A is nonsingular (that is, Gaussian elimination, possibly with row exchanges,
produces nonzero pivots).
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.
To finish, we show that if (a) fails then (d) fails. Suppose A is singular, so that
elimination runs into a zero pivot that no row exchange can repair. Applying the
same elimination steps to Ax = 0, we will have a situation that looks something like
[ ∗ ∗ ∗ ∗ ∗ ] [ x1 ]   [ 0 ]
[ 0 ∗ ∗ ∗ ∗ ] [ x2 ]   [ 0 ]
[ 0 0 0 ∗ ∗ ] [ x3 ] = [ 0 ]
[ 0 0 0 ∗ ∗ ] [ x4 ]   [ 0 ]
[ 0 0 0 ∗ ∗ ] [ x5 ]   [ 0 ]
But if we set x5 = x4 = 0 and x3 = 1 and solve for x2 and x1 by back substitution,
we will get a nonzero solution to Ax = 0. This shows that (d) is false. The pattern is
the same in all cases. If A is any singular matrix, then Gaussian elimination applied
to Ax = 0 will produce a system that in exactly the same way can be shown to have
nonzero solutions. This proves the theorem.
Note that the first statement of the theorem is equivalent to the fact that A has a
P A = LU factorization, which we can now write as A = P⁻¹LU . (We skip the proof
that P⁻¹ exists.)
EXERCISES
2. From Exercises 1(b) and 1(c) what can you say about the inverse of a diagonal
matrix and of an upper triangular matrix?
3. Let A be the matrix of Exercise 1(e). Solve the system

     [ 11 ]
Ax = [ 23 ]
     [ 13 ]

by using A⁻¹.
4. Which of the following matrices is invertible? Why? (See Section 4 Exercise 2.)
(a) [  1  4  2 ]
    [  6 −8  2 ]
    [ −2 −8 −4 ]

(b) [  1  4  2 ]
    [ −2 −8 −3 ]
    [ −1 −4  5 ]

(c) [ 1 3  2  0 ]
    [ 0 5  0  2 ]
    [ 0 0 10  2 ]
    [ 0 0  0 11 ]

(d) [ 1 3 2 −1 ]
    [ 0 5 3  2 ]
    [ 0 0 0  2 ]
    [ 0 0 0 10 ]
8. There is a slight hole in our proof of (a) ⇒ (b) in the theorem of this section.
To find the inverse of A, we applied Gauss-Jordan elimination to the array [A, I] to
obtain [I, B]. We then concluded AB = I so that B is a right-inverse of A. But how
do we know that B is also a left-inverse of A? Prove that it is, that is, prove that
BA = I by applying the reverse of the same Gauss-Jordan steps in reverse order to
the array [B,I] to obtain [I,A].
9. More generally, it is true that if a matrix has a one-sided inverse, then it must have
a two-sided inverse. Or more simply stated, AB = I ⇒ BA = I. To prove this, argue
as follows: AB = I ⇒ B is nonsingular ⇒ B is invertible ⇒ A = B −1 ⇒ BA = I.
Fill in the details.
6. TRIDIAGONAL MATRICES
When coefficient matrices arise in applications, they usually have special pat-
terns. In such cases Gaussian elimination often simplifies. We now illustrate this
by looking at tridiagonal matrices, which are the simplest kind of band matrices. A
matrix is tridiagonal if all of its nonzero elements are either on the main diagonal or
adjacent to the main diagonal. Here is an example (from Section 3 Exercise 1(d)):
[ 2 1  0 0 0 ]
[ 4 5  3 0 0 ]
[ 0 3  4 1 0 ]
[ 0 0 −1 1 1 ]
[ 0 0  0 4 3 ]
If we run Gaussian elimination on this matrix, we obtain
[ 2 1 0 0 0 ]
[ 0 3 3 0 0 ]
[ 0 0 1 1 0 ] .
[ 0 0 0 2 1 ]
[ 0 0 0 0 1 ]
This example reveals three properties of tridiagonal matrices and Gaussian elimina-
tion. (1) There is at most one nonzero multiplier in each Gaussian step. (2) The
superdiagonal entries (that is, the entries just above the main diagonal) don’t change.
And (3) the final upper triangular matrix has nonzero entries only on its diagonal
and superdiagonal. If we count the number of operations required to triangulate a
tridiagonal matrix, we find it is proportional to n instead of the usual n³/3. We conclude
that large systems involving tridiagonal matrices are very easy to solve. In fact, we
can write a quick and efficient program that will solve tridiagonal systems directly:
[ d1 c1              ] [ x1 ]   [ b1 ]
[ a2 d2 c2           ] [ x2 ]   [ b2 ]
[    a3 d3 c3        ] [ x3 ] = [ b3 ]
[        .  .  .     ] [  . ]   [  . ]
[            an dn   ] [ xn ]   [ bn ]
for k = 2 to n do
    if dk−1 = 0 then signal failure and stop
    m = ak / dk−1
    dk = dk − m ck−1
    bk = bk − m bk−1
if dn = 0 then signal failure and stop
xn = bn / dn
for k = n − 1 down to 1 do
    xk = (bk − ck xk+1) / dk
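Here is the same algorithm translated into Python (a sketch; the lists are padded with an unused entry 0 so the indices match the pseudocode, and d and b are modified in place):

```python
def solve_tridiagonal(a, d, c, b):
    """Solve a tridiagonal system with sub-, main-, and super-diagonals a, d, c."""
    n = len(d) - 1
    for k in range(2, n + 1):                 # forward elimination
        if d[k - 1] == 0:
            raise ZeroDivisionError("zero pivot")
        m = a[k] / d[k - 1]
        d[k] -= m * c[k - 1]
        b[k] -= m * b[k - 1]
    if d[n] == 0:
        raise ZeroDivisionError("zero pivot")
    x = [0.0] * (n + 1)
    x[n] = b[n] / d[n]
    for k in range(n - 1, 0, -1):             # back substitution
        x[k] = (b[k] - c[k] * x[k + 1]) / d[k]
    return x[1:]

# The 5 x 5 tridiagonal example from this section, with right-hand side all ones:
a = [0, 0, 4.0, 3.0, -1.0, 4.0]
d = [0, 2.0, 5.0, 4.0, 1.0, 3.0]
c = [0, 1.0, 3.0, 1.0, 1.0, 0]
b = [0, 1.0, 1.0, 1.0, 1.0, 1.0]
x = solve_tridiagonal(a, d, c, b)
```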
Tridiagonal matrices arise in many situations: electrical circuits, heat flow prob-
lems, the deflection of beams, and so on. Here we show how tridiagonal matrices are
used in cubic spline interpolation. We are given data (x0 , y0 ), (x1 , y1 ), · · · , (xn , yn ).
The points x1 , x2 , · · · , xn−1 are called interior nodes, and x0 and xn are called bound-
ary nodes. The problem is to find a cubic polynomial on each of the intervals
[x0 , x1 ], [x1 , x2 ], · · · , [xn−1 , xn ] such that, at each interior node, the cubic on the left
and the cubic on the right have the same heights, the same slopes, and the same
curvature (that is to say, the same second derivative). If we glue these cubics to-
gether we will obtain a cubic spline, which is a smooth curve passing through all the
data. To make the problem completely determined, we need conditions at the two
boundary nodes. Often these are taken to be the requirement that the spline has no
curvature (zero second derivatives) at the boundary nodes. The spline thus obtained
is called a natural spline. Splines have applications in CAD-CAM, font design, and
modeling.
FIGURE 1. A cubic spline through the data points (x0, y0), . . . , (x4, y4); the slopes
of the spline at the nodes are s0, . . . , s4.
How do we find the cubic polynomials that make up the spline? If we knew what
slopes the spline curve should have at its nodes, then we could find the cubic poly-
nomial on each interval using the method of Section 1 Exercise 8. Let s0 , s1 , · · · , sn
be the unknown slopes at the nodes. For simplicity assume that the data is equally
spaced, that is, x1 − x0 = x2 − x1 = · · · = xn − xn−1 = h. Then with some algebraic
effort it is possible to show that the conditions described above force the slopes to
satisfy the tridiagonal system

2s0 + s1 = 3(y1 − y0)/h
si−1 + 4si + si+1 = 3(yi+1 − yi−1)/h ,    i = 1, . . . , n − 1
sn−1 + 2sn = 3(yn − yn−1)/h .
The first and last equations come from the conditions at the boundary nodes. All
the other equations come from the conditions at the interior nodes. The system is
tridiagonal and therefore easy to solve, even when there is a large number of nodes.
Once the slopes s0 , s1 , · · · , sn are known, the cubic polynomial on each interval can
be found as a cubic Hermite interpolant. (See Section 1 Exercise 8.)
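Setting up the slope system can be sketched in Python. The equations used below are the standard ones for an equally spaced natural spline (2s0 + s1 = 3(y1 − y0)/h at the left end, si−1 + 4si + si+1 = 3(yi+1 − yi−1)/h at interior nodes, and sn−1 + 2sn = 3(yn − yn−1)/h at the right end); we state them here as an assumption rather than quoting the text:

```python
def natural_spline_system(y, h):
    """Return sub-, main-, super-diagonals and right-hand side for the slopes."""
    n = len(y) - 1
    a = [0.0] + [1.0] * n                    # subdiagonal
    d = [2.0] + [4.0] * (n - 1) + [2.0]      # main diagonal
    c = [1.0] * n + [0.0]                    # superdiagonal
    b = ([3 * (y[1] - y[0]) / h]
         + [3 * (y[i + 1] - y[i - 1]) / h for i in range(1, n)]
         + [3 * (y[n] - y[n - 1]) / h])
    return a, d, c, b

# Data from Exercise 1 below: (0,0), (1,1), (2,4), (3,1), (4,0), so h = 1.
a, d, c, b = natural_spline_system([0, 1, 4, 1, 0], 1.0)
```

The resulting diagonals feed directly into the tridiagonal solver of this section.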
EXERCISES
1. Write down the system that the slopes of the natural spline interpolant of the
data (0,0), (1,1), (2,4), (3,1), (4,0) must satisfy. Solve it. Sketch the resulting spline
curve.
7. SYSTEMS WITH MANY SOLUTIONS

Clearly we cannot use back substitution. Even worse, the second equation, 0u + 0v =
−4, has no solution. This indicates that the entire system has no solution. The
coefficient matrix is of course singular, and the system is said to be inconsistent.
This time the second equation is trivially satisfied for all u and v. So we set v = c
where c is an arbitrary constant and try to continue with back substitution. The first
equation then gives u = 2 − c. The solution is therefore
u=2−c
v=c
or in vector form

[ u ]   [ 2 ]     [ −1 ]
[ v ] = [ 0 ] + c [  1 ] .
Now suppose that elimination leaves a system with two free variables, v and x.
Each is set to a different arbitrary constant.
The solution is therefore
u = 2 − 3c − 2d
v=d
w = 1 − .5c
x=c
or in vector form

[ u ]   [ 2 ]     [ −3  ]     [ −2 ]
[ v ]   [ 0 ]     [  0  ]     [  1 ]
[ w ] = [ 1 ] + c [ −.5 ] + d [  0 ] .
[ x ]   [ 0 ]     [  1  ]     [  0 ]
This time we have an infinite number of solutions parametrized by two arbitrary
constants.
In general, Gaussian elimination will put the array into echelon form
[ • ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ | ∗ ]
[ 0 • ∗ ∗ ∗ ∗ ∗ ∗ ∗ | ∗ ]
[ 0 0 0 • ∗ ∗ ∗ ∗ ∗ | ∗ ]
[ 0 0 0 0 0 0 0 0 • | ∗ ]
[ 0 0 0 0 0 0 0 0 0 | ∗ ]
and Gauss-Jordan elimination will put the array into row-reduced echelon form
[ • 0 ∗ 0 ∗ ∗ ∗ ∗ 0 | ∗ ]
[ 0 • ∗ 0 ∗ ∗ ∗ ∗ 0 | ∗ ]
[ 0 0 0 • ∗ ∗ ∗ ∗ 0 | ∗ ]
[ 0 0 0 0 0 0 0 0 • | ∗ ]
[ 0 0 0 0 0 0 0 0 0 | ∗ ]
In either case, we get a staircase pattern where the first nonzero entry in each row
(indicated by bullets above) is a pivot. This is the precise mathematical definition of
pivot. For square nonsingular matrices, all pivots occur on the main diagonal. For
singular matrices, at least one pivot occurs to the right of the main diagonal. (Up to
now we have been referring to this informally as the case of a “zero pivot.”)
EXERCISES
1. Solutions can be written in many equivalent ways. Show that the following
expressions represent the same set of solutions.

(a) [ u ] = [ 2 ] + c [ −1 ]    and    [ u ] = [ 2 ] + c [  8 ]
    [ v ]   [ 0 ]     [  1 ]           [ v ]   [ 0 ]     [ −8 ]

(b) [ u ]   [ 3 ]     [ 0 ]     [ 0 ]         [ u ]   [ 3 ]     [ 0 ]     [ 0 ]
    [ v ] = [ 0 ] + c [ 1 ] + d [ 0 ]   and   [ v ] = [ 0 ] + c [ 1 ] + d [ 1 ]
    [ w ]   [ 0 ]     [ 0 ]     [ 1 ]         [ w ]   [ 0 ]     [ 0 ]     [ 1 ]
3. Solve each of the following 2 × 2 systems. Then graph each equation as a line and
give a geometric reason for the number of solutions of each system.
(a) [ 1 3 | 2 ]
    [ 3 2 | 1 ]

(b) [  2  1 | −1 ]
    [ −6 −3 | −4 ]

(c) [  3 −1 |  2 ]
    [ −6  2 | −4 ]
4. Solve each of the following 3 × 3 systems. Then graph each equation as a plane
and give a geometric reason for the number of solutions of each system.
(a) [ 1  1 0 | 1 ]
    [ 1 −1 0 | 0 ]
    [ 0  0 1 | 0 ]

(b) [ 2 0 0 | 2 ]
    [ 0 0 3 | 0 ]
    [ 0 0 3 | 6 ]

(c) [ 1 0 0 | 0 ]
    [ 0 1 0 | 0 ]
    [ 1 1 0 | 1 ]

(d) [ 1 0 0 | 1 ]
    [ 0 1 0 | 0 ]
    [ 1 1 0 | 1 ]

(e) [ 1 1 1 | 1 ]
    [ 2 2 2 | 2 ]
    [ 3 3 3 | 3 ]
8. A nutritious breakfast drink can be made by mixing whole egg, milk, and orange
juice in a blender. The food energy and protein for these ingredients are given below.
How much of each should be blended to produce a drink with 560 calories of energy
and 24 grams of protein?
energy (kcal) protein (g)
1 egg 80 6
1 cup milk 180 9
1 cup orange juice 100 3
The reaction must be balanced, that is, the number of atoms of each element must
be the same before and after the reaction. For oxygen, for example, this would mean
2a + b = 2c + 3d. While there are many possible choices for a, b, c, d that balance the
reaction, it is customary to use the smallest possible positive integers. Find such a
solution.
10. Find the equation of the circle in the form c1(x² + y²) + c2 x + c3 y + c4 = 0 that
passes through the points (2,6), (2,0), (5,3).
8. DETERMINANTS
Determinants have been known and studied for 300 years. Today, however, there
is far less emphasis on them than in the past. In modern mathematics, determinants
play an important but narrow role in theory and almost no role at all in computations.
We will make use of them in our study of eigenvalues in Section 9. The determinant
det(A) is a number associated with a square matrix A. For 2 × 2 and 3 × 3 matrices
it is defined as follows:

det [ a11 a12 ] = a11 a22 − a21 a12
    [ a21 a22 ]

    [ a11 a12 a13 ]
det [ a21 a22 a23 ] = a11 a22 a33 + a12 a23 a31 + a13 a32 a21
    [ a31 a32 a33 ]       − a31 a22 a13 − a21 a12 a33 − a32 a23 a11 .
These are the familiar diagonal rules from high school. These rules cannot be extended
to larger matrices! For such matrices we must use the general definition:
det(A) = Σ_σ sign(σ) a1σ(1) a2σ(2) a3σ(3) · · · anσ(n) .
The symbol sign(σ) is equal to +1 or −1 depending on how the rows and columns
are chosen. We intentionally leave this definition of the determinant vague since it is
hard to understand, difficult to motivate, and impossible to compute. It is important
to us only because from it the following properties of the determinant can be proved.
We will omit the proofs since in this section we want to get through the determinant
as quickly as possible. In a later section we present another approach that will make
clear where the mysterious determinant formula comes from and how the properties
are derived.
(2) If A has a zero row or two equal rows or two rows that are multiples of each other,
then det(A) = 0.
det [ 1 4 2 ]       det [ 1 4 2 ]       det [ 1 4 2 ]
    [ 0 0 0 ] = 0       [ 3 5 2 ] = 0       [ 3 5 2 ] = 0
    [ 5 7 1 ]           [ 1 4 2 ]           [ 2 8 4 ]
(3) The determinant changes sign when two rows are exchanged.
1 2 2 3 1 3
det 5 7 1 = − det 5 7 1
3 1 3 1 2 2
(4) The typical Gaussian elimination operation of subtracting a multiple of one row
from another leaves the determinant unchanged.
det [ 1 2 2 ]       [ 1  2  2 ]
    [ 3 1 3 ] = det [ 0 −5 −3 ]
    [ 5 7 1 ]       [ 5  7  1 ]
(5) If all the entries in a row have a common factor, then that factor can be taken
outside the determinant:

det [ 6 3 12 ]         [ 2 1 4 ]
    [ 5 7  1 ] = 3 det [ 5 7 1 ]
    [ 2 5  2 ]         [ 2 5 2 ]
(6) The determinant of the transpose of a matrix is the same as the determinant of
the matrix itself: det(AT ) = det(A).
det [ 1 2 2 ]       [ 1 5 3 ]
    [ 5 5 5 ] = det [ 2 5 1 ]
    [ 3 1 3 ]       [ 2 5 3 ]
(7) The determinant of a (lower or upper) triangular matrix is the product of its
diagonal entries.
det [ 2 3 7 ]
    [ 0 5 2 ] = 2 · 5 · 3 = 30
    [ 0 0 3 ]
Note that property 6 means that all the properties about rows also hold for
columns. Note also that property 9 can be added to the theorem of Section 5 to
obtain
Theorem. For any square matrix A the following statements are equivalent.
(a) A is nonsingular
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.
(e) det(A) ≠ 0.
Proof: We show (a) ⇔ (e). If A is nonsingular, then Gaussian elimination will produce
an upper triangular matrix with nonzero pivots. Since by properties 3 and 4 Gaussian
elimination changes at most the sign of the determinant, we have that det(A) ≠ 0.
If A is singular, then Gaussian elimination will produce an upper triangular matrix
with at least one zero pivot. By the same argument, det(A) = 0.
If the determinant is to have any practical value, there must be an efficient way to
compute it. We could try to use the formula in the definition of the determinant. But
as we saw, the formula consists of a sum of products of n entries of A, where in each
product each factor is an entry from a different row and a different column. Since the
first entry in a product can be chosen in n ways, the second in n − 1 ways, the third in
n − 2 ways, and so on, there are therefore n(n−1)(n−2)(n−3) · · · (2)(1) = n! different
products in the sum. This means there are n! products that must be summed up,
each of which requires n − 1 multiplications, resulting in (n − 1)n! multiplications
in all. For a 25 × 25 matrix there would be 24 · 25! ≈ 3.7 × 10^26 multiplications. A
computer that can perform a million multiplications a second would take over 10^13 years
to compute this determinant! This is clearly unacceptable.
An alternate approach is suggested by the proof of property (10). Use Gaussian
elimination to triangulate the matrix. Then the determinant is the product of the
diagonal entries (the pivots!) times +1 or −1 depending upon whether there was an
even or odd number of row exchanges. For example, the matrix at the beginning of
Section 4 was reduced to an upper triangular matrix by Gaussian elimination with
one row exchange.
[ 1 2 3 ]   [ 1 2 3 ]   [ 1 2 3 ]
[ 2 4 9 ] → [ 0 0 3 ] → [ 0 2 1 ]
[ 2 6 7 ]   [ 0 2 1 ]   [ 0 0 3 ]

We therefore have

det [ 1 2 3 ]         [ 1 2 3 ]
    [ 2 4 9 ] = − det [ 0 2 1 ] = −(1 · 2 · 3) = −6.
    [ 2 6 7 ]         [ 0 0 3 ]
Since this method uses only Gaussian elimination, it requires about n³/3 operations. For a
25 × 25 matrix this is only 5208 operations, or only 0.005 seconds on our hypothetical
computer! The method above is an excellent way to compute the determinant, but it
takes just as many steps as Gaussian elimination. In fact, it is Gaussian elimination!
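The determinant-by-elimination method just described is easy to program. Here is a minimal Python sketch (the function name and list-of-rows representation are our own conventions), using exact rational arithmetic so the pivots and their product come out exactly:

```python
from fractions import Fraction

def det(rows):
    """Determinant by Gaussian elimination: triangulate the matrix,
    multiply the pivots, and flip the sign once per row exchange."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    sign = 1
    for c in range(n):
        # find a nonzero pivot in column c, exchanging rows if needed
        k = next((i for i in range(c, n) if A[i][c] != 0), None)
        if k is None:
            return Fraction(0)  # a zero pivot that cannot be cured: singular
        if k != c:
            A[c], A[k] = A[k], A[c]
            sign = -sign        # each row exchange changes the sign
        for i in range(c + 1, n):  # eliminate below the pivot
            m = A[i][c] / A[c][c]
            A[i] = [a - m * b for a, b in zip(A[i], A[c])]
    p = Fraction(sign)
    for i in range(n):
        p *= A[i][i]            # product of the diagonal pivots
    return p
```

On the matrix above this reproduces the value −6 found by hand.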
Why do we want to compute a determinant in the first place? What can it tell
us about a matrix? Whether or not the matrix is singular? But we can determine
that just by doing Gaussian elimination. If we run into a zero pivot that cannot
be cured by row exchanges, then we know the matrix is singular. Otherwise we get
its LU factorization. So do we ever need to compute a determinant in practice?
No! Determinants are rarely computed outside a classroom. They are important,
however, for theoretical developments as we will see in the next section.
The determinant can be evaluated in other ways. In particular, there is the
cofactor expansion of the determinant. It expresses the determinant of a matrix as a
sum of determinants of smaller matrices. Here we use it to find the determinant of
the matrix above:
det [ 1 2 3 ]         [ 4 9 ]         [ 2 9 ]         [ 2 4 ]
    [ 2 4 9 ] = 1 det [ 6 7 ] − 2 det [ 2 7 ] + 3 det [ 2 6 ]
    [ 2 6 7 ]

            = 1(28 − 54) − 2(14 − 18) + 3(12 − 8)
            = −26 + 8 + 12
            = −6.
In words, the determinant of the matrix on the left is the sum of the entries of its
first row times the cofactors of its first row. A cofactor is the determinant of the
2 × 2 matrix obtained from the original matrix by crossing out a particular row and
a column, with an appropriate sign placed in front of the determinant. In particular,
the cofactor of the first entry is the determinant of the matrix obtained by crossing
out the first row and first column; the cofactor of the second entry is the determinant
of the matrix obtained by crossing out the first row and the second column with a
negative sign in front; and the cofactor of the third entry is the determinant of the
matrix obtained by crossing out the first row and the third column. Here is another
cofactor expansion of the same matrix:
det [ 1 2 3 ]          [ 2 9 ]         [ 1 3 ]         [ 1 3 ]
    [ 2 4 9 ] = −2 det [ 2 7 ] + 4 det [ 2 7 ] − 6 det [ 2 9 ]
    [ 2 6 7 ]

            = −2(14 − 18) + 4(7 − 6) − 6(9 − 6)
            = 8 + 4 − 18
            = −6.
This time we expanded with respect to the second column. Note that the 2 × 2 ma-
trices arise in the same way, by crossing out the row and column of the corresponding
entry. Note also the signs. In general the signs in the definition of the cofactors form
a checkerboard pattern:
+ − + − · · ·
− + − + · · ·
+ − + − · · ·
− + − + · · ·
⋮  ⋮  ⋮  ⋮
Here’s an example of a cofactor expansion of the determinant of a 4 × 4 matrix:
    [ 1  1 2 4 ]
det [ 1  0 4 2 ]
    [ 1 −1 0 0 ]
    [ 2  2 2 6 ]

          [  0 4 2 ]         [ 1 4 2 ]         [ 1  0 2 ]         [ 1  0 4 ]
  = 1 det [ −1 0 0 ] − 1 det [ 1 0 0 ] + 2 det [ 1 −1 0 ] − 4 det [ 1 −1 0 ] .
          [  2 2 6 ]         [ 2 2 6 ]         [ 2  2 6 ]         [ 2  2 2 ]
We expanded with respect to the first row. In this case we are now faced with
finding four 3 × 3 determinants. We could use either cofactor expansion or the high-
school formula on each of these smaller determinants. (Note that we should have
expanded with respect to the third row because then we would have had only two
3 × 3 determinants to evaluate.) It is becoming clear that the method of cofactor
expansion requires a great deal of computation. Just think about the 5 × 5 case! In
fact, it generally requires exactly the same number of multiplications as the formula
that defined the determinant in the first place. It is therefore extremely impractical.
It does, however, have some value in theoretical considerations and in hand
calculations with small matrices. The general formula for the cofactor expansion
with respect to the ith row is

det A = ai1 [(−1)^{i+1} det Mi1] + ai2 [(−1)^{i+2} det Mi2] + · · · + ain [(−1)^{i+n} det Min],

where Mij is the submatrix formed by deleting the ith row and jth column of A. (The
formula for expansion with respect to columns is similar.) Note that the cofactor is
officially defined as the entire quantity in brackets, that is, as the determinant of the
submatrix Mij times (−1)^{i+j}. The formula is not very illuminating, and we make no
attempt to prove it.
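The cofactor formula translates directly into a recursive program. The sketch below (our own code, expanding always along the first row for simplicity) is hopelessly inefficient for large matrices, exactly as discussed above, but it works fine for small ones:

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row:
    det A = sum over j of A[0][j] * (-1)**j * det(minor of A[0][j])."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total
```

On the 3 × 3 matrix above it reproduces the value −6 computed by hand.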
EXERCISES
4. For the matrix in Exercise 1(a) find det(A−1 ) and det(AT ) without doing any
work.
7. True or false? “If det(A) = 0, then the homogeneous system Ax = 0 has nonzero
solutions.”
9. EIGENVALUES
There are many problems in engineering and science where, given a square matrix
A, it is necessary to know if there is a number λ (read “lambda”) and a nonzero vector
x such that Ax = λx. The number λ is called an eigenvalue of A and the vector x
is called an eigenvector associated with λ. (“Eigen” is a German word meaning “its
own” or “peculiar to it.”) For example,

[  5  4  4 ] [  1 ]     [  1 ]
[ −7 −3 −1 ] [ −1 ] = 5 [ −1 ] .
[  7  4  2 ] [  1 ]     [  1 ]

So 5 is an eigenvalue of the matrix above and [1; −1; 1] is an associated eigenvector.
Note that any multiple of this vector is also an eigenvector. That is, any vector of
the form c[1; −1; 1] is an eigenvector associated with the eigenvalue 5. So what we
actually have is an infinite family of eigenvectors. Note also that this infinite family
can be represented in many other ways such as, for example, c[−2; 2; −2].
Suppose we want to find the eigenvalues of a matrix A. We start by rewriting
the equation Ax = λx as Ax = λIx or Ax − λIx = 0 or (A − λI)x = 0. We therefore
want to find those numbers λ for which the homogeneous system (A − λI)x = 0 has
nonzero solutions x. By the theorem of the previous section, this is equivalent to
asking for those numbers λ that make the matrix A − λI singular or, in other words,
for which det(A − λI) = 0. This equation is called the characteristic equation of A.
The left-hand side is a polynomial in λ and is called the characteristic polynomial of
A.
Example 1: Consider the matrix A = [4 2; −1 1]. First set

A − λI = [  4 2 ]   [ λ 0 ]   [ 4 − λ    2    ]
         [ −1 1 ] − [ 0 λ ] = [  −1    1 − λ  ] .
The characteristic equation of A is det(A−λI) = 0, which can be rewritten as follows:
det [ 4 − λ    2    ]
    [  −1    1 − λ  ] = 0

(4 − λ)(1 − λ) − 2(−1) = 0
λ² − 5λ + 6 = 0
(λ − 2)(λ − 3) = 0.
The eigenvalues of A are therefore λ = 2 and λ = 3. We can go further and find the
associated eigenvectors. For the case λ = 2 we wish to find nonzero solutions of the
system (A − 2I)x = 0, which can be rewritten as
[ 4 − 2    2    ] [ u ]   [ 0 ]
[  −1    1 − 2  ] [ v ] = [ 0 ] .

Apply Gaussian elimination to the augmented array

[  2  2 | 0 ]
[ −1 −1 | 0 ]

to get

[ 2 2 | 0 ]
[ 0 0 | 0 ] .
The solution is v = c and u = −c, or in vector form
[ u ]     [ −1 ]
[ v ] = c [  1 ] .
Therefore, for each of the two eigenvalues we have found an infinite family of eigen-
vectors parametrized by a single arbitrary constant.
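For a 2 × 2 matrix the whole computation — characteristic polynomial, quadratic formula, eigenvalues — fits in a few lines. The following Python sketch (function name and layout are our own) reproduces the example above:

```python
import math

def eig2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]]: the roots of the characteristic
    polynomial lambda^2 - (a + d)*lambda + (a*d - b*c)."""
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues")
    r = math.sqrt(disc)
    return (tr - r) / 2, (tr + r) / 2
```

For the matrix of Example 1 this returns the eigenvalues 2 and 3 found above, and one can check Av = 2v directly for the eigenvector v = (−1, 1).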
Example 2: Things can become more complicated as the size of the matrix increases.
Consider the matrix
A = [ 2 3 0 ]
    [ 4 3 0 ] .
    [ 0 0 6 ]
Proceeding as before we have the characteristic equation det(A − λI) = 0 rewritten
as
det [ 2 − λ    3      0    ]
    [   4    3 − λ    0    ] = 0,
    [   0      0    6 − λ  ]

(2 − λ)(3 − λ)(6 − λ) − 3 · 4 (6 − λ) = 0,
[(2 − λ)(3 − λ) − 3 · 4](6 − λ) = 0,
[λ² − 5λ − 6](6 − λ) = 0,
−(λ − 6)²(λ + 1) = 0.
Here we have two eigenvalues, λ = 6 and λ = −1. To find the eigenvectors for λ = −1,
solve (A − (−1)I)x = 0 or
[ 2 + 1    3      0    ] [ u ]   [ 0 ]
[   4    3 + 1    0    ] [ v ] = [ 0 ]
[   0      0    6 + 1  ] [ w ]   [ 0 ]

by reducing

[ 3 3 0 | 0 ]
[ 4 4 0 | 0 ]
[ 0 0 7 | 0 ]

to

[ 3 3 0 | 0 ]
[ 0 0 0 | 0 ] .
[ 0 0 7 | 0 ]

The solution is w = 0, v = c, and u = −c, or in vector form

[ u ]     [ −1 ]
[ v ] = c [  1 ] .
[ w ]     [  0 ]
To find the eigenvectors for λ = 6, solve (A − 6I)x = 0 by reducing

[ −4  3 0 | 0 ]
[  4 −3 0 | 0 ]
[  0  0 0 | 0 ]

to

[ −4 3 0 | 0 ]
[  0 0 0 | 0 ] .
[  0 0 0 | 0 ]

There are two free variables, and the solution can be written u = 3c, v = 4c, w = d,
giving the two-parameter family of eigenvectors c[3; 4; 0] + d[0; 0; 1].
Example 3: A matrix can have too few eigenvectors. Consider

A = [ 2 1 ]
    [ 0 2 ] ,

which is easily seen to have characteristic equation (λ − 2)² = 0 and therefore the
repeated eigenvalue λ = 2, 2. But in solving the system (A − 2I)x = 0 we obtain

[ 2 − 2    1    ] [ u ]   [ 0 ]
[   0    2 − 2  ] [ v ] = [ 0 ]
or

[ 0 1 | 0 ]
[ 0 0 | 0 ] .
So u = c and v = 0 or in vector form
[ u ]     [ 1 ]
[ v ] = c [ 0 ] .
Example 4: Even worse, a matrix can have no (real) eigenvalues at all. For example
the matrix

A = [  0 1 ]
    [ −1 0 ] ,

whose characteristic equation λ² + 1 = 0 has no real roots.
In this section we have seen that, in order to understand eigenvalues, we have to know
something about determinants. In fact, the characteristic polynomial is defined as a
determinant. Because of this, in practice it is very difficult to compute characteristic
polynomials for large matrices. Even when this can be done, the problem of finding
the roots of a high degree polynomial is numerically unstable. For practical computa-
tions, a much more sophisticated algorithm called the QR method, which has nothing
to do with characteristic polynomials, is used to find eigenvalues and eigenvectors.
Although the characteristic polynomial is important in theory, in practice it is rarely,
if ever, computed.
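To make the QR method a little less mysterious, here is a bare-bones Python sketch of the unshifted QR iteration (our own implementation; production eigenvalue solvers add shifts, deflation, and a preliminary reduction to Hessenberg form). Factor A = QR, form RQ, and repeat; for well-behaved matrices the iterates approach triangular form and the diagonal approaches the eigenvalues:

```python
import math

def qr(A):
    """Classical Gram-Schmidt QR factorization of a square matrix
    given as a list of rows."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]  # columns of A
    Q = []                                                  # orthonormal columns
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = v[:]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            w = [wk - R[i][j] * qk for wk, qk in zip(w, q)]
        R[j][j] = math.sqrt(sum(wk * wk for wk in w))
        Q.append([wk / R[j][j] for wk in w])
    Qrows = [[Q[j][i] for j in range(n)] for i in range(n)]  # Q as rows
    return Qrows, R

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def qr_eigenvalues(A, steps=100):
    """Unshifted QR iteration: replacing A by RQ (a similarity transform)
    drives A toward triangular form; the diagonal then approximates
    the eigenvalues."""
    for _ in range(steps):
        Q, R = qr(A)
        A = matmul(R, Q)
    return [A[i][i] for i in range(len(A))]
```

For the symmetric matrix [2 1; 1 2], whose eigenvalues are 3 and 1, a few dozen iterations already give the eigenvalues to machine accuracy.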
EXERCISES
2. Suppose you and I are computing eigenvectors. We get the results below. Explain
in what sense we got the same answers, or not.
(a) You get [−3; 9; 6] and I get [4; −12; −8].
(b) You get [1; 1; 0], [0; 1; 1] and I get [1; 2; 1], [1; 0; −1].
(c) You get [1; 1; 0], [0; 1; 1] and I get [1; 2; 1], [1; 0; −1], [1; 1; 0].
(d) You get [1; 1; 0], [0; 1; 1] and I get [1; 2; 1], [2; 4; 2].
(g) If A is triangular, then its eigenvalues are its diagonal entries a11 , a22 , · · · , ann .
4. If A = [ B C ]
          [ 0 D ]
is the matrix of Section 8 Exercise 6, then show that the
eigenvalues of A are the eigenvalues of B together with the eigenvalues of D. (Hint:
Show det(A − λI) = det(B − λI) det(D − λI).)
5. Find the eigenvalues and associated eigenvectors of each of the following matrices.

(a)
[ 2 0 0 ]
[ 0 2 0 ]
[ 0 0 2 ]

(b)
[ 2 1 0 ]
[ 0 2 0 ]
[ 0 0 2 ]

(c)
[ 2 1 0 ]
[ 0 2 1 ]
[ 0 0 2 ]
10. DIAGONALIZATION
Consider again the matrix A = [4 2; −1 1] of Example 1 in Section 9, which had two eigenvalues, λ = 2 and λ = 3. If we write the two equations Ax1 = 2x1
and Ax2 = 3x2 , where x1 and x2 are the associated eigenvectors, we obtain
[  4 2 ] [ −1 ]     [ −1 ]         [  4 2 ] [ −2 ]     [ −2 ]
[ −1 1 ] [  1 ] = 2 [  1 ]   and   [ −1 1 ] [  1 ] = 3 [  1 ] .
The two eigenvectors can be lined up to form the columns of a matrix S so that the
two equations above can be combined into one matrix equation AS = SD where D
is the diagonal matrix of eigenvalues:
[  4 2 ] [ −1 −2 ]   [ −1 −2 ] [ 2 0 ]
[ −1 1 ] [  1  1 ] = [  1  1 ] [ 0 3 ] .
The same computation works in general. If the eigenvectors v1, v2, . . . , vn of A are
lined up as the columns of S, then

A [ v1 v2 · · · vn ] = [ Av1 Av2 · · · Avn ]
                     = [ λ1 v1  λ2 v2  · · ·  λn vn ]

                                          [ λ1            ]
                     = [ v1 v2 · · · vn ] [    λ2         ]
                                          [        ⋱      ]
                                          [            λn ]

that is, AS = SD, and therefore A = SDS⁻¹.
This last step is possible only if S is invertible. S will in fact be invertible if its
columns, which are the eigenvectors v1 , v2 · · · , vn , are linearly independent. This of
course leaves a giant gap in our discussion since at this point we still don’t know what
“linearly independent” means. We will fill this gap in Sections 16 and 19. Our method
for finding eigenvectors, which is to solve (A − λI)x = 0 by Gaussian elimination,
does in fact produce linearly independent eigenvectors, one for each free variable.
The only question is: are there enough linearly independent eigenvectors to form a
square matrix S? If the answer is yes, then A can be factored into A = SDS −1 where
S is invertible and D is diagonal, and A is called diagonalizable. If the answer is no,
then A is not diagonalizable.
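Continuing the 2 × 2 example, a diagonal factorization can be checked directly. The small Python sketch below (our own helpers, using exact arithmetic) rebuilds A from S and D:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# eigenvector columns of A = [[4, 2], [-1, 1]] and its diagonal eigenvalue matrix
S = [[F(-1), F(-2)], [F(1), F(1)]]
D = [[F(2), F(0)], [F(0), F(3)]]
A = matmul(matmul(S, D), inv2(S))
```

The product S D S⁻¹ reproduces A = [4 2; −1 1] exactly.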
Note that the diagonal factorization of a matrix is not completely unique. For exam-
ple,
[ 2 3 0 ]   [  1 0 3 ] [ −1 0 0 ] [  1 0 3 ]⁻¹
[ 4 3 0 ] = [ −1 0 4 ] [  0 6 0 ] [ −1 0 4 ]
[ 0 0 6 ]   [  0 1 0 ] [  0 0 6 ] [  0 1 0 ]
is an equally valid factorization.
eigenvector, and these eigenvectors can be used to form the columns of S since they
are independent. But why do distinct eigenvalues ensure diagonalizability in general?
This follows from the fact, to be proved later, that eigenvectors associated with dis-
tinct eigenvalues are always independent. (See Section 22.)
3. It would be helpful if we could decide if a matrix is diagonalizable just by looking
at it, without having to go through the tedious process of determining if it has enough
independent eigenvectors. Unfortunately there is no simple way to do this. But there
is an important class of matrices that are automatically diagonalizable. These are
the symmetric matrices. A deep theorem in linear algebra, called The Spectral The-
orem, says in part that all symmetric matrices are diagonalizable. (See Section 22.)
A nonsymmetric matrix may or may not be diagonalizable, but, fortunately, many
of the matrices that arise in physics and engineering are symmetric and are therefore
diagonalizable.
EXERCISES
3. Decide which of the following matrices are diagonalizable just by looking at them.
(a)
[  0 −2  2 ]
[ −2  0 −2 ]
[  2  2  2 ]

(b)
[ 0  2  2 ]
[ 2  0 −2 ]
[ 2 −2  0 ]

(c)
[  0  2 2 ]
[ −2  0 2 ]
[  2 −2 0 ]
4. If A is 2 × 2 with eigenvalues λ1 = 6 and λ2 = 7 and associated eigenvectors
v1 = [5; 9] and v2 = [2; 4], then find the following.
(a) The characteristic polynomial of A.
(b) det(A)
(c) A
(d) The eigenvalues of A2 .
(e) det(A2 )
11. MATRIX EXPONENTIALS
So far we have developed a simple algebra for square matrices. We can add,
subtract, and multiply them, and therefore expressions like I + 2A − 3A2 + A3 make
sense. Of course we cannot divide matrices, but A⁻¹ can be thought of as the
reciprocal of a matrix (defined only if A is nonsingular). Is it possible for us to go
further and give meaning to expressions like √A, e^A, ln(A), sin(A), cos(A), . . . ?
Recall the exponential series

e^x = 1 + x + x²/2! + x³/3! + · · · .

This infinite series converges to e^x for any value of x and therefore can be taken as
the definition of e^x. We use it as the starting point for the matrix exponential by
simply defining
e^A = Σ_{n=0}^{∞} (1/n!) A^n = I + A + (1/2!) A² + (1/3!) A³ + · · ·
for a square matrix A. Does this make sense? Let’s try an example:
exp([ 0 0 ]) = [ 1 0 ] + [ 0 0 ] + (1/2!) [ 0 0 ]² + · · · = [ 1 0 ]
   ([ 0 0 ])   [ 0 1 ]   [ 0 0 ]          [ 0 0 ]           [ 0 1 ]
(Note that e^A is also written as exp(A).) The exponential of the zero matrix is
therefore the identity matrix. Let’s try another example:
exp([ 2 0 ]) = [ 1 0 ] + [ 2 0 ] + (1/2!) [ 2 0 ]² + (1/3!) [ 2 0 ]³ + · · ·
   ([ 0 3 ])   [ 0 1 ]   [ 0 3 ]          [ 0 3 ]           [ 0 3 ]

             = [ 1 0 ] + [ 2 0 ] + [ 2²/2!   0    ] + [ 2³/3!   0    ] + · · ·
               [ 0 1 ]   [ 0 3 ]   [   0   3²/2! ]   [   0   3³/3! ]

             = [ Σ 2^n/n!     0      ]
               [    0      Σ 3^n/n!  ]

             = [ e²  0  ]
               [ 0   e³ ] .
It is clear that to exponentiate a diagonal matrix you just exponentiate its diagonal
entries. Note that in both computations above the infinite series of matrices con-
verged (trivially in the first example). Does this always happen? Yes! It can be
shown that the infinite series for e^A converges for any square matrix A whatever.
(We omit the proof.) Therefore e^A exists for any square matrix A.
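Since the series converges for every square matrix, we can approximate e^A simply by summing enough terms. The Python sketch below (our own code; a fixed number of terms is crude but adequate for small matrices) does exactly that:

```python
import math

def expm(A, terms=30):
    """Matrix exponential by summing the defining series
    I + A + A^2/2! + A^3/3! + ... (truncated after `terms` terms)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    power = [row[:] for row in result]                              # A^0
    for k in range(1, terms):
        # power becomes A^k
        power = [[sum(a * b for a, b in zip(row, col)) for col in zip(*A)]
                 for row in power]
        # add A^k / k! to the running sum
        result = [[r + p / math.factorial(k) for r, p in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
    return result
```

Applied to the diagonal matrix of the last example it reproduces diag(e², e³) to high accuracy.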
Accepting this, we still have the problem of how to compute eA for more compli-
cated matrices than those in the two previous examples. We can use two properties of
the matrix exponential to help us. The first is that if AB = BA then e^{A+B} = e^A e^B.
(We omit the proof.) This just says that, if A and B commute, then for these matri-
ces the matrix exponential satisfies the familiar law of exponents. We use this fact
to compute the following:
exp([ 2 3 ]) = exp([ 2 0 ] + [ 0 3 ])
   ([ 0 2 ])     ([ 0 2 ]   [ 0 0 ])

             = exp([ 2 0 ]) exp([ 0 3 ])
                  ([ 0 2 ])    ([ 0 0 ])

             = [ e²  0  ] ( [ 1 0 ] + [ 0 3 ] + (1/2!) [ 0 3 ]² + · · · )
               [ 0   e² ] ( [ 0 1 ]   [ 0 0 ]          [ 0 0 ]          )

             = [ e²  0  ] ( [ 1 0 ] + [ 0 3 ] + (1/2!) [ 0 0 ] + · · · )
               [ 0   e² ] ( [ 0 1 ]   [ 0 0 ]          [ 0 0 ]         )

             = [ e²  0  ] [ 1 3 ]
               [ 0   e² ] [ 0 1 ]

             = [ e²  3e² ]
               [ 0   e²  ] .
(Don’t forget to first show the two matrices above commute in order to justify the
use of the law of exponents.)
The second helpful property of matrix exponentials is that if A = SDS⁻¹ then
e^A = S e^D S⁻¹. The proof is so simple we exhibit it here:
e^A = Σ_{n=0}^{∞} (1/n!) (SDS⁻¹)^n
    = Σ_{n=0}^{∞} (1/n!) S D^n S⁻¹        (See Section 10 Exercise 2.)
    = S ( Σ_{n=0}^{∞} (1/n!) D^n ) S⁻¹
    = S e^D S⁻¹ .
We could multiply out the right-hand side, or we might just want to leave it in this
form. If A is defective, that is, if A doesn’t have a diagonalization factorization, then
there are more sophisticated ways to compute e^A. We will not pursue them here. In
applications to ODE’s we will need to compute matrix exponentials of the form e^{At}.
But this is easy for diagonalizable matrices like the one above since
[ 4 −5 ]     [ 1 5 ] [ −t  0  ] [ 1 5 ]⁻¹
[ 2 −3 ] t = [ 1 2 ] [  0  2t ] [ 1 2 ]

and therefore

exp([ 4 −5 ] t) = [ 1 5 ] [ e^{-t}    0     ] [ 1 5 ]⁻¹
   ([ 2 −3 ]  )   [ 1 2 ] [   0     e^{2t}  ] [ 1 2 ]    .
There is one more property of matrix exponentials that we will need in appli-
cations. It is analogous to the derivative formula (d/dt) e^{at} = a e^{at}. For the matrix
exponential it is just (d/dt) e^{At} = A e^{At}. The proof follows:
(d/dt) e^{At} = (d/dt) Σ_{n=0}^{∞} (1/n!) (At)^n
             = (d/dt) Σ_{n=0}^{∞} (1/n!) A^n t^n
             = Σ_{n=1}^{∞} (1/n!) A^n n t^{n−1}
             = A Σ_{n=1}^{∞} (1/(n−1)!) A^{n−1} t^{n−1}
             = A Σ_{n=1}^{∞} (1/(n−1)!) (At)^{n−1}
             = A Σ_{n=0}^{∞} (1/n!) (At)^n
             = A e^{At} .
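Both properties can be checked numerically. The sketch below (our own code) builds e^{At} for the matrix A = [4 −5; 2 −3] from its diagonal factorization and then compares a finite-difference derivative of e^{At} with A e^{At}:

```python
import math

S    = [[1.0, 5.0], [1.0, 2.0]]     # eigenvector columns of A
Sinv = [[-2/3, 5/3], [1/3, -1/3]]   # inverse of S (det S = -3)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def expAt(t):
    """e^{At} for A = [[4, -5], [2, -3]]: exponentiate the diagonal
    factor diag(-t, 2t) entrywise and sandwich it between S and S^{-1}."""
    D = [[math.exp(-t), 0.0], [0.0, math.exp(2 * t)]]
    return matmul(matmul(S, D), Sinv)
```

At t = 0 this gives the identity matrix, and a central difference of e^{At} in t matches A e^{At} closely, as the derivative formula predicts.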
EXERCISES
4. Verify the formula (d/dt) e^{At} = A e^{At} where A is equal to the following matrices.

(a)
[ 2 0 ]
[ 0 3 ]

(b)
[ 2 3 ]
[ 0 2 ]
7. Prove (eA )−1 = e−A and conclude that eA is nonsingular for any square matrix
A.
12. DIFFERENTIAL EQUATIONS
Example 1: Suppose we want to solve the following linear system of first-order ordi-
nary differential equations with initial conditions:

ẋ = 4x − 5y     x(0) = 8
ẏ = 2x − 3y     y(0) = 5
If we let A be the matrix defined above (called the coefficient matrix), then the
system becomes simply u̇ = Au. The solution of the system u̇ = Au with initial con-
dition u(0) is u(t) = e^{At} u(0). This fact follows immediately from the computations
(d/dt)(e^{At} u(0)) = A(e^{At} u(0)) and e^{A·0} u(0) = I u(0) = u(0). For the example above, the
solution would just be

u(t) = exp([ 4 −5 ] t) [ 8 ]
          ([ 2 −3 ]  ) [ 5 ] .
Since the coefficient matrix has the diagonal factorization
[ 4 −5 ]   [ 1 5 ] [ −1 0 ] [ 1 5 ]⁻¹
[ 2 −3 ] = [ 1 2 ] [  0 2 ] [ 1 2 ]    ,

we have

u(t) = [ 1 5 ] [ e^{-t}    0     ] [ 1 5 ]⁻¹ [ 8 ]
       [ 1 2 ] [   0     e^{2t}  ] [ 1 2 ]   [ 5 ] .
To find the final solution it looks like we are going to have to compute an inverse.
But in fact this can be avoided by writing
[ 1 5 ]⁻¹ [ 8 ]   [ c1 ]
[ 1 2 ]   [ 5 ] = [ c2 ]
as

[ 1 5 ] [ c1 ]   [ 8 ]
[ 1 2 ] [ c2 ] = [ 5 ] ,

which is just a linear system. Solving by Gaussian elimination we obtain

[ c1 ]   [ 3 ]
[ c2 ] = [ 1 ] ,

so the solution is u(t) = 3e^{-t}[1; 1] + e^{2t}[5; 2].
If no initial conditions are given, then c1 and c2 would have to be carried through to
the end. The solution would then look like
u(t) = [ 1 5 ] [ e^{-t}    0     ] [ c1 ]
       [ 1 2 ] [   0     e^{2t}  ] [ c2 ]

     = [ c1 e^{-t} + 5 c2 e^{2t} ]
       [ c1 e^{-t} + 2 c2 e^{2t} ]

     = c1 e^{-t} [ 1 ] + c2 e^{2t} [ 5 ]
                 [ 1 ]             [ 2 ] .
We have expressed the solution in matrix form and in vector form. Note that the
vector form is a linear combination of exponentials involving the eigenvalues times
the associated eigenvectors. In fact if we set t = 0 in the vector form, then from the
initial conditions we obtain
c1 [ 1 ] + c2 [ 5 ] = [ 8 ]
   [ 1 ]      [ 2 ]   [ 5 ]

or

[ 1 5 ] [ c1 ]   [ 8 ]
[ 1 2 ] [ c2 ] = [ 5 ] ,
which is the same system for the c’s that we obtained above. So the vector form of
the solution carries all the information we need. This suggests that we really don’t
need the matrix factorization at all. To find the solution to u̇ = Au, just find the
eigenvalues and eigenvectors of A, and, assuming there are enough eigenvectors, write
down the solution in vector form.
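The recipe — eigenvalues in the exponents, eigenvectors as coefficients — can be checked directly. Below is a small Python sketch (our own code) of the solution found above, u(t) = c1 e^{-t}(1, 1) + c2 e^{2t}(5, 2), together with its exact derivative; substituting into u̇ = Au confirms it solves the system.

```python
import math

A = [[4.0, -5.0], [2.0, -3.0]]   # coefficient matrix of the system

def u(t, c1, c2):
    """General solution u(t) = c1 e^{-t} (1, 1) + c2 e^{2t} (5, 2)."""
    e1, e2 = math.exp(-t), math.exp(2 * t)
    return [c1 * e1 + 5 * c2 * e2, c1 * e1 + 2 * c2 * e2]

def udot(t, c1, c2):
    """Exact derivative of u(t), differentiating the exponentials."""
    e1, e2 = math.exp(-t), math.exp(2 * t)
    return [-c1 * e1 + 10 * c2 * e2, -c1 * e1 + 4 * c2 * e2]
```

With c1 = 3 and c2 = 1 it satisfies the initial condition u(0) = (8, 5), and udot agrees with A applied to u at every t.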
Note that once we recognize the general form of the solution, we can just write it
down without going through the matrix exponential at all. In general, it is clear that
if A is diagonalizable, that is, if it has eigenvalues λ1, λ2, · · · , λn and independent
eigenvectors v1, v2, · · · , vn, then the solution to u̇ = Au has the form

u(t) = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 + · · · + cn e^{λn t} vn.
It is also clear that the eigenvalues decide how the solutions behave as t → ∞. If all
the eigenvalues are negative, then all the solutions consist only of linear combinations
of dying exponentials, and therefore u(t) → 0 as t → ∞. In this case the matrix A
is called stable. If at least one eigenvalue is positive, then there are solutions u(t)
containing at least one growing exponential and therefore those u(t) → ∞ as t → ∞.
In this case the matrix A is called unstable. This is the situation with both systems
above. There is also a third possibility. If all the eigenvalues are negative or zero with
at least one actually equal to zero, then the solutions consist of linear combinations
of dying exponentials and at least one constant function, and therefore all solutions
stay bounded as t → ∞. In this case the matrix A is called neutrally stable. The
eigenvalues therefore determine the qualitative nature of the solution.
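The classification in this paragraph is easy to mechanize. A sketch (our own function; it assumes the eigenvalues are real, as in this section):

```python
def stability(eigenvalues):
    """Classify a matrix from its (real) eigenvalues: stable if all are
    negative, unstable if any is positive, neutrally stable otherwise."""
    if any(lam > 0 for lam in eigenvalues):
        return "unstable"
    if all(lam < 0 for lam in eigenvalues):
        return "stable"
    return "neutrally stable"
```

For example, eigenvalues (−1, −2) give a stable matrix, (−1, 2) an unstable one, and (−1, 0) a neutrally stable one.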
All this is clear enough for diagonalizable matrices, but what about defective
matrices? Consider the following example:
[ ẋ ]   [ 2 3 ] [ x ]
[ ẏ ] = [ 0 2 ] [ y ] .

The solution is

[ x ]      ([ 2 3 ]  ) [ x(0) ]   [ e^{2t}  3t e^{2t} ] [ x(0) ]
[ y ] = exp([ 0 2 ] t) [ y(0) ] = [   0       e^{2t}  ] [ y(0) ] .
(See Section 11 Exercise 3.) A term of the form t e^{2t} has appeared. This is typical of
defective systems. Note that this term does not change the qualitative nature of the
solution u(t) as t → ∞. In general, terms of the form t^n e^{λt} arise, but they tend to
zero or infinity as t → ∞ depending on whether λ is negative or positive. The factor
t^n ultimately has no effect. It can be shown that this behavior holds for all defective
matrices. That is, the definitions of stable, unstable, and neutrally stable and their
implications about the long-term behavior of solutions hold for these matrices also.
(Actually a more precise statement has to be made in the case that zero is a multiple
eigenvalue, but we will ignore this possibility.) All of this will become clearer when
we consider the Jordan form of a matrix in a later section.
EXERCISES
2. Find the solutions of the systems above with the initial conditions below.
(a) [x(0); y(0)] = [3; 2]
(b) [x(0); y(0); z(0)] = [1; 2; −3]
(c) [x(0); y(0); z(0)] = [0; 1; 3]
(d) [x(0); y(0); z(0)] = [0; 0; 1]
(e) [x(0); y(0); z(0)] = [4; 3; 4]
(f) [x(0); y(0); z(0); w(0)] = [2; 2; 1; 2]
4. Here is another way to derive the general form of the solution of the system
u̇ = Au, assuming the diagonal factorization A = SDS −1 . Make the change of
variables w = S −1 u, and show that the system then becomes ẇ = Dw. This is just a
simple system of n individual ODE’s of the form ẇ1 = λ1 w1, ẇ2 = λ2 w2, · · · , ẇn =
λn wn. These equations are well known to have solutions w1(t) = c1 e^{λ1 t}, w2(t) =
c2 e^{λ2 t}, · · · , wn(t) = cn e^{λn t}. Write this as

w(t) = [ c1 e^{λ1 t} ]
       [ c2 e^{λ2 t} ]
       [     ⋮       ]
       [ cn e^{λn t} ]

and then u(t) = S w(t) = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 + · · · + cn e^{λn t} vn,
where the v’s are the columns of S, that is, the eigenvectors of A. This alternate
approach avoids the matrix exponential, but it does not generalize so easily to the
complex case or the case of defective matrices.
13. THE COMPLEX CASE
notation. We write the first eigenvalue and associated eigenvector as λ = 2 + i and
v = [1 + i; 1]. Then the second eigenvalue and associated eigenvector are λ = 2 − i
and v = [1 − i; 1]. Clearly they are just complex conjugates of the first eigenvalue
and eigenvector and therefore don’t add any new information. We can ignore them.
Now identify the real and imaginary parts of λ and v as λ = α + iβ = 2 + i and
v = x + iy = [1; 1] + i [1; 0]. Then the basic equation Av = λv can be written A(x + iy) =
(α + iβ)(x + iy). When multiplied out it becomes Ax + iAy = (αx − βy) + i(βx + αy).
Since complex numbers are equal if and only if their real and imaginary parts are
equal, this equation implies that Ax = αx − βy and Ay = βx + αy. These two
equations can be written simultaneously in matrix form as
� � � �� �
x1 y1 x1 y1 α β
A =
x2 y2 x2 y2 −β α
or � �� �� �−1
x1 y1 α β x1 y1
A= .
x2 y2 −β α x2 y2
Therefore for the matrix of our example we obtain
[ 3 −2 ]   [ 1 1 ] [  2 1 ] [ 1 1 ]⁻¹
[ 1  1 ] = [ 1 0 ] [ −1 2 ] [ 1 0 ]    .
This is our desired factorization. Everything on the right side is real. The middle factor
is no longer diagonal, but it exhibits the real and imaginary parts of the eigenvalue
in a nice pattern. (The question of the independence of the vectors x and y will be
settled in Section 16 Exercise 7.)
Let’s look at another example. Let
B = [ −2 −2 −2 −2 ]
    [  1  0 −2 −1 ]
    [  0  0  1 −2 ]
    [  0  0  1  3 ] .
Its complex diagonalization turns out to be

B = [   0       0     −1 + i  −1 − i ]
    [  −i       i       1       1    ]
    [ −1 + i  −1 − i    0       0    ]
    [   1       1       0       0    ]

    [ 2 + i    0      0       0    ]
    [   0    2 − i    0       0    ]
    [   0      0    −1 + i    0    ]
    [   0      0      0    −1 − i  ]

    [   0       0     −1 + i  −1 − i ]⁻¹
    [  −i       i       1       1    ]
    [ −1 + i  −1 − i    0       0    ]
    [   1       1       0       0    ]    .
The coefficient matrix is just B of the second example. We solve the system in the
same way as above using the real block-diagonal factorization of B and obtain
[ w(t) ]      ([ −2 −2 −2 −2 ]  ) [ w(0) ]
[ x(t) ] = exp([  1  0 −2 −1 ] t) [ x(0) ]
[ y(t) ]      ([  0  0  1 −2 ]  ) [ y(0) ]
[ z(t) ]      ([  0  0  1  3 ]  ) [ z(0) ]

  [  0  0 −1  1 ]    ([  2 1  0  0 ]  ) [  0  0 −1  1 ]⁻¹ [ w(0) ]
= [  0 −1  1  0 ] exp([ −1 2  0  0 ] t) [  0 −1  1  0 ]   [ x(0) ]
  [ −1  1  0  0 ]    ([  0 0 −1  1 ]  ) [ −1  1  0  0 ]   [ y(0) ]
  [  1  0  0  0 ]    ([  0 0 −1 −1 ]  ) [  1  0  0  0 ]   [ z(0) ]

  [  0  0 −1  1 ] [  e^{2t} cos t   e^{2t} sin t        0               0        ] [ c1 ]
= [  0 −1  1  0 ] [ −e^{2t} sin t   e^{2t} cos t        0               0        ] [ c2 ]
  [ −1  1  0  0 ] [       0              0         e^{-t} cos t   e^{-t} sin t   ] [ c3 ]
  [  1  0  0  0 ] [       0              0        −e^{-t} sin t   e^{-t} cos t   ] [ c4 ]

  [  0  0 −1  1 ] [  c1 e^{2t} cos t + c2 e^{2t} sin t ]
= [  0 −1  1  0 ] [ −c1 e^{2t} sin t + c2 e^{2t} cos t ]
  [ −1  1  0  0 ] [  c3 e^{-t} cos t + c4 e^{-t} sin t ]
  [  1  0  0  0 ] [ −c3 e^{-t} sin t + c4 e^{-t} cos t ]

                                      [  0 ]                                        [  0 ]
= (c1 e^{2t} cos t + c2 e^{2t} sin t) [  0 ] + (−c1 e^{2t} sin t + c2 e^{2t} cos t) [ −1 ]
                                      [ −1 ]                                        [  1 ]
                                      [  1 ]                                        [  0 ]

                                        [ −1 ]                                        [ 1 ]
  + (c3 e^{-t} cos t + c4 e^{-t} sin t) [  1 ] + (−c3 e^{-t} sin t + c4 e^{-t} cos t) [ 0 ] .
                                        [  0 ]                                        [ 0 ]
                                        [  0 ]                                        [ 0 ]
(The third equality requires a slight generalization of Section 11 Exercise 5(b).)
Now we can see the pattern. If λ = α + iβ, v = x + iy is a complex eigenvalue-
eigenvector pair for the coefficient matrix, then so is λ = α − iβ, v = x − iy, and they
together will contribute terms like
· · · + (c1 e^{αt} cos βt + c2 e^{αt} sin βt) x + (−c1 e^{αt} sin βt + c2 e^{αt} cos βt) y + · · ·
The imaginary part β of the eigenvalue controls the frequency of the oscillations. The
real part α of the eigenvalue determines whether the oscillations grow without bound
or die out. We can therefore extend the language of the real case and say that a
matrix is stable if all of its eigenvalues have negative real parts, is unstable if one of
its eigenvalues has positive real part, and is neutrally stable if all of its eigenvalues
have nonpositive real parts with at least one with real part actually equal to zero.
What about defective matrices? These are matrices with repeated complex
eigenvalues that do not provide enough independent eigenvectors with which to con-
struct a diagonalization. It is still possible by more general kinds of factorizations to
compute exponentials of such matrices. In systems of differential equations such ma-
trices will produce solutions containing terms of the form t^n e^{αt} cos βt and t^n e^{αt} sin βt.
Just as in the real case, the factor of t^n doesn’t have any effect on the long-term qual-
itative behavior of such solutions. Stability or instability and the oscillatory behavior
of the solutions is still determined by the eigenvalues. Therefore, if you know the
eigenvalues of a system of differential equations, you know a lot about the behavior
of the solutions of that system without actually solving it.
Finally we present an application that describes vibrations in mechanical and
electrical systems. In modeling mass-spring systems, Newton’s second law of motion
and Hooke’s law lead to the second-order differential equation mẍ(t) + kx(t) = 0,
where m = the mass, k = the spring constant, and x(t) = the displacement of the
mass as a function of time. For simplicity, divide by m and let ω² = k/m, so the
equation becomes ẍ + ω²x = 0. In order to use the machinery that we have built up,
we have to cast this second-order equation into a first-order system. To do this let
y1 = x and y2 = ẋ. We then obtain the system
ẏ1 = y2
ẏ2 = −ω² y1
or in matrix form
[ ẏ1 ; ẏ2 ] = [ 0  1 ; −ω²  0 ] [ y1 ; y2 ].
To solve the system we have to diagonalize the coefficient matrix. The eigenvalues
are λ = ±iω. Using Gaussian elimination to solve (A − iωI)x = 0
[ −iω  1 | 0 ; −ω²  −iω | 0 ]  →  [ −iω  1 | 0 ; 0  0 | 0 ]
we obtain the eigenvector [ 1 ; iω ] = [ 1 ; 0 ] + i [ 0 ; ω ]. The solution of the system is
therefore
[ y1(t) ; y2(t) ] = (c1 cos ωt + c2 sin ωt) [ 1 ; 0 ] + (−c1 sin ωt + c2 cos ωt) [ 0 ; ω ].
It follows that the solution of the original problem is x(t) = y1 (t) = c1 cos ωt +
c2 sin ωt. This is the mathematical representation of simple harmonic motion.
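As a quick numerical check (an aside, not part of the original development; ω, c1, and c2 below are arbitrary sample values), we can approximate ẍ with a central difference and confirm that x(t) = c1 cos ωt + c2 sin ωt leaves essentially zero residual in ẍ + ω²x = 0.

```python
import math

# Arbitrary sample parameters (any choice should work).
W, C1, C2 = 2.0, 3.0, -1.0

def x(t):
    # Proposed solution x(t) = c1 cos(wt) + c2 sin(wt).
    return C1 * math.cos(W * t) + C2 * math.sin(W * t)

def residual(t, h=1e-4):
    # Central-difference approximation of x''(t), plugged into x'' + w^2 x.
    xpp = (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2
    return xpp + W**2 * x(t)

worst = max(abs(residual(k / 10.0)) for k in range(100))
print(worst)  # essentially zero (limited only by the finite-difference error)
```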
EXERCISES
1. Verify that
1/(a + ib) = a/(a² + b²) + i (−b)/(a² + b²)
and that the complex conjugate of (a + ib)(c + id) equals the product of the complex
conjugates of (a + ib) and (c + id).
5. Find the general solutions of the following systems.
(a) ẋ = 9x − 10y
ẏ = 4x − 3y
(b) ẋ = − x + 3z
ẏ = − 5x + y + z
ż = − 3x − z
6. Find the solutions of the systems in Exercise 5 with the following initial conditions.
(a) [ x(0) ; y(0) ] = [ 3 ; 1 ]
(b) [ x(0) ; y(0) ; z(0) ] = [ −2 ; −1 ; 3 ]
14. Difference Equations and Markov Matrices
A difference equation uk = Auk−1 starts with a vector u0 and generates a sequence
of vectors by repeated multiplication by the matrix A:
u1 = Au0
u2 = Au1
u3 = Au2
..
.
uk = Auk−1 .
The basic challenge posed by a difference equation is to describe the behavior of the
sequence u0 , u1 , u2 , u3 , · · ·. Specifically, (1) determine if the sequence has a limit and
if so then find it, and (2) find an explicit formula for uk in terms of u0 . To this end
we observe that
u1 = Au0
u2 = Au1 = A(Au0 ) = A2 u0
u3 = Au2 = A(A2 u0 ) = A3 u0
..
.
uk = A^k u0
   = S D^k S⁻¹ u0
   = S D^k c
   = [ v1 v2 · · · vn ] diag(λ1^k, λ2^k, . . . , λn^k) [ c1 ; c2 ; · · · ; cn ]
   = [ v1 v2 · · · vn ] [ c1 λ1^k ; c2 λ2^k ; · · · ; cn λn^k ]
   = c1 λ1^k v1 + c2 λ2^k v2 + · · · + cn λn^k vn.
This is then the general solution of the difference equation. (Note its similarity to
the general solution of a system of ODE’s in Section 12.) The c’s are determined by
the equation c = S⁻¹u0. We can avoid computing an inverse by multiplying this
equation by S to obtain the linear system Sc = u0 , which can be solved by Gaussian
elimination. This can also be seen by letting k = 0 in the general solution to obtain
u0 = c1 v1 + c2 v2 + · · · + cn vn , which is again Sc = u0 .
To determine the long-term behavior of uk , let the eigenvalues be ordered so
that |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Then from the general solution uk = c1 λ1^k v1 + c2 λ2^k v2 +
· · · + cn λn^k vn it is clear that the behavior of uk as k → ∞ is determined by the size
of λ1. To be specific,
|λ1| < 1 ⇒ uk → 0
|λ1| = 1 ⇒ uk bounded, may have a limit
|λ1| > 1 ⇒ uk blows up.
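This dominance of λ1 is easy to watch numerically. The sketch below (an aside, not part of the text) iterates uk = Auk−1 for a 2 × 2 matrix whose eigenvalues are 2 and .5, and the growth ratio ‖uk+1‖/‖uk‖ settles down to |λ1| = 2; the helper names mul and norm are ours.

```python
# Iterate u_k = A u_{k-1} and watch ||u_{k+1}|| / ||u_k|| approach |lambda_1|.
def mul(A, u):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

def norm(u):
    return sum(t * t for t in u) ** 0.5

A = [[0.0, 2.0], [-0.5, 2.5]]   # eigenvalues 2 and .5
u = [2.0, 5.0]
for _ in range(30):
    u_next = mul(A, u)
    ratio = norm(u_next) / norm(u)
    u = u_next
print(ratio)  # converges to 2, the dominant eigenvalue
```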
Example 1: Solve the difference equation uk = Auk−1 with u0 = [ 2 ; 5 ]. Using the
diagonal factorization
A = [ 0  2 ; −.5  2.5 ] = [ 2  4 ; 2  1 ] [ 2  0 ; 0  .5 ] [ 2  4 ; 2  1 ]⁻¹,
we have
uk = [ 0  2 ; −.5  2.5 ]^k [ 2 ; 5 ]
   = [ 2  4 ; 2  1 ] [ 2  0 ; 0  .5 ]^k [ 2  4 ; 2  1 ]⁻¹ [ 2 ; 5 ]
   = [ 2  4 ; 2  1 ] [ 2^k  0 ; 0  .5^k ] [ c1 ; c2 ]
   = c1 (2)^k [ 2 ; 2 ] + c2 (.5)^k [ 4 ; 1 ].
(Of course, we could have written down the solution in this form as soon as we knew
the eigenvalues and eigenvectors. We really didn’t need the diagonal factorization.
We only have to make sure that there are enough independent eigenvectors to insure
that the diagonal factorization exists.) And since the system
[ 2  4 ; 2  1 ] [ c1 ; c2 ] = [ 2 ; 5 ]
has the solution c1 = 3, c2 = −1, we obtain
uk = 3(2)^k [ 2 ; 2 ] + (−1)(.5)^k [ 4 ; 1 ] = [ 6(2)^k − 4(.5)^k ; 6(2)^k − (.5)^k ].
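As a sanity check (an aside, not part of the text), the closed form 6(2)^k − 4(.5)^k, 6(2)^k − (.5)^k can be compared with direct iteration of uk = Auk−1 starting from u0 = (2, 5):

```python
# Compare the closed-form solution with direct iteration of u_k = A u_{k-1}.
def step(u):
    x, y = u
    return (2.0 * y, -0.5 * x + 2.5 * y)   # multiply by A = [0 2; -.5 2.5]

u = (2.0, 5.0)
for k in range(10):
    closed = (6 * 2**k - 4 * 0.5**k, 6 * 2**k - 0.5**k)
    assert abs(u[0] - closed[0]) < 1e-9
    assert abs(u[1] - closed[1]) < 1e-9
    u = step(u)
print("closed form matches iteration for k = 0..9")
```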
Example 2: Each year 2/10 of the people in California move out and 1/10 of the
people outside California move in. Let Ik and Ok be the numbers of people inside
and outside California in the kth year. The initial populations are I0 = 20 million
and O0 = 202 million. The relationship between the populations in successive years
is given by
Ik+1 = .8Ik + .1Ok
Ok+1 = .2Ik + .9Ok
or in matrix form
[ Ik+1 ; Ok+1 ] = [ .8  .1 ; .2  .9 ] [ Ik ; Ok ].
The problem is to find the population distribution uk = [ Ik ; Ok ] and to determine if
it tends to a stable limit. This is, of course, the problem of solving the difference
equation uk = Auk−1 where A is the matrix above. As usual we find the diagonal
factorization of A
[ .8  .1 ; .2  .9 ] = [ 1  1 ; 2  −1 ] [ 1  0 ; 0  .7 ] [ 1  1 ; 2  −1 ]⁻¹.
This example exhibits two essential properties that hold in many chemical, biological,
and economic processes: (1) the total quantity in question is always constant,
and (2) the individual quantities are never negative. As a consequence of these two
properties, note that the columns of the matrix A above are nonnegative and add to
one. This can be interpreted as saying that each year all the people inside California
have to either remain inside or move out (⇒ the first column adds to one), and all the
people outside California have to either move in or remain outside (⇒ the second
column adds to one). Any matrix with nonnegative entries whose columns add to
one is called a Markov matrix and the process it describes is called a Markov process.
Markov matrices have several important properties, which we state but do not prove
in the following theorem.
Theorem. Any Markov matrix A has the following properties.
(a) All the eigenvalues of A satisfy |λ| ≤ 1.
(b) λ = 1 is always an eigenvalue and there exists an associated eigenvector v1 with
all entries ≥ 0.
(c) If any power of A has all entries positive, then multiples of v1 are the only
eigenvectors associated with λ = 1 and Ak u0 → c1 v1 for any u0 .
We cannot prove this theorem completely with the tools we have developed so far,
but we can make parts of it plausible. First, since the columns of A sum to one, we
have Aᵀv = v where v is the column vector consisting only of ones. This means
that one is an eigenvalue of Aᵀ and therefore of A also, since both matrices have
the same eigenvalues (Section 9 Problem 3(a)). Second, assume A has a diagonal
factorization and λ2 , · · · , λn all have absolute value < 1. Then as usual we have
Ak u0 = c1 (1)k v1 +c2 λk2 v2 +· · ·+cn λkn vn , so that clearly Ak u0 → c1 v1 . This is exactly
what happened in the example above. Note also that, since the limiting vector c1 v1
is a multiple of the eigenvector associated with λ = 1, we have A(c1 v1 ) = c1 v1 . We
therefore say c1 v1 is a stable distribution or it represents a steady state. In terms of
the population example this means [ .8  .1 ; .2  .9 ] [ 74 ; 148 ] = [ 74 ; 148 ]. In other
words, if the initial population distribution is [ 74 ; 148 ], it will remain as such forever.
And if the initial population distribution is something else, it will tend to [ 74 ; 148 ]
in the long run.
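A short simulation (an aside, not part of the text) makes the convergence vivid: starting from the given initial populations, repeated multiplication by A approaches the steady state [ 74 ; 148 ], and the total of 222 million is conserved along the way.

```python
# Iterate the California migration process u_k = A u_{k-1}.
A = [[0.8, 0.1], [0.2, 0.9]]
u = [20.0, 202.0]   # initial populations inside / outside, in millions
for _ in range(200):
    u = [A[0][0] * u[0] + A[0][1] * u[1],
         A[1][0] * u[0] + A[1][1] * u[1]]
print(u)  # approaches [74.0, 148.0]; the components still sum to 222
```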
EXERCISES
1. For the difference equation uk = Auk−1 where the matrix A and the starting
vector u0 are as given below, compute uk and comment upon its behavior as k → ∞.
(a) A = [ .5  .25 ; .5  .75 ] and u0 = [ 128 ; 64 ].
(b) A = [ −2.5  4.5 ; −1  2 ] and u0 = [ 18 ; 10 ].
(c) A = [ 1  4 ; 1  1 ] and u0 = [ −1 ; 2 ].
2. Suppose multinational companies in the U.S., Japan, and Europe have total assets
of $4 trillion. Initially the distribution of assets is $2 trillion in the U.S., $0 in Japan,
and $2 trillion in Europe. Each year the distribution changes according to
[ USk+1 ; Jk+1 ; Ek+1 ] = [ .5  .5  .5 ; .25  .5  0 ; .25  0  .5 ] [ USk ; Jk ; Ek ].
(We are implicitly making the completely false assumption that the world economy
is a zero-sum game!)
(a) Find the diagonal factorization of A.
(b) Find the distribution of assets in year k.
(c) Find the limiting distribution of assets.
(d) Show the limiting distribution is stable.
3. A truck rental company has centers in New York, Los Angeles, and Chicago.
Every month half of the trucks in New York and Los Angeles go to Chicago, the
other half stay where they are, and the trucks in Chicago are split evenly between
New York and Los Angeles. Initially the distribution of trucks is 90, 30, and 30 in
New York, Los Angeles, and Chicago respectively.
[ NYk+1 ; LAk+1 ; Ck+1 ] = [ ∗  ∗  ∗ ; ∗  ∗  ∗ ; ∗  ∗  ∗ ] [ NYk ; LAk ; Ck ].
4. Suppose there is an epidemic in which every month half of those who are well
become sick, a quarter of those who are sick get well, and another quarter of those
who are sick die. Find the corresponding Markov matrix and find its stable distribution.
[ Dk+1 ; Sk+1 ; Wk+1 ] = [ ∗  ∗  ∗ ; ∗  ∗  ∗ ; ∗  ∗  ∗ ] [ Dk ; Sk ; Wk ]
represents how the distribution of genotypes in one generation transforms to the next
under our restrictive mating policy (that is, only blue-eyed males can reproduce).
What is the limiting distribution?
6. Suppose in the setup of the previous problem we allow males of all genotypes to
reproduce. Let G and g respectively represent the proportion of G genes and g genes
in the initial generation. (They also must be nonnegative and sum to one.) Show
that G = p + q/2 and g = r + q/2. Show that the Markov matrix
A = [ G  .5G  0 ; g  .5  G ; 0  .5g  g ]
PART 2: GEOMETRY
15. Vector Spaces, Subspaces, and Span
The presentation so far has been entirely algebraic. Matrices have been added
and multiplied, equations have been solved, but nothing of a geometric nature has
been considered. Yet there is a natural geometric approach to matrices that is at
least as important as the algebraic approach. The mechanics of Gaussian elimination
has produced for us one kind of understanding of linear systems, but for a different
and deeper understanding we must look to geometry.
We will assume some familiarity with lines, planes, and geometrical vectors
in two and three dimensional physical space. We now want to examine what is
really at the heart of these concepts. To do this, we define an abstract model of
a vector space and then show how this idea can be used to develop concepts and
properties that are valid in all concrete instances of vector spaces. A vector space V
is a collection of objects, called vectors, on which two operations are defined, addition
and multiplication by scalars (numbers). If the scalars are real numbers, the vector
space is a real vector space, and if the scalars are complex numbers, the vector space
is a complex vector space. V must be closed under addition and scalar multiplication.
This means that if x and y are vectors in V and if a is a scalar, then x + y and ax
are also vectors in V . The operations must also satisfy the following rules:
1. x+y =y+x
2. x + (y + z) = (x + y) + z
3. There is a “zero” vector 0 such that x + 0 = x for all x.
4. For each vector x, there is a unique vector −x such that x + (−x) = 0.
5. 1x = x
6. (ab)x = a(bx)
7. a(x + y) = ax + ay
8. (a + b)x = ax + bx
To put meat on this abstract definition we need some examples. For us the most
important vector spaces are the real Euclidean spaces R1 , R2 , R3 , . . .. The space Rn
consists of all n×1 column matrices with the familiar definitions of addition and scalar
multiplication of matrices. (We have been calling such matrices column vectors all
along.) That these spaces are vector spaces follows directly from the properties of
matrices. The first three spaces can be identified with familiar geometric objects: R1
is represented by the real line, R2 by the real plane, and R3 by physical 3-space. The
representations are clear. For example, the point (x1 , x2 , x3 ) in 3-space corresponds
to the vector [ x1 ; x2 ; x3 ] in R3. Likewise, a vector in a higher dimensional Euclidean
space is completely determined by its components, even though the geometry is hard
to visualize.
FIGURE 2: a point (x1 , x2 , x3 ) and the corresponding vector in 3-space.
If we take column vectors whose components we allow to be complex numbers, we
obtain the complex Euclidean spaces C¹, C², C³, · · ·. (We were actually in the world
of complex spaces in Section 13.) Even more abstract vector spaces that cannot be
visualized as any kind of Euclidean space are function spaces. A particular example
is C 0 [0, 1], the collection of all real valued functions defined and continuous on [0,1].
It is easy to see that C 0 [0, 1] is a real vector space, but it is impossible to see it
geometrically. For now, since we want to keep things as concrete as possible, we will
concentrate on real Euclidean spaces.
One nice thing about the first three Euclidean spaces R1 , R2 , and R3 is that for
them addition and scalar multiplication have simple geometric interpretations: The
sum x + y is the diagonal of the parallelogram with sides formed by x and y. The
difference x − y is the other side of the parallelogram with one side y and diagonal
x. (Note that the line segment from y to x is not the vector x − y and in fact is not
a vector at all!) The product ax is the vector obtained from x by multiplying its
length by a. And the vector −x has the same length as x but points in the opposite
direction. This geometric description even extends to higher dimensional Euclidean
spaces.
FIGURE 3: the sum x + y as the diagonal of the parallelogram formed by x and y,
and the difference x − y as its other side.
It turns out that the vector spaces that we will need most occur inside the
standard spaces Rn . We formalize this idea by saying that a subset S of a vector
space V is a subspace of V if S has the following properties:
1. S contains the zero vector.
2. If x and y are vectors in S, then x + y is also a vector in S.
3. If x is a vector in S and a is any scalar, then ax is also a vector in S.
Since addition and scalar multiplication in S follow the rules of the host space V ,
there is no need to verify the rules for a vector space for S. It is automatically a
vector space in its own right. We now look at some examples of subspaces of Rn .
Example 1: Consider all vectors [ x1 ; x2 ] in R2 whose components satisfy the equation
x1 + 2x2 = 0. Clearly they are represented by points in R2 that lie on a line through
the origin. These vectors form a subspace of R2 since sums and scalar products of
vectors that satisfy the equation must also satisfy the equation. (We will prove this
using matrix notation later.) Furthermore, we can find all such vectors explicitly.
We just write the equation in matrix form
[ 1  2 ] [ x1 ; x2 ] = [ 0 ]
and solve as usual; that is, we write the array [ 1 2 | 0 ], run Gaussian elimination
(unnecessary here of course), assign leading and free variables, and express the
solution in vector form c [ −2 ; 1 ]. We get all multiples of one vector, clearly a line through
the origin. It is easy to show that such vectors are closed under addition and scalar
multiplication (proved in greater generality later), thereby giving another verification
that we have a subspace.
FIGURE 4: the line through the origin consisting of all multiples of c [ −2 ; 1 ].
If we change the equation to x1 + 2x2 = 2 we still have a line. Vectors that
satisfy this equation, however, cannot form a subspace since the sum of two such
vectors does not satisfy the equation. If we solve the equation we obtain vectors of
the form [ 2 ; 0 ] + c [ −2 ; 1 ]. Again we see that we do not have a subspace because these
vectors are not closed under addition. We can also see this geometrically by adding
two vectors that point to the line and noting that the result no longer points to the
line. Even more simply, the line does not pass through the origin, so the zero vector
is not even included.
Example 2: Consider all vectors [ x1 ; x2 ; x3 ] in R3 whose components satisfy the equation
x1 − x2 + x3 = 0. This equation defines a plane in R3 passing through the origin.
Vectors that satisfy this equation are closed under addition and scalar multiplication,
and the plane is therefore a subspace. We can find all such vectors by writing the
equation in matrix form
x1
[ 1 −1 1 ] x2 = [ 0 ]
x3
and solving. We use the array [ 1 −1 1 | 0 ] to obtain the solution
1 −1
c1 + d 0
0 1
in vector form. This is the vector representation of the plane. Again, vectors of
this form are closed under addition and scalar multiplication and therefore form a
subspace.
FIGURE 5: the plane spanned by [ 1 ; 1 ; 0 ] and [ −1 ; 0 ; 1 ].
Example 3: This time we want all vectors in R3 that satisfy the two equations
x1 − x2 = 0
x2 − x3 = 0
simultaneously. Again, vectors that satisfy both equations are closed under addition
and scalar multiplication. To find all such vectors we write the equations in matrix
form
[ 1  −1  0 ; 0  1  −1 ] [ x1 ; x2 ; x3 ] = [ 0 ; 0 ]
and solve to obtain c [ 1 ; 1 ; 1 ]. All multiples of this single vector generate a line in R3
passing through the origin. This makes sense since each equation defines a plane in
R3 and their intersection must be a line. This also suggests the general fact that the
intersection of any number of subspaces of a vector space is itself a subspace. The
conditions of closure under addition and scalar multiplication are easily verified.
Example 4: Finally, consider all vectors in R4 whose components satisfy the equation
x1 + x2 − x3 + x4 = 0. We might expect that this equation defines some kind of
geometric plane passing through the origin. If we solve it we obtain
a [ −1 ; 1 ; 0 ; 0 ] + b [ 1 ; 0 ; 1 ; 0 ] + c [ −1 ; 0 ; 0 ; 1 ].
These vectors do form a subspace, but it is hard to visualize. Later we will give
precise meaning to the notion that this subspace is a “three dimensional hyperplane
in four space.”
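A small computational check (an aside, not part of the text) confirms the closure property for this example: every combination a v1 + b v2 + c v3 of the three spanning vectors found above still satisfies x1 + x2 − x3 + x4 = 0.

```python
# Spanning vectors for the hyperplane x1 + x2 - x3 + x4 = 0 in R^4.
v1 = [-1, 1, 0, 0]
v2 = [1, 0, 1, 0]
v3 = [-1, 0, 0, 1]

def satisfies(x):
    x1, x2, x3, x4 = x
    return x1 + x2 - x3 + x4 == 0

# A few arbitrary coefficient choices; every combination stays in the subspace.
for a, b, c in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -3, 5)]:
    combo = [a * v1[i] + b * v2[i] + c * v3[i] for i in range(4)]
    assert satisfies(combo)
print("all sample combinations lie in the subspace")
```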
All of the examples above have the same form, which can be expressed more
simply in matrix notation. Each defines a set S as the collection of all vectors x
in Rn that satisfy a system of equations Ax = 0. The problem is to show that S
is a subspace. We can do this directly as follows: If Ax = 0 and Ay = 0, then
A(x + y) = Ax + Ay = 0 + 0 = 0 and A(cx) = c(Ax) = c(0) = 0. Thus vectors that
satisfy the system Ax = 0 are closed under addition and scalar multiplication. The
second way to verify that S is a subspace is to solve the system Ax = 0 as we did
in the examples. The solution in vector form will look like a1 v1 + a2 v2 + . . . + an vn ,
where the a’s are arbitrary constants and the v’s are vectors. Vectors of this form
are closed under addition since
(a1 v1 + a2 v2 + . . . + an vn ) + (b1 v1 + b2 v2 + . . . + bn vn ) = (a1 + b1 )v1 + (a2 + b2 )v2 + . . . + (an + bn )vn ,
which is again of the same form. Closure under scalar multiplication follows similarly.
EXERCISES
1. Show that C 0 [0, 1] is a real vector space. Show that C 1 [0, 1], which is the set of all
functions that are continuous and have continuous derivatives on [0,1], is a subspace
of C 0 [0, 1].
(a) All vectors [ x1 ; x2 ; x3 ] that satisfy the equation x1 − x2 + x3 = 0
(b) All vectors of the form c [ 1 ; 1 ; 0 ] + d [ 0 ; 0 ; 1 ]
3. None of the following subsets of vectors [ x1 ; x2 ] in R2 is a subspace. Why?
(a) All vectors where x1 = 1.
(b) All vectors where x1 = 0 or x2 = 0.
(c) All vectors where x1 ≥ 0.
(d) All vectors where x1 and x2 are both ≥ 0 or both ≤ 0.
(e) All vectors where x1 and x2 are both integers.
4. Describe geometrically the subspace of R3 spanned by the following vectors.
(a) [ 1 ; 1 ; 0 ] , [ 0 ; 0 ; 1 ]
(b) [ 1 ; 1 ; 1 ] , [ 0 ; 0 ; 1 ]
(c) [ 1 ; 1 ; 0 ] , [ 1 ; 1 ; 1 ]
(d) [ 1 ; 1 ; 0 ] , [ 1 ; 1 ; 1 ] , [ 0 ; 0 ; 1 ]
6. Find vector representations for the following geometric objects, or said another
way, find spanning sets of vectors for each of the following subspaces.
(a) 3x1 − x2 = 0 in R2 .
(b) x1 + x2 + x3 = 0 in R3 .
(c) x1 + x2 + x3 = 0 and x1 − x2 + x3 = 0 in R3 .
(d) x1 − 2x2 + 3x3 − 4x4 = 0 in R4 .
(e) x1 + 2x2 − x3 = 0, x1 − 2x2 + x4 = 0, x2 − x5 = 0 in R5 .
7. Find vector representations for the following geometric objects and describe them.
(a) 3x1 − x2 = 3 in R2 .
(b) x1 + x2 + x3 = 1 in R3 .
16. Linear Independence, Basis, and Dimension
It is possible for different sets of vectors to span the same subspace. For example,
it is easy to see geometrically that the two sets of vectors
[ 1 ; 1 ; 0 ] , [ 1 ; 1 ; 1 ] , [ 0 ; 0 ; 1 ]    and    [ 1 ; 1 ; 0 ] , [ 0 ; 0 ; 1 ]
generate the plane x1 − x2 = 0 in R3 . The mathematical reason for this is that the
second vector in the first set can be written as a linear combination of the first and
third vectors:
[ 1 ; 1 ; 1 ] = [ 1 ; 1 ; 0 ] + [ 0 ; 0 ; 1 ].
Since the second vector can be regenerated from the other two, it is really not needed
and therefore can be dropped from the spanning set. The question arises: how, in
general, can we reduce a spanning set to one of minimal size and still have it span the
same subspace?
FIGURE 6: the plane x1 − x2 = 0, spanned by either set of vectors.
those vectors could have been the one to have been dropped. Now suppose among the
remaining vectors we find another linear relationship, say 2v1 +v4 −4v5 = 0. Then we
can solve for v1 or for v4 or for v5 and can therefore drop any one of these vectors from
the spanning set. Suppose we drop v4 . We then obtain S = span{v1 , v3 , v5 , v6 }. At
this point suppose there does not exist any linear relationship between the remaining
vectors. Then this process of shrinking the spanning set will have to stop.
We say that vectors v1 , v2 , . . . , vn are linearly dependent if there is a linear combination
of them that equals zero,
a1 v1 + a2 v2 + . . . + an vn = 0,
where at least some of the coefficients a1 , a2 , . . . , an are not zero, and we say that
they are linearly independent if the only linear combination of them that equals zero
is the trivial one
0v1 + 0v2 + . . . + 0vn = 0.
If a set of vectors v1 , v2 , . . . , vn is (1) linearly independent and (2) spans a subspace
S, then we say these vectors form a basis for S. (We state these definitions for
subspaces, but, since any vector space is a subspace of itself, they also hold for vector
spaces.)
FIGURE 7: dependent and independent sets of vectors.
The process described in the example above can now be expressed in the language
of linear independence and basis as follows: Suppose a set of vectors spans a
subspace S. If these vectors are linearly dependent, then there is a nontrivial linear
combination of them that equals zero. In this case one of the vectors can be dropped
from the spanning set. (Any vector that appears in the linear combination with a
nonzero coefficient can be chosen.) The remaining vectors will still span S. This
process of successively dropping dependent vectors can be continued until the set of
spanning vectors is linearly independent. The resulting spanning set is therefore a
basis for S. Although this process of successively dropping vectors from a spanning set
is not a practical way to actually find a basis for a subspace, it does prove that every
subspace has a basis.
The importance of a basis to a subspace lies in the fact that not only can every
vector in a subspace be represented as a linear combination of the vectors in its basis,
but, even further, that this representation is unique. If for a basis v1 , v2 , . . . , vn we
have v = a1 v1 +a2 v2 +. . .+an vn and also v = b1 v1 +b2 v2 +. . .+bn vn , then subtraction
gives 0 = (a1 − b1 )v1 + (a2 − b2 )v2 + . . . + (an − bn )vn . But since v1 , v2 , . . . , vn are
linearly independent, all the coefficients (ai − bi ) = 0, and therefore ai = bi . We
conclude that there is only one way to write a vector as a linear combination of basis
vectors.
The two nonzero rows of U , when made into column vectors, will form a basis for S:
[ 1 ; 3 ; 0 ; 2 ] , [ 0 ; 0 ; 1 ; 3 ]
Why does this work? First note that, because of the nature of Gaussian operations,
every row of U is a linear combination of the rows of A. Furthermore, since A can
be reconstructed from U by reversing the sequence of Gaussian operations, every
row of A is a linear combination of the rows of U . We can now draw a number of
conclusions. First, the rows of U must span the same subspace as the rows of A.
Second, since there is some linear combination of the rows of A that results in the
third row of U , which is the zero vector, the rows of A must therefore be linearly
dependent. And finally, because of the echelon form of U , the nonzero rows of U are
automatically linearly independent. (See Exercise 4.)
Now that we have a basis for S, we can express any vector in S as a unique
linear combination of the basis vectors. For example, to express the vector
[ 2 ; 6 ; −3 ; −5 ]
as a combination a [ 1 ; 3 ; 0 ; 2 ] + b [ 0 ; 0 ; 1 ; 3 ], we must solve for a and b.
If the given vector is in S, then as we have seen there will be exactly one solution,
otherwise there will be no solution. We make the extremely important observation
that this equation is equivalent to the linear system
[ 1  0 ; 3  0 ; 0  1 ; 2  3 ] [ a ; b ] = [ 2 ; 6 ; −3 ; −5 ]
(See Section 2 Exercise 7), which we can solve by Gaussian elimination. In this case
we obtain the solution a = 2 and b = −3 so that
2 [ 1 ; 3 ; 0 ; 2 ] − 3 [ 0 ; 0 ; 1 ; 3 ] = [ 2 ; 6 ; −3 ; −5 ].
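The computation above can be scripted (an aside, not part of the text). Because of the echelon pattern of the basis vectors, rows 1 and 3 of the system give a and b immediately, and the remaining rows then serve as a consistency check:

```python
# Coordinates of (2, 6, -3, -5) with respect to the basis (1,3,0,2), (0,0,1,3).
b1 = [1, 3, 0, 2]
b2 = [0, 0, 1, 3]
target = [2, 6, -3, -5]

a = target[0] / b1[0]   # row 1 of the system: a * 1 = 2
b = target[2] / b2[2]   # row 3 of the system: b * 1 = -3
combo = [a * b1[i] + b * b2[i] for i in range(4)]
print(a, b, combo)  # 2.0 -3.0 [2.0, 6.0, -3.0, -5.0]
```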
Example 3: Find a basis for the subspace S of all vectors in R4 whose components
satisfy the equation x1 + x2 − x3 + x4 = 0. This was Example 4 of the previous
section. There we found S consisted of all vectors of the form
a [ −1 ; 1 ; 0 ; 0 ] + b [ 1 ; 0 ; 1 ; 0 ] + c [ −1 ; 0 ; 0 ; 1 ].
The three column vectors clearly span S, and in fact they are also linearly indepen-
dent. This is true because if
a [ −1 ; 1 ; 0 ; 0 ] + b [ 1 ; 0 ; 1 ; 0 ] + c [ −1 ; 0 ; 0 ; 1 ] = [ −a + b − c ; a ; b ; c ] = [ 0 ; 0 ; 0 ; 0 ],
then clearly a = b = c = 0. These three vectors therefore form a basis for S. This
holds in general. That is, if we solve a homogeneous system Ax = 0 by Gaussian
elimination, set the free variables equal to arbitrary constants, and write the solution
in vector form, then we obtain a linear combination of independent vectors, one for
each free variable. Therefore, in all the examples of the previous section, we were
actually finding not just spanning sets but bases! Furthermore, the comment in
Section 10, “Our method for finding eigenvectors, which is to solve (A − λI)x = 0
by Gaussian elimination, does in fact produce linearly independent eigenvectors, one
for each free variable,” is justified.
There is no unique choice of a basis for a subspace. In fact, there are infinitely
many possibilities. For example, each of the three sets of vectors
[ 1 ; 1 ; 0 ] , [ 0 ; 0 ; 1 ]    ,    [ 1 ; 1 ; 0 ] , [ 1 ; 1 ; 1 ]    ,    [ 1 ; 1 ; 1 ] , [ 0 ; 0 ; 1 ]
is a basis for the plane x1 − x2 = 0 in R3. You can no doubt think of many more.
For the Euclidean spaces Rn , however, there is the following natural choice of basis:
[ 1 ; 0 ; 0 ; · · · ; 0 ] , [ 0 ; 1 ; 0 ; · · · ; 0 ] , . . . , [ 0 ; 0 ; 0 ; · · · ; 1 ]
These are the vectors that point along the coordinate axes, so we will call them
coordinate vectors. They clearly span and are linearly independent and therefore
form a basis for Rn .
Even though the set of vectors in a basis is not unique, it is true that the number
of vectors in a basis is unique. This number we define to be the dimension of the
subspace. Clearly the Euclidean space Rn has dimension n. It now makes sense to
talk about things like “a three dimensional hyperplane passing through the origin in
four space.” We state this important property of bases formally as:
Theorem. Any two bases for a subspace contain the same number of vectors.
Proof: It is enough to show that in a subspace S the number of vectors in any linearly
independent set must be less than or equal to the number of vectors in any spanning
set. Since a basis is both linearly independent and spans, this means that any two
bases must contain exactly the same number of vectors. We now illustrate the proof
in a special case. The general case will then be clear. Suppose v1 , v2 , v3 span the
subspace S and w1 , w2 , w3 , w4 is some larger set of vectors in S. We show that the
w’s must be linearly dependent. Since the v’s span, each w can be written as a linear
combination of the v’s:
w1 = a11 v1 + a12 v2 + a13 v3
w2 = a21 v1 + a22 v2 + a23 v3
w3 = a31 v1 + a32 v2 + a33 v3
w4 = a41 v1 + a42 v2 + a43 v3 .
In matrix terms this is
[ w1  w2  w3  w4 ] = [ v1  v2  v3 ] [ a11  a21  a31  a41 ; a12  a22  a32  a42 ; a13  a23  a33  a43 ],
which we write as W = V A. Since A has fewer rows than columns, there are nontrivial
solutions to the homogeneous system Ax = 0 (see Section 7 Exercise 5(b)), that is,
there is a nonzero vector c such that Ac = 0. We then have W c = (V A)c = V (Ac) =
V 0 = 0. But the equation W c = 0 when written out is just c1 w1 + c2 w2 + c3 w3 +
c4 w4 = 0 and is therefore a nontrivial linear combination of the w’s. The w’s are
therefore linearly dependent and we are done.
We see that a basis is a maximal independent set of vectors in the sense that it
cannot be made larger without losing independence. It is also a minimal spanning
set of vectors since it cannot be made smaller and still span the space. Note that we
have been implicitly assuming that the number of vectors in a basis is finite. It is
possible to extend the discussion above to the infinite dimensional case, but we will
not do this.
EXERCISES
2. Find bases for the subspaces spanned by the sets of vectors in Exercise 1 above.
In each case indicate the dimension.
3. Find bases for the subspaces defined by the equations in Section 15 Exercise 6. In
each case indicate the dimension.
4. Show directly from the definition that the nonzero rows of
[ 1  3  0  2 ; 0  0  1  3 ; 0  0  0  0 ] are linearly independent.
5. Express each vector as a linear combination of the vectors in the indicated sets.
(a) [ 5 ; −1 ; 4 ]    { [ 3 ; 1 ; 2 ] , [ 2 ; 2 ; 1 ] }
(b) [ −3 ; 1 ; 4 ]    { [ 3 ; 1 ; 2 ] , [ 2 ; 2 ; 1 ] }
(c) [ 10 ; −2 ; 8 ]    { [ 3 ; 1 ; 2 ] , [ 2 ; 2 ; 1 ] , [ −1 ; 1 ; −1 ] }
(d) [ 8 ; 13 ]    { [ 2 ; 1 ] , [ 1 ; 2 ] }  For this case draw a picture!
(c) The set (is) (is not) (might be) a basis for R5 .
7. If the complex vectors v and v̄ are linearly independent over the complex numbers
and if v = x + iy, then show that the real vectors x and y are linearly independent
over the complex numbers. (Hint: Assume ax + by = 0 and use x = (v + v̄)/2 and
y = (v − v̄)/(2i) to show a = b = 0.) This settles a technical question about complex
vectors from Section 13.
17. Dot Product and Orthogonality
So far, in our discussion of vector spaces, there has been no mention of “length”
or “angle.” This is because the definition of a vector space does not require such con-
cepts. For many vector spaces however, especially for Euclidean spaces, there is a nat-
ural way to establish these notions that is often quite useful. In two-dimensional space
the physical length of the vector x = [ x1 ; x2 ] is by the Pythagorean Theorem equal to
√(x1² + x2²), and in three-dimensional space the physical length of the vector
x = [ x1 ; x2 ; x3 ] is by two applications of the Pythagorean Theorem equal to
√(x1² + x2² + x3²). It seems natural, therefore, to define the length of a vector x in Rn
as ‖x‖ = √(x1² + x2² + · · · + xn²).
(There are situations and applications where other measures of length are more ap-
propriate. But this one will be adequate for our purposes.) Note that since our
vectors are column vectors, the length of a vector can also be written in matrix
notation as ‖x‖ = √(xᵀx). It is easy to see that the length function satisfies the following
two properties:
1. ‖x‖ ≥ 0, with ‖x‖ = 0 only for x = 0
2. ‖ax‖ = |a| ‖x‖
Note also that if we multiply any vector x by the reciprocal of its length, we get
(1/‖x‖) x, which is a vector of length one. We say this is the unit vector in the direction
of x. With this notion of length we can immediately define the distance between two
points x and y in Rn as ‖x − y‖. This corresponds to the usual physical distance
between points in two and three-dimensional space.
How can we decide if two vectors are perpendicular? In order to help us do this,
we define the dot product x · y of two vectors x and y in Rn as the number
x · y = x1 y1 + x2 y2 + · · · + xn yn .
In matrix notation we can also write x · y = xᵀy. The dot product satisfies the
following properties:
1. x·y =y·x
2. (ax + by) · z = ax · z + by · z
3. z · (ax + by) = az · x + bz · y
4. x · x = ‖x‖².
They can be verified by direct computation. The second and third properties follow
from the distributivity of matrix multiplication. Other terms for dot product are
scalar product and inner product.
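In code the dot product and the length function are one-liners; the sketch below (an aside, with arbitrary sample vectors) spot-checks the properties listed above.

```python
# Dot product and length, with spot checks of the listed properties.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def length(x):
    return dot(x, x) ** 0.5

x, y, z = [1.0, 2.0, -1.0], [3.0, 0.0, 4.0], [2.0, -2.0, 1.0]
a, b = 2.0, -3.0
assert dot(x, y) == dot(y, x)                              # property 1
lhs = dot([a * xi + b * yi for xi, yi in zip(x, y)], z)
assert abs(lhs - (a * dot(x, z) + b * dot(y, z))) < 1e-12  # property 2
assert abs(dot(x, x) - length(x) ** 2) < 1e-12             # property 4
print("dot product properties hold on these sample vectors")
```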
Now we will see how to determine if two vectors x and y in Rn are perpendicular.
First note that, assuming they are independent, they span a two-dimensional sub-
space of Rn. When endowed with the length function ‖ ‖, this subspace satisfies all
the axioms of the Euclidean plane. We therefore have all the constructs of Euclidean
geometry in this plane including lines, circles, lengths, and angles. In particular, we
have the Pythagorean Theorem, which says that the sides of a triangle are in the
relation a² + b² = c² if and only if the angle opposite side c is a right angle. (It goes
both directions; check your Euclid!)
FIGURE 8: a right triangle with sides ‖x‖, ‖y‖, and hypotenuse ‖y − x‖.
If we write this equation for the triangle formed by the two vectors x and y in vector
notation and use the properties of the dot product, we have
‖x‖² + ‖y‖² = ‖y − x‖² = (y − x) · (y − x) = ‖y‖² − 2 x · y + ‖x‖²,
which holds exactly when x · y = 0. We therefore say that two vectors x and y in Rn
are orthogonal (perpendicular) if x · y = 0.
Even though it is not necessary for linear algebra, the dot product can also tell us
the angle between any two vectors, orthogonal or not. For this we need the Law of
Cosines, which also appears in Euclid and which says that the sides of any triangle
are in the relation a² + b² = c² + 2ab cos θ where θ is the angle opposite side c. Again
writing this equation for the triangle formed by the two vectors x and y in vector
notation, ‖x‖² + ‖y‖² = ‖x − y‖² + 2‖x‖‖y‖ cos θ, and computing (Exercise 9) we obtain x · y = ‖x‖‖y‖ cos θ or

cos θ = (x · y) / (‖x‖‖y‖).
FIGURE 9 (the triangle with sides ‖x‖, ‖y‖, ‖y − x‖ and the angle θ between x and y)
Example 2: The angle between the vectors (2, 2, 1)T and (1, 3, 2)T is determined by cos θ = 10/(√9 √14) = 0.89087, so θ = arccos(0.89087) = 27.02°.
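The computation of Example 2 can be reproduced in a few lines of NumPy (a sketch using the library's norm and arccos routines):

```python
import numpy as np

x = np.array([2.0, 2.0, 1.0])
y = np.array([1.0, 3.0, 2.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))  # 10/(√9 √14)
theta_deg = np.degrees(np.arccos(cos_theta))                   # about 27.02 degrees
```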
We are now in a position to compute the projection of one vector onto another.
Suppose we wish to find the vector p which is the geometrically perpendicular pro-
jection of the vector y onto the vector x. To be precise, we should say that we are
seeking the projection p of the vector y onto the direction defined by x or onto the line
generated by x. Since we can do geometry in the plane defined by the two vectors x
and y, we immediately see from the figure below that p must have the property that
x ⊥ (y − p), so 0 = x · (y − p) = x · y − x · p or x · p = x · y. Also, since p lies on the
line generated by x, it must be some constant multiple of x, so p = cx. Substituting
this into the previous equation we obtain c(x · x) = x · y or c = (x · y)/(x · x). The
final result is therefore
p = ((x · y)/‖x‖²) x.
We should think of the vector p as the component of y in the direction of x. In fact,
if we write y = p + (y − p), we have resolved y into the sum of its component in the
direction of x and its component perpendicular to x.
FIGURE 10 (y resolved into its projection p along x and its perpendicular component y − p)
Example 3: To resolve y = (5, 5, −2)T into its components in the direction of and perpendicular to (2, 2, 1)T, just compute p = (18/9)(2, 2, 1)T = (4, 4, 2)T and obtain

y = p + (y − p) = (4, 4, 2)T + [(5, 5, −2)T − (4, 4, 2)T] = (4, 4, 2)T + (1, 1, −4)T.
We say that two subspaces V and W of Rn are orthogonal if every vector in V is orthogonal to every vector in W. It is easy to check the orthogonality of subspaces if we have spanning sets for each subspace.
Just verify that every vector in one spanning set is orthogonal to every vector in
the other. For example, if V = span{v1 , v2 } and W = span{w1 , w2 } and the v’s are
orthogonal to the w’s, then any vector in V is orthogonal to any vector in W , because
(a1 v1 + a2 v2 ) · (b1 w1 + b2 w2 ) = a1 b1 v1 · w1 + a2 b1 v2 · w1 + a1 b2 v1 · w2 + a2 b2 v2 · w2 = 0
We make one more definition. The set W of all vectors perpendicular to a
subspace V is called the orthogonal complement of V and is written as W = V ⊥ . It is
easy to see that W is in fact a subspace (Exercise 12). It also follows automatically, but not so easily, that V is the orthogonal complement of W or V = W⊥ (Exercise
13). In other words, the relationship is symmetric, and we are justified in saying that
V and W are orthogonal complements of each other. For example, the xy-plane
and the z-axis are orthogonal complements, but the x-axis and the y-axis are not.
Orthogonal complements are easy to compute.
Example 4: Find the orthogonal complement of the line generated by the vector (1, 2, 3)T, and find the equations of the line. Here the first problem is to find all vectors y orthogonal to the given generating vector, that is, to find all vectors y whose dot product with the given vector is zero. Expressed in matrix notation this is just

[ 1 2 3 ] (y1, y2, y3)T = 0.
We solve this linear system and obtain
y = c (−2, 1, 0)T + d (−3, 0, 1)T.
The two vectors above therefore span the plane that is the orthogonal complement
of the given line. In fact, these two vectors are a basis for that plane. Now to find
the equations of the line itself, note that a vector x lies in the line if and only if x is
orthogonal to the plane we just found. In other words, the dot product of x with each
of the two vectors that generate that plane must be zero. Therefore x must satisfy
the equations −2x1 + x2 = 0 and −3x1 + x3 = 0. These are then the equations that
define the given line.
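Numerically, a basis for an orthogonal complement can also be read off from the singular value decomposition rather than from Gaussian elimination. This NumPy sketch recovers a (different but equivalent) basis for the plane of Example 4:

```python
import numpy as np

v = np.array([[1.0, 2.0, 3.0]])   # the generating vector, as a 1 x 3 matrix
_, _, Vt = np.linalg.svd(v)       # full SVD: the rows of Vt are orthonormal
rank = np.linalg.matrix_rank(v)
W = Vt[rank:]                     # the remaining rows span the orthogonal complement
```

Each row of W has dot product zero with (1, 2, 3), so the two rows span the same plane as the basis found above.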
Example 5: Find the equations of the plane generated by the two vectors (1, 1, 1)T and (1, −1, 1)T. Again we look for all vectors orthogonal to the generating vectors. Solving the system y1 + y2 + y3 = 0, y1 − y2 + y3 = 0 gives y = c (−1, 0, 1)T, so the equation of the plane is −x1 + x3 = 0.
Note that in Section 15 we learned how to go from the equation form of a subspace
to its vector form. We now know how to go in the reverse direction, that is, from its
vector form to its equation form.
EXERCISES
1. For the two vectors x = (1, 2, −2, −4)T and y = (−6, −2, 2, 9)T
(a) Find their lengths.
(b) Find the unit vectors in the directions they define.
(c) Find the angle between them.
(d) Find the projection of y onto x.
(e) Resolve y into components in the direction of and perpendicular to x.
2. In R2 find the point on the line generated by the vector (2, 3)T closest to the point (8, 11/2).
3. Find all vectors orthogonal to (α, β)T in R2.
4. Show that the line generated by the vector (2, 2, 1)T is orthogonal to the plane generated by the two vectors (1, 1, −4)T and (2, 0, −4)T.
7. True or false?
(a) If two subspaces V and W are orthogonal, then so are their orthogonal comple-
ments.
(b) If U is orthogonal to V and V is orthogonal to W , then U is orthogonal to W .
8. Show that the length of (1/‖x‖) x is one.
10. Show x · y = (1/4)(‖x + y‖² − ‖x − y‖²).
11. Show that if the vectors v1 , v2 , v3 are all orthogonal to one another, then they
must be linearly independent. (Hint: Write c1 v1 + c2 v2 + c3 v3 = 0 and show the c’s
are all zero by dotting both sides with each of the v’s.) Of course this result extends
to arbitrary numbers of vectors v1 , v2 , . . . , vn .
(a) Let A be the matrix with rows v1, v2, v3, and by counting leading and free variables in the system Ax = 0 show that V⊥ = W has a basis w1, w2, w3, w4, w5.
(b) Let B be the matrix with rows w1, w2, w3, w4, w5, and by counting leading and free variables in the system Bx = 0 show that W⊥ has dimension 3.
(c) Observe that each of the three vectors v1, v2, v3 satisfies Bx = 0 and therefore is
in W ⊥ . Since they are also independent, conclude that W ⊥ = span{v1 , v2 , v3 } =
V.
Many problems in the physical sciences involve transformations, that is, the
way in which input data is changed into output data. It often happens that the
transformations in question are linear. In this section we present some of the ba-
sic terminology and facts about linear transformations. As usual we consider only
Euclidean spaces.
We define a transformation to be a function that takes points in Rn as input and
produces points in Rm as output, or, in other words, maps points in Rn to points in
Rm. For example, S(x1, x2) = (x1², x2 + 1) is a transformation that maps R2 to R2. Instead of mapping points to points, we can think of transformations as mapping vectors to vectors. We can therefore write S as S((x1, x2)T) = (x1², x2 + 1)T. This is the
view we will take from now on. The picture we should keep in mind is that in general
a transformation T maps the vector x in Rn to the vector T (x) in Rm .
FIGURE 11 (a transformation T maps the vector x in Rn to the vector T(x) in Rm)
We say that T is a linear transformation if it satisfies the following two properties:
1. T(x + y) = T(x) + T(y)
2. T(cx) = cT(x)

FIGURE 12 (T takes the parallelogram defined by x and y to the parallelogram defined by T(x) and T(y))
It is an immediate consequence of the definition that a linear transformation takes
subspaces to subspaces. In other words, if S is a subspace of Rn , then T (S), which is
the set of all vectors of the form T (x), is a subspace of Rm . It is a further consequence
of the definition that every linear transformation must have a certain special form.
We now determine what that form must be.
First, we can create linear transformations by using matrices. Suppose A is
an m × n matrix. Then we can define the transformation T (x) = Ax. Because of
the way matrix multiplication works, the input vector x is in Rn and the output
vector Ax is in Rm . This transformation is linear because T (x + y) = A(x + y) =
Ax + Ay = T (x) + T (y) and T (cx) = A(cx) = cAx = cT (x), which both follow
from the properties of matrix multiplication. Therefore every m × n matrix induces
a linear transformation from Rn to Rm .
Second, every linear transformation is induced by some matrix. Suppose T is a
linear transformation that maps from Rn to Rm . Then we can write
18. Linear Transformations 103
T((x1, x2, …, xn)T) = T(x1 (1, 0, …, 0)T + x2 (0, 1, …, 0)T + · · · + xn (0, 0, …, 1)T)
= x1 T((1, 0, …, 0)T) + x2 T((0, 1, …, 0)T) + · · · + xn T((0, 0, …, 1)T)
= x1 (a11, a21, …, am1)T + x2 (a12, a22, …, am2)T + · · · + xn (a1n, a2n, …, amn)T
= Ax.
(The second equality follows from the linearity of T . The fourth equality follows
from Section 2 Exercise 7.) Therefore every linear transformation T has a matrix
representation as T (x) = Ax.
Note also that

T((x1, x2, …, xn)T) = (a11 x1 + a12 x2 + · · · + a1n xn, a21 x1 + a22 x2 + · · · + a2n xn, …, am1 x1 + am2 x2 + · · · + amn xn)T.
So every linear transformation must have this form. From now on, we will forget
about the formal linear transformation T and instead just consider the matrix A as
a transformation from one Euclidean space to another. Note that A is completely
determined by what it does to the coordinate vectors. This follows either from
the computation above or just from matrix multiplication. For example, if A = [3 −1 1; 1 5 2], then A(1, 0, 0)T = (3, 1)T, A(0, 1, 0)T = (−1, 5)T, and A(0, 0, 1)T = (1, 2)T.
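This fact — the columns of A are exactly the images of the coordinate vectors — is easy to confirm in NumPy:

```python
import numpy as np

A = np.array([[3.0, -1.0, 1.0],
              [1.0,  5.0, 2.0]])

e1, e2, e3 = np.eye(3)   # the coordinate vectors of R^3
# applying A to each coordinate vector returns the corresponding column of A
images = np.column_stack([A @ e1, A @ e2, A @ e3])
```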
Let S be a linear transformation from Rn to Rq and T be a linear transformation from Rq to Rm. Then the composition T ◦ S is defined to be the transformation (T ◦ S)(x) = T(S(x)) that takes Rn to Rm. It is a linear transformation since T(S(x + y)) = T(S(x) + S(y)) = T(S(x)) + T(S(y)) and T(S(cx)) = T(cS(x)) = cT(S(x)). If S has matrix A and T has matrix B, then the question arises, what is the matrix of T ◦ S? Since (T ◦ S)(x) = T(S(x)) = B(Ax) = (BA)x, the matrix of the composition is just the product BA.
Example 4: Let A = [1 0; 0 0]; then A(x, y)T = (x, 0)T. This matrix perpendicularly projects the plane R2 onto the x-axis.
Example 5: Let A = [0 −1; 1 0]; then A(1, 0)T = (0, 1)T and A(0, 1)T = (−1, 0)T. Clearly A rotates the coordinate vectors by 90°, but does this mean that it rotates every vector by this amount? Yes, as we will see in the next example.
Example 6: Let’s consider the transformation that rotates the plane R2 by an angle
θ. The first thing we must do is to show that this transformation is linear. Since any
rotation T takes the parallelogram defined by x and y to the congruent parallelogram
defined by T (x) and T (y), it takes the vertex x + y to the vertex T (x) + T (y).
Therefore it satisfies the property T (x) + T (y) = T (x + y), which is Property 1 for
linear transformations. Property 2 can be verified in the same way.
FIGURE 13 (a rotation takes the parallelogram on x and y to the congruent parallelogram on T(x) and T(y))
Since the rotation T is linear, its matrix is determined by where it takes the coordinate vectors: T((1, 0)T) = (cos θ, sin θ)T and T((0, 1)T) = (−sin θ, cos θ)T, and therefore the matrix of the rotation by θ is

A = [cos θ −sin θ; sin θ cos θ].

FIGURE 14 (the images of the coordinate vectors under rotation by θ)
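Rotation matrices are simple enough to package as a small function. This sketch (the function name is my own) builds the matrix from where the coordinate vectors go, as in Example 6:

```python
import numpy as np

def rotation(theta):
    # the coordinate vectors map to (cos θ, sin θ) and (−sin θ, cos θ)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R90 = rotation(np.pi / 2)   # numerically equal to [0 −1; 1 0] of Example 5
```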
Example 7: Now consider reflection across an arbitrary line through the origin. A
reflection clearly takes the parallelogram defined by x and y to the congruent par-
allelogram defined by T (x) and T (y) and therefore satisfies Property 1. Property 2
can be verified in the same way.
FIGURE 15 (a reflection takes the parallelogram on x and y to the congruent parallelogram on T(x) and T(y))
A reflection is therefore a linear transformation and so has a matrix representation determined by where it takes the coordinate vectors. For example, if A reflects R2 across the line y = x, then A(1, 0)T = (0, 1)T and A(0, 1)T = (1, 0)T, and therefore A = [0 1; 1 0].
Example 8: Now consider perpendicular projection onto an arbitrary line through the origin. If x and y project to the same side of the origin, then ‖T(x)‖ + ‖T(y)‖ = ‖T(x + y)‖, and since these vectors all lie on the same line and point in the same
direction, we conclude that T (x) + T (y) = T (x + y). The other two cases when the
line passes through the parallelogram or when x and y project to opposite sides of
the origin are similar. Property 2 can be verified in the same way.
FIGURE 16 (x, y, and x + y project to T(x), T(y), and T(x + y) on the line)
A projection is therefore a linear transformation and so has a matrix representation
determined by where it takes the coordinate vectors. For example, if A is the matrix of the projection of R2 onto the line y = x, then A(1, 0)T = (1/2, 1/2)T and A(0, 1)T = (1/2, 1/2)T, and therefore A = [1/2 1/2; 1/2 1/2].
Example 9: Let A = [1 2; 0 1]. In this case, even though we know where the coordinate vectors go, it is still not easy to see what the transformation does. But if we fix y = c, then A(x, c)T = (x + 2c, c)T shows us that the horizontal line at level c is shifted 2c units to the right if c is positive (and to the left otherwise). This is a horizontal shear.
FIGURE 17
Example 10: Let A = [4 2; −1 1]. Again the images of the coordinate vectors do not tell us much. It turns out that to see the geometrical effect of this matrix we will need to compute its diagonal factorization. We will take up this approach in Section 22. Most matrices are in fact like this one, or worse, requiring even more sophisticated factorizations.
Example 11: First rotate the plane R2 by 90° and then reflect across the 45° line. This is a typical example of the composition of two linear transformations. The rotation is A = [0 −1; 1 0] (Example 5) and the reflection is B = [0 1; 1 0] (Example 7). To apply them in the correct order to an arbitrary vector x we must write B(A(x)), which by the associativity of matrix multiplication is the same as (BA)x. So we just compute the product

BA = [0 1; 1 0][0 −1; 1 0] = [1 0; 0 −1],

which is a reflection across the x-axis. Note that it is extremely important to perform the multiplication in the correct order. The reverse order would result in

AB = [0 −1; 1 0][0 1; 1 0] = [−1 0; 0 1],

which is a reflection across the y-axis.
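The order-of-composition point is worth checking by machine. In this NumPy sketch, BA and AB come out as two different reflections:

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])   # rotation by 90 degrees (Example 5)
B = np.array([[0, 1],
              [1, 0]])    # reflection across y = x (Example 7)

BA = B @ A   # rotate first, then reflect: reflection across the x-axis
AB = A @ B   # reflect first, then rotate: reflection across the y-axis
```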
EXERCISES
5. Describe how the following two matrices transform the grid consisting of horizontal and vertical lines at each integral point of the x and y-axes.
(a) [1 0; 3 1]
(b) [3 1; 1 3]
6. The matrix [1 1; 0 0] maps R2 onto the x-axis but is not a projection. Why?
7. In each case below find the matrix that represents the resulting transformation
and describe it geometrically.
(a) Transform R2 by first rotating by −90◦ and then reflecting in the line x + y = 0.
(b) Transform R2 by first rotating by 30◦ , then reflecting across the 135◦ line, and
then rotating by −60◦ .
(c) Transform R3 by first rotating the xy-plane, then the xz-plane, then the yz-
plane, all through 90◦ .
11. Show that the matrix of the reflection of R2 across the line through the origin that makes an angle θ with the x-axis is [cos 2θ sin 2θ; sin 2θ −cos 2θ]. (Hint: Compute where the coordinate vectors go.)
12. Prove the converse of the result of the previous exercise, that is, prove the product
of any two reflections is a rotation. (Use the results of Exercises 8 and 9.)
13. Find the matrix that represents the linear transformation T (x1 , x2 , x3 , x4 ) =
(x2 , x4 + 2x3 , x1 + x3 , 2x3 ).
14. If T(1, 0, 0)T = (4, 5)T, T(0, 1, 0)T = (0, −2)T, and T(0, 0, 1)T = (−3, 1)T, then find the matrix of T.
15. If T(5, 4)T = (6, −2)T and T(3, 2)T = (7, 1)T, then find its matrix.
16. If T rotates R2 by 30◦ and dilates it by a factor of 5, then find its matrix.
FIGURE 18 (the row space and null space of A, drawn as orthogonal subspaces of Rn, and the column space of A in Rm)
Now we will show how to compute each of these subspaces for any given ma-
trix. By “compute these subspaces”, we mean “find bases for these subspaces.” To
illustrate, we will use the example
A = [1 2 0 4 1; 0 0 0 2 2; 1 2 0 6 3].
1. row(A): To find a basis for row(A), we use the method of Section 16 Example
2. Recall that to find a basis for a subspace spanned by a set of vectors we just
write them as rows of a matrix and then do Gaussian elimination. In this case, the
spanning vectors are already the rows of a matrix, so running Gaussian elimination
(actually Gauss-Jordan elimination) on A we obtain
U = [1 2 0 0 −3; 0 0 0 2 2; 0 0 0 0 0].
112 19. Row Space, Column Space, Null Space
Since row(A) = row(U ), the two nonzero independent rows of U form a basis for
row(A), so
row(A) has basis (1, 2, 0, 0, −3)T, (0, 0, 0, 2, 2)T.
2. col(A): We have just seen that A and U have the same row spaces. Do they
also have the same column spaces? No, this is not true! What is true is that the
columns of A that form a basis for col(A) are exactly those columns that correspond
to the columns of U that form a basis for col(U ). In this example they are columns
1 and 4. The reason for this is as follows: The two systems Ac = 0 and U c = 0
have exactly the same solutions. Furthermore, linear combinations of the columns
of A can be written as Ac and of U as U c. This implies that independence and
dependence relations between the columns of U correspond to independence and
dependence relations between the corresponding columns of A. Therefore, since the
pivot columns of U are linearly independent (because no such vector is a linear
combination of the vectors that precede it), the same is true of the pivot columns
of A. And likewise, since every nonpivot column of U is a linear combination of the
pivot columns, the same is true of A. That is, for the U of our example, columns 1
and 4 are independent, and any other columns are dependent on these two (Exercise
8). Therefore the same can be said of A. We conclude that
col(A) has basis (1, 0, 1)T, (4, 2, 6)T.
3. null(A): We want to find a basis for all solutions of Ax = 0. But we have done
this before (Section 16 Example 3). We just solve U x = 0 and obtain
x = a (−2, 1, 0, 0, 0)T + b (0, 0, 1, 0, 0)T + c (3, 0, 0, −1, 1)T.
We conclude that
null(A) has basis (−2, 1, 0, 0, 0)T, (0, 0, 1, 0, 0)T, (3, 0, 0, −1, 1)T.
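The bases just computed can be sanity-checked numerically: A should kill every null-space basis vector, and the rank (the number of pivots) plus the nullity should equal the number of columns. A NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0, 4.0, 1.0],
              [0.0, 0.0, 0.0, 2.0, 2.0],
              [1.0, 2.0, 0.0, 6.0, 3.0]])

# the three null-space basis vectors found above, as columns
null_basis = np.array([[-2.0, 1.0, 0.0,  0.0, 0.0],
                       [ 0.0, 0.0, 1.0,  0.0, 0.0],
                       [ 3.0, 0.0, 0.0, -1.0, 1.0]]).T

rank = np.linalg.matrix_rank(A)   # dim(row(A)) = dim(col(A)) = 2
nullity = null_basis.shape[1]     # 3
```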
FIGURE 19 (the matrix A maps R5 to R3; the bases found above for null(A) and row(A) are drawn in R5 and the basis for col(A) in R3)
We can now make several general observations.
1. The number of leading variables in U is the number of vectors in the bases of both row(A) and col(A), so dim(row(A)) = dim(col(A)). This common number is called the rank of A.
2. The number of free variables in U determines the number of vectors in the basis of null(A). Since (the number of leading variables) + (the number of free variables) = n, we have

dim(row(A)) + dim(null(A)) = n.
3. If x is any vector in null(A), then Ax = 0, which when written out looks like

[— row 1 of A —; — row 2 of A —; …; — row m of A —] (x1, x2, …, xn)T = (0, 0, …, 0)T.
Because of the way matrix multiplication works, this means that x is orthogonal
to each row of A and therefore to row(A). Therefore null(A) is the orthogonal complement of row(A). We write row(A)⊥ = null(A) and conclude that null(A) and
row(A) are orthogonal complements of each other. (See Section 17.) This is the
reason that Figure 18 was drawn the way that it was, that is, with the line null(A)
perpendicular to the plane row(A).
4. As we have seen many times before, the equation Ax = b can be written as
x1 (a11, a21, …, am1)T + x2 (a12, a22, …, am2)T + · · · + xn (a1n, a2n, …, amn)T = b.
This immediately says that the system Ax = b has a solution if and only if b is in
col(A). Another way of saying this is that col(A) consists of all those vectors b for
which there exists a vector x such that Ax = b, or in other words col(A) is the image
of Rn under the transformation A.
5. If x0 is a solution of the system Ax = b, then any other solution can be written
as x0 + w where w is any vector in null(A). For suppose y is another solution, then
A(x0 − y) = Ax0 − Ay = b − b = 0 ⇒ x0 − y = w where w is some vector in null(A),
so we have y = x0 + w. Note that when we solve Ax = b by Gaussian elimination,
we get all solutions expressed in this form automatically.
6. Suppose null(A) = {0}, that is, the null space of A consists of only the zero
vector. (In this case we say that the null space is trivial, not empty. A null space can
never be empty. It must always contain at least the zero vector.) Then A has several
important properties which we summarize in a theorem:
Theorem. For any matrix A the following statements are equivalent.
(a) null(A) = {0}
(b) A is one-one (that is, A takes distinct vectors to distinct vectors).
(c) If Ax = b has a solution x, it must be unique.
(d) A takes linearly independent sets to linearly independent sets.
(e) The columns of A are linearly independent.
Proof: We prove (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).
(a) ⇒ (b): x ≠ y ⇒ x − y ≠ 0 ⇒ Ax − Ay = A(x − y) ≠ 0 ⇒ Ax ≠ Ay.
(b) ⇒ (c): Suppose Ax = b and Ay = b; then Ax = Ay ⇒ x = y.
(c) ⇒ (d): If v1, v2, …, vn are linearly independent, then c1 Av1 + c2 Av2 + · · · + cn Avn = 0 ⇒ A(c1 v1 + c2 v2 + · · · + cn vn) = 0 = A0 ⇒ c1 v1 + c2 v2 + · · · + cn vn = 0 ⇒ c1 = c2 = · · · = cn = 0, so the images Av1, Av2, …, Avn are linearly independent.
(d) ⇒ (e): A maps the set of coordinate vectors, which are independent, to the set
of its own columns, which therefore must also be independent.
(e) ⇒ (a): The equation Ax = 0 can be interpreted as a linear combination of the
columns of A equaling zero. Since the columns of A are independent, this can happen
only if x = 0. This ends the proof.
If A is a square matrix, this theorem can be combined with the theorem of Section
9 as follows.
EXERCISES
1. For each matrix below find bases for the row, column, and null spaces and fill in the
blanks in the sentence “As a linear transformation, A maps from dimensional
Euclidean space to dimensional Euclidean space and has rank equal to .”
(a) [1 2; 2 4]
(b) [1 2; 2 3]
(c) [2 4 2; 0 4 2; 2 8 4]
(d) [3 2 −1; 6 3 5; −3 −1 8; 0 −1 7]
(e) [1 2 −1 −4 1; 2 4 −1 −3 5; 3 6 −3 −12 3]
(f) [2 8 4 0 0; 2 7 2 1 −2; −2 −6 0 −1 6; 0 2 4 −2 4]
2. The 3 × 3 matrix A has null space generated by the vector (1, 1, 1)T and column space equal to the xy-plane.
(a) Is (−3, −3, −3)T in null(A)? What does A(−3, −3, −3)T equal?
(b) Is (−3, 13, 0)T in col(A)? Is it in the image of A?
(c) Is Ax = (−5, −5, 2)T solvable?
(d) Is (−4, 6, −2)T in row(A)?
3. The 2 × 3 matrix A has row space generated by the vector (1, 2, 9)T and column space generated by the vector (2, −1)T.
(a) Is (−2, −4, −8)T in row(A)?
(b) Is (−2, −1, 2)T in null(A)?
(c) Find a basis for null(A).
(d) Is (−3, 3)T in col(A)?
(e) Is Ax = (−4, 2)T solvable?
4. Describe the row, column, and null spaces of the following kinds of transformations
of R2 .
(a) rotations
(b) reflections
(c) projections
5. For each case below explain why it is not possible for a matrix to exist with the
stated properties.
(a) Row space and null space both contain the vector (1, 2, 3)T.
(b) Column space has basis (3, 2, 1)T and null space has basis (1, 3, 1)T.
(c) Column space = R4 and row space = R3.
6. Show that if null(A) = {0}, then A takes subspaces into subspaces of the same
dimension. In particular, A takes all of Rn into an n-dimensional subspace of Rm .
Example 1: We want to fit a straight line y = c + dx to the data (0, 1), (1, 4), (2, 2),
(3, 5). This means we must find the c and d that satisfy the equations
c+d·0=1
c+d·1=4
c+d·2=2
c+d·3=5
or the system
[1 0; 1 1; 1 2; 1 3] (c, d)T = (1, 4, 2, 5)T.
This is an example of a curve fitting problem.
FIGURE 20 (the data points (0, 1), (1, 4), (2, 2), (3, 5) and the fitted line y = c + dx)
Example 2: The following molecular weights of the oxides of nitrogen have been measured.

NO      N2O     NO2     N2O3    N2O5     N2O4
30.006  44.013  46.006  76.012  108.010  92.011
20. Least Squares and Projections 119
We want to use this information to compute the atomic weights of nitrogen and
oxygen as accurately as possible. This means that we must find the N and O that
satisfy the equations
1 · N + 1 · O = 30.006
2 · N + 1 · O = 44.013
1 · N + 2 · O = 46.006
2 · N + 3 · O = 76.012
2 · N + 5 · O = 108.010
2 · N + 4 · O = 92.011
or the system

[1 1; 2 1; 1 2; 2 3; 2 5; 2 4] (N, O)T = (30.006, 44.013, 46.006, 76.012, 108.010, 92.011)T.
How do we find the x that minimizes ‖Ax − b‖? First we view A as a map from Rn to Rm. Then b and col(A) both lie in Rm. Note that b does not lie in col(A), for otherwise the system would have an exact solution.
FIGURE 21 (A maps Rn into Rm; the vector Ax lies in col(A), and the error Ax − b is shortest when it is orthogonal to col(A))
Our problem is to find the Ax that makes Ax − b as short as possible, or said another
way, to find a vector of the form Ax that is as close to b as possible. Intuitively
this occurs when Ax − b is orthogonal to col(A). (For a proof see Exercise 10.) And
this holds if and only if Ax − b is orthogonal to the columns of A, that is, if the dot product of Ax − b with each column of A is zero. If we write the columns of A horizontally, we can express these conditions all at once as

[— col 1 of A —; — col 2 of A —; …; — col n of A —] (Ax − b) = (0, 0, …, 0)T,

which in matrix notation is AT(Ax − b) = 0, or

AT Ax = AT b.
These are called the normal equations for the least squares problem Ax = b. They
form an n × n linear system that can be solved by Gaussian elimination. We sum-
marize: The least squares solution to the overdetermined inconsistent linear system
Ax ≈ b is defined to be that vector x that minimizes the length of the vector Ax − b.
It is found as the exact solution to the normal equations AT Ax = AT b. We can now
solve the two problems at the beginning of this section.
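The recipe — form AT A and AT b, then solve — is a few lines of NumPy. This sketch solves the line fitting problem of Example 1 and compares the answer with the library's own least squares routine:

```python
import numpy as np

# fit y = c + d x to the data (0,1), (1,4), (2,2), (3,5)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 4.0, 2.0, 5.0])

x = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations A^T A x = A^T b
x_lib, *_ = np.linalg.lstsq(A, b, rcond=None)  # NumPy's least squares, for comparison
```

Both give c = 1.5 and d = 1, so the best fitting line is y = 1.5 + x.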
It is clear that the matrix AT A is square and symmetric (see Section 2 Exercise
6(e)). But when we said that the least squares solution is the solution of the normal
equations, we were implicitly assuming that the normal equations could be solved,
that is, that AT A is nonsingular. This is true if the columns of A are independent,
because in that case we have AT Ax = 0 ⇒ xT AT Ax = 0 ⇒ (Ax)T (Ax) = 0 ⇒
�Ax�2 = 0 ⇒ Ax = 0 ⇒ x = 0. But if the columns of A are not independent, then
AT A will be singular. In fact, for large scale problems AT A is usually singular, or is
so close to being singular that Gaussian elimination tends to give very inaccurate an-
swers. For such problems it is necessary to use more numerically stable methods such
as the QR factorization (see the next section) or the singular value decomposition.
In solving the least squares problem, we have inadvertently found the solution
to a seemingly unrelated problem: the computation of projection matrices. From
our geometrical considerations, the vector p = Ax is the orthogonal projection of the
vector b onto the subspace col(A). Solving the normal equations for x we obtain x =
(AT A)−1 AT b, and putting this expression back into p we obtain p = A(AT A)−1 AT b.
Therefore, to find the projection of any vector b onto col(A), we simply multiply b by the matrix P = A(AT A)−1 AT. We conclude that

P = A(AT A)−1 AT

is the matrix that projects Rm onto the column space of A.
Example 3: Find the matrix that projects R3 onto the plane spanned by the vectors (1, 0, 1)T and (2, 1, 1)T. First line up the two vectors (in any order) to form the matrix A = [1 2; 0 1; 1 1], and then compute
P = A(AT A)−1 AT
= [1 2; 0 1; 1 1] ([1 0 1; 2 1 1][1 2; 0 1; 1 1])−1 [1 0 1; 2 1 1]
= [1 2; 0 1; 1 1] [2 3; 3 6]−1 [1 0 1; 2 1 1]
= [1 2; 0 1; 1 1] (1/3)[6 −3; −3 2] [1 0 1; 2 1 1]
= [2/3 1/3 1/3; 1/3 2/3 −1/3; 1/3 −1/3 2/3].
Just as in the case of least squares, the columns of A must be independent for this
to work; that is, the two given vectors must form a basis for the subspace to be
projected onto.
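The projection matrix formula translates directly into NumPy. This sketch reproduces the P of Example 3 and checks the symmetry and P² = P properties noted in the text:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # independent columns spanning the plane
P = A @ np.linalg.inv(A.T @ A) @ A.T      # P = A (A^T A)^(-1) A^T
```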
Note that P in the example above is symmetric. It turns out that this is true of any
projection matrix (Exercise 9(a)). Furthermore, projection matrices also satisfy the
property P 2 = P (Exercise 9(a)). These observations also go in the other direction;
that is, any matrix P that satisfies P T = P and P 2 = P is the projection matrix
of Rm onto col(P ). We need only verify that P x − x is orthogonal to col(P ) for
any vector x. We check all the required dot products at once with the computation
P T (P x − x) = P (P x − x) = P 2 x − P x = P x − P x = 0.
Projection matrices can be used to compute reflection matrices. First we have
to precisely define what we mean by a reflection. Let S be a subspace of Rm .
Any vector x can be written as x = P x + (x − P x) where P x is the projection
of x onto S and x − P x is the component of x orthogonal to S. If we reverse the
direction of x − P x we get a new vector y = P x − (x − P x) which we define to
be the reflection of x across the subspace S. Note that y can then be written as
y = P x − x + P x = 2P x − x = (2P − I)x, and therefore the matrix R = 2P − I
reflects Rm across the subspace S.
FIGURE 22 (x, its projection Px onto S, and its reflection Px − (x − Px) across S)
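Reflections follow immediately from projections via R = 2P − I. A short NumPy check (using the subspace of Example 3 as an illustration) that the resulting R is symmetric and squares to the identity:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # columns span the subspace S
P = A @ np.linalg.inv(A.T @ A) @ A.T      # projection of R^3 onto S
R = 2 * P - np.eye(3)                     # reflection of R^3 across S
```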
The equation x = P x + (x − P x) above also shows that any vector x can be re-
solved into a component in S and a component in S ⊥ . Furthermore, since orthogonal
vectors are linearly independent (Section 17 Exercise 11), this resolution is unique.
From this we can see more precisely how any matrix A behaves as a linear transformation from one Euclidean space to another. Let S = null(A), so that S⊥ = row(A). Then any vector x can be expressed uniquely as x = n + r, where n is in null(A) and r is in row(A). Applying A to x we obtain Ax = An + Ar = 0 + Ar. This shows that A essentially projects x onto r in row(A) and then maps r to a unique vector Ar in col(A). Any matrix can therefore be visualized as a projection onto its row space followed by a one-one linear transformation of its row space onto its column space.
EXERCISES
1. Solve Ax = b in the least squares sense for the two cases below.
(a) A = [1 0; 0 1; 1 1; 1 2] and b = (5, 4, 6, 4)T
(b) A = [1 4 −1; 2 3 1; 0 3 1; 1 2 −1] and b = (−1, 2, −1, 1)T
2. For each case below find the line or surface of the indicated type that best fits the
given data in the least squares sense.
(a) y = ax: (1, 5), (2, 3), (−1, 3), (3, 4), (0, 1)
(b) y = a + bx: (0, 0), (1, 1), (3, 2), (4, 5)
(c) z = a + bx + cy: (0, 1, 6), (1, 0, 5), (0, 0, 1), (1, 1, 6)
(d) z = a + bx² + cy²: (0, 1, 10), (0, 2, 5), (−1, 1, 20), (1, 0, 15)
(e) y = a + bt + ct²: (1, 5), (0, −6), (2, 8), (−1, 5)
(f) y = a + b cos t + c sin t: (0, 3), (π/2, 5), (−π/2, 3), (π, −3)
3. We want to use the following molecular weights of sulfides of copper and iron to
compute the atomic weights of copper, iron, and sulfur.
Cu2 S CuS FeS Fe3 S4 Fe2 S3 FeS2
159.15 95.61 87.92 295.81 207.90 119.98
Express this problem as an overdetermined linear system. Write down the normal
equations. Do not solve them!
6. Find the reflection matrix of R3 across the plane in Exercise 4(c) above.
8. Show that as transformations the matrices below have the following geometric
interpretations.
(a) [−1 0; 0 −1]: (i) reflection through the origin, (ii) rotation by π radians, (iii) reflection across the x-axis followed by reflection across the y-axis.
(b) [−1 0 0; 0 −1 0; 0 0 1]: (i) reflection across the z-axis and (ii) rotation by π radians around the z-axis.
(c) [−1 0 0; 0 −1 0; 0 0 −1]: (i) reflection through the origin and (ii) rotation by π radians around the z-axis followed by reflection across the xy-plane.
FIGURE 23 (a subspace S and a vector w)
126 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
We say that a square matrix Q is an orthogonal matrix if its columns are orthonormal.
(It is not called an orthonormal matrix even though that might make more sense.)
Clearly the columns of Q are orthonormal if and only if QT Q = I, which can therefore
be taken as the defining condition for a matrix to be orthogonal.
Example 2: Here are some orthogonal matrices. These are especially nice ones be-
cause they don’t involve square roots.
[3/5 4/5; 4/5 −3/5]

[2/3 2/3 −1/3; 2/3 −1/3 2/3; −1/3 2/3 2/3]

[6/7 3/7 2/7; 2/7 −6/7 3/7; 3/7 −2/7 −6/7]

[1/9 4/9 8/9; 4/9 7/9 −4/9; 8/9 −4/9 1/9]

[10/15 10/15 5/15; 10/15 −11/15 2/15; −5/15 −2/15 14/15]

[1/2 −1/2 −1/2 −1/2; 1/2 1/2 −1/2 1/2; 1/2 −1/2 1/2 1/2; 1/2 1/2 1/2 −1/2]
We now illustrate the Gram-Schmidt process, which converts any independent set of vectors into an orthonormal set with the same span. Consider the three independent vectors

v1 = (−2, −2, 1)T,  v2 = (2, 8, 2)T,  v3 = (7, 7, 1)T.

We will first find an orthogonal basis p1, p2, p3, and then normalize it to get the orthonormal basis q1, q2, q3.
FIGURE 24 (Gram-Schmidt: v1 = p1, and p2, p3 are the components of v2, v3 orthogonal to the preceding p's)
The third step is to find a vector p3 that is orthogonal to p1 and p2 and such that span{p1, p2, p3} = span{v1, v2, v3}. We can accomplish this by defining p3 to be the component of v3 orthogonal to p1 and p2, that is, p3 = v3 − ((p1 · v3)/(p1 · p1)) p1 − ((p2 · v3)/(p2 · p2)) p2.
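The steps above can be sketched as a short classical Gram-Schmidt routine in NumPy (the helper is my own, applied to the vectors v1, v2, v3 above):

```python
import numpy as np

def gram_schmidt(vectors):
    # classical Gram-Schmidt: returns orthonormal columns with the same span
    qs = []
    for v in vectors:
        p = np.array(v, dtype=float)
        for q in qs:
            p -= (q @ v) * q          # subtract the component of v along q
        qs.append(p / np.linalg.norm(p))
    return np.column_stack(qs)

Q = gram_schmidt([[-2, -2, 1], [2, 8, 2], [7, 7, 1]])   # v1, v2, v3
```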
v1 is in span{q1 }
v2 is in span{q1 , q2 }
v3 is in span{q1 , q2 , q3 }.
This shows that any square matrix A with independent columns has a factorization
A = QR into an orthogonal Q and an upper triangular R. In fact, we can make an
even more general statement. Suppose that we had started with the matrix
B = [−2 2; −2 8; 1 2].
We see that B = QR where now Q has orthonormal columns but is not orthogonal!
Fortunately QT Q = I is still true so the method above to find R still works. We
conclude that any matrix A with independent columns has a factorization of the
form A = QR where Q has orthonormal columns and R is upper triangular. This is
called the QR factorization and is the third great matrix factorization that we have
seen (after the LU and diagonal factorizations). Actually, it is possible to obtain
a QR-like factorization for any matrix whatever, but we will stop here. Note that
the Gram-Schmidt process, on which all this is based, is the first truly new computational
technique we have had since we first introduced Gaussian elimination! In fact, there
are efficient algorithms that can perform Gram-Schmidt in $2n^3/3$ operations, which makes it competitive with Gaussian elimination in many situations.
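The process can be sketched in a few lines of Python (the function name and list-of-columns layout are our own illustration, not part of the notes). Applied to the vectors v1 , v2 , v3 of this section it produces both the orthonormal basis and the triangular factor:

```python
# A sketch (not from the notes) of classical Gram-Schmidt producing A = QR.
# Columns are plain Python lists; the function name is ours.
import math

def gram_schmidt_qr(cols):
    """Orthonormalize the given independent columns; return (Q, R) with
    Q a list of orthonormal columns and R upper triangular."""
    n, k = len(cols[0]), len(cols)
    Q = []
    R = [[0.0] * k for _ in range(k)]
    for j, v in enumerate(cols):
        p = list(v)
        for i in range(j):
            # component of v_j along the already-built q_i
            R[i][j] = sum(Q[i][t] * v[t] for t in range(n))
            p = [p[t] - R[i][j] * Q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in p))   # length of what is left
        Q.append([x / R[j][j] for x in p])
    return Q, R

# the example vectors of this section
Q, R = gram_schmidt_qr([[-2, -2, 1], [2, 8, 2], [7, 7, 1]])
```

The columns of Q come out as (−2/3, −2/3, 1/3), (−1/3, 2/3, 2/3), (2/3, −1/3, 2/3), and R = [[3, −6, −9], [0, 6, 3], [0, 0, 3]].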
The QR factorization has a wide range of applications. We mention two. For the
first, recall an overdetermined inconsistent system Ax = b has a least squares solution
given by the normal equations AT Ax = AT b. Suppose we have the QR factorization
A = QR. Then plugging into the normal equations we obtain (QR)T QRx = (QR)T b
or RT QT QRx = RT QT b or RT Rx = RT QT b. Since RT is nonsingular (it’s triangular
with nonzeros down its diagonal), we can multiply through by (RT )−1 to obtain
Rx = QT b.
This equation is another matrix expression of the normal equations. Since R is upper
triangular, it can be solved simply by back substitution. Of course, most of the work
was done in finding the QR factorization of A in the first place. In practice the QR method is preferable to solving the normal equations directly, since the Gram-Schmidt
process for finding the QR factorization is more numerically stable than Gaussian
elimination.
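The two-step recipe (factor once, then back-substitute Rx = QT b) can be sketched as follows; the right-hand side b below is an arbitrary illustrative choice, and the function name is ours:

```python
# Sketch: least squares by QR, i.e. solve Rx = Q^T b by back substitution.
# The columns are those of the line-fitting matrix [[1,0],[1,1],[1,2],[1,3]];
# b is an illustrative right-hand side, not taken from the notes.
import math

def qr_least_squares(cols, b):
    """Least squares solution via Gram-Schmidt QR (independent columns assumed)."""
    n, k = len(cols[0]), len(cols)
    Q, R = [], [[0.0] * k for _ in range(k)]
    for j, v in enumerate(cols):        # Gram-Schmidt, as in this section
        p = list(v)
        for i in range(j):
            R[i][j] = sum(Q[i][t] * v[t] for t in range(n))
            p = [p[t] - R[i][j] * Q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in p))
        Q.append([x / R[j][j] for x in p])
    qtb = [sum(Q[i][t] * b[t] for t in range(n)) for i in range(k)]
    x = [0.0] * k                       # back substitution on R x = Q^T b
    for i in reversed(range(k)):
        x[i] = (qtb[i] - sum(R[i][j] * x[j] for j in range(i + 1, k))) / R[i][i]
    return x

x = qr_least_squares([[1, 1, 1, 1], [0, 1, 2, 3]], [1, 2, 4, 5])
```

With this b the solution is x = (0.9, 1.4), which agrees with solving the normal equations AT Ax = AT b directly.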
Example 3: We return to the coefficient matrix from the line fitting problem of Section 20 and find its QR factorization:
$$\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}
= \begin{pmatrix} \tfrac12 & -\tfrac{3}{\sqrt{20}} \\[2pt] \tfrac12 & -\tfrac{1}{\sqrt{20}} \\[2pt] \tfrac12 & \tfrac{1}{\sqrt{20}} \\[2pt] \tfrac12 & \tfrac{3}{\sqrt{20}} \end{pmatrix}
\begin{pmatrix} 2 & 3 \\ 0 & \sqrt{5} \end{pmatrix}.$$
Example 4: Suppose we want the projection matrix P of R3 onto the subspace spanned by $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}$. (This is Section 20 Example 3.) We construct the matrix A with these two vectors as its columns and find its QR factorization:
$$\begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & \tfrac{3}{\sqrt{2}} \\[2pt] 0 & \tfrac{\sqrt{6}}{2} \end{pmatrix}.$$
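Since the columns of Q are orthonormal, the projection matrix is P = QQT. A small Python check (our illustration) assembles P from the two orthonormal columns just found:

```python
# Sketch: the projection matrix P = Q Q^T from the QR factorization above.
# Q's columns are (1,0,1)/sqrt(2) and (1,2,-1)/sqrt(6).
import math

q1 = [1/math.sqrt(2), 0.0, 1/math.sqrt(2)]
q2 = [1/math.sqrt(6), 2/math.sqrt(6), -1/math.sqrt(6)]

# P = q1 q1^T + q2 q2^T
P = [[q1[i]*q1[j] + q2[i]*q2[j] for j in range(3)] for i in range(3)]
```

P comes out as (1/3)[[2, 1, 1], [1, 2, −1], [1, −1, 2]], and one can check that P leaves vectors in the plane fixed, e.g. P(2, 1, 1) = (2, 1, 1).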
EXERCISES
3. Express the vector $\begin{pmatrix} \cdot \\ 9 \\ \cdot \end{pmatrix}$ as a linear combination of the vectors $\frac{1}{3}\begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}$, $\frac{1}{3}\begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}$, $\frac{1}{3}\begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix}$.
4. Extend the orthonormal set
$$\begin{pmatrix} -\tfrac89 \\[2pt] \tfrac49 \\[2pt] \tfrac19 \end{pmatrix}, \qquad \begin{pmatrix} \tfrac49 \\[2pt] \tfrac79 \\[2pt] \tfrac49 \end{pmatrix}$$
to a basis of R3 , or, what is the same thing, find a third column that makes the matrix
$$\begin{pmatrix} -\tfrac89 & \tfrac49 & * \\[2pt] \tfrac49 & \tfrac79 & * \\[2pt] \tfrac19 & \tfrac49 & * \end{pmatrix}$$
orthogonal.
5. Use the QR factorization to find the least squares solution of
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \\ 6 \\ 4 \end{pmatrix}.$$
6. Use the QR factorization to find the projection matrix of R4 onto the plane spanned by the vectors
$$\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 0 \\ 0 \\ -2 \\ -2 \end{pmatrix}.$$
12. If p1 , p2 , · · · , pm is an orthogonal basis for a subspace S of Rn , v is a vector outside S, and
$$w = \frac{v \cdot p_1}{p_1 \cdot p_1}\,p_1 + \frac{v \cdot p_2}{p_2 \cdot p_2}\,p_2 + \cdots + \frac{v \cdot p_m}{p_m \cdot p_m}\,p_m,$$
then show v − w ⊥ S. (Hint: Verify v − w ⊥ pi for all i.) Conclude that w is the orthogonal projection of v onto S.
14. If P is a projection, show (2P − I)T (2P − I) = I. Conclude that any reflection
is an orthogonal transformation.
18. If T is any transformation of Rn to itself that preserves distance and such that
T (0) = 0, then T is linear and can be represented as T (x) = Qx where Q is an
orthogonal matrix. This can be proved in the following way. (1) T preserves distance
and the origin ⇒ ||T (x)|| = ||x||, ||T (y)|| = ||y||, and ||T (x) − T (y)||2 = ||x − y||2 .
Expand this to show that T (x) · T (y) = x · y. (2) Expand ||cT (x) − T (cx)||2 and use
(1) to show that it equals zero. (3) Expand ||T (x + y) − T (x) − T (y)||2 and use (1)
to show that it equals zero. Conclude that T is linear and preserves dot products.
Interpret this as saying that any transformation that preserves length and the origin
must be linear and can be represented by an orthogonal matrix.
136 22. Diagonalization of Symmetric and Orthogonal Matrices
Example 1: We now illustrate the geometry of diagonalization with the matrix
$$A = \begin{pmatrix} \tfrac{11}{5} & -\tfrac35 \\[2pt] \tfrac25 & \tfrac45 \end{pmatrix},$$
which has eigenvalues λ = 2 and λ = 1 with associated eigenvectors
$$\begin{pmatrix} 3 \\ 1 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
If we think in terms of how this matrix operates on its eigenvectors we have
$$\begin{pmatrix} \tfrac{11}{5} & -\tfrac35 \\[2pt] \tfrac25 & \tfrac45 \end{pmatrix}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 3 \\ 1 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} \tfrac{11}{5} & -\tfrac35 \\[2pt] \tfrac25 & \tfrac45 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = 1\begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
In this case the eigenspaces are the two lines generated by the two eigenvectors. A
maps each line to itself but stretches one by a factor of 2 and the other by a factor of
1. All other vectors are moved in more complicated ways. We can see how they are
moved by observing that, since the two eigenvectors form a basis for R2 , any vector in R2 can be written as
$$a\begin{pmatrix} 3 \\ 1 \end{pmatrix} + b\begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
The numbers a and b are the coordinates of the vector with respect to the skewed coordinate system defined by the two eigenvectors. Since A maps $a\binom{3}{1} + b\binom{1}{2}$ to $2a\binom{3}{1} + b\binom{1}{2}$, we see that the effect of A is very simple when viewed in this new coordinate system.
FIGURE 25
The factorization A = SDS −1 also has a geometric interpretation illustrated by the diagram below. The diagram
means that a vector can be mapped horizontally by A (transcontinental railroad) or
around the horn by SDS −1 (clipper ship). In either case it will arrive at the same
destination. In particular we can watch how the eigenvectors are mapped. Since
$$S\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \qquad \text{and} \qquad S\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix},$$
we have
$$S^{-1}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \text{and} \qquad S^{-1}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Therefore we see that the two eigenvectors are first taken to the two coordinate
vectors, then stretched by factors of 2 and 1, and finally sent back to stretched
versions of the original two eigenvectors.
FIGURE 26
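The clipper-ship route can be checked numerically. The following sketch (ours, with a bare-hands 2 × 2 inverse) multiplies out S D S−1 and recovers A:

```python
# Sketch: verify A = S D S^{-1} for Example 1, with S = [[3,1],[1,2]]
# (eigenvector columns) and D = diag(2, 1).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

S = [[3, 1], [1, 2]]
D = [[2, 0], [0, 1]]
det = S[0][0]*S[1][1] - S[0][1]*S[1][0]                  # = 5
S_inv = [[S[1][1]/det, -S[0][1]/det],
         [-S[1][0]/det, S[0][0]/det]]

A = matmul(matmul(S, D), S_inv)   # recovers [[11/5, -3/5], [2/5, 4/5]]
```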
Now we cover some points that were skipped over in Section 11.
1. To construct the diagonal factorization A = SDS −1 we need n linearly indepen-
dent eigenvectors to serve as the columns of S. The independence of the columns
will insure that S −1 exists (see Section 19). The problem of diagonalization therefore
reduces to the question of whether there are enough independent eigenvectors.
2. Eigenvectors that are associated with distinct eigenvalues are linearly independent.
In other words, if v1 , v2 , · · · , vn are eigenvectors for A with associated eigenvalues
λ1 , λ2 , · · · , λn where λi ≠ λj for all i ≠ j, then all the v’s are linearly independent.
To see this, assume it is not true and find the first vector vi (reading from left to
right) that can be written as a linear combination of the v’s to its left. Suppose this
vector is v5 . Then we know that v1 , v2 , v3 , v4 are linearly independent, and therefore
we have an equation of the form v5 = c1 v1 + c2 v2 + c3 v3 + c4 v4 . Multiply one copy of
this equation by A to obtain λ5 v5 = c1 λ1 v1 + c2 λ2 v2 + c3 λ3 v3 + c4 λ4 v4 and another copy by λ5 to obtain λ5 v5 = c1 λ5 v1 + c2 λ5 v2 + c3 λ5 v3 + c4 λ5 v4 . Subtracting one from the other gives 0 = c1 (λ1 − λ5 )v1 + c2 (λ2 − λ5 )v2 + c3 (λ3 − λ5 )v3 + c4 (λ4 − λ5 )v4 . Since
v1 , v2 , v3 , v4 are independent, all the coefficients in this equation must equal zero. But
since all the λ’s are different, the only way this can happen is if c1 = c2 = c3 = c4 = 0.
But this means that v5 = 0, a contradiction. From this result we see that an n × n
matrix is diagonalizable if there are n real and distinct eigenvalues.
3. Unfortunately there are many interesting matrices that have repeated eigenvalues. For example the shear matrix $\begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix}$ and the diagonal matrix $\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$ both have
eigenvalues λ = 2, 2 (meaning that the eigenvalue is repeated), but the shear matrix
has only one independent eigenvector whereas the diagonal matrix has two. What
is the relationship in general between the number of independent eigenvectors asso-
ciated with a particular eigenvalue λ0 of a matrix A and the number of times λ0 is
repeated as a root of the characteristic polynomial of A? If we define the first number
to be the geometric multiplicity of λ0 and the second to be the algebraic multiplicity
of λ0 , then we can state the answer to this question formally as follows.
Theorem. For any eigenvalue, geometric multiplicity ≤ algebraic multiplicity.
Proof: Suppose λ0 has geometric multiplicity p, meaning that there are p inde-
pendent eigenvectors v1 , v2 , · · · , vp for λ0 . Expand this set of vectors to a basis
v1 , v2 , · · · , vp , · · · , vn for Rn . Then we have
$$A\begin{pmatrix} v_1 & \cdots & v_p & \cdots & v_n \end{pmatrix}
= \begin{pmatrix} v_1 & \cdots & v_p & \cdots & v_n \end{pmatrix}
\begin{pmatrix} \lambda_0 I & D \\ 0 & E \end{pmatrix},$$
where the upper left block λ0 I is p × p. Computing determinants, det(A − λI) = (λ0 − λ)p det(E − λI) (see Section 9 Exercise 3), meaning that the algebraic
multiplicity of λ0 is at least p. This ends the proof.
There are important classes of matrices that always have diagonal factorizations.
In particular we will now investigate symmetric and orthogonal matrices and show
that they always have especially nice diagonal, or at least diagonal-like, factorizations.
Example 2: Consider the symmetric matrix $A = \begin{pmatrix} 41 & -12 \\ -12 & 34 \end{pmatrix}$. As usual, we compute the eigenvalues 25 and 50, the corresponding eigenvectors $\begin{pmatrix} 3 \\ 4 \end{pmatrix}$ and $\begin{pmatrix} 4 \\ -3 \end{pmatrix}$, and set up the factorization
$$A = \begin{pmatrix} 3 & 4 \\ 4 & -3 \end{pmatrix}\begin{pmatrix} 25 & 0 \\ 0 & 50 \end{pmatrix}\begin{pmatrix} 3 & 4 \\ 4 & -3 \end{pmatrix}^{-1}.$$
But note that the two eigenvectors have a very special property: they are orthogonal. We can therefore normalize them so that the factorization becomes
$$A = \begin{pmatrix} \tfrac35 & \tfrac45 \\[2pt] \tfrac45 & -\tfrac35 \end{pmatrix}\begin{pmatrix} 25 & 0 \\ 0 & 50 \end{pmatrix}\begin{pmatrix} \tfrac35 & \tfrac45 \\[2pt] \tfrac45 & -\tfrac35 \end{pmatrix}^{-1},$$
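Because the normalized eigenvector matrix is orthogonal, its inverse is just its transpose, so the factorization is easy to verify by multiplying out Q D QT (a small check of ours):

```python
# Sketch: verify the orthogonal diagonalization A = Q D Q^T of Example 2,
# with Q = (1/5)[[3,4],[4,-3]] and D = diag(25, 50).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

Q = [[3/5, 4/5], [4/5, -3/5]]
D = [[25, 0], [0, 50]]
Qt = [[Q[j][i] for j in range(2)] for i in range(2)]

A = matmul(matmul(Q, D), Qt)   # recovers [[41, -12], [-12, 34]]
```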
The first vector is orthogonal to the second and third, but those two are not or-
thogonal to each other. They are however both associated with the eigenvalue 2, so
they generate the eigenspace, in this case a plane, of the eigenvalue 2. If we run the
Gram-Schmidt process on these two eigenvectors, we will stay within the eigenspace
and generate the two orthonormal eigenvectors
$$\begin{pmatrix} \tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{2}} \\[2pt] 0 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} \tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{1}{\sqrt{6}} \\[2pt] \tfrac{2}{\sqrt{6}} \end{pmatrix}.$$
If we normalize the first eigenvector and assemble all the pieces, we obtain the factorization
$$A = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{1}{\sqrt{3}} & 0 & \tfrac{2}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{1}{\sqrt{3}} & 0 & \tfrac{2}{\sqrt{6}} \end{pmatrix}^{T}.$$
In the previous two examples, eigenvectors that come from different eigenvalues
seemed to be automatically orthogonal. This is in fact true for any symmetric matrix
A. We prove this by letting Av = λv and Aw = µw where λ ≠ µ and noting that λv · w = wT λv = wT Av = (wT Av)T = v T AT w = v T Aw = v T µw = µv · w ⇒ (λ − µ)v · w = 0 ⇒
v · w = 0. (Justify each step.)
Can every symmetric matrix be factored as in the previous two examples? That
is, does every symmetric matrix have a diagonal factorization through orthogonal matrices,
or said another way, does every symmetric matrix have an orthonormal basis of eigen-
vectors? The answer is yes, and such a factorization is called a spectral factorization.
We state this formally in the following theorem, which is one of the most important
results of linear algebra.
The Spectral Theorem. If A is a symmetric n × n matrix, then A has n real
eigenvalues (counting multiplicities) λ1 , λ2 , · · · , λn and its corresponding eigenvectors
form an orthonormal basis with respect to which A takes the form
$$\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$
But since QT1 AQ1 is symmetric (see Section 2 Exercise 6(f)), we can conclude that
$$AQ_1 = Q_1\begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{pmatrix}.$$
Let A2 be the 3 × 3 matrix in the lower right corner of the last factor on the right.
Then A2 is symmetric and, except for λ1 , has the same eigenvalues as A (see Section
9 Exercise 3). This ends step one.
Since A2 is symmetric, it has an eigenvalue λ2 with eigenvector v2 . Normalize
v2 and expand it to an orthonormal basis of R3 . Let U2 be the orthogonal matrix
with these vectors as its columns. (The first column is v2 .) Then as above we have
$$A_2 U_2 = U_2\begin{pmatrix} \lambda_2 & 0 & 0 \\ 0 & * & * \\ 0 & * & * \end{pmatrix}.$$
Then
$$\begin{pmatrix} 1 & 0 \\ 0 & U_2 \end{pmatrix}^{T}\begin{pmatrix} \lambda_1 & 0 \\ 0 & A_2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & U_2 \end{pmatrix}
= \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & * & * \\ 0 & 0 & * & * \end{pmatrix},$$
or letting Q2 equal the product of Q1 and the matrix containing U2 we have
$$Q_2^T A Q_2 = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & * & * \\ 0 & 0 & * & * \end{pmatrix}.$$
Q2 is the product of orthogonal matrices and is therefore orthogonal (Section 21
Exercise 8). Let A3 be the 2 × 2 matrix in the lower right corner of the last factor on
the right. Then A3 is symmetric and, except for λ1 and λ2 , has the same eigenvalues
as A. This ends step two. In general, we continue in this manner until we obtain
$$Q^T A Q = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$
This proves the Spectral Theorem.
The Spectral Theorem has many applications, which we will not pursue here.
Instead we will end with a spectral-like factorization for orthogonal matrices. Of
course, orthogonal matrices are not necessarily symmetric, so the Spectral Theorem
does not apply. In fact, most orthogonal matrices are not diagonalizable at all, as in the case of the rotation matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. But let’s push ahead anyway with the
following example.
Consider the orthogonal matrix $A = \frac{1}{3}\begin{pmatrix} 2 & 2 & -1 \\ -1 & 2 & 2 \\ 2 & -1 & 2 \end{pmatrix}$ from the earlier example. Its characteristic equation is $\lambda^3 - 2\lambda^2 + 2\lambda - 1 = 0$. We find its roots and use
Gaussian elimination with complex arithmetic as in Section 13 to obtain the following
three eigenvalue-eigenvector pairs:
$$1,\ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \qquad
\frac{1 + i\sqrt{3}}{2},\ \begin{pmatrix} \sqrt{3} + i \\ -\sqrt{3} + i \\ -2i \end{pmatrix}; \qquad
\frac{1 - i\sqrt{3}}{2},\ \begin{pmatrix} \sqrt{3} - i \\ -\sqrt{3} - i \\ 2i \end{pmatrix}.$$
We put all this together to obtain the complex diagonal factorization
$$A = \begin{pmatrix} 1 & \sqrt{3} + i & \sqrt{3} - i \\ 1 & -\sqrt{3} + i & -\sqrt{3} - i \\ 1 & -2i & 2i \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 + i\tfrac{\sqrt{3}}{2} & 0 \\ 0 & 0 & \tfrac12 - i\tfrac{\sqrt{3}}{2} \end{pmatrix}
\begin{pmatrix} 1 & \sqrt{3} + i & \sqrt{3} - i \\ 1 & -\sqrt{3} + i & -\sqrt{3} - i \\ 1 & -2i & 2i \end{pmatrix}^{-1}.$$
The equations for the second and third eigenvalue-eigenvector pairs can be written
as Av = λv and Av = λv. Just as in Section 13, we can therefore rewrite the
factorization in real form. Recall from that section that the equation Av = λv can
be written as A(x + iy) = (α + iβ)(x + iy), which when multiplied out becomes
Ax + iAy = (αx − βy) + i(βx + αy). Equating real and imaginary parts we obtain
Ax = αx − βy and Ay = βx + αy. This gives us the real block-diagonal factorization
$$A = \begin{pmatrix} 1 & \sqrt{3} & 1 \\ 1 & -\sqrt{3} & 1 \\ 1 & 0 & -2 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & \tfrac{\sqrt{3}}{2} \\[2pt] 0 & -\tfrac{\sqrt{3}}{2} & \tfrac12 \end{pmatrix}
\begin{pmatrix} 1 & \sqrt{3} & 1 \\ 1 & -\sqrt{3} & 1 \\ 1 & 0 & -2 \end{pmatrix}^{-1}.$$
Note that the columns of the first factor on the right are orthogonal, so that if we
normalize each column, we will have an orthogonal matrix. But we must be careful
that when we divide by lengths, the equations Ax = αx − βy and Ay = βx + αy
remain true. This can only be done if we divide x and y by the same number. In our
case, fortunately, both the second and third columns, which correspond to x and y, have length √6. Therefore we are justified in writing
$$A = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] \tfrac{1}{\sqrt{3}} & 0 & -\tfrac{2}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & \tfrac{\sqrt{3}}{2} \\[2pt] 0 & -\tfrac{\sqrt{3}}{2} & \tfrac12 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\[2pt] \tfrac{1}{\sqrt{3}} & 0 & -\tfrac{2}{\sqrt{6}} \end{pmatrix}^{T}.$$
The kind of factorization we have just obtained can be realized for any orthogonal matrix. We call it a real block-diagonal factorization.
Let A2 be the matrix in the lower right corner of the last factor on the right. Then
A2 is orthogonal and, except for λ, has the same eigenvalues as A.
The second possibility is that λ is complex. Let x and y be the real and imaginary parts of the eigenvector v. Assume for a moment that ‖x‖ = ‖y‖ and x · y = 0. Then
Let A2 be as above, then A2 is orthogonal and, except for λ and λ, has the same
eigenvalues as A. This ends the first step. Continue in the obvious way as in the
Spectral Theorem.
We still have to prove ‖x‖ = ‖y‖ and x · y = 0. It is enough to show v T v = 0, since then we would have v T v = (x + iy)T (x + iy) = x · x − y · y + i 2x · y = 0 ⇒ x · y = 0 and x · x = y · y or ‖x‖ = ‖y‖. To show v T v = 0 we compute v T v = v T AT Av = (Av)T (Av) = λ2 v T v. If v T v ≠ 0, then we could cancel it from both sides obtaining λ2 = 1. But the only solutions to the equation λ2 = 1 are λ = ±1 (Exercise 11), contradicting the assumption that λ is complex. Therefore v T v = 0. This ends the
proof.
Note that each consecutive pair of −1’s on the diagonal can be considered as a plane rotation of π radians, and therefore they can be placed in the sequence of αβ blocks. The block-diagonal matrix D then assumes the form
$$D = \mathrm{diag}\left(\begin{pmatrix} \alpha_1 & \beta_1 \\ -\beta_1 & \alpha_1 \end{pmatrix}, \ldots, \begin{pmatrix} \alpha_q & \beta_q \\ -\beta_q & \alpha_q \end{pmatrix}, \pm 1, 1, \ldots, 1\right)$$
that reverses one direction orthogonal to these planes. In R3 the only possibilities are
$$\begin{pmatrix} \alpha & \beta & 0 \\ -\beta & \alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} \alpha & \beta & 0 \\ -\beta & \alpha & 0 \\ 0 & 0 & -1 \end{pmatrix},$$
that is, a pure rotation, a pure reflection, or a rotation and reflection perpendicular
to the plane of rotation.
Finally we leave symmetric and orthogonal matrices and consider two important
scalar functions of arbitrary square matrices. They are the determinant and the
trace. The determinant of a matrix we already know something about. The trace of a matrix A is defined as the sum of its diagonal elements: tr(A) = a11 + a22 + · · · + ann .
They both have simple and useful expressions in terms of the eigenvalues of A, which
are summarized in the following.
Theorem. The determinant of a matrix is equal to the product of its eigenvalues,
and the trace of a matrix is equal to the sum of its eigenvalues, both taken over the
complex numbers.
Proof: Consider the characteristic polynomial det(A − λI) of A.
$$\det\begin{pmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{pmatrix}$$
$$= (a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda) + \text{expressions in } \lambda^{n-2}, \lambda^{n-3}, \cdots, \lambda, \text{ and constants}$$
$$= (-\lambda)^n + \mathrm{tr}(A)(-\lambda)^{n-1} + \cdots + \det(A).$$
The first equality follows from the determinant formula. Note that the first term
contains all expressions involving λn and λn−1 . The second equality follows by simple
computation and the fact that det(A − 0I) = det(A). If λ1 , λ2 , · · · , λn are all the
eigenvalues of A, then the characteristic polynomial can also be written in factored
form as
$$\det(A - \lambda I) = C(\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)
= C\left[(-\lambda)^n + (\lambda_1 + \lambda_2 + \cdots + \lambda_n)(-\lambda)^{n-1} + \cdots + \lambda_1 \lambda_2 \cdots \lambda_n\right].$$
Equating the two forms of the characteristic polynomial, we see that C = 1 and
therefore det(A) = λ1 λ2 · · · λn and tr(A) = λ1 + λ2 + · · · + λn .
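For instance, the symmetric matrix of Example 2 has eigenvalues 25 and 50, and the theorem can be confirmed by direct arithmetic (a check of ours):

```python
# det(A) and tr(A) versus the eigenvalues 25 and 50 of Example 2.
A = [[41, -12], [-12, 34]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 41*34 - (-12)*(-12) = 1250
tr = A[0][0] + A[1][1]                        # 41 + 34 = 75
# product and sum of the eigenvalues: 25 * 50 = 1250, 25 + 50 = 75
```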
A 3 × 3 orthogonal matrix A is a pure rotation if and only if det(A) = 1. In this case tr(A) = 1 + 2α. Since α = cos θ where
θ is the angle of rotation, we have
$$\cos\theta = \frac{\mathrm{tr}(A) - 1}{2}.$$
This means that the angle of rotation can be computed without finding eigenvalues.
In particular, for the matrix
2 2
3 3 − 13
−1 2 2
A= 3 3 3
2
3 − 13 2
3
of the earlier example, we have det(A) = 1, so A is a pure rotation such that cos θ =
(6/3 − 1)/2 = 1/2 and therefore θ = π/3. To find the axis and direction of the
rotation, it is still necessary to compute the eigenvectors.
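The computation of the rotation angle from the trace is easily automated (a sketch of ours):

```python
# Sketch: rotation angle of the pure rotation A = (1/3)[[2,2,-1],[-1,2,2],[2,-1,2]]
# from its trace alone, via cos(theta) = (tr(A) - 1)/2.
import math

A = [[2/3, 2/3, -1/3],
     [-1/3, 2/3, 2/3],
     [2/3, -1/3, 2/3]]
tr = A[0][0] + A[1][1] + A[2][2]     # = 2
theta = math.acos((tr - 1) / 2)      # = pi/3, i.e. 60 degrees
```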
EXERCISES
2. Describe the eigenspaces of the following matrix and how the matrix acts on each.
What are the algebraic and geometric multiplicities of the eigenvalues?
$$A = \begin{pmatrix} 2 & 3 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 6 \end{pmatrix}
= \begin{pmatrix} -1 & 0 & 3 \\ 1 & 0 & 4 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{pmatrix}
\begin{pmatrix} -1 & 0 & 3 \\ 1 & 0 & 4 \\ 0 & 1 & 0 \end{pmatrix}^{-1}$$
3. Find the diagonal factorizations of the following matrices and sketch a diagram that geometrically describes the effect of each.
$$\text{(a) } \begin{pmatrix} 1 & 4 \\ 1 & -2 \end{pmatrix} \qquad
\text{(b) } \begin{pmatrix} 2 & -2 \\ -2 & -1 \end{pmatrix} \qquad
\text{(c) } \begin{pmatrix} 2 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
7. Construct the orthogonal matrix that rotates R3 around the axis defined by the vector $\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$ by 90° by writing down the block-diagonal factorization of the matrix and multiplying it out.
11. Show that, even in the world of complex numbers, the only solutions to the
equation λ2 = 1 are λ = ±1. (Hint: Let λ = α + iβ and reach a contradiction.)
12. If Q is an orthogonal matrix such that det Q = −1, then what can you say about
Q as a transformation?
13. Fix the center of a basketball and choose n axes v1 , v2 , · · · , vn and angles
θ1 , θ2 , · · · , θn . Rotate the basketball around v1 by an angle θ1 , around v2 by an
angle θ2 , · · ·, and around vn by an angle θn . You could have achieved the same result
with one rotation around a certain axis and by a certain angle. Discuss why this is
true and how you could find the one axis and angle that will do the job. This is The
Larry Bird Theorem.
15. For each matrix below decide if it is symmetric, orthogonal, invertible, a projec-
tion, or diagonalizable.
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix} \qquad
B = \frac{1}{4}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$
16. Show tr(A) + tr(B) = tr(A + B), tr(AB) = tr(BA), and tr(B −1 AB) = tr(A).
17. Show that A = SBS −1 ⇒ A and B have the same trace, determinant, eigenvalues, characteristic polynomial, and rank. Find a counterexample for the converse (⇐). Hint: Try $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.
23. Quadratic Forms 151
After linear functions, which we have already studied extensively in the form
of linear equations and linear transformations, quadratic functions are next in level
of complexity. Such functions arise in diverse applications, including geometry, me-
chanical vibrations, statistics, and electrical engineering, but matrix methods allow a
unified study of their properties. A quadratic equation in two variables is an equation
of the form
$$ax^2 + bxy + cy^2 + dx + ey + f = 0$$
where at least one of the coefficients a, b, c is not zero. From analytic geometry, we
know that the graph of a quadratic equation is a conic section, that is, a circle, a
parabola, an ellipse, a hyperbola, a pair of lines, a single line, a point, or the empty
set. A quadratic equation may be expressed with matrices as
$$[\,x\ \ y\,]\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + [\,d\ \ e\,]\begin{pmatrix} x \\ y \end{pmatrix} + f = 0.$$
The terms of second degree, $ax^2 + bxy + cy^2$, determine the type of conic section that the equation represents and are called the quadratic form associated with the equation. Note that although the matrix above
is symmetric, the same quadratic form can be generated by many other different matrices such as $\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}$ and $\begin{pmatrix} a & 3b \\ -2b & c \end{pmatrix}$. A quadratic equation in three variables is an equation of the form
$$ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j = 0$$
or
$$[\,x\ \ y\ \ z\,]\begin{pmatrix} a & d/2 & e/2 \\ d/2 & b & f/2 \\ e/2 & f/2 & c \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + [\,g\ \ h\ \ i\,]\begin{pmatrix} x \\ y \\ z \end{pmatrix} + j = 0$$
where at least one of the coefficients a, b, c, d, e, f is not zero. The graphs of such equa-
tions are quadric surfaces, which include ellipsoids, hyperboloids, and paraboloids of
various types. Again the terms of second degree constitute the quadratic form asso-
ciated with the equation.
Example 2: To find the graph of the quadratic equation $4x_1x_2 + 4x_1x_3 - 4x_2x_3 = 1$ we first write it as
$$[\,x_1\ \ x_2\ \ x_3\,]\begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & -2 \\ 2 & -2 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 1.$$
From Section 22 Exercise 5(d) the spectral factorization A = QDQT for this matrix looks like
$$\begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & -2 \\ 2 & -2 & 0 \end{pmatrix}
= \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -4 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}^{T}.$$
Setting y = QT x, the quadratic equation in terms of the y-coordinates takes the form $2y_1^2 + 2y_2^2 - 4y_3^2 = 1$. This is a hyperboloid of revolution around the y3 axis, and therefore the quadratic equation $4x_1x_2 + 4x_1x_3 - 4x_2x_3 = 1$ describes a hyperboloid of revolution around the axis defined by the third column of Q.
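The change of variables can be spot-checked numerically: for any sample point x, the original quadratic form and its diagonalized version in y = QT x must agree. The point chosen below is arbitrary (a check of ours):

```python
# Sketch: check 4*x1*x2 + 4*x1*x3 - 4*x2*x3 == 2*y1^2 + 2*y2^2 - 4*y3^2
# under y = Q^T x, using the orthogonal Q from the spectral factorization.
import math

Q = [[1/math.sqrt(2), 1/math.sqrt(6), -1/math.sqrt(3)],
     [0.0,            2/math.sqrt(6),  1/math.sqrt(3)],
     [1/math.sqrt(2), -1/math.sqrt(6), 1/math.sqrt(3)]]

x = [1.0, 2.0, 3.0]                                           # arbitrary point
y = [sum(Q[k][i] * x[k] for k in range(3)) for i in range(3)] # y = Q^T x

lhs = 4*x[0]*x[1] + 4*x[0]*x[2] - 4*x[1]*x[2]
rhs = 2*y[0]**2 + 2*y[1]**2 - 4*y[2]**2   # agrees with lhs
```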
The method just illustrated obviously works in general. We therefore have: for any quadratic form xT Ax, there is an orthogonal change of variables y = QT x with respect to which the quadratic form becomes $\lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2$. (Here A is symmetric with eigenvalues λ1 , λ2 , · · · , λn and Q is orthogonal.) This is called the Principal Axis Theorem. It is really just the Spectral Theorem in another form.
EXERCISES
3. For each of the following quadratic equations, find a rotation of the coordinates so that the resulting quadratic form is in standard form, and identify and sketch the curve or surface.
(a) $x_1^2 + x_1x_2 + x_2^2 = 6$
(b) $7x_1^2 + 7x_2^2 - 5x_3^2 - 32x_1x_2 - 16x_1x_3 + 16x_2x_3 = 1$ (Hint: The eigenvalues are −9, −9, 27.)
4. For the quadratic equation $6x_1^2 - 6x_1x_2 + 14x_2^2 - 2x_1 + x_2 = 0$, (a) find a rotation of
the coordinates so that the resulting quadratic form is in standard form, (b) eliminate
the linear terms by completing the square in each variable and making a translation
of the coordinates, and (c) identify and sketch the curve.
Now we investigate how quadratic forms arise in the problem of maximizing and
minimizing functions of several variables. Suppose we want to determine the nature
of the critical points of a real valued function z = f (x, y). Assume for simplicity
a critical point occurs at (0, 0) and f (x, y) can be expanded in a Taylor series in a
neighborhood of that point. Then we have
$$f(x, y) = f(0, 0) + f_x(0, 0)x + f_y(0, 0)y + \frac{1}{2!}\left(f_{xx}(0, 0)x^2 + 2f_{xy}(0, 0)xy + f_{yy}(0, 0)y^2\right) + \cdots.$$
Since (0, 0) is a critical point, we must have fx (0, 0) = fy (0, 0) = 0. Putting this
back into the Taylor series and rewriting the second order terms, we have
$$f(x, y) = f(0, 0) + \left(ax^2 + bxy + cy^2\right) + \cdots, \qquad a = \tfrac12 f_{xx}(0,0),\ b = f_{xy}(0,0),\ c = \tfrac12 f_{yy}(0,0).$$
This means that f (x, y) behaves near (0, 0) like its second order terms $ax^2 + bxy + cy^2$. That is to say, if the quadratic form $ax^2 + bxy + cy^2$ is positive for every nonzero choice of (x, y) then f (x, y) has a minimum at (0, 0), and if $ax^2 + bxy + cy^2$ is negative for every nonzero choice of (x, y) then f (x, y) has a maximum at (0, 0). In general, an arbitrary quadratic form $ax^2 + bxy + cy^2$ will assume positive, negative, and zero values for various values of (x, y). But there are cases like $2x^2 + 3y^2$ and $x^2 - 2xy + 2y^2 = (x - y)^2 + y^2$ that are positive for all nonzero values of (x, y), or like $-x^2 - 6y^2$ and $-x^2 + 4xy - 4y^2 = -(x - 2y)^2$ that are negative for all nonzero values of (x, y).
We are therefore led to the following definition. A symmetric matrix A is positive
definite if its associated quadratic form xT Ax > 0 for every x ≠ 0. We also say A is negative definite if −A is positive definite, that is if xT Ax < 0 for every x ≠ 0. How
can we tell if a symmetric matrix is positive definite? There are five ways to answer
this question, and we present them all in the following theorem. Its proof is long but
instructive. First we need a definition: For any square matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
the leading principal submatrices are $A_1 = (a_{11})$, $A_2 = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, and so on up to $A_n = A$.
Theorem. For any symmetric n × n matrix A the following statements are equivalent.
(a) A is positive definite.
(b) All the eigenvalues of A are positive.
(c) All the leading principal submatrices A1 , A2 , · · · , An of A have positive determi-
nants.
(d) A can be reduced to upper triangular form with all pivots positive by using only
the Gaussian operation of multiplying one row by a scalar and subtracting from
another row (no row exchanges or scalar multiplications of rows are necessary).
(e) There is a matrix R (not necessarily square) with independent columns such
that A = RT R.
There are similar equalities for all the other leading principal submatrices. Therefore,
since det(Ai ) equals the product of its eigenvalues (by the symmetry of Ai and Section
22 Exercise 6), which are all positive by (b) above, we have det(Ai ) > 0.
(c) ⇒ (d): We first note that the Gaussian step of multiplying one row by a scalar and subtracting it from another row has no effect on the determinant of a matrix or on the determinants of its leading principal submatrices. We now illustrate the implication
of this for the 4 × 4 case. Initially A looks like
$$\begin{pmatrix} p_{11} & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{pmatrix}$$
24. Positive Definite Matrices 157
and we have p11 = det(A1 ) > 0. We run one Gaussian step and obtain
$$\begin{pmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{pmatrix}.$$
Then p11 p22 = det(A2 ) > 0 ⇒ p22 > 0. We run another Gaussian step and obtain
$$\begin{pmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & 0 & p_{33} & * \\ 0 & 0 & * & * \end{pmatrix}.$$
Then p11 p22 p33 = det(A3 ) > 0 ⇒ p33 > 0. Finally we run one more Gaussian step
and obtain
$$\begin{pmatrix} p_{11} & * & * & * \\ 0 & p_{22} & * & * \\ 0 & 0 & p_{33} & * \\ 0 & 0 & 0 & p_{44} \end{pmatrix}.$$
Then p11 p22 p33 p44 = det(A4 ) > 0 ⇒ p44 > 0. Note that no row exchanges are
necessary. The general case is now clear.
(d) ⇒ (e): This is the hard one! We need a preliminary result: If A is symmetric and
has an LU-factorization A = LU , then it has a factorization of the form A = LDLT
where D is diagonal. We quickly indicate the proof. If we divide each row of U by its
pivot and place the pivots into a diagonal matrix D, we immediately have A = LDM
where M is upper triangular with ones down its diagonal. Our goal is to show
LT = M or LT M −1 = I. Since A is symmetric, AT = A ⇒ M T DLT = LDM ⇒
LT M −1 = D−1 (M T )−1 LD. In the last equation, the left side is upper triangular since
it is a product of upper triangular matrices, and the right side is lower triangular since
it is a product of lower triangular and diagonal matrices. Both sides are therefore
diagonal. Furthermore, since LT and M −1 are each upper triangular with ones down their diagonals, the same is true of their product LT M −1 (Exercise 1). We conclude that LT M −1 = I, that is, M = LT . Now we use this result. Since A is symmetric with positive pivots, we
have A = LDLT where the diagonal entries of D are all positive. We can therefore define √D to be the diagonal matrix with diagonal entries equal to the square roots of the corresponding diagonal entries of D. We then have A = (L√D)(√D LT ), which has the form A = RT R.
(e) ⇒ (a): Since R has independent columns, Rx = 0 ⇔ x = 0. Therefore x �= 0 ⇒
xT Ax = xT RT Rx = (Rx)T (Rx) = �Rx�2 > 0. This ends the proof.
The factorization A = (L√D)(√D LT ) is called the Cholesky factorization of the
symmetric positive definite matrix A. It is useful in numerical applications and can
be computed by a simple variant of Gaussian elimination.
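A minimal sketch of that computation (ours, written in the standard inner-product form of the algorithm rather than as explicit elimination steps) applied to a positive definite example:

```python
# Sketch of the Cholesky factorization A = R^T R for a symmetric
# positive definite matrix, R upper triangular with positive diagonal.
import math

def cholesky_upper(a):
    """Return upper triangular R with R^T R = A (A symmetric positive definite)."""
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = a[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            if i == j:
                r[i][j] = math.sqrt(s)      # positive pivot guarantees this exists
            else:
                r[i][j] = s / r[i][i]
    return r

A = [[2, -1, -1], [-1, 2, 1], [-1, 1, 2]]
R = cholesky_upper(A)   # R[0][0] = sqrt(2), and R^T R reproduces A
```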
From this theorem we can also characterize negative definite matrices. The
equivalent statements are (a) A is negative definite, (b) all the eigenvalues of A are
negative, (c) det(A1 ) < 0, det(A2 ) > 0, det(A3 ) < 0, · · · (Exercise 2), (d) all the
pivots of A are negative, and (e) A = −RT R for some matrix R with independent
columns.
Example 3: Let’s check each of the conditions above for the quadratic form $2x_1^2 + 2x_2^2 + 2x_3^2 - 2x_1x_2 - 2x_1x_3 + 2x_2x_3$. First we write it in the form xT Ax where
$$A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix}.$$
The spectral factorization of A is
$$A = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}^{T}.$$
All the eigenvalues are positive, and therefore A is positive definite. The leading
principal submatrices have determinants det(A1 ) = 2, det(A2 ) = 3, det(A3 ) = 4 and
are therefore all positive as they should be. The LU factorization of A is
$$A = \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac12 & 1 & 0 \\ -\tfrac12 & \tfrac13 & 1 \end{pmatrix}
\begin{pmatrix} 2 & -1 & -1 \\ 0 & \tfrac32 & \tfrac12 \\ 0 & 0 & \tfrac43 \end{pmatrix}.$$
The pivots 2, 3/2, 4/3 are all positive, as condition (d) requires. Dividing out the pivots gives A = LDLT with D = diag(2, 3/2, 4/3), and therefore
$$A = \left(L\sqrt{D}\right)\left(\sqrt{D}\,L^T\right)
= \begin{pmatrix} \sqrt{2} & 0 & 0 \\ -\tfrac{1}{\sqrt{2}} & \sqrt{\tfrac32} & 0 \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ 0 & \sqrt{\tfrac32} & \tfrac{1}{\sqrt{6}} \\ 0 & 0 & \tfrac{2}{\sqrt{3}} \end{pmatrix},$$
which has the form A = RT R. There is nothing unique about R. For example, we
can also take the square root of the diagonal matrix in the spectral factorization of A to obtain $A = (Q\sqrt{D})(\sqrt{D}\,Q^T) = (\sqrt{D}\,Q^T)^T(\sqrt{D}\,Q^T)$ or
$$A = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{3}} \\[2pt] 0 & \tfrac{2}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \\[2pt] \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{3}} \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} & -\tfrac{1}{\sqrt{6}} \\[2pt] -\tfrac{2}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} \end{pmatrix},$$
which also has the form A = RT R. There are many other such R’s, not even necessarily square, for example
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
In fact, the product RT R should look familiar. It appears in the normal equations
AT Ax = AT b. We conclude that least squares problems invariably lead to positive
definite matrices.
Now let’s return to the problem of maximizing or minimizing a function of two
variables. We have seen that the question comes down to the positive or negative
definiteness of the quadratic form
$$\frac{1}{2!}\left(f_{xx}(0,0)x^2 + 2f_{xy}(0,0)xy + f_{yy}(0,0)y^2\right)$$
or of the matrix
$$\begin{pmatrix} f_{xx}(0,0) & f_{xy}(0,0) \\ f_{xy}(0,0) & f_{yy}(0,0) \end{pmatrix}.$$
From the characterization of positive and negative definite matrices in terms of the
signs of the determinants of their principal leading submatrices, we immediately
obtain that (0, 0) is
a minimum point if $f_{xx}(0,0) > 0$ and $f_{xx}(0,0)f_{yy}(0,0) - (f_{xy}(0,0))^2 > 0$,
a maximum point if $f_{xx}(0,0) < 0$ and $f_{xx}(0,0)f_{yy}(0,0) - (f_{xy}(0,0))^2 > 0$.
This is just the second derivative test from the calculus of several variables. In the
n-variable case, if a function f (x1 , x2 , · · · , xn ) has a critical point at (0, 0, · · · , 0), then
fx1 (0, 0, · · · , 0) = fx2 (0, 0, · · · , 0) = · · · = fxn (0, 0, · · · , 0) = 0 and locally we have
$$f(x_1, x_2, \cdots, x_n) = f(0, 0, \cdots, 0)
+ \frac{1}{2!}\,[\,x_1\ x_2\ \cdots\ x_n\,]
\begin{pmatrix} f_{x_1x_1} & f_{x_1x_2} & \cdots & f_{x_1x_n} \\ f_{x_2x_1} & f_{x_2x_2} & \cdots & f_{x_2x_n} \\ \vdots & \vdots & & \vdots \\ f_{x_nx_1} & f_{x_nx_2} & \cdots & f_{x_nx_n} \end{pmatrix}_{(0,0,\cdots,0)}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
+ \text{higher order terms.}$$
Again the question comes down to the definiteness of the matrix of second partial derivatives, and for an n × n matrix it is obviously not efficient to use the determinant test as we did for the 2 × 2 case above.
It is much better to check the signs of the pivots, because they are easily found by
Gaussian elimination. So we have come full circle. Gauss reigns supreme here as in
every other domain of linear algebra. That is the paramount and overriding principle
of the subject and of these notes.
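The pivot test of condition (d) is only a few lines of code (our sketch, assuming no zero pivot is encountered along the way):

```python
# Sketch: test positive definiteness by the signs of the pivots, using
# plain Gaussian elimination with no row exchanges (condition (d)).
def pivots(a):
    """Return the pivots produced by Gaussian elimination without row exchanges."""
    m = [row[:] for row in a]          # work on a copy
    n = len(m)
    piv = []
    for i in range(n):
        piv.append(m[i][i])
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return piv

A = [[2, -1, -1], [-1, 2, 1], [-1, 1, 2]]
print(all(p > 0 for p in pivots(A)))   # the pivots 2, 3/2, 4/3 are all positive
```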
EXERCISES
1. Show by example that the set of upper triangular matrices with ones down their
diagonals is closed under multiplication and inverse.
2. Why does the determinant test for negative definiteness look like det(A1 ) <
0, det(A2 ) > 0, det(A3 ) < 0, · · ·?
4. Show by an example that the product of two positive definite symmetric matrices
may not define a positive definite quadratic form.
5. Write the quadratic form 3x1^2 + 4x2^2 + 5x3^2 + 4x1x2 + 4x2x3 in the form xT Ax
and verify all the statements in the theorem on positive definite matrices. That is,
show A has all eigenvalues positive and all pivots positive and obtain two different
factorizations of the form A = RT R, one from A = QDQT and the other from
A = LDLT. Describe the quadric surface 3x1^2 + 4x2^2 + 5x3^2 + 4x1x2 + 4x2x3 = 16
(Hint: λ = 1, 4, 7)
6. For positive definite matrices A, make a reasonable definition of √A, and compute
it for A = [3 2 0; 2 4 2; 0 2 5]. (See Exercise 5 above.)
8. Test the following matrix for positive definiteness the easiest way you can:
[1 0 1 0; 0 2 1 1; 1 1 3 1; 0 1 1 2]
ANSWERS TO EXERCISES
SECTION 1
� � � � 1
−2 −.5
−2 −1 0
1. (a) (b) (c) 3 (d) 5 (e)
1 −4 2
−1 −3
2. [1; 1.5; −.5; −3]
4. 150, 100
5. 580, 50
6. 10 servings of pasta, 1 serving of chicken, 4 servings of broccoli
7. y = x3 − 2x2 − 3x + 5
8. y = 3x3 − 5x2 + x + 2
SECTION 2
1. (a) [10 14 −2; 8 −4 0] (b) [7 0; 10 7; 6 −5] (c) [17; 4; −7] (d) [2 14 −8]
(e) [32] (f) [4 8 12; 5 10 15; 6 12 18] (g) [−8 10; −14 26; −2 −1] (h) [8 −3 12; 5 0 7; −6 −3 −8]
(i) [4 0 −1; 0 1 0; 2 −2 1] (j) [32 0 0; 0 1 0; 0 0 243] (k) [0 0 0; 0 0 0; 0 0 0]
6. (a) [2 5 0; −1 0 −1; 3 7 0]
9. All but the last two.
SECTION 3
1. (a) [1 0; .75 1][4 −6; 0 9.5]
(b) [1 0 0; −1 1 0; 2 0 1][2 1 3; 0 6 4; 0 0 −2]
(c) [1 0 0 0; 2 1 0 0; −3 −11 1 0; 1 2 −.5 1][1 3 2 −1; 0 −1 −1 4; 0 0 −6 43; 0 0 0 15.5]
Answers to Exercises 163
(d) [1 0 0 0 0; 2 1 0 0 0; 0 1 1 0 0; 0 0 −1 1 0; 0 0 0 2 1][2 1 0 0 0; 0 3 3 0 0; 0 0 1 1 0; 0 0 0 2 1; 0 0 0 0 1]
2
� � 1
2 0
1 0
2. (a) (b) −1 (c) (d) −1
2 0
3 0
1
1
3. 350, 1628
SECTION 4
1. [2; −3; 4]
2. all except (c)
3. (a) none, (b) infinitely many
SECTION 5
1. (a) [−7 4; 2 −1] (b) [.5 0 0; 0 10 0; 0 0 −.2] (c) [.5 −1.5 .5; 0 .5 −.5; 0 0 .2]
(d) [10 −6 1; −2 1 0; −7 5 −1] (e) [−1 0 1; −5 1 3; 7 −1 −4] (f) [1 −2 1 0; 1 −2 2 −3; 0 1 −1 1; −2 3 −2 3]
(g) (1/(ad − bc)) [d −b; −c a]
3. [2; 7; 2]
4. only (c)
10. (a) False. (b) True.
SECTION 6
1. [2 1 0 0 0; 1 4 1 0 0; 0 1 4 1 0; 0 0 1 4 1; 0 0 0 1 2][s0; s1; s2; s3; s4] = [3; 12; 0; −12; −3], s0 = s2 = s4 = 0, s1 = 3, s3 = −3
SECTION 7
2. (a) [2; .5; 0] + c[−1; −1; 1]
(b) no solution
(c) [3; 0; 0] + c[1; 0; 1] + d[−2; 1; 0]
(d) [3; −1; 0; 0] + c[−1; 0; 1; 0]
(e) [2; 0; −1.5; 0] + c[−1; 0; −.5; 1] + d[−2; 1; 0; 0]
(f) [3; −5]
3. (a) two intersecting lines
(b) two parallel lines
(c) one line
4. (a) three planes intersecting in a point
(b) one plane intersecting two parallel planes
(c) three nonparallel planes with no intersection
(d) a line of intersection
(e) a plane of intersection
8. eggs = −2 + c, milk = 4 − c, orange juice = c where 2 ≤ c ≤ 4
9. a = 2, b = c = d = 1
10. x2 + y 2 − 4x − 6y + 4 = 0
SECTION 8
1. (a) −6 (b) −16 (c) −24 (d) −12 (e) −1 (f) −1
4. −1/6, −6
5. (a) 3 (b) −12 (c) x + 2y − 18 (d) −x3 + 6x2 − 8x
7. True.
SECTION 9
1. (a) for λ = 1: [1; 0], for λ = 2: [1; 1]
(b) for λ = 1: [0; 1; 0] and [1; 0; −2], for λ = 3: [−1; 0; 1]
(c) for λ = 0: [−2; 1; 1], for λ = 2: [0; −1; 1], for λ = 4: [2; 1; 1]
(d) for λ = −1: [0; −1; 1], for λ = 2: [1; −2; 1], for λ = 6: [1; −1; 1]
(e) for λ = 2: [1; 0; 1] and [1; 1; 0], for λ = −4: [−1; 1; 1]
(f) for λ = −2: [1; 0; 0; 0] and [1; 1; 0; 0], for λ = 3: [0; 1; 1; 0] and [0; 0; 1; 1]
5. (a) [1; 0; 0], [0; 1; 0], [0; 0; 1]
(b) [1; 0; 0], [0; 0; 1]
(c) [0; 0; 1]
SECTION 10
1. (a) [1 1; 0 2] = [1 1; 0 1][1 0; 0 2][1 1; 0 1]^(−1)
(b) [5 0 2; 0 1 0; −4 0 −1] = [0 1 −1; 1 0 0; 0 −2 1][1 0 0; 0 1 0; 0 0 3][0 1 −1; 1 0 0; 0 −2 1]^(−1)
(c) [2 2 2; 1 2 0; 1 0 2] = [−2 0 2; 1 −1 1; 1 1 1][0 0 0; 0 2 0; 0 0 4][−2 0 2; 1 −1 1; 1 1 1]^(−1)
(d) [6 4 4; −7 −2 −1; 7 4 3] = [0 1 1; −1 −2 −1; 1 1 1][−1 0 0; 0 2 0; 0 0 6][0 1 1; −1 −2 −1; 1 1 1]^(−1)
(e) [0 2 2; 2 0 −2; 2 −2 0] = [1 1 −1; 0 1 1; 1 0 1][2 0 0; 0 2 0; 0 0 −4][1 1 −1; 0 1 1; 1 0 1]^(−1)
(f) [−2 0 0 0; 0 −2 5 −5; 0 0 3 0; 0 0 0 3] = [1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1][−2 0 0 0; 0 −2 0 0; 0 0 3 0; 0 0 0 3][1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1]^(−1)
3. (a) Maybe. (In fact it is.) (b) Yes, since symmetric. (c) Maybe. (In fact it is
not.)
SECTION 11
1. (a) exp([1 1; 0 2]) = [1 1; 0 1][e 0; 0 e^2][1 1; 0 1]^(−1)
(b) exp([5 0 2; 0 1 0; −4 0 −1]) = [0 1 −1; 1 0 0; 0 −2 1][e 0 0; 0 e 0; 0 0 e^3][0 1 −1; 1 0 0; 0 −2 1]^(−1)
(c) exp([2 2 2; 1 2 0; 1 0 2]) = [−2 0 2; 1 −1 1; 1 1 1][1 0 0; 0 e^2 0; 0 0 e^4][−2 0 2; 1 −1 1; 1 1 1]^(−1)
(d) exp([6 4 4; −7 −2 −1; 7 4 3]) = [0 1 1; −1 −2 −1; 1 1 1][e^(−1) 0 0; 0 e^2 0; 0 0 e^6][0 1 1; −1 −2 −1; 1 1 1]^(−1)
(e) exp([0 2 2; 2 0 −2; 2 −2 0]) = [1 1 −1; 0 1 1; 1 0 1][e^2 0 0; 0 e^2 0; 0 0 e^(−4)][1 1 −1; 0 1 1; 1 0 1]^(−1)
(f) exp([−2 0 0 0; 0 −2 5 −5; 0 0 3 0; 0 0 0 3]) = [1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1][e^(−2) 0 0 0; 0 e^(−2) 0 0; 0 0 e^3 0; 0 0 0 e^3][1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1]^(−1)
2. (a) exp([1 1; 0 2]t) = [1 1; 0 1][e^t 0; 0 e^(2t)][1 1; 0 1]^(−1)
(b) exp([5 0 2; 0 1 0; −4 0 −1]t) = [0 1 −1; 1 0 0; 0 −2 1][e^t 0 0; 0 e^t 0; 0 0 e^(3t)][0 1 −1; 1 0 0; 0 −2 1]^(−1)
(c) exp([2 2 2; 1 2 0; 1 0 2]t) = [−2 0 2; 1 −1 1; 1 1 1][1 0 0; 0 e^(2t) 0; 0 0 e^(4t)][−2 0 2; 1 −1 1; 1 1 1]^(−1)
(d) exp([6 4 4; −7 −2 −1; 7 4 3]t) = [0 1 1; −1 −2 −1; 1 1 1][e^(−t) 0 0; 0 e^(2t) 0; 0 0 e^(6t)][0 1 1; −1 −2 −1; 1 1 1]^(−1)
(e) exp([0 2 2; 2 0 −2; 2 −2 0]t) = [1 1 −1; 0 1 1; 1 0 1][e^(2t) 0 0; 0 e^(2t) 0; 0 0 e^(−4t)][1 1 −1; 0 1 1; 1 0 1]^(−1)
(f) exp([−2 0 0 0; 0 −2 5 −5; 0 0 3 0; 0 0 0 3]t) = [1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1][e^(−2t) 0 0 0; 0 e^(−2t) 0 0; 0 0 e^(3t) 0; 0 0 0 e^(3t)][1 1 0 0; 0 1 1 0; 0 0 1 1; 0 0 0 1]^(−1)
SECTION 12
1. (a) c1 e^t [1; 0] + c2 e^(2t) [1; 1]
(b) c1 e^t [0; 1; 0] + c2 e^t [1; 0; −2] + c3 e^(3t) [−1; 0; 1]
(c) c1 [−2; 1; 1] + c2 e^(2t) [0; −1; 1] + c3 e^(4t) [2; 1; 1]
(d) c1 e^(−t) [0; −1; 1] + c2 e^(2t) [1; −2; 1] + c3 e^(6t) [1; −1; 1]
(e) c1 e^(2t) [1; 0; 1] + c2 e^(2t) [1; 1; 0] + c3 e^(−4t) [−1; 1; 1]
(f) c1 e^(−2t) [1; 0; 0; 0] + c2 e^(−2t) [1; 1; 0; 0] + c3 e^(3t) [0; 1; 1; 0] + c4 e^(3t) [0; 0; 1; 1]
2. (a) e^t [1; 0] + 2e^(2t) [1; 1]
(b) 2e^t [0; 1; 0] + 2e^t [1; 0; −2] + e^(3t) [−1; 0; 1]
(c) [−2; 1; 1] + e^(2t) [0; −1; 1] + e^(4t) [2; 1; 1]
(d) e^(−t) [0; −1; 1] − e^(2t) [1; −2; 1] + e^(6t) [1; −1; 1]
(e) 3e^(2t) [1; 0; 1] + 2e^(2t) [1; 1; 0] + e^(−4t) [−1; 1; 1]
(f) e^(−2t) [1; 0; 0; 0] + e^(−2t) [1; 1; 0; 0] + e^(3t) [0; 1; 1; 0] + 2e^(3t) [0; 0; 1; 1]
3. (a) neutrally stable (b) unstable (c) stable
SECTION 13
3. α ± iβ
4. (a) [3+i 3−i; 2 2][3+i2 0; 0 3−i2][3+i 3−i; 2 2]^(−1) = [3 1; 2 0][3 2; −2 3][3 1; 2 0]^(−1)
(b) [−i i 0; 1−i 1+i 1; 1 1 0][−1+i3 0 0; 0 −1−i3 0; 0 0 1][−i i 0; 1−i 1+i 1; 1 1 0]^(−1) = [0 −1 0; 1 −1 1; 1 0 0][−1 3 0; −3 −1 0; 0 0 1][0 −1 0; 1 −1 1; 1 0 0]^(−1)
5. (a) (c1 e^(3t) cos 2t + c2 e^(3t) sin 2t)[3; 2] + (−c1 e^(3t) sin 2t + c2 e^(3t) cos 2t)[1; 0]
(b) (c1 e^(−t) cos 3t + c2 e^(−t) sin 3t)[0; 1; −1] + (−c1 e^(−t) sin 3t + c2 e^(−t) cos 3t)[1; −1; 0] + c3 e^t [0; 1; 0]
6. (a) c1 = 2, c2 = −3 (b) c1 = 1, c2 = 2, c3 = 3
SECTION 14
1. (a) uk = 64(1)^k [1; 2] − 64(.25)^k [−1; 1] = 64[1 + (.25)^k; 2 − (.25)^k] → 64[1; 2] = [64; 128]
(b) uk = 1(−1)^k [6; 2] + 2(.5)^k [6; 4] = [6(−1)^k + 12(.5)^k; 2(−1)^k + 8(.5)^k], bounded, no limit
(c) uk = (3/4)(3)^k [2; 1] + (5/4)(−1)^k [−2; 1] = (1/4)[6(3)^k − 10(−1)^k; 3(3)^k + 5(−1)^k], blows up
2. [.5 .5 .5; .25 .5 0; .25 0 .5] = [2 2 0; 1 −1 −1; 1 −1 1][1 0 0; 0 0 0; 0 0 .5][2 2 0; 1 −1 −1; 1 −1 1]^(−1),
1(1)^k [2; 1; 1] + (.5)^k [0; −1; 1] = [2; 1 − (.5)^k; 1 + (.5)^k] → [2; 1; 1]
3. [.5 0 .5; 0 .5 .5; .5 .5 0] = [−1 1 −1; 1 1 −1; 0 1 2][.5 0 0; 0 1 0; 0 0 −.5][−1 1 −1; 1 1 −1; 0 1 2]^(−1),
−30(.5)^k [−1; 1; 0] + 50(1)^k [1; 1; 1] − 10(−.5)^k [−1; −1; 2] → [50; 50; 50]
4. [1 .25 0; 0 .5 .5; 0 .25 .5], [1; 0; 0], everyone dies!
5. Everyone has blue eyes!
SECTION 15
3. (a) not closed under addition or scalar multiplication
(b) not closed under addition
(c) not closed under scalar multiplication
(d) not closed under addition
SECTION 16
1. (a) independent
(b) independent
(c) dependent
(d) independent
(e) dependent
2. (a) [1; 0], [0; 1]
(b) [1; 0; 0], [0; 1; 0], [0; 0; 1]
(c) [1; 0; 0], [0; 1; 2]
(d) [1; 0; 1; 0], [0; 1; 0; 0], [0; 0; 0; 1]
(e) [3; 0; 3; 1], [0; 3; 0; 1]
3. Same answers as for Section 15 Exercise 6.
5. (a) 3[3; 1; 2] − 2[2; 2; 1]
(b) no solution
(c) (6 + c)[3; 1; 2] + (−4 − c)[2; 2; 1] + c[−1; 1; −1], many solutions
(d) [2; 1] + 6[1; 2]
6. (a) U and V might be, W is not.
(b) U does not, V and W might.
(c) U and W are not, V might be.
SECTION 17
1. (a) ‖x‖ = 5, ‖y‖ = 5√5
(b) [1/5; 2/5; 2/5; −4/5], (1/(5√5))[−6; −2; −2; 9]
(c) 153.43◦
(d) [−2; −4; −4; 8]
(e) [−2; −4; −4; 8] + [−4; 2; 2; 1]
2. (5, 15/2)
3. c[−β; α]
5. (a) c[−1; 1; 0] + d[−1; 0; 1]
(b) c[2; −3; 1]
(c) c[1; −3; 0; 1] + d[0; −1; 1; 0]
(d) c[−1; 1; 2; 0]
6. (a) −x1 + x2 = 0, −x1 + x3 = 0
(b) 2x1 − 3x2 + x3 = 0
(c) x1 − 3x2 + x4 = 0, −x2 + x3 = 0
(d) −x1 + x2 + 2x3 = 0
7. (a) False. (b) False.
SECTION 18
2. (a) Reflection of R2 in 135◦ line.
(b) Projection of R2 onto y-axis.
(c) Projection of R2 onto 135◦ line.
(d) Rotation of R2 by 45◦ .
(e) Rotation of R2 by −60◦ .
(f) Reflection of R2 in 150◦ line.
(g) Rotation of R2 by arctan(β/α).
(h) Rotation of R3 around z-axis by 90◦ .
(i) Rotation of R3 around y-axis by −90◦ .
(j) Projection of R3 onto xy-plane.
(k) Rotation of R3 around z-axis by 90◦ and reflection in xy-plane.
(l) Rotation of R3 around z-axis by arctan(β/α) and reflection in xy-plane.
3. (a) [−1 0 0; 0 −1 0; 0 0 −1]
(b) [1 0 0; 0 0 0; 0 0 1]
(c) [0 1 0; 1 0 0; 0 0 1]
(d) [1 0 0; 0 1/√2 −1/√2; 0 1/√2 1/√2]
4. (a) x2 + y 2 = 4, a circle of radius 2.
(b) (x/2)^2 + (y/3)^2 = 1, an ellipse.
7. (a) [1 0; 0 −1], reflects in the x-axis.
(b) [−1 0; 0 1], reflects in the y-axis.
(c) [0 0 1; 0 −1 0; 1 0 0], rotates by 180◦ around the line defined by the vector [1; 0; 1].
13. [0 1 0 0; 0 0 2 1; 1 0 1 0; 0 0 2 0]
SECTION 19
1. (a) [1; 2]; [1; 2]; [−2; 1]; R2 → R2 ; rank = 1
(b) [1; 0], [0; 1]; [1; 2], [2; 3]; [0; 0]; R2 → R2 ; rank = 2
1 0 2 4 0
(c) 0 , 2 ; 0 , 4 ; 1 ; R3 → R3 ; rank = 2
0 1 2 8 −2
3 2 −1
1 0 0 0
6 3 5 3
(d) 0 , 1 , 0 ; , , ; 0 ; R → R4 ; rank = 3
−3 −1 8
0 0 1 0
0 −1 7
1 0 −4 −1 −2
2 0 1 −1 0 0 1
(e) 0 , 1 ; 2 , −1 ; −3 , −5 , 0 ; R5 → R3 ; rank = 2
1 5 3 −3 0 1 0
4 3 1 0 0
1 0 0 16 6
2 8 0
0 1 0 −4 −2
2 7 1
(f) −6 , 2 , 0 ; , , ; 0 , 1 ; R5 → R4 ;
−2 −6 −1
0 0 1 −2 0
0 2 −2
−16 4 2 1 0
rank = 3
2. (a) Yes, [0; 0; 0]. (b) Yes, yes. (c) No. (d) Yes.
3. (a) No. (b) No. (c) [−9; 0; 1], [−2; 1; 0]. (d) No. (e) Yes.
5. (a) Since row(A) ⊥ null(A).
(b) Since dim(row(A)) + dim(null(A)) = 3 and dim(col(A)) = dim(row(A)).
(c) Since dim(col(A)) = dim(row(A)).
6. None or infinitely many.
SECTION 20
4
1. (a)
1
5
3
(b) −1
2
1
4
2. (a) y = (4/3)x
(b) y = −.2 + 1.1x
(c) z = 2 + 2x + 3y
(d) z = 10 + 8x^2 − y^2
(e) y = −3/2 − (3/2)t + (7/2)t^2
(f) y = 2 + 3 cos t + sin t
3. [5 0 3; 0 15 21; 3 21 32][Cu; Fe; S] = [413.91; 1511.13; 2389.58]. The solution is Cu = 63.543, Fe = 55.851, S = 32.065 !
4. (a) [.1 .3; .3 .9]
(b) [4/9 2/9 4/9; 2/9 1/9 2/9; 4/9 2/9 4/9]
(c) [1 0 0; 0 1/2 1/2; 0 1/2 1/2]
(d) [1/2 0 1/2; 0 1 0; 1/2 0 1/2]
(e) [1/3 1/3 0 1/3; 1/3 1/3 0 1/3; 0 0 1 0; 1/3 1/3 0 1/3]
1
5
5. 2
5
2
6. [1 0 0; 0 0 1; 0 1 0]
7. (a) [.2 .4; .4 .8]
(b) [5/6 1/6 1/3; 1/6 5/6 −1/3; 1/3 −1/3 1/3]
SECTION 21
1. (a) [5/13; 12/13], [−12/13; 5/13]
(b) [−3/7; 6/7; 2/7], [−2/7; −3/7; 6/7], [6/7; 2/7; 3/7]
(c) [1/2; −1/2; −1/2; −1/2], [1/2; −1/2; 1/2; 1/2], [−1/2; −1/2; 1/2; −1/2], [1/2; 1/2; 1/2; −1/2]
(d) [−2/3; 11/15; 2/15], [1/3; 2/15; 14/15]
(e) [1/2; −1/2; −1/2; −1/2], [1/2; −1/2; 1/2; 1/2], [1/2; 1/2; −1/2; 1/2]
2. (a) [5/13 −12/13; 12/13 5/13][13 −26; 0 13]
(b) [−3/7 −2/7 6/7; 6/7 −3/7 2/7; 2/7 6/7 3/7][7 −7 7; 0 7 7; 0 0 7]
(c) [1/2 1/2 −1/2 1/2; −1/2 −1/2 −1/2 1/2; −1/2 1/2 1/2 1/2; −1/2 1/2 −1/2 −1/2][2 0 2 1; 0 2 0 0; 0 0 2 0; 0 0 0 1]
(d) [−2/3 1/3; 11/15 2/15; 2/15 14/15][15 −15; 0 30]
(e) [1/2 1/2 1/2; −1/2 −1/2 1/2; −1/2 1/2 −1/2; −1/2 1/2 1/2][2 3 1; 0 1 2; 0 0 3]
7
3. 1
7
1
4. [1/9; 4/9; −8/9] or [−1/9; −4/9; 8/9]
5. [4; 1]
6. [1/4 −1/4 −1/4 1/4; −1/4 1/4 1/4 −1/4; −1/4 1/4 3/4 1/4; 1/4 −1/4 1/4 3/4]
SECTION 22
3. (a) [4 −1; 1 1][2 0; 0 −3][4 −1; 1 1]^(−1)
(b) [−2 1; 1 2][3 0; 0 −2][−2 1; 1 2]^(−1)
(c) [1 1 0; 0 1 0; 0 0 1][2 0 0; 0 3 0; 0 0 3][1 1 0; 0 1 0; 0 0 1]^(−1)
4. (a) [−2/√5 1/√5; 1/√5 2/√5][3 0; 0 −2][−2/√5 1/√5; 1/√5 2/√5]^T
(b) [−2/√5 1/√5 0; 1/√5 2/√5 0; 0 0 1][4 0 0; 0 −1 0; 0 0 1][−2/√5 1/√5 0; 1/√5 2/√5 0; 0 0 1]^T
(c) [0 2/√5 1/√5; −1 0 0; 0 −1/√5 2/√5][−1 0 0; 0 5 0; 0 0 0][0 2/√5 1/√5; −1 0 0; 0 −1/√5 2/√5]^T
(d) [1/√2 −1/√6 1/√3; 0 2/√6 1/√3; −1/√2 −1/√6 1/√3][2 0 0; 0 2 0; 0 0 −4][1/√2 −1/√6 1/√3; 0 2/√6 1/√3; −1/√2 −1/√6 1/√3]^T
5. (a) [3/√10 1/√10; 1/√10 −3/√10][1 0; 0 0][3/√10 1/√10; 1/√10 −3/√10]^T = [.9 .3; .3 .1]
(b) [3/√10 1/√10; 1/√10 −3/√10][1 0; 0 −1][3/√10 1/√10; 1/√10 −3/√10]^T = [.8 .6; .6 −.8]
6. (a) [1/√2 1/√6 1/√3; −1/√2 1/√6 1/√3; 0 −2/√6 1/√3][1 0 0; 0 1 0; 0 0 −1][1/√2 1/√6 1/√3; −1/√2 1/√6 1/√3; 0 −2/√6 1/√3]^T
(b) [−1/√6 1/√2 1/√3; −1/√6 −1/√2 1/√3; 2/√6 0 1/√3][1 3 0; −3 1 0; 0 0 1][−1/√6 1/√2 1/√3; −1/√6 −1/√2 1/√3; 2/√6 0 1/√3]^T
(c) [0 −1 0 0; 0 0 0 1; 0 0 1 0; 1 0 0 0][0 1 0 0; −1 0 0 0; 0 0 0 1; 0 0 −1 0][0 −1 0 0; 0 0 0 1; 0 0 1 0; 1 0 0 0]^T
1
− √12 √1 0 − √1 √1 0
T
2 − √12 − 12
2 1 0 0 2 2
0 0 1 0 1
7. 0 0 −1 0 =
√1
2
0 √1
2
√1 √1 0 0 1 0 √1 √1 0 − 12 − √12 1
2 2 2 2 2