
A Second Course

in Linear Algebra

WILLIAM C. BROWN
Michigan State University
East Lansing, Michigan

A Wiley-Interscience Publication
JOHN WILEY & SONS
New York • Chichester • Brisbane • Toronto • Singapore
Copyright © 1988 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of this work
beyond that permitted by Section 107 or 108 of the
1976 United States Copyright Act without the permission
of the copyright owner is unlawful. Requests for
permission or further information should be addressed to
the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Cataloging-in-Publication Data


Brown, William C. (William Clough), 1943—
A second course in linear algebra.
"A Wiley-Interscience publication."
Bibliography: p.
Includes index.
1. Algebras, Linear. I. Title.
QA184.B765 1987 517.5 87—23117
ISBN 0-471-62602-3

Printed in the United States of America


10 9 8 7 6 5 4 3 2 1
To Linda
Preface

For the past two years, I have been teaching a first-year graduate-level course in
linear algebra and analysis. My basic aim in this course has been to prepare
students for graduate-level work. This book consists mainly of the linear algebra
in my lectures. The topics presented here are those that I feel are most important
for students intending to do advanced work in such areas as algebra, analysis,
topology, and applied mathematics.
Normally, a student interested in mathematics, engineering, or the physical
sciences will take a one-term course in linear algebra, usually at the junior level.
In such a course, a student will first be exposed to the theory of matrices, vector
spaces, determinants, and linear transformations. Often, this is the first place
where a student is required to do a mathematical proof. It has been my
experience that students who have had only one such linear algebra course in
their undergraduate training are ill prepared to do advanced-level work. I have
written this book specifically for those students who will need more linear
algebra than is normally covered in a one-term junior-level course.
This text is aimed at seniors and beginning graduate students who have had
at least one course in linear algebra. The text has been designed for a one-
quarter or semester course at the senior or first-year graduate level. It is assumed
that the reader is familiar with such animals as functions, matrices, determi-
nants, and elementary set theory. The presentation of the material in this text is
deliberately formal, consisting mainly of theorems and proofs, very much in the
spirit of a graduate-level course.
The reader will note that many familiar ideas are discussed in Chapter I.
I urge the reader not to skip this chapter. The topics are familiar, but my
approach, as well as the notation I use, is more sophisticated than a junior-level

treatment. The material discussed in Chapters II–V is usually only touched upon (if at all) in a one-term course. I urge the reader to study these chapters carefully.
Having written five chapters for this book, I obviously feel that the reader should study all five parts of the text. However, time considerations often demand that a student or instructor do less. A shorter but adequate course could consist of Chapter I, Sections 1–6, Chapter II, Sections 1 and 2, and Chapters III and V. If the reader is willing to accept a few facts about extending scalars, then Chapters III, IV, and V can be read with no reference to Chapter II. Hence, a still shorter course could consist of Chapter I, Sections 1–6, and Chapters III and V.
It is my firm belief that any second course in linear algebra ought to contain
material on tensor products and their functorial properties. For this reason, I
urge the reader to follow the first version of a short course if time does not
permit a complete reading of the text. It is also my firm belief that the basic
linear algebra needed to understand normed linear vector spaces and real inner
product spaces should not be divorced from the intrinsic topology and analysis
involved. I have therefore presented the material in Chapter IV and the first half
of Chapter V in the same spirit as many analysis texts on the subject. My
original lecture notes on normed linear vector spaces and (real) inner product
spaces were based on Loomis and Sternberg's classic text Advanced Calculus.
Although I have made many changes in my notes for this book, I would still like
to take this opportunity to acknowledge my debt to these authors and their fine
text for my current presentation of this material.
One final word about notation is in order here. All important definitions are clearly displayed in the text with a number. Notation for specific ideas (e.g., N for the set of natural numbers) is introduced in the main body of the text as
needed. Once a particular notation is introduced, it will be used (with only a few
exceptions) with the same meaning throughout the rest of the text. A glossary of
notation has been provided at the back of the book for the reader's convenience.

WILLIAM C. BROWN
East Lansing, Michigan
September 1987
Contents

Chapter I. Linear Algebra


1. Definitions and Examples of Vector Spaces 1

2. Bases and Dimension 8


3. Linear Transformations 17
4. Products and Direct Sums 30
5. Quotient Spaces and the Isomorphism Theorems 38
6. Duals and Adjoints 46
7. Symmetric Bilinear Forms 53

Chapter II. Multilinear Algebra 59


1. Multilinear Maps and Tensor Products 59
2. Functorial Properties of Tensor Products 68
3. Alternating Maps and Exterior Powers 83
4. Symmetric Maps and Symmetric Powers 94

Chapter III. Canonical Forms of Matrices 98


1. Preliminaries on Fields 98
2. Minimal and Characteristic Polynomials 105
3. Eigenvalues and Eigenvectors 117
4. The Jordan Canonical Form 132
5. The Real Jordan Canonical Form 141
6. The Rational Canonical Form 159

Chapter IV. Normed Linear Vector Spaces


1. Basic Definitions and Examples 171
2. Product Norms and Equivalence 180
3. Sequential Compactness and the Equivalence of
Norms 186
4. Banach Spaces 200

Chapter V. Inner Product Spaces 206


1. Real Inner Product Spaces 206
2. Self-adjoint Transformations 221
3. Complex Inner Product Spaces 236
4. Normal Operators 243

Glossary of Notation 254

References 259

Subject Index 261


Chapter I

Linear Algebra

1. DEFINITIONS AND EXAMPLES OF VECTOR SPACES

In this book, the symbol F will denote an arbitrary field. A field is defined as
follows:

Definition 1.1: A nonempty set F together with two functions (x, y) → x + y and (x, y) → xy from F × F to F is called a field if the following nine axioms are satisfied:

F1. x + y = y + x for all x, y ∈ F.
F2. x + (y + z) = (x + y) + z for all x, y, z ∈ F.
F3. There exists a unique element 0 ∈ F such that x + 0 = x for all x ∈ F.
F4. For every x ∈ F, there exists a unique element −x ∈ F such that x + (−x) = 0.
F5. xy = yx for all x, y ∈ F.
F6. x(yz) = (xy)z for all x, y, z ∈ F.
F7. There exists a unique element 1 ≠ 0 in F such that x1 = x for all x ∈ F.
F8. For every x ≠ 0 in F, there exists a unique y ∈ F such that xy = 1.
F9. x(y + z) = xy + xz for all x, y, z ∈ F.

Strictly speaking, a field is an ordered triple (F, (x, y) → x + y, (x, y) → xy) satisfying axioms F1–F9 above. The map from F × F to F given by (x, y) → x + y is called addition, and the map (x, y) → xy is called multiplication. When referring to some field (F, (x, y) → x + y, (x, y) → xy), references to addition and multiplication are dropped from the notation, and the letter F is used to denote both the set and the two maps satisfying axioms F1–F9. Although this procedure is somewhat ambiguous, it causes no confusion in concrete situations.
In our first example below, we introduce some notation that we shall use
throughout the rest of this book.

Example 1.2: We shall let Q denote the set of rational numbers, R the set of real numbers, and C the set of complex numbers. With the usual addition and multiplication, Q, R, and C are all fields with Q ⊆ R ⊆ C. □

The fields in Example 1.2 are all infinite in the sense that the cardinal number
attached to the underlying set in question is infinite. Finite fields are very
important in linear algebra as well. Much of coding theory is done over finite
algebraic extensions of the field described in Example 1.3 below.

Example 1.3: Let Z denote the set of integers with the usual addition x + y and multiplication xy inherited from Q. Let p be a positive prime in Z and set F_p = {0, 1, ..., p − 1}. F_p becomes a (finite) field if we define addition ⊕ and multiplication ⊙ modulo p. Thus, for elements x, y ∈ F_p there exist unique integers k, z ∈ Z such that x + y = kp + z with 0 ≤ z < p. We define x ⊕ y to be z. Similarly, x ⊙ y = w, where xy = kp + w and 0 ≤ w < p.
The reader can easily check that (F_p, ⊕, ⊙) satisfies axioms F1–F9. Thus, F_p is a finite field of cardinality p. □
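For readers who want to experiment, here is a minimal Python sketch of the arithmetic of F_p; the helper names add_mod and mul_mod are, of course, only for this illustration.

# Arithmetic in the finite field F_p of Example 1.3 (illustrative sketch).
p = 5  # any positive prime

def add_mod(x, y):
    # x (+) y is the unique z with x + y = kp + z and 0 <= z < p
    return (x + y) % p

def mul_mod(x, y):
    # x (.) y is the unique w with xy = kp + w and 0 <= w < p
    return (x * y) % p

# Axiom F8: every nonzero element has a multiplicative inverse.
for x in range(1, p):
    assert any(mul_mod(x, y) == 1 for y in range(1, p))

print(add_mod(3, 4), mul_mod(3, 4))  # 2 2, since 7 = 5 + 2 and 12 = 2*5 + 2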
Except for some results in Section 7, the definitions and theorems in Chapter
I are completely independent of the field F. Hence, we shall assume that F is an
arbitrary field and study vector spaces over F.

Definition 1.4: A vector space V over F is a nonempty set together with two functions, (α, β) → α + β from V × V to V (called addition) and (x, α) → xα from F × V to V (called scalar multiplication), which satisfy the following axioms:

V1. α + β = β + α for all α, β ∈ V.
V2. α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V.
V3. There exists an element 0 ∈ V such that 0 + α = α for all α ∈ V.
V4. For every α ∈ V, there exists a β ∈ V such that α + β = 0.
V5. (xy)α = x(yα) for all x, y ∈ F and α ∈ V.
V6. x(α + β) = xα + xβ for all x ∈ F and α, β ∈ V.
V7. (x + y)α = xα + yα for all x, y ∈ F and α ∈ V.
V8. 1α = α for all α ∈ V.

As with fields, we should make the comment that a vector space over F is really a triple (V, (α, β) → α + β, (x, α) → xα) consisting of a nonempty set V together with two functions from V × V to V and F × V to V satisfying axioms V1–V8. There may be many different ways to endow a given set V with the structure of a vector space over F. Nevertheless, we shall drop any reference to addition and scalar multiplication when no confusion can arise and just use the notation V to indicate a given vector space over F.
If V is a vector space over F, then the elements of V will be called vectors and
the elements of F scalars. We assume the reader is familiar with the elementary
arithmetic in V, and, thus, we shall use freely such expressions as −α, α − β, and α1 + ··· + αn when dealing with vectors in V. Let us review some well-known
examples of vector spaces.

Example 1.5: Let N = {1, 2, 3, ...} denote the set of natural numbers. For each n ∈ N, we have the vector space F^n = {(x1, ..., xn) | x_i ∈ F} consisting of all n-tuples of elements from F. Vector addition and scalar multiplication are defined componentwise by (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn) and x(x1, ..., xn) = (xx1, ..., xxn). In particular, when n = 1, we see F itself is a vector space over F. □

If A and B are two sets, let us denote the set of functions from A to B by B^A. Thus, B^A = {f: A → B | f is a function}. In Example 1.5, F^n can be viewed as the set of functions from {1, 2, ..., n} to F. Thus, α = (x1, ..., xn) ∈ F^n is identified with the function from {1, 2, ..., n} to F given by i → x_i for i = 1, ..., n. These remarks suggest the following generalization of Example 1.5.

Example 1.6: Let V be a vector space over F and A an arbitrary set. Then the set V^A consisting of all functions from A to V becomes a vector space over F when we define addition and scalar multiplication pointwise. Thus, if f, g ∈ V^A, then f + g is the function from A to V defined by (f + g)(a) = f(a) + g(a) for all a ∈ A. For x ∈ F and f ∈ V^A, xf is defined by (xf)(a) = x(f(a)). □

If A is a finite set of cardinality n in Example 1.6, then we shall shorten our


notation for the vector space V^A and simply write V^n. In particular, if V = F, then V^A = F^n and we recover the example in 1.5.

Example 1.7: We shall denote the set of m × n matrices with coefficients a_ij ∈ F by M_{m×n}(F). The usual addition of matrices (a_ij) + (b_ij) = (a_ij + b_ij) and scalar multiplication x(a_ij) = (xa_ij) make M_{m×n}(F) a vector space over F. □

Note that our choice of notation implies that F^n and M_{1×n}(F) are the same vector space. Although we now have two different notations for the same vector
space, this redundancy is useful and will cause no confusion in the sequel.

Example 1.8: We shall let F[X] denote the set of all polynomials in an
indeterminate X over F. Thus, a typical element in F[X] is a finite sum of the form a_nX^n + a_{n−1}X^{n−1} + ··· + a_0. Here n ∈ N ∪ {0}, and a_0, ..., a_n ∈ F. The usual notions of adding two polynomials and multiplying a polynomial by a constant, which the reader is familiar with from elementary calculus, make
sense over any field F. These operations give F[X] the structure of a vector
space over F. □

Many interesting examples of vector spaces come from analysis. Here are
some typical examples.

Example 1.9: Let I be an interval (closed, open, or half open) in R. We shall let C(I) denote the set of all continuous, real valued functions on I. If k ∈ N, we shall let C^k(I) denote those f ∈ C(I) that are k-times differentiable on the interior of I. Then C(I) ⊇ C^1(I) ⊇ C^2(I) ⊇ ···. These sets are all vector spaces over R when endowed with the usual pointwise addition (f + g)(x) = f(x) + g(x), x ∈ I, and scalar multiplication (yf)(x) = y(f(x)). □

Example 1.10: Let A = [a1, b1] × ··· × [an, bn] ⊆ R^n be a closed rectangle. We shall let ℛ(A) denote the set of all real valued functions on A that are Riemann integrable. Clearly ℛ(A) is a vector space over R when addition and scalar multiplication are defined as in Example 1.9. □

We conclude our list of examples with a vector space, which we shall study
carefully in Chapter III.

Example 1.11: Consider the following system of linear differential equations:

f_i′ = Σ_{j=1}^n a_ij f_j,   i = 1, ..., n

Here f_1, ..., f_n ∈ C^1(I), where I is some open interval in R, f_i′ denotes the derivative of f_i, and the a_ij are scalars in R. Set A = (a_ij) ∈ M_{n×n}(R); A is called the matrix of the system. If B is any matrix, we shall let B^t denote the transpose of B. Set f = (f_1, ..., f_n). We may think of f as a function from {1, ..., n} to C^1(I), that is, f ∈ C^1(I)^n. With this notation, our system of differential equations becomes f′ = Af. The set of solutions to our system is V = {f ∈ C^1(I)^n | f′ = Af}. Clearly, V is a vector space over R if we define addition and scalar multiplication componentwise as in Example 1.9. □

Now suppose V is a vector space over F. One rich source of vector spaces
associated with V is the set of subspaces of V. Recall the following definition:

Definition 1.12: A nonempty subset W of V is a subspace of V if W is a vector


space under the same vector addition and scalar multiplication as for V.

Thus, a subset W of V is a subspace if W is closed under the operations of V. For example, C([a, b]), C^1([a, b]), R[X], and ℛ([a, b]) are all subspaces of R^{[a,b]}.
If we have a collection 𝒮 = {W_i | i ∈ A} of subspaces of V, then there are some obvious ways of forming new subspaces from 𝒮. We gather these constructions together in the following example:

Example 1.13: Let 𝒮 = {W_i | i ∈ A} be an indexed collection of subspaces of V. In what follows, the indexing set A of 𝒮 can be finite or infinite. Certainly the intersection, ∩_{i∈A} W_i, of the subspaces in 𝒮 is a subspace of V. The set of all finite sums of vectors from ∪_{i∈A} W_i is also a subspace of V. We shall denote this subspace by Σ_{i∈A} W_i. Thus, Σ_{i∈A} W_i = {Σ_{i∈A} α_i | α_i ∈ W_i for all i ∈ A}. Here and throughout the rest of this book, if A is infinite, then the notation Σ_{i∈A} α_i means that all α_i are zero except possibly for finitely many i ∈ A. If A is finite, then without any loss of generality, we can assume A = {1, ..., n} for some n ∈ N. (If A = ∅, then Σ_{i∈A} W_i = (0).) We shall then write Σ_{i∈A} W_i = W1 + ··· + Wn.
If 𝒮 has the property that for every i, j ∈ A there exists a k ∈ A such that W_i ∪ W_j ⊆ W_k, then clearly ∪_{i∈A} W_i is a subspace of V. □

In general, the union of two subspaces of V is not a subspace of V. In fact, if W1 and W2 are subspaces of V, then W1 ∪ W2 is a subspace if and only if W1 ⊆ W2 or W2 ⊆ W1. This fact is easy to prove and is left as an exercise. In our first theorem, we discuss one more important fact about unions.

Theorem 1.14: Let V be a vector space over an infinite field F. Then V cannot be
the union of a finite number of proper subspaces.

Proof: Suppose W1, ..., Wn are proper subspaces of V such that V = W1 ∪ ··· ∪ Wn. We shall show that this equation is impossible. We remind the reader that a subspace W of V is proper if W ≠ V. Thus, V − W ≠ ∅ for a proper subspace W of V.
We may assume without loss of generality that W1 ⊄ W2 ∪ ··· ∪ Wn. Let α ∈ W1 − (W2 ∪ ··· ∪ Wn). Let β ∈ V − W1. Since F is infinite, and neither α nor β is zero, A = {α + xβ | x ∈ F} is an infinite subset of V. Since there are only finitely many subspaces W_i, there exists a j ∈ {1, ..., n} such that A ∩ W_j is infinite.
Suppose j ∈ {2, ..., n}. Then there exist two nonzero scalars x, x′ ∈ F such that x ≠ x′, and α + xβ, α + x′β ∈ W_j. Since W_j is a subspace, (x′ − x)α = x′(α + xβ) − x(α + x′β) ∈ W_j. Since x′ − x ≠ 0, we conclude α ∈ W_j. But this is contrary to our choice of α ∉ W2 ∪ ··· ∪ Wn. Thus, j = 1.
Now if j = 1, then again there exist two nonzero scalars x, x′ ∈ F such that x ≠ x′, and α + xβ, α + x′β ∈ W1. Then (x − x′)β = (α + xβ) − (α + x′β) ∈ W1. Since x − x′ ≠ 0, β ∈ W1. This is impossible since β was chosen in V − W1. We conclude that V cannot be equal to the union of W1, ..., Wn. This completes the proof of Theorem 1.14. □

If F is finite, then Theorem 1.14 is false in general. For example, let V = (F_2)^2. Then V = W1 ∪ W2 ∪ W3, where W1 = {(0, 0), (1, 1)}, W2 = {(0, 0), (0, 1)}, and W3 = {(0, 0), (1, 0)}.

Any subset S of a vector space V determines a subspace L(S) = ∩{W | W a subspace of V, W ⊇ S}. We shall call L(S) the linear span of S. Clearly, L(S) is the smallest subspace of V containing S. Thus, in Example 1.13, for instance, L(∪_{i∈A} W_i) = Σ_{i∈A} W_i.
Let 𝒫(V) denote the set of all subsets of V. If 𝒮(V) denotes the set of all subspaces of V, then 𝒮(V) ⊆ 𝒫(V), and we have a natural function L: 𝒫(V) → 𝒮(V), which sends a subset S ∈ 𝒫(V) to its linear span L(S) ∈ 𝒮(V). Clearly, L is a surjective map whose restriction to 𝒮(V) is the identity. We conclude this section with a list of the more important properties of the function L.

Theorem 1.15: The function L: 𝒫(V) → 𝒮(V) satisfies the following properties:

(a) For S ∈ 𝒫(V), L(S) is the subspace of V consisting of all finite linear combinations of vectors from S. Thus,

L(S) = {Σ_{i=1}^n x_iα_i | x_i ∈ F, α_i ∈ S, n ≥ 0}

(b) If S1 ⊆ S2, then L(S1) ⊆ L(S2).
(c) If α ∈ L(S), then there exists a finite subset S′ ⊆ S such that α ∈ L(S′).
(d) S ⊆ L(S) for all S ∈ 𝒫(V).
(e) For every S ∈ 𝒫(V), L(L(S)) = L(S).
(f) If β ∈ L(S ∪ {α}) and β ∉ L(S), then α ∈ L(S ∪ {β}). Here α, β ∈ V and S ∈ 𝒫(V).

Proof: Properties (a)–(e) follow directly from the definition of the linear span. We prove (f). If β ∈ L(S ∪ {α}) − L(S), then β is a finite linear combination of vectors from S ∪ {α}. Furthermore, α must occur with a nonzero coefficient in any such linear combination; otherwise, β ∈ L(S). Thus, there exist vectors α1, ..., αn ∈ S and nonzero scalars x, x1, ..., xn such that β = xα + x1α1 + ··· + xnαn. Since x ≠ 0, we can write α as a linear combination of β and α1, ..., αn. Namely, α = x^{-1}β − x^{-1}x1α1 − ··· − x^{-1}xnαn. Thus, α ∈ L(S ∪ {β}). □
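For instance, in R^2 take S = {(1, 0)}, α = (0, 1), and β = (1, 1). Then β = (1, 0) + α lies in L(S ∪ {α}) but not in L(S), and solving for α gives α = β − (1, 0) ∈ L(S ∪ {β}), exactly as (f) predicts.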

EXERCISES FOR SECTION 1

(1) Complete the details in Example 1.3 and argue that (F_p, ⊕, ⊙) is a field.
(2) Let R(X) = {f(X)/g(X) | f, g ∈ R[X] and g ≠ 0} denote the set of rational functions on R. Show that R(X) is a field under the usual definitions of addition f/g + h/k = (kf + gh)/gk and multiplication (f/g)(h/k) = fh/gk. R(X) is called the field of rational functions over R. Does F(X) make sense for any field F?

(3) Set F = {a + b√2 | a, b ∈ Q}. Show that F is a subfield of C, that is, F is a field under complex addition and multiplication. Show that {a + b√2 | a, b integers} is not a subfield of C.
(4) Let I be an open interval in R. Let a ∈ I. Let V_a = {f ∈ R^I | f has a derivative at a}. Show that V_a is a subspace of R^I.
(5) The vector space R^N is just the set of all sequences {a_i} = (a1, a2, a3, ...) with a_i ∈ R. What are vector addition and scalar multiplication here?
(6) Show that the following sets are subspaces of R^N:
(a) W1 = {{a_i} ∈ R^N | {a_i} is a bounded sequence}.
(c)
(7) Let (a1, ..., an) ∈ F^n − {0}. Show that {(x1, ..., xn) ∈ F^n | Σ_{i=1}^n a_ix_i = 0} is a proper subspace of F^n.
(8) Identify all subspaces of R^2. Find two subspaces W1 and W2 of R^2 such that W1 ∪ W2 is not a subspace.
(9) Let V be a vector space over F. Suppose W1 and W2 are subspaces of V. Show that W1 ∪ W2 is a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1.
(10) Consider the following subsets of R[X]:
(a) W1 = {f ∈ R[X] | f(0) = 0}.
(b) W2 = {f ∈ R[X] | 2f(0) = …}.
(c) W3 = {f ∈ R[X] | the degree of f ≤ n}.
(d) W4 = {f ∈ R[X] | f(t) = f(1 − t) for all t ∈ R}.
In which of these cases is W_i a subspace of R[X]?

(11) Let K, L, and M be subspaces of a vector space V. Suppose L ⊆ K. Prove Dedekind's modular law: K ∩ (L + M) = L + (K ∩ M).
(12) Let V = R^3. Show that δ = (1, 0, 0) is not in the linear span of α, β, and γ, where α = (1, 1, 1), β = (0, 1, −1), and γ = (1, 0, 2).
(13) If S1 and S2 are subsets of a vector space V, show that L(S1 ∪ S2) = L(S1) + L(S2).
(14) Let S be any subset of R[X] ⊆ R^R. Show that e^x ∉ L(S).

(15) Let α_i = (a_{i1}, a_{i2}) ∈ F^2 for i = 1, 2. Show that F^2 = L({α1, α2}) if and only if the determinant of the 2 × 2 matrix M = (a_ij) is nonzero. Generalize this result to F^n.
(16) Generalize Example 1.8 to n + 1 variables X0, ..., Xn. The resulting vector space over F is called the ring of polynomials in n + 1 variables (over F). It is denoted F[X0, ..., Xn]. Show that this vector space is spanned by all monomials X0^{m0} ··· Xn^{mn} as (m0, ..., mn) ranges over all (n + 1)-tuples of nonnegative integers.

(17) A polynomial f ∈ F[X0, ..., Xn] is said to be homogeneous of degree d if f is a finite linear combination of monomials X0^{m0} ··· Xn^{mn} of degree d (i.e., m0 + ··· + mn = d). Show that the set of homogeneous polynomials of degree d is a subspace of F[X0, ..., Xn]. Show that any polynomial f can be written uniquely as a finite sum of homogeneous polynomials.
(18) Let V = {A ∈ M_{n×n}(F) | A = A^t}. Show that V is a subspace of M_{n×n}(F). V is the subspace of symmetric matrices of M_{n×n}(F).
(19) Let W = {A ∈ M_{n×n}(F) | A^t = −A}. Show that W is a subspace of M_{n×n}(F). W is the subspace of all skew-symmetric matrices in M_{n×n}(F).
(20)
Show that A = B or A ∩ B = ∅.

2. BASES AND DIMENSION

Before proceeding with the main results of this section, let us recall a few facts from set theory. If A is any set, we shall denote the cardinality of A by |A|. Thus, A is a finite set if and only if |A| < ∞. If A is not finite, we shall write |A| = ∞. The only fact from cardinal arithmetic that we shall need in this section is the following:

2.1: Let A and B be sets, and suppose |A| = ∞. If for each x ∈ A we have some finite set B_x ⊆ B, and B = ∪_{x∈A} B_x, then |B| ≤ |A|.

A proof of 2.1 can be found in any standard text in set theory (e.g., [1]), and,
consequently, we omit it.
A relation R on a set A is any subset of the Cartesian product A × A. Suppose R is a relation on a set A. If x, y ∈ A and (x, y) ∈ R, then we shall say x relates to y and write x ≤ y. Thus, x ≤ y ⇔ (x, y) ∈ R. We shall use the notation (A, ≤) to indicate the composite notion of a set A and a relation R ⊆ A × A. This notation is a bit ambiguous since the symbol ≤ has no reference to R in it. However, the use of ≤ will always be clear from the context. In fact, the only relation R we shall systematically exploit in this section is the inclusion relation ⊆ among subsets of V [V some vector space over a field F].
A set A is said to be partially ordered if A has a relation R ⊆ A × A such that (1) x ≤ x for all x ∈ A, (2) if x ≤ y and y ≤ x, then x = y, and (3) if x ≤ y and y ≤ z, then x ≤ z. A typical example of a partially ordered set is 𝒫(V) together with the relation A ≤ B if and only if A ⊆ B. If (A, ≤) is a partially ordered set, and A1 ⊆ A, then we say A1 is totally ordered if for any two elements x, y ∈ A1, we have at least one of the relations x ≤ y or y ≤ x. If (A, ≤) is a partially ordered set, and A1 ⊆ A, then an element x ∈ A is called an upper bound for A1 if y ≤ x for all y ∈ A1. Finally, an element x ∈ (A, ≤) is a maximal element of A if x ≤ y implies x = y.

We say a partially ordered set (A, ≤) is inductive if every totally ordered subset of A has an upper bound in A. The crucial point about inductive sets is the following result, which is called Zorn's lemma:

2.2: If a partially ordered set (A, ≤) is inductive, then a maximal element of A exists.

We shall not give a proof of Zorn's lemma here. The interested reader may
consult [3, p. 33] for more details.
Now suppose V is an arbitrary vector space over a field F. Let S be a subset of V.

Definition 2.3: S is linearly dependent over F if there exists a finite subset {α1, ..., αn} of S and nonzero scalars x1, ..., xn ∈ F such that x1α1 + ··· + xnαn = 0. S is linearly independent (over F) if S is not linearly dependent.

Thus, if S is linearly independent, then whenever x1α1 + ··· + xnαn = 0 with α1, ..., αn distinct vectors of S and x1, ..., xn ∈ F, we must have x1 = ··· = xn = 0. Note that our definition implies the empty set ∅ is linearly independent over F. When considering questions of dependence, we shall drop the words "over F" whenever F is clear from the context. It should be obvious, however, that if more than one field is involved, a given set S could be dependent over one field and independent over another. The following example makes this clear.

Example 2.4: Suppose V = R, the field of real numbers. Let F1 = Q, and F2 = R. Then V is a vector space over both F1 and F2. Let S = {α1 = 1, α2 = √2}. S is a set of two vectors in V. Using the fact that every integer factors uniquely into a product of primes, one sees easily that S is independent over F1. But, clearly S is dependent over F2 since √2·α1 + (−1)·α2 = 0. □

Definition 2.5: A subset S of V is called a basis of V if S is linearly independent


over F and L(S) = V.

If S is a basis of a vector space V, then every nonzero vector β ∈ V can be written uniquely in the form β = x1α1 + ··· + xnαn, where {α1, ..., αn} ⊆ S and x1, ..., xn are nonzero scalars in F. Every vector space has a basis. In fact, any given linearly independent subset S of V can be expanded to a basis.

Theorem 2.6: Let V be a vector space over F, and suppose S is a linearly independent subset of V. Then there exists a basis B of V such that B ⊇ S.

Proof: Let 𝒮 denote the set of all linearly independent subsets of V that contain S. Thus, 𝒮 = {A ∈ 𝒫(V) | A ⊇ S and A is linearly independent over F}. We note that 𝒮 ≠ ∅ since S ∈ 𝒮. We partially order 𝒮 by inclusion. Thus, for A1, A2 ∈ 𝒮, A1 ≤ A2 if and only if A1 ⊆ A2. The fact that (𝒮, ≤) is a partially ordered set is clear.
Suppose 𝒯 = {A_i | i ∈ Λ} is an indexed collection of elements from 𝒮 that form a totally ordered subset of 𝒮. We show 𝒯 has an upper bound. Set A = ∪_{i∈Λ} A_i. Clearly, A ∈ 𝒫(V), S ⊆ A, and A_i ≤ A for all i ∈ Λ. If A fails to be linearly independent, then there exists a finite subset {α1, ..., αn} ⊆ A and nonzero scalars x1, ..., xn ∈ F such that x1α1 + ··· + xnαn = 0. Since 𝒯 is totally ordered, there exists an index i0 ∈ Λ such that {α1, ..., αn} ⊆ A_{i0}. But then A_{i0} is dependent, which is impossible since A_{i0} ∈ 𝒮. Thus, A is an upper bound of 𝒯 in 𝒮.
Since 𝒯 was arbitrary, we can now conclude that (𝒮, ⊆) is an inductive set. Applying 2.2, we see that 𝒮 has a maximal element B. Since B ∈ 𝒮, B ⊇ S and B is linearly independent. We claim that B is in fact a basis of V. To prove this assertion, we need only argue L(B) = V. Suppose L(B) ≠ V. Then there exists a vector α ∈ V − L(B). Since α ∉ L(B), the set B ∪ {α} is clearly linearly independent. But then B ∪ {α} ∈ 𝒮, and B ∪ {α} is strictly larger than B. This is contrary to the maximality of B in 𝒮. Thus, L(B) = V, and B is a basis of V containing S. □

Let us look at a few concrete examples of bases before continuing.

Example 2.7: The empty set ∅ is a basis for the zero subspace (0) of any vector space V. If we regard a field F as a vector space over itself, then any nonzero element x of F forms a basis of F. □

Example 2.8: Suppose V = F^n. For each i = 1, ..., n, let δ_i = (0, ..., 1, ..., 0). Thus, δ_i is the n-tuple whose entries are all zero except for a 1 in the ith position. Set δ = {δ1, ..., δn}. Since (x1, ..., xn) = x1δ1 + ··· + xnδn, we see δ is a basis of F^n. We shall call δ the canonical (standard) basis of F^n. □

Example 2.9: Let V = M_{m×n}(F). For any i = 1, ..., m, and j = 1, ..., n, let E_ij denote the m × n matrix whose entries are all zero except for a 1 in the (i, j)th position. Since (a_ij) = Σ_{i,j} a_ij E_ij, we see B = {E_ij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for V. The elements in B are called the matrix units of M_{m×n}(F). □

Example 2.10: Let V = F[X]. Let B denote the set of all monic monomials in X. Thus, B = {1 = X^0, X, X^2, ...}. Clearly, B is a basis of F[X]. □

A specific basis for the vector space C^k(I) in Example 1.9 is hard to write down. However, since R[X] ⊆ C^k(I), Theorem 2.6 guarantees that one basis of C^k(I) contains the monomials 1, X, X^2, ....
Theorem 2.6 says that any linearly independent subset of V can be expanded to a basis of V. There is a companion result, which we shall need in Section 3. Namely, if some subset S of V spans V, then S contains a basis of V.

Theorem 2.11: Let V be a vector space over F, and suppose V = L(S). Then S
contains a basis of V.

Proof: If S = ∅ or {0}, then V = (0). In this case, ∅ is a basis of V contained in S. So, we can assume S contains a nonzero vector α. Let 𝒮 = {A ⊆ S | A linearly independent over F}. Clearly, {α} ∈ 𝒮. Partially order 𝒮 by inclusion. If 𝒯 = {A_i | i ∈ Λ} is a totally ordered subset of 𝒮, then ∪_{i∈Λ} A_i is an upper bound for 𝒯 in 𝒮. Thus, (𝒮, ⊆) is inductive. Applying 2.2, we see that 𝒮 has a maximal element B.
We claim B is a basis for V. Since B ∈ 𝒮, B ⊆ S and B is linearly independent over F. If L(B) = V, then B is a basis of V, and the proof is complete. Suppose L(B) ≠ V. Then S ⊄ L(B), for otherwise V = L(S) ⊆ L(L(B)) = L(B). Hence there exists a vector β ∈ S − L(B). Clearly, B ∪ {β} is linearly independent over F. Thus, B ∪ {β} ∈ 𝒮. But β ∉ L(B) implies β ∉ B. Hence, B ∪ {β} is strictly larger than B in 𝒮. Since B is maximal, this is a contradiction. Therefore, L(B) = V and our proof is complete. □

A given vector space V has many different bases. For example, {δ1, ..., δn} and {−δ1, ..., −δn} are two different bases of F^n provided 1 ≠ −1 in F. What all bases of V have in common is their cardinality. We prove this fact in our next theorem.

Theorem 2.12: Let V be a vector space over F, and suppose B1 and B2 are two bases of V. Then |B1| = |B2|.

Proof: We divide this proof into two cases.

CASE 1: Suppose V has a basis B that is finite.

In this case, we shall argue |B1| = |B| = |B2|. Suppose B = {α1, ..., αn}. It clearly suffices to show |B1| = n. We suppose |B1| ≠ n and derive a contradiction. There are two possibilities to consider here. Either |B1| = m < n or |B1| > n. Let us first suppose B1 = {β1, ..., βm} with m < n. Since β1 ∈ L(B), β1 = x1α1 + ··· + xnαn. At least one x_i here is nonzero since β1 ≠ 0. Relabeling the α_i if need be, we can assume x1 ≠ 0. Since B is linearly independent over F, we conclude that β1 ∈ L({α2, ..., αn} ∪ {α1}) − L({α2, ..., αn}). It now follows from Theorem 1.15(f) that α1 ∈ L({β1, α2, ..., αn}). Since {α2, ..., αn} is linearly independent over F and β1 ∉ L({α2, ..., αn}), we see that {β1, α2, ..., αn} is linearly independent over F. Since α1 ∈ L({β1, α2, ..., αn}), V = L({β1, α2, ..., αn}). Thus, {β1, α2, ..., αn} is a basis of V.
Now we can repeat this argument m times. We get after possibly some permutation of the α_i that {β1, ..., βm, α_{m+1}, ..., αn} is a basis of V. But {β1, ..., βm} is already a basis of V. Thus, α_{m+1} ∈ L({β1, ..., βm}). This implies {β1, ..., βm, α_{m+1}, ..., αn} is linearly dependent, which is a contradiction. Thus, |B1| cannot be less than n.
Now suppose |B1| > n (|B1| could be infinite here). By an argument similar to that given above, we can exchange n vectors of B1 with α1, ..., αn. Thus, we construct a basis of V of the form B ∪ S, where S is some nonempty subset of B1. But B is already a basis of V. Since S ≠ ∅, B ∪ S must then be linearly dependent. This is impossible. Thus, if V has a basis consisting of n vectors, then any basis of V has cardinality n, and the proof of the theorem is complete in Case 1.

CASE 2: Suppose no basis of V is finite.

In this case, both B1 and B2 are infinite sets. Let α ∈ B1. Since B2 is a basis of V, there exists a unique, finite subset Δ_α ⊆ B2 such that α ∈ L(Δ_α) and α ∉ L(Δ′) for any proper subset Δ′ of Δ_α. Thus, we have a well-defined function p: B1 → 𝒫(B2) given by p(α) = Δ_α. Since B1 is infinite, we may apply 2.1 and conclude that |∪_{α∈B1} Δ_α| ≤ |B1|. Since α ∈ L(Δ_α) for all α ∈ B1, V = L(B1) ⊆ L(∪_{α∈B1} Δ_α). Thus ∪_{α∈B1} Δ_α is a subset of B2 that spans all of V. Since B2 is a basis of V, we conclude ∪_{α∈B1} Δ_α = B2. In particular, |B2| ≤ |B1|. Reversing the roles of B1 and B2 gives |B1| ≤ |B2|. This completes the proof of Theorem 2.12. □
We shall call the common cardinality of any basis of V the dimension of V. We shall write dim V for the dimension of V. If we want to stress what field we are over, then we shall use the notation dim_F V for the dimension of the F-vector space V. Thus, dim V = |B|, where B is any basis of V when the base field F is understood.
Let us check the dimensions of some of our previous examples. In Example 2.4, dim_{F2} V = 1, and dim_{F1} V = the cardinality of R. In Example 2.7, dim_F(0) = 0. In Example 2.8, dim F^n = n. In Example 2.9, dim M_{m×n}(F) = mn. In Example 2.10, dim V = the cardinality of N.
If the dimension of a vector space V is infinite, as in Examples 2.4 and 2.10, we shall usually make no attempt to distinguish which cardinal number gives dim V. Instead, we shall merely write dim V = ∞. If V has a finite basis {α1, ..., αn}, we shall call V a finite-dimensional vector space and write dim V < ∞, or, more precisely, dim V = n < ∞. Thus, for example, dim_R R[X] = ∞, whereas dim_R R^n = n < ∞. In our next theorem, we gather together some of the more elementary facts about dim V.

Theorem 2.13: Let V be a vector space over F.

(a) If W is a subspace of V, then dim W ≤ dim V.
(b) If V is finite dimensional and W is a subspace of V such that dim W = dim V, then W = V.
(c) If W is a subspace of V, then there exists a subspace W′ of V such that W + W′ = V and W ∩ W′ = (0).
(d) If V is finite dimensional and W1 and W2 are subspaces of V, then dim(W1 + W2) + dim(W1 ∩ W2) = dim W1 + dim W2.

Proof: It follows from Theorem 2.6 that any basis of a subspace W of V can be enlarged to a basis of V. This immediately proves (a) and (b). Suppose W is a subspace of V. Let B be a basis of W. By Theorem 2.6, there exists a basis C of V such that C ⊇ B. Let W′ = L(C − B). Since C = B ∪ (C − B), V = L(C) = L(B) + L(C − B) = W + W′. Since C is linearly independent and B ∩ (C − B) = ∅, L(B) ∩ L(C − B) = (0). Thus, W ∩ W′ = (0), and the proof of (c) is complete.
To prove (d), let B0 = {α1, ..., αn} be a basis of W1 ∩ W2. If W1 ∩ W2 = (0), then we take B0 to be the empty set ∅. We can enlarge B0 to a basis B1 = {α1, ..., αn, β1, ..., βm} of W1. We can also enlarge B0 to a basis B2 = {α1, ..., αn, γ1, ..., γp} of W2. Thus, dim W1 ∩ W2 = n, dim W1 = n + m, and dim W2 = n + p. We claim that B = {α1, ..., αn, β1, ..., βm, γ1, ..., γp} is a basis of W1 + W2. Clearly L(B) = W1 + W2. We need only argue B is linearly independent. Suppose Σ x_iα_i + Σ y_jβ_j + Σ z_kγ_k = 0 for some x_i, y_j, z_k ∈ F. Then Σ z_kγ_k ∈ W1 ∩ W2 = L({α1, ..., αn}). Thus, Σ z_kγ_k = Σ w_iα_i for some w_i ∈ F. Since B2 is a basis of W2, we conclude that z1 = ··· = z_p = 0. Since B1 is a basis of W1, x1 = ··· = xn = y1 = ··· = ym = 0. In particular, B is linearly independent. Thus, dim(W1 + W2) = |B| = n + m + p, and the proof of (d) follows. □
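For a concrete check of part (d), take V = R^3, W1 = L({δ1, δ2}) (the xy-plane), and W2 = L({δ1, δ3}) (the xz-plane). Then W1 + W2 = R^3 and W1 ∩ W2 = L({δ1}), so dim(W1 + W2) + dim(W1 ∩ W2) = 3 + 1 = 2 + 2 = dim W1 + dim W2.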
A few comments about Theorem 2.13 are in order here. Part (d) is true whether V is finite dimensional or not. The proof is the same as that given above when dim(W1 + W2) < ∞. If dim(W1 + W2) = ∞, then either W1 or W2 is an infinite-dimensional subspace with the same dimension as W1 + W2. Thus, the result is still true but rather uninteresting.
If V is not finite dimensional, then (b) is false in general. A simple example
illustrates this point.

Example 2.14: Let V = F[X], and let W be the subspace of V consisting of all even polynomials. Thus, W = {Σ_i a_iX^{2i} | a_i ∈ F}. A basis of W is clearly all even powers of X. Thus, dim V = dim W, but W ≠ V. □

The subspace W′ of V constructed in part (c) of Theorem 2.13 is called a complement of W. Note that W′ is not in general unique. For example, if V = F^2 and W = L((1, 0)), then any subspace of the form L((a, b)) with b ≠ 0 is a complement of W.
Finally, part (d) of Theorem 2.13 has a simple extension to finitely many
subspaces W1,..., Wk of V. We record this extension as a corollary.

Corollary 2.15: Let V be a finite-dimensional vector space of dimension n. Suppose W1, ..., Wk are subspaces of V. For each i = 1, ..., k, set d_i = n − dim W_i. Then

(a) dim(W1 ∩ ··· ∩ Wk) ≥ n − (d_1 + ··· + d_k).
(b) …
(c) dim(W1 ∩ ··· ∩ Wk) = n − (d_1 + ··· + d_k) if and only if for all i = 1, ..., k, (W1 ∩ ··· ∩ W_{i−1}) + W_i = V.

Proof: Part (a) follows from Theorem 2.13(d) by induction. Parts (b) and (c) are easy consequences of (a). We leave the technical details for an exercise at the end of this section. □

Before closing this section, let us develop some useful notation concerning bases. Suppose V is a finite-dimensional vector space over F. If α = {α1, ..., αn} is a basis of V, then we have a natural function [·]_α: V → M_{n×1}(F) defined as follows.

Definition 2.16: If α = {α1, ..., αn} is a basis of V, then [β]_α = (x1, ..., xn)^t if and only if Σ_{i=1}^n x_iα_i = β.

Since α is a basis of V, the representation of a given vector β as a linear combination of α1, ..., αn is unique. Thus, Definition 2.16 is unambiguous. The function [·]_α: V → M_{n×1}(F) is clearly bijective and preserves vector addition and scalar multiplication. Consequently, [xβ + yδ]_α = x[β]_α + y[δ]_α for all x, y ∈ F and β, δ ∈ V. The column vector [β]_α is often called the α-skeleton of β.
Suppose α = {α1, ..., αn} and δ = {δ1, ..., δn} are two bases of V. Then there is a simple relationship between the α and δ skeletons of a given vector β. Let M(δ, α) denote the n × n matrix whose columns are defined by the following equation:

2.17: M(δ, α) = ([δ1]_α | ··· | [δn]_α)

In equation 2.17, the ith column of M(δ, α) is the n × 1 matrix [δ_i]_α. Multiplication by M(δ, α) induces a map from M_{n×1}(F) to M_{n×1}(F) that connects the δ and α skeletons. Namely:

Theorem 2.18: M(δ, α)[β]_δ = [β]_α for all β ∈ V.

Proof: Let us denote the ith column of any matrix M by Col_i(M). Then for each i = 1, ..., n, we have [δ_i]_δ = (0, ..., 1, ..., 0)^t. Thus, M(δ, α)[δ_i]_δ = Col_i(M(δ, α)) = [δ_i]_α. Thus, the theorem is correct for β ∈ δ. Now we have already noted that [·]_δ and [·]_α preserve vector addition and scalar multiplication. So does multiplication by M(δ, α) as a map on M_{n×1}(F). Since any β ∈ V is a linear combination of the vectors in δ, we conclude that M(δ, α)[β]_δ = [β]_α for every β ∈ V. □

The matrix M(δ, α) defined in 2.17 is called the change of basis matrix (between δ and α). It is often convenient to think of Theorem 2.18 in terms of the following commutative diagram:

2.19: [diagram: the coordinate maps [·]_δ and [·]_α from V to M_{n×1}(F), joined by multiplication by M(δ, α) on M_{n×1}(F)]

By a diagram, we shall mean a collection of vector spaces and maps (represented


by arrows) between these spaces. A diagram is said to be commutative if any two
sequences of maps (i.e., composites of functions in the diagram) that originate at
the same space and end at the same space are equal. Thus, 2.19 is commutative if
and only if the two paths from V to M_{n×1}(F), clockwise and counterclockwise,
are the same maps. This is precisely what Theorem 2.18 says.
Most of the maps or functions that we shall encounter in the diagrams in this
book will be linear transformations. We take up the formal study of linear
transformations in Section 3.
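The change of basis matrix of 2.17 and the identity of Theorem 2.18 are easy to check numerically. The following minimal numpy sketch uses two bases of R^2 chosen purely for illustration.

import numpy as np

# Two bases of R^2: alpha (the standard basis) and delta = {(1, 1), (1, -1)}.
alpha = np.array([[1.0, 0.0], [0.0, 1.0]]).T   # columns are the alpha basis vectors
delta = np.array([[1.0, 1.0], [1.0, -1.0]]).T  # columns are the delta basis vectors

# Column i of M(delta, alpha) is [delta_i]_alpha, the alpha-skeleton of delta_i.
M = np.linalg.solve(alpha, delta)

beta = np.array([3.0, 1.0])                    # an arbitrary vector of V = R^2
beta_delta = np.linalg.solve(delta, beta)      # [beta]_delta
beta_alpha = np.linalg.solve(alpha, beta)      # [beta]_alpha

# Theorem 2.18: M(delta, alpha)[beta]_delta = [beta]_alpha
assert np.allclose(M @ beta_delta, beta_alpha)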

EXERCISES FOR SECTION 2

(1) Let V_n = {f ∈ F[X] | deg(f) ≤ n}. Show that each V_n is a finite-dimensional subspace of F[X] of dimension n + 1. Since F[X] = ∪_{n∈N} V_n, observe that Theorem 1.14 is false when the word "finite" is taken out of the theorem.
(2) Let V_5 be as in Exercise 1 with F = F_2. Find a basis of V_5 containing 1 + X and X^2 + X + 1.
(3) Show that any set of nonzero polynomials in F[X], no two of which have
the same degree, is linearly independent over F.
(4) Let V = {(a1, a2, ...) ∈ R^N | a_i = 0 for all i sufficiently large}. Show that V is an infinite-dimensional subspace of R^N. Find a basis for V.
(5) Prove Theorem 2.13(d) when dim(W1 + W2) = ∞.

(6) Prove Corollary 2.15.

(7) Find the dimension of the subspace V = L({α, β, γ, δ}) ⊆ R^4, where α = (1, 2, 1, 0), β = (−1, 1, −4, 3), γ = (2, 3, 3, −1), and δ = (0, 1, −1, 1).
(8) Compute the following dimensions:
(a) dim_R(C).
(b) dim_Q(R).
(c) dim_Q(F), where F is the field given in Exercise 3 of Section 1.

(9) Suppose V is an n-dimensional vector space over the finite field F_p. Argue that V is a finite set and find |V|.
(10) Suppose V is a vector space over a field F for which |V| > 2. Show that V has more than one basis.
(11) Let F be a subfield of the field F'. This means that the operations of
addition and multiplication on F' when restricted to F make F a field.
(a) Show that F' is a vector space over F.
(b) Suppose dimF(F') = n. Let V be an m-dimensional vector space over F'.
Show that V is an mn-dimensional vector space over F.
(12) Show that dim(V") = n dim(V).
(13) Return to the space V_n in Exercise 1. Let p_1, ..., p_r ∈ V_n, and write p_i = Σ_{j=0}^n a_{ij}X^j for i = 1, ..., r. Let A ∈ M_{(n+1)×r}(F) be the matrix whose ith column is (a_{i0}, ..., a_{in})^t. Show that the dimension of L({p_1, ..., p_r}) is precisely the rank of A.
(14) Show that the dimension of the subspace of homogeneous polynomials of degree d in F[X0, ..., Xn] is the binomial coefficient C(n + d, d).
(15) Find the dimensions of the vector spaces in Exercises 18 and 19 of Section 1.
(16) Let A ∈ M_{m×n}(F). Set CS(A) = {AX | X ∈ M_{n×1}(F)}. CS(A) is called the column space of A. Set NS(A) = {X ∈ M_{n×1}(F) | AX = 0}. NS(A) is called the null space of A. Show that CS(A) is a subspace of M_{m×1}(F), and NS(A) is a subspace of M_{n×1}(F). Show that dim(CS(A)) + dim(NS(A)) = n.
(17) With the same notation as in Exercise 16, show that the linear system AX = B has a solution if and only if dim(CS(A)) = dim(CS(A | B)). Here B ∈ M_{m×1}(F), and (A | B) is the m × (n + 1) augmented matrix obtained from A by adjoining the column B.
(18) Suppose V and W are two vector spaces over a field F such that |V| = |W|. Is dim V = dim W?
(19) Consider the set W of 2 x 2 matrices of the form

(x —x
z

and the set Y of 2 x 2 matrices of the form

(xy
z

Show that W and Y are subspaces of M2 2(F) and compute the numbers
dim(W), dim(Y), dim(W + Y), and dim(W n Y).

3. LINEAR TRANSFORMATIONS

Let V and W be vector spaces over a field F.

Definition 3.1: A function T: V → W is called a linear transformation (linear map, homomorphism) if T(xα + yβ) = xT(α) + yT(β) for all x, y ∈ F and α, β ∈ V.

Before we state any general theorems about linear transformations, let us


consider a few examples.

Example 3.2: The map that sends every vector in V to 0 ∈ W is clearly a linear map. We shall call this map the zero map and denote it by 0. If T: V → W and S: W → Z are linear transformations, then clearly the composite map ST: V → Z is a linear transformation. □

Example 3.3: If V is finite dimensional with basis α = {α1, ..., αn}, then [·]_α: V → M_{n×1}(F) is a linear transformation that is bijective. □

Example 3.4: Taking the transpose, A → A^t, is clearly a linear map from M_{m×n}(F) to M_{n×m}(F). □

Example 3.5: Let A ∈ M_{n×n}(F), and set V = M_{n×1}(F). Then multiplication by A (necessarily on the left) induces a linear transformation T_A: V → V given by T_A(B) = AB for all B ∈ V. □

Examples 3.3 and 3.5 show that the commutative diagram in 2.19 consists of linear transformations.

Example 3.6: Suppose V = C^k(I) with k ≥ 2. Then ordinary differentiation f → f′ is a linear transformation from C^k(I) to C^{k−1}(I). □

Example 3.7: Suppose V = F[X]. We can formally define a derivative f → f′ on V as follows: If f(X) = Σ_{i=0}^n a_iX^i, then f′(X) = Σ_{i=1}^n ia_iX^{i−1}. The reader can easily check that this map, which is called the canonical derivative on F[X], is a linear transformation. □
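A minimal Python sketch of the canonical derivative, with a polynomial stored as its list of coefficients (the helper names are only for this illustration):

# a_0 + a_1 X + ... + a_n X^n is stored as the coefficient list [a_0, a_1, ..., a_n]
def derivative(f):
    # sends sum a_i X^i to sum i * a_i X^(i-1)
    return [i * a for i, a in enumerate(f)][1:]

def add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

f = [1, 0, 3]      # 1 + 3X^2
g = [0, 2, 0, 5]   # 2X + 5X^3
# Linearity: (f + g)' = f' + g'
assert derivative(add(f, g)) == add(derivative(f), derivative(g))
print(derivative(f))  # [0, 6], that is, 6X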
Example 3.8: Suppose V = ℛ(A) as in Example 1.10. Then T(f) = ∫_A f is a linear transformation from V to R. □

We shall encounter many more examples of linear transformations as we


proceed. At this point, let us introduce a name for the collection of all linear
transformations from V to W.

Definition 3.9: Let V and W be vector spaces over F. The set of all linear
transformations from V to W will be denoted by HomF(V, W).

When the base field F is clear from the context, we shall often write Hom(V, W) instead of Hom_F(V, W). Thus, Hom(V, W) is the subset of the vector space W^V (Example 1.6) consisting of all linear transformations from V to W. If T, S ∈ Hom(V, W) and x, y ∈ F, then the function xT + yS ∈ W^V is in fact a linear transformation. For if a, b ∈ F and α, β ∈ V, then (xT + yS)(aα + bβ) = xT(aα + bβ) + yS(aα + bβ) = xaT(α) + xbT(β) + yaS(α) + ybS(β) = a(xT(α) + yS(α)) + b(xT(β) + yS(β)) = a(xT + yS)(α) + b(xT + yS)(β). Therefore, xT + yS ∈ Hom(V, W). We have proved the following theorem:

Theorem 3.10: Hom(V, W) is a subspace of W^V. □

Since any T ∈ Hom(V, W) has the property that T(0) = 0, we see that Hom(V, W) is always a proper subspace of W^V whenever W ≠ (0).
At this point, it is convenient to introduce the following terminology.

Definition 3.11: Let T ∈ Hom(V, W). Then:

(a) ker T = {α ∈ V | T(α) = 0}.
(b) Im T = {T(α) ∈ W | α ∈ V}.
(c) T is injective (monomorphism, 1-1) if ker T = (0).
(d) T is surjective (epimorphism, onto) if Im T = W.
(e) T is bijective (isomorphism) if T is both injective and surjective.
(f) We say V and W are isomorphic and write V ≅ W if there exists an isomorphism T ∈ Hom(V, W).

The set ker T is called the kernel of T and is clearly a subspace of V. Im T is


called the image of T and is a subspace of W. Before proceeding further, let us
give a couple of important examples of isomorphisms between vector spaces.

Example 3.12: M_{n×1}(F) ≅ M_{1×n}(F) via the transpose A → A^t. We have already mentioned that F^n = M_{1×n}(F). Thus, all three of the vector spaces M_{n×1}(F), M_{1×n}(F), and F^n are isomorphic to each other. □

Example 3.13: Suppose V is a finite-dimensional vector space over F. Then every basis α = {α1, ..., αn} of V determines a linear transformation T(α): V → F^n given by T(α)(β) = (x1, ..., xn) if and only if Σ_{i=1}^n x_iα_i = β. T(α) is just the composite of the coordinate map [·]_α: V → M_{n×1}(F) and the transpose M_{n×1}(F) → M_{1×n}(F) = F^n. Since both of these maps are isomorphisms, we see T(α) is an isomorphism. □
It is often notationally convenient to switch back and forth from column
vectors to row vectors. For this reason, we give a formal name to the
isomorphism introduced in Example 3.13.

Definition 3.14: Let V be a finite-dimensional vector space over F. If α is a basis of V, then (·)_α: V → F^n is the linear transformation defined by (β)_α = ([β]_α)^t for all β ∈ V.

Thus, (β)_α = T(α)(β) for all β ∈ V. We can now state the following theorem, whose proof is given in Example 3.13:

Theorem 3.15: Let V be a finite-dimensional vector space over F and suppose dim V = n. Then every basis α of V determines an isomorphism (·)_α: V → F^n. □

We now have two isomorphisms [·]_α: V → M_{n×1}(F) and (·)_α: V → F^n for every choice of basis α of a (finite-dimensional) vector space V. We shall be careful to distinguish between these two maps although they only differ by an isomorphism from M_{n×1}(F) to M_{1×n}(F). Notationally, F^n is easier to write than M_{n×1}(F), and so most of our subsequent theorems will be written using the map (·)_α. With this in mind, let us reinterpret the commutative diagram given in 2.19.
If A is any n × n matrix with coefficients in F, then A induces a linear transformation S_A: F^n → F^n given by the following equation:

3.16: S_A((x1, ..., xn)) = (A(x1, ..., xn)^t)^t = (x1, ..., xn)A^t

Using the notation in Example 3.5, we see S_A is the linear transformation that makes the following diagram commutative:

3.17: [diagram: T_A: M_{n×1}(F) → M_{n×1}(F) on top, S_A: F^n → F^n on the bottom, with the transpose maps as the vertical isomorphisms]

The vertical arrows in 3.17 are isomorphisms. Clearly, T_A is an isomorphism if and only if A is invertible. Thus, S_A is an isomorphism if and only if A is invertible.
We shall replace the notation S_A (or T_A) with A^t (or A) and simply write

F^n --A^t--> F^n   or   M_{n×1}(F) --A--> M_{n×1}(F)

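The relation between T_A (acting on columns) and S_A (acting on rows) in 3.16 and 3.17 can be checked directly; a minimal numpy sketch:

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])
x = np.array([3.0, 4.0])                         # a row vector in F^n

def S_A(v):
    return (A @ v.reshape(-1, 1)).ravel()        # S_A(x) = (A x^t)^t = x A^t

def T_A(B):
    return A @ B                                 # T_A(B) = AB on column vectors B

assert np.allclose(S_A(x), x @ A.T)              # the identity in 3.16
assert np.allclose(S_A(x), T_A(x.reshape(-1, 1)).ravel())  # diagram 3.17 commutes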
Now suppose α and δ are two bases of a finite-dimensional vector space V of dimension n. If we combine diagrams 2.19 and 3.17, we have the following

commutative diagram:

3.18: [diagram: the maps (·)_δ, (·)_α: V → F^n, joined by multiplication by M(δ, α)^t on F^n]

Since (β)_δ = ([β]_δ)^t and (β)_α = ([β]_α)^t, we get the following corollary to Theorem 3.15:

Corollary 3.19: Suppose V is a finite-dimensional vector space of dimension n over F. If α and δ are two bases of V, then

3.20: [diagram: (·)_δ, (·)_α: V → F^n together with M(δ, α)^t: F^n → F^n]

is a commutative diagram of isomorphisms.


Proof: We have already noted from 3.18 that 3.20 is commutative. Both (·)_δ and (·)_α are isomorphisms from Theorem 3.15. We need only argue M(δ, α) is an invertible matrix. Then the map M(δ, α)^t = S_{M(δ, α)}: F^n → F^n is an isomorphism. Now change of basis matrices M(δ, α) are always invertible. This follows from Theorem 2.18. For any β ∈ V, we have M(α, δ)M(δ, α)[β]_δ = M(α, δ)[β]_α = [β]_δ. This equation easily implies M(α, δ)M(δ, α) = I_n, the n × n identity matrix. □

In our next theorem, we shall need an isomorphic description of the vector space V^n introduced in Example 1.6.

Example 3.21: In this example, we construct a vector space isomorphic to V^n. Let V be a vector space over F, and let n ∈ N. Consider the Cartesian product V × ··· × V (n times) = {(α1, ..., αn) | α_i ∈ V}. Clearly, V × ··· × V is a vector space over F when we define vector addition and scalar multiplication componentwise by (α1, ..., αn) + (β1, ..., βn) = (α1 + β1, ..., αn + βn) and x(α1, ..., αn) = (xα1, ..., xαn).
Suppose A is any finite set with |A| = n. We can without any loss of generality assume A = {1, ..., n}. Then V^A is just the set of functions from {1, ..., n} to V. There is a natural isomorphism T: V × ··· × V → V^A given by T((α1, ..., αn)) = f, where f(i) = α_i for all i = 1, ..., n. The fact that T is an isomorphism is an easy exercise, which we leave to the reader. □

Henceforth, we shall identify the vector spaces V × ··· × V (n times), V^n, and V^A with |A| = n, and write just V^n to represent any one of these spaces. Using this notation, we have the following theorem:

Theorem 3.22: Let V and W be vector spaces over F, and suppose V is finite dimensional. Let dim V = n.

(a) If α = {α1, ..., αn} is a basis of V, then for every (β1, ..., βn) ∈ W^n, there exists a unique T ∈ Hom(V, W) such that T(α_i) = β_i for i = 1, ..., n.
(b) Every basis α of V determines an isomorphism Ψ(α): Hom(V, W) → W^n.

Proof: (a) Let α = {α1, ..., αn} be a basis for V. Then (·)_α ∈ Hom(V, F^n) is an isomorphism. Let β = (β1, ..., βn) ∈ W^n. The n-tuple β determines a linear transformation L_β ∈ Hom(F^n, W) given by L_β((x1, ..., xn)) = Σ_{i=1}^n x_iβ_i. The fact that L_β is a linear transformation is obvious. Set T = L_β ∘ (·)_α. Then T ∈ Hom(V, W) and T(α_i) = β_i for all i = 1, ..., n. The fact that T is the only linear transformation from V to W for which T(α_i) = β_i for i = 1, ..., n is an easy exercise left to the reader.
(b) Fix a basis α = {α1, ..., αn} of V. Define Ψ(α): Hom(V, W) → W^n by Ψ(α)(T) = (T(α1), ..., T(αn)). The fact that Ψ(α) is a linear transformation is obvious. We can define an inverse map W^n → Hom(V, W) by sending β = (β1, ..., βn) to L_β ∘ (·)_α. Hence, Ψ(α) is an isomorphism. □

Theorem 3.22(a) implies that a given linear transformation T ∈ Hom(V, W) is completely determined by its values on a basis α of V. This remark is true whether V is finite dimensional or infinite dimensional. To define a linear transformation T from V to W, we need only define T on some basis B = {α_i | i ∈ A} of V and then extend the definition of T linearly to all of L(B) = V. Thus, if T(α_i) = β_i for all i ∈ A, then T(Σ_{i∈A} x_iα_i) is defined to be Σ_{i∈A} x_iβ_i. These remarks provide a proof of the following generalization of Theorem 3.22(a):

3.23: Let V and W be vector spaces over F and suppose B = {α_i | i ∈ A} is a basis of V. If {β_i | i ∈ A} is any subset of W, then there exists a unique T ∈ Hom(V, W) such that T(α_i) = β_i for all i ∈ A. □

Now suppose V and W are both finite-dimensional vector spaces over F. Let dim V = n and dim W = m. If α = {α1, ..., αn} is a basis of V and β = {β1, ..., βm} is a basis of W, then the pair (α, β) determines a linear transformation Γ(α, β): Hom(V, W) → M_{m×n}(F) defined by the following equation:

3.24: Γ(α, β)(T) = ([T(α1)]_β | ··· | [T(αn)]_β)


In equation 3.24, T ∈ Hom(V, W), and Γ(α, β)(T) is the m × n matrix whose ith column is the m × 1 matrix [T(α_i)]_β. If T1, T2 ∈ Hom(V, W) and x, y ∈ F, then

Γ(α, β)(xT1 + yT2) = ([(xT1 + yT2)(α1)]_β | ··· | [(xT1 + yT2)(αn)]_β)
                   = (x[T1(α1)]_β + y[T2(α1)]_β | ··· | x[T1(αn)]_β + y[T2(αn)]_β)
                   = x([T1(α1)]_β | ··· | [T1(αn)]_β) + y([T2(α1)]_β | ··· | [T2(αn)]_β)
                   = xΓ(α, β)(T1) + yΓ(α, β)(T2)

Thus Γ(α, β) is indeed a linear transformation from Hom(V, W) to M_{m×n}(F).


Suppose T ∈ ker Γ(α, β). Then Γ(α, β)(T) = 0. In particular, [T(α_i)]_β = 0 for all i = 1, ..., n. But then 0 = ([T(α_i)]_β)^t = (T(α_i))_β, and Theorem 3.15 implies T(α_i) = 0. Thus, T = 0, and we conclude that Γ(α, β) is an injective linear transformation.
Γ(α, β) is surjective as well. To see this, let A = (a_ij) ∈ M_{m×n}(F). Let γ_i = Σ_{j=1}^m a_{ji}β_j for i = 1, ..., n. Then γ_i ∈ W and [γ_i]_β = (a_{1i}, ..., a_{mi})^t = Col_i(A) for all i = 1, ..., n. It follows from Theorem 3.22 that there exists a (necessarily unique) T ∈ Hom(V, W) such that T(α_i) = γ_i for i = 1, ..., n. Thus, Γ(α, β)(T) = A and Γ(α, β) is surjective. We have now proved the first statement in the following theorem:

Theorem 3.25: Let V and W be finite-dimensional vector spaces over F of dimensions n and m, respectively. Let α be a basis of V and β a basis of W. Then the map Γ(α, β): Hom(V, W) → M_{m×n}(F) defined by equation 3.24 is an isomorphism. For every T ∈ Hom(V, W), the following diagram is commutative:

3.26: [diagram: T: V → W on top, Γ(α, β)(T)^t: F^n → F^m on the bottom, with (·)_α and (·)_β as the vertical maps]

Proof: We need only argue that the diagram in 3.26 is commutative. Using the same notation as in 3.17, we have the following diagram:

3.27: [diagram: T: V → W on top; [·]_α and [·]_β down to M_{n×1}(F) and M_{m×1}(F), joined by Γ(α, β)(T); then the transpose maps down to F^n and F^m, joined by Γ(α, β)(T)^t]

Since all the maps in 3.27 are linear and the bottom square commutes, we need only check [T(·)]_β = Γ(α, β)(T)[·]_α on a basis of V. Then the top square of 3.27 is commutative, and the commutativity of 3.26 follows. For any α_i ∈ α, we have [T(α_i)]_β = Col_i(Γ(α, β)(T)) = Γ(α, β)(T)(0, ..., 1, ..., 0)^t = Γ(α, β)(T)[α_i]_α. □

Γ(α, β)(T) is called the matrix representation of the linear transformation T with respect to the bases α and β. Since the vertical arrows in 3.26 and Γ(α, β) are isomorphisms, V, W, Hom(V, W), and T are often identified with F^n, F^m, M_{m×n}(F), and A = Γ(α, β)(T). Thus, the distinction between a linear transformation and a matrix is often blurred in the literature.
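For a concrete instance, take T to be the canonical derivative of Example 3.7, viewed as a map from the space V_2 of Section 2, Exercise 1, into V_1, with the monomial bases α = {1, X, X^2} and β = {1, X}. A minimal numpy sketch of the resulting matrix representation:

import numpy as np

# T = d/dX : L({1, X, X^2}) -> L({1, X}).  Column i of Gamma is [T(alpha_i)]_beta:
# T(1) = 0, T(X) = 1, T(X^2) = 2X, so
Gamma = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 2.0]])

f_alpha = np.array([5.0, 3.0, 4.0])    # [f]_alpha for f = 5 + 3X + 4X^2
Tf_beta = Gamma @ f_alpha              # should equal [f']_beta, with f' = 3 + 8X
assert np.allclose(Tf_beta, [3.0, 8.0])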
The matrix representation Γ(α, β)(T) of T of course depends on the particular bases α and β chosen. It is an easy matter to keep track of how Γ(α, β)(T) changes with α and β.

Theorem 3.28: Let V and W be finite-dimensional vector spaces over F of dimensions n and m, respectively. Suppose α and α′ are two bases of V and β and β′ are two bases of W. Then for every T ∈ Hom(V, W), we have

3.29: Γ(α′, β′)(T) = M(β, β′)Γ(α, β)(T)M(α, α′)^{-1}

Proof: Before proving equation 3.29, we note that M(β, β′) (and M(α, α′)) is the m × m (and n × n) change of basis matrix given in equation 2.17. We have already noted that change of basis matrices are invertible, and consequently all the terms in equation 3.29 make sense.
To see that 3.29 is in fact a valid equation, we merely combine the commutative diagrams 2.19 and 3.27. Consider the following diagram:

3.30: [diagram: the representations of T relative to (α, β) and (α′, β′), joined by the change of basis maps M(α, α′) and M(β, β′); it consists of four commuting squares]

The diagram 3.30 is made up of four parts. By Theorem 2.18, the two squares built from the change of basis matrices are commutative. By Theorem 3.25, the two squares built from the matrix representations of T are commutative. It follows that the entire diagram 3.30 is commutative. In particular, M(β, β′)Γ(α, β)(T) = Γ(α′, β′)(T)M(α, α′). Solving this equation for Γ(α′, β′)(T) gives 3.29. □
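Since any two representations of the same T are related by 3.29, they are equivalent matrices and, in particular, have the same rank. A minimal numpy sketch (the change of basis matrices are chosen at random and are invertible with probability 1):

import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((m, n))          # Gamma(alpha, beta)(T)
P = rng.standard_normal((m, m))          # playing the role of M(beta, beta')
Q = rng.standard_normal((n, n))          # playing the role of M(alpha, alpha')

# 3.29: Gamma(alpha', beta')(T) = M(beta, beta') Gamma(alpha, beta)(T) M(alpha, alpha')^{-1}
B = P @ A @ np.linalg.inv(Q)

assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A)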

Recall that two m × n matrices A, B ∈ M_{m×n}(F) are said to be equivalent if there exist invertible matrices P ∈ M_{m×m}(F) and Q ∈ M_{n×n}(F) such that A = PBQ. Equation 3.29 says that a given matrix representation Γ(α, β)(T) of T relative to a pair of bases (α, β) changes to an equivalent matrix when we replace (α, β) by new bases (α′, β′). This leads to the following question: What is the simplest representation of a given linear transformation T? If we set A = Γ(α, β)(T), then we are asking, What is the simplest matrix B equivalent to A?
Recalling a few facts from elementary matrix theory gives us an easy answer to that question. Any invertible matrix P is a product, P = E1 ··· Er, of elementary matrices E1, ..., Er. PA = E1(··· (ErA)) is the m × n matrix obtained from A by performing the elementary row operations on A represented by E1, ..., Er. Similarly, (PA)Q is the m × n matrix obtained from PA by performing a finite number of elementary column operations on PA. Let us denote the rank of any m × n matrix A (i.e., the number of linearly independent rows or columns of A) by rk(A). If rk(A) = s, then we can clearly find invertible matrices P and Q such that

PAQ = ( I_s  0 )
      (  0   0 )

Here our notation

( I_s  0 )
(  0   0 )

means PAQ will have the s × s identity matrix I_s in its upper left-hand corner and zeros everywhere else.

If we apply these remarks to our situation in Theorem 3.28, we get the following corollary:

Corollary 3.31: Let V and W be finite-dimensional vector spaces over F of dimensions n and m, respectively. Let α and β be bases of V and W. Let T ∈ Hom(V, W), and set A = Γ(α, β)(T). If rk(A) = s, then there exist bases α′ and β′ of V and W, respectively, such that

3.32: Γ(α′, β′)(T) = ( I_s  0 )
                     (  0   0 )
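For instance, the rank 1 matrix A = [[1, 2], [2, 4]] is carried to the form in 3.32 by one row operation and one column operation; a quick numpy check:

import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])      # rk(A) = 1
P = np.array([[1.0, 0.0], [-2.0, 1.0]])     # row operation R2 -> R2 - 2 R1
Q = np.array([[1.0, -2.0], [0.0, 1.0]])     # column operation C2 -> C2 - 2 C1

assert np.allclose(P @ A @ Q, [[1.0, 0.0], [0.0, 0.0]])   # the block form with I_1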
There is another representation problem that naturally arises when considering Theorem 3.28. Suppose V = W. If α is a basis of V, then any T ∈ Hom(V, V) is represented in terms of α by an n × n matrix A = Γ(α, α)(T). If we change to a new basis α′ of V, then the representation of T changes to B = Γ(α′, α′)(T). Equation 3.29 implies that B = PAP^{-1}, where P = M(α, α′). Recall that two n × n matrices A and B are similar if there exists an invertible n × n matrix P such that B = PAP^{-1}. Thus, different representations of the same T ∈ Hom(V, V) with respect to different bases of V are similar matrices.
Now we can ask, What is the simplest representation of T? If we choose any basis α of V and set A = Γ(α, α)(T), then our question becomes, What is the simplest matrix B similar to A? That question is not so easy to answer as the previous equivalence problem. We shall present some solutions to this question in Chapter III of this book.
Theorem 3.25 implies that dim Hom(V, W) = (dim V)(dim W) when V and W are finite dimensional. In our next theorem, we gather together some miscellaneous facts about linear transformations and the dimension function.

Theorem 3.33: Let V and W be vector spaces over F and suppose


T e Hom(V, W). Then

(a) If T is surjective, dim V dim W.


(b) If dim V = dim W c cc, then T is an isomorphism if and only if either T is
injective or T is surjective.
(c) dim(Im T) + dim(ker T) = dim V.

Proof (a) follows immediately from Theorem 2.11. In (b), if T is an isomorph-


ism, then T is both injective and surjective. Suppose T is injective, and
n = dim V = dim W. Let = {a1,..., be a basis of V. Since T is injective,
Dx = {T(cx1),. . . , is a linearly independent set in W. Then dim W = n
implies Ta is a basis of W. In particular, W = L(Ta) = T(L(a)) = T(V). Thus, T is
surjective, and hence, an isomorphism.
Suppose T is surjective. If a = {cx1,..., is a basis of V, then
26 LINEAR ALGEBRA

W = T(V) = = By Theorem 2.11, Tx contains a basis of W. Since


dimW = n, fl is a basis of W. Now let rzeker T. Write = Then
0 = >x1T(tx1). Since Tx is a basis of W, x1 = = x,, = 0. Thus, = 0 and T is
injective. This competes the proof of (b).
We prove (c) in the case that dim V = n < oo. The infinite-dimensional case is
left as an exercise at the end of this section. Let = {cx1 ,.. ., ;} be a basis of
ker T.We take r = 0, and = if T is injective. By Theorem 2.6, we can expand
to a basis 4 = (a1,..., flj of V. Here r + s = n. We complete the
proof of (c) by arguing that TJJ = {T(j31),. .., T(fJ,)} is a basis of Im T.
Suppose 6 elm T. Then 6 = T(y) for some ye V. Since V =
y = x1a1 + + + y1fl1 + + for some x1, y1eF. Applying T to this
equation, gives 3€ L(Tfl). Thus, TfJ spans Im T.
Suppose >J. y1T(f31) = 0 for some y1 F. Then y1fl1 ker T. Thus,
y1f31 = x1 €F. Since 4 is a basis of V, we conclude that
x1 = = xr = y1 = = y, = 0. In particular, (T(fl1),. . . , T(,6j} is linearly
independent. Thus, TJJ is a basis of Im T, and the proof of (c) is complete. LI
We finish this section with a generalization of Theorem 3.33(c). We shall need
the following definition.
Definition 3.34: By a chain complex C = {(V1, d1) Ii Z} of vector spaces over F,
we shall mean an infinite sequence (V1} of vector spaces one for each integer
i €7, together with a sequence of linear transformations, d1 Hom(V1, 1)
for each id, such that = 0 for all id.
We usually draw a chain complex as an infinite sequence of spaces and maps
as follows:

3.35:

d
—÷V1÷1 V

If a chain complex C has only finitely many nonzero terms, then we can
change notation and write C as

3.36:

d d1
—>V0—*0

It is understood here that all other vector spaces and maps not explicitly
appearing in 3.36 are zero.

Definition 3.37: A chain complex

d
—V'
LINEAR TRANSFORMATIONS 27

is said to be exact if Im = ker for every ie7L.

Let us consider an important example.

Example 3.38: Let V and W be vector space over F, and let T e Hom(V, W).
Then

C:0-÷kerT V TIT >0

is an exact chain complex. Here i denotes the inclusion of ker T into V. E

We can generalize Example 3.38 slightly as follows:


Definition 3.39: By a short exact sequence, we shall mean an exact chain
complex C of the following form:

3.40:
d1
C:0—+V2

Thus, the example in 3.38 is a short exact sequence with V2 = ker T, d2 = i,


V1 = V, etc. Clearly, a chain complex C of the form depicted in 3.40 is a short
exact sequence if and only if d2 is injective, d1 is surjective, and Im d2 = ker d1.
Theorem 3.33(c) implies that if C is a short exact sequence, then
dim V2 — dim V1 + dim V0 = 0. We can now prove the following generaliza-
tion of this result:
Theorem 3.41: Suppose

d, d,
> > V0—÷0

is an exact chain complex. Then >)t.0 (—1)' dim V1 = 0.

Proof The chain complex C can be decomposed into the following short exact
sequences

d,
C1:0—*kerd1--÷V1 >V0-÷0

d2
C2:0-÷kerd2--*V2 )kerd1-÷0

d,
0—*ker > ker —>0
28 LINEAR ALGEBRA

If we now apply Theorem 3.33(c) to each C1 and add the results, we get

(1) Let V and W be vector spaces over F.


(a) Show that the Cartesian product V x W = {(a, fi) lrzeV, W} is a
vector space under componentwise addition and scalar multiplication.
(b) Compute dim(V x W) when V and W are finite dimensional.
(c) Suppose T: V -÷ W is a function. Show that T eHom(V, W) if and only
if the graph GT = {(a, T(2))e V x W e V} of T is a subspace of
VxW.
(2) Let T e Hom(V, W) and 5€ Hom(W, V). Prove the following statements:
(a) If ST is surjective, then S is surjective.
(b) If ST is injective, then T is injective.
(c) If ST = (the identity map on V) and TS = then T is an
isomorphism.
(d) If V and W have the same finite dimension n, then ST = h, implies T is
an isomorphism. Similarly, TS = 'w implies T is an isomorphism.
(3) Show that Exercise 2(d) is false in general. (Hint: Let V = W be the vector
space in Exercise 4 of Section 2.)
(4) Show that
(5) Let T e Hom(V, V). If T is not injective, show there exists a nonzero
5€ Hom(V, V) with TS = 0. If T is not surjective, show there exists a
nonzero 5€ Hom(V, V) such that ST = 0.

(6) In the proof of Corollary 3.19, we claimed that M(cx, cx)[fJ]5 =


for all /3eV implies M(a, b)M(ö, = Give a proof of this fact.
(7) When considering diagram 3.17 we claimed TA is an isomorphism if and
only if A is an invertible matrix. Give a proof of this fact.
(8) Show that Theorem 3.33(c) is correct for any vector spaces V and W. Some
knowledge of cardinal arithmetic is needed for this exercise.
(9) Let T Hom(V, V). Show that T2 = 0 if and only if there exist two
subspaces M and N of V such that
(a) M + N = V.
(b) MnN=(0).
(c) T(N) = 0.
(d) T(M) N.
EXERCISES FOR SECTION 3 29

(10) Let T e Hom(V, V) be an involution, that is, T2 = 'v• Show that there exists
two subspaces M and N of V such that
(a) M + N = V.
(b) MnN=(0).
(c) T(cx) = for every cxeM.
(d) T(cx) = for every xeN.
In Exercise 10, we assume 2 0 in F. If F = F2, are there subspaces M and
N satisfying (a)—(d)?
(11) Let TeHomF(V, V). If f(X) = + + a1X + a0eF[X]. then
f(T) = + + a1T + e Hom(V, V). Show that dimF V = tfl c CX)
implies there exists a nonzero polynomial f(X)e F[X] such that f(T) = 0.
(12) IfS, T e HomF(V, F) such that Sex) = 0 implies T= xS
for some xeF.
(13) Let W be a subspace of V with m=dimWCdimV=ncoo. Let
Z = {T e Hom(V, V) I = 0 for all e W}. Show that Z is a subspace of
Hom(V, V) and compute its dimension.
(14) Suppose V is a finite-dimensional vector space over F, and let
5, T e HomF(V, V). If ST = show there exists a polynomial f(X)e F[X]
such that S = f(T).

(15) Use two appropriate diagrams as in 3.27 to prove the following theorem:
Let V, W, Z be finite-dimensional vector spaces of dimensions n, m, and p,
respectively. Let at, /1, and y be bases of V, W, and Z. If T e Hom(V, W) and
Sc Hom(W, Z), then F(cx, y)(ST) = flfl, y)(S)f(at, fl)(T).
(16) Suppose
d1 d1
->V1 tV0_+O

and
d'1

C' exact. Let T0 e HOmF(VO, V'Q). Show that there


exists e HomF(VI, V) such that T1 - 1d1 = for all i = 1 The
collection of linear transformations {T1} is called a chain map from C to
C,.
(17) Suppose C = {(V1, d1) lie 71} and C' = {(V, lie 71} are two chain
complexes. Let T = be a chain map from C to C'. Thus, T1: V1 —÷
and T1_1d1 = for all id. For each id,
set =
V11 x Define a map di': by
30 LINEAR ALGEBRA

fi) = (—d1_1(x), T1_1(oc) + Show that C" = {(Vr, dflhiel} is a


chain complex. The complex C" is called the mapping cone of T.
(18) Use Theorem 3.33(c) to give another proof of Exercise 16 in Section 2.
(19) Find a T e C) that is not C-linear.
(20) Let V be a finite-dimensional vector space over F. Suppose T e HomF(V, V)
such that dim(Im(T2)) = dim(Im(T)). Show that Im(T) n ker(T) = {O}.
(21) The special case of equation 3.29 where V = W, = and = Ii' is very
important. Write out all the matrices and verify equation 3.29 in the
following example: T: —÷
is the linear transformation given by
T(61) = + 2ö2 + 1(ö2) = + — ö3, and T(ö3) = + 2(52.
Let = where = (1, 2, 1), = (1, 0, — 1) and =
(0, 1, — 1). Compute 1(6, ÔXT), fox, x)(T), and the change of bases matrices
in 3.29.

(22) Let V be a finite-dimensional vector space over F. Suppose TS = ST for


every S e H0mF(V, V). Show that T = x e F.
(23) Let A, BE with at least one of these matrices nonsingular. Show
that AB and BA are similar. Does this remain true if both A and B are
singular?

4. PRODUCTS AND DIRECT SUMS

Let {V11i e A} be a collection of vector spaces over a common field F. Our


indexing set A may be finite or infinite. We define the product V1 of the V1 as
follows:

Definition 4.1: fl's V1 = {f: A -÷ UI€A V11f is a function with f(i) e for all i e A}.

We can give the set the structure of a vector space (over F) by defining
addition and scalar multiplication pointwise. Thus, if f, g e f + g is
defined by (f + g)(i) = f(i) + g(i). If fe and x e F, then xf is defined by
(xf)(i) = x(f(i)). The fact that V1 is a vector space with these operations is
straightforward. Henceforth, the symbol V1 will denote the vector space
whose underlying set is given in 4.1 and whose vector operations are pointwise
addition and scalar multiplication.
Suppose V = V1 is a product. It is sometimes convenient to identify a
given vector ft V with its set of values {f(i) lie A}. f(i) is called the ith coordinate
off, and we think off as the "A-tuple" (f(i))1€A. Addition and scalar multiplication
in V are given in terms of A-tuples as follows: (f(i))I€A + = (f(i) +
and = (xf(i))IGA. This particular viewpoint is especially fruitful when
A! = n <cc. In this case, we can assume A = {1, 2,. . . , n}. Each feY is then
identified with the n-tuple (f(1),..., f(n)). When Al = n, we shall use the
PRODUCTS AND DIRECT SUMS 31

notation V1 x x instead of V1. Thus, the examples given in 1.5,


3.21, and Exercise 1 of Section 3 are all special cases of finite products. Example
1.5 is a product in which every V1 is the same vector space V.
11 V = flieit V1, then for every pair of indices (p, q) e A x A, there exist linear
transformations irs, e Hom(V, Vp), and °q C Hom(Vq, V) defined as follows:
Definition 4.2:

(a) V -÷ is given by ir4f) = f(p) for all feY.


(b) 0q Vq V is given by

fo if i=q
if

In Definition 4.2(b), e Vq. OqOX) is that function in V whose only nonzero


value is x taken on at i = q. The fact that and °q are linear transformations is
obvious. Our next theorem lists some of the interesting properties these two sets
of maps have.
Theorem 4.3: Let V = V1. Then

(a) irs, = I;, the identity map on for all peA.


(b)
(c) If A is finite, >pcA OPJVP = the identity map on V.
(d) is surjective and is injective for all peA.
(e) Let W be a second vector space over F. A function T: W —+ V is a linear
transformation if and only if e Hom(W, for all peA.
(f) The vector space V together with the set {n1,Ip e A} of linear trans-
formations satisfies the following universal mapping property: Suppose
W is any vector space over F and e Hom(W, p e A} a set of linear
I

transformations. Then there exists a unique T e Hom(W, V) such that for


every p e A the following diagram is commutative:

4.4:

WT>V =

Proof: (a), (b), and (c) follow immediately from the definitions. is surjective
and is injective since = I;. Thus, (d) is clear. As for (e), we need only
argue that T is linear provided is linear for all peA. Let at, fleW and x, ye F.
32 LINEAR ALGEBRA

Then for every peA, we have + yfl)) = + yfl) =


+ = + yT(fl)). Now it is clear from our definitions that two
functions f, g cv are equal if and only if = for all peA. Consequently,
T is linear.
Finally, we come to the proof of (f). We shall have no use for this fact in this
text. We mention this result only because in general category theory products
are defined as the unique object satisfying the universal mapping property given
in (f). The map T: W —> V making 4.4 commute is given by = We
leave the details for an exercise at the end of this section. E

The map icr: V —> in Definition 4.2(a) is called the pth projection or pth
coordinate map of V. The map 0q Vq —÷ V is often called the qth injection of Vq
into V. These maps can be used to analyze linear transformations to and from
products. We begin first with the case where IAI < cc.

Theorem 4.5: Suppose V = V1 x x is a finite product of vector spaces,


and let W be another vector space. If T1 e Hom(W, V1) for i = 1,. . , n, then there
.

exists a unique T e Hom(W, V1 x x Vj such that ;T = T1 for all


i=1,...,n.
Proof Set T = 01T1 and apply Theorem 4.3. J
As an immediate corollary to Theorem 4.5, we get the following result:

Corollary 4.6: If Al = n c cc, then Hom(W, Hom(W, V1).

Proof: Define a map 'F: Hom(W, V1 x x —. Hom(W, V1) x


x Hom(W, by 'iF(T) = (ir1T,..., One easily checks that 'P is an
injective, linear transformation. Theorem 4.5 implies 'F is surjective. E

We have a similar result for products in the first slot of Hom.

Theorem 4.7: Suppose V = V1 x x is a finite product of vector spaces,


and let W be another vector space. If T1 e W) for i = 1,.. , n, then there
.

exists a unique T e Hom(V1 x x W) such that TO1 = T1 for all


i=l,...,n.
Proof: Set T = 1 and apply Theorem 4.3. J
Corollary 4.8: If lAl = n < cc, then V1, Hom(V1, W).

Proof Define a map '1': V1, W) -* Hom(V1, W) by 'F(T) =


(TO1,..., TON). Again the reader can easily check that 'F is an injective, linear
transformation. Theorem 4.7 implies that 'F is surjective. fl
PRODUCTS AND DIRECT SUMS 33

Suppose V = V1 x x is a finite product of vector spaces over F. Let B,


be a basis of V1, i = 1,..., n. We can think of the vectors in V as n-tuples
with oc1eV1. For any i and O,...,O). Thus,
91(x) is the n-tuple of V that is zero everywhere except for an in the ith slot.
Since °i is injective, 01: V1 01(V1). In particular, 01(B1) is a basis of the subspace
01(V1). Since 01(B1) n = (0), B = 91(B1) is a linearly independ-
ent set Clearly, V = >J' 101(V1). Consequently, B is a basis of V. We have now
proved the following theorem:

Theorem 4.9: Let V = V1 x x be a finite product of vector spaces. If B1 is


a basis of V1, i = 1,..., n, then B = 01(B1) is a basis of V. In particular, if
each V1 is finite dimensional, then so is V. In this case, we have
dimV=ThidimV1. u
At this point, let us say a few words about our last three theorems when
Al = cc. Corollary 4.6 is true for any indexing set A. The map 'P(T) = (ir1T)166 is
an injective, linear transformation as before. We cannot use Theorem 4.5 to
conclude 'I' is surjective, since 01T1 makes no sense when Al = cc. However,
we can argue directly that 'P is surjective. Let (T1)ICA e Hom(W, V1). Define
T e Hom(W, V1) by T(x) = Clearly 'I'(T) = (T1)IEA. Thus, we have
the following generalization of 4.6:

4.10: For any indexing set A, Hom(W, flEa HIGA Hom(W, V1).

In general, Corollary 48 is false when lAl = cc. For example, if W = F


and V1 = F for all i e A, then the reader can easily see that
lHomF([TIEA F, F)l > Fl when A is infinite. Since Homg(F,
F) cannot be isomorphic to Hom(F, F).
If V = H1thVI with lAl = cc and B1 is a basis of V1, then UIEA0I(Bl) is a
linearly independent subset of V. But in general, V 01(VJ. For a concrete
example, consider V = RN in Exercise 5 of Section 1. Thus, 01(B1) is not in
general a basis for V. In particular, Theorem 4.9 is false when IA! = cc.
Let us again suppose V = V1 with A an arbitrary set. There is an
important subspace of V that we wish to study.

Definition 4.11: Let leA V1 = {fe [lisa V1 I 1(i) = 0 except possibly for finitely
many ieA}.

Clearly El? leA V1 is a subspace of V under pointwise addition and scalar


multiplication. In terms of A-tuples, the vector f = lies in e leA V1 if and
only if there exists some finite subset A0 (possibly empty) of A such that = 0
forallieA — A0. IflAl c cc,then = = cc,then
usually a proper subspace of V. Consider the following example:
34 LINEAR ALGEBRA

Example 4.12: Let F = R, A = F%J, and V1 = for all ic/i Then the Iki-tuple
11R

(flieN, that is, the function f: R given by f(i) = 1 for all i e N, is a vector
in V = fl
vector space El? leA V1 is called the direct sum of the V1. It is also called the
subdirect product of the V1 and written V1. In this text, we shall consistently
use the notation to indicate the direct sum of the If Al = n < cc,
then we can assume A = {1, 2, . . . , n}. In this case we shall write V1 e e
or $IGAVI.Thus,Vl®"®Vfl, x x
V1, and V1 are all the same space when Al = n c cii
Since E9I€A V1 = our comments after 4.10 imply the following
theorem:

Theorem 4.13: Suppose V = V1 is the direct sum of vector spaces V1. Let B1
be a basis of V1. Then B = is a basis of V. E

The subspace $ V1 constructed in Definition 4.11 is sometimes called the


external direct sum of the V1 because the vector spaces {V11 i e A} a priori have no
relationship to each other. We finish this section with a construction that is often
called an internal direct sum.
Suppose V is a vector space over F. Let {V1 ji e A} be a collection of subspaces
of V. Here our indexing set A may be finite or infinite. We can construct the
(external) direct sum El? V1 of the as in Definition 4.11 and consider the
natural linear transformation 5: V1 —÷ V given by = Since
e only finitely many of the are nonzero. Therefore, L€A x1 is a
well defined finite sum in V. Thus, S is well defined and clearly linear.

Definition 4.14: Let {V1 lie A} be a collection of subspaces of V. We say these


subspaces are independent if the linear transformation 5: $ V1 —÷ V defined
above is injective.

Note Tm S = Thus, the subspaces V1, i eA, are indepedent if and only if
E9Ith V1 LEA V1 via S. A simple example of independent subspaces is provided
by Theorem 2.13(c).

Example 4.15: Let V be a vector space over F and W a subspace of V. Let W' be
any complement of W. Then W, W' are independent. The direct sum W e W' is
just the product W x W', and 5: W x W' —, W + W' is given by
S((oc, fl))=x+/3. If(x, fl)ekerS, then cz+fl=O. But WnW'=O. Therefore,
= —fJeWnW' implies = = 0. Thus, S is injective, and W,W' are
CX

independent. U

In our next theorem, we collect a few simple facts about independent


subspaces.
PRODUCTS AND DIRECT SUMS 35

Theorem 4.16: Let {V11i e A} be a collection of subspaces of V. Then the


following statements are equivalent:

(a) The i eA are independent!


(b) Every vector cx e V1 can be written uniquely in the form cx = cx1

with for all ieA.


(b') = 0 with cx1eV1, then = 0 for all ieA.
(c) For every j eA, n V1) = (0).

Proof In statements (b) and (b'), LEAmeans = 0 for all but possibly finitely
many i e A. It is obvious that (b) and (b') are equivalent. So, we argue
(a) (b') (c).
Suppose the V1 are independent. If LEA = 0 with e V1 for all i eA, then
S((cxJIEA) = = 0. Since S is injective, we conclude that = 0 for all i e A.
Thus, (a) implies (b'). Similarly, (b') implies (a).
Suppose we assume (b'). Fix j e A. Let cx e n V1). Then cx =
cx for some e V1. As usual, all the here are zero
except possibly for finitely many indices i j. Thus, 0 = + (— cx1

(b') then implies =; = 0 for all i e A — {j In particular, cx = 0, and (c) is


established.
Suppose we assume (c). Let LEA = 0 with e V1. If every = 0,
there is nothing to prove. Suppose some say cxi, is not zero. Then
= e n V1) implies n V1) 0. This is contrary to our
assumption. Thus, (c) implies (b'), and our proof is complete. S

If {V11i e A} is a collection of independent subspaces of V such that


LeA V1 = V, then we say V is the internal direct sum of the V1. In this case,
eIGAVI. If Al = nc cc,we
shall simply write V = V1 e
when V is an internal direct sum of
subspaces V1,...,V,,.
The reader will note that there is no difference in notation between an
external direct sum and an internal direct sum. This deliberate ambiguity will
cause no real confusion in the future.
Finally, suppose V = V1 eis an internal direct sum of independent
subspaces i,..., Then by Theorem 4.16(b), every vector cx e V can be
written uniquely in the form cx = cx1 + + cx,, with cx1 e V1. Thus, the map
P1: V —* V, which sends cx to cx1, is a well-defined function. Theorem 4.16(b)
implies that each P1 is a linear transformation such that Im P1 = V1. We give
formal names to these maps P1,..., P,,.

Definition 4.17: Let V = V1 ® V,, be the internal direct sum of independent


subspaces V1,..., V,,. For each i = 1,..., n, the linear transformation P1
defined above is called the ith projection map of V relative to the decomposition
36 LINEAR ALGEBRA

Our next theorem is an immediate consequence of Theorem 4.16(b).

Theorem 4.18: Let V =


ent subspaces V1,. ..,
V1 ee be an internal direct sum of the independ-
Suppose P1,..., P,, e Hom(V, V) are the associated
projection maps. Then
(a)

P1 = the identity map on V. El

Theorem 4.18 says that every internal direct sum decomposition


V = V1 e e V,, determines a set {P1,..., of pairwise orthogonal
[4.18(a)] idempotents [4.18(b)] whose sum is 'v [4.18(c)] in the algebra of
endomorphisms t(V) = HomF(V, V). Let us take this opportunity to define
some of the words in our last sentence.
Definition 4.19: By an associative algebra A over F, we shall mean a vector space
(A, /3) —* + /3, (x, —÷ xcx) over F together with a second function (x, /1) —÷ cxfl
from A x A to A satisfying the following axioms:
Al. cx(/ly) = for all x, /3, yeA.
A2.

A4. x(xfl) =(xcx)/3 = x(xfl) for all fleA, xc F.


A5. There exists an element I cA such that hx = = for all cxeA.

We have seen several examples of (associative) algebras in this book already.


Any field F is an associative alebra over F. and F[X] with the usual
multiplication of matrices or polynomials is an algebra over F. If V is any vector
space over F, then HomF(V, V) becomes an (associative) algebra over F when we
define the product of two linear transformations T1 and T2 to be their composite
T1T2. Clearly axioms A1—A5 are satisfied. Here 1 is the identity map from V to
V. Linear transformations from V to V are called endomorphisms of V. The
algebra t(V) = HomF(V, V) is called the algebra of endomorphisms of V.
Suppose A is any algebra over F. An element e A is idempotent if owe = x. In
F or F[X], for example, the only idempotents are 0 and 1. In
e11,..., es,, are all idempotents different from 0 or 1. Idempotents {x1,..., oç} in
an algebra A are said to be pairwise orthogonal if = 0 whenever i j. Thus,
{ ..., is a set of pairwise orthogonal idempotents in
Theorem 4.18 says that every internal direct sum decomposition
V = V1 ee determines a set of pairwise orthogonal idempotents whose
sum is 1 in t(V). Our last theorem of this section is the converse of this result.

Theorem 4.20: Let V be a vector space over F, and suppose {P1,..., is a set
of pairwise orthogonal idempotents in t(V) such that P1 + + P,, = 1. Let
ImP1. Then V= V1
EXERCISES FOR SECTION 4 37

Proof? We must show V=V1 + and Let cxeV


and set = Then = = (P1 + + PJtx) = P1(or) + + =
+ Since = V V1 + + Thus,
V=Vl+...+Vn.
Fix J, and suppose Then 3 = = for
some /3eV and fJ1eV Then
= 0. Thus, n V1) = (0), and the proof is complete.

EXERCISES FOR SECTION 4

(1) Let B = lie i4 be a basis of V. Show that V is the internal direct sum of
{F31IieA}.
(2) Show HomF(eIEA V1, W) HomF(VI, W).
(3) Give a careful proof of Theorem 4.3(f).
(4) LetV = V1 x x Vi,, and for each i = l,...,n, setT1 = Show that
{T1,. . ., is a set of pairwise orthogonal idempotents in t(V) whose
sum is 1.
(5) Let V = V1 x x Show that V has a collection of subspaces
such that V=
a combined version of Corollaries 4.6 and 4.8 by showing directly that
c&:HomF(Vl x x W1 x x
given by çb(T) = n,j1 m is an isomorphism.
(7) Suppose V = V1 eLet Te Hom(V, V) such that T(V1) V1 for
all i = 1,..., n. Find a basis of V such that

=
where M1 describes the action of T on V1
(8) IfX,Y,Zaresubspaces ofVsuch thatX$Y=X$Z=V,isY=Z?Is
Z?
(9) Find three subspaces V1, V V

V= that V such that


WcV2 and V=V1EJ3W.
(11) Let A be an algebra over F. A linear transformation TeHomF(A, A) is
called an algebra homomorphism if T(xfl) = TOx)T(fl) for all fleA. Ex-
hibit a nontrivial algebra homomorphism on the algebras F[X] and
38 LINEAR ALGEBRA

(12) Suppose V is a vector space over F. Let 5: V be an isomorphism of V.


Show that the map T - 'TS is an algebra homomorphism of t(V)
S
which is one to one and onto.
(13) Let F be a field. Show that the vector space V = F (over F) is not the direct
sum of any two proper subspaces.
(14) An algebra A over F is said to be commutative if cxfl = flo for all fleA.
Suppose V is a vector space over F such that dimF(V)> 1. Show that t(V)
is not commutative.
(15) Suppose V is a vector space over F. Let Te&f(V) be idempotent. Show
V = ker(T) ® Im(T).
(16) Let V be a vector space over F, and let T e t(V). If T3 = T, show that
V = V0 e V1 e V2 where the V1 are subspaces of V with the following
properties: = 0, = and =
In this exercise, assume 2 0 in F.
(17) Suppose V is a finite-dimensional vector space over F. If T e 1(V) is
nonzero, show there exists an S e 1(V) such that ST is a nonzero
idempotent of 1(V).
(18) Suppose T e t(V) is not zero and not an isomorphism of V. Prove there is
an Se4°(V) such that ST = 0, but TS ft
(19) Suppose V is a finite-dimensional vector space over F with subspaces
W1,..., Suppose V = W1 + + Wk, and dim(V) =
Show that V = Wk.

5. QUOTIENT SPACES AND THE ISOMORPHISM THEOREMS

In this section, we develop the notion of a quotient space of V. In order to do


that, we need to consider equivalence relations. Suppose A is a nonempty set
and R A x A is a relation on A. The reader will recall from Section 2 that we
used the notation x y to mean (x, y) e R. The relation is called an
equivalence relation if the following conditions are satisfied:
5.1: (a) x x for all xeA.
(b)
If all x, y, zeA.
A relation satisfying 5.1(a) is called reflexive. If 5.1(b) is satisfied, the relation is
said to be symmetric. A relation satisfying 5.1(c) is said to be transitive. Thus, an
equivalence relation is a reflexive, symmetric relation that is transitive.
Example 5.2: Let A = 7, and suppose p is a positive prime. Define a
relation (congruence mod p) on A by x y if and only if p x — y. The reader
can easily check that is an equivalence relation on /. fl
QUOTIENT SPACES AND THE ISOMORPHISM THEOREMS 39

The equivalence relation introduced in Example 5.2 Called a congruence,


and we shall borrow the symbol to indicate a general equivalence relation.
Thus, if R A x A is an equivalence relation on A and (x, y) ER, then we shall
write x y. We shall be careful in the rest of this text to use the symbol only
when dealing with an equivalence relation.
Now suppose is an equivalence relation on a set A. For each x e A, we set
= {y e A I y x}. is a subset of A containing x. is called the equivalence
class of x. The function from A to 9°(A) given by x -. satisfies the following
5c

properties:

5.3: (a) x et
(b)
(c) For any x, yeA, either 5c = y or = 4).
(d) A =
The proofs of the statements in 5.3 are all easy consequences of the
definitions. If we examine Example_5.2 again, we see 7L is the disjoint union of the
p equivalence classes U, I,.. ., p — 1. It follows from 5.3(c) and 5.3(d) that any
equivalence relation on a set A divides A into a disjoint union of equivalence
classes. The reader probably has noted that the equivalence classes {U,
T,..., p — 1
} of 1 inherit an addition and multiplication from 1 and form the
field discussed in Example 1.3. This is a common phenomenon in algebra. The
set of equivalence classes on a set A often inherits some algebraic operations
from A itself. This type of inheritance of algebraic structure is particularly
fruitful in the study of vector spaces.
Let V be a vector space over a field F, and suppose W is a subspace of V. The
subspace W determines an equivalence relation on V defined as follows:

5.4:

if cz—fleW

Let us check that the relation defined in 5.4 is reflexive, symmetric, and
transitive. Clearly, cx cx. fi, then cx — fleW. Since W is a subspace,
If cx
fi— cxeW. Therefore, ft cx. Suppose cx ft and ft y. Then cx — fi, ft — yeW.
Again, since W is a subspace, cx — y = (cx — fi) + (ft — y) e W, and, thus cx y. So,
indeed is an equivalence relation on V. The reader should realize that the
equivalence relation depends on the subspace W. We have deliberately
suppressed any reference to W in the symbol to simplify notation. This will
cause no confusion in the sequel.

Definition 5.5: Let W be a subspace of V, and let denote the equivalence


relation defined in 5.4. If cxc V, then the equivalence class of cx will be denoted by
The set of all equivalence classes cxc V} will be denoted by V/W.
40 LINEAR ALGEBRA

Thus, & = {fleVlfl z} and V/W = {&lcceV}. Note that the elements in
V/W are subsets of V. Hence V/W consists of a collection of elements from £3°(V).

Definition 5.6: If W is a subspace of V and cxcV, then the subset


+ W = {z + ylyeW} is called a coset of W.

Clearly, fi e + W if and only if — heW. Thus, the coset x + W is the same


set as the equivalence class & of under So, V/W is the set of all cosets of W.
In particular, the equivalence class & of has a nice geometric interpretation.
= cx + W is the translate of the subspace W through the vector
Let us pause for a second and discuss the other names that some of these
objects have. A coset + W is also called an affine subspace or flat of V. We shall
not use the word "flat" again in this text, but we want to introduce formally the
set of affine subspaces of V.

Definition 5.7: The set of all affine subspaces of V will be denoted d(V).

Thus, A e d(V) if and only if A = + W for some subspace W c V and some


cx eV. Note that an affine subspace A = cx + W is not a subspace of V unless
= 0. Thus, we must be careful to use the word "affine" when considering
elements in d(V). Since d(V) consists of all cosets of all subspaces of V,
V/W c d(V) and these inclusions are usually strict.
The set V/W is called the quotient of V by W and is read "V mod W". We shall
see shortly that V/W inherits a vector space structure from V. Before discussing
this point, we gather together some of the more useful properties of affine
subspaces in general.

Theorem 5.8: Let V be a vector space over F, and let d(V) denote the set of all
affine subspaces of V.

(a) If {A1 lie A} is an indexed collection of affine subspaces in d(V), then


either = or A1 e d(V).
(b) If A, Be d(V), then A + B e d(V).
(c) If Ac d(V) and xe F, then xA e d(V).
(d) If Ac d(V) and T e Hom(V, V'), then T(A) e d(V').
(e) If A' e d(V') and T e Hom(V, V'), then T - '(A') is either empty or an affine
subspace of V.

Proof The proofs of (b)—(e) are all straightforward. In (e), T '(A') =


{z e V I T(z) e A'}. We give a proof of (a) only. Suppose A1 = + W1
for each i e A. Here W1 is a subspace of V and a vector in V. Suppose
flI€AAI qS. Let fle Then for each ieA, fi = + with y1eW1. But
then J3 + W1 = + and = fllth(fl + W1).
We claim that + W1) = ,6 + (flI€AWI). Clearly, J3 + c
QUOTIENT SPACES AND THE ISOMORPHISM THEOREMS 41

+ W1), SO let ccefl1Jfl + W1). Then, for i cc = /3 + = /3 +


with 61eW1 and But then and Thus,
+ W1) /3 + Therefore, fll€A(fl + = /3 +
Since /3 + (flI€A W1) e d(V), the proof of (a) is complete. fl
We can generalize Theorem 5.8(d) one step further by introducing the
concept of an affine map between two vector spaces. If cc eV, then by translation
through cc, we shall mean the function Sa: V —÷ V given by Sjfl) = cc + /3. Any
coset cc + W is just SJW) for the translation Note that when cc 5& is not a
linear transformation.

Definition 5.9: Let V and V' be two vector spaces over a field F. A function
f: V -÷ V' is called an affine transformation if f = for some Te HomF(V, V')
and some cc cV'. The set of all affine transformations from V to V' will be
denoted AffF(V, V').

Clearly, Homf(V, V') V') Theorem 5.8(d) can be restated as


follows:

Theorem 5.10: If Ac d(V) and fe V'), then f(A) e d(V'). E


Let us now return to the special subset V/W of d(V). The cosets of W can be
given the structure of a vector space. We first define a binary operation -- on
V/W by the following formula:

&-i-/J=cc+fJ
In equation 5.11, cc and /3 are vectors in V and & and /3 are their corresponding
equivalence classes. & -I- if is defined to be the equivalence class that contains
cc + /3. We note that our definition of & -1- 5 depends only on the equivalence
classes & and /3 and not on the particular elements cc e & and /3 e /3 (used to form
the right-hand side of 5.11). To see this, suppose cc

and — flare in W. Therefore, (cc1 + — (cc + fl)eW and cc1 + = cc + ft


Thus, -1-: V/W x V/W —* V/W is a well-defined function. The reader can easily
check that (V/W, -I-) satisfies axioms V1—V4 of Definition 1.4. is the zero
element of V/W, and — cc is the inverse of & under -I-. The function -1- is called
addition on V/W,_and, henceforth, we shall simply write + for this operation.
Thus, & + /3 = cc + /3 defines the operation of vector addition on V/W.
We can define scalar multiplication on V/W by the following formula:

5.12:

= xcc
42 LINEAR ALGEBRA

In equation 5.12, xe F and & e V/W. Again we observe that if e &, then
xx1 = xz. Thus (X, &) —* xx is a well-defined function from F x V/W to V/W. The
reader can easily check that scalar multiplication satisfies axioms V5—V8 in
Definition 1.4. Thus, (V/W,(&, fi) -+ & + fi, (x, &) -÷ x&) is a vector space over F.
We shall refer to this vector space in the future as simply V/W.
Equations 5.11 and 5.12 imply that the natural map H: V V/W given by
H(z) = & is a linear transformation. Clearly, H is surjective and has kernel W.
Thus, if i: W —' V denotes the inclusion of W into V, then we have the following
short exact sequence:

II
O-*W____ ->V/W-*O

In particular, Theorem 3.33 implies the following theorem:

Theorem 5.14: Suppose V is a finite-dimensional vector space over F and W a


subspace of V. Then dim V = dim W + dim V/W. S

We shall finish this section on quotients with three theorems that are
collectively known as the isomorphism theorems. These theorems appear in
various forms all over mathematics and are very useful.

Theorem 5.15 (First Isomorphism Theorem): Let T e HomF(V, V'), and suppose
W is a subspace of V for which T(W) = 0. Let fl: V —* V/W be the natural map.
Then there exists a unique I e HomF(V/W, V') such that the following diagram
commutes:

5.16:

v/w
Proof? We define I by T(&) = T(cz). Again, we remaind the reader that & is a
subset of V containing cx. To ensure that our definition of I makes sense, we
must argue that T(cz1) = T(x) for any If e&, then; — oceW. Since T
is zero on W, we get T(x1) = Thus, our definition of I(&) depends only
on the coset & and not on any particular representative of a. Since
T(x& + yfl) = T(xx + yfl) = T(xcz + y/J) = xT(x) + yT(/3) = XT(ä) + yT(/3), we
QUOTIENT SPACES AND THE ISOMORPHISM THEOREMS 43

see I e Hom(V/W, V'). TH(cz) = 1(ä) = T(x) and so 5.16 commutes. Only the
uniqueness of I remains to be proved.
If T' e Hom(V/W, V') is another map for which T'H = T, then I = T' on Im
H. But H is surjective. Therefore, I = T'. fl
Corollary 5.17: Suppose T e V'). Then Tm T V/ker T.
Proof We can view T as a surjective, linear transformation from V to Tm T.
Applying Theorem 5.15, we get a unique linear transformation
I: V/ker T Tm T for which the following diagram is commutative:

T
V >ImT

V/ker T

In 5.18, H is the natural map from V to V/ker T. We claim I is an isomorphism.


Since IH = T and T: V —+ Tm T is surjective, I is surjective. Suppose & e ker I.
Then T(cz) = TH(cz) = 1(ä) = 0. Thus, e ker T. But, then fl(cz) = 0. Thus, & = 0,
and I is injective. LI
The second isomophism theorem deals with multiple quotients. Suppose W is
a subspace of V and consider the natural projection H: V V/W. If W' is a
subspace of V containing W, then H(W') is a subspace of V/W. Hence, we can
form the quotient space (V/W)/H(W'). By Corollary 5.17, H(W') is isomorphic to
W'/W. Thus we may rewrite (V/W)/H(W') as (V/W)/(W'/W).

Theorem 5.19 (Second Isomorphism Theorem): Suppose W c W' are sub-


spaces of V. Then V/W'.

Proof Let H: V -÷ V/W and H': V/W —* (V/W)/H(W') be the natural pro-
jections. Set T = H'H: V (V/W)/H(W'). Since H and H' are both surjective,
T is a surjective, linear transformation. Clearly, W' c ker T. Let e ker T.
Then 0 = H'H(x). Thus, & = H(cz)e H(W'). Let fleW' such that H(J3) = H(cz).
Then H(fl—x)=0. Thus, fl—rxekerH=WcW'. In particular, zeW'.
We have now proved that ker T = W'. Applying Corollary 5.17, we have
(V/W)/H(W') = Tm T V/ker T = V/W'. LI

The third isomorphism theorem deals with sums and quotients.

Theorem 5.20 (Third Isomorphism Theorem): Suppose W and W' are sub-
spaces of V. Then (W + W')/W W'/(W n W').
44 LINEAR ALGEBRA

Proof? Let H: W + W' —÷ (W + W')/W be the natural projection. The inclusion


map of W' into W + W' when composed with H gives us a linear transformation
T: W' —* (W + W')/W. Since the kernel of H is W, ker T = W n W'. We claim T
is surjective. To see this, consider a typical element y e(W + W')/W. y is a coset
of W of the form y = 6 + W with 68W + W'. Thus, 6 = + fi with ci eW and
fleW'. But ci + W = W. So, y = 6 + W = (fi + ci) + W = fl + W. In particular,
T(fl) = fl + W = y, and T is surjective. By - Corollary 5.17,
(W+W')/W=ImTh W'/kerT=W'/WnW'. E
We close this section with a typical application of the isomorphism theorems.
Suppose V is an internal direct sum of subspaces V1,..., Thus,
V= Since Theorem 5.20 implies
V/V1 =(V1 = = Vie...
$ $ Here the little hat (-) above V1 means is not present in this
sum.

EXERCISES FOR SECTION 5

(1) Suppose fe F). 1ff 5& 0, show V/ker F.

(2) Let T e Hom(V, V) and suppose T(tx) = cx for all ci 8W, a subspace of V.
(a) Show that T induces a map Sc Hom(V/W, V/W).
(b) If S is the identity map on V/W, show that R = T — 'v has the property
that R2 = 0.
(c) Conversely, suppose T = 1v + R with Re Hom(V, V) and R2 = 0.
Show that there exists a subspace W of V such that T is the identity on
W and the induced map S is the identity on V/W.
(3) A subspace W of V is said to have finite codimension n if dim V/W = n. If
W has finite codimension, we write codim W c cc. Show that if W1 and
W2 have finite codimension in V, then so does W1 n W2. Show
codim(W1 n W2) codim W1 + codim W2.
(4) In Exercise 3, suppose V is finite dimensional and codim W1 = codim W2.
Show that dim(W1/W1 n W2) = dim(W2/W1 n W2).
(5) Let T e Hom(V, V'), and suppose T is surjective. Set K = ker T. Show there
exists a one-to-one, inclusion-preserving correspondence between the
subspaces of V' and the subspaces of V containing K.
(6) Let Te Hom(V, V'), and let K = ker T. Show that all vectors of V that have
the same image under T belong to the same coset of V/K.
(7) Suppose W is a finite-dimensional subspace of V such that V/W is finite
dimensional. Show V must be finite dimensional.
EXERCISES FOR SECTION 5 45

(8) Let V be a finite-dimensional vector space. If W is a subspace with


dim W = dim V — 1, then the cosets of W are called hyperplanes in V.
Suppose S is an affine subspace of V and H = cc + W is a hyperplane. Show
that if H n S = 0, then S ft + W for some ftc V.
(9) If S = + We d(V), we define dim S = dim W. Suppose V is finite
cc

dimensional, H a hyperplane in V, and Se d(V). Show that S n H


0 dim(S n H) = dim S— 1. Assume S H.
(10) Let Sed(V) with dimS = m — 1. Show that S = {>Jt1 x1ccjlLr=1 = l}
for some choice of m vectors cc1,. .., C V.

(11) Suppose C = {(V1, d1) lie Z} is a chain complex. For each i €7, set
= ker d1/Im H1(C) is called the ith homology of C.
(a) Show that C is exact if and only if H1(C) = 0 for all i eZ.
(b) Let C={(V1,dJlieZ} and C'= be two chain
complexes. Show that any chain map T = C —+ C' induces a
linear transformation T1: H1(C) H1(C') such that 11(cc + Im

d
—>V1--÷0
is a finite chain complex. Show that 1)' dim H1(C) =
— i)i dim V1. Here each V1 is assumed finite dimensional.

(12) Suppose V is an n-dimensonal vector space over F, and W1,..., are


subspaces of codimension e1 = n — dim(WJ. Let = + W1 for cc1

= 1,..., k. If n... n = q5, show dim(W1 n ... n Wk) > n — e1.

(13) Use Exercise 12 to' prove the following assertion: Let = cc1 + W1 and
= cc2 + W2 be two cosets of dimension k [i.e., dim(W1) = k]. Show that
and are parallel (i.e., W1 = W2) if and only if and are contained
in a coset of dimension k + 1, and have empty intersection.
(14) In IV, show that the intersection of two nonparallel planes (i.e., cosets of
dimension 2) is a line (i.e., a coset of dimension 1). The same problem makes
sense in any three-dimensional vector space V.
(15) Let and S3 be planes in IV such that n n S3 = 0, but no two
are parallel. Show that the lines n n S3 and n S3 are parallel.
(16) LetW={pflpeR[X]}. Show
that W is a subspace of Show that dim(R[X]/W) = n.(Hint: Use the
division algorithm in
(17) In Theorem 5.15, if T is surjective and W = ker T, then T is an isomorph-
ism [prove!]. In particular, S = (T)-'is a well-defined map from V' to
V/W. Show that the process of indefinite integration is an example of such
a map S.
46 LINEAR ALGEBRA

6. DUALS AND ADJOINTS

Let V be a vector space over F.

Definition 6.1: V* = HomF(V, F) is called the dual of V.

If V is a finite-dimensional vector space over F, then it follows from Theorem


3.25 that V* is finite dimensional with dim V* = dim V. We record this fact with
a different proof in 6.2.

Theorem 6.2: Let V be finite dimensional. Then dim V* = dim V.

Proof Let = ocj be a basis of V. For each i = 1,..., n, define an


element xr e V* by czr = Here V —. P is the isomorphism determined
by the basis and P F is the natural projection onto the ith coordinate of
P. Thus, if x = + + then xr is given by

cxt(x1cz1 + ... + = x1

We claim that z* = . . , is a basis of V*. Suppose = 0. Let


j e {1,..., n}. Then equation 6.3 implies 0 =
y1 = = 0, and is linearly independent over F.
If T e V*, then T = This last equation follows immediately
from 6.3. Thus, = V*, and is a basis of V*. In particular,
fl
The basis z* = constructed in 6.3 is called the dual basis of
Thus, every basis of a finite-dimensional vector space V has a corresponding
dual basis of V*. Furthermore, V V* under the linear map T, which sends
every e to the corresponding ;!k e x*.
If V is not finite dimensional over F, then the situation is quite different.
Theorem 6.2 is false when dim V = cc. If dim V = cc, then dim V* > dim V.
Instead of proving that fact, we shall content ourselves with an example.

Example 6.4: Let V = eF, that is, V is the direct sum of the vector spaces
= Fli e N}. It follows from Exercise 2 of Section 4 that V* =
F, F) flit' HomF(F, F) F. From Theorem 4.13, we
know that dim V = NI. A simple counting exercise will convince the reader
that dim V* = dim(flft1 F) is strictly larger than IN. LI

Before stating our next result, we need the following definition:


DUALS AND ADJOINTS 47

Definition 6.5: Let V, V', and W be vector spaces over F, and let w: V x V' W
be a function. We call w a bilinear map if for all cV, w(; ) e HomF(V', W) and
for all ,6e V', co(, fJ)e HomF(V, W).

Thus, a function co: V x V' -÷ W is a bilinear map if and only if


w(xcz1 + Ycx2, J3) = xw(z1, /1) + yw(cx2, /3), and co(cz, + + y/32) = x@(cx,
yw(cx, for all; yeF, ; ocr, cx2eV and /3,
fl2eV'. If V is any vector space
over F, there is a natural bilinear map ax V x V" —* F given by

6.6:

w(x, T) = T(z)

In equation 6.6, e V and T e V*. The fact that w is a bilinear map is ob-
vious. w determines a natural, injective, linear transformation i/i: V —. V**
in the following way. If eV, set ç&(z) = w(cx, ). Thus, for any T e
= @(cz, T) = T(cz). If X, ye F, and T e V*, then
/3 e V
+ yfl)(T) = w(xcz + y/J, T) = xco(cz, T) + yw(fl, T) = (xI/ar(z) +
i/i(xx
Consequently, if, e HomF(V, V**). To see that ç& is injective, we need to
generalize equation 6.3. Suppose = lie A} is a basis of V (finite or infinite).
Then for every i e A, we can define a dual transformation czr e V* as follows: For
each nonzero vector x e V, there exists a unique finite subset

are all nonzero scalars in F. We then define cxt(z) = Xjk jf i =


k = 1,..., n. Ifi*A(cz), we set = 0.1hz = 0, we of course define z7(cx) = 0.
Clearly xr and

Ii if i=j
if

Now if e ker then T(x) = 0 for all T e W. In particular, = 0 for all i eA.
This clearly implies = 0, and, thus, is injective.
We note in passing that the set = lie A} c which we have just
constructed above, is clearly linearly independent over F. If dim V < cc, this is
just the dual basis of V" coming from If dim V = cc, then does not span
and, therefore, cannot be called a dual basis. At any rate, we have proved the
first part of the following theorem:

Theorem 6.7: Let V be a vector space over F and suppose co: V x F is the
bilinear map given in equation 6.6. Then the map i/i: V —+ given by
i/i(z) = co(z,') is an injective linear transformation. If dim V <cc, then ç& is a
natural isomorphism.
48 LINEAR ALGEBRA

Proof Only the last sentence in Theorem 6.7 remains to be proved. If


dimV c co, then Theorem 6.2 implies dimV = dim <cc. Since is
injective, our result follows from Theorem 3.33(b). fl
The word "natural" in Theorem 6.7 has a precise meaning in category theory,
but here we mean only that the isomorphism i/i: V is independent of any
choice of bases in V and The word "natural" when applied to an
isomorphism çfr: V —* also means certain diagrams must be commutative.
See Exercise 4 at the end of this section for more details. We had noted
previously that when dim V c oo, then V This type of isomorphism is not
natural, since it is constructed by first picking a basis cx = {cxi,. . . , cxj of V and
then mapping cx1 to czr in
V x —÷ F can also be used to set up certain
correspondences between ø°(V) and £0(v*).

Definition 6.8: If A is any subset of V, let A' = {fleV* w(x, /1) = 0 for all cxeA}.

Thus, A' is precisely the set of all vectors in that vanish on A. It is easy to
see that A' is in fact a subspace of V*. We have a similar definition for subsets of

Definition 6.9: If A is a subset of Let A' = {cz e V I co(x, /1) = 0 for all fle A}.

Thus, if A V*, then A' is the set of all vectors in V that are zero under the
maps in A. Clearly, A' is a subspace of V for any A

Theorem 6.11k Let A and B be subsets of V (or V*),

(a) A c B implies A' B'.


(b) L(A)' = A'.
(c) (AuB)'=A'nB'.
(d) A c A".
(e) If W is a subspace of a finite-dimensional vector space V, then
dimV = dimW + dimW1.

Proof (a)—(d) are straightforward, and we leave their proofs as exercises. We


prove (e). Let .. , be a basis of W. We extend this set to a basis
= .., of V. Thus, dimW = m and dimv = n. Let
= be a dual basis of We complete the proof of (e) by arguing
that is a basis of W'.
If m + 1 cj (n, then cxf(cz1) = 0 for i = 1,...,m. In particular,
Since c is linearly inde-
pendent over F. We must show L({c4+1,. .., = W'. Let fleW'. Then
J3 = Since . .,cxmeW, we have for any j = 1,...
DUALS AND ADJOINTS 49

0 = = (E?= i = = Thus, 11 = 1 e
S
If T e W), then T determines a linear transformation
T* e HomF(W*, Vt), which we call the adjoint of T.
Definition 6.11: Let T e W). Then T* e Homr(W*, V*) is the linear
transformation defined by T*(f) = IT for all fe
W the linear transformations f and T is
again a linear map from V to F, we see T*(f) e V*. If x, ye F and f1, f2 e then
T*(xf1 + yf2) = (xf1 + yf2)T = x(f1T) + y(f2T) = xT*(f1) + yT*(f2). Thus, T* is a
linear transformation from
W The T* from
T —.
—. Homr(W*,V*) is an injective transformation. If V and W are
finite dimensional, then this map is an isomorphism.

Proof Let x: Hom(V, W) —' Hom(W*, V*) be defined by x(T) = T*. Our
comments above imply x is a well-defined function. Suppose x,y e F,
T1, T2 e Hom(V, W), and fe Then x(xT1 + yT2)(f) = (xT1 + yT2)*(f) =
gxT1 + yT2) = xQT1) + yQT2) = xTt(f) + = (xTt + yTfl(f) =
(xx(T1) + Thus, x(xT1 + yT2) xx(T1) + and x is a linear
transformation.
Suppose T e ker x. Then for every fe W*, 0 = x(T)(f) = T*(f) = IT. Now if we
follow the same argument given in the proof of Theorem 6.7, we know that if fi is
a nonzero vector in W, then there exists an fe W* such that f(fl) 0. Thus,
11' = 0 for all fe W* implies Im T = (0). Therefore, T = 0, and x is injective.
Now suppose V and W are finite dimensional. Then Theorems 6.2 and 3.25
imply W)} = dim{HomF(W*, V*)}. Since x is injective, Theorem
3.33(b) implies x is an isomorphism. 5

We note in passing that forming the adjoint of a product is the product of


the adjoints in the opposite order. More specifically, suppose T e HomF(V, W)
and Sc Z). Then STe HomF(V, Z). If fe Z*, then (ST)*(f) =
f(ST) = (fS)T = T*(fS) = T*(S*(f)) = T*S*(f). Thus, we get equation 6.13:

6.13:

(ST)* = T*S*

The connection between adjoints and Theorem 6.10 is easily described.


Theorem 6.14: Let T e HomF(V, W). Then

(a) (Im T*)± = ker T.


(b) ker T* = (Im T)'.
50 LINEAR ALGEBRA

Proof (a) Let e(Im and suppose co: V x —' F is the bilinear map
defined in equation 6.6. Then o(rz, Im Ti = 0. Thus, for all fe
o = o(cz, T*(f)) = fT) = fT(cz) = gT(x)). But we have seen that
gT(tx)) = 0 for all feWt implies T(x) = 0. Thus, e ker T. Conversely,
if e ker T, then 0 = f(T(cz)) = w(x, Tt(f)) and e(Im T*)±.
(b) Suppose fe ker T*. Then 0 = T*(f) = fT. In particular, =
O for all e V. Therefore, 0 = f) and fe(Im T)'. Thus,
ker T* (Im T)-'-. The steps in this proof are easily reversed and so
fl
Theorem 6.14 has an interesting corollary. If T e HomF(V, W), let us define
the rank of T, rk{T}, to be dim(Im T). Thus, rk{T} = dim(Im T). Then we have
the following:

Corollary 6.15: Let V and W be finite-dimensional vector spaces over F, and let
TeHomF(V,W). Then rk{T} = rk{T*}.

Proof The following integers are all equal:

rk{T} = dim(Im T) = dim V — dim(ker T) [Theorem 3.33(c)]


= dim V— dim{(Im T*)±} [Theorem 6.14]
= dim V* — dim{(Im T*)±} [Theorem 6.2]
= dim(Im T*) [Theorem 6.10(e)]
= rk{T*} E

Corollary 6.15 has a familiar interpretation when we switch to matrices. If


is any basis of V and /3 any basis of W, then Theorem 3.25 implies
rk{T} = rk(f(oc, /3)(T)). Let A = JTfr, fl)(T). In Theorem 6.16 below, we shall show
that the matrix representation of T*: with respect to /3* and is given
by the transpose of A. Thus, f(/J*, cx*)(T*) = At. In particular, Corollary 6.15 is
the familiar statement that a matrix A and its transpose At have the same rank.
Theorem 6.16: Let V and W be finite-dimensional vector spaces over F. Suppose
x and are bases of V and W, respectively. Let ct* and 13* be the corresponding
dual bases in and W*. Then for all T e HomF(V, W), we have

6.17:

f(fJ*, = fl)(T))t

Proof Suppose = ., .and /3 = •, .


Set A= fl)(T). Then
A= e Mm x JF), and from 3.24, we have = for all
j=1,...,n.
EXERCISES FOR SECTION 6

12(13*, x*)(T*) is the n X m matrix that makes the following diagram


commute:
T*

r(p, a*XT*)
Mmxi(F) -

The transpose of A is the n X m matrix At = (bpq), where bpq = aqp for all
p = 1,. . , n, and q = 1,..., m. It follows from 3.24 that f(f3*,
. = At
provided that the following equation is true:

forall q=1,...,m
Fix q = 1,..., m. To show that T*(f3) and are the same vector in
it suffices to show that these two maps agree on the basis of V. For any
r = 1,..., n, (T*(f3))(;) = 13'(T@r)) = = >JtI a1fi($1) = aqr. On
the other hand, i = i = brq = aqr. Thus, equation
6.18 is established, and the proof of Theorem 6.16 is complete. E

EXERCISES FOR SECTION 6

(1) Prove (a)—(d) in Theorem 6.10.

(2) Let V and W be finite-dimensional vector spaces over F with bases and fi,
respectively. Suppose T HomF(V, W). Show that rk{T} = rk(F(x, f3)(T)).
(3) Let 0 fJeV and feV* — (0). Define T: V —* V by T(cz) = f(cz)fi A func-
tion defined in this way is called a dyad.
(a) Show T e Hom(V, V) such that dim(Im T) = 1.
(b) If 5€ Hom(V, V) such that dim(Im 5) = 1, show that S is a dyad.
(c) If T is a dyad on V, show that Tt is a dyad on V*.
(4) Let V and W be finite-dimensional vector spaces over F. Let V -÷ V**
and W —* W** be the isomorphisms given in Theorem 6.7. Show that
for every T e Hom(V, W) the following diagram is commutative:

T**
V** )
52 LINEAR ALGEBRA

(5)

Let A = {f1, . , . fj
c V* and suppose g e V* such that g vanishes on A'.
.

Show g e MA). [Hint: First assume dim(V) < co; then use Exercise 3 of
Section 5 for the general case.]
(7) Let V and W be finite-dimensional vector spaces over F, and let
ox V x W —* F be an arbitrary bilinear map. Let T: V —' W* and
5: W -÷ V* be defined from ca as follows: T(cx)(/1) = co(cx, /3) and
s(fl)(cx) = co(cx, /3). Show that S = T* if we identify W with W** via
(8) Show that WI.
(9) Let V be a finite-dimensional vector space over F. Let W = V ® V*. Show
that the map (cx,f1) —' (/3, cx) is an isomorphism between W and W*.
(10) If
5 T
O-.V —>W

is a short exact sequence of vector spaces over F, show that

T* 5*

is exact.
(11) Let {W1 lie Z} be a sequence of vector spaces over F. Suppose for each
i eZ, we have a linear transformation e1 e HomF(WI, Then
D = {(W1, e1) lie 7L} is called a cochain complex if e1 + = 0 for all i eZ. D
is said to be exact if Ime1 = for all ie7L.
(a) If C = {(C1, d3liel} is a chain complex, show that C* =
{(Cr, e1 = lie Z} is a cochain complex.
(b) If C is exact, show that C* is also exact.
(12) Prove
V be a finite-dimensional vector space over F with basis =
Define by Show that
T*(f)
= (f)? for all fe V*. Here you will need to identify with in a
natural way.
(14) Let {z1}t0 be a sequence of complex numbers. Define a map T: C[X] -. C
by =0 akX9 = >3 = 0 akzk. Show that T e(C[X])*. Show that every
Te(C[X])* is given by such a sequence.
(15) Let V = R[X]. Which of the following functions on V are elements in V*:
(a) T(p) = Sb p(X) dx.
(b) T(p) = Sb p(X)2 dx.
SYMMETRIC BILINEAR FORMS 53

(c) T(p) = Sb X2p(X) dx.


(d) T(p) = dp/dX.
(e) T(p) =
(16) Suppose F is a finite field (e.g., F1,). Let V be a vector space over F of
dimension n. For every m n, show the number of subspaces of V of
dimension m is precisely the same as the number of subspaces of V of
dimension n — m.
(17) An important linear functional on Mn x JF) is the trace map
Tr: Mn x JF) -÷ F defined by Tr(A) = a11 where A = Show that
TrQe(Mn xn(F))*.
(18) In Exercise 17, show Tr(AB) = Tr(BA) for all A, BE Mn x
(19) Let m,neNJ. Let Define
T
T e HomF(Fn, is given in this way for some f1
(20) Let V be a finite-dimensional vector space over C. Suppose cii,.. . , ci,, are
distinct, nonzero vectors in V. Show there exists a T e V* such that

7. SYMMETRIC BILINEAR FORMS

In this last section of Chapter I, we discuss symmetric bilinear forms on a vector


space V. Unlike the first six sections, the nature of the base field F is important
here. In our main theorems, we shall assume V is a finite-dimensional vector
space over the reals It
Let V be a vector space over an arbitrary field F.
Definition 7.1: By a bilinear form co on V, we shall mean any bilinear map
co: V x V —. F. We say co is symmetric if w(cz, /3) = co(/3, ci) for all ci, /3e V.

Example 7.2: The standard example to keep in mind here is the form
xj, (y1,..., yj) = x1y1. Clearly, o is a symmetric, bilinear form
onF". J
Suppose cv is a bilinear form on a finite-dimensional vector space V. Then for
every basis = {oc1,..., ci,,} of V, we can define an n x n matrix
M(w, ci)E whose (i,j)th entry is given by {M(co, = cii). In terms
of the usual coordinate map V —, M,, 1(F), cv is then given by the following
equation:

7.3:

co(/3, ö) =
54 LINEAR ALGEBRA

Clearly, w is symmetric if and only if M(w, is a symmetric matrix.

Definition 7.4: Suppose w is a bilinear form on V. The function q: V —' F defined


by = is called the quadratic form associated with w.

If V is finite dimensional with basis x = {x1,. .., then equation 7.3


implies = Here (x1, . . , xJ = and
= .

= M(co, x). Thus, is a quadratic homogeneous polynomial in the


coordinates x1,..., of That fact explains why q is called a quadratic form
on V. In Example 7.2, for instance, q((x1,..., xj) =
At this point, a natural question arises. Suppose cv is a symmetric, bilinear
form on a finite-dimensional vector space V. Can we choose a basis of V so
that the representation of cv in equation 7.3 is as simple as possible? What would
the corresponding quadratic form q look like in this representation? We shall
give answers to both of these questions when F = It For a more general
treatment, we refer the reader to [2].
For the rest of this section, we assume V is a finite-dimensional vector space
over R. Let cv be a symmetric, bilinear form on V.

Definition 7.5: A basis x = {x1,..., of V is said to be cv-orthonormal if

(a) co(x1, = 0 whenever i j, and


(b) ct*x1,x1)e{—1,0, 1}foralli= 1,...,n.
In Example 7.2, for instance, the canonical basis Ô = = (0,...,
1,..., = 1,..., n} is an cv-orthonormal basis of R". Our first
theorem in this section guarantees co-orthonormal bases exist.

Theorem 7.6: Let V be a finite-dimensional vector space over R and suppose cv


is a symmetric, bilinear form on V. Then V has an co-orthonormal basis.

Proof We proceed via induction on n = dim V. If V = (0), then the result is


trivial. So, suppose n = 1. Then any nonzero vector of V is a basis of V. If
w(x, = 0 for every e V, then any nonzero vector of V is an co-orthonormal
basis. Suppose there exists a fi e V such that co(fJ, # 0. Then c =
- 1/2 is a positive scalar in and {cfJ} is an cv-orthonormal basis of V.
Ico(fl, 11)1

Thus, we have established the result for all vector spaces of dimension I
over It
Suppose n> 1, and we have proved the theorem for any vector space over R
of dimension less than n. Since cv is symmetric, we have

7.7:

forall
SYMMETRIC BILINEAR FORMS 55

In equation 7.7, q is the quadratic form associated with w. Now if


w(x, x) = qQz) = 0 for all x e V, then 7.7 implies ca is identically zero. In this case,
any basis of V is an co-orthonormal basis. Thus, we can assume there exists a
nonzero vector JJeV such that co(/3, /3) 0. As in the case n = 1, we can then
adjust /3 by a scalar multiple if need be and find an ocr, 0 in V such that
1}.
Next define a linear transformation fe V* by
f is a nonzero map. Set N = ker f. Since f 0, and
P= f is surjective. Thus, Corollary 5.17 implies
1, P. In particular,
Theorem 5.14 implies dim N = dim V — 1. co when restricted to N is clearly a
symmetric bilinear form. Hence our induction hypothesis implies N has an cu-
orthonormal basis {oc1,. . . ,
= {x1,... ,
j.
is an co-orthonormal basis of V. Since
We claim
f(Q 0, oç, N. In particular, is linearly independent over P. Since
dimp(V) = n, is a basis of V. Conditions (a) and (b) of Definitions 7.5 are
satisfied for {x1,..., since this set is an co-orthonormal basis of N. Since
N = ker f, w(x1, xj = 0 for i = 1,..., n — 1. Thus, is an co-orthonormal basis
of V and the proof of Theorem 7.6 is complete. D

The existence of co-orthonormal bases of V answers our first question about


representing co. Suppose = . . . , ç} is an co-orthonormal basis of V. Then

the matrix M(co, is just an n x n diagonal matrix,


with q(x1)=w(x1,x1)e{—1,0, 1}. If iieV with xjt and
= yJ, then equation 7.3 implies 'i) = By
reordering the elements of if need be, we can assume

1 for i=1,...,p
q(ocj= —1 for i=p+1,...,p+m
0 for i=p+m+1,...,p+m+r
The vector space V then decomposes into the direct sum V = V1 ® V0 V1,
where and V1=
L({cz1,..., ;}).
Our quadratic form q is positive on V1 — (0), zero on V0, and negative on
V_1 — (0). For example, suppose fleV_1 — (0). Then /3 = +
for some xj,...,xmeF. Thus,
Since /3 0, some x1 is nonzero. Since = — 1 for all i = 1,. . . , m, we see
q(/3) <0.
The subspaces V1, V0, and are pairwise co-orthogonal in the sense that
co(V1, = 0 whenever i,j e { —1,0, 1} and i j. Thus, any co-orthonormal basis
of V decomposes V into a direct sum V = V_1 El? V0 ® V1 of pairwise co-
orthogonal subspaces The sign of the associated quadratic form q is constant
56 LINEAR ALGEBRA

on each — (0). An important fact here is that the dimensions of these three
subspaces, p, m, and r, depend only on cv and not on the particular cv-
orthonormal basis cx chosen.

Lemma 7.9: Suppose fi = { . , is a second cv-orthonormal basis of V,


.

and let V = W_1 ® W0 W1 be the corresponding decomposition of V. Then


= forj = —1,0,1.

Proof? W - is the subspace of V spanned by those for which q(fl1) = —1. Let
cxeW_1 n(V0 + V1). If cx 0, then <0 since cxeW_1. But cxeV0 + V1
implies q(cx) 0, which is impossible. Thus, ci = 0. So, W_1 n (V0 + V1) = (0).
By expanding the basis of W_1 if need be, we can then construct a subspace P of
V such that W_1 P, and P®(VO + V1) = V. Thus, from Theorem 4.9, we
have dim(W_ dim P = dim V — dim V0 — dim V1 = dim(V_ 1). Therefore,
dim(W_ C dim V_1. Reversing the roles of the W1 and V1 in this proof gives
dim(V_ dim(W_ Thus, dim(W_ = dim(V_ A similar proof shows
dim(W1) = dim(V1). Then dim(W0) = dim(V0) by Theorem 4.9. This completes
the proof of Lemma 7.9. D

Let us agree when discussing co-orthonormal bases of V always to order the


basis elements cc e according to equation 7.8. Then Lemma 7.9 implies that the
integers p, m, and r do not depend on but only on cv. In particular, the
following definition makes sense.

Definition 7.10: p — m is called the signature of q. p + m is called the rank of q.

We have now proved the following theorem:

Theorem 7.11: Let cv be a symmetric, bilinear form on a finite-dimensional


vector space V over R. Then there exists integers m and p such that if
= is any cv-orthonormal basis of V and = (x1,..., xjt, then
E -

A quadratic form q, associated with some symmetric bilinear form cv on V, is


said to be definite if = 0 implies = 0. For instance, in Example 7.2,
q((x1,..., xj) = is definite when F = R. If F = C, then q is not definite
since, for example, q((1, 0,..., 0)) = 0.
If q is a definite quadratic form on a finite-dimensional vector space V over R,
then Theorem 7.11 implies > 0 for all e V — (0) or <0 for all
e V — (0). In general, we say a quadratic form q is positive definite if > 0 for
all —(0). We say q is negative definite if <0 for all —(0).

Definition 7.12: Let V be a vector space over R. A symmetric, bilinear form cv on


V whose associated quadratic form is positive definite is called an inner product
onV.
EXERCISES FOR SECTION 7 57

Note in our definition that we do not require that V be finite dimensional. We


finish this section with a few examples of inner products.

Example 7.13: Let V = and define w as in Example 7.2. LI

Example 7.14: Let V = R, and define w by @((x1, x2,...),


(y1, y2,...)) = x1y1. Since both sequences {x1} and {y1} are eventually zero,
o is well defined and is clearly an inner product on V. El

Example 7.15: Let V = C([a, b]). Define o(f, g) = f(x)g(x) dx. Clearly, o is an
inner product on V. El

We shall come back to the study of inner products in Chapter V.

EXERCISES FOR SECTION 7

(1) In our proof of Lemma 7.9, we used the following fact: If W and W' are
subspaces of V such that W n W' = (0), then there exists a complement of
W' that contains W. Give a proof of this fact.
(2) I2tV = Mmxn(F),andletCeMmxm(F).DefinealllapCtXV x V—'Fbythe
formula o(A, B) = Tr(AtCB). Show that o is a bilinear form. Is o
symmetric?
(3) Let Define a map o:VxV-.F by o(A,B)=nTr(AB)
— Tr(A) Tr(B). Show that o is a bilinear form. Is o symmetric?
(4) Exhibit a bilinear form on R" that is not symmetric.
(5) Find a symmetric bilinear form on C" whose associated quadratic form is
positive definite.
(6) Describe explicitly all symmetric bilinear forms on BV.
(7) Describe explicitly all skew-symmetric bilinear froms on DV. A bilinear
form o is skew-symmetric if o(x, fi) = — o(fl, 4
(8) Let ox V x V —. F be a bilinear form on a finite dimensional vector space V.
Show that the following conditions are equivalent:
(a) {xeVlo(; fi) = 0 for all fleV} =(0).
(b) {xeVIo($,oc)=OforallfleV} =(0).
(c) M(o, isnonsingular for any basis of V.
We say o is nondegenerate if o satisfies the conditions listed above.
(9) Suppose o: V X V —. F is a nondegenerate, bilinear form on a finite-
dimensional vector space V. Let W be a subspace of V. Set
W'={czeVIo(oc,fl)=Oforall$eW}.ShowthatV=WEBW'.
58 LINEAR ALGEBRA

(10) With the same hypotheses as in Exercise 9, suppose fe V*. Prove that there
exists an e V such that f(fl) = @(x, /3) for all /3eV.
(11) Suppose cv: V x V -. F is a bilinear form on V. Let W1 and W2 be
subspaces of V. Show that (W1 + W2)' = Wt n W±, If cv is nondegen-
erate, prove that (W1 n W2)' = Wf +
(12) Let cv be a nondegenerate, bilinear form on a finite-dimensional vector
space V. Let cv' be any bilinear form on V. Show there exists a unique
Te V) such that co'(oc, /3) = cv(T(cz), /3) for all x, 13€ V. Show that cv'
is nondegenerate if and only if T is bijective.
(13) With the same hypotheses as in Exercise 12, show that for every
T e HomF(V, V) there exists a unique T' e V) such that
cv(T(à), /3) = cv(cz, T'(/J)) for all at, /3eV.
(14) Let Bil(V) denote the set of all bilinear forms on the vector space V. Define
addition in Bil(V) by (cv+ co%x, /3) = cv(at, /3) + cv'(at, /3), and scalar mult-
iplication by (xcv)(cz, /3) = xcv(at, /3). Prove that Bil(V) is a vector space over
F with these definitions. What is the dimension of Bil(V) when V is finite
dimensional?
(15) Find an co-orthonormal basis for R2 when cv is given by cv((x1, y1),
(x2, y2)) = x1y2 + x2y1.

(16) Argue that cv(f, g) = f(x)g(x)dx is an inner product on C([a, b]).


(17) Let Suppose cv:VxV-÷R is given by
cv(f, g) = f(x)g(x)dx. Find an cv-orthonormal basis of V.

(18) Let V be the subspace of C([—ir, it]) spanned by the functions 1, sin(x),
cos(x), sin(2x), cos(2x),..., sin(nx), cos(nx). Find an co-orthonormal basis of
V where cv is the inner product given in Exercise 16.
Chapter II

Multilinear Algebra

1. MULTILINEAR MAPS AND TENSOR PRODUCTS

In Chapter I, we dealt mainly with functions of one variable between vector


spaces. Those functions were linear in that variable and were called linear
transformations. In this chapter, we examine functions of several variables
between vector spaces. If such a function is linear in each of its variables, then
the function is called a multilinear mapping. Along with any theory of multilinear
maps comes a sequence of universal mapping problems whose solutions are the
fundamental ideas in multilinear algebra. In this and the next few sections, we
shall give a careful explanation of the principal constructions of the subject
matter. Applications of the ideas discussed here will abound throughout the rest
of the book.
Let us first give a careful definition of a multilinear mapping. As usual, F will
denote an arbitrary field. Suppose V1,..., V are vector spaces over F.
Let qS: V1 x x V be a function from the finite product V1 x x to
V. We had seen in Section 4 of Chapter I that a typical vector in V1 x x is

an n-tuple (x1,. .., aj with a1 e V1. Thus, we can think of 4) as a function of the n
variable vectors a1,..., oç1.

Definition 1.1: A function 0: V1 x x V is called a multilinear mapping


if for each 1,.. ., n, we have

(a) + xj = Ø(a1,..., aj + ., ..,


and
(b)

59
60 MULTILINEAR ALGEBRA

forj i,and xeF.Thus,afunction4:V1 x x -÷V


is a multilinear mapping if for all i e {1,..., n} and for all vectors a1 e V1,
a1_1eV1_1, we have 4(a1,...,a1_1,
aje HomF(Vl, V). Before proceeding further, let us give a few examples of
multilinear maps.

Example 1.2: If n = 1, then a function V1 —÷ V is a multilinear mapping if and


only if / is a linear transformation. Thus, linear transformations are just special
cases of multilinear maps. S

Example 1.3: If n = 2, then a multilinear map V1 x V2 —÷ V is what we called


a bilinear map in Chapter I. For a concrete example, we have w: V x —* F

given by w(a, T) = T(a) (equation 6.6 of Chapter I). S

Example 1.4: The determinant, det(A), of an n x n matrix A can be thought of


as a multilinear mapping 4): F x x F —÷ F in the following way: If
a1 = (a11,..., for i = 1,..., n, then set Ø(a1,..., aj = The fact
that 4) is multilinear is an easy computation, which we leave as an exercise at the
end of this section. 5

Example 1.5: Suppose A is an algebra over F with multiplication denoted by


afi for a, fleA. Let n i 2. We can then define a function 4u: A by
a multilinear mapping. 5

If 4): V1 x x V is a multilinear map and T is a linear transformation


from V to W, then clearly, T4): V1 x x —> W is again a multilinear map.

We can use this idea along with Example 1.5 above to give a few familiar
examples from analysis.

Example 1.6: Let I be an open interval in l1. Set = C"(I). Thus,


consists of those fe C(I) such that f is infinitely differentiable on I.
Clearly, is an algebra over when we define vector addition
[(f + gXx) = f(x) + g(x)], scalar multiplication [(yf)(x) = yf(x)], and algebra
multiplication [(fg)(x) = f(x)g(x)] in the usual ways. Let D: —÷ C91) be the

function that sends a given fe to its derivative f'. Thus, D(f) = f'. Clearly,

Let n e 1%J. Define a map 4): —÷ by Ø(f1, ..., = D(f1 .. . fj.


Our comments immediately proceeding this example imply that 4) is a
multilinear mapping. S

Example 1.7: Let [a, b] be a closed interval in k and consider C([a, b]). Clearly,
C([a, b]) is an k-algebra under the same pointwise operations given in Example
1.6. We can define a multilinear, real valued function c/i: C([a, —÷ R by
MULTILINEAR MAPS AND TENSOR PRODUCTS 61

Let us denote the collection of multilinear mappings from V1 x x


V by Mu1F(Vl x x V). If Z = V1 x x then clearly,
MulF(Vl x x V) is a subset of the vector space Vz. In particular, if
f, g e Mu1F(Vl x x Vi,, V) and x, ye F, then xf + yg is a vector in VZ. A simple
computation shows that xf + yg is in fact a multilinear mapping. This proves the
first assertion in the following theorem:

Theorem 1.8: Let V1,..., V be vector spaces over F. Set Z =

(a) Mu1F(VI x x V) is a subspace of


(b) If n 2, {Mu1F(Vl x x V,,, V)} n {HomF(Vl x x V,,, V)} = (0).

Proof We need prove only (b). Suppose 4): V1 x x V is a multi-


linear mapping that is also a linear transformation. Fix i = 1,. .., n.
Since 4) is multilinear, we have 4)(a1,..., + .., aj = 4*x1,..., a,J
-

+ qS(a1,..., aj. Since 4) is linear, we have 4*x1,...,a1 + at,..., =


at,..., as,) + 4)(O,..., at,..., 0). Comparing the two results
gives aj=440,..., a1,..., 0). Again since 4) is linear, we have
4(a1,..., as,..., 0). Thus, 44a1,..., a1_1, 0,
...,aj=O.
Now n 2, the are arbitrary, and so is the index i. Therefore, for any
(a1'..., ajeV1 we have aj=4)(O, a2,...,
+4)(a1,0,...,O)=0+O=0. ci

Theorem 1.8(b) says that in general (i.e., when n ) 2) a nonzero, multi-


linear mapping 4): V1 x x V is not a linear transformation from
V1 x x to V, and vice versa. We must always be careful not to confuse
these two concepts when dealing with functions from V1 x x to V.
We have seen in Examples 1.6 and 1.7 that one method for constructing
multilinear mappings on V1 x x is to choose a fixed multilinear map
from V1 x x V and then compose it with various linear trans-
formations from V to other vector spaces. A natural question arises here. Can we
construct (with possibly ajudicious choice of V) all possible multilinear maps on
V1 x x by this method? This question has an affirmative answer, which
leads us to the construction of the tensor product of V1,..., Before
proceeding further, let us give a precise statement of the problem we wish to
consider.

1.9: Let V1,. .., be vector spaces over F. Is there is a vector space V (over F)
and a multilinear mapping 4): V1 x x V such that if
ci,: V1 x x —> W is any multilinear mapping on V1 x x then there
exists a unique linear transformation T e HomF(V, W) with TØ =
62 MULTILIN EAR ALGEBRA

Question 1.9 is called the universal mapping problem for multilinear


mappings on V1 x x In terms of commutative diagrams, the universal
problem can be stated as follows: Can we construct a multilinear map
4): V1 x x V with the property that for any multilinear map
Vi: V1 x x —÷ W there exists a unique T e HOmF(V, W) such that the
following diagram is commutative:

1.10:

v1x...xVn

Notice that a solution to 1.9 consists of a vector space V and a multilinear map
4': V1 x x —> V. The pair(V, 4)) must satisfy the following property: If W is

any vector space over F and i/i: V1 x x —÷ W is any multilinear mapping,

then there must exist a unique, linear transformation T: V —÷ W such that 1.10
commutes.
Before constructing a pair (V, 4)) satisfying the properties in 1.9, let us make
the observation that any such pair is essentially unique up to isomorphism. To
be more precise, we have the following lemma:

Lemma 1.11: Suppose (V, 4') and (V', 4)') are two solutions to 1.9. Then there
exist two isomorphisms T1 e HomF(V, V') and T2 e HomF(V', V) such that

(a) T1T2 = = and


(b) the following diagram is commutative:

1.12:

V1x...xVn V

T1

Proof? Since (V, 4)) is a solution to 1.9 and 4": V1 x x —' V' is a multi-
linear map, there exists a unique T1 eHomF(V, V') such that T14' = 4".
Similarly, there exists a unique T2 e V) such that T24)' = 4. Putting
MULTILINEAR MAPS AND TENSOR PRODUCTS 63

the two obvious diagrams together, we get


1.13:

vIx...xvn

is commutative. Now in diagram 1.13, we can replace T2T1 with L, the identity
map on V. Clearly, the diagram stays commutative. Since (V, 0) satisfies 1.9,
there can be only one linear transformation from V to V making 1.13
commutative. We conclude that T2T1 = L,,. Similarly, T1T2 = and the proof
of the lemma is complete. U
Thus, if we find any solution to 1.9, then up to isomorphism we have found
them all. We now turn to the matter of constructing a solution. We need to recall
a few facts about direct sums.
Suppose A is a nonempty set. Then we can construct a vector space U over
F and a bijective map p: A —> U such that p(A) is a basis of U. To see this, set
U= F. Thus, U is the direct sum of Al copies of F. For each i eA, let be
the vector in U defined by = 0 ifj i, and ö1(i) = 1. It follows from Theorem
4.13 in Chapter I that B = {ö1li E A} is a basis of U. The map 'p: A —> B given by
qi(i) = is clearly bijective.
Now suppose A itself is a vector space over F. Then in U = F, we have
vectors of the form ö(j1+...+ij — — — and —
x e F. We shall employ these ideas in the construction of a solution to 1.9.
Let V1,..., Vn be vector spaces over F, and, for notational convenience, set
Z= V1 X x A typical element in the set Z is an n-tuple of the form
(ott,..., with eV1. Set U = ajeZ F. Thus, U is the direct sum of IZI
copies of F. As we observed above, U has a basis of the form
eQ V1 x x Vj. Let U0 be the subspace of U spanned
by all possible vectors of the following two types:

1.14:

a,,) — a,,..., a,) a',..., a,,)

and

xaj a,,) — Xt$(a1 a,,..., a,,)

In 1.14, i can be any index between 1 and n,


tXn) any elements of Z and x any scalar in F.
64 MULTILINEAR ALGEBRA

Set V = U/U0, the quotient space of U by U0. There is a natural map


x x given by = + U0. Thus,
czj is just the coset in U/U0 containing the vector We can
now prove the following lemma:
Lemma 1.15: (V, 0) is a solution to 1.9.

Proof Clearly, V is a vector space over F. We must first argue that 4) is a


multilinear mapping. This follows immediately from 1.14 and the definition of
U0. We have

= (ö(cx1 + + U0
= (ô(; + U0) + (kxi a) + U0)

Also,

= xa1 aj ÷ U0 = + Uo
= X(ö(a1 + U0)

Thus, 4) is a multilinear mapping.


Now suppose W is another vector space over F and ç&: V1 x x —> W
a multilinear mapping. We must construct a unique linear transforma-
tion T: V —÷ W such that T4 = çb. To do this, we recall that B =
{b(a1,..., ajI(iXi,. . , x x is a basis of U. It follows from 3.23 of
Chapter I that there exists a unique T0 e HomF(U, W) such that To(b(ai =
for all (x1,..., Since i/i is multilinear, T0 van-
ishes on the generators (1.14) of U0. Thus, U0 ker T0. It now follows
from the first isomorphism theorem (Theorem 5.15 of Chapter I) that T0 in-
duces a linear transformation T: U/U0 -÷ W such that T(ö(a1 + U0) =
To(ô(a1 a)) = ifr(a1,..., for all (;,..., cc1) e Z. Since 4*x1,..., aj =
+ U0, we have =
Finally, suppose T' e HomF(V, W) and T'4) = cli. We must argue T' = T. Since
T'q5 = Tq5, we see T = T' on Im 4$. From our definitions, L(Im 4$)= V. There-
fore, T = T', and the proof of Lemma 1.15 is complete. S

Definition 1.16: The vector space U/U0 is called the tensor product of V1,...,
(over F) and will henceforth be denoted V1 ®F

When the field F is clear from the context, we shall drop it from our notation
and simply write V1 ® ® for the tensor product of V1,. . , .
MULTILINEAR MAPS AND TENSOR PRODUCTS 65

Definition 1.17: A coset b(a1 + U0 in the tensor product U/U0 =


V1 ®® will henceforth be written 0 0 an.

With these changes in notation, our multilinear map 4': V1 x


x ®® is given by 4'(a1,...,aj=a1 ®® We
shall refer to 4' as the canonical map of V1 x x into V1 ® 0 Since
4' is multilinear, we have the following relations in V1 0 ®

and

We also know from our construction of U/U0 that V1 ® ® V,, is spanned by


the image of 4'. Thus, every vector in V1 ® ® V,, is a finite sum of the form
Here aijeVi,...,atheVn for all i = 1,...,r.

Finally, let us restate Lemma 1.15 using our new notation.

Theorem 1.19: Let V1,..., Vn and W be vector spaces over F. Suppose


i/i: V1 x x Vn -÷ W is a multilinear mapping. Then there exists a unique
linear transformation TEHomF(Vl ® ® V,,, W) such that
E
We shall discuss various functorial properties of V1 ® in Section
2. But at this point, having introduced a new vector space V1 ® 0 we
want to at least give a basis of this space.

Theorem 1.20: Let V1,..., Vn be vector spaces over F, and suppose B1,..., Bn
are bases of V1,.. ., respectively. Then B = ®... ® e B1} is a basis

Proof We prove this theorem by using Lemma 1.11. Consider the set
B1 X X Bn={(fli,...,/JJIPjEB1}. Let V'= $qi1 We
have seen from our previous discussion that V' is a vector space over F
with basis ,fJjeB1 x
.
x Bn}. We define a function
.

x x by Now B1 X X Bnc
\"j X X Vn and each is the linear span of It follows that there
exists a unique multilinear function 4": V1 x x Vn —÷ V' such that
ø'(fli,...,fln)4'o(Pi,...,Pn)forall(fli,...,fln)GB1 X X Bn.
66 MULTILINEAR ALGEBRA

We claim that (V', 4?) satisfies 1.9. To see this, let i/i: V1 x x —, W be an
arbitrary multilinear mapping. Since 1 . , fin) e B1 x x Bn} is a
basis of V', it follows from 3.23 of Chapter I that there exists a unique linear
. , fij
transformation T: V' -÷ W such that T(ô(p1 /1)) = for all
(fir,. x x Bn. Then Tq5' = 1/i and clearly T is the unique linear
transformation for which this happens.
We now have two pairs (V', 4?) and (V1 ® ... ® 4) satisfying 1.9. Hence,
Lemma 1.11 implies there exists an isomorphism SeHomF(V', %T1 ® ... ® Vj
such that

v1x...xVn

V1®...®Vn

is commutative. Now for all fijeB1 x x Bn, we have


flu ® ® fin = flJ = Sø'(Th,..., /1,,) = pa). Since S is an iso-
morphism, it maps any basis in V' to a basis in V1 ® We conclude that
LI

Corollary 1.22: Suppose V1,... , are finite-dimensional vector spaces


over F. Let m1 = dim V1. Then V1 0 ® is finite dimensional, and
LI

In the exercises at the end of this section, the definition of an algebra


homomorphism will be needed. The definition of an associative algebra A (with
identity) over the field F was given in Definition 4.19 of Chapter I. Suppose A1
and A2 are two algebras over F.

Definition 1.23: A function p: A1 —÷ A2 is called an algebra homomorphism if


A2) and pOxfi) = -p(a)qi(fi) for all at,

Example 1.24: Let V be any vector space over F, and consider the two algebras
A1 = F[X] and A2 = t(V). Then every T e t(V) determines an algebra homo-
morphism A1 —÷ A2 defined as follows:

(pT(ao + a1X + + anXn) = + a1T + .. + anTn


The fact that is an algebra homomorphism is easy. Note that
EXERCISES FOR SECTION 1 67

'PT(l)= T° Thus, PT sends the multiplicative identity 1 of F[X] to the


multiplicative identity L,, of 1(V). fl

If the reader prefers matrices to linear transformations, we can construct a


similar algebra homomorphism Qc: A1 = F[X] —+ A3 = JF) for any matrix
Ce A3. Set Qda0 + a1X + + = a01 + a1C + + We shall use
these two types of algebra homomorphisms extensively in Chapter III. Other
examples of algebra maps will be considered in the exercises at the end of this
section.

EXERCISES FOR SECTION 1

(1) Complete the details of Example 1.4, that is, argue cz11) = is
a multilinear mapping.
(2) In the proof of Theorem 1.20, we used the following fact: if B1 is a basis of
and B1 x - - x —÷ V' is a set map, then 4o has a unique extension to

a multilinear map 4": V1 x x —÷ V1. Give a proof of this fact.


(3) Suppose V1,..., V are finite-dimensional vector spaces
over F with dim V1 = m1 and dim V = p. Show that
dimF{MuIF(Vl x x V)} = pm1
(4) Let (p:Fm x be defined as follows: Ifo=(xi,...,xm)eFm
and fi = (y1,. .., yj e let qi(a, fJ) be the m x n matrix whose (i,j)th entry
is Show that is a multilinear (i.e., bilinear) mapping.
(5) Let V1 = Mpxq(F), V2 Mmxn(F) and V3 = Mpmxqn(F). Define a map
4?: V1 x V2 -÷ V3 as follows: If A = V1 and B = (brs) e let p(A, B)
be the pm x qn matrix defined by the following block decomposition:

jttiiBHIaiqB
\apiBL"IapqB

Show that p is a bilinear mapping from V1 x V2 to V3. p(A, B) is usually


written A ® B and is called the Kronecker product of A and B.
(6) Show that MulF(Vl x x HomF(Vl ® ® V). Does this
give a simple proof of Exercise 3?
(7) Suppose V1,..., are vector spaces over F and for each i = 1,..., n,
let f1eVr. Show that i/rV1 x x given by çlr(cz1,...,xj=

is a multilinear mapping.
(8) Give an example of a multilinear mapping p: V1 x x V such that
68 MULTILINEAR ALGEBRA

(9) A vector a e V1 ®" ® is said to be decomposable if elm 0. Here is


the canonical map from V1 x x to V1 ®' ® Are all vectors in
V1 ® ® decomposable? If not, construct an example.
(10) Show that = at, in V1 iszero ifand onlyifsomec4
is zero.
(11) Suppose V is a finite-dimensional vector space over F. Let = {a1,. ..,
be a basis of V. Show that the isomorphism 1(V) —÷ given
in equation 3.24 of Chapter I is an algebra homomorphism.
(12) Let V be a vector space over F. For each integer n 0, we define as
follows:

(F if n=O
if n=1
if

Set = Show that Y(V) is an associative algebra over F


when we define multiplication of vectors in 3(V) by (x1 ® ® xj
($1 ® ' ® Pm) = ®'
® atn ® Ph ® " 0 flnv flY) is called the tensor
algebra of V.
(13) Show that the tensor algebra of V constructed in Exercise 12 has the
following universal mapping property: If A is any (associative) algebra over
F and T e HomF(V, A), then there exists a unique algebra homomorphism
(p: Y(V) -÷ A such that p(a) = T(cz) for all xe V.

(14) Let F be a field. We regard F as an algebra over F. Show that any algebra
homomorphism p: F —> F is either zero or an isomorphism.
(15) Suppose Ae is nonsingular. Define a map on by the
equation (p(B) = A 'BA. Show that p is an algebra homomorphism from
to that is bijective.
(16) With the same notation as in Exercise 15, suppose the map
-÷ given by = AB is an algebra homomorphism.
What can you say about A in this case?

2. FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS

In this section, we present a series of theorems that tell us how to manipulate


tensor products and use them in various applications. Our first theorem
essentially says that forming tensor products is an associative operation.

Theorem 2.1: Let V1 ,..., and W1, ..., Wm be vector spaces over F. Then
there exists an isomorphism T: (V1 ® ® Vj ® (W1 ®... ® Wm) '
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 69

such that
= a1 ®Pm.Her = 1,...,nand
j=1,...,m.
The proofs of the theorems in this section can usually be done in two different
ways. We can appeal to Lemma 1.11 or use Theorem 1.20. We shall present a
mixture of both types of proof here. Since we are dealing with vector spaces, we
could prove every theorem in this section by using Theorem 1.20. The advantage
to proceeding via Lemma 1.11 (i.e., a basis-free proof) is that this type of proof is
valid in more general situations (e.g., modules over commutative rings).

Proof of2.1: Let i = 1,..., n, be a basis ofV1. = 1,..., m, be abasis


of Applying Theorem 1.20, we have the following facts:

(a) F1 = x x is a basis of
V1®...®Vn.
(b) F2 = {f11 ® .,fl,JeC1 X X Cm} is a basis of

OCR,

X xC1 X X Cm} is a basis

F2} is a basis of (V1® Vj ® (W1® . ..® Wj.

Using 3.23 of Chapter I, we can construct a linear transformation


that T((a1® ® xjØ(f11® ®PnJ) = ® for all
(a1 ® ... ® f31 ®... ® x F2. Clearly (d) and (c) imply T is
an isomorphism. The fact that T((cx1 ® ... ® ® (fly ® ® =
for any a1eV1 and is now a straightfor-
ward computation that we leave to the exercises. U

Let us say a word about the proof of Theorem 2.1 via Lemma 1.11. There
is a natural multilinear mapping Vi: V1 x x x W1 x x Wm
given by
® ® ®... ® We could then argue that the pair
((V1 ®... ® Vj ® (W1 ®... ® Wm), i/') satisfies the universal mapping
property given in 1.9. Lemma 1.11 would then imply
via a
linear transformation T for which T((a1 ®... ® ® (/3 ®... ® /3)) =
®®®®® We ask the reader to provide the details of this
proof in the exercises at the end of this section.
There is a special case of Theorem 2.1 that is worth noting explicitly.

Corollary 2.2: (V1 ® V2) ® V1 ®(V2 ® V3).


70 MULTILINEAR ALGEBRA

By Theorem 2.1, both of these vector spaces are isomorphic to


Proof?
VI®V2®V3. fl
The point of Theorem 2.1 and Corollary 2.2 is that we can drop all
parentheses when forming successive tensor products of vector spaces. Our next
theorem says that forming tensor products is essentially a commutative
operation as well.
Theorem 2.3: Let (i1,. .., ij be a permutation of (1,..., n). Then there
exists an isomorphism T: V1 ® ® y1 ® ... ® V1 such that

Proof? The map i/i1:V1 x x given by


= a11 ®® aj, is clearly multilinear. Hence using the universal
mapping property of (V1 ® ... 0 4)), we have a unique linear transformation

T: V1 ® -+ \11 ®... ® V1 such that TqS = '/'I. Thus, ® =


çU1(a1,...,aJ=T4)(a1,...,
Now let 4)': V11 x x V1 —÷ be the canonical multilinear
V1
map. The map i/i2: V11 x x V1 given by i/i2(a11,..., aJ=
® ® a, is clearly multilinear. Hence there exists a unique linear trans-
formation T': V11 ® V1 —÷ V1 ® ® such that T'Ø' = '/'2. Thus,
a1 ® ® = i/i2(a1,..., aj,) = T'Ø'(a11,. . aj) = T'(a11 ®... ® ai). Clearly,
.
,

T and T' are inverses. J


We may view F itself as a vector space over F. We can then consider the
tensor product V F. Here V is an arbitrary vector space over F. Since { 1} is a
basis of F as a vector space over itself, Theorem 1.20 implies V ®F F V under
the map sending a ® x to ax. Similar remarks can be made for F ®F V. Thus, we
have the following theorem:

Theorem 2.4: F®FV. U


We next turn our attention to the relations between tensor products
and homomorphisms. Suppose that for each i = 1,..., n, we have a linear
transformation T1 e V1 V
and V a linear transformation. We then have a natural
multilinear mapping V1 x x -+ V'1 ® ®
given by 9(a1,
= T1(a1) ® ... ® Since each T1 is linear, p is clearly mul-
tilinear. It follows from Theorem 1.19 that there exists a unique linear trans-
formation S:V1® such that S4)=9. Here
4): V1 x x —÷ V1 ® is the canonical multilinear mapping given by
4'(a1,..., = a1 0® The map S is called the tensor product of the T1
Since S4) = p, we have
and is usually written S = T1 ®
for all (a1,..., aje
V1 x x Let us summarize this discussion with the following definition:
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 71

Definition 23: Suppose T1 e HomF(VI, for i = 1,..., n. T1 ® ® is the


linear transformation from V1 ® ® to ® ® defined by the
following equation:

(T1 ® ® ® 0 txj = T1(tx1)® 0 Tjtx,,)

In our notation, we have deliberately suppressed the field F. When dealing


with more than one field, we shall write T1 ®F
®F instead of the simpler notation used in 2.5.
In our next theorem, we gather together some of the more obvious facts
about tensor products of linear transformations.
Theorem 2.6: Let V1, and i = 1,..., n, be vector spaces over F. Suppose
T1, e HomF(VI, V) and e Vi). Then the following assertions are
true:

(a) If each T1 is surjective, so is T1 ® 0 Ti,.


(b) If each is injective, so is T1 ® ®
(c) If each T1 is an isomorphism, so is T1 ® ®
(d)
(e) If each T1 is an isomorphism, (T1 ®... 0 = TI1 0

(h)

Proof? (c)—(h) are all straightforward and are left to the reader. We prove (a) and

(a) It follows from our construction of the tensor product that ® ®


is spanned as a vector space over F by all vectors of the form tx'1 ® ® tx's
with (oc'1,...,txjeV'1 x x V's. Let x x Since
each T1 is surjective, there exists an e V1 such that T1(cx1) = tx. Thus,

Thus, 0 = L(lm(T1 ®... ® T1fl= Im(T1 ®... 0 T,,). Hence,


T1 ® 0 is surjective.
(b) Suppose T1 is injective for each i = 1,..., n. Let B1 = {cxjk k e A1} be a
basis of V1. Since T1 is injective, T1(B1) = {Tl(txlk) I k e A1} is a linearly
independent subset of V. In particular, T1(B1) is part of a basis of
(Theorem 2.6, Chapter I). It now follows from Theorem 1.20 that the set

is a linearly independent subset of ®0


72 MULTILINEAR ALGEBRA

Now let a e ker(T1 ® Tj. Again using Theorem 1.20, a can be


written uniquely in the following form:

2.7:

(k1 kjeA1 x x

In equation 2.7, every ck1 e F and all but possibly finitely many of
these scalars are zero. If we now apply T1 ® ® to a and use the fact
the B is linearly independent over F, we see ck1 = 0 for every
(k1,...,kjeA1 x x fl

Recall that a complex of vector spaces (over F),

S T
V ->V

is said to be exact if T is surjective, and im S = ker T. Suppose for each


= 1,..., n, we have an exact complex of the following form:

T,

Theorem 2.6(a) implies T1 ® V1 ® ® -± ® is a sur-


jective linear transformation. We want to identify the kernel of T1 ® ®
For each i = 1,..., n, we can consider the linear transformation
Let W1 = ® ® ® ® Ivj. Then is the subspace of V1 ®
® spanned by all vectors of the form fr1 ® S1(txfl ® ...
(a1,..., txç',.. ., ocjeV1 x x V7 x x We can then form the subspace
W = W1 + + V1 ® ® We can now prove the following lemma:
Lemma 2.9: W = ker(T1 ® ® Tj.
Proof Fix i = 1,.. . , n, and consider a typical generator fi = a1 ®
of W1. (T1®"®Tj(fJ)=T1(a1)Ø"®T1S1(oc')®
® Tjaj. Since 2.8 is exact, = 0. Thus, (T1 ® ® Tj(fl) = 0.
Since ker(T1 ® ® Tj is a subspace of V1 ® ® we conclude that W =
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 73

The opposite inclusion, ker(T1 ® ® Tj c W, is a bit more difficult to


establish. We begin by defining a multilinear mapping i/i: V'1 x

x x there exists a vector (cx1,...,


then
x x such that for all i = 1,... ,n. This follows
= cx

from the fact that each T1 is surjective. We then define l/i(c4,. .., t4) to be the
following coset:

2.10:

Now it is not obvious that i/i is well defined. We must check that if(f31,..., fJJ is
a second vector in V1 x xwith the property that T1(fi1) = for all
= 1,...,n, then ($1 +W + W.
Since T1(fi1) = = for i = 1,..., n and each sequence in 2.8 is exact,
there exists a e such that S1QiJ = — fit. In particular, we have the
following relations in V1 ® 0

Adding the relations in 2.11 gives 0" ® a,, — $1 ® fi,,e


Thus,
a well-defined function Vi: x x
(V1 ® ® VJ/W. The fact that is multilinear is obvious.
Let cb: V'1 x x V',, -÷
® be the canonical multilinear map.
V'1
Using the universal mapping property of ® V',,, 4i), we con-
clude that there exists a unique linear transformation T: ® —,
(V1 ® ® VJ/W such that Tq5 = i/i. Thus, for all (a'1,. . ., t4) e
x x V',,, and any (a1,..., ajeV1 x x V,, such that =
i=1,...,n, we have (a1®
T(a'1 ® ... ®
Now consider the composite linear transformation given by

T
74 MULTILINEAR ALGEBRA

Set S =T(T1 Then for all (a1,...,cxjeV1 x x we have


Thus, S is
nothing but the natural map from V1 ® ... ® to (V1 ® ® VJ/W. In
particular, it follows from 5.13 of Chapter 1 that ker S = W. Since
ker(T1 ® ® Tj c ker S, we conclude that ker(T1 ® Tj c W. This
completes the proof of the lemma. El

We have now proved the following theorem:


Theorem 2.12: Let

T,
\T"

be an exact complex of vector spaces over F for each i = 1,.. ., n. Let


= ® ® S1 ® and set W = W1 + + Then

T1 ® is surjective and
El

There is a special case of Theorem 2.12 that we present as a separate theorem.

Theorem 2.13: Suppose

S I

is a short exact sequence of vector spaces over F. Then for any vector space W,

2.14:

O—V"®W T®Iw>vf®Wo

is a short exact sequence.

Proof T ® Lw is surjective by Theorem 2.6(a). S ® Lw is injective by Theorem


2.6(b). If we apply Theorem 2.12 to the two exact complexes:

S T
V"

o >0
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 75

we see that ker(T ® = Im(S ® u,). Thus 2.14 is exact and the proof of the
theorem is complete. Q
The next natural question to ask about tensor products is how they behave
with respect to direct sums. We answer this question in our next theorem, but
leave most of the technical details as exercises at the end of this section.
Theorem 2.15: Suppose {V1 lie A} is a collection of vector spaces over F. Then
for any vector space V we have

%{V1®V}
leA leA

Proof: Let —' elSA V1 and e leA v1 -+ be the canonical injections


and surjections introduced in Definition 4.2 of Chapter 1. Then we have the
following facts:

(a) = 'v1 for all jeA.


(b)
(c) For any e %ieA V1, = 0 except possibly for finitely many j eA.
(d) 01ir1 = I, the identity map on EEL V1.

Perhaps we should make a few comments about (d). If e EL? leA V1, then is a A-
tuple with at most finitely many nonzero components. Thus, XIeA is a
finite sum whose value is clearly This is what the statement 01Th1 = I
means in (d).
Now for each j e A, we can consider the linear transformation
® V -+ { V1} ® V. An easy computation shows that
{ e leA V1} ® V is the internal direct sum of the subspaces {Im(01 ® lie A}.
Thus, { VJ ® V = Im(01 ® Iv). Since each 01 is injective, Theorem 2.6
implies °i ® is injective. Hence V1 ® V Im(01 ® L4. It now follows that
EL?leAlm(Ol®Iv) = ®V. S
We next study a construction using tensor products that is very useful in
linear algebra. Suppose V is a vector space over F, and let K be a second field
containing F. For example, F = R and K = C. We have seen in Chapter 1 that K
is a vector space (even an algebra) over F. Thus, we can form the tensor product
V ®F K of the vector spaces V and K over F.
V ®F K is a vector space over F. We want to point out that there is a natural
K-vector space structure on V ®F K as well. Vector addition in V ®F K as a K-
vector space is the same as before. Namely, if = (oct x1), and
1

'1 = ®FYj) are two vectors in V ®FK (thus a1, and x1,
then
76 MULTILINEAR ALGEBRA

We need to define scalar multiplication of vectors in V ®F K with scalars in K.


Let x e K, and consider the linear map p,,e Homf(K, K) defined by ,u,jy) = xy.
Clearly, is an F-linear transformation on K. In particular, is a well-
defined F-linear transformation on V ®F K. Now if = (cx1 ®F x1) is a

typical vector in V ®F K, we define scalar multiplication by the following


formula:
2.16: =

®F xxi). Our previous discussion in this section implies


that equation 2.16 gives us a well-defined function from K x
(V ®F K) —÷ V ®F K. The fact that this scalar multiplication satisfies
axioms V5—V8 in Definition 1.4 of Chapter 1 is straightforward. Thus, via the
operations defined above, the F-vector space V ®F K becomes a vector space
over K.
Throughout the rest of this book, whenever we view V ®F K as a vector
space over K, then addition and scalar multiplication will be as defined above.
The process whereby we pass from a vector space V over F to the vector space
V ®F K over K is called extending the scalars to K.
Since F K, Theorem 2.6 implies that the natural map
V ®F F V ®F K is injective. Here i: F -÷ K is the inclusion map. Now
V V ®F F by Theorem 2.4. Putting these two maps together gives us a
natural, injective maple HomF(V, V ®F K) given = ®F 1. By 2.16, imi
generates V ®F K as a K-vector space. We shall often identify V with
Imi = V ®F 1 in V ®FK. We note that V ®F1 is an F-subspace of V ®FK.
This follows immediately from 2.16. For if xe F, then
x(a ®F 1) = x = xa ®F I eV ®F 1. Thus, when we extend the scalars from
F to K, we produce a K-vector space, V ®F K, which contains V, that is, Im i, as
an F-subspace, and such that V ®F K is the K-linear span of V. We can now
construct a K-basis of V ®F K.
Theorem 2.17: Let V be a vector space over F, and suppose K is a field
containing F. if B is a basis of V, then {rx ®F1 a e B} is a basis of the K-vector
space V ®FI(.
Proof Let F = {a ØF 1 I e B}. Since {1} is subset of K that is linearly
independent over F, Theorem 1.20 implies the vectors in IT are linearly
independent over F. in particular, fl = IBI, and no element of IT is zero. We
must argue IT is linearly independent over K, and LK(f) = V ®F K. Here LK(F')
is all K-linear combinations of the vectors in IT.
Let us first argue that F is linearly independent over K. Let a1,..., e B,
k1,..., and suppose k1(; ®F1)= 0. Let C = be a basis
of K over F. Then each k1 can be written uniquely in the following form:

JEtS
i=1,...,n
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 77

In Equation 2.18, the are scalars in F and each sum on the right-hand side is
finite. Thus, for each i = 1,. .., n, = 0 except possibly for finitely many j eA.
We now have

{txi®F(ExIizJ)}= i=ljeA
1=1 1=1 11 jeA

Since the vectors ®F e B, e C} are linearly independent over F by


Theorem 1.20, we conclude that = 0 for all i and j. In particular,
k1 = = = 0, and F is linearly independent over K.
To complete the proof, we must show LK(F) = V ®F K. Since V ®F K is
spanned as a vector space over F by vectors of the form a ®F k (a e V, k e K) and
F c K, it suffices to show that k e LK(fl. This last inclusion is easy. Write
= with

a = çt xiai) ®Fk = L(x1a1 ®Fk)


= @1 ®FxIk)

= E x1k(oc1 ®F l)ELK(F) E

There are two important corollaries to Theorem 2.17 that are worth noting

Corollary 2.19: Suppose V is a finite-dimensional vector space over F and K is a


field containing F. Then dimF(V) = dimK(V ®F K).

Proof
basis over K by Theorem 2.17 fl
Corollary 2.20: Suppose V is a finite-dimensional vector space over F and K is a
field containing F. Then HomF(V, V) ®F K HomK(V ®F K, V ®F K) as
vector spaces over K.

Proof? If dimFV = n, then dimF(HomF(V, V)) = n2 by Theorem 3.25 of Chapter


I. Thus dimK(HomF(V, V) ®F K) = n2 by Corollary 2.19. On the other hand, the
same corollary implies dimK(V ®F K) = n. Consequently, dimK(HomK(V ®F K,
V ®F K)) = n2 by Theorem 3.25 again. Since the K-vector spaces
HomF(V, V) ®F K and HomK(V ®F K, V ®F K) have the same dimension, they
are isomorphic by Theorem 3.15 of Chapter I. U

A word about Corollary 2.20 is in order here. We proved this result by


counting dimensions. This type of argument gives us a quick proof of the
corollary but tends to obscure the nature of the isornorphism between the two
vector spaces. It is worthwhile to construct an explicit K-linear isomorphism
i/i: HomF(V, V) ®F K —÷ HomK(V ®F K, V ®F K). We proceed as follows:
Consider the map x: HomF(V, V) x K —÷ HomK(V ®F K, V ®F K) defined by
78 MULTILIN EAR ALGEBRA

x(T, k) = k(T ®F 'K) Here T e HomF(V, V), and k e K. From the discussion
preceding Definition 2.5, we know that T ®F IKE HomF(V ®F K, V ®F K). We
claim T ®F 'K is in fact a K-linear map on V ®F K. To see this, we use equation
2.16. We have (T ®F 1K)(k(a ®F k')) = (T ®F ®F kk') = T(oc) ®F kk' =
k[T(cz) ®F k'] = k[(T ®F 'Id@ ®F k')]. Thus, T ®F IKE HomK(V ®F K,
V ®F K). Again by 2.16, k(T ØF 'K) is the K-linear transformation on V ®F K
given by [k(T ®F &)](cx ®F k') = k(T(tx) ®F k') = T(tx) ®F kk'. In particular,
Imx c HomK(V K, V K). x is clearly an F-bilinear mapping, and, thus,
factors through the tensor product HomF(V, V) ®F K. So, we have the following
commutative diagram:

2.21:

V ®FK)

In 2.21, Ø(T, k) = T ®Fk, and cli is the unique, F-linear transformation making
2.21 commute. Thus, i/i(T ØF k) = ViqS(T, k) = x(T, k) = k(T ®F 1,3. Using
equation 2.16, we can verify that is in fact a K-linear transformation. We have
i/4k2(T ®Fkl)) = t/i(T ®Fk11(l) = k2k1(T ®F&) = k2çl4T Thus, i/i is
K-linear.

Finally, we must argue that i/i is an isomorphism. We do this by repeated


applications of Theorem 2.17. Let = be basis of V. Define
e HomF(V, V) by = x1 if p = j and zero otherwise. It follows from
Theorem 3.25 of Chapter I that i,j = 1,. . , n} is a basis of
I
. V).
Theorem 2.17 then implies ®F 1 I i,j = 1,. . , n} is a K-basis of
.

HomF(V, V) K. On the other hand, {; Ii = 1,..., n} is a basis of


V®FK. Thus, = 1,...,n} (where ifp=j and
zero otherwise) is a K-basis of HomK(V K, V ®F K). Now one easily checks
that 1) = Thus, i/i is an isomorphism of K-vector spaces.
Let us rephrase some of our last remarks in terms of matrices. Suppose V is a
finite-dimensional vector space over F. Let T e HomF(V, V). Let a = {oc1,.. . ,
be a basis of V over F. Set A = ['(cx, cx)(T). Thus, A is the matrix representation of
T relative to Now suppose we extend scalars to a field K F by passing to
V Theorem 2.17 implies a® I = {cx1 ®F1,...,cxfl ®F1} is a K-basis of
V K. The F-linear map T: V -+ V has a natural extension
/4T ®F 1) = T ®F 'K to a K-linear map on V ®F K. Here i/i is the isomorphism
in diagram 2.21. We have seen that V is imbedded in the extension V ®F K as
the subspace V ®F 1. If we identify V with V ®F 1, then T ®FIK restricted to V
is just T. Thus, we may think of T 'K as an extension of T. Clearly, ® 1,
FUNCTORIAL PROPERTIES OF TENSOR PRODUCTS 79

a0 1)(T ®F = A. Thus, the matrix representation of the extension of T


relative to the extended basis is the same as the matrix representation of T on V.
One of the most important examples of extending scalars is the com-
plexification of a real vector space. We finish this section with a brief discussion
of that notion.

Definition 2.22: Let V be a vector space over The tensor product V ®F C is


called the complexification of V.

We shall shorten our notation here and let 1C denote the complexification of
V. Thus, P = e V, e C}. Our previous discussion implies
that yC is a vector space over C with scalar multiplication given by
z'(oc z) z'z. IfB is an p-basis ofV, then B® 1 = {cx 1IxeB} is
a basis of yC over C.
There is an important map on Vc that comes from complex
conjugation on C. Recall that if z = x + iy (x, ye R, i = is a complex
number, then 2 = x — iy is called the conjugate of z. Clearly the map a: C —÷ C
given by a(z) = 2 is an R-linear transformation. Thus, CE HomR(P, Vc).
Recall that a is given by the following equation:

2.23:

®R ®R Zk)) (ttk Zk)


= kVl

Since a is an R-isomorphism of C, Theorem 2.6 implies a is an R-


yC•
isomorphism of yC• Note that a is not a C-linear transformation of

Definition 2.24: Let V be a vector space over FR, and let T e V). The
extension T will be called the complexification of T and written Tc.
Thus, TC is the C-linear transformation on yC given by

2.25:

Tc (S ®R Zk))
=

C -÷ Homc(VC, is yC)
®R
the C-linear isomorphism given in 2.21. If c co, and is a basis of V,
then f(cx, cx)(T) = JT(cx ® 1, ® 1)(Tc). Thus, the matrix representation of the
complexification of T is the same as that of T (provided we make these
statements relative to and 0 1).
80 MULTILIN EAR ALGEBRA

It is often important to decide when an Se Homc(VC, yC) is the com-


plexification of some T e

V be a finite-dimensional vector space over and let


Sc yC) Then S = TC for some T e V) if and only if the
following equation is satisfied:

2.27:

S a C-linear transformation on yC, then clearly S is an R-linear


transformation on Iv a is also an R-linear transformation on yC, Thus,
the statement in equation 2.27 is that these two endomorphisms commute as
maps in yC)
Let us first suppose that S is the complexification of some T e V).
Thus, S = T If zjeV, then

[S(Iv a)] (S Zk)) =s( @k ®R 2k))


=

On the other hand,

[(Iv ®R a)S] Ct zJ) = (Is, ®R a) (5 (T(cxk) Zk))

= ktl (T(cxk)
Thus, S satisfies equation 2.27.
Conversely, suppose S c Homc(VC, yC) and satisfies equation 2.27. The
discussion after Corollary 2.20 implies that S = wi), where
e HomR(V, V) and cC. To be more precise, S = wi), where
Vi is the isomorphism in 2.21, F = and K = C. We shall suppress i/i here and
write S = i wi). Thus, S is given by the following equation:

2.28:

S(>
k=1
ock®RZk)
j1
>
ki
>

Let a zeVC. Then

[S(Iv a)](cx z) = S(a


=
EXERCISES FOR SECTION 2 81

On the other hand,

[(Iv ®R c)S](cx z) = (Iv ®R ®R wiz) =

Since S satisfies 2.27, we have ØR — = 0. In particular,


all cteV.
Now suppose the real and imaginary parts of are and respectively.
Thus, and = + Then — = and
= 0 implies E?=l(TJ(cx)®R = 0 for all cxeV.
Since {cx 1 e V} spans
I
as a vector space over C, we can now conclude
that 0 on
yC But then
1

S=>(TJ®RwJ)=L (TJ®RxJ)+>(TJ®RiyJ)

J=1
= (z
j=1
= (x
j=1

Thus, S is the complexification of 1 e HomR(V, V) and the proof of


Theorem 2.26 is complete. J

We shall have more to say about the complexification of a real operator T in


Chapter III.

EXERCISES FOR SECTION 2

(1) Complete the details of the proof of Theorem 2.1 by showing


for
any and

(2) Give a basis free proof of Theorem 2.1 by showing that the pair
((V1 0 ® Vj ® ®... ® WJ, &) satisfies 1.9. Recall çfr: V1 x
x V,, x W1 x x

÷d1+1
(3) Generalize Theorem 2.13 as follows: Suppose —* V1 +1 V1 ÷
d1
v1_1 is an exact chain complex of vector spaces over F.
If V is any vector space (over F), show that C ®F
is an exact chain
complex.
(4) Show by example that if 0 —÷ V' y1 V'1 -÷0 and 0 -÷
V \T2 3T2 —, 0 are two short exact sequences of vector spaces,
82 MULTILINEAR ALGEBRA

then

51®S2V®V

is not necessarily exact.


(5) Complete the details of the proof of Theorem 2.15. Namely, show
{ $ V is the internal direct sum of the subspaces
{Im(01 0 ieA}.
(6) Generalize Corollary 2.20 as follows: Suppose V and W are finite-
dimensional vector spaces over F. Let K be a field containing F. Show that
HomF(V, W) ®F K HomK(V ®F K, W ®F K) as K-vector spaces.
(7) Is Corollary 2.20 true for infinite-dimensional vector spaces V? If so, give a
proof. If not, give an example.
(8) Verify axioms V5—V8 from Definition 1.4 of Chapter I for the scalar
multiplication being defined in equation 2.16.
(9) Show that as
K-vector spaces under the map that sends (a1 ®F k1)
(afl®Fkj—÷(al ®FaJ®F(klk2kJ.
(10) Show that cu: HomF(Vl ®F V2, V3) -÷ HomF(Vl, HomF(V2, V3)) is an iso-
morphism. Here Vi is defined by [çb(f)(a1)](a2) = f(a1 0 a2).
(11) Show that HomF(Vl, V2) ®F V3 —÷ HomF(Vl, V2 0 V3) is an isomorph-
ism. Here is defined by q(f 0 a3)(a1) = f(a1) 0 a3. We assume
dim V3 c co.
(12) Suppose V and W are vector spaces over F and 0 = 0 in
V ®F W e V, W). Show that there exist finite-dimensional subspaces
V1 V and W1 c W such that
(a) and
(Ii)

(13) Show that V 0 W = 0 if and only if V or W is zero.

Let us return to problems about the Kronecker product A 0 B of two


matrices (see Exercise 5 of Section 1).

(14) Suppose V and W are finite-dimensional vector spaces over a field F. Let
T E HomF(V, V) and SE HomF(W, W). Suppose A and B are matrix repre-
sentations of T and 5, respectively. Show that A 0 B is a matrix
representation of T on V ®FW.
(15) If A e and BE Mm x m(11, show that rk(A ® B) = rk(A)rk(B).
ALTERNATING MAPS AND EXTERIOR POWERS 83

(16) In Exercise 15, show that det(A ® B) =


(17) Let V = F[X]. Show that V ®F V F[X, Y] under the map that sends
f(X) 0 g(X) to f(X)g(Y).
(18) Let D: F[X] —÷ F[X] be the formal derivative. Thus,
D a linear transformation on F[X] such that
D(VJ c for all n e EN. Here Vn is the vector space defined in Exercise I
(Section 2 of Chapter I).
(19) Interpret the map D 0 D on F[X] 0 F[X] using the isomorphism given in
Exercise 17. Restrict D 0 D to Vn 0 Vm and compute a Kronecker
product that represents D 0 D.
(20) Generalize Exercise 17 to F[X1,..., Xn].

3. ALTERNATING MAPS AND EXTERIOR POWERS

In this section, we study a special class of multilinear maps that are called
alternating. Before we can present the main definitions, we need to discuss
permutations. Suppose A = {1,..., n}. A permutation of A is a bijective map of
A onto itself. Suppose a is a permutation of A. If a(1) = j1, a(2) = and
a(n) =j, then A = {j1,.. ,j,j. We can represent the action of a on A by the
following 2 x n array:

[1 2 n
1 J2 in

Example 3.2: Let n = 5, and

[1 2 3 4 5
3 4 1 5

Then a is the bijection of A = {1, 2, 3,4, 5} given by a(1) = 2, a(2) = 3, a(3) = 4,


a(4)=1,anda(5)=5. J
Clearly, the number of distinct permutations of A is n!. We shall let 5,, denote
the set of all permutations on A = {1, ..., n}. Thus, IS,,I = n!.
Since the elements of 5,, are functions on A, we can compose any two elements
a, t 5,,, getting a third permutation at of A. Thus, we have a function
x + 5,, given by (a, t) —* at. The action of at on A is computed from a and
t by using equation 3.1 in the obvious way.
84 MULTILINEAR ALGEBRA

Example 3.3: Let n = 5. Suppose a, teS5 are given by

[1 2 3 4 51 [1 2 3 4 5
a=[2 t=[5
3 4 1
5j 4 2 3 1

Then

[1 2 3 4 51 [1 2 3 4 5
1 3 4 2j'
ta=[4 2 3 5 1

Note that at ta. LI

The map (a, t) —+ at on satisfies the following properties:

3.4: (a) a(ty) = (at)y for all a, t, yeS,,.


(b) There exists an element 1 such that al = 1 a = a for all a 5,,.
(c) For every aeS,,, there exists a t€Sn such that at = ta = 1.

In (b) of 3.4, 1 is just the identity map on A. Any set S together with a binary
operation (a, t) —÷ at from S x S to S that satisfies the three conditions in 3.4 is
called a group. For this reason, the set is often called the permutation group on
n letters. With this notation, some of our previous theorems can be worded
more succinctly. For example, Theorem 2.3 becomes: For all a e Sn'

In this section, we shall need the definition of the sign, sgn(a), of a


permutation a Sn. We first define cycles and transpositions. A permutation
a e 5,, is called a cycle (or more accurately an r-cycle) if a permutes a sequence of
elements i,, r > 1, cyclically in the sense that a(i1) = i2,
a(i2) = i3,..., a(ir_i) = a(ir) = i1, and a(j) =j for alIjeA — {i1,..., ir}.

Example 3.5: If n = 5, then

[1 2 3 4 5
3 4 1 2

is 5-cycle.

[1 2 3 4 5
a2=[2 3 1 4 5

is a 3-cycle.

[1 2 3 4 5
a=[2 3 1 5 4
ALTERNATING MAPS AND EXTERIOR POWERS 85

is not a cycle. However, a is the product of two cycles:

[1 2 51[1 2 3
3 1 4 sj[i 2 3 5 4j U
When dealing with an r-cycle, a, which permutes i1,..., i, and leaves fixed all
other elements of A, we can shorten our representation of a and write
a= ir). Thus, in Example 3.5, a1 = (1, 5, 2, 3, 4), a2 = (1, 2, 3), and
a = (1,2, 3)(4, 5).
We say two cycles (of Sn) are disjoint if they have no common symbol in their
representations. Thus, in Example 3.5, a1 and a2 are not disjoint, but (1,2, 3) and
(4, 5) are disjoint. It is convenient to extend the definition of cycles to the case
r = 1. We adopt the convention that for any i cA, the 1-cycle (i) is the identity
map. Then it should be clear that any ae is a product of disjoint cycles.
Example 3.6: Let n = 9 and

[1 2 3 4 5 6 7 8 9
3 4 1 6 5 8 9 7

Then a = (1, 2, 3, 4)(5, 6)(7, 8, 9). U


Any 2-cycle (a, b)e is called a transposition. The reader can easily
check that any cycle ir) is a product of transpositions, namely,
(i1,. . , ir) = (i1, ir)(ii,
. . (ii, i3)(i1, i2). The factorization of a given cycle as a
.

product of transpositions is not unique. Consider the following example:

Example 3.7: Let n = 4. Then (1, 2, 4, 3) = (1, 3)(1, 4)(1, 2). Also
(1,2,4,3) = (4,3,1,2) = (4,2)(4, 1)(4,3). U

Since every permutation is a product of disjoint cycles and every cycle is a


product of transpositions, we get every permutation is a product of trans-
positions. We know from Example 3.7 that such a factorization is not unique,
but we do have the following fact:

Lemma 3.8: Let a e a can be written as a product of an even number of


transpositions, then any factorization of a into a product of transpositions must
contain an even number of terms. Similarly, if a can be written as a product of an
odd number of transpositions, then any factorization of a into a product of
transpositions must contain an odd number of terms.

Proof? Let X1,. . , Xn denote indeterminates over the field


. and consider the
polynomial P(X1,.. . , Xn) = — Xi). Here the product is taken over all i
and j such that 1 i cj n. If a then we define a new polynomial a(P) by
a(P) = P(XC(l),.. ., = fla<J(XC(I) — XC(J)). Clearly, a(P) = ± P. A single
86 MULTILIN EAR ALGEBRA

transposition applied to P changes the sign of P. Thus, a(P) = P if and only if a


is a product of an even number of transpositions. a(P) = — P if and only if a is a
product of an odd number of transpositions. The proof of the lemma is now
clear. El

Definition 3.9: A permutation a e S,, is even if a can be written as a product of an


even number of transpositions. If a is even, we define the sign of a, sgn(a), by
sgn(a) = 1. The permutation a is odd if a can be written as a product of an odd
number of transpositions. In this case, we set sgn(a) = — 1.

Clearly, a product of two even permutations is again even. A product of two


odd permutations is also even. The product of an even and odd permutation is
odd. Note that our definition implies that 1, the identity map on A, is an even
permutation (a product of zero transpositions).

Example 3.10: If n = 4, then a = (1, 2, 4, 3) = (1, 3)(1, 4)(1, 2) is odd.


= (1, 2)(3, 4) is even. Thus, sgn(a) = — 1 and sgn(r) = 1. El

We can now return to our study of multilinear mappings and introduce the
concept of an alternating map. Suppose V and W are vector spaces over a field
F. Recall that Vⁿ = {(α₁,...,αₙ) | αᵢ ∈ V}. We shall keep n fixed throughout this
discussion.

Definition 3.11: A multilinear mapping η: Vⁿ → W is called alternating if
η(α₁,...,αₙ) = 0 whenever αᵢ = αⱼ for some i ≠ j.

Thus, a multilinear mapping η from Vⁿ to W is alternating if and only if η vanishes on all
n-tuples (α₁,...,αₙ) that contain a repetition. We shall clarify the situation when
n = 1 by adopting the convention that all linear transformations from V to W
are alternating.

Example 3.12: Let us return to the example in 1.4. The map
φ: (Fⁿ)ⁿ → F given by φ(α₁,...,αₙ) = det[aᵢⱼ], where αᵢ = (a_{i1},...,a_{in}), is
multilinear. If any two rows of A ∈ M_{n×n}(F) are equal, then det A = 0. Thus, φ is
an alternating multilinear map.  □
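The determinant of Example 3.12 can be written directly as a sum over Sₙ, det[aᵢⱼ] = Σ_{σ∈Sₙ} sgn(σ) a_{1σ(1)} ··· a_{nσ(n)}, and this formula makes both multilinearity and the alternating property visible. Below is a small Python sketch of that sum (an illustration only, not an efficient algorithm); note how a repeated row forces the value 0.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation given as a tuple p of 0-based images."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(rows):
    """det(A) = sum over sigma of sgn(sigma) * a_{1,sigma(1)} ... a_{n,sigma(n)}."""
    n, total = len(rows), 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= rows[i][p[i]]
        total += term
    return total

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
print(det(A))                                     # 1
print(det([[1, 2, 3], [1, 2, 3], [5, 6, 0]]))     # 0: a repeated row kills the map
```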

Example 3.13: Suppose φ: Vⁿ → W is an arbitrary multilinear mapping. We can
construct an alternating map Alt(φ) from φ with the following definition:

    Alt(φ)(α₁,...,αₙ) = Σ_{σ∈Sₙ} sgn(σ) φ(α_{σ(1)},...,α_{σ(n)})

A simple counting exercise shows Alt(φ) is alternating.  □

We shall denote the set of all alternating multilinear mappings from Vⁿ to W
by Alt_F(Vⁿ, W). Clearly, Alt_F(Vⁿ, W) is a subspace of Mul_F(V × ··· × V, W). For
if η, ψ ∈ Alt_F(Vⁿ, W) and x ∈ F, then η + ψ and xη are clearly alternating.
Suppose σ = (i, j) is a transposition in Sₙ with i < j. Let η ∈ Alt_F(Vⁿ, W), and let
α = (α₁,...,αₙ) ∈ Vⁿ. Then

    (α_{σ(1)},...,α_{σ(n)}) = (α₁,...,αⱼ,...,αᵢ,...,αₙ),

with αⱼ in the ith position and αᵢ in the jth position. Thus (α_{σ(1)},...,α_{σ(n)}) is just the n-tuple α with its ith and jth components
interchanged. Suppose we consider the n-tuple (α₁,...,αᵢ + αⱼ,...,αᵢ + αⱼ,...,αₙ)
with αᵢ + αⱼ in both the ith and jth positions. Since η is
alternating, we have

    0 = η(α₁,...,αᵢ + αⱼ,...,αᵢ + αⱼ,...,αₙ)
      = η(α₁,...,αᵢ,...,αⱼ,...,αₙ) + η(α₁,...,αⱼ,...,αᵢ,...,αₙ).

Since sgn(σ) = −1, we can rewrite this last equation as follows:

3.14: η(α_{σ(1)},...,α_{σ(n)}) = sgn(σ) η(α₁,...,αₙ)

Thus, when we interchange two terms in the sequence α₁,...,αₙ, the sign of
η(α₁,...,αₙ) changes. Since every permutation is a product of transpositions,
equation 3.14 immediately implies the following theorem:

Theorem 3.15: Let η: Vⁿ → W be an alternating multilinear mapping, and let σ ∈ Sₙ.
Then for all (α₁,...,αₙ) ∈ Vⁿ, η(α_{σ(1)},...,α_{σ(n)}) = sgn(σ) η(α₁,...,αₙ).  □

Another useful observation concerning alternating maps is given in the
following theorem:

Theorem 3.16: Let η: Vⁿ → W be a multilinear mapping. Then η is alternating if
and only if η(α₁,...,αₙ) = 0 whenever αᵢ = αᵢ₊₁ for some i.

Proof: This is a straightforward computation, which we leave to the
exercises.  □

If η: Vⁿ → W is an alternating multilinear map and T ∈ Hom_F(W, W′), then
clearly Tη ∈ Alt_F(Vⁿ, W′). This suggests the following analog of the universal
mapping problem posed in 1.9.

3.17: Let V be a vector space over F, and fix n ∈ N. Is there a vector space Z over
F and an alternating multilinear map η: Vⁿ → Z such that the pair (Z, η) has the
following property? If W is any vector space over F, and ψ ∈ Alt_F(Vⁿ, W), then
there exists a unique T ∈ Hom_F(Z, W) such that Tη = ψ.

The question posed in 3.17 is called the universal mapping problem for
alternating multilinear maps. This problem has an obvious solution, which we
shall construct shortly. First, let us point out that any solution to 3.17 is
essentially unique.

Lemma 3.18: Suppose (Z, η) and (Z′, η′) are two solutions to 3.17. Then there
exist isomorphisms T₁: Z → Z′ and T₂: Z′ → Z such that

(a) T₁T₂ = I_{Z′}, T₂T₁ = I_Z, and
(b) the following diagram is commutative:

    [diagram: η: Vⁿ → Z and η′: Vⁿ → Z′, with T₁ and T₂ between Z and Z′]

Proof: This proof is identical to that of Lemma 1.11.  □

The next order of business is to construct a solution to 3.17. This is easy using
what we already know about tensor products. Consider ⊗ⁿV = V ⊗_F ··· ⊗_F V (n factors).
Let U be the subspace of ⊗ⁿV generated by all vectors of the form α₁ ⊗ ··· ⊗ αₙ,
where the n-tuple (α₁,...,αₙ) contains a repetition. Set Z = (⊗ⁿV)/U. Let
θ: Vⁿ → ⊗ⁿV be the canonical multilinear map given by θ(α₁,...,αₙ) = α₁ ⊗ ··· ⊗ αₙ.
Then define η: Vⁿ → Z by η(α₁,...,αₙ) = α₁ ⊗ ··· ⊗ αₙ + U. Thus, η is just the
composite of θ with the natural map of ⊗ⁿV onto its quotient space (⊗ⁿV)/U. The
definition of U immediately implies that η is an alternating multilinear map.

Lemma 3.19: The pair (Z, η) constructed above is a solution to 3.17.

Proof: Suppose ψ: Vⁿ → W is any alternating multilinear map. Then because ψ
is multilinear, there exists a unique linear transformation T₀: ⊗ⁿV → W such that

3.20: [diagram: θ: Vⁿ → ⊗ⁿV followed by T₀: ⊗ⁿV → W, with composite ψ]

is commutative. If (α₁,...,αₙ) ∈ Vⁿ contains a repetition, then ψ(α₁,...,αₙ) = 0
since ψ is alternating. Thus, 3.20 implies T₀(α₁ ⊗ ··· ⊗ αₙ) = 0. Since U is
generated by all vectors α₁ ⊗ ··· ⊗ αₙ with (α₁,...,αₙ) containing a repetition,
we conclude that T₀(U) = 0. It now follows from the first isomorphism theorem
(Theorem 5.15 of Chapter I) that T₀ induces a linear transformation
T: Z → W given by T(x + U) = T₀(x). If (α₁,...,αₙ) ∈ Vⁿ,
then Tη(α₁,...,αₙ) = T(α₁ ⊗ ··· ⊗ αₙ + U) = T₀(α₁ ⊗ ··· ⊗ αₙ) = ψ(α₁,...,αₙ). Thus,

3.21: [diagram: η: Vⁿ → Z followed by T: Z → W, with composite ψ]

is commutative.
Finally, we must argue that T is unique. Suppose T′ ∈ Hom_F(Z, W) makes 3.21
commute. Then T = T′ on Im η. But Z = L(Im η). Thus, T = T′, and the proof of
Lemma 3.19 is complete.  □

Definition 3.22: The vector space (⊗ⁿV)/U is called the nth exterior power of V
(over F) and will henceforth be denoted by Λⁿ_F(V).

When the base field F is clear from the context, we shall simplify our notation
and write Λⁿ(V). Note that Λ¹(V) = V. We define Λ⁰(V) = F.

Definition 3.23: The coset (α₁ ⊗ ··· ⊗ αₙ) + U in Λⁿ(V) will henceforth be
denoted by α₁ ∧ ··· ∧ αₙ and called a wedge product.

Thus, the alternating multilinear map η: Vⁿ → Λⁿ(V) is given by
η(α₁,...,αₙ) = α₁ ∧ ··· ∧ αₙ. We have already noted that Im η spans Λⁿ(V).
Thus, every vector in Λⁿ(V) is a finite linear combination of wedge products
α₁ ∧ ··· ∧ αₙ. We also have the following relations when dealing with wedge
products:

3.24: (a) α_{σ(1)} ∧ ··· ∧ α_{σ(n)} = sgn(σ)(α₁ ∧ ··· ∧ αₙ) for all σ ∈ Sₙ.
(b) α₁ ∧ ··· ∧ (xαᵢ) ∧ ··· ∧ αₙ = x(α₁ ∧ ··· ∧ αₙ) for all x ∈ F.

Let us restate Lemma 3.19 using our new notation.



Theorem 3.25: Let V and W be vector spaces over F and suppose ψ: Vⁿ → W is
an alternating multilinear map. Then there exists a unique linear transformation
T ∈ Hom_F(Λⁿ(V), W) such that for all (α₁,...,αₙ) ∈ Vⁿ, ψ(α₁,...,αₙ) = T(α₁ ∧ ··· ∧ αₙ).  □
Having constructed Λⁿ(V), the nth exterior power of V, the next order of
business is to find a basis for this space. Suppose B is a basis of V. As usual, set
Bⁿ = {(α₁,...,αₙ) | αᵢ ∈ B}. Let B(n) denote the subset of Bⁿ consisting of those n-
tuples which have distinct entries. Thus, B(n) = {(α₁,...,αₙ) ∈ Bⁿ | αᵢ ≠ αⱼ when-
ever i ≠ j}. It is possible that B(n) = ∅. In this case, n > |B|. But then every
wedge product α₁ ∧ ··· ∧ αₙ in Λⁿ(V) is zero. Thus, Λⁿ(V) = 0, and the empty set
is a basis of Λⁿ(V). So, we can assume with no loss of generality that n ≤ |B|.
We define an equivalence relation on the set B(n) by the following formula:

3.26: (α₁,...,αₙ) ~ (β₁,...,βₙ) if and only if there exists a σ ∈ Sₙ such that
(α_{σ(1)},...,α_{σ(n)}) = (β₁,...,βₙ).

Thus, two n-tuples in B(n) are equivalent if some permutation of the entries in
the first n-tuple gives the second n-tuple. The fact that ~ is indeed an
equivalence relation, that is, that ~ satisfies the axioms in 5.1 of Chapter I, is
obvious. We shall let B̄(n) denote the set of equivalence classes of B(n). Recall
that the elements of B̄(n) are subsets of B(n); B(n) is the disjoint union of the
distinct elements of B̄(n). If (α₁,...,αₙ) ∈ B(n), we shall let ⟨α₁,...,αₙ⟩ denote
the equivalence class in B̄(n) which contains (α₁,...,αₙ). Thus,
B̄(n) = {⟨α₁,...,αₙ⟩ | (α₁,...,αₙ) ∈ B(n)}.
Now for each element x ∈ B̄(n), we can choose an n-tuple (α₁,...,αₙ) ∈ B(n)
such that ⟨α₁,...,αₙ⟩ = x. For a given x, there may be many such n-tuples, but
they are all representatives of the same equivalence class x. For each x ∈ B̄(n),
pick a representative of x, say λ_x, in B(n). Thus, λ_x is an n-tuple (α₁,...,αₙ) ∈ B(n)
such that ⟨α₁,...,αₙ⟩ = x. We have now defined a set mapping x → λ_x of B̄(n)
to B(n). There are of course many choices for such a map. Choose any such map
and set C(n) = {λ_x | x ∈ B̄(n)}. Then C(n) is a collection of n-tuples in B(n), one n-
tuple for every equivalence class x ∈ B̄(n). We can now state the following
theorem:

Theorem 3.27: Let V be a vector space over F. Suppose B is a basis of V.
Construct the set C(n) as above. Then Δ = {α₁ ∧ ··· ∧ αₙ | (α₁,...,αₙ) ∈ C(n)} is
a basis of Λⁿ(V). In particular, dim_F Λⁿ(V) = |B̄(n)|.

Proof: The set C(n) consists of one representative in B(n) for each equivalence
class x ∈ B̄(n). Thus, |C(n)| = |B̄(n)|. In particular, if Δ is a basis of Λⁿ(V), then
dim_F Λⁿ(V) = |Δ| = |C(n)| = |B̄(n)|. So, we need only argue Δ is a basis of Λⁿ(V).
The proof we give here is analogous to that of Theorem 1.20.

Let 𝒮 = ⊕_{(α₁,...,αₙ)∈C(n)} F. As we have seen in Section 1, 𝒮 is a vector space
over F with basis {δ_{(α₁,...,αₙ)} | (α₁,...,αₙ) ∈ C(n)}. Here δ_{(α₁,...,αₙ)} is the function from
C(n) into F given by δ_{(α₁,...,αₙ)}(β₁,...,βₙ) = 1 if (β₁,...,βₙ) = (α₁,...,αₙ), and
zero otherwise. We shall show that 𝒮 ≅ Λⁿ(V).

We define a map μ₀: Bⁿ → 𝒮 as follows: If (β₁,...,βₙ) ∈ Bⁿ contains a
repetition, set μ₀(β₁,...,βₙ) = 0. Suppose (β₁,...,βₙ) ∈ B(n). Then
⟨β₁,...,βₙ⟩ = x for some x ∈ B̄(n). If λ_x = (α₁,...,αₙ) ∈ C(n), then (α₁,...,αₙ) ~ (β₁,...,βₙ).
So, there exists a unique σ ∈ Sₙ such that (β_{σ(1)},...,β_{σ(n)}) = (α₁,...,αₙ). In this case, define
μ₀(β₁,...,βₙ) = sgn(σ) δ_{(α₁,...,αₙ)}. We can extend the map μ₀ in the usual way (see
Exercise 2 at the end of Section 1 in this chapter) to a multilinear map μ: Vⁿ → 𝒮.
An easy computation using the definition of μ₀ shows μ is alternating.

We now claim the pair (𝒮, μ) satisfies the universal mapping property in 3.17.
To see this, suppose W is a vector space over F, and ψ: Vⁿ → W an alternating
multilinear map. Using 3.23 of Chapter I, we can define a linear transforma-
tion T: 𝒮 → W by T(δ_{(α₁,...,αₙ)}) = ψ(α₁,...,αₙ) for all (α₁,...,αₙ) ∈ C(n).
If (β₁,...,βₙ) ∈ Bⁿ contains a repetition, then Tμ(β₁,...,βₙ) = T(0) = 0.
Since ψ is alternating, ψ(β₁,...,βₙ) = 0 as well. Suppose (β₁,...,βₙ) ∈ B(n).
Then (β_{σ(1)},...,β_{σ(n)}) = (α₁,...,αₙ) ∈ C(n) for some σ ∈ Sₙ. Thus,
Tμ(β₁,...,βₙ) = T(sgn(σ) δ_{(α₁,...,αₙ)}) = sgn(σ) ψ(α₁,...,αₙ). On
the other hand, since ψ is alternating, Theorem 3.15 implies
ψ(β₁,...,βₙ) = sgn(σ) ψ(β_{σ(1)},...,β_{σ(n)}) = sgn(σ) ψ(α₁,...,αₙ). Thus, Tμ and ψ
are two alternating multilinear maps on Vⁿ that agree on Bⁿ. It follows that
Tμ = ψ.
Finally, we must argue T is unique. If T′: 𝒮 → W is a second linear
transformation such that T′μ = ψ, then T′(δ_{(α₁,...,αₙ)}) = T′μ(α₁,...,αₙ) =
ψ(α₁,...,αₙ) = Tμ(α₁,...,αₙ) = T(δ_{(α₁,...,αₙ)}) for all (α₁,...,αₙ) ∈ C(n). Thus, T
and T′ agree on a basis of 𝒮 and, consequently, must be equal.
We can now apply Lemma 3.18 to the pairs (Λⁿ(V), η) and (𝒮, μ). In
particular, there exists an isomorphism S: 𝒮 → Λⁿ(V) such that Sμ = η. If
(α₁,...,αₙ) ∈ C(n), then S(δ_{(α₁,...,αₙ)}) = α₁ ∧ ··· ∧ αₙ. So S
is an isomorphism taking the basis {δ_{(α₁,...,αₙ)} | (α₁,...,αₙ) ∈ C(n)} of 𝒮 to the set Δ in Λⁿ(V). Hence Δ is a basis of Λⁿ(V).  □

Corollary 3.28: Suppose V is a finite-dimensional vector space over F with
dim_F(V) = N. Then dim_F Λⁿ(V) = C(N, n).

Before giving a proof of 3.28, let us discuss its meaning. If 0 ≤ n ≤ N, then
C(N, n) is the binomial coefficient N!/n!(N − n)!. If n > N, then C(N, n) = 0.

Proof of Corollary 3.28: If n = 0, the result is trivial. Suppose 1 ≤ n ≤ N. Let
B = {α₁,...,α_N} be a basis of V. Theorem 3.27 implies that
Δ = {α_{i₁} ∧ ··· ∧ α_{iₙ} | 1 ≤ i₁ < ··· < iₙ ≤ N} is a basis of Λⁿ(V). The cardinality
of Δ is clearly the number of ways of picking an n-element subset from the set
{1, 2, ..., N}. Therefore, |Δ| = C(N, n).
Suppose n > N. Then we had noted previously that B(n) = ∅, and Λⁿ(V) = 0.
Thus, dim_F Λⁿ(V) = 0 = C(N, n).  □
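For readers who want a quick numerical sanity check of Corollary 3.28: the basis of Theorem 3.27 can be indexed by strictly increasing index sequences, and Python's math.comb gives the binomial coefficient. The snippet below is a minimal sketch with arbitrarily chosen values of N and n.

```python
from itertools import combinations
from math import comb

# dim Lambda^n(V) = C(N, n) when dim V = N: basis wedges correspond to
# strictly increasing index sequences 1 <= i_1 < ... < i_n <= N.
N, n = 5, 3
basis = list(combinations(range(1, N + 1), n))
print(len(basis), comb(N, n))   # 10 10
print(basis[:3])                # [(1, 2, 3), (1, 2, 4), (1, 2, 5)]
```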
There is another corollary that can be derived from Theorem 3.27. Suppose
V is an n-dimensional vector space over F. Let α = {α₁,...,αₙ} be a basis
of V. Then we have a nonzero, alternating map φ: Vⁿ → F given by
φ(β₁,...,βₙ) = det[aᵢⱼ], where βᵢ = a_{i1}α₁ + ··· + a_{in}αₙ. Our next corollary says that
φ is essentially the only alternating map from Vⁿ to F.

Corollary 3.29: Let V be an n-dimensional vector space over F. Then
dim_F(Alt_F(Vⁿ, F)) = 1.

Proof: The map φ constructed above is alternating. Hence φ ∈ Alt_F(Vⁿ, F).
Suppose ψ ∈ Alt_F(Vⁿ, F). If η denotes the canonical map given in Lemma
3.19, then there exists a unique linear transformation T ∈ Hom_F(Λⁿ(V), F) such
that Tη = ψ. Similarly, there exists a T₁ ∈ Hom_F(Λⁿ(V), F) such that T₁η = φ.
Now Corollary 3.28 implies dim_F(Λⁿ(V)) = 1. Consequently,
dim_F{Hom_F(Λⁿ(V), F)} = 1. Since φ ≠ 0, we conclude T₁ ≠ 0. In particular,
{T₁} is a basis for Hom_F(Λⁿ(V), F). Therefore T = xT₁ for some x ∈ F. We then
have ψ = Tη = xT₁η = xφ. Thus, {φ} is a basis of Alt_F(Vⁿ, F), and the proof of
Corollary 3.29 is complete.  □

At this point, we could begin to discuss the functorial properties of exterior
powers. Almost all the results in Section 2 have analogs in our present situation.
Since this is not a text in multilinear algebra per se, we shall leave most of these
types of results to the exercises at the end of this section. The reader who wishes
to read further in this subject matter should consult [5] or [4].
We shall finish this section with a description of the induced map on exterior
powers derived from a given T ∈ Hom_F(V, W). Suppose V and W are vector
spaces over F and let T be a linear transformation from V to W. Let
η: Vⁿ → Λⁿ(V) be the natural alternating map given by η(α₁,...,αₙ) =
α₁ ∧ ··· ∧ αₙ. The linear transformation T induces an alternating, multi-
linear mapping ψ_T: Vⁿ → Λⁿ(W) given by ψ_T(α₁,...,αₙ) = T(α₁) ∧ ··· ∧ T(αₙ).
The fact that ψ_T is indeed alternating is clear. It now follows from Theorem 3.25
that there exists a unique linear transformation S ∈ Hom_F(Λⁿ(V), Λⁿ(W)) such
that Sη = ψ_T.

Definition 3.30: The unique linear transformation S for which Sη = ψ_T will
henceforth be denoted Λⁿ(T).

Thus, Λⁿ(T) ∈ Hom_F(Λⁿ(V), Λⁿ(W)) and Λⁿ(T)(α₁ ∧ ··· ∧ αₙ) = T(α₁) ∧ ··· ∧ T(αₙ).
Clearly, Λⁿ(T₁T₂) = Λⁿ(T₁)Λⁿ(T₂) for T₂ ∈ Hom_F(V, W) and T₁ ∈ Hom_F(W, Z).
We also have the important analogs of Theorem 2.6.

Theorem 3.31: Let V and W be vector spaces over F, and let T ∈ Hom_F(V, W).
Then the following assertions are true:
(a) If T is injective, so is Λⁿ(T).
(b) If T is surjective, so is Λⁿ(T).
(c) If T is an isomorphism, so is Λⁿ(T).
Proof: Consider bases of V and W and apply Theorem 3.27.  □
Λⁿ(T) is usually called the nth exterior power of T.

EXERCISES FOR SECTION 3

(1) Show that every permutation σ ∈ Sₙ is a product of disjoint cycles.

(2) Elaborate on the details of Example 3.13. Specifically, show that
Alt(·): Mul_F(V × ··· × V, W) → Alt_F(Vⁿ, W) is a well-defined linear transformation.

(3) Prove Theorem 3.16.

(4) The exterior algebra Λ(V) of V is defined to be the following direct sum:
Λ(V) = ⊕_{n≥0} Λⁿ(V).
(a) Show that Λ(V) is an algebra over F when we define the product of two
elements α = α₁ ∧ ··· ∧ αₚ ∈ Λᵖ(V) and β = β₁ ∧ ··· ∧ βₘ ∈ Λᵐ(V) by
αβ = α₁ ∧ ··· ∧ αₚ ∧ β₁ ∧ ··· ∧ βₘ.
(b) Show that Λ(V) is an anticommutative algebra. This means for all
α ∈ Λᵖ(V) and β ∈ Λᵐ(V), αβ = (−1)^{pm} βα.
(c) Show that there exists an injective linear transformation T: V → Λ(V)
such that (T(α))² = 0 for all α ∈ V.

(5) Show that the exterior algebra Λ(V) of V has the following universal
mapping property: Given any algebra A over F and a linear trans-
formation T ∈ Hom_F(V, A) such that (T(α))² = 0 for all α ∈ V, then there
exists a unique F-algebra homomorphism φ: Λ(V) → A such that
φ(α) = T(α) for all α ∈ V.

(6) If dim_F(V) = n, what is dim_F(Λ(V))?

(7) Suppose V is a finite-dimensional vector space over F. Show that
Λⁿ(V*) ≅ (Λⁿ(V))*. Is this true if V is infinite dimensional?

(8) Give an example of a short exact sequence 0 → V → W → Z → 0 of vector
spaces over F such that the corresponding complex
0 → Λⁿ(V) → Λⁿ(W) → Λⁿ(Z) → 0 is not exact.

(9) Suppose V is a vector space over F, and let K be a field containing F. Show
Λⁿ_F(V) ⊗_F K ≅ Λⁿ_K(V ⊗_F K) as K-vector spaces.

(10) Let V and W be vector spaces over F. Show that

    Λⁿ(V ⊕ W) ≅ ⊕_{p+q=n} (Λᵖ(V) ⊗_F Λ^q(W))

4. SYMMETRIC MAPS AND SYMMETRIC POWERS

In this last section on multilinear algebra, we study another special class of
multilinear maps. Let V and W be vector spaces over F and let n ∈ N.

Definition 4.1: A multilinear mapping θ: Vⁿ → W is said to be symmetric if
θ(α₁,...,αₙ) = θ(α_{σ(1)},...,α_{σ(n)}) for all (α₁,...,αₙ) ∈ Vⁿ and all σ ∈ Sₙ.

We shall denote the set of all symmetric multilinear mappings from Vⁿ to W
by Sym_F(Vⁿ, W). Clearly, Sym_F(Vⁿ, W) is a subspace of Mul_F(Vⁿ, W). Note that
Alt_F(Vⁿ, W) ⊆ Sym_F(Vⁿ, W) whenever F = F₂. Let us consider some examples:

Example 4.2: If n = 2, then Sym_F(V², W) is just the set of all symmetric bilinear
maps from V × V to W. In particular, any inner product is a symmetric
multilinear map.  □

Example 4.3: If φ: Vⁿ → W is any multilinear map, we can construct a
symmetric map S(φ) ∈ Sym_F(Vⁿ, W) from φ with the following definition:

    S(φ)(α₁,...,αₙ) = Σ_{σ∈Sₙ} φ(α_{σ(1)},...,α_{σ(n)})  □

We note in passing that a multilinear map φ ∈ Mul_F(Vⁿ, W) is in fact
symmetric if φ(α₁,...,αₙ) remains unaltered whenever two adjacent terms
in (α₁,...,αₙ) are interchanged. If φ ∈ Sym_F(Vⁿ, W) and T ∈ Hom_F(W, Z), then
clearly Tφ ∈ Sym_F(Vⁿ, Z). As with alternating maps, this suggests the following
universal mapping problem for symmetric multilinear maps:

4.4: Let V be a vector space over F and n ∈ N. Is there a vector space Z over F
and a symmetric multilinear map φ: Vⁿ → Z such that the pair (Z, φ) has the
following property: If W is any vector space over F, and ψ ∈ Sym_F(Vⁿ, W), then
there exists a unique T ∈ Hom_F(Z, W) such that Tφ = ψ?

As with alternating maps, it is an easy matter to argue that any solution to 4.4
is essentially unique.

Lemma 4.5: Suppose (Z, φ) and (Z′, φ′) are two solutions to 4.4. Then there exist
isomorphisms T₁: Z → Z′ and T₂: Z′ → Z such that
(a) T₁T₂ = I_{Z′} and T₂T₁ = I_Z, and
(b) the following diagram is commutative:

    [diagram: φ: Vⁿ → Z and φ′: Vⁿ → Z′, with T₁ and T₂ between Z and Z′]

Proof: This proof is identical to that of Lemma 1.11.  □


As the reader can see, much of what we do here is completely analogous to
the case of alternating maps. For this reason, our treatment of the results for
symmetric maps will be much abbreviated.
We now construct a solution to 4.4. Set Z = V®f/N where N is the subspace
of generated by all vectors of the form ®® — 0
Let 4:V"-÷Z be given by The
definition of N implies 0 is a symmetric multilinear map.
Lemma 4.6: (Z, 0) solves 4.4.
Proof Suppose i/i e W). By Theorem 1.19, there exists a
T0 e W) such that T0(cz1 ® ... 0 ocj = t/4oc1,..., cz,j. Since i/i is
symmetric, T0 vanishes on N. Consequently, T0 induces a linear transformation
T:Z—÷W given by T((cz1 + N) = T0(cz1 Clearly Tq5 = 1//.
The fact that T is unique is the same argument as for multilinear or alternating
maps. S
Definition 4.7: The vector space is called the nth symmetric power of V
and will henceforth be denoted by The cosets 0 oct, + N will
henceforth be written ...

With this notation, we have the following theorem:

Theorem 4.8: Let V be a vector space over F, and suppose ψ ∈ Sym_F(Vⁿ, W).
Then there exists a unique linear transformation T ∈ Hom_F(Sⁿ(V), W) such that
T([α₁]···[αₙ]) = ψ(α₁,...,αₙ) for all (α₁,...,αₙ) ∈ Vⁿ.  □

From our construction of Sⁿ(V), we see that the set
{[α₁]···[αₙ] | (α₁,...,αₙ) ∈ Vⁿ} spans Sⁿ(V) as a vector space over F. The
following relations are all obvious:

4.9: (a) [α₁]···[αᵢ + αᵢ′]···[αₙ] = [α₁]···[αᵢ]···[αₙ] + [α₁]···[αᵢ′]···[αₙ].
(b) [α₁]···[xαᵢ]···[αₙ] = x([α₁]···[αₙ]) for all x ∈ F.
(c) [α_{σ(1)}]···[α_{σ(n)}] = [α₁]···[αₙ] for all σ ∈ Sₙ.
To construct a basis of Sⁿ(V), we proceed in the same spirit as Theorem 3.27.
Let B be any basis of V. Define an equivalence relation on Bⁿ by
(α₁,...,αₙ) ~ (β₁,...,βₙ) if and only if (α_{σ(1)},...,α_{σ(n)}) = (β₁,...,βₙ) for some
σ ∈ Sₙ. From each equivalence class of Bⁿ, pick one representative and call this
set of representatives B*. An argument completely analogous to that of Theorem
3.27 gives us that {[α₁]···[αₙ] | (α₁,...,αₙ) ∈ B*} is a basis of Sⁿ(V).

Theorem 4.10: Suppose B is a basis of V. Form B* as described above. Then
{[α₁]···[αₙ] | (α₁,...,αₙ) ∈ B*} is a basis of Sⁿ(V).  □

Now suppose dim_F(V) = N < ∞. Let B = {α₁,...,α_N} be a basis of V. The
number of elements in B* is the number of distinct monomials of degree n in the
symbols [α₁],...,[α_N]. Thus, |B*| is equal to the number of distinct products of
the form [α₁]^{e₁} ··· [α_N]^{e_N} with e₁ + ··· + e_N = n. An easy counting argument
gives us precisely C(N − 1 + n, n) of these monomials. Thus, we have proven the
following corollary:

Corollary 4.11: If dim_F(V) = N, then dim_F Sⁿ(V) = C(N − 1 + n, n).  □
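Corollary 4.11 can be checked the same way the exterior-power count was: monomials of degree n in N symbols correspond to multisets of size n, which itertools.combinations_with_replacement enumerates. The values of N and n below are arbitrary choices for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

# dim S^n(V) = C(N - 1 + n, n): degree-n monomials in the N basis symbols
# correspond to size-n multisets drawn from those symbols.
N, n = 4, 3
monomials = list(combinations_with_replacement(range(1, N + 1), n))
print(len(monomials), comb(N - 1 + n, n))   # 20 20
```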
Finally, a linear transformation T: V → W induces a map on symmetric
powers as follows: Let φ: Vⁿ → Sⁿ(V) be the natural symmetric map given by
φ(α₁,...,αₙ) = [α₁]···[αₙ]. T induces a symmetric multilinear mapping ψ_T
from Vⁿ to Sⁿ(W) given by ψ_T(α₁,...,αₙ) = [T(α₁)]···[T(αₙ)]. By Theorem 4.8,
there exists a unique linear transformation Sⁿ(T) ∈ Hom_F(Sⁿ(V), Sⁿ(W)) such
that Sⁿ(T)φ = ψ_T. Thus, for all (α₁,...,αₙ) ∈ Vⁿ, Sⁿ(T)([α₁]···[αₙ]) =
[T(α₁)]···[T(αₙ)].
If T₁ ∈ Hom_F(V, W) and T₂ ∈ Hom_F(W, Z), then clearly, Sⁿ(T₂T₁) = Sⁿ(T₂)Sⁿ(T₁).
We also have the analog of Theorem 3.31.

Theorem 4.12: Let V and W be vector spaces over F and suppose
T ∈ Hom_F(V, W).

(a) If T is injective, then Sⁿ(T) is injective.
(b) If T is surjective, then Sⁿ(T) is surjective.
(c) If T is an isomorphism, then so is Sⁿ(T).  □

The linear transformation Sⁿ(T) is called the nth symmetric power of T.
EXERCISES FOR SECTION 4

(1) Complete the details of Example 4.3, that is, argue S(φ) is indeed a
symmetric multilinear mapping.

(2) Suppose V is a vector space over F. Show that the following complex is a
short exact sequence:

    0 → Λ²(V) →(T) V ⊗_F V →(T′) S²(V) → 0.

Here T is given by T(α ∧ β) = α ⊗ β − β ⊗ α, and T′ is given by
T′(β ⊗ δ) = [β][δ].

(3) Is Exercise 2 still true if 2 is replaced by n?

(4) Let A = F[X₁,...,Xₙ] denote the set of all polynomials in the variables
X₁,...,Xₙ with coefficients in the field F. Thus, F[X₁,...,Xₙ] is the set
consisting of all finite sums of the following form: Σ c_{(i₁,...,iₙ)} X₁^{i₁} ··· Xₙ^{iₙ}
with c_{(i₁,...,iₙ)} ∈ F and (i₁,...,iₙ) ∈ (N ∪ {0})ⁿ.

(a) Show that A is an infinite-dimensional vector space over F with basis
Δ = {X₁^{i₁} ··· Xₙ^{iₙ} | (i₁,...,iₙ) ∈ (N ∪ {0})ⁿ} when we define addition and
scalar multiplication as follows:

    Σ c_{(i₁,...,iₙ)} X₁^{i₁} ··· Xₙ^{iₙ} + Σ d_{(i₁,...,iₙ)} X₁^{i₁} ··· Xₙ^{iₙ}
        = Σ (c_{(i₁,...,iₙ)} + d_{(i₁,...,iₙ)}) X₁^{i₁} ··· Xₙ^{iₙ}

and

    x (Σ c_{(i₁,...,iₙ)} X₁^{i₁} ··· Xₙ^{iₙ}) = Σ (x c_{(i₁,...,iₙ)}) X₁^{i₁} ··· Xₙ^{iₙ}.

(b) Suppose we define the product of two monomials X₁^{i₁} ··· Xₙ^{iₙ} and
X₁^{j₁} ··· Xₙ^{jₙ} in A by the formula X₁^{i₁} ··· Xₙ^{iₙ} · X₁^{j₁} ··· Xₙ^{jₙ} = X₁^{i₁+j₁} ··· Xₙ^{iₙ+jₙ}.
Show that we can extend this definition of multiplication in a natural
way to a product on A such that A becomes a commutative algebra
over F, that is, fg = gf for all f, g ∈ A.
(c) Let Aₚ = L({X₁^{e₁} ··· Xₙ^{eₙ} | e₁ + ··· + eₙ = p}). Show that

    dim_F Aₚ = C(n − 1 + p, p)

(d) Show that A is a graded F-algebra, that is, A = ⊕_{p≥0} Aₚ and
AₚA_q ⊆ A_{p+q} for all p, q ≥ 0.

(5) Let V be a vector space over F. The symmetric algebra S(V) is defined to be
the following direct sum: S(V) = ⊕_{n≥0} Sⁿ(V). Here as usual, S⁰(V) = F.
(a) Show that S(V) is an algebra over F when we define
products by the formula ([α₁]···[αₚ])([β₁]···[βₘ]) = [α₁]···[αₚ][β₁]···[βₘ].
(b) Show that there exists a natural, injective linear transformation
T ∈ Hom_F(V, S(V)) such that T(α)T(β) = T(β)T(α) for all α, β ∈ V.

(6) Show that the pair (S(V), T) constructed in Exercise 5 has the following
universal mapping property: If A is any F-algebra, and ψ ∈ Hom_F(V, A)
such that ψ(α)ψ(β) = ψ(β)ψ(α) for all α, β ∈ V, then there exists a unique
algebra homomorphism φ: S(V) → A such that φT = ψ.

(7) If dim_F(V) = n, show S(V) ≅ F[X₁,...,Xₙ] as F-algebras.

(8) Let V and W be vector spaces over F. Show that Sⁿ(V ⊕ W) ≅
⊕_{k+l=n} (S^k(V) ⊗_F S^l(W)).

(9) If V is a vector space over F and K a field containing F, show
Sⁿ_F(V) ⊗_F K ≅ Sⁿ_K(V ⊗_F K) as K-vector spaces.
Chapter III

Canonical Forms of Matrices

1. PRELIMINARIES ON FIELDS

In this chapter, we return to the fundamental problem posed in Section 3 of


Chapter I. Suppose V is a finite-dimensional vector space over a field F, and let
T ∈ ℒ(V) = Hom_F(V, V). What is the simplest matrix representation of T? If
α = {α₁,...,αₙ} is any basis of V and A = Γ(α, α)(T), then we are asking, What is
the simplest matrix B ∈ M_{n×n}(F) that is similar to A? Of course, that problem is a
bit ambiguous since no attempt has been made to define what the word
"simplest" means here. Intuitively, one feels that a given matrix representation A
of T is simple if A contains a large number of zeros as entries. Most of the
canonical form theorems that appear in this chapter present various matrix
representations of T that contain large numbers of zeros strategically placed in
the matrix. As one might expect, we get different canonical forms depending on
what we are willing to assume about T.
Let us first set up the notation that we shall use for the rest of this chapter. V
will denote a finite-dimensional vector space of dimension n over a field F. ℒ(V)
will denote the set of endomorphisms of V. Thus, ℒ(V) = Hom_F(V, V). We have
noted in previous chapters that ℒ(V) is an algebra over F. If α is any basis of V,
then Γ(α, α): ℒ(V) → M_{n×n}(F) is an F-algebra isomorphism. Let T ∈ ℒ(V). Our
goal in this chapter is to study various matrix representations of T.
Recall from Chapter II that T determines an algebra homomorphism
φ: F[X] → ℒ(V) given by φ(f(X)) = f(T). Here if f(X) = cₘXᵐ + ··· + c₁X
+ c₀ ∈ F[X], then f(T) = cₘTᵐ + ··· + c₁T + c₀I_V. In particular, φ(f(X) +
g(X)) = f(T) + g(T), φ(cf(X)) = cf(T), and φ(f(X)g(X)) = f(T)g(T). Note also
that φ(1) = I_V, that is, φ takes the multiplicative identity element 1 in the
algebra F[X] to the multiplicative identity I_V in ℒ(V). Another important point
to note here is that any identity f(X) = g(X) in F[X] is mapped by φ into the
corresponding identity f(T) = g(T) in ℒ(V).
We shall constantly use this map φ to study the behavior of T on V. In order
to facilitate such a study, we need to know some basic algebraic facts about the
polynomial algebra F[X]. We present these facts in the rest of this section.
The algebra F[X] is often called the ring of polynomials in the indeterminate
X over F. We have seen that F[X] is an infinite-dimensional vector space over F
with basis the monomials {1 = X⁰, X, X², ...}. In particular, two polynom-
ials f(X) = aₙXⁿ + ··· + a₁X + a₀ and g(X) = bₘXᵐ + ··· + b₁X + b₀ in F[X]
with aₙ ≠ 0 ≠ bₘ are equal if and only if n = m and aₙ = bₙ, ...,
a₀ = b₀. If f(X) is a nonzero polynomial in F[X], then f(X) can
be written uniquely in the following form: f(X) = aₙXⁿ + ··· + a₁X + a₀ with
n ≥ 0, aᵢ ∈ F, and aₙ ≠ 0. The integer n here is called the degree of f.
We shall use the notation ∂(f) to indicate the degree of f. Thus, ∂(·) is a function
from F[X] − {0} to N ∪ {0}. Notice that we do not give a degree to the zero
polynomial 0. The degree function ∂(·) has all the same familiar properties that
the reader is acquainted with from studying polynomials with coefficients in R.
Thus, we have the following facts:

1.1: (a) ∂(f) ≥ 0 for all f ∈ F[X] − {0}.
(b) ∂(f) = 0 if and only if f = a₀ ∈ F − {0}.
(c) ∂(fg) = ∂(f) + ∂(g) for nonzero f, g ∈ F[X].
(d) ∂(f + g) ≤ max{∂(f), ∂(g)} for nonzero f, g and f + g.

We also have the division algorithm:

1.2: Let f(X), g(X) ∈ F[X] with g ≠ 0. Then there exist unique polynomials h(X)
and r(X) in F[X] such that f(X) = h(X)g(X) + r(X), where r(X) = 0 or ∂(r) < ∂(g).

The proof of 1.2 is nothing more than the long division process you learned in
grade school. We leave it as an exercise at the end of this section.
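The long-division process behind 1.2 is easy to carry out by machine as well as by hand. The Python sketch below is our own illustration (polynomials encoded as coefficient lists, lowest degree first, with rational coefficients from the fractions module); it divides f by g and returns the quotient and remainder of 1.2.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g in Q[X]; polynomials are coefficient lists, lowest degree first.

    Returns (h, r) with f = h*g + r and r = 0 or deg(r) < deg(g), as in 1.2.
    """
    f, g = [Fraction(c) for c in f], [Fraction(c) for c in g]
    h, r = [Fraction(0)] * max(len(f) - len(g) + 1, 1), f[:]
    while len(r) >= len(g) and any(r):
        shift = len(r) - len(g)
        q = r[-1] / g[-1]              # leading coefficient of the next quotient term
        h[shift] = q
        for i, c in enumerate(g):      # subtract q * X^shift * g from the remainder
            r[shift + i] -= q * c
        while r and r[-1] == 0:        # drop the (now zero) leading coefficient
            r.pop()
    return h, r

# f = X^3 + 2X + 1, g = X^2 + 1  ==>  h = X, r = X + 1
print(poly_divmod([1, 2, 0, 1], [1, 0, 1]))
```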
Let f, g ∈ F[X]. We say f divides g if there exists an h ∈ F[X] such that fh = g.
If f divides g, we shall write f | g. We say f and g are associates if f | g and g | f. It
follows easily from 1.1(c) that f and g are associates if and only if f = cg for some
nonzero constant c ∈ F. For example, 2X + 2 and X + 1 are associates in Q[X],
whereas X + 1 and X are not associates.
The notion of a greatest common divisor of a set of polynomials f₁,...,fₙ is
the same as in ordinary arithmetic. We say d(X) ∈ F[X] is a greatest common
divisor of f₁,...,fₙ if d satisfies the following two properties:

1.3: (a) d | fᵢ for i = 1,...,n.
(b) If e | fᵢ for i = 1,...,n, then e | d.
If d is a greatest common divisor of f₁,...,fₙ, then clearly cd is also a greatest
common divisor of f₁,...,fₙ for any nonzero constant c ∈ F. On the other hand,
if e is a second greatest common divisor of f₁,...,fₙ, then 1.3(b) implies d | e and
e | d. Hence, e = cd for some c ∈ F − {0}. Thus, a greatest common divisor of
f₁,...,fₙ is unique up to associates in F[X]. One of the most important
properties of the algebra F[X] is the fact that any finite set of polynomials has a
greatest common divisor.

Lemma 1.4: Let f₁,...,fₙ ∈ F[X]. Then f₁,...,fₙ have a greatest common
divisor d. Furthermore, d = a₁f₁ + ··· + aₙfₙ for some a₁,...,aₙ ∈ F[X].

We sketch a proof of 1.4 and leave the details to the reader. We first
note that a greatest common divisor of f₁,...,fₙ is nothing but a greatest
common divisor of fₙ and d′, where d′ is a greatest common divisor of f₁,...,fₙ₋₁.
Hence by induction, it suffices to prove the lemma for two polynomials f and g.
We can also assume g ≠ 0. We now apply the division algorithm over and over
until the remainder becomes zero. More specifically, we have

1.5: f = a₁g + f₂ with ∂(f₂) < ∂(g)
     g = a₂f₂ + f₃ with ∂(f₃) < ∂(f₂)
     ⋮
     f_{r-2} = a_{r-1}f_{r-1} + f_r with ∂(f_r) < ∂(f_{r-1})

and

     f_{r-1} = a_r f_r.

An easy argument shows f_r is a greatest common divisor of f and g. Back
substituting in 1.5 gives f_r = Af + Bg for some A, B ∈ F[X].  □

We shall let g.c.d.(f₁,...,fₙ) denote a greatest common divisor of f₁,...,fₙ.
Although a g.c.d.(f₁,...,fₙ) is unique only up to some nonzero constant in F, this
will cause no confusion in the sequel. Note that g.c.d.(f₁,...,fₙ) = 1 whenever
some fᵢ is a nonzero constant in F.
In the sequel, we shall also need the dual concept of a least common multiple
l.c.m.(f₁,...,fₙ) of f₁,...,fₙ. We shall discuss this notion in the exercises at the
end of this section.
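The repeated-division scheme of 1.5 is exactly the Euclidean algorithm and is straightforward to mechanize. The following Python sketch reuses a division helper like the one sketched after 1.2 and returns a monic greatest common divisor; the particular polynomials at the end are an arbitrary illustration, not an example from the text.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Division with remainder in Q[X]; coefficient lists, lowest degree first."""
    f, g = [Fraction(c) for c in f], [Fraction(c) for c in g]
    h, r = [Fraction(0)] * max(len(f) - len(g) + 1, 1), f[:]
    while len(r) >= len(g) and any(r):
        shift, q = len(r) - len(g), r[-1] / g[-1]
        h[shift] = q
        for i, c in enumerate(g):
            r[shift + i] -= q * c
        while r and r[-1] == 0:
            r.pop()
    return h, r

def poly_gcd(f, g):
    """A greatest common divisor of f and g, computed by the scheme in 1.5.

    The last nonzero remainder f_r is returned, made monic (recall a g.c.d.
    is only unique up to a nonzero constant).
    """
    while any(g):
        _, rem = poly_divmod(f, g)
        f, g = g, rem if rem else [Fraction(0)]
    lead = f[-1]
    return [c / lead for c in f]

# f = X^3 - X, g = X^2 - 3X + 2; their g.c.d. is X - 1.
print(poly_gcd([0, -1, 0, 1], [2, -3, 1]))   # [Fraction(-1, 1), Fraction(1, 1)]
```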
A polynomial f(X) ∈ F[X] is said to be constant if f = 0 or ∂(f) is zero. Thus, f is a
constant if and only if f ∈ F. We say a polynomial f(X) in F[X] is irreducible (over
F) if f is not constant, and whenever f = gh with g, h ∈ F[X], then one of g or h is
a constant. This notion of irreducibility definitely depends on the field F.
Example 1.6: f(X) = X² + 1 is irreducible over R, but factors nontrivially,
f(X) = (X − i)(X + i), over C.  □

Note that linear polynomials, that is, aX + b with a ≠ 0, are always
irreducible in F[X].
Now suppose f(X) is not constant. If f is not irreducible, then ∂(f) > 0 and
f = gh with 0 < ∂(g), ∂(h) and ∂(g), ∂(h) < ∂(f). If g and h are not irreducible, they
in turn factor. Since the degrees of the resulting factors keep dropping, we
eventually arrive at a factorization of f in the form f = g₁ ··· gₙ with each gᵢ
irreducible. Thus, we have proved the first part of the following theorem:

Theorem 1.7: Every nonconstant polynomial f(X) ∈ F[X] can be factored in an
essentially unique way as a product of irreducible polynomials.

A factorization f = g₁ ··· gₙ with each gᵢ irreducible is said to be essentially
unique if given any other factorization f = g₁′ ··· gₘ′ with each gⱼ′ irreducible, then
n = m, and there exists a permutation σ ∈ Sₙ such that gᵢ and g′_{σ(i)} are associates
for all i = 1,...,n. Clearly, a given factorization can be only essentially unique.
For example, (X + 1)(X + 2) = (2X + 2)(½X + 1) are two factorizations of
X² + 3X + 2 in Q[X].

Proof of 1.7: We have already shown that f = g₁ ··· gₙ for some irreducible
polynomials g₁,...,gₙ ∈ F[X]. We must argue that this factorization is es-
sentially unique. We give a brief sketch of how to do this. The essential point
here is to make the following observation about irreducible polynomials:

1.8: If g(X) is irreducible, and g | pq, then g | p or g | q.

1.8 follows easily from Lemma 1.4 and the observation that if g is irreducible,
then g.c.d.(g, p) = 1 or g for any p ∈ F[X]. Once we have 1.8, the essential
uniqueness of f = g₁ ··· gₙ is clear.  □

Theorem 1.7 implies that up to associates any nonconstant polynomial f(X)
can be written (essentially) uniquely in the following way:

1.9: f(X) = g₁^{e₁} ··· gₘ^{eₘ}

In Equation 1.9, g₁,...,gₘ are irreducible, e₁,...,eₘ are positive integers, and
gᵢ and gⱼ are not associates whenever i ≠ j. Sometimes it is convenient to allow
an exponent eᵢ in 1.9 to be zero. For example, if f₁,...,fₙ are nonconstant
polynomials, then there exists a set of irreducible polynomials {g₁,...,gₘ} and
nonnegative integers eᵢⱼ such that gᵢ and gⱼ are not associates for i ≠ j and

1.10: fᵢ = g₁^{e_{i1}} ··· gₘ^{e_{im}},  i = 1,...,n.
If f₁,...,fₙ are factored as in equation 1.10, then a g.c.d.(f₁,...,fₙ) = g₁^{d₁} ··· gₘ^{dₘ},
where dⱼ = min{e_{1j},...,e_{nj}} for all j = 1,...,m. We say that a set of poly-
nomials f₁,...,fₙ are relatively prime if a g.c.d.(f₁,...,fₙ) = 1. Clearly, if some
fᵢ is a nonzero constant, then f₁,...,fₙ are relatively prime. If each fᵢ is a
nonconstant polynomial, then 1.10 implies f₁,...,fₙ are relatively prime if and
only if the fᵢ have no common irreducible factor. Note that Lemma 1.4 implies
the following result:

1.11: f₁,...,fₙ are relatively prime if and only if a₁f₁ + ··· + aₙfₙ = 1 for some
a₁,...,aₙ ∈ F[X].

We shall use this remark frequently throughout the rest of this chapter.
Before closing this section, we need to say a few words about algebraically
closed fields.

Definition 1.12: A field F is algebraically closed if the only irreducible poly-
nomials in F[X] are linear polynomials.

Example 1.13: Since X² + 1 is irreducible in Q[X] and R[X], we see neither Q
nor R is algebraically closed. On the other hand, the field of complex numbers C
is algebraically closed. This fact is often called the fundamental theorem of
algebra (first proven by Gauss). It is not trivial.  □

If F is algebraically closed and f(X) is a nonconstant polynomial in F[X],
then Theorem 1.7 implies f(X) can be written uniquely in the following form:

1.14: f(X) = c ∏_{i=1}^{r} (X − cᵢ)^{nᵢ}

In 1.14, c ∈ F − {0}, c₁,...,c_r are distinct constants in F, and
n₁ + ··· + n_r = ∂(f). The constants c₁,...,c_r are called the roots of f and will
henceforth be denoted R(f).

We shall need the following theorem from abstract algebra:

Theorem 1.15: Let F be any field. Then there exists a field F̄ containing F with
the following properties:

(a) F̄ is algebraically closed.
(b) If K is any algebraically closed field containing F, then there exists an F-
algebra homomorphism ψ: F̄ → K such that ψ(1) = 1.

An easy exercise shows ψ must be injective in (b). It follows that any two fields
containing F and satisfying (a) and (b) are isomorphic as F-algebras. For this
reason, a field F̄ satisfying (a) and (b) is called the algebraic closure of F
(any other algebraic closure of F being isomorphic to F̄). The proof of Theorem 1.15
is beyond the level of this book. The interested reader can consult [6; Thm. 32, p.
106].
If F is any field, we shall let F̄ denote the algebraic closure of F. Our interest
in F̄ comes from equation 1.14. If f(X) is a nonconstant polynomial in F[X], then
f may not factor into linear polynomials in F[X]. Since F ⊆ F̄, F[X] ⊆ F̄[X]. Since
F̄ is algebraically closed, f has a unique factorization as in 1.14 in F̄[X]. If
f(X) = c ∏_{i=1}^{r} (X − cᵢ)^{nᵢ} ∈ F̄[X], we shall again call R(f) = {c₁,...,c_r} the roots
of f. Thus, the roots of a polynomial f(X) in F[X] are those elements c ∈ F̄, the
algebraic closure of F, such that f(c) = 0.

Example 1.16: Suppose f(X) = (X² + 1)(X² + 2) ∈ R[X]. Since X² + 1 is irre-
ducible in R[X], f has no factorization as in 1.14 over R. It is not hard
to see that C is the algebraic closure of R. In C[X], we have f(X) =
(X + i)(X − i)(X + √2 i)(X − √2 i). The roots of f are then given by R(f) =
{i, −i, √2 i, −√2 i}.  □

The fact that any field F has an algebraic closure F̄ is often used when
studying linear transformations.
Suppose T ∈ Hom_F(V, V). Then we have the F-algebra homomorphism
φ: F[X] → ℒ(V) given by φ(f) = f(T). Suppose we wish to study the action of f(T)
on V for some interesting f(X) ∈ F[X]. One way to do this is to extend the scalars
to F̄ and study the natural extension of f(T) to V^{F̄}. Often information obtained
about f(T) on V^{F̄} gives us useful information about f(T) on V. Recall that
extending scalars to F̄ means we form the F̄-vector space V^{F̄} = V ⊗_F F̄. By the
natural extension of T to V^{F̄}, we mean the linear map T ⊗_F I_{F̄} ∈ Hom_{F̄}(V^{F̄}, V^{F̄}).
The vector space V is imbedded in V^{F̄} as the F-subspace V ⊗_F 1. If we identify a
vector α ∈ V with its image α ⊗_F 1 in V ⊗_F 1, then we have
(T ⊗_F I_{F̄})(α) = (T ⊗_F I_{F̄})(α ⊗_F 1) = T(α) ⊗_F 1 = T(α). Thus, T ⊗_F I_{F̄} is just T
on V. In the sequel, we shall identify α ⊗_F 1 with α and set T ⊗_F I_{F̄} = T^{F̄}. When
we do this, a typical vector in V^{F̄} is a finite sum of the form Σ zᵢαᵢ with
z₁,...,zₙ ∈ F̄ and α₁,...,αₙ ∈ V. The action of T^{F̄} on such a vector is given by
T^{F̄}(Σ zᵢαᵢ) = Σ zᵢT(αᵢ). In particular, the reader can easily check that
(f(T))^{F̄} = f(T^{F̄}) for any f(X) ∈ F[X]. Now since F̄ is algebraically closed, f can be
factored in F̄[X] as in equation 1.14. Thus, f(T)^{F̄} has a particularly simple form:
f(T)^{F̄} = c ∏_{i=1}^{r} (T^{F̄} − cᵢ)^{nᵢ}. We shall see how these ideas can be usefully em-
ployed in Section 5 when discussing the real Jordan canonical form of T.
EXERCISES FOR SECTION 1

(1) Prove (c) and (d) in 1.1.

(2) Give a proof of 1.2. [Hint: Proceed by induction on ∂(f).]

(3) Let f₁,...,fₙ ∈ F[X]. Show that g.c.d.(f₁,...,fₙ) = g.c.d.(g.c.d.(f₁,...,fₙ₋₁), fₙ).
We used this idea in Lemma 1.4.

(4) Complete the proof of Lemma 1.4 by showing that f_r = g.c.d.(f, g).

(5) Give a proof of 1.8 and then complete the details in Theorem 1.7.

(6) Determine what all irreducible polynomials of degree less than or equal to
three look like in F₂[X].

(7) Show that f(X)g(X) = 0 in F[X] if and only if f = 0 or g = 0.

(8) Let K and F be fields and suppose K ⊇ F. Show that any F-algebra
homomorphism ψ: F̄ → K with ψ(1) = 1 must be injective. If ψ(1) ≠ 1, is this
statement true?

(9) Find d = g.c.d.(X³ − 6X² + X + 4, X⁵ − 6X + 1) in Q[X]. Exhibit
two polynomials A and B such that d = A(X³ − 6X² + X + 4)
+ B(X⁵ − 6X + 1).

(10) Suppose F ⊆ K are fields. Let g₁,...,gₙ ∈ F[X]. Show that if g₁,...,gₙ are
relatively prime in F[X], then g₁,...,gₙ are relatively prime in K[X]. Is
the converse true here?

(11) In Section 2, we shall need a least common multiple, l.c.m.(f₁,...,fₙ), of a
set of polynomials f₁,...,fₙ ∈ F[X]. We define a l.c.m.(f₁,...,fₙ) to be a
polynomial (unique up to associates) e(X) such that (a) fᵢ | e for all
i = 1,...,n, and (b) if fᵢ | g for i = 1,...,n, then e | g. Prove that any set of
polynomials f₁,...,fₙ ∈ F[X] has a least common multiple.

(12) In Exercise 11, suppose each fᵢ is factored as in equation 1.10. For each
j = 1,...,m, set dⱼ = max{e_{1j},...,e_{nj}}. Show that g₁^{d₁} ··· gₘ^{dₘ} is a least
common multiple of f₁,...,fₙ.

(13) Prove that the product of a least common multiple and a greatest common
divisor of f and g is the product fg.

(14) If f(X) ∈ F[X] has degree n, show |R(f)| ≤ n.

(15) Prove that every polynomial f(X) ∈ R[X] factors into linear and quadratic
polynomials.

(16) Let f(X) ∈ R[X], and let f′(X) denote the derivative of f. Show that f and f′
are relatively prime in R[X] if and only if f has no multiple roots.

(17) Prove that f(X) = 1 + X + X³ + X⁴ is not irreducible over any field.

(18) Show that p(X) = X⁴ + 2X + 2 ∈ Q[X] is irreducible.

(19) Let f(X) be a nonconstant polynomial in F[X]. Set (f) = {f(X)g(X) | g(X) ∈ F[X]}.
Show that the vector space F[X]/(f) is a finite-dimensional algebra over F
when multiplication is defined as follows: (h + (f))(g + (f)) = hg + (f).

(20) Suppose in Exercise 19 that f(X) is irreducible. Prove that F[X]/(f) is a field.

2. MINIMAL AND CHARACTERISTIC POLYNOMIALS

Let us start this section with a very general result.

Theorem 2.1: Let A be an algebra over F, and assume dim_F(A) = m < ∞. Then
for every x ∈ A, there exists an f(X) ∈ F[X] such that 1 ≤ ∂(f) ≤ m and f(x) = 0.

Proof: Consider the set Δ = {1, x,...,xᵐ} ⊆ A. Since dim_F A = m, Δ is linearly
dependent over F. Hence, there exist constants c₀,...,cₘ ∈ F, not all zero, with
c₀(1) + c₁x + ··· + cₘxᵐ = 0. Set f(X) = c₀ + c₁X + ··· + cₘXᵐ. Clearly,
1 ≤ ∂(f) ≤ m, and f(x) = 0.  □

Although the proof of Theorem 2.1 is trivial, the theorem has many
interesting ramifications for a T ∈ ℒ(V). Since dim_F(V) = n < ∞, dim_F ℒ(V) = n².
Thus, we can apply Theorem 2.1 to the algebra ℒ(V). We conclude that there
exists a nonconstant polynomial f(X) ∈ F[X] such that ∂(f) ≤ n² and f(T) = 0.
Another way to say this is that ker φ = {f(X) ∈ F[X] | f(T) = 0}, the kernel of
φ: F[X] → ℒ(V), is not zero. Among all such nonconstant f ∈ ker φ, we can select
a polynomial g(X) of smallest degree. Then 1 ≤ ∂(g) ≤ n², g(T) = 0, and
∂(g) ≤ ∂(f) for any nonconstant polynomial f ∈ ker φ. Suppose ∂(g) = m. Then
g(X) = cₘXᵐ + ··· + c₀ with cₘ ≠ 0. Then
cₘ⁻¹g(X) = Xᵐ + (c_{m-1}/cₘ)X^{m-1} + ··· + (c₀/cₘ) is a monic polynomial of lowest degree in
ker φ. A polynomial a_rX^r + ··· + a₀ ∈ F[X] is said to be monic if r ≥ 1 and
a_r = 1.
Now we claim that cₘ⁻¹g(X) is unique. To see this, suppose f(X) is a second,
nonconstant, monic polynomial in ker φ of smallest possible degree. Set
cₘ⁻¹g(X) = h(X). Applying the division algorithm 1.2, we have
f(X) = A(X)h(X) + r(X) with r = 0 or ∂(r) < ∂(h). If we apply φ to this equation,
we get 0 = f(T) = A(T)h(T) + r(T) = r(T). If r ≠ 0, then r(X) is a nonconstant
polynomial in ker φ of degree smaller than ∂(g) (note r cannot be a nonzero
constant, for then r(T) = rI_V ≠ 0). This is impossible. We conclude
that r = 0 and f = Ah. Now ∂(f) = ∂(h) by definition. Thus, 1.1(c) implies
∂(A) = 0. So, A is a nonzero constant. Since f and h are both monic, we conclude
that A = 1 and f = h. We have now shown that there is a unique, monic
polynomial of smallest positive degree in ker φ. This polynomial gets a special
name, which we introduce below:

Definition 2.2: The unique, monic polynomial f(X) ∈ F[X] of smallest positive
degree such that f(T) = 0 is called the minimal polynomial of T. We shall
henceforth denote the minimal polynomial of T by m_T(X) or just m_T.

Our discussion before Definition 2.2 implies that m_T(X) exists and indeed is
unique. We also know the following facts about m_T(X):

2.3: (a) 1 ≤ ∂(m_T) ≤ n².
(b) If f ∈ F[X] and f(T) = 0, then m_T | f.
(a) follows from Theorem 2.1. (b) is the same argument that was used above
for the uniqueness of m_T.
We can define the minimal polynomial of an n × n matrix in a similar fashion
to 2.2.

Definition 2.4: Let A ∈ M_{n×n}(F). The minimal polynomial m_A(X) of A is the
monic polynomial f ∈ F[X] of smallest positive degree such that f(A) = 0.

Suppose α is a basis of V, and Γ(α, α)(T) = A. Since Γ(α, α): ℒ(V) → M_{n×n}(F)
is an isomorphism of F-algebras, clearly m_T(X) = m_A(X). Thus, we can switch
back and forth freely between V [T or m_T(X)] and M_{n×n}(F) [A or m_A(X)]. The
minimal polynomial is the same. Note that one consequence of the equality
m_A(X) = m_T(X) is that similar matrices have the same minimal polynomial. For
if A and B are similar, then Theorem 3.28 of Chapter I implies that A and B are
matrix representations of the same linear transformation T. Thus,
m_A(X) = m_T(X) = m_B(X). Let us examine a few examples.

Example 2.5: If T = 0, clearly m_T(X) = X. If T = I_V, then m_T(X) = X − 1.  □

Example 2.6: Let V = R², and let δ = {δ₁ = (1, 0), δ₂ = (0, 1)} be the canonical
basis of R². Let T be defined by T(δ₁) = δ₂ and T(δ₂) = −δ₁. Thus, T is the
familiar rotation of R² through a 90° angle:

    Γ(δ, δ)(T) = ( 0 −1 ) = A
                 ( 1  0 )

Clearly, A² + I = 0. Thus, X² + 1 ∈ ker φ. Since T takes no nonzero α ∈ R²
into a multiple of itself, no linear polynomial lies in ker φ. Thus,
m_T(X) = m_A(X) = X² + 1.  □

Suppose we consider the same example over C instead of R.

Example 2.7: Let V = C², δ = {δ₁ = (1, 0), δ₂ = (0, 1)}, and suppose T(δ₁) = δ₂,
T(δ₂) = −δ₁. Then again we have Γ(δ, δ)(T) = A,
and hence X² + 1 ∈ ker φ. But now T((1, −i)) = i(1, −i). So, the same reasoning
used in Example 2.6 no longer applies. If ∂(m_T) = 1, then T = zI_V for some
z ∈ C − {0}. But then δ₂ = T(δ₁) = zδ₁, which is impossible. Hence
m_T(X) = X² + 1 as before.  □
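A direct computation makes Examples 2.6 and 2.7 concrete: the matrix A satisfies A² + I = 0, and over C the vector (1, −i) is mapped to i(1, −i). The short Python sketch below (plain 2 × 2 arithmetic, our own illustration, nothing assumed beyond the examples themselves) verifies both claims.

```python
# A is the matrix of the 90-degree rotation from Examples 2.6 and 2.7.
A = [[0, -1],
     [1,  0]]

def matmul(X, Y):
    """2 x 2 matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A2 = matmul(A, A)
print([[A2[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)])
# [[0, 0], [0, 0]]  --  A^2 + I = 0, so X^2 + 1 annihilates A.

# Over C, (1, -i) is an eigenvector with eigenvalue i, so the argument of
# Example 2.6 no longer rules out a linear polynomial by that route.
v = (1, -1j)
Av = (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
print(Av == (1j * v[0], 1j * v[1]))   # True
```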

Note in Example 2.6 that the minimal polynomial m_T(X) = X² + 1 is
irreducible in R[X]. When we extend the scalars in this example to C, we get
Example 2.7. The minimal polynomial stays the same but is no longer
irreducible: X² + 1 = (X − i)(X + i) ∈ C[X]. These examples suggest the follow-
ing question: How does the minimal polynomial change under an extension of
the scalars? The answer is that the minimal polynomial stays the same under all
extensions of the scalars. However, as the field F gets larger, we may be able to
factor m_T(X) more easily, as the two examples above show.
In order to argue m_T(X) remains invariant under extensions of the base field,
we need to examine the kernel of φ more closely. From 2.3(b) we know that any
polynomial in ker φ is a multiple of m_T(X).

Lemma 2.8: Suppose g(X) is a polynomial of positive degree m in F[X]. Let
(g) = {A(X)g(X) | A ∈ F[X]}. Then (g) is a subspace of F[X], and
dim_F{F[X]/(g)} = m.

Proof: The fact that (g) is a subspace of F[X] is obvious. We want to show
dim_F{F[X]/(g)} = ∂(g). Let xᵢ = Xⁱ + (g), i = 0, 1,..., m − 1. We shall show
that Δ = {x₀, x₁,..., x_{m-1}} is a basis of F[X]/(g). Then dim_F{F[X]/(g)} = |Δ| = m
and our proof will be complete.
A typical vector in F[X]/(g) is a coset of the form f(X) + (g) with f(X) ∈ F[X].
Using the division algorithm 1.2, we can write f(X) = h(X)g(X) + r(X) with r = 0
or ∂(r) < m. Then f(X) + (g) = (hg + r) + (g) = r + (g). Suppose r(X) =
c_sX^s + ··· + c₀ with s < m. Then r + (g) =
c_s x_s + ··· + c₀x₀. Thus, Δ spans F[X]/(g) as a vector space over F.
Suppose Σᵢ₌₀^{m-1} cᵢxᵢ = 0 in F[X]/(g). Then back in F[X], we have
Σᵢ₌₀^{m-1} cᵢXⁱ = A(X)g(X) for some polynomial A. Since ∂(g) = m > ∂(Σ cᵢXⁱ), it
follows from 1.1(c) that A = 0. Thus c₀ = ··· = c_{m-1} = 0. This proves the set Δ is
linearly independent over F. Thus, Δ is a basis of F[X]/(g).  □

Corollary 2.9: 0 → (m_T) ⊂→ F[X] →(φ) Im φ → 0 is an exact sequence of vector
spaces over F. In particular, dim_F{Im φ} = ∂(m_T).

Proof: The curved arrow between (m_T) and F[X] indicates that the map
between (m_T) and F[X] is the inclusion map. We have seen that ker φ = (m_T) by
2.3(b). The result now follows from Lemma 2.8 and Corollary 5.17 of Chapter
I.  □

The image of φ is just the linear span of the set {Tⁱ | i = 0, 1,...}. Corollary
2.9 says this is a subspace of ℒ(V) of dimension ∂(m_T).
Now suppose K is any field containing F. We can extend our scalars to K by
forming V^K = V ⊗_F K. Let T^K = T ⊗_F I_K. Then we have two minimal poly-
nomials m_T ∈ F[X] and m_{T^K} ∈ K[X]. The statement that the minimal polynomial
of T remains invariant under an extension of the base field means more precisely
that m_T(X) = m_{T^K}(X) in K[X].

Theorem 2.10: Let K be a field containing F, and let T ∈ Hom_F(V, V). Then
m_T(X) = m_{T^K}(X) in K[X].
Proof: Since F ⊆ K, F[X] ⊆ K[X]. In particular, m_T(X) ∈ K[X]. Suppose
m_T(X) = X^r + a_{r-1}X^{r-1} + ··· + a₀ with aᵢ ∈ F. Then using Theorem 2.6 of
Chapter II, we have

    m_T(T^K) = m_T(T ⊗_F I_K) = (T ⊗_F I_K)^r + a_{r-1}(T ⊗_F I_K)^{r-1} + ··· + a₀(I_V ⊗_F I_K)
             = (T^r ⊗_F I_K) + a_{r-1}(T^{r-1} ⊗_F I_K) + ··· + a₀(I_V ⊗_F I_K)
             = m_T(T) ⊗_F I_K
             = 0

Thus, 2.3(b) implies m_{T^K}(X) | m_T(X) in K[X]. Since both of these polynomials are
monic, our theorem will follow if we can show that ∂(m_T) = ∂(m_{T^K}).
We know from Corollary 2.9 that

    0 → (m_T) → F[X] → Im φ → 0

is an exact sequence of vector spaces over F. Consequently, Theorem 2.13 of
Chapter II implies

2.12: 0 → (m_T) ⊗_F K → F[X] ⊗_F K → Im φ ⊗_F K → 0

is an exact sequence of vector spaces over K. The reader can easily check that
F[X] ⊗_F K ≅ K[X] (as K-algebras) under the map sending Σ fᵢ(X) ⊗_F kᵢ to
Σ kᵢfᵢ(X). Under this isomorphism, (m_T) ⊗_F K is sent to all multiples of m_T in
K[X]. Let us call this image (m_T)K[X]. Then the short exact sequence in 2.12
becomes

2.13: 0 → (m_T)K[X] → K[X] → Im φ ⊗_F K → 0

In particular, Lemma 2.8 implies that dim_K(Im φ ⊗_F K) = ∂(m_T). On the other
hand, using theorems from Chapter II, we have Im φ ⊗_F K ⊆ ℒ(V) ⊗_F K ≅
Hom_K(V^K, V^K). Under this isomorphism, Im φ ⊗_F K is clearly just the
K-linear span of {Tⁱ ⊗_F I_K | i = 0, 1,...}. Thus, Corollary 2.9 implies
dim_K(Im φ ⊗_F K) = dim_K{L(Tⁱ ⊗_F I_K | i = 0, 1,...)} = ∂(m_{T^K}). Therefore,
∂(m_T) = ∂(m_{T^K}), and our proof is complete.  □

We think of T^K as the natural extension of T to V^K. Theorem 2.10 says that
the minimal polynomial of T remains the same under all extensions of the
scalars. Switching to matrices, we have the following matrix version of Theorem
2.10:

Corollary 2.14: Let A ∈ M_{n×n}(F), and let K be any field containing F. Then the
minimal polynomial, m_A^F(X) ∈ F[X], of A viewed as an n × n matrix with
coefficients in F is the same as the minimal polynomial, m_A^K(X) ∈ K[X], of A
viewed as an n × n matrix with coefficients in K.
Proof: Suppose A = [aᵢⱼ]. Let δ = {δ₁,...,δₙ} be the canonical basis of Fⁿ. Then
Γ(δ, δ)(T) = A, where T is the endomorphism of Fⁿ given by T(δⱼ) = Σᵢ aᵢⱼδᵢ.
We have seen in Chapter II that δ ⊗ 1 = {δ₁ ⊗ 1,...,δₙ ⊗ 1} is a K-basis of Fⁿ ⊗_F K = Kⁿ, and
Γ(δ ⊗ 1, δ ⊗ 1)(T^K) = A. Thus, from Theorem 2.10, we have m_A^F(X) = m_T(X) =
m_{T^K}(X) = m_A^K(X).  □

Let us summarize what we have proved about the minimal polynomial of T
so far.

2.15: (a) 1 ≤ ∂(m_T) ≤ n².
(b) ker φ = (m_T).
(c) The minimal polynomial remains the same under all extensions of the
base field.
(d) ∂(m_T) = dim_F(F[T]).

In (d), F[T] denotes the subspace of ℒ(V) spanned by all powers of T.

There is one final property that m_T(X) possesses that we shall discuss at this
point.

Lemma 2.16: T is invertible if and only if m_T(X) has a nonzero constant term.

Proof: Let m_T(X) = X^r + a_{r-1}X^{r-1} + ··· + a₀ ∈ F[X]. Suppose a₀ = 0. Then
m_T(X) = Xg(X), where g(X) = X^{r-1} + a_{r-1}X^{r-2} + ··· + a₁. Set S = g(T). Then
S ∈ ℒ(V), and 0 = m_T(T) = TS. Since r = ∂(m_T) > ∂(g) = r − 1, we see S ≠ 0.
Hence, there exists a vector α ∈ V such that S(α) ≠ 0. Then T(S(α)) = 0 implies T
is not invertible. In particular, if T is invertible, then m_T(X) must have a nonzero
constant term.
Conversely, suppose a₀ ≠ 0. Then in ℒ(V), we have

    I_V = (−1/a₀)(T^r + a_{r-1}T^{r-1} + ··· + a₁T)
        = T[(−1/a₀)(T^{r-1} + a_{r-1}T^{r-2} + ··· + a₁I_V)]

Set g(T) = (−1/a₀)(T^{r-1} + a_{r-1}T^{r-2} + ··· + a₁I_V).
Then Tg(T) = g(T)T = I_V, and T is invertible.  □

Note that the proof of Lemma 2.16 implies that if T is invertible, then
T⁻¹ lies in F[T]; that is, T⁻¹ is a polynomial in T. We, of course, have a
similar statement about matrices.
Corollary 2.17: If T ∈ ℒ(V) is invertible, then T⁻¹ = f(T) for some f(X) ∈ F[X].
Similarly, if A ∈ M_{n×n}(F) is invertible, then A⁻¹ = g(A) for some polynomial
g(X) ∈ F[X].  □
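The proof of Lemma 2.16 is constructive: once m_T(X) = X^r + a_{r-1}X^{r-1} + ··· + a₀ is known with a₀ ≠ 0, the inverse is T⁻¹ = −(1/a₀)(T^{r-1} + a_{r-1}T^{r-2} + ··· + a₁I_V). The NumPy sketch below applies this recipe to a hypothetical 2 × 2 matrix whose minimal polynomial is (X − 2)(X − 3); the matrix is our own choice for illustration.

```python
import numpy as np

# A hypothetical example: A below has minimal polynomial (X - 2)(X - 3) = X^2 - 5X + 6.
A = np.array([[2., 1.],
              [0., 3.]])

# With m_A(X) = X^2 + a_1 X + a_0 (a_1 = -5, a_0 = 6), the proof of Lemma 2.16 gives
# A^{-1} = -(1/a_0)(A + a_1 I) = (5I - A)/6.
A_inv = -(A + (-5.0) * np.eye(2)) / 6.0
print(np.allclose(A @ A_inv, np.eye(2)))   # True
print(A_inv)                               # compare with np.linalg.inv(A)
```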
We now turn our attention to the second polynomial of this section. We need
to consider matrices with polynomial entries.

Definition 2.18: Let M_{n×n}(F[X]) denote the set of all n × n matrices A = [fᵢⱼ(X)]
with entries fᵢⱼ(X) ∈ F[X].

Thus, an element A ∈ M_{n×n}(F[X]) is a rectangular array of polynomials from
F[X], that is,

    A = ( f₁₁(X)  ···  f₁ₙ(X) )
        (   ⋮             ⋮   )
        ( fₙ₁(X)  ···  fₙₙ(X) )

for some choice of fᵢⱼ(X) ∈ F[X]. Clearly, M_{n×n}(F) ⊆ M_{n×n}(F[X]). We can extend
the algebra operations from M_{n×n}(F) to M_{n×n}(F[X]) in the obvious way.

2.19: (a) [fᵢⱼ(X)] + [gᵢⱼ(X)] = [fᵢⱼ(X) + gᵢⱼ(X)] for [fᵢⱼ], [gᵢⱼ] ∈ M_{n×n}(F[X]).
(b) c[fᵢⱼ(X)] = [cfᵢⱼ(X)] for c ∈ F[X].
(c) [fᵢⱼ(X)][gᵢⱼ(X)] = [hᵢⱼ(X)], where hᵢⱼ(X) = Σₖ₌₁ⁿ fᵢₖ(X)gₖⱼ(X).

Thus, M_{n×n}(F[X]) is an algebra over F containing M_{n×n}(F) as a subalgebra.
We can extend the definition of the determinant to the algebra M_{n×n}(F[X]) in
the obvious way.

2.20: det([fᵢⱼ(X)]) = Σ_{σ∈Sₙ} sgn(σ) f_{1σ(1)}(X) ··· f_{nσ(n)}(X)

Clearly, det: M_{n×n}(F[X]) → F[X]. Many theorems concerning the behavior of
the determinant on M_{n×n}(F) pass over to M_{n×n}(F[X]) with no change in proof.
For example, det(AB) = det(A)det(B) for all A, B ∈ M_{n×n}(F[X]). We also have
the following important result:

2.21: adj(A)A = A adj(A) = det(A)Iₙ for A ∈ M_{n×n}(F[X])

Recall that adj(A) is the adjoint of A. It is defined as follows: If
A = [fᵢⱼ(X)] ∈ M_{n×n}(F[X]), let Mᵢⱼ(A) be the (n − 1) × (n − 1) matrix formed
from A by deleting row i and column j from A. Thus,
Mᵢⱼ(A) ∈ M_{(n-1)×(n-1)}(F[X]). The adjoint of A is the n × n matrix whose i,jth
entry is given by adj(A)ᵢⱼ = (−1)^{i+j} det(Mⱼᵢ(A)). The proof of 2.21 is the same as
for fields. We also have the Laplace expansion for the determinant.

2.22: det(A) = Σⱼ₌₁ⁿ (−1)^{i+j} fᵢⱼ(X) det(Mᵢⱼ(A))   (expansion along row i)

or

      det(A) = Σᵢ₌₁ⁿ (−1)^{i+j} fᵢⱼ(X) det(Mᵢⱼ(A))   (expansion along column j)

The proof of 2.22 is the same as in the field case.
We can now introduce the characteristic polynomial of an n x n matrix with
coefficients in F.

Definition 2.23: Let A ∈ M_{n×n}(F). Then c_A(X) = det(XI − A) is called the charac-
teristic polynomial of A.

In 2.23, XI means XIₙ. Thus, XI − A ∈ M_{n×n}(F[X]) and det(XI − A) ∈ F[X].
Expanding det(XI − A), we see that c_A(X) = Xⁿ + c_{n-1}X^{n-1} + ··· + c₀ with
c_{n-1} = −Tr(A) and c₀ = (−1)ⁿ det(A). In particular, c_A(X) is a monic poly-
nomial of degree n with coefficients in F.
Note that any matrix similar to A has the same characteristic polynomial.
For suppose B = PAP⁻¹ in M_{n×n}(F). Then c_B(X) = det(XI − B) =
det(XI − PAP⁻¹) = det(P(XI − A)P⁻¹) = det(P) det(XI − A) det(P⁻¹) =
det(XI − A) = c_A(X). This remark allows us to extend the definition of the
characteristic polynomial to any T ∈ ℒ(V).

Definition 2.24: Let T ∈ ℒ(V). Then c_T(X) = det(XI − A), where A = Γ(α, α)(T)
and α is any basis of V.

We had seen in Theorem 3.28 of Chapter I that any two matrix represen-
tations of T are similar. Hence, the definition of c_T(X) does not depend on the
basis α. We shall call c_T(X) the characteristic polynomial of T.

Example 2.25: Let T be the linear transformation given in Example 2.6. Then

    c_T(X) = c_A(X) = det ( X   1 ) = X² + 1.  □
                         ( −1  X )

Note that the characteristic polynomial c_T(X) is always a monic polynomial
of degree n = dim V. One of the most famous theorems in linear algebra is the
following result, first formulated by Cayley:
Theorem 2.26 (Cayley–Hamilton): Let A ∈ M_{n×n}(F). Then c_A(A) = 0.

Proof: Suppose c_A(X) = Xⁿ + c_{n-1}X^{n-1} + ··· + c₀. Set B = XIₙ − A. If we
eliminate a row and column from B and then take the determinant, we get a
polynomial in X of degree at most n − 1. In particular, the entries in adj(B) are
all polynomials of degree at most n − 1. It follows that there exist unique
matrices B₀,...,B_{n-1} ∈ M_{n×n}(F) such that

2.27: adj(B) = B₀ + B₁X + ··· + B_{n-1}X^{n-1}

In equation 2.27, we should really write Bⱼ(XʲIₙ) instead of BⱼXʲ, but the meaning
of the symbol is clear.
Now from equation 2.21, we have

2.28: adj(B)B = c_A(X)Iₙ = c₀Iₙ + c₁IₙX + ··· + c_{n-1}IₙX^{n-1} + IₙXⁿ

On the other hand, from equation 2.27 we have

2.29: adj(B)B = (B₀ + B₁X + ··· + B_{n-1}X^{n-1})(XIₙ − A)
             = (−B₀A) + (B₀ − B₁A)X + ···
               + (B_{n-2} − B_{n-1}A)X^{n-1} + B_{n-1}Xⁿ

We now compare the results in 2.28 and 2.29. We have two polynomials in X
with coefficients in M_{n×n}(F) that are equal. An easy argument shows that the
matrices corresponding to the same powers of X in both equations must be
equal. Thus, we get the following equations:

2.30: −B₀A = c₀Iₙ
      B₀ − B₁A = c₁Iₙ
      ⋮
      B_{n-2} − B_{n-1}A = c_{n-1}Iₙ
      B_{n-1} = Iₙ

If we now multiply each equation in 2.30 by successively higher powers of A, we
get

2.31: −B₀A = c₀Iₙ
      B₀A − B₁A² = c₁A
      ⋮
      B_{n-2}A^{n-1} − B_{n-1}Aⁿ = c_{n-1}A^{n-1}
      B_{n-1}Aⁿ = Aⁿ

Adding the vertical columns in 2.31 gives us 0 = c_A(A).  □
Corollary 2.32: Let T ∈ ℒ(V). Then c_T(T) = 0. In particular, ∂(m_T) ≤ ∂(c_T) =
dim V.

Proof: Let α be a basis of V. Set A = Γ(α, α)(T). Then c_T(X) = c_A(X). By the
Cayley–Hamilton theorem, 0 = c_A(A) = Γ(α, α)(c_T(T)). Hence c_T(T) = 0. Thus,
m_T(X) | c_T(X) by 2.15(b). Since ∂(c_T(X)) = n, we conclude
∂(m_T) ≤ n.  □
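The Cayley–Hamilton theorem is easy to test numerically. In the NumPy sketch below, np.poly(A) returns the coefficients of c_A(X) (highest power first, monic), and Horner's rule evaluates c_A at the matrix A; the particular symmetric matrix is an arbitrary choice for illustration.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 1.]])

coeffs = np.poly(A)            # coefficients of the characteristic polynomial of A

# Evaluate c_A(A) by Horner's rule, with matrix powers in place of powers of X.
value = np.zeros_like(A)
for c in coeffs:
    value = value @ A + c * np.eye(3)

print(np.round(coeffs, 6))     # characteristic polynomial coefficients
print(np.allclose(value, 0))   # True: c_A(A) = 0
```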
We have noted in the proof of Corollary 2.32 that m_T | c_T. In general, these
two polynomials are not equal, as the following trivial example shows:

Example 2.33: Let A = aIₙ ∈ M_{n×n}(F). Clearly, c_A(X) = (X − a)ⁿ and
m_A(X) = X − a.  □
A less trivial example is as follows:

Example 2.34: Let

    A = ( −1  7   0 )
        (  0  2   0 )  ∈ M_{3×3}(Q)
        (  0  3  −1 )

One can easily check that c_A(X) = (X + 1)²(X − 2) and
m_A(X) = (X + 1)(X − 2).  □
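Example 2.34 can be verified mechanically. The NumPy sketch below checks that (X + 1)(X − 2) annihilates A while no linear polynomial does, and that np.poly recovers the characteristic polynomial (X + 1)²(X − 2) = X³ − 3X − 2.

```python
import numpy as np

A = np.array([[-1., 7., 0.],
              [ 0., 2., 0.],
              [ 0., 3., -1.]])
I = np.eye(3)

print(np.round(np.poly(A), 6))                          # [ 1.  0. -3. -2.] = (X+1)^2 (X-2)
print(np.allclose((A + I) @ (A - 2 * I), 0))            # True:  (X+1)(X-2) kills A
print(np.allclose(A + I, 0), np.allclose(A - 2 * I, 0)) # False False: no linear factor alone does
```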

The examples above suggest that even when m_T ≠ c_T, they always have the
same irreducible factors in F[X]. This is indeed the case.

Theorem 2.35: Let A ∈ M_{n×n}(F). Then c_A(X) | (m_A(X))ⁿ.

Proof: Suppose m_A(X) = X^r + a₁X^{r-1} + ··· + a_r. Note for this proof, we have
changed our customary indexing on the coefficients of the minimal polynomial.
Let us now form the following r matrices in M_{n×n}(F):

2.36: B₀ = Iₙ, and for i = 1,...,r − 1,  Bᵢ = Aⁱ + a₁A^{i-1} + ··· + aᵢIₙ

Then clearly, Bᵢ − ABᵢ₋₁ = aᵢIₙ for all i = 1,...,r − 1. We also have

    −AB_{r-1} = −(A^r + a₁A^{r-1} + ··· + a_{r-1}A) = −(m_A(A) − a_rIₙ) = a_rIₙ.

Now set C = B₀X^{r-1} + B₁X^{r-2} + ··· + B_{r-1} ∈ M_{n×n}(F[X]). Then

2.37: (XIₙ − A)C = (XIₙ − A)(B₀X^{r-1} + B₁X^{r-2} + ··· + B_{r-1})
                = B₀X^r + (B₁ − AB₀)X^{r-1} + ···
                  + (B_{r-1} − AB_{r-2})X − AB_{r-1}
                = X^rIₙ + a₁X^{r-1}Iₙ + ··· + a_{r-1}XIₙ + a_rIₙ
                = m_A(X)Iₙ

If we now take the determinant of both sides of equation 2.37, we get c_A(X)
det C = (m_A(X))ⁿ. Consequently, c_A(X) | (m_A(X))ⁿ.  □

Corollary 2.38: Let T ∈ ℒ(V). Then c_T(X) and m_T(X) have the same set of
irreducible factors in F[X].

Proof: Theorem 2.26 implies m_T(X) | c_T(X). Theorem 2.35 implies that
c_T(X) | (m_T(X))ⁿ. The result follows from 1.7.  □
Let us rephrase Corollary 2.38 in terms of the language used in Theorem 1.7.
Suppose m_T(X) = f₁^{d₁} ··· f_r^{d_r} is the (essentially) unique factorization of the minimal
polynomial of T in F[X]. Thus, each fᵢ is irreducible, fᵢ and fⱼ are not associates
for i ≠ j, and each dᵢ > 0. Then Corollary 2.38 implies that the unique
factorization of c_T(X) (in F[X]) is given by c_T(X) = f₁^{e₁} ··· f_r^{e_r} with dᵢ ≤ eᵢ for all
i = 1,...,r. We must also have Σᵢ eᵢ∂(fᵢ) = n.
Now suppose F̄ is the algebraic closure of F. Consider the extension
T^{F̄} of T to V^{F̄}. We have seen in Theorem 2.10 that m_T(X) = m_{T^{F̄}}(X) in
F̄[X]. It is clear from the definition that c_T(X) = c_{T^{F̄}}(X). If we apply Corollary
2.38 to T^{F̄} ∈ ℒ(V^{F̄}), we conclude that c_T(X) and m_T(X) have the same irreducible
factors in F̄[X]. Since F̄ is algebraically closed, the only irreducible polynomials
in F̄[X] are linear. Hence c_T(X) and m_T(X) can be written in F̄[X] as follows:

2.39: m_T(X) = ∏_{i=1}^{r} (X − cᵢ)^{mᵢ} and c_T(X) = ∏_{i=1}^{r} (X − cᵢ)^{nᵢ}

In equation 2.39, we must have 0 < mᵢ ≤ nᵢ for all i = 1,...,r. Also,
n₁ + ··· + n_r = dim_F(V). Recall that the roots R(f) of a nonconstant
polynomial f(X) ∈ F[X] are those elements a ∈ F̄ such that f(a) = 0. Equation
2.39 implies R(m_T) = {c₁,...,c_r} = R(c_T). Thus, we have proved the following
corollary:

Corollary 2.40: Let T ∈ ℒ(V). Then the minimal polynomial and characteristic
polynomial of T have the same roots in F̄.

We shall finish this section with a brief look at invariant subspaces.

Definition 2.41: Let T ∈ ℒ(V), and let W be a subspace of V. We say W is
invariant under T, or T-invariant, if T(α) ∈ W for all α ∈ W.

Thus, W is T-invariant if T(W) ⊆ W. Clearly, (0), V, ker T, and Im T are all T-
invariant subspaces of V. In the next few sections, we shall mainly encounter T-
invariant subspaces in direct sum decompositions of V. Suppose
V = V₁ ⊕ ··· ⊕ V_r is an (internal) direct sum of subspaces V₁,...,V_r. Let us
further suppose each Vᵢ is T-invariant. We shall denote the restriction of T to Vᵢ
by Tᵢ. Since Vᵢ is T-invariant, Tᵢ ∈ ℒ(Vᵢ) for all i = 1,...,r. Suppose
α = α₁ + ··· + α_r with αᵢ ∈ Vᵢ. Then T(α) = T₁(α₁) + ··· + T_r(α_r).
Let αᵢ be a basis of Vᵢ, i = 1,...,r. It follows from Theorem 4.16(b) of
Chapter I that α = α₁ ∪ ··· ∪ α_r is a basis of V. Since each Vᵢ is T-invariant, Γ(α, α)(T)
has the following form:

2.42: Γ(α, α)(T) = ( A₁        0  )
                   (     ⋱       )   with Aᵢ = Γ(αᵢ, αᵢ)(Tᵢ)
                   ( 0        A_r )

Equation 2.42 gives us an immediate proof of the first half of the following
theorem:
Theorem 2.43: Let T e t(V) and suppose V = V1 ® is an internal direct
sum of 1-invariant subspaces V1,..., Vr. Let denote the restriction of T to V1.
Then

(a) T1e1(VJ,i= 1,...,r.


(b) cT(X) = 1cT(X).
(c) mT(X))= l.c.m.(mTl(X),..., mT(X)).
Proof Here c.r(X) is the characteristic polynomial of TI on V1. Similarly mTIX)
is the minimal polynomial of TI on V1. (a) is clear. As for (b), we have
cT(X) = cA(X), where A = ['(at, at)(T). Thus, from 2.42, we have cT(X) =
det(XI — A) = det(X11 — A1) cr(X). Here II of course denotes the
identity matrix of size the same as A1.
For (c), let us shorten notation here and write m1 for m-r(X). Recall from
Exercise 11 in Section 1 that a l.c.m.(m1,. . , mj is a polynomial e e F[X] with
.

the following two properties.

2.44: (i) m1lefori=1,...,r.


(ii)

We shall argue that m_T(X) satisfies (i) and (ii) in 2.44.
Since m_T(T) = 0 on V, clearly m_T(T_i) = 0 on V_i. Thus, m_i | m_T by 2.15(b). We
have now established (i).
Suppose g(X) ∈ F[X] is such that m_i | g for all i = 1,..., r. Then m_i(X)a_i(X) =
g(X) for some a_i. In particular, if α_i ∈ V_i, then g(T_i)(α_i) = m_i(T_i)a_i(T_i)(α_i) = 0.
Now let α ∈ V and write α = α_1 + ··· + α_r with α_i ∈ V_i. Since each V_i is invariant
under T, V_i is invariant under g(T). Clearly, the restriction of g(T) to V_i is
nothing but g(T_i). Thus, g(T)(α) = ∑_{i=1}^r g(T)(α_i) = ∑_{i=1}^r g(T_i)(α_i) = 0. Therefore,
g(T) = 0, and m_T | g by 2.15(b). This proves (ii) and completes the proof of the
theorem. □
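The following Python sketch (an addition, not part of the original text) illustrates Theorem 2.43(b) numerically: for a block-diagonal matrix built from T-invariant pieces, the characteristic polynomial is the product of the blocks' characteristic polynomials. The sample blocks A1 and A2 are hypothetical, and SymPy is assumed to be available.

```python
# Illustration of Theorem 2.43(b): for a block-diagonal matrix, the
# characteristic polynomial is the product of the blocks' characteristic
# polynomials.  A1 and A2 are arbitrary sample blocks.
from sympy import Matrix, symbols, expand

X = symbols('X')

A1 = Matrix([[0, -1], [1, 0]])   # characteristic polynomial X**2 + 1
A2 = Matrix([[2, 1], [0, 2]])    # characteristic polynomial (X - 2)**2

# Block-diagonal matrix representing T on V = V_1 (+) V_2.
A = Matrix([[0, -1, 0, 0],
            [1,  0, 0, 0],
            [0,  0, 2, 1],
            [0,  0, 0, 2]])

cA = A.charpoly(X).as_expr()
product = expand(A1.charpoly(X).as_expr() * A2.charpoly(X).as_expr())

assert expand(cA - product) == 0
print(cA)   # X**4 - 4*X**3 + 5*X**2 - 4*X + 4
```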

EXERCISES FOR SECTION 2

(1) Let F ⊆ K be fields. Show that the map ψ: F[X] ⊗_F K → K[X] given by
    ψ(f(X) ⊗ k) = kf(X) is an isomorphism of K-algebras.

(2) Suppose ∑_{i=0}^m A_iX^i = ∑_{i=0}^m B_iX^i in M_{n×n}(F)[X] with A_i, B_i ∈ M_{n×n}(F) for
    all i = 0,..., m. Show A_i = B_i for all i.
(3) Let A ∈ M_{n×n}(F[X]). If det(A) ≠ 0, does it follow that A is invertible?
(4) If A = (a_{ij}) ∈ M_{n×n}(F) and c_A(X) = X^n + a_{n−1}X^{n−1} + ··· + a_0, show
    (i) a_{n−1} = −Tr(A).
    (ii) a_0 = (−1)^n det(A).
(5) Give an example of a vector space V and T ∈ L(V) such that V = V_1 ⊕ V_2
    with each V_i a T-invariant subspace and m_T(X) ≠ m_{T_1}(X)m_{T_2}(X). Here T_i
    denotes the restriction of T to V_i.
(6) Suppose T ∈ L(V) is nilpotent, that is, T^m = 0 for some m ≥ 1. Let
    f(X) = a_rX^r + ··· + a_0 be any polynomial in F[X] with a_0 ≠ 0. Show that
    f(T) is an invertible linear transformation on V.
(7) Find the characteristic and minimal polynomials of T: -÷ given by
= 663, T(62) = — 1 163, and 1(53) = + 663. Here ö =
63} is the canonical basis of R3.
(8) Find the characteristic and minimal polynomials of the n × n subdiagonal
    matrix

        [0 0 0 ··· 0 0]
        [1 0 0 ··· 0 0]
        [0 1 0 ··· 0 0]
        [⋮        ⋱   ]
        [0 0 0 ··· 1 0]

(9) Let T: R^4 → R^4 be given by T(x_1, x_2, x_3, x_4) = (x_1 − x_4, x_1,
    −2x_2 − x_3 − 4x_4, 4x_2 + x_3).
    (a) Compute c_T(X).
    (b) Compute m_T(X).
    (c) Show that R^4 is an internal direct sum of two proper T-invariant
        subspaces.
(10) Find the minimal polynomial of

—1 0 0 0 0
1 —1 0 0 0
0 0 —1 0 0
0 0 0 1 0
0 0 0 1 1

(11) Suppose W is a T-invariant subspace of V. Show that T induces a linear
     transformation T̄ ∈ L(V/W) given by T̄(α + W) = T(α) + W. What is the
     relationship between the minimal polynomials of T and T̄?
(12) Let A ∈ M_{n×n}(F). When computing the characteristic and minimal poly-
     nomials of A, is it permissible to first row reduce A to some simpler matrix
     and then make the desired computations? Explain.
(13) A matrix D = (d_{ij}) ∈ M_{n×n}(F) is diagonal if d_{ij} = 0 whenever i ≠ j. If D is a
     diagonal matrix, then we shall write D = diag(a_1,..., a_n), where a_i = d_{ii}
     for all i = 1,..., n. Compute c_D(X) and m_D(X) for any diagonal matrix D.
(14) Suppose A ∈ M_{n×n}(F) is a nonzero, nilpotent matrix. Thus, A^k = 0 for some
     k ≥ 2. Compute m_A(X) and c_A(X). Show that A cannot be similar to any
     diagonal matrix.
(15) Let A ∈ M_{n×n}(F). If the degree of m_A(X) is n, does it follow that A is similar
     to a diagonal matrix?
(16) Let A ∈ M_{n×n}(F). Show that A is singular if and only if zero is a root of
     c_A(X).
(17) Let A ∈ M_{n×n}(F) and suppose F̄ is an algebraic closure of F. Suppose
     c_A(X) = ∏_{i=1}^n (X − c_i) in F̄[X]. Here c_1,..., c_n (∈ F̄) are not necessarily
     distinct. Write the coefficients of c_A(X) as symmetric functions of c_1,..., c_n.
     The coefficients of c_A(X) are functions of c_1,..., c_n which lie in F.
(18) Use your answer from Exercise 17 to find a matrix A ∈ M_{2×2}(R) such that
     c_A(X) = X^2 + 2X + 5.
(19) Suppose A ∈ M_{n×n}(F) is a triangular matrix. This means a_{ij} = 0 whenever
     i > j (upper triangular) or a_{ij} = 0 whenever i < j (lower triangular). Show
     that if A = (a_{ij}) is triangular, then c_A(X) = ∏_{i=1}^n (X − a_{ii}).
(20) Let A ∈ M_{n×n}(F) and B ∈ M_{m×m}(F). Consider the Kronecker product A ⊗ B
     of A and B. Prove that if x ∈ R(c_A(X)) and y ∈ R(c_B(X)), then xy is a root of
     c_{A⊗B}(X).
(21) Prove Corollary 2.14 directly without using tensor products. [Hint: Use a
     basis of K over F and write the coefficients of m_{T^K}(X) in terms of this
     basis.]

3. EIGENVALUES AND EIGENVECTORS

As usual, let V be a finite-dimensional vector space over F of dimension n. Let
T ∈ L(V).

Definition 3.1: An element c ∈ F is called an eigenvalue of T if ker(T − c) ≠ 0.

Thus, c ∈ F is an eigenvalue of T if there exists a nonzero vector α ∈ V such
that T(α) = cα. Eigenvalues are also called characteristic values, but in this text,
we shall use the term "eigenvalue" exclusively. The complete set of eigenvalues
for T in F will be denoted Sp_F(T) and called the spectrum of T (in F). Thus,
c ∈ Sp_F(T) if and only if there exists a vector α ∈ V such that α ≠ 0 and T(α) = cα.
The spectrum of T depends on the field F.

Example 3.2: Let us return to Examples 2.6 and 2.7 of Section 2. Since
T: R^2 → R^2 represents a rotation through 90°, no nonzero vector is taken by T
into a multiple of itself. Consequently, Sp_R(T) = ∅. If we extend T to
T^C: C^2 → C^2, then both i and −i are eigenvalues of T^C. We have
T^C((1, −i)) = i(1, −i) and T^C((−1, −i)) = −i(−1, −i). We shall soon see that
T^C can have at most two distinct eigenvalues. Therefore, Sp_C(T^C) = {i, −i}. □
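A numerical check of Example 3.2 (added here as a sketch; NumPy is assumed to be available): the rotation-by-90° matrix has no real eigenvalues, but acquires the eigenvalues i and −i once complex scalars are allowed.

```python
# Example 3.2 numerically: the rotation through 90 degrees has no real
# eigenvalue; over C its eigenvalues are i and -i.
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])           # matrix of the rotation through 90 degrees

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                     # approximately [0.+1.j, 0.-1.j]
print(np.all(eigenvalues.imag != 0))   # True: no eigenvalue lies in R
```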

Let us gather together some of the more obvious facts about eigenvalues.

Theorem 3.3: Let T ∈ L(V), and suppose K is a field containing F. Then

(a) Sp_F(T) ⊆ Sp_K(T^K).
(b) Sp_F(T) = R(c_T) ∩ F, that is, the eigenvalues of T in F are precisely the
    roots of the characteristic polynomial of T that lie in F.
(c) Sp_F(T) = R(m_T) ∩ F.
(d) |Sp_F(T)| ≤ ∂(m_T(X)) ≤ n.

Proof: (a) Suppose c ∈ Sp_F(T). Then there exists a nonzero vector α ∈ V such that
(T − c)(α) = 0. Hence (T^K − c)(α ⊗ 1) = (T − c)(α) ⊗_F 1 = 0, and α ⊗ 1 ≠ 0. Thus, c ∈ Sp_K(T^K).
(b) Recall that R(c_T) is the set of roots (in F̄) of the characteristic
    polynomial c_T(X) of T. Let α be a basis of V, and set Γ(α, α)(T) = A.
    Now c ∈ Sp_F(T) if and only if ker(T − c) ≠ 0. From Theorem 3.33(b)
    of Chapter I, we know ker(T − c) ≠ 0 if and only if T − c is not an
    isomorphism on V. Since Γ(α, α)(·): L(V) → M_{n×n}(F) is an isomorph-
    ism of F-algebras, T − c is not an isomorphism if and only if
    A − c = A − cI_n is not invertible in M_{n×n}(F). This last statement is
    in turn equivalent to det(A − cI_n) = 0. Now c_T(c) = c_A(c) =
    det(cI_n − A) = (−1)^n det(A − cI_n). Thus, c ∈ Sp_F(T) if and only if c is a
    root of c_T(X) in F. Hence, Sp_F(T) = R(c_T) ∩ F.
(c) We have seen in Corollary 2.40 that R(m_T) = R(c_T). Hence, (c) follows
    from (b).
(d) From (c), we know Sp_F(T) ⊆ R(m_T). We have seen in Exercise 14 of
    Section 1 that |R(m_T)| ≤ ∂(m_T). The result now follows from Corol-
    lary 2.32. □

Let us make a few comments about Theorem 3.3. The inclusion in (a) could
very well be strict, as Example 3.2 shows. As we extend scalars, the extended
linear transformation T^K may pick up more eigenvalues because the character-
istic polynomial c_T(X) may have more linear factors in K[X] than it had in F[X].
This is precisely what is happening in Example 3.2. Over R, c_T(X) = X^2 + 1.
Since X^2 + 1 is irreducible in R[X], R(c_T) ∩ R = ∅. Thus Sp_R(T) = ∅. Over C,
c_{T^C}(X) = X^2 + 1 = (X + i)(X − i). Hence, R(c_{T^C}) ∩ C = {i, −i}. Therefore,
Sp_C(T^C) = {±i}.
Theorem 3.3(b) tells us exactly how to compute the eigenvalues of T in F.
Choose any matrix representation A of T, and compute the characteristic
polynomial c_A(X) = det(XI − A). Then find the roots of c_A(X) that lie in F.
These roots are precisely the eigenvalues of T lying in F. Of course, finding the
roots of c_A(X) that lie in F may be very difficult. If F = R, for instance, we can
use well-known techniques from numerical analysis to at least approximate the
real roots of c_A(X).
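As a small illustration of this recipe (not from the text), the SymPy sketch below computes c_A(X) = det(XI − A) for a hypothetical matrix over Q and reads off the roots together with their multiplicities.

```python
# The recipe of Theorem 3.3(b): compute c_A(X) and find its roots in F.
from sympy import Matrix, symbols, roots

X = symbols('X')
A = Matrix([[1, 1, 0],
            [0, 2, 0],
            [0, 0, 1]])            # a sample matrix over Q

cA = A.charpoly(X).as_expr()
print(cA)                          # X**3 - 4*X**2 + 5*X - 2
print(roots(cA, X))                # {1: 2, 2: 1}: eigenvalues 1 and 2
```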
Let us consider one more example before continuing.
Example 3.4: Let T: R^4 → R^4 be given by T(δ_1) = δ_1 − δ_2, T(δ_2) = 2δ_2,
T(δ_3) = δ_4, and T(δ_4) = δ_4 − δ_3. Here δ = {δ_1, δ_2, δ_3, δ_4} is the canonical basis
of R^4. The matrix representation of T is given by

    Γ(δ, δ)(T) = [ 1 0 0  0]
                 [−1 2 0  0]
                 [ 0 0 0 −1]
                 [ 0 0 1  1]

Thus, R^4 = L({δ_1, δ_2}) ⊕ L({δ_3, δ_4}) is a direct sum decomposition of R^4
in terms of T-invariant subspaces. Theorem 2.43 implies c_T(X) =
(X − 1)(X − 2)(X^2 − X + 1). Thus, R(c_T(X)) = {1, 2, (1 ± i√3)/2} ⊆ C. If we
now apply Theorem 3.3, we have Sp_R(T) = {1, 2} and Sp_C(T^C) =
{1, 2, (1 ± i√3)/2}. □

Definition 3.5: Let A ∈ M_{n×n}(F). An element c ∈ F is called an eigenvalue of A if
det(A − cI_n) = 0.

Clearly, c is an eigenvalue of A if and only if there exists a nonzero column
vector Y ∈ M_{n×1}(F) such that AY = cY. The set of eigenvalues of A (in F) will be
denoted by Sp_F(A). If T ∈ L(V), α is a basis of V, and A = Γ(α, α)(T), then the proof
of 3.3(b) implies that Sp_F(T) = Sp_F(A). In particular, Theorem 3.3 remains valid
with T replaced by A. We can therefore switch back and forth between T and its
matrix representation A when computing eigenvalues. One corollary of this
interplay is that similar matrices have the same set of eigenvalues. That is
because similar matrices represent the same linear transformation on V.

Definition 3.6: A nonzero vector α ∈ V is called an eigenvector of T belonging to
the eigenvalue c if T(α) = cα.

Thus, if c ∈ Sp_F(T), then any nonzero vector in ker(T − c) is an eigenvector
belonging to c. We emphasize that eigenvectors are always nonzero. Let us look
at some examples.

Example 3.7: Suppose T = 0. Then every nonzero vector in V is an eigenvector
for T belonging to 0. If T = 1_V, then every nonzero vector in V is an eigenvector
for T belonging to 1. □

Example 3.8: In Example 2.6, Sp_R(T) = ∅. Hence, T has no eigenvectors. In
Example 2.7, T^C has (1, −i) belonging to i and (−1, −i) belonging to −i. Thus,
as the reader should expect, when we enlarge the base field F by extending
scalars, a given T may acquire new eigenvectors not present over F. □

Example 3.9: Let T: R^4 → R^4 be the transformation in Example 3.4. (1, 1, 0, 0) is
an eigenvector belonging to 1, and (0, 1, 0, 0) is an eigenvector belonging to
2. □

Eigenvectors are also called characteristic vectors. In this text, we shall not
use this name. The relation between eigenvectors belonging to different
eigenvalues is one of linear independence.

Theorem 3.10: Suppose c_1,..., c_r are distinct eigenvalues in Sp_F(T). Let α_i be an
eigenvector of T belonging to c_i for each i = 1,..., r. Then A = {α_1,..., α_r} is
linearly independent over F.

Proof: Suppose A is linearly dependent over F. Then among all nontrivial
relations of the form a_1α_1 + ··· + a_rα_r = 0, choose a relation in which the
fewest number of the α_i occur. After relabeling the α_i if need be, we can assume our
relation of minimal length is of the form

3.11:  a_1α_1 + ··· + a_sα_s = 0

In equation 3.11, s ≤ r, and each a_i is a nonzero scalar in F. Multiplying 3.11 by
c_1 gives

3.12:  c_1a_1α_1 + ··· + c_1a_sα_s = 0

Applying T to 3.11 gives

3.13:  c_1a_1α_1 + ··· + c_sa_sα_s = 0

If we now subtract 3.12 from 3.13, we produce a nontrivial relation among the α_i
that has fewer terms in it than 3.11. This is a contradiction. Thus, A is linearly
independent over F. □

There are several interesting corollaries to Theorem 3.10. In the first place,
|Sp_F(T)| ≤ dim V since eigenvectors belonging to distinct eigenvalues must be
linearly independent. We do not list this fact as a corollary since Theorem 3.3(d)
is an even sharper result. Theorem 3.10 gives us sufficient conditions for
representing T as a diagonal matrix.

Corollary 3.14: Suppose T ∈ L(V) is such that |Sp_F(T)| = dim V. Then there exists a
basis α of V such that Γ(α, α)(T) is a diagonal matrix.

Thus, if dim V = n, and T has n distinct eigenvalues in F, then T can be
represented by a diagonal matrix.

Proof of 3.14: Let Sp_F(T) = {c_1,..., c_n}, where n = dim V. For each i = 1,..., n,
let α_i be an eigenvector belonging to c_i. Then α = {α_1,..., α_n} is a basis of V by
Theorem 3.10. Clearly, Γ(α, α)(T) = diag(c_1,..., c_n). □

Here and throughout the rest of the text, we shall let diag(c_1,..., c_n) denote
the n × n diagonal matrix

3.15:  [c_1      0 ]
       [    ⋱      ]
       [0       c_n]

We note in passing that the converse of Corollary 3.14 is false. Namely, if
some matrix representation of T is diagonal, we cannot conclude that T has n
distinct eigenvalues in F. For example, T = 1_V is represented by the matrix I_n
relative to any basis of V, but T has only one eigenvalue, 1.
A slightly different version of Corollary 3.14 is worth recording here.
Corollary 3.16: Let F be an algebraically closed field and let A ∈ M_{n×n}(F).
Suppose c_A(X) has no repeated roots. Then A is similar to a diagonal matrix.

Proof: Since F is algebraically closed, c_A(X) = ∏_{i=1}^r (X − c_i)^{n_i} in F[X]. Here
c_1,..., c_r are the distinct roots of c_A(X), and we must have n_1 + ··· + n_r = n. Now the
statement that c_A(X) has no repeated roots means each n_i = 1. In particular,
r = n, and R(c_A) = {c_1,..., c_n}. Thus, Theorem 3.3(b) implies Sp_F(A) = {c_1,..., c_n}.
Suppose A = (a_{ij}). Let δ = {δ_1,..., δ_n} be the canonical basis of F^n, and
define a linear transformation T: F^n → F^n by T(δ_i) = Col_i(A)^t. Then
Γ(δ, δ)(T) = A.
We have noted that Sp_F(T) = Sp_F(A) = {c_1,..., c_n}. Consequently, Corollary
3.14 implies there exists a basis α of F^n such that Γ(α, α)(T) = diag(c_1,..., c_n).
Since Γ(δ, δ)(T) is similar to Γ(α, α)(T) (Theorem 3.28 of Chapter I), we conclude
A is similar to diag(c_1,..., c_n). □

Clearly, a given T ∈ L(V) can be represented by a diagonal matrix if and only
if V has a basis consisting of eigenvectors of T. Corollaries 3.14 and 3.16 give us
sufficient conditions for such a basis of eigenvectors to exist. If T has enough
distinct eigenvalues (i.e., dim V of them), then T can be represented as a diagonal matrix.
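The following sketch (an addition; the matrix is a made-up example with three distinct rational eigenvalues) uses SymPy's diagonalize to exhibit the basis of eigenvectors promised by Corollary 3.14 and the similarity of Corollary 3.16.

```python
# Corollaries 3.14/3.16: n distinct eigenvalues in F give a diagonal
# representation A = P * D * P**(-1).
from sympy import Matrix

A = Matrix([[1, 1, 0],
            [0, 2, 1],
            [0, 0, 3]])            # eigenvalues 1, 2, 3 -- all distinct

P, D = A.diagonalize()             # columns of P are eigenvectors of A
print(D)                           # Matrix([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
assert P * D * P.inv() == A
```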

Example 3.17: Consider the linear transformation T^C: C^4 → C^4 derived from T
in Example 3.4. Since Sp_C(T^C) = {1, 2, (1 ± i√3)/2}, we conclude from Corollary
3.14 that there exists a basis α of C^4 such that

3.18:  Γ(α, α)(T^C) = [1 0 0          0         ]
                      [0 2 0          0         ]
                      [0 0 (1+i√3)/2  0         ]
                      [0 0 0          (1−i√3)/2 ]

We can also conclude from Corollary 3.16 that

    A = [ 1 0 0  0]
        [−1 2 0  0]
        [ 0 0 0 −1]
        [ 0 0 1  1]

is similar in M_{4×4}(C) to the diagonal matrix given in 3.18.
Over R, T has no diagonal representation. To see this, suppose there
exists a basis α of R^4 such that Γ(α, α)(T) = diag(a_1, a_2, a_3, a_4). Then
c_T(X) = ∏_{i=1}^4 (X − a_i) ∈ R[X]. Since c_T(X) = (X − 1)(X − 2)(X^2 − X + 1), we
can conclude X^2 − X + 1 factors in R[X]. This is impossible. Thus, A is not
similar in M_{4×4}(R) to any diagonal matrix. □

In Example 3.17, we gave an example of a linear transformation T that
cannot be diagonalized, that is, there exists no basis α of R^4 such that Γ(α, α)(T) is
diagonal. The corresponding matrix statement is that A is not similar to any
diagonal matrix in M_{4×4}(R). The example works because the roots of the
characteristic polynomial of T (or A) do not all lie in the base field R.
Suppose T ∈ L(V) is an arbitrary linear transformation. If there exists a
basis α of V such that Γ(α, α)(T) = diag(a_1,..., a_n), then clearly R(c_T(X)) =
{a_1,..., a_n} ⊆ F. Similarly, if A ∈ M_{n×n}(F) is similar to a diagonal matrix, then
R(c_A(X)) ⊆ F. We can ask about the converse of these statements. If R(c_T) ⊆ F, is
T diagonalizable? If R(c_A) ⊆ F, is A similar to a diagonal matrix in M_{n×n}(F)?
The answer to both of these questions is easily seen to be no. The simplest place
to look for examples is the collection of nilpotent linear transformations.

Definition 3.19: A linear transformation T ∈ L(V) is said to be nilpotent of index
k if T^k = 0 and T^{k−1} ≠ 0. Similarly, a matrix A ∈ M_{n×n}(F) is nilpotent of index k
if A^k = 0 and A^{k−1} ≠ 0.

Suppose α is a basis of V, and A = Γ(α, α)(T). Since Γ(α, α)(·): L(V) → M_{n×n}(F)
is an isomorphism of F-algebras, clearly T is nilpotent of index k if and only if A
is nilpotent of index k.

Example 3.20: Let

    N_k = [0 0 0 ··· 0 0]
          [1 0 0 ··· 0 0]
          [0 1 0 ··· 0 0]
          [⋮        ⋱   ]
          [0 0 0 ··· 1 0]   ∈ M_{k×k}(F)

An easy computation shows N_k is nilpotent of index k. □
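A quick numerical confirmation of Example 3.20 (an added sketch; the helper name subdiagonal_nilpotent is ours): N_k satisfies N_k^{k−1} ≠ 0 but N_k^k = 0.

```python
# Example 3.20: N_k (ones on the subdiagonal) is nilpotent of index exactly k.
import numpy as np

def subdiagonal_nilpotent(k):
    """Return the k x k matrix with ones on the subdiagonal, zeros elsewhere."""
    return np.eye(k, k, k=-1)

k = 5
N = subdiagonal_nilpotent(k)
print(np.any(np.linalg.matrix_power(N, k - 1)))   # True:  N**(k-1) != 0
print(np.any(np.linalg.matrix_power(N, k)))       # False: N**k == 0
```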
Suppose T ∈ L(V) is nilpotent of index k. It follows readily from Exercise 6 of
Section 2 that the minimal polynomial m_T(X) must be a power of X alone. Since
T is nilpotent of index k, m_T(X) = X^k. In particular, Theorem 3.3(c) implies
Sp_F(T) = R(m_T) = {0}. Thus, a nilpotent transformation or matrix has only one
eigenvalue, 0. Note that m_T(X) = X^k implies k ≤ n. The maximum index of
nilpotency of any nilpotent transformation T cannot exceed n = dim(V). If
a nilpotent T is not zero, then T cannot be diagonalized. For suppose Γ(α, α)(T) =
diag(a_1,..., a_n) for some basis α of V. Then 0 = Γ(α, α)(T^k) = [Γ(α, α)(T)]^k
= [diag(a_1,..., a_n)]^k = diag(a_1^k,..., a_n^k). But then a_1 = ··· = a_n = 0 and T = 0,
which is impossible. Similar reasoning shows that a nonzero nilpotent matrix
cannot be similar to a diagonal matrix.
Nilpotent linear transformations are the fundamental ingredients in the
Jordan canonical form. We shall finish this section with a representation
theorem for nilpotent transformations.

Theorem 3.21: Suppose T ∈ L(V) is a nilpotent linear transformation of index of
nilpotency k ≥ 2. Then there exist integers k = k_1 ≥ k_2 ≥ ··· ≥ k_p ≥ 1 with
k_1 + ··· + k_p = n and a basis α of V such that

    Γ(α, α)(T) = diag(N_{k_1},..., N_{k_p})

Let us say a few words about Theorem 3.21 before proceeding with its proof.
The supposition that k ≥ 2 is solely to avoid the trivial case T = 0. If T is
nilpotent and not zero, then clearly the index of nilpotency of T is some positive
integer between 2 and n. The notation for the N_{k_i} is given in Example 3.20. Thus,
each N_{k_i} is a k_i × k_i matrix having ones on its subdiagonal and zeros everywhere
else. Note then that Γ(α, α)(T) is the next best thing to being diagonal. Γ(α, α)(T) is
a subdiagonal matrix with only zeros and ones appearing on its subdiagonal.

Proof of 3.21: Set k_1 = k. Since T is nilpotent of index k_1, we know T^{k_1} = 0, and
T^{k_1−1} ≠ 0. Hence, there exists a vector α ∈ V such that T^{k_1−1}(α) ≠ 0. We first
claim that A = {α, T(α),..., T^{k_1−1}(α)} is linearly independent over F. Suppose
these vectors are linearly dependent over F. Then we have

3.22:  c_1α + c_2T(α) + ··· + c_{k_1}T^{k_1−1}(α) = 0

In this equation, c_1,..., c_{k_1} ∈ F and are not all zero. Suppose c_s is the first
nonzero scalar in 3.22. Since T^{k_1−1}(α) ≠ 0, s < k_1. We can then rewrite 3.22 as

3.23:  [(c_s + c_{s+1}T + ··· + c_{k_1}T^{k_1−s})(T^{s−1})](α) = 0

Since c_s ≠ 0 and T is nilpotent, c_s + c_{s+1}T + ··· + c_{k_1}T^{k_1−s} is invertible (see
Exercise 6 of Section 2). Thus, equation 3.23 implies T^{s−1}(α) = 0. But then
T^{k_1−1}(α) = 0, which is a contradiction.
Now let Z_1 = L({α, T(α),..., T^{k_1−1}(α)}). Since A is linearly independent over
F, dim_F(Z_1) = k_1. Also note that Z_1 is a T-invariant subspace of V. If we let T_1
denote the restriction of T to Z_1, then Γ(A, A)(T_1) = N_{k_1}.
Suppose we can find a T-invariant complement W of Z_1. So, V = Z_1 ⊕ W,
and T(W) ⊆ W. Let T' denote the restriction of T to W. T' is clearly nilpotent
since T is. The index of nilpotency k_2 of T' is less than or equal to k_1, the index of
nilpotency of T on V. Since dim_F(W) < dim_F(V), we can assume (via induction
on the dimension of V) that there exists a basis A' of W such that

3.24:  Γ(A', A')(T') = diag(N_{k_2},..., N_{k_p})

with k_2 ≥ ··· ≥ k_p and k_2 + ··· + k_p = dim W. It then follows from 2.42 that
α = A ∪ A' is the required basis for V. Hence the proof of Theorem 3.21 will be
complete when we argue Z_1 has a T-invariant complement.
Before constructing a complement of Z_1, we need the following technical
result:

3.25: If β ∈ Z_1 is such that T^{k_1−s}(β) = 0 where 0 < s ≤ k_1, then β = T^s(β_0) for
some β_0 ∈ Z_1.

The proof of 3.25 is as follows: Since β ∈ Z_1, β = c_1α + c_2T(α) + ···
+ c_{k_1}T^{k_1−1}(α). Thus, 0 = T^{k_1−s}(β) = c_1T^{k_1−s}(α) + ··· + c_sT^{k_1−1}(α). Since A is
linearly independent over F, we conclude that c_1 = ··· = c_s = 0. Therefore,
β = c_{s+1}T^s(α) + ··· + c_{k_1}T^{k_1−1}(α) = T^s(c_{s+1}α + ··· + c_{k_1}T^{k_1−s−1}(α)). Since
c_{s+1}α + ··· + c_{k_1}T^{k_1−s−1}(α) ∈ Z_1, the proof of 3.25 is complete.
We now claim there exists a T-invariant subspace W of V such that
Z_1 ⊕ W = V. To construct W, consider the following set 𝒮 = {W' ⊆ V | W' is a
subspace of V, T(W') ⊆ W', and Z_1 ∩ W' = (0)}. Since the subspace (0) ∈ 𝒮,
𝒮 ≠ ∅. We can partially order the subspaces in 𝒮 by inclusion ⊆. If 𝒞 is a
totally ordered subset of 𝒮, then clearly 𝒞 has an upper bound W in 𝒮,
namely W = ∪_{W'∈𝒞} W'. Hence, (𝒮, ⊆) is an inductive set. We can apply
Zorn's lemma (Z2 of Chapter I) and choose a maximal subspace W in 𝒮. W is
clearly T-invariant, and W ∩ Z_1 = (0). We must argue Z_1 + W = V.
We shall suppose Z_1 + W ≠ V and derive a contradiction. If Z_1 + W ≠ V,
then there exists a β ∈ V − (Z_1 + W). Since T^{k_1} = 0, there exists an integer u such
that 0 < u ≤ k_1, T^u(β) ∈ Z_1 + W, and T^i(β) ∉ Z_1 + W for any i < u. Let us write
T^u(β) = γ + δ with γ ∈ Z_1 and δ ∈ W. Then
T^{k_1−u}(γ) + T^{k_1−u}(δ) = T^{k_1}(β) = 0. These two vectors are in Z_1 and W, respectively,
since Z_1 and W are T-invariant. Since Z_1 ∩ W = (0), we conclude that
T^{k_1−u}(γ) = 0 and T^{k_1−u}(δ) = 0. We can now apply 3.25 to the vector γ. We
get γ = T^u(γ_0) for some γ_0 ∈ Z_1. Therefore, T^u(β) = γ + δ = T^u(γ_0) + δ.
Set β_1 = β − γ_0. Then T^u(β_1) = T^u(β) − T^u(γ_0) = δ ∈ W. Since W is T-invariant,
we can now conclude that T^m(β_1) ∈ W for any m ≥ u. On the other
hand, if i < u, then T^i(β_1) = T^i(β) − T^i(γ_0) ∉ Z_1 + W. [γ_0 ∈ Z_1 implies
T^i(γ_0) ∈ Z_1. If T^i(β) − T^i(γ_0) ∈ Z_1 + W, then T^i(β) ∈ Z_1 + W. Since i < u this is
impossible.] We have now proved the following relations concerning β_1:

3.26:  T^m(β_1) ∈ W for all m ≥ u
       T^i(β_1) ∉ Z_1 + W for all i < u

Now set W_1 = W + L({β_1, T(β_1),..., T^{u−1}(β_1)}). Since u > 0,
T^0(β_1) = β_1 ∉ W by 3.26. Thus, W_1 properly contains W. Also, it is clear using
3.26 again that W_1 is T-invariant. Since W is a maximal element of 𝒮, W_1 ∉ 𝒮.
Therefore, W_1 ∩ Z_1 ≠ (0). Let β_0 + c_1β_1 + c_2T(β_1) + ··· + c_uT^{u−1}(β_1) be a
nonzero vector in W_1 ∩ Z_1. Here β_0 ∈ W, and c_1,..., c_u ∈ F. Some constant c_i
must be nonzero here. Otherwise, we would have 0 ≠ β_0 ∈ W ∩ Z_1 = (0).
Suppose c_s is the first nonzero constant among the c_i. Thus, 1 ≤ s ≤ u, and
β_0 + c_sT^{s−1}(β_1) + ··· + c_uT^{u−1}(β_1) ∈ Z_1.
We can rewrite this last inclusion relation as follows:

3.27:  0 ≠ β_0 + (c_s + c_{s+1}T + ··· + c_uT^{u−s})(T^{s−1}(β_1)) ∈ Z_1

Since c_s ≠ 0 and T is nilpotent, c_s + c_{s+1}T + ··· + c_uT^{u−s} is invertible. It also
follows from Exercise 6 of Section 2 that the inverse R of this map is a
polynomial in T. In particular, W and Z_1 are invariant under R. If we apply R to
equation 3.27, we get 0 ≠ R(β_0) + T^{s−1}(β_1) ∈ Z_1. Since β_0 ∈ W, R(β_0) ∈ W. Thus,
T^{s−1}(β_1) ∈ Z_1 + W. But s − 1 < u, and we have a contradiction with 3.26. We
conclude that Z_1 + W = V, and the proof of Theorem 3.21 is now
complete. □

The integers k_1 ≥ k_2 ≥ ··· ≥ k_p appearing in Theorem 3.21 are called the
invariants of T. We shall soon see that they are unique. The subspace Z_1
appearing in the proof of Theorem 3.21 is called a T-cyclic subspace of V. Let us
formally define T-cyclic subspaces in our present context.

Definition 3.28: Let T ∈ L(V) be nilpotent. A T-invariant subspace Z of V is
called a T-cyclic subspace if Z has a basis of the form A = {α, T(α),..., T^{k−1}(α)}
for some α ∈ Z and some k ≥ 1.

Lemma 3.29: Let T ∈ L(V) be nilpotent. Suppose Z is a T-cyclic subspace of V
with basis A = {α, T(α),..., T^{k−1}(α)}. Then

(a) T^k(Z) = 0.
(b) If T_1 denotes the restriction of T to Z, then Γ(A, A)(T_1) = N_k.
(c) dim T^i(Z) = k − i for 0 ≤ i ≤ k.

Proof: (a) It is clearly enough to show T^k(α) = 0. By definition, Z is T-invari-
ant. Therefore T^k(α) = c_1α + c_2T(α) + ··· + c_kT^{k−1}(α) for some
c_1,..., c_k ∈ F. In particular, (c_1 + c_2T + ··· + c_kT^{k−1} − T^k)(α) = 0.
If c_1 ≠ 0, then c_1 + c_2T + ··· + c_kT^{k−1} − T^k is invertible since T is
nilpotent. But then α = 0. This is impossible since A is a basis of Z.
Hence c_1 = 0. We now have (c_2 + c_3T + ··· + c_kT^{k−2} − T^{k−1})(T(α)) = 0.
The same argument shows that if c_2 ≠ 0, then T(α) = 0. Again this is
impossible since A is a basis of Z. Continuing the argument, we
conclude c_1 = ··· = c_k = 0. Hence, T^k(α) = 0.
(b) This assertion is obvious from (a).
(c) If i = 0 or k, then statement (c) is obvious. Suppose 1 ≤ i ≤ k − 1.
    Since A = {α, T(α),..., T^{k−1}(α)} is a basis of Z, T^i(Z) is spanned by
    T^i(α), T^{i+1}(α),..., T^{i+k−1}(α). But by (a) only the first k − i of these
    vectors are nonzero. Thus T^i(Z) = L({T^i(α),..., T^{k−1}(α)}). Since
    {T^i(α),..., T^{k−1}(α)} ⊆ A, these vectors are linearly independent over
    F. Hence, {T^i(α),..., T^{k−1}(α)} is a basis of T^i(Z). This proves (c). □

We can rephrase Theorem 3.21 in terms of T-cyclic subspaces as follows:

Theorem 3.30: Suppose T ∈ L(V) is a nilpotent linear transformation of index of
nilpotency k ≥ 2. Then there exist integers k = k_1 ≥ k_2 ≥ ··· ≥ k_p and T-cyclic
subspaces Z_1,..., Z_p of dimensions k_1,..., k_p, respectively, such that
V = Z_1 ⊕ ··· ⊕ Z_p.

Proof: Construct Z_1 and W as in the proof of 3.21. Then Z_1 is a T-cyclic
subspace of V of dimension k = k_1. Let T_1 be the restriction of T to (the T-
invariant) subspace W. Since T^{k_1} = 0, T_1^{k_1} = 0. Thus, if k_2 is the index of
nilpotency of T_1 on W, then k_2 ≤ k_1. Proceeding as in the proof of 3.21 with T_1,
we can construct a T_1-cyclic subspace Z_2 of W such that dim Z_2 = k_2. Clearly,
Z_2 is a T-cyclic subspace of V of dimension k_2. Again by 3.21, Z_2 has a T-
invariant complement W' in W. Restrict T to W' and start again. In a finite
number of steps, we construct T-cyclic subspaces Z_1,..., Z_p such that
V = Z_1 ⊕ ··· ⊕ Z_p. □

Now let us turn our attention to the uniqueness of the integers k_1 ≥ ··· ≥ k_p
appearing in Theorem 3.30.

Theorem 3.31: Let T ∈ L(V) be nilpotent of index k ≥ 2. Suppose
V = Z_1 ⊕ ··· ⊕ Z_p with each Z_i a T-cyclic subspace of dimension k_i, and
k = k_1 ≥ k_2 ≥ ··· ≥ k_p. Suppose V = U_1 ⊕ ··· ⊕ U_s with each U_i a T-cyclic
subspace of dimension l_i, and l_1 ≥ ··· ≥ l_s. Then p = s and k_1 = l_1,..., k_p = l_p.

Proof: Suppose the assertion in the theorem is false. Then there is a first integer
i ≥ 1 where k_i ≠ l_i. We can assume with no loss of generality that l_i < k_i. Then
k_1 ≥ k_2 ≥ ··· ≥ k_i, l_1 ≥ ··· ≥ l_i, k_1 = l_1,..., k_{i−1} = l_{i−1}, and l_i < k_i. If we
apply T^{l_i} to V = Z_1 ⊕ ··· ⊕ Z_p, we get T^{l_i}(V) = T^{l_i}(Z_1) ⊕ ··· ⊕ T^{l_i}(Z_i)
⊕ ··· ⊕ T^{l_i}(Z_p). From Lemma 3.29(c), we have dim T^{l_i}(Z_1) = k_1 − l_i,...,
dim T^{l_i}(Z_i) = k_i − l_i. In particular,

3.32:  dim T^{l_i}(V) ≥ (k_1 − l_i) + ··· + (k_i − l_i)

On the other hand, V = U_1 ⊕ ··· ⊕ U_s, and Lemma 3.29(c)
implies T^{l_i}(U_j) = 0 whenever j ≥ i. Thus, T^{l_i}(V) = T^{l_i}(U_1) ⊕ ··· ⊕ T^{l_i}(U_{i−1}). In
particular,

3.33:  dim T^{l_i}(V) = (l_1 − l_i) + ··· + (l_{i−1} − l_i) = (k_1 − l_i) + ··· + (k_{i−1} − l_i)

Comparing the inequalities in 3.32 and 3.33 gives us k_i − l_i ≤ 0. Thus, k_i ≤ l_i,
which is impossible. □

We have now shown that the invariants k_1 ≥ ··· ≥ k_p of a nilpotent linear
transformation T are unique. We can extend the definition of invariants to
nilpotent matrices in the obvious way. Suppose A ∈ M_{n×n}(F) is nilpotent of index
k. Then A defines a linear transformation T: F^n → F^n, where T(δ_i) = Col_i(A)^t,
i = 1,..., n. Here δ = {δ_1,..., δ_n} is as usual the canonical basis of F^n. Clearly
Γ(δ, δ)(T) = A. Since Γ(δ, δ)(·): L(F^n) → M_{n×n}(F) is an algebra isomorphism, T is
nilpotent of index k. We define the invariants of A to be those of T. Thus, if
k_1 ≥ ··· ≥ k_p are the invariants of A, then by definition k_1 ≥ ··· ≥ k_p are the
invariants of T, where A = Γ(δ, δ)(T). Since Γ(δ, δ)(·): L(F^n) → M_{n×n}(F) is an
isomorphism, this definition makes perfectly good sense. Note that if A has
invariants k_1 ≥ ··· ≥ k_p, then Theorem 3.21 implies A is similar to

    diag(N_{k_1},..., N_{k_p})

We can now state the following Corollary to Theorem 3.31:

Corollary 3.34: Two nonzero nilpotent matrices A, B ∈ M_{n×n}(F) are similar if
and only if they have the same invariants.

Proof: Let k_1 ≥ ··· ≥ k_p and l_1 ≥ ··· ≥ l_s be the invariants of A and B, respectively.
Then our comments above imply that A is similar to

    diag(N_{k_1},..., N_{k_p})

and B is similar to

    diag(N_{l_1},..., N_{l_s})

Similarity is clearly an equivalence relation on M_{n×n}(F). If A and B have
the same invariants, then these two block matrices are equal. Thus, A and B are
similar. Conversely, suppose A and B are similar. Then

    diag(N_{k_1},..., N_{k_p})   and   diag(N_{l_1},..., N_{l_s})

are similar. These two matrices then describe the same linear transformation
T: F^n → F^n relative to two different bases, say α and α' of F^n (see Theorem 3.28 of
Chapter I). If

    Γ(α, α)(T) = diag(N_{k_1},..., N_{k_p})

then F^n = Z_1 ⊕ ··· ⊕ Z_p with Z_i a T-cyclic subspace of F^n of dimension k_i.
Similarly,

    Γ(α', α')(T) = diag(N_{l_1},..., N_{l_s})

implies F^n = U_1 ⊕ ··· ⊕ U_s with U_i a T-cyclic subspace of F^n of dimension l_i.
Theorem 3.31 now implies p = s and k_1 = l_1,..., k_p = l_p. Thus, the invariants
of A and B are the same. □

There is another application of Theorem 3.30 that is worth mentioning here.
Suppose A is a nonzero nilpotent matrix in M_{n×n}(F). Then there exists an
invertible matrix P such that

3.35:  PAP^{−1} = diag(N_{k_1},..., N_{k_p})

In 3.35, k_1 ≥ ··· ≥ k_p are the invariants of A. It is often important to compute P.
We can find an invertible matrix P satisfying 3.35 by paying careful attention to
the proof of Theorem 3.30. Define T: F^n → F^n by T(δ_i) = Col_i(A)^t as usual.
Construct the T-cyclic decomposition of F^n, that is, F^n = Z_1 ⊕ ··· ⊕ Z_p, given in
Theorem 3.30. If Z_1 = L({α_1, T(α_1),..., T^{k_1−1}(α_1)}),..., Z_p =
L({α_p, T(α_p),..., T^{k_p−1}(α_p)}), then α = {α_1, T(α_1),..., T^{k_1−1}(α_1),..., α_p,
T(α_p),..., T^{k_p−1}(α_p)} is a basis of F^n for which

    Γ(α, α)(T) = diag(N_{k_1},..., N_{k_p})

We now have the following equation (Theorem 3.28 of Chapter I):

3.36:  M(δ, α)Γ(δ, δ)(T)M(δ, α)^{−1} = Γ(α, α)(T)

Recall that M(δ, α) is the change of basis matrix from α to δ.
Hence, if P = M(δ, α), then 3.36 becomes 3.35. Since
M(δ, α)^{−1} = M(α, δ), and M(α, δ) is the easier matrix to compute, to
compute P, construct and invert M(α, δ).

Example 3.37: Suppose

    A = [0 0 0]
        [1 0 0]   ∈ M_{3×3}(Q).
        [1 1 0]

Then

    A^2 = [0 0 0]
          [0 0 0]
          [1 0 0]

and A^3 = 0. Thus, A is nilpotent of index 3. The matrix A defines a linear
transformation T: Q^3 → Q^3 given by T(δ_1) = δ_2 + δ_3, T(δ_2) = δ_3, and T(δ_3) = 0.
In particular, T^2(δ_1) = δ_3. Thus, Z_1 in Theorem 3.30 is given by Z_1 =
L({δ_1, T(δ_1) = δ_2 + δ_3, T^2(δ_1) = δ_3}). Hence k_1 = 3, α = {δ_1, δ_2 + δ_3, δ_3}, and

    Γ(α, α)(T) = [0 0 0]
                 [1 0 0]
                 [0 1 0]

    M(α, δ) = [1 0 0]                        [1  0 0]
              [0 1 0]    and    P = M(δ, α) = [0  1 0]
              [0 1 1]                        [0 −1 1]

The reader can easily check that PAP^{−1} = N_{k_1} = N_3. □
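The change of basis in Example 3.37 can be verified with exact arithmetic; the short SymPy check below (an addition to the text) reproduces P and confirms PAP^{−1} = N_3.

```python
# Verification of Example 3.37: P = M(delta, alpha) conjugates A to N_3.
from sympy import Matrix

A = Matrix([[0, 0, 0],
            [1, 0, 0],
            [1, 1, 0]])

# Columns of M(alpha, delta) are the basis vectors delta_1, delta_2 + delta_3,
# delta_3 expressed in the canonical basis.
M_alpha_delta = Matrix([[1, 0, 0],
                        [0, 1, 0],
                        [0, 1, 1]])
P = M_alpha_delta.inv()            # P = M(delta, alpha)

N3 = Matrix([[0, 0, 0],
             [1, 0, 0],
             [0, 1, 0]])

assert P * A * P.inv() == N3
print(P)    # Matrix([[1, 0, 0], [0, 1, 0], [0, -1, 1]])
```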


Finally, let us say a few words about how many similarity classes of nilpotent
matrices there are in > By Corollary 3.34, the number of similarity
classes of nilpotent n x n matrices is the number of partitions of n,
n = k1 + + with k1 k,. Let us denote this number by 2P(n). The
function G°(n) has been studied intensely by combinatorialists. The value of Y(n)
forn= 1,P1(2)=2,PJ'(3)=3,cTA(4)=5,Y(5)=7,
and Y(6) = 11.
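The values of 𝒫(n) quoted above are easy to recompute; the following short Python routine (an added sketch, not from the text) counts partitions recursively.

```python
# Number of similarity classes of nilpotent n x n matrices = number of
# partitions P(n) of n (Theorem 3.21 and Corollary 3.34).
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, largest=None):
    """Count partitions of n into parts of size at most `largest`."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(partitions(n - part, part)
               for part in range(1, min(n, largest) + 1))

print([partitions(n) for n in range(1, 7)])   # [1, 2, 3, 5, 7, 11]
```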

EXERCISES FOR SECTION 3

(1) Find all the eigenvectors of the map T^C given in Example 3.4.
(2) Suppose A is a lower triangular matrix of the form

        [a_{11}  0    ···  0    ]
    A = [a_{21} a_{22} ··· 0    ]
        [ ⋮             ⋱       ]
        [a_{n1} a_{n2} ··· a_{nn}]

    Compute the characteristic polynomial of A and Sp_F(A).
(3) Let T ∈ L(V) and g(X) ∈ F[X]. If c is an eigenvalue of T in F, show that g(c) is
    an eigenvalue of g(T).
(4) Let

        [18  −9 −6]
    A = [17  −9 −5]
        [25 −12 −9]

    Show that A is nilpotent. Find an invertible matrix P such that PAP^{−1} has the subdiagonal
    form given in Theorem 3.21.

(5) Let A ∈ M_{n×n}(F) such that R(c_A(X)) ⊆ F. Show that A is similar to a lower
    triangular matrix.
(6) Let T ∈ L(V). Suppose R(c_T) = {0}. Show that T must be nilpotent.
(7) Find up to similarity all possible nilpotent matrices in M_{6×6}(F).
(8) Find all eigenvalues and eigenvectors for

    [2 1]
    [1 2]

(9) Let T ∈ L(V), and suppose c_T(X) = ∏_{i=1}^r (X − c_i)^{n_i} in F[X]. Show that V
    has a basis of eigenvectors of T if and only if dim(ker(T − c_i)) = n_i for all
    i = 1,..., r.
(10) Let A ∈ M_{n×n}(F) be a diagonal matrix. Suppose c_A(X) = ∏_{i=1}^r (X − c_i)^{n_i} in
     F[X]. Set W_i = ker(A − c_i). Show that dim(W_i) = n_i.
(11) Let T be a projection of V onto a subspace W. Compute the spectrum
     of T. Do the same if T is an involution (T^2 = 1_V).
(12) Compute the spectrum of D: V_n → V_n (notation as in Exercise 18, Section 2
     of Chapter II).
(13) Let A ∈ M_{n×n}(F) and B ∈ M_{m×m}(F). Show that Sp_F(A ⊗ B) = {xy | x ∈ Sp_F(A)
     and y ∈ Sp_F(B)}.
(14) Let A, B ∈ M_{n×n}(C). Show that if AB = BA, then A and B have a common
     eigenvector. Do A and B have a common eigenvalue?
(15) Suppose A, B ∈ M_{n×n}(C) and at least one of them is nonsingular. If AB is
     similar to a diagonal matrix, prove that BA is also similar to a diagonal
     matrix.
(16) Give an example that shows that if both matrices in Exercise 15 are
     singular, then BA need not be similar to a diagonal matrix.
(17) Use Corollary 3.34 to construct two matrices A, B ∈ M_{4×4}(R) such that
     c_A(X) = c_B(X) and m_A(X) = m_B(X), but A and B are not similar.
(18) If A and D are square matrices with A nonsingular, show that

     det [A B] = (det A)(det(D − CA^{−1}B))
         [C D]

(19) Let S, T ∈ L(V). Show that Sp_F(ST) = Sp_F(TS). Do TS and ST have the same
     eigenvectors? (Note: Exercise 18 gives an easy proof of this problem if S and
     T are represented by symmetric matrices.)

4. THE JORDAN CANONICAL FORM

In this section, we shall use Theorem 3.21 to present a canonical form for those
T ∈ L(V) = Hom_F(V, V) that have the property that the roots of c_T(X) all lie in F.
If R(c_T(X)) ⊆ F, then we know from Section 1 that c_T factors in F[X] as follows:

4.1:  c_T(X) = ∏_{i=1}^r (X − c_i)^{n_i}

In equation 4.1, {c_1,..., c_r} = Sp_F(T) and n = n_1 + ··· + n_r. When we use
notation as in 4.1, it is always understood that c_1,..., c_r are distinct elements in
F. We begin with the following decomposition theorem:

Theorem 4.2: Let T ∈ L(V) and suppose R(c_T(X)) ⊆ F. Factor c_T(X) as in
equation 4.1. Set V_i = ker(T − c_i)^{n_i} for i = 1,..., r. Then

(a) Each V_i is a nonzero, T-invariant subspace of V.
(b) V = V_1 ⊕ ··· ⊕ V_r.
(c) dim V_i = n_i, i = 1,..., r.
(d) If T_i denotes the restriction of T to V_i, then there exists a basis α_i of V_i
    such that

4.3:  Γ(α_i, α_i)(T_i) = c_iI_{n_i} + M_i,   i = 1,..., r

In equation 4.3, M_i is a nilpotent matrix with index of nilpotency at most n_i.

Proof: (a) Sp_F(T) = {c_1,..., c_r} by Theorem 3.3. Thus, for each c_i there exists a
nonzero vector α_i such that T(α_i) = c_iα_i. In particular,
α_i ∈ ker(T − c_i)^{n_i} = V_i. This shows V_i ≠ (0). Since T commutes with
any polynomial in T, T(T − c_i)^{n_i} = (T − c_i)^{n_i}T. In particular, if β ∈ V_i,
then (T − c_i)^{n_i}T(β) = T(T − c_i)^{n_i}(β) = 0. Hence T(β) ∈ V_i. This proves
(a).
(b) For each i = 1,..., r, set h_i(X) = c_T(X)/(X − c_i)^{n_i}. Then
    h_1(X),..., h_r(X) have no common factor in F[X]. In particular,
    g.c.d.(h_1,..., h_r) = 1. It now follows from 1.11 that there exist
    polynomials a_1(X),..., a_r(X) ∈ F[X] such that a_1h_1 + ··· + a_rh_r = 1.
    Now consider the algebra homomorphism φ: F[X] → L(V) given
    by φ(X) = T. Applying φ to a_1h_1 + ··· + a_rh_r = 1 gives us
    a_1(T)h_1(T) + ··· + a_r(T)h_r(T) = 1_V. Also, h_i(X)(X − c_i)^{n_i} = c_T(X).
    Therefore, (T − c_i)^{n_i}h_i(T) = 0. In particular, Im(h_i(T)) ⊆ V_i for every
    i = 1,..., r. Since each V_i is T-invariant, we have Im(a_i(T)h_i(T)) ⊆ V_i
    for each i = 1,..., r. Now let α ∈ V. Then α =
    (a_1(T)h_1(T) + ··· + a_r(T)h_r(T))(α) = a_1(T)h_1(T)(α) + ··· +
    a_r(T)h_r(T)(α) ∈ V_1 + ··· + V_r. Thus, V = V_1 + ··· + V_r.
    To finish the proof of (b), we must show V_i ∩ (∑_{j≠i} V_j) = (0). Fix
    i = 1,..., r, and note that g.c.d.(h_i(X), (X − c_i)^{n_i}) = 1. Again by 1.11,
    there exist polynomials p_i(X) and q_i(X) in F[X] such that
    p_ih_i + q_i(X − c_i)^{n_i} = 1. If α ∈ V_i ∩ (∑_{j≠i} V_j), then (T − c_i)^{n_i}(α) = 0 since
    α ∈ V_i, and h_i(T)(α) = 0 since α ∈ ∑_{j≠i} V_j. Thus, α = 1_V(α) = [p_i(T)h_i(T)
    + q_i(T)(T − c_i)^{n_i}](α) = p_i(T)h_i(T)(α) + q_i(T)(T − c_i)^{n_i}(α) = 0 + 0 = 0.
    This completes the proof of (b).
(c) Fix i = 1,..., r, and let T_i denote the restriction of T to V_i. Then
    V_i = ker(T − c_i)^{n_i} implies (T_i − c_i)^{n_i} = 0. In particular, the minimal
    polynomial, m_{T_i}(X), of T_i on V_i must divide (X − c_i)^{n_i}. Thus,
    m_{T_i}(X) = (X − c_i)^{k_i} for some k_i ≤ n_i. By Corollary 2.38,
    c_{T_i}(X) = (X − c_i)^{p_i} for some p_i ≥ k_i. Now p_i = dim V_i, and by Theorem 2.43,
    we have

4.4:  ∏_{i=1}^r (X − c_i)^{n_i} = c_T(X) = ∏_{i=1}^r c_{T_i}(X) = ∏_{i=1}^r (X − c_i)^{p_i}

    Theorem 1.7 now implies p_i = n_i for all i = 1,..., r.
(d) On each V_i, T_i = c_iI_{V_i} + (T_i − c_iI_{V_i}). We have seen in the proof of (c)
    that each T_i − c_iI_{V_i} is nilpotent on V_i of index k_i ≤ n_i = dim V_i. By
    Theorem 3.21, there exists a basis α_i of V_i such that

4.5:  Γ(α_i, α_i)(T_i − c_iI_{V_i}) = diag(N_{k_{i1}},..., N_{k_{ip(i)}}) = M_i

    In equation 4.5, k_i = k_{i1} ≥ ··· ≥ k_{ip(i)} are the (unique) invariants of
    T_i − c_iI_{V_i}. We now have Γ(α_i, α_i)(T_i) = Γ(α_i, α_i)(c_iI_{V_i} + (T_i − c_iI_{V_i}))
    = c_iΓ(α_i, α_i)(I_{V_i}) + Γ(α_i, α_i)(T_i − c_iI_{V_i}) = c_iI_{n_i} + M_i. This gives us
    equation 4.3 and completes the proof of Theorem 4.2. □

We have already proved our next theorem, but let us introduce a definition
first.

Definition 4.6: Any matrix of the form

    cI_k + N_k = [c          0]
                 [1  c        ]
                 [   ⋱  ⋱     ]
                 [0      1   c]

is called a Jordan block of size k belonging to c. A square matrix J is called a
Jordan matrix if J has the form J = diag(J_1,..., J_m), where J_1,..., J_m are Jordan
blocks of various sizes.



The computations after equation 4.5 show that there is a basis α_i of V_i such that

4.7:  Γ(α_i, α_i)(T_i) = J_i = diag(B(k_{i1}),..., B(k_{ip(i)}))

In equation 4.7, k_{i1} ≥ ··· ≥ k_{ip(i)} are the invariants of T_i − c_i on V_i. Each
B(k_{ij}) is a Jordan block of size k_{ij} belonging to c_i. Thus, B(k_{ij}) = c_iI_{k_{ij}} + N_{k_{ij}} for
j = 1,..., p(i).
If we now set α = α_1 ∪ ··· ∪ α_r, then α is a basis of V. Since V = V_1 ⊕ ··· ⊕ V_r,
equation 2.42 implies

4.8:  Γ(α, α)(T) = J = diag(J_1,..., J_r)

The representation of T given in equation 4.8 is called a Jordan canonical form
of T. We shall see shortly that J is unique up to a permutation of its blocks J_1,..., J_r.
We have now proved the following theorem:

Theorem 4.9: Let T ∈ L(V), and suppose the roots of the characteristic poly-
nomial of T all lie in F. Write c_T(X) as in equation 4.1. Then there exists a basis α
of V such that

    Γ(α, α)(T) = diag(J_1,..., J_r)

where each

    J_i = diag(B(k_{i1}),..., B(k_{ip(i)}))

For each i = 1,..., r, the integers k_{i1} ≥ ··· ≥ k_{ip(i)} are the invariants of the
nilpotent transformation T − c_i on V_i = ker(T − c_i)^{n_i}, and B(k_{i1}),..., B(k_{ip(i)}) are
Jordan blocks of sizes k_{i1},..., k_{ip(i)}, respectively, belonging to c_i. □

We can restate Theorem 4.9 in terms of matrices as follows.


Corollary 4.10: Let A ∈ M_{n×n}(F) and suppose c_A(X) = ∏_{i=1}^r (X − c_i)^{n_i} with
c_1,..., c_r ∈ F. Then A is similar to a Jordan matrix J of the following form:

    J = diag(J_1,..., J_r)

    J_i = diag(B(k_{i1}),..., B(k_{ip(i)}))

For each i = 1,..., r, k_{i1} ≥ ··· ≥ k_{ip(i)} and B(k_{ij}) = c_iI_{k_{ij}} + N_{k_{ij}}. □

The constants k_{i1} ≥ ··· ≥ k_{ip(i)} that appear in Corollary 4.10 are computed from
the natural linear transformation T: F^n → F^n associated with A. Namely, if δ is
the canonical basis of F^n, define T by T(δ_i) = Col_i(A)^t. Then Γ(δ, δ)(T) = A, and
k_{i1} ≥ ··· ≥ k_{ip(i)} are the invariants of T − c_i on ker(T − c_i)^{n_i}.

Example 4.11: Let

    A = [19 −6  −9]
        [17 −4  −9]
        [25 −9 −11]

A simple computation shows c_A(X) = X^3 − 4X^2 + 5X − 2 = (X − 1)^2(X − 2).
Thus, R(c_A) = {1, 2}.
Let T: R^3 → R^3 be given by T(δ_i) = Col_i(A)^t. Then Γ(δ, δ)(T) = A and
c_T(X) = (X − 1)^2(X − 2). Theorem 4.2 implies

4.12:  R^3 = ker(T − 1)^2 ⊕ ker(T − 2)

An easy computation shows

4.13: (a) Γ(δ, δ)(T − 1) = A − I = [18 −6  −9]
                                   [17 −5  −9]
                                   [25 −9 −12]

      (b) Γ(δ, δ)((T − 1)^2) = (A − I)^2 = [−3 3 0]
                                           [−4 4 0]
                                           [−3 3 0]

      (c) Γ(δ, δ)(T − 2) = A − 2I = [17 −6  −9]
                                    [17 −6  −9]
                                    [25 −9 −13]

Now 4.13(b) implies that V_1 = ker(T − 1)^2 is a cyclic (T − 1)-subspace with basis
α_1' = {α_1 = (1, 1, 1), α_2 = (3, 3, 4)}. If T_1 denotes the restriction of T to V_1, then

    Γ(α_1', α_1')(T_1) = Γ(α_1', α_1')(T_1 − 1) + Γ(α_1', α_1')(1)
                       = [0 0] + [1 0] = [1 0]
                         [1 0]   [0 1]   [1 1]

Equation 4.13(c) implies V_2 = ker(T − 2) is spanned by the single vector
α_3 = (3, 4, 3). Thus, α = {α_1, α_2, α_3} is a basis of V, and equation 2.42 implies

4.14:  Γ(α, α)(T) = [1 0 0]
                    [1 1 0]  = J
                    [0 0 2]

J is clearly a Jordan canonical form of T (or A). If we set

    M(α, δ) = [1 3 3]
              [1 3 4]
              [1 4 3]

then (Theorem 3.28 of Chapter I)

4.15:  [1 0 0]
       [1 1 0]  = M(δ, α) A M(δ, α)^{−1}
       [0 0 2]

where

    P = M(δ, α) = M(α, δ)^{−1} = [ 7 −3 −3]
                                 [−1  0  1]
                                 [−1  1  0]

Thus, A is similar to J via P. □
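SymPy can confirm the computation in Example 4.11; the sketch below (added, not part of the text) recovers the characteristic polynomial and a Jordan form of A. Note that SymPy places the ones of each Jordan block on the superdiagonal, whereas the text uses the subdiagonal convention; the two are related by reversing the order of the basis vectors within each block.

```python
# Verification of Example 4.11: c_A(X) = (X - 1)**2 (X - 2), and A has one
# 2 x 2 Jordan block for the eigenvalue 1 and one 1 x 1 block for 2.
from sympy import Matrix, symbols, factor

X = symbols('X')
A = Matrix([[19, -6,  -9],
            [17, -4,  -9],
            [25, -9, -11]])

print(factor(A.charpoly(X).as_expr()))   # (X - 2)*(X - 1)**2

P, J = A.jordan_form()                   # A = P * J * P**(-1), upper convention
print(J)
assert P * J * P.inv() == A
```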

Let us note that if F is algebraically closed, for example, F = C, then the
hypotheses of Theorem 4.9 and Corollary 4.10 are always satisfied. Thus, any
n × n matrix A is similar to a Jordan matrix of the form given in Corollary 4.10.
If F is not algebraically closed, for example, F = R, then no such representation
may exist.

Example 4.16: Let

    A = [0 −1]
        [1  0]

Then c_A(X) = X^2 + 1, and consequently, R(c_A(X)) = {±i} ⊄ R. Since n = 2,
there are only two types of Jordan matrices in M_{2×2}(R):

    [a_1  0]        [a 0]
    [ 0  a_2]  and  [1 a]

If A is similar to either one of these forms, then X^2 + 1 = c_A(X) =
(X − a_1)(X − a_2) or (X − a)^2. This is impossible since a, a_1, a_2 ∈ R. Thus, A is not
similar to any Jordan form over R. □
Let us take a closer look at the Jordan canonical form of T. Suppose T ∈ L(V),
and R(c_T(X)) ⊆ F. Write c_T(X) as in equation 4.1. Then Theorem 4.2 implies
V = V_1 ⊕ ··· ⊕ V_r, with V_i = ker(T − c_i)^{n_i}, and dim V_i = n_i. If T_i denotes the
restriction of T to the T-invariant subspace V_i, then T_i − c_i is nilpotent of index
k_i ≤ n_i. If we let k_i = k_{i1} ≥ ··· ≥ k_{ip(i)} denote the unique invariants of T_i − c_i on
V_i, then there exists a basis α_i of V_i such that

4.17:  Γ(α_i, α_i)(T_i) = J_i = diag(B(k_{i1}),..., B(k_{ip(i)}))

In equation 4.17, each B(k_{ij}) = c_iI_{k_{ij}} + N_{k_{ij}} for j = 1,..., p(i). Finally, if
α = α_1 ∪ ··· ∪ α_r, then equation 2.42 implies

4.18:  Γ(α, α)(T) = J = diag(J_1,..., J_r)

The first thing to note here is that k_{i1} ≥ ··· ≥ k_{ip(i)} are the unique invariants
of the nilpotent transformation T_i − c_i on V_i. Since dim V_i = n_i, Theorem 3.21
implies k_{i1} + ··· + k_{ip(i)} = n_i. But k_{i1} + ··· + k_{ip(i)} is the size of the square ma-
trix J_i. Thus, J_i is an n_i × n_i matrix having the eigenvalue c_i running down its
diagonal. Now, n_i is the multiplicity of the root c_i in c_T(X). Thus, an eigenvalue c_i
of T appears as many times on the principal diagonal of J as its multiplicity in
the characteristic polynomial of T. These remarks together with Theorem 3.31
readily imply that the Jordan canonical form J of T is unique up to a
permutation of its blocks J_1,..., J_r. The corresponding matrix statement is the
following theorem, whose proof we leave as an exercise at the end of this section:

Theorem 4.19: Let A, B ∈ M_{n×n}(F) be matrices whose characteristic polynomials
have all of their roots in F. Then A and B are similar if and only if they have the
same Jordan canonical form J = diag{J_1,..., J_r} (except possibly for a per-
mutation of the blocks J_1,..., J_r). □

Next note that k_{i1} is the index of nilpotency of T_i − c_i on V_i. Therefore, the
minimal polynomial of T_i on V_i is given by m_{T_i}(X) = (X − c_i)^{k_{i1}}. Theorem 2.43(c)
now implies the minimal polynomial of T is given by the following equation:

4.20:  m_T(X) = ∏_{i=1}^r (X − c_i)^{k_{i1}}

We can now prove the following interesting result:

Theorem 4.21: Let T ∈ L(V), and assume c_T(X) = ∏_{i=1}^r (X − c_i)^{n_i} ∈ F[X]. Then T
can be represented by a diagonal matrix if and only if m_T(X) = ∏_{i=1}^r (X − c_i),
that is, every eigenvalue of T has multiplicity one in the minimal polynomial of
T.

Proof: We have seen that the Jordan canonical form of T given in equation 4.18
is unique up to a permutation of the blocks J_1,..., J_r. If T is represented by a
diagonal matrix diag(b_1,..., b_n) = B, then B is a Jordan canonical form of T.
Hence, B is J up to some permutation of the blocks J_1,..., J_r. In particular,
every block J_i is diagonal. Thus, k_{i1} = 1 for every i = 1,..., r. Then equation
4.20 implies m_T(X) = ∏_{i=1}^r (X − c_i).
Conversely, if m_T(X) = ∏_{i=1}^r (X − c_i), then equation 4.20 implies k_{i1} = 1. But
k_{i1} ≥ ··· ≥ k_{ip(i)} ≥ 1. Therefore, 1 = k_{i1} = ··· = k_{ip(i)} for all i = 1,..., r, and J is
diagonal. □

Let us rephrase Theorem 4.21 slightly.
Let us rephrase Theorem 4.21 slightly.

Corollary 4.22: Let T ∈ L(V). T has a diagonal matrix representation
over F if and only if the minimal polynomial of T is a product of distinct linear
factors in F[X].

Proof: Suppose m_T(X) = ∏_{i=1}^r (X − c_i) ∈ F[X], where c_1,..., c_r are distinct. We
have seen in Corollary 2.38 that c_T(X) and m_T(X) have the same set of irreducible
factors in F[X]. Thus, c_T(X) = ∏_{i=1}^r (X − c_i)^{n_i} with n_1 + ··· + n_r = dim V.
Hence, the hypotheses of Theorem 4.21 are satisfied, and we conclude that T can
be represented by a diagonal matrix relative to some basis of V.
Conversely, suppose Γ(α, α)(T) = diag(a_1,..., a_n) for some basis α of V. Then
c_T(X) = ∏_{i=1}^n (X − a_i). Thus, again, the hypotheses of Theorem 4.21 are satisfied.
We conclude m_T(X) is a product of distinct linear factors in F[X]. □
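The criterion of Corollary 4.22 is easy to test mechanically: multiply out the product of (A − cI) over the distinct eigenvalues c and check whether the result is zero. The helper below is a hypothetical sketch (an addition, not the author's method) for matrices whose eigenvalues SymPy can recognize as rational; it is not meant as a general-purpose routine.

```python
# Corollary 4.22 as a test: A is diagonalizable over Q exactly when the
# product of (A - c*I) over the distinct rational eigenvalues c is zero.
from sympy import Matrix, eye, zeros

def diagonalizable_over_Q(A):
    eigen = A.eigenvals()                       # {eigenvalue: multiplicity}
    if any(not ev.is_rational for ev in eigen):
        return False                            # some root of c_A(X) lies outside Q
    n = A.rows
    product = eye(n)
    for ev in eigen:
        product = product * (A - ev * eye(n))
    return product == zeros(n, n)

J = Matrix([[1, 0, 0], [1, 1, 0], [0, 0, 2]])   # contains a Jordan block of size 2
D = Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 2]])
print(diagonalizable_over_Q(J))   # False
print(diagonalizable_over_Q(D))   # True
```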
We shall finish this section with a second version of Theorem 4.2, which will
be convenient in later sections.
Theorem 4.23: Let T ∈ L(V), and suppose R(c_T(X)) ⊆ F. Factor c_T(X) as in
equation 4.1. Then there exist linear transformations P_1,..., P_r and N_1,..., N_r
in L(V) such that

(a) P_1,..., P_r are pairwise orthogonal idempotents whose sum is 1_V.
(b) P_i and N_i are polynomials in T for all i = 1,..., r.
(c) Im P_i = V_i = ker(T − c_i)^{n_i} for i = 1,..., r.
(d) T = ∑_{i=1}^r (c_iP_i + N_i).
(e) P_jN_i = N_iP_j = N_i if i = j, and 0 if i ≠ j.
(f) N_i is nilpotent of index at most n_i for each i = 1,..., r.

Proof: For each i = 1,..., r, set V_i = ker(T − c_i)^{n_i} and h_i(X) = c_T(X)/(X − c_i)^{n_i}.
Then the following facts were proven in Theorem 4.2:

4.24: (i) V = V_1 ⊕ ··· ⊕ V_r
      (ii) a_1(X)h_1(X) + ··· + a_r(X)h_r(X) = 1 for some a_1,..., a_r ∈ F[X]
      (iii) Im(h_i(T)) ⊆ V_i for each i = 1,..., r

Set P_i = a_i(T)h_i(T) for i = 1,..., r. Then each P_i is a polynomial in T, and
4.24(ii) implies P_1 + ··· + P_r = 1_V. Thus, V = Im P_1 + ··· + Im P_r. Since V_i is
T-invariant, and Im h_i(T) ⊆ V_i, we see Im P_i ⊆ V_i. In particular,
Im P_1 + ··· + Im P_r is a direct sum. Thus, V = Im P_1 ⊕ ··· ⊕ Im P_r. Equation
4.24(i) now implies Im P_i = V_i for i = 1,..., r.
Since each V_i is T-invariant and each P_j is a polynomial in T, we
have P_j(V_i) ⊆ V_i ∩ Im P_j. In particular, P_jP_i = 0 whenever i ≠ j. It is now
easy to see that each P_i is idempotent. Let α ∈ V. Write α = α_1 +
··· + α_r with α_i ∈ V_i = Im P_i. Then P_j(α_i) = 0 whenever i ≠ j. Therefore,
P_i(α) = P_i(α_1 + ··· + α_r) = P_i(α_1) + ···
+ P_i(α_r) = P_i(α_i). We conclude that P_i(α) = P_i(α_i)
for all i = 1,..., r. Hence, P_i^2(α) = P_i(P_i(α)) = P_i(P_i(α_1 + ··· + α_r)) =
P_i(P_i(α_i)) = P_i(α_i) = P_i(α). Thus, each P_i is idempotent. We have now established
(a) and (c) and the first part of (b).
Set N_i = (T − c_i)P_i, i = 1,..., r. Since each P_i is a polynomial in T, so is each
N_i. Polynomials in T commute. So, N_iP_j = P_jN_i = (T − c_i)P_iP_j = 0 if i ≠ j
and N_i if i = j. We have now established (b) and (e). Notice
that N_i(V) = (T − c_i)P_i(V) = P_i(T − c_i)(V) ⊆ Im P_i = V_i. Also,
(T − c_i)P_i(Im P_j) = 0 whenever i ≠ j.
Since

∑_{i=1}^r (c_iP_i + N_i) = ∑_{i=1}^r (c_iP_i + TP_i − c_iP_i) = ∑_{i=1}^r TP_i = T(∑_{i=1}^r P_i) = T1_V = T,

(d) is clear. It remains to prove (f).
Fix i = 1,..., r, and let α ∈ V. Then N_i^{n_i}(α) = [(T − c_i)P_i]^{n_i}(α). But
P_i(α) ∈ V_i = ker(T − c_i)^{n_i}. Since P_i and T − c_i commute and P_i is idempotent, we have
N_i^{n_i}(α) = [(T − c_i)P_i]^{n_i}(α) = (T − c_i)^{n_i}P_i(α) = 0. Therefore, N_i is nilpotent of
index at most n_i. □

EXERCISES FOR SECTION 4

(1) Prove Theorem 4.19. [Hint: Show that dim ker(A − c_i)^j = dim ker(B − c_i)^j for
    every eigenvalue c_i and every j ≥ 1.]
(2) Show that the Jordan canonical form for a transformation T (if it exists) is
    unique up to a permutation of its blocks J_1,..., J_r.
(3) Find all Jordan forms for
(a) All 8 x 8 matrices having x2(x — as minimal polynomial.
(b) All 6 x 6 matrices having (x + 2)4(x — 5)2 as characteristic
polynomial.
(4) Find the Jordan canonical form J of

[30 —141
A=1 1

[—1 —3 1

and find an invertible matrix P such that PAP^{−1} = J over R.


(5) Find the Jordan canonical form J of

0—1 00
0 0 0
A—

—1
1

1 201
3 —1 0

over C. Find P such that PAP^{−1} = J.


(6) Let

2 1 —2 0
A= 00 10
0 0 0
10 00
1

Find P such that PAP^{−1} is in Jordan canonical form.


(7) Give an example of two matrices A, B ∈ M_{n×n}(F) such that c_A(X) = c_B(X)
    and m_A(X) = m_B(X), but A and B have different Jordan canonical forms.
(8) Let N_k be the nilpotent matrix given in Example 3.20. Show that N_k is
    similar to its transpose N_k^t.
(9) A slightly different version of Exercise 9 of Section 3 is the following: Let
    T ∈ L(V), and suppose R(c_T(X)) ⊆ F. Show that T can be represented by a
    diagonal matrix if and only if whenever (T − c)^2(α) = 0, then (T − c)(α) = 0.

(10) Suppose A, B ∈ M_{3×3}(F) are nilpotent. If m_A(X) = m_B(X), then prove A is
     similar to B. Compare this result with Exercise 17 of Section 3.
(11) Use Exercise 10 to prove the following: Let A, B ∈ M_{n×n}(F). Suppose
     c_A(X) = c_B(X) = ∏_{i=1}^r (X − c_i)^{n_i} and m_A(X) = m_B(X). If n_i ≤ 3 for all
     i = 1,..., r, then A is similar to B.

(12) Find the Jordan canonical form of the following matrix:

0001
001
1

—3 2 1 1

3 —6 1 4

(13) Let A, B ∈ M_{n×n}(F). Suppose

     [A 0]
     [0 A]

     is similar to

     [B 0]
     [0 B]

     in M_{2n×2n}(F). Show that A is similar to B.
(14) What is the Jordan form of the linear map D: V_3 → V_3 (notation as in
     Exercise 18, Section 2 of Chapter II)?
(15) Let A, B ∈ M_{n×n}(C). Suppose AB = BA. Show there exists a P such that
     PAP^{−1} and PBP^{−1} are both in Jordan canonical form.
(16) Let V be a finite-dimensional vector space over C. Classify all T ∈ L(V) such
     that V has only finitely many T-invariant subspaces.

5. THE REAL JORDAN CANONICAL FORM

In this section and the next, we take up the question of what canonical form for
T ∈ Hom_F(V, V) is available when the characteristic polynomial of T does not
have all of its roots in F. When F = R, we are able to construct a form
surprisingly close to the Jordan canonical form of Section 4.
For the time being, let us assume F is an arbitrary field. Let V be a finite-
dimensional vector space of dimension n over F. If T e HomF(V, V), then we
have seen that the characteristic polynomial cT(X) is a monic polynomial of
degree n in F[X]. Using Theorem 1.7, we know cT(X) has an essentially unique

factorization of the following form:

5.1:  c_T(X) = q_1(X)^{n_1} ··· q_r(X)^{n_r}

In equation 5.1, q_1(X),..., q_r(X) are monic, irreducible polynomials in F[X].
When i ≠ j, q_i and q_j are not associates. Each n_i ≥ 1, and
n_1∂(q_1) + ··· + n_r∂(q_r) = ∂(c_T(X)) = n.
From Corollary 2.38, we know the minimal polynomial m_T(X) can be
factored in F[X] in the following way:

5.2:  m_T(X) = q_1(X)^{m_1} ··· q_r(X)^{m_r}

In equation 5.2, 1 ≤ m_i ≤ n_i for every i = 1,..., r.
We shall need the following generalization of Theorem 4.2.

Theorem 5.3 (Primary Decomposition Theorem): Let T ∈ L(V), and write the
characteristic and minimal polynomials of T as in equations 5.1 and 5.2. For
each i = 1,..., r, set V_i = ker(q_i(T)^{n_i}). Then

(a) Each V_i is a nonzero, T-invariant subspace of V.
(b) V = V_1 ⊕ ··· ⊕ V_r.
(c) V_i = ker(q_i(T)^{m_i}), i = 1,..., r.
(d) dim_F(V_i) = n_i∂(q_i), i = 1,..., r.

Proof: (a) T commutes with q_i(T)^{n_i} and, thus, V_i is T-invariant. For
each i = 1,..., r, set h_i(X) = c_T(X)/q_i(X)^{n_i}. Since h_i(X) is missing
the factor q_i(X), m_T(X) ∤ h_i(X). In particular, h_i(T) ≠ 0. Hence
there exists a vector α ∈ V such that h_i(T)(α) ≠ 0. But
q_i(T)^{n_i}(h_i(T)(α)) = c_T(T)(α) = 0. Thus, h_i(T)(α) is a
nonzero vector in V_i. We have now proven (a) as well as the fact that
Im[h_i(T)] ⊆ V_i for all i = 1,..., r.
(b) The polynomials h_1(X),..., h_r(X) are clearly relatively prime in
    F[X]. It follows from 1.11 that there exist polynomials
    a_1(X),..., a_r(X) ∈ F[X] such that a_1h_1 + ··· + a_rh_r = 1. In particular,
    we have for any vector α ∈ V, α = a_1(T)h_1(T)(α) +
    ··· + a_r(T)h_r(T)(α) ∈ V_1 + ··· + V_r.
    Suppose α ∈ V_i ∩ (∑_{j≠i} V_j). q_i(X)^{n_i} and h_i(X) are obviously rela-
    tively prime. Hence, Aq_i^{n_i} + Bh_i = 1 for some A, B ∈ F[X]. Then
    α = A(T)q_i(T)^{n_i}(α) + B(T)h_i(T)(α) = 0 + 0 = 0. This proves
    (b).
(c) Since m_i ≤ n_i, ker(q_i(T)^{m_i}) ⊆ ker(q_i(T)^{n_i}) = V_i. Now the same argu-
    ments used in (a) and (b) when applied to q_1(X)^{m_1},..., q_r(X)^{m_r}
    show that each subspace ker(q_i(T)^{m_i}) is nonzero, and
    V = ker(q_1(T)^{m_1}) ⊕ ··· ⊕ ker(q_r(T)^{m_r}). Comparing dimensions gives
    us ker(q_i(T)^{m_i}) = V_i for all i = 1,..., r.
(d) Let T_i denote the restriction of T to V_i. Then q_i(T_i)^{n_i} = 0. In
    particular, the minimal polynomial m_{T_i} of T_i must divide q_i(X)^{n_i} and hence
    be a power of q_i(X). Corollary 2.38 then implies c_{T_i}(X) = q_i(X)^{p_i} for
    some p_i ≥ 1. Since dim V_i = ∂(c_{T_i}), we have dim V_i = p_i∂(q_i).
    On the other hand, Theorem 2.43(b) implies

    ∏_{i=1}^r q_i(X)^{n_i} = c_T(X) = ∏_{i=1}^r c_{T_i}(X) = ∏_{i=1}^r q_i(X)^{p_i}

    Hence p_i = n_i for all i, and dim_F(V_i) = n_i∂(q_i). □

In this section, we shall use Theorem 5.3 in the following form:

Corollary 5.4: Let T ∈ L(V), and suppose c is a root of c_T(X) in F. Write
c_T(X) = (X − c)^m q(X) in F[X] with X − c and q(X) relatively prime. Then
V_1 = ker(T − c)^m is a nonzero, T-invariant subspace of dimension m.
Furthermore, V = V_1 ⊕ W for some T-invariant subspace W of V.

Proof: The complete factorization of c_T(X) given in 5.1 has (X − c)^m as one of its
terms q_i(X)^{n_i}. We may assume q_1(X)^{n_1} = (X − c)^m. The result now follows from
Theorem 5.3 with W = ker(q_2(T)^{n_2}) ⊕ ··· ⊕ ker(q_r(T)^{n_r}). □
We can now take up the question of what canonical form is available for T
when R(c_T(X)) ⊄ F. The first thing one might try is to pass to the algebraic
closure F̄ of F. Thus, consider the extended map T^F̄ on V^F̄ = V ⊗_F F̄. As we
have seen in Section 2, the characteristic polynomial c_T(X) is the same as c_{T^F̄}(X).
Since F̄ is algebraically closed, c_T(X) decomposes into linear factors in F̄[X] and
Theorem 4.2 implies T^F̄ has a Jordan canonical form J ∈ M_{n×n}(F̄). Of course, the
entries in J may not all lie in F, but we can hope that if the relationship between
F and F̄ is special enough, we may be able to use J to produce some reasonable
canonical form for T in M_{n×n}(F). This is precisely what happens when F = R.
Then F̄ = C. By using complex conjugation σ ∈ Hom_R(C, C) (see Section 2,
Chapter II), we can convert a Jordan canonical form for T^C to a reasonable form
for T itself over R.
Let us set up the notation we shall use for the rest of this section. V will denote
a vector space over R of dimension n. V^C = V ⊗_R C is the complexification of V.
We shall identify a vector α ∈ V with its image α ⊗ 1 in V^C. Then
V ⊆ V^C = {∑_k α_k ⊗ z_k | α_k ∈ V, z_k ∈ C}. Note then that any vector ξ ∈ V^C can be written
uniquely in the form ξ = μ + iλ, where μ, λ ∈ V and i = √−1. We had also seen
in Chapter II that σ extends to an R-isomorphism 1_V ⊗_R σ: V^C → V^C. Recall that
the value of 1_V ⊗_R σ on a typical vector β = ∑_k α_k ⊗_R z_k ∈ V^C is given by
(1_V ⊗_R σ)(∑_k α_k ⊗_R z_k) = ∑_k α_k ⊗_R z̄_k. We shall shorten our notation here and
write (1_V ⊗_R σ)(β) = β̄ for any β ∈ V^C. Thus, if β = ∑_k α_k ⊗ z_k, then
β̄ = ∑_k α_k ⊗ z̄_k. Equivalently, if β = μ + iλ with μ, λ ∈ V, then β̄ = μ − iλ. It is
important to keep in mind here that β → β̄ is an R-isomorphism of V^C, but not a
C-isomorphism, because it is not a C-linear map.
Now suppose T ∈ Hom_R(V, V). As usual, we set T^C = T ⊗_R 1_C. Thus
T^C is the C-linear transformation on V^C given by T^C(∑_k α_k ⊗ z_k) = ∑_k T(α_k) ⊗ z_k,
or equivalently T^C(μ + iλ) = T(μ) + iT(λ). We had observed in
Theorem 2.26 of Chapter II that T^C commutes with conjugation β → β̄ on V^C.
More generally, we have the following fact:

5.5: Let f(X) = ∑_k z_kX^k ∈ C[X]. Then for every α ∈ V^C, the conjugate of
f(T^C)(α) is f̄(T^C)(ᾱ).

In 5.5, f̄ = ∑_k z̄_kX^k is the conjugate of f. To prove 5.5, it clearly suffices to
assume α is a vector of the form α = β ⊗_R z for some β ∈ V, z ∈ C. We then have

    (f(T^C)(α))‾ = (f(T^C)(β ⊗_R z))‾ = (∑_k T^k(β) ⊗_R z_kz)‾ = ∑_k T^k(β) ⊗_R z̄_kz̄
                 = f̄(T^C)(β ⊗_R z̄) = f̄(T^C)(ᾱ)

One interesting application of equation 5.5 is the following lemma.

Lemma 5.6: If z ∈ Sp_C(T^C), then z̄ ∈ Sp_C(T^C).

Proof: If z is an eigenvalue of T^C, then T^C(α) = zα for some nonzero eigenvector
α. ᾱ is also nonzero, and equation 5.5 implies T^C(ᾱ) = z̄ᾱ. Thus, if α is an
eigenvector of T^C with corresponding eigenvalue z, then ᾱ is an eigenvector of T^C
with corresponding eigenvalue z̄. □

Now since C is algebraically closed, the characteristic polynomial of T^C
(which is the same as the characteristic polynomial of T) factors into linear
factors. Lemma 5.6 implies that if z is a root of c_T(X), then z̄ is also a root of c_T(X).
Thus, the spectrum of T^C can be written as follows:

5.7:  Sp_C(T^C) = {c_1,..., c_r, z_1, z̄_1,..., z_t, z̄_t}

In equation 5.7, c_1,..., c_r denote the (distinct) real roots of c_T(X), and z_1,
z̄_1,..., z_t, z̄_t denote the (distinct) complex roots which are not real. We have
r ≥ 0, t ≥ 0, and r + 2t ≤ n = ∂(c_T).
If t = 0, then c_T(X) has only real roots. In this case, T has a Jordan canonical
representation J ∈ M_{n×n}(R) and there is nothing left to say. Hence, throughout
the rest of this discussion, we shall assume t ≥ 1. Equation 5.7 implies the
characteristic polynomial of T^C factors in C[X] as follows:

5.8:  c_T(X) = ∏_{j=1}^r (X − c_j)^{n_j} ∏_{l=1}^t (X − z_l)^{p_l}(X − z̄_l)^{q_l}

Since the coefficients of c_T(X) (= c_{T^C}(X)) are all real, conjugating c_T(X) merely
interchanges z_l and z̄_l in equation 5.8. In particular, p_l = q_l for all l = 1,..., t.
Thus, the multiplicity of z_l and z̄_l in c_T(X) is the same. We can now rewrite
equation 5.8 as follows:

5.9:  c_T(X) = ∏_{j=1}^r (X − c_j)^{n_j} ∏_{l=1}^t (X − z_l)^{p_l}(X − z̄_l)^{p_l}

Since ∂(c_T) = dim_R V = n, we must have n_1 + ··· + n_r + 2(p_1 + ··· + p_t) = n.
Let us fix l = 1,..., t and consider the two T^C-invariant subspaces
ker(T^C − z_l)^{p_l} and ker(T^C − z̄_l)^{p_l}. These two subspaces have dimension p_l
over C by Theorem 4.2. Furthermore, conjugation α → ᾱ is an R-isomorphism
from ker(T^C − z_l)^{p_l} → ker(T^C − z̄_l)^{p_l}. This leads to an important observation.

Lemma 5.10: The nilpotent transformation T^C − z_l on ker(T^C − z_l)^{p_l} has the
same invariants as T^C − z̄_l on ker(T^C − z̄_l)^{p_l}.

Proof: If W is a C-subspace of V^C, let us denote the conjugate of W by W̄. Thus,
W̄ = {ᾱ | α ∈ W}. The proof of 5.10 consists of the following observations, which
are all easy to prove. If W is a C-subspace of ker(T^C − z_l)^{p_l}, then W̄ is
a C-subspace of ker(T^C − z̄_l)^{p_l}. If W is T^C-invariant, so is W̄. If
W ⊕ W' = ker(T^C − z_l)^{p_l}, then W̄ ⊕ W̄' = ker(T^C − z̄_l)^{p_l}. If W is a
(T^C − z_l)-cyclic subspace of ker(T^C − z_l)^{p_l} with basis {α, (T^C − z_l)(α),...,
(T^C − z_l)^{k−1}(α)}, then W̄ is a (T^C − z̄_l)-cyclic subspace of ker(T^C − z̄_l)^{p_l} with
basis {ᾱ, (T^C − z̄_l)(ᾱ),..., (T^C − z̄_l)^{k−1}(ᾱ)}. The proof of the lemma now follows
from Theorem 3.21 and the definition of invariants. □

If we now combine Theorem 4.9 with Lemma 5.10, we get the following
important corollary:

Corollary 5.11: Let β_l = {β_{l1},..., β_{lp_l}} be a basis of ker(T^C − z_l)^{p_l} such that if T^C_l
denotes the restriction of T^C to ker(T^C − z_l)^{p_l}, then

    Γ(β_l, β_l)(T^C_l) = diag(B(k_{l1}),..., B(k_{lq(l)}))

Here k_{l1} ≥ ··· ≥ k_{lq(l)} and B(k_{lj}) = z_lI_{k_{lj}} + N_{k_{lj}} for all j = 1,..., q(l). Then
β̄_l = {β̄_{l1},..., β̄_{lp_l}} is a basis of ker(T^C − z̄_l)^{p_l}. Furthermore, if T̄^C_l denotes the
restriction of T^C to ker(T^C − z̄_l)^{p_l}, then

    Γ(β̄_l, β̄_l)(T̄^C_l) = diag(B̄(k_{l1}),..., B̄(k_{lq(l)}))

Here B̄(k_{lj}) = z̄_lI_{k_{lj}} + N_{k_{lj}} for all j = 1,..., q(l). □


Thus, the complex blocks in the Jordan canonical form of T^C occur in
conjugate pairs. Let us now introduce the following subspaces of V^C:

5.12:  U_j = ker(T^C − c_j)^{n_j},  j = 1,..., r
       U_{r+l} = ker(T^C − z_l)^{p_l} ⊕ ker(T^C − z̄_l)^{p_l},  l = 1,..., t

From Theorem 4.2, we know each U_j, j = 1,..., r, is a T^C-invariant subspace of
V^C of dimension n_j over C. Each U_{r+l} is a T^C-invariant subspace of V^C of
dimension 2p_l over C, and V^C = U_1 ⊕ ··· ⊕ U_r ⊕ U_{r+1} ⊕ ··· ⊕ U_{r+t}. We claim
that each of the two types of subspaces in 5.12 has a basis contained in
V (= V ⊗_R 1).
For j = 1,..., r, c_j is a real root of c_T(X). It follows from Corollary 5.4 that
V_j = ker(T − c_j)^{n_j} is a T-invariant subspace of V of dimension n_j over R.
Clearly, V_j ⊗_R C = U_j and, thus, any R-basis of V_j is a C-basis of U_j. By
Theorem 3.21, there exists an R-basis γ_j of V_j such that

5.13:  Γ(γ_j, γ_j)(T_j) = J_j = diag(B(y_{j1}),..., B(y_{js(j)}))

As usual, in equation 5.13, T_j denotes the restriction of T to V_j, y_{j1} ≥ ··· ≥ y_{js(j)}
are the invariants of T_j − c_j on V_j, and B(y_{jq}) = c_jI_{y_{jq}} + N_{y_{jq}} for all
q = 1,..., s(j). By Theorem 2.17 of Chapter II, γ_j is also a C-basis of U_j. The
representation of T^C_j (the restriction of T^C to U_j) with respect to γ_j is identical to
5.13.
Next, let β_l = {β_{l1},..., β_{lp_l}} be a basis of ker(T^C − z_l)^{p_l} such that

5.14:  Γ(β_l, β_l)(T^C_l) = diag(B(k_{l1}),..., B(k_{lq(l)}))

In equation 5.14, T^C_l denotes the restriction of T^C to ker(T^C − z_l)^{p_l}, and
k_{l1} ≥ ··· ≥ k_{lq(l)} are the invariants of T^C_l − z_l on ker(T^C − z_l)^{p_l}. The rest of the
notation is the same as in Corollary 5.11.
It now follows from Corollary 5.11 that β_l ∪ β̄_l is a basis of U_{r+l}. Write each
β_{lj} = μ_{lj} + iλ_{lj} with μ_{lj}, λ_{lj} ∈ V, and set Λ_l = {μ_{l1}, λ_{l1},..., μ_{lp_l}, λ_{lp_l}}.
Then Λ_l ⊆ V. Clearly Λ_l spans the C-vector space U_{r+l}. Since dim_C(U_{r+l}) = 2p_l,
we conclude Λ_l is a basis for U_{r+l} over C.
For each l = 1,..., t, let V_{r+l} = L_R(Λ_l). Then V_{r+l} ⊗_R C = U_{r+l}. In
particular, dim_R(V_{r+l}) = 2p_l. Since V^C = U_1 ⊕ ··· ⊕ U_{r+t}, Λ = γ_1 ∪ ··· ∪ γ_r ∪
Λ_1 ∪ ··· ∪ Λ_t is a C-basis of V^C. It now follows that Λ is a basis of V over R, and, in
particular, V = V_1 ⊕ ··· ⊕ V_r ⊕ V_{r+1} ⊕ ··· ⊕ V_{r+t}.
We have already noted that V_1,..., V_r are T-invariant subspaces of V with
the restriction of T to V_j being represented by equation 5.13 relative to γ_j.
Equation 5.14 readily implies each V_{r+l} is a T-invariant subspace. To see this,
consider the first block B(k_{l1}) in Γ(β_l, β_l)(T^C_l). Let us simplify notation here and
write k_{l1} = k. Then B(k) corresponds to the subspace of
ker(T^C − z_l)^{p_l} spanned by the first k vectors β_{l1},..., β_{lk} of β_l. We then have the
following equations:

5.15:  T(μ_{l1}) + iT(λ_{l1}) = T^C(β_{l1}) = z_lβ_{l1} + β_{l2} = (a_l + ib_l)(μ_{l1} + iλ_{l1}) + (μ_{l2} + iλ_{l2})

       T(μ_{l2}) + iT(λ_{l2}) = T^C(β_{l2}) = z_lβ_{l2} + β_{l3} = (a_l + ib_l)(μ_{l2} + iλ_{l2}) + (μ_{l3} + iλ_{l3})
       ⋮
       T(μ_{l,k−1}) + iT(λ_{l,k−1}) = T^C(β_{l,k−1}) = z_lβ_{l,k−1} + β_{lk}
                                  = (a_l + ib_l)(μ_{l,k−1} + iλ_{l,k−1}) + (μ_{lk} + iλ_{lk})

and

       T(μ_{lk}) + iT(λ_{lk}) = T^C(β_{lk}) = z_lβ_{lk} = (a_l + ib_l)(μ_{lk} + iλ_{lk})

If we now equate the real and imaginary parts in equation 5.15, we get the
following equations:

5.16:  T(μ_{l1}) = a_lμ_{l1} − b_lλ_{l1} + μ_{l2}
       T(λ_{l1}) = b_lμ_{l1} + a_lλ_{l1} + λ_{l2}
       T(μ_{l2}) = a_lμ_{l2} − b_lλ_{l2} + μ_{l3}
       T(λ_{l2}) = b_lμ_{l2} + a_lλ_{l2} + λ_{l3}
       ⋮
       T(μ_{lk}) = a_lμ_{lk} − b_lλ_{lk}
       T(λ_{lk}) = b_lμ_{lk} + a_lλ_{lk}

Thus, the subspace spanned by the first 2k vectors μ_{l1}, λ_{l1},..., μ_{lk}, λ_{lk} of Λ_l forms a
T-invariant subspace of V_{r+l}. The representation of T on this subspace is given
by the 2k × 2k matrix

5.17:  H(k) = [D            0]
              [I_2  D        ]
              [    ⋱  ⋱      ]
              [0      I_2   D]

where

    D = [ a_l  b_l]
        [−b_l  a_l]

In equation 5.17, there are k = k_{l1} 2 × 2 matrices D running down the
diagonal and the 2 × 2 identity matrix I_2 on the subdiagonal. In the case that
k_{l1} = 1, 5.17 simplifies to just D.
Clearly, each block B(k_{lj}) in equation 5.14 gives us the corresponding
2k_{lj} × 2k_{lj} matrix H(k_{lj}) as in 5.17. [Each diagonal element z_l in B(k_{lj}) is replaced by D
and every 1 on the subdiagonal of B(k_{lj}) (if any) is replaced with I_2.] We have
now proved that V_{r+l} is T-invariant and if T_{r+l} denotes the restriction of T to V_{r+l},
then

5.18:  Γ(Λ_l, Λ_l)(T_{r+l}) = diag(H(k_{l1}),..., H(k_{lq(l)}))

In 5.18, H(k_{lj}) is the 2k_{lj} × 2k_{lj} matrix constructed from B(k_{lj}) as in equation
5.17.
We can now put all of this material together. We have proved the following
theorem.

Theorem 5.19: Let T e HomR(V, V) and suppose

cT(X) = [I (X — fi (X — — C[X]

Here c1, ..., c1 are the distinct real roots of cT and z1, 2k,. z1, are the
nonreal roots. Let TC denote the complexification of T. Then

(a) TC has a Jordan canonical form of the following type:

[JI I

Jr+ i

0
sr-ft
THE REAL JORDAN CANONICAL FORM 149

Forj=1,..,r,

0
Ji =
[0
I

B(yJS(J))

Here Yii + + Yjs(j) = and B(Yjm) = +


form=1,...,s(j).Forl=1,...,t,

[B(k,1) 0
=
[0
Here k,1 ? kjq(j), k11 + + klq(:) = Pi' and B(kjm) = ZIIkim + Nkzm
form=1,...,q(l).Forl=1,..,t,
0
=
L 0 B(klq(j))

Here B(kzm) = ZlIkI + Nk for m = 1,..., q(l).


Foreachl = 1,..., tandm = 1,..., X 2kim
matrix given in equation 5.17. Thus, H(kjm) is formed from B(kjm) by
replacing each z, by

( a1 b,

a1

where z1 = a1 + ib, and each 1 by 12.


(b) There exists a basis A of V such that

J1
0
5.20: F(A, A)(T)
= 0 K,

K,

Forj = 1,..., r, is the same as in (a). For 1 = 1,..., t,

[i-i(k11) 0 1
K,=I
[0 H(klq(I))j
150 CANONICAL FORMS OF MATR1CES

The matrix representation given in equation 5.20 is Called a real Jordan


canonical form of T. Our dissussion before Theorem 5.19 tells us how to
construct a basis A of V that gives a real Jordan canonical form. Namely, find a
real basis (satisfying 5.13) of each ker(TC — and add to these vectors the real
and imaginary parts of a basis (satisfying 5.14) of each ker(TC —

Example 5.21: Let T: R2 be given by T(ö1) = ö2 and T(62) = —ö1. Then

1(6, b)(T)
= —

and cT(X) = X2 + 1 = (X — i)(X + i). Thus, in the notation of Theorem 5.19,


r = 0, t = 1, z1 = i, = —i. We know from Example 3.2 that Tc is represented
by the diagonal matrix

1(fl, fl)(TC) =
(i 0)

where = {fl1 = (1, —i), = = (l, i)}. In the notation of 5.19,

(i 0
k%0 —i

is the Jordan canonical form of Tc with J1 = (i) and Ji = (— i). The real and
imaginary parts of i are 0 and 1. Therefore,

D=(?
= + i21 for p1 = (1, 0), and = (0, —1). Thus, A = A1} is basis for
R2 and JT(A, A)(T) = D is a real Jordan form of T. El

If A e then, as usual, we associate with A a linear transformation


—* W' given by = = 1,..., n. Then 1(6, Ô)(T) = A. By a real
Jordan canonical form of A, we shall mean a real Jordan canonical form of T.
Thus, if we fix an ordering of the eigenvalues of A as in equation 5.7, then A is
similar to a real Jordan canonical matrix of the form given in equation 5.20.

Example 5.22: Let

[14 —3 —91
A=I 15 —3
[13 —2 —9j
THE REALJORDAN CANONICAL FORM 151

A simple calculation shows CA(X) = X3 — 2X2 + X — 2 = (X — 2)(X2 + 1).


Thus, cA(X) = (X — 2)(X — i)(X + i) e C[X]. Corollary 4.22 then implies

[2 0

0 —ij

is the Jordan canonical form of A in M3 3(C). Then the computations in


Example 5.21 imply

0 01
1°: 0 ii
[o:—i oj

is a real Jordan canonical form of A. U

We note in passing that the analog of Theorem 4.19 is true for real Jordan
canonical forms. Let A, BE Then A and B are similar if and only if A
and B have the same real Jordan canonical form, that is, there is an ordering of
the eigenvalues of A (and B) such that the resulting real Jordan canonical forms
are the same. We leave this remark as an exercise at the end of this section. Since
a real Jordan canonical form of T (or A) is unique up to similarity, authors often
refer to "the" real Jordan canonical form of T (or A).
In the remainder of this section, we discuss one of the more important
applications of the real Jordan canonical form, that is, solving systems of linear
differential equations. Suppose I is some open interval containing 0 in It Let
x1(t),..., C'(I). We are interesting in solving a system of linear differential
equations of the following type:
dx1
5.23:

= + +
—a-

In equation 5.23, e for i,j = 1,..., n. We are interested in finding a solution


to 5.23 subject to some initial condition x1(0) = c1,. .. , xjO) =
Let us introduce the obvious vector notation here. Set A = e
x = (x1 . ,xjt, and C = (c1, ..., cJ. Then 5.23 can be rewritten as follows:
.
.

5.24: x' = Ax with x(0) = C

Here x' of course means the n x 1 matrix where x denotes the


derivative of x1(t).
152 CANONICAL FORMS OF MATRICES

Now the solution procedure in 5.24 is to replace A with the simplest matrix
similar to A that we can find. Suppose J = PAP' for some invertible matrix
Pe Set y = Px. Then y' = Px', and y(O) = PC. Also,
Jy = PAP '(Px) = PAx = Px' = y'. Thus, to find a solution to 5.24 we need
only solve the following equation:

y'=Jy with y(O)=PC

If y is a solution to 5.25, then x = P 'y is a solution to 5.24. For, x' =


P1y'=P'Jy=P'(PAP1)Px=AL Also, x(O)=P'y(O)= P'PC=
If we let J be the real Jordan canonical form of A, then the equations we get in
5.25 are easy to solve. In the first place, we have seen in Theorem 5.19 that J is a
series of diagonal blocks.

[B, 0
5.26: J= I

[0 BK

Each block in 5.26 has one of two possible forms:

[D 0
0
5.27: . or
[-0 12 D
0 1 c

In equation 5.27, D is a 2 x 2 matrix of the form

(ab
a

The notation is meant to include the two trivial cases = (c) or = D.


Suppose the size of each is x Then clearly, y' = Jy decomposes
into K sets of equations = j = 1,. , K. Here y1 = (y1, . ,
. . . .

Y2 = (ye, + a,..., ÷J, etc. Thus, to find a solution to equation 5.25, it suffices
to know how to solve equations of the following two types:

c 0 0 x1
I

5.28: = with i=1,...,n


x, Ô
THE REAL JORDAN CANONICAL FORM 153

and
x'1 D 0 0 x1
'2 0
5.29: . =0 ' withx1(0)=c1, i=1,...,2n,
6 12 D

and D=(
Before proceeding with a solution to these two types of equations, we need to
recall a few facts about exponentials of matrices. If A e Mn n(R), then is the
n x n matrix defined by the following eqpation:

5.30: eA
= k=O

The reader can easily argue that the partial sums = = of the series
in 5.30 converge to a well-defined n x n matrix we denote by eA. We shall need
the following facts:

Theorem 531: (a) If Q = PAP1, then =


(b) If AB = BA, then eA+S =
(c)
(d) If

A=( a b
a

then

a( cosb sinb
eA e\\.b cosb
(e) If c e 92R(A), then ec e
(f) d(et"9/dt = AetA.

Proof? All six of these assertions are easy computations, which we leave to the
exercises. El

Theorem 5.3 1(f) provides us with a unique solution to equation 5.24, namely
x= For, x' = d(etAC)/dt = AetAC = Ax, and x(0) = = C. The fact
that etAC is the only solution to 5.24 is a simple computation. We want to see
what form this solution takes in our two special cases 5.28 and 5.29.
154 CANONICAL FORMS OF MATRICES

Let us consider equation 5.28 first. Set

0 0 0
I
and N=
1 c 0 1 0

So, B, Ne Mn We had seen in Section 3 that N is nilpotent of index n.


Since commutes with N, Theorem 5.3 1(b) implies = = Thus,
the solution to 5.28 is x = etC = ectetNC. Using the definition of we haye

0T
1 0 0 o
00
00
(tN)k
5.32: etN
= k! = r2 10
(n—2)! (n—3)! (n — 4)!
tn—1 tn—2
t 1
(n—I)! (n—2)! (n — 3)!

If we now substitute 5.32 into x = edtetNC, we get


j—i tk
5.33: for j=1,...,n
Now let us consider equation 5.29. If the matrix in equation 5.29 is just D,
then Theorem 5.3 1(d) implies the solution is

= = cos bt sin bt"jci


5.34: x eta ( cosbt)\c2
So, we can assume n> 1 and proceed with the general case.
Set p = a — bi. Forj = l,...,n, let = + and = +
Then equation 5.29 becomes the following system:

5.25:
J2

0
00'
0

00
0 z1 z'1

with for i=l,...,n


0 1PZn in

The equations in 5.35 are solved in the same manner as equation 5.28. Thus,
tk
5.36: (fl— p1 V
j—k' 3 — i,..., n
— L. ,
k=O IL
THE REAL JORDAN CANONICAL FORM 155

Now recall e1tt = eat _bti = eat(cos bt — i sin bt). Substituting this expression into
equation 5.36 and letting = C2j -' + gives us our final solution:

5.37:

j—i &
= eat [c2(J_k)_1 cos bt + c2(J_k) sin bt]

j—t tk
= eat [C2(J_k) cos bt — c2(J_k)_I sin bt], j = 1,..., n
k=O

Note in either case 5.28 or 5.29 the solution is linear combinations


(coefficients in R[t]) of exponentials, sines, and cosines. Thus, we have the
following theorem:

Theorem 5.38: Let Ae 11(R), and let x(t) be a solution to the differential
equation Ax = x'. Then each coordinate x of
of the form tkeat cos bt, tleat sin bt, where a + bi runs through the
eigenvalues of A. El

Example 5.39: Consider the following system of differential equations:

5.40:

= 2x1 + x2 + x3
= x2 + x3
= —x2 + x3
= x1 + x2 + 2x4 with x(O) = (c1; c2, c3, c4)t

The matrix of this system is

2 1 1 0
0
A—°
_0 —1
1 1

1 0
1 1 0 2

The first order of business is to find the real Jordan canonical form of A and the
matrix P for which PAP' = J.
The characteristic polynomial of A is given by

5.41: cA(X) = (X — 2)2(X2 — 2X + 2) = (X — 2)2(X — z1)(X — U


156 CANONICAL FORMS OF MATRICES

In equation 5.41, z1 = 1 + i and = i — i. We conclude that

2000
1200
5.42
0 0 0
0 0 0

is the Jordan canonical form of A in M4 4(C) and

2 0 0 0
2 0 0
543
0
0 1

is the real Jordan canonical form of A.

5.44:

0—2 0 0 1—i 1 1 0
(A—2) 2 0 0—2 0
and A—z1= 0 —i 1 0
= 0 2 0 0 0 —1 —i 0
0 0 2 0 1 1 01—i
Equation 5.44 readily implies {(l, 0, 0, Ø)t, (0, 0, 0, 1)t} is a basis of ker(A — 2)2,
and {(1, i, — 1, —i)} is a basis of ker(A — z1)) Since (1, i, — 1, —i) =
(1, 0, — 1, 0) + i(0, 1, 0, — 1), we conclude that

1 0 1 0
A=
0 1 0 —1

is a basis for M4 4(R) giving the real canonical form J of A. It now follows from
equation 3.36 that

1 0 1 0 1 0 1 0

5.45:P= 0 1 0 1
and P
-' 0 0 0 1

0 0 —1 0 = 0 0 —1 0
0 1 0 0 0 1 0—1
Our solutions to equations 5.28 and 5.29 imply that the system Jy = y' with
EXERCISES FOR SECTION 5 157

y(O) = PC has solutions given by

5.46: y1 = e2t(Ci + c3)

y2 =e2t(t(ci + c3) + (c2 + c4))


y3 = et[c2 sin t — c3 cos t]
= &[c3 sin t + C2 C05 t]

Thus, x(t) = P 'y is given by

x1(t) = e2t(ci + c3) + et[c2 sin t — c3 cos t]


x2(t) = et[c3 sin t + c2 cos t]
x3(t) = —et[c2 sin t — c3 cos t]
x4(t) = e2t[t(c1 + C3) + (c2 + c4)] — et(c3 sin t + c2 cos t] El

EXERCISES FOR SECTION 5

(1) Find an invertible matrix P such that

[2 0 01
PAr'=lO 0 ii
[o —1 oj

for the matrix A given in Example 5.22.


(2) Give a detailed proof of the four assertions in Lemma 5.10.
(3) Let A, Be Show that A and B are similar if and only if there is an
ordering of the eigenvalues of A and B so that the resulting real Jordan
canonical forms of A and B are the same.
(4) Find the real Jordan canonical form J of

0 0 0 —8
1 0 0 16
A—
— 0 1 0 —14
0 0 1 6

Also compute a matrix P such that = J.


158 CANONICAL FORMS OF MATRICES

(5) Find the real Jordan canonical form J of

1 —2 —1 1

2 —3 0
A—
— 0 3 2 0
—1 —1 2 1

and compute P such that = J.


(6) Let Set Si = Ak/k!. Show
is a Cauchy sequence in
and hence converges to some n x n matrix.
(7) Prove Theorem 5.31.
(8) Show that etAC is the unique solution to equation 5.24.
(9) Solve the following system of differential equations:

= 14x1 — 3x2 — 9x3

= 15x1 — 3x2 — lOx3


= — 2x2 — 9x3

(10) Solve the following system of differential equations:

= —8x4

= x1 + 16x4

= x2 — 14x4

= x3 + 6x4

(11) Solve the following system of differential equations:


= x1 — 2x2 — x3 + x4

= 2x2 — 3x3

x'3 = 3x2 + 2x3


= —x1 — x2 + 2x3 + x4
(12) Solve the following system of differential equations:

= 3x2 — 2x3
= x1 — 2x2 + 2x4
= 2x1
= x1 — 4x2 + x3 + 2x4
THE RATIONAL CANONICAL FORM 159

(13) Let A e >


Prove that det(eA) =
(14) Let T: lV —. lV be a linear transformation represented by the matrix

76 —3 —2
4 —1 —2
10 —5 —3

Compute cT(X), mT(X) and the subspaces in Theorem 5.3.


(15) Given the setting in Theorem 5.3, suppose W is a T-invariant subspace of
V. Prove that W =

6. THE RATIONAL CANONICAL FORM

Inthis section, we continue our theme from the last section. T E HomF(V, V), and
we want to discuss what canonical form may be available for T. We have seen in
Section 5 that if F = R, then the Jordan canonical form for Tc can be used to
construct a real Jordan canonical form for T. For a general field F and its
algebraic closure F, no such special relations as those used in Section 5 exist.
Thus, the Jordan canonical form of in does not give us any
particular form for T in
In this section, F is an arbitrary field, and we stay in JF). We work with
the minimal polynomial of T and construct a canonical form for T based on mT.
We shall assume as always that V is a finite-dimensional vector space over F
with dimFV = n. Let T e V). We shall factor the minimal polynomial
mT(X) of T as in equation 5.2. Then the primary decomposition theorem implies

Each V1 = ker(q1(T)m') is a nonzero T-invariant subspace of V. If T1 denotes


the restriction of T to V1, then Theorem 2.43 implies mT.(X) = q1(X)m for
i=1,...,r.
Thus, in constructing a canonical form for T, it suffices to consider T1 on V1.
Hence, we can assume the minimal polynomial of T is just a power of a single
irreducible polynomial q(X) e F[X]. We shall need the following definition:

Definition 6.2: A T-invariant subspace Z of V is said to be T-cyclic if there exists


a vector xeZ such that Z =

We have seen T-cyclic subspaces before in the context of nilpotent trans-


formations. If T is nilpotent and Z is a T-cyclic subspace of V in the sense of
Definition 3.28, then clearly Z is T-cyclic. In our present context, we do not
assume T is nilpotent.
160 CANONICAL FORMS OF MATRICES

Lemma 63: Let Z be a T-cyclic subspace of V. Suppose dim Z = m > 0. Then

(a) Z has a basis of the form A = {oc, . ., Tm for some 0 in Z.


(b) If t denotes the restriction of T to Z, then

000 0 c0
100 0 c1
1 0 0 c2
64: r(A,A)(t)=
o o a a
o o 0 1 cm_i

where Tm(x) = c0cx + c1T(cx) + + cm_iTm_i@).


(c) The minimal polynomial of t on Z is given by

mt(X)=Xm_cm_iXm_i — —Co

Proof (a) Since Z is cyclic, there exists a nonzero cx E Z such that


Z= fe F[X]}. In particular, Z = L({cx, T(cx), .. }). If
= 0, then clearly T'(cx) = 0 for all 1 k, and Z =
L({cx,. . . , Tk - 1(cx)}).
Since dim Z = m, we conclude that none of
the first m vectors cx, T(cx),..., 10x) is zero. We claim that
A = {cx, T(cx),..., i(cx)} is linearly independent over F. Suppose
not. Then there is a linear dependence relation among the vectors of
A of the following type:

63: c0cx + + ... + ckT%xJ = 0

In equation 6.5, 1 C k C m — 1 and Ck 0. Dividing by Ck, we can


rewrite equation 6.5 as follows:

6.6: T%x) = b0oc + b1T(cx) + ... +

Thus, T%x)eL({oc, (Toe),..., But then T""(cx) =


T(T%x)) = b0T(oe) + + + bk_,T"(oc)EL({oc, T(cx),...,
Tk - i(cx)}). Continuing with this argument, we get T'(oc) e
T(cx),...,Tk_i(cx)}) for every Therefore, ZcL({cx,
T(oc),. .., Tk - '(oe)}). This is impossible since k <m = dim Z. Thus, A
is linearly independent over F and the proof of (a) is complete.
(b) Tm(cx)e L(A) implies Tm(cx) = c0cx + c1T(oe) + + Cm - ,Ttm '(cx). The
constants c0 . . . are unique, and equation 6.4 is now obvious.
, Cm_i
(c) Let g = the minimal polynomial oft on Z. For any fe F[X],
f(T)(Z) = 0 if and only if f(T)(cx) = 0. Thus, g is the moniC polynomial
of smallest positive degree for which g(T)(oc) = 0. Since A is a basis of
THE RATIONAL CANONICAL FORM 161

Z, 5(g) m. But (Tm — Cm_iTtm 1


—. — c0)(oc) = 0 from(b). There-
fore, g is monic, we con-

Note that Lemma 6.3 implies the dimension of a T-cyclic subspace Z of V is


exactly the degree of the minimal polynomial of T restricted to Z. We shall use
this fact many times in what follows. We need to give a formal name to the
matrix appearing in equation 6.4.

Definition 6.7: Let g(X) =


Xm — - — — c0 be a monic polynomial of
degree m in F[X]. The m x m matrix

000 0 c0
100 0 c1
010 9

o Ô 6 6
0 0 0 1 cm_i

is called the companion matrix of g(X). We shall henceforth denote the


companion matrix of g by C(g(X)).

Thus, the matrix of 1' appearing in equation 6.4 is just the companion matrix
of the minimal polynomial of t. We can restate Lemma 6.3 as follows:

Corollary 6.8: Let Z be a T-cyclic subspace of V, and suppose dim Z = m > 0.


Then Z has a basis A = {x, T(oc),.. ., Tm such that f(A, A)(t) = C(mt).
S
In our next theorem, we shall argue that each V1 = ker(q1(T)tm) in equation 6.1
is a direct sum of T-cyclic subspaces. If V1 = Z11
position, then Corollary 6.8 implies each
$has$a basis
is such a decom-
such that
= C(mTJ. Here T to Now = 0.
Hence mT.. I q1(X)m. Since q1(X) is irreducible, mT..(X) = q1(X)ei for some m1.
Thus, there is a matrix representation of T on V1 consisting of blocks of
companion matrices of various powers of q1(X). The main theorem we need to
prove is the following:

Theorem 6.9: Let T E HomF(V, V), and suppose mT(X) = q(X)e, where q is a
(monic) irreducible polynomial over F. Then V = Z1 ® e where each Z1
is a T-cyclic subspace of V.

Proof We proceed by induction on n = dimF(V). If n = 1, then V = for


some 0. T(oc) = c e F, and, thus, V itself is T-cyclic. We therefore
may assume dimV = n> 1.
162 CANONICAL FORMS OF MATRICES

Since mT(X) = q(X)C, qfl)e1 0. Hence, there exists a nonzero vector E V


such that q(T)C - '@1) # 0. Let Z1 = L({x1, . }). Thus,
Z1 is the T-
cyclic subspace of V generated by Let d = ä(q), and let T1 denote the
restriction of T to Z1. Our previous discussion shows rn-f1 = q(X)' for some 1 e
But q(T)C - 1(a) 0. Therefore, mT1(X) = q(X)e.
Lemma 6.3 implies dimF(Zl) = U(mT1) = de. If de = n, then Z1 = V and our
proof is complete.
Let us assume de < n. Since Z1 is T-invariant, T induces a linear trans-
formation 1: VIZ1 -+ V/Z1 given by

1(/1 + Z1) = T(fl) + Z1

The fact that 1 is a well-defined linear transformation is exercise 11 in Section 2.


If f(X) e F[X], then clearly we have

f(1)(fl + Z1) = f(T)(fl) + Zi

We get two important facts from equation 6.11. First, mt(X)I mT(X). Second, if
W is a T-cyclic subspace of V generated by a vector and q(X)' is the minimal
polynomial of T on W, then q(X)' must be a multiple of the minimal polynomial
of I on the 1-cyclic subspace of V/Z1 generated by fi + Z1.
Since m1{X) I mT(X), mi<X) = q(X)e1 with e1 C e. Also, dim Z1 1 implies
dim{V/Z1} <n. Hence, we may apply our induction hypothesis to 1 on V/Z1
and conclude that V/Z1 = 22 ®... Each is a 1-cyclic subspace of V/Z1.
The minimal polynomial of I on has the form and we may assume

We shall complete the proof of the theorem by constructing T-cyclic


subspaces Z2,. . in V such that

for i = 2,..., p.
6.12: (a) Each Z1 is isomorphic to
(b) The minimal polynomial of T on Z1 is the same as the minimal
polynomial q(X)e of I on

Each subspace of VIZ1 has the form = + Z1 Here W1 is


some subspace of V containing Z1. Let us suppose is generated as a 1-cyclic
subspace of VIZ1 by the coset + for i = 2,..., p. Fix i = 2,...,p. Then
Lemma 6.3 implies = L({; + Z1, TfrJ + Z1,.. ., + Z1}). Since
+ Z1) = 0, e Z1. Since Z1 is cyclic, we have = f(T)(oc1)
for some feF[X].
Now we claim there exists a vector /13 e Z1 such that q(T)%x3 + $3 = 0. To see
this, first note 0 = q(T)c(oc1) = [q(T)c — = q(T)C — We have
seen from the first part of this proof that the minimal polynomial of T on Z1 is
q(X)e. Thus, 0 = implies q(X)e I q(X)e_ef(X). Hence, there
THE RATIONAL CANONICAL FORM 163

exists a polynomial h(X)E F[X] such that q(X)eh(X) = q(X)c - cif(x). Clearly,
f(X) = h(X)q(X)ei. Set e Z1. Then q(T)e(; + = q(T)ei(x1) +
= — h(T)(oc1)
q(T)e(/11) = f(T)(cz1) — q(T)eh(T)(x1) = f(T)(a1)
— f(T)oc1 = 0.
Now let Z1 be the T-cyclic subspace of V generated by + Since
+ + Z1 = + Z1 generates the 1-cyclic subspace Z3 of V/Z1, and the
minimal polynomial oft on is q(X)e, our remarks after equation 6.11 imply
the minimal polynomial of T on is a multiple of q(X)e. But q(T)%x3 + = 0.
Thus, q(X)e is the minimal polynomial of T on Z1. It now follows from Lemma
6.3 that Z1 and Z3 have the same dimension e1d. Since the natural map
y —+ y+ Z1 induces a surjective map from Z1 to Z1, we conclude from Theorem
3.33 of Chapter I that the natural map y —' y + Z1 is an isomorphism of Z1 onto
We have now proven (a) and (b) of 6.12.
Let us denote the natural map from V to V/Z1 by ic. Thus, m(y) = y + Z1.
We have seen in the previous paragraph that ic restricted to each Z1, i = 2,
., p, is an isomorphism of onto Z1. This fact easily implies
Z1 + + = Z1 E13" $ To see this, we need only show that if
Yi + + = 0 with then Yi = = = 0 (Theorem 4.16 of Chapter
I). If then in

V= Z1 + + Z,. Let yeV. Then ir(y)eV/Z1 = Z2 + + Z,.


Hence there exist vectors 2,..., p, such that ic(y) = ir(y2) + +
e Z3, i = ...

This last equation implies y — (Y2 + + = kent. Thus, yeZ1 +


+ This completes the proof of (c) in 6.12.
Of course, the completion of the proof of 6.12(c) also completes the proof of
the theorem since each Z1 is T-cyclic. El

We should point out here that Theorem 6.9 gives us a different proof of
Theorem 3.21. If T is nilpotent, then mT(X) = Xk. The companion matrix of Xk is
just the matrix Nk defined in 3.20.
We can now prove our main result in this section.

Theorem 6.13: Let T E t(V) have minimal polynomial given by


mT(X) = q1(X)m' . . as in equation 5.2. Then V is a finite direct sum of T-
cyclic subspaces, V = Z11 $® $ Zn $ The sub-
spaces satisfy the following properties:

(a) The minimum polynomial of T restricted to is q1(X)eui, where

(b) = 5(q1).
(c) There exists a basis A= then

[R1 0
[(A, A)(T) = I

[0 Rn
164 CANONICAL FORMS OF MATRICES

where

[C(qpi) 0
R1 =
[0
foralli=1,...,r.
Proof We have virtually proved everything here already. The primary decom-
position theorem implies V = V1 $$ with = Theorem
2.43 implies the minimal polynomial of T restricted to V1 is given by q1(X)m.
Hence by Theorem 6.9, each V1 = Z11 $ where is a T-cyclic
subspace of V. We had seen in the proof of 6.9 that the can be chosen such
that the restriction of T to Z11 has minimal polynomial q1(X)m', and the
restriction of T to the remaining has minimal polynomial q1(X)eui where
m1 = e1, e12 ;3 Thus, we have established (a). (b) follows from
Lemma 6.3. Lemma 6.3 also implies there exists a basis of such that
= Here denotes the restriction of T to (c) now
follows from equation 2.42. E
The matrix constructed in 6.13(c) is called a rational canonical form of T. As
we shall soon see, it is unique up to a permutation of the R1. Let us consider
some examples.

Example 6.14: Suppose T: €12 Q2 is given by T(ö1) = and T(b2) =


Then

F(Ô, ÔXT)
= C 1)
cT(X) = X2 + 1, which is irreducible in C[X]. Therefore, mT(X) = X2 + 1. The
rational canonical form of T is just the companion matrix

c(x2+1)=(1

As usual, if A e then the rational canonical form of A is the rational


canonical form of T: —p where ['(6, ÔXT) = A.

Example 6.15: Let

[—1 7 01
A1 0 2 OIeM3x3(Q)
[o 3 —ij
THE RATIONAL CANONICAL FORM 165

We had seen in Example 2.34 that CA(X) = (X + 1)2(X — 2) and m4X) =


(X + 1)(X — 2). The Companion matrix of any linear polynomial X — a is the
1 x 1 matrix (a). We conclude that the rational canonical form of A must be the
diagonal matrix D = diag(— 1, —1, 2). fl

Example 6.16: Let

A=[1
-i -!
A simple calculation shows cA(X) = (X2 + 2)(X2 + 1). Thus, the rational canon-
ical form R of A is given by

Let us turn our attention to the uniqueness of the polynomials


which appear in Theorem 6.13.

Theorem 6.17: Let T e 1(V) have minimal polynomial given by


mT(X) = q1(X)m1 . . asin equation 5.2.
Let V = Z11 ®$ be the T-cyclic decomposition given in Theorem
6.13. Suppose we have a second T-cyclic decomposition
V = W11 $ ® W1U(l) $ Wri $ where the minimal poly-
nomial of T restricted to each is given by and ? Then
u(i) = p(i) for every i = 1,..., r, and = for every j = 1,..., p(i).

Before proving Theorem 6.17, we note that Theorem 2.43 implies that any
decomposition of V = U1 e ® UN into T-cyclic subspaces such that the
restriction of T to U1 has minimal polynomial a multiple of some must involve
all factors q1,..., of mT. Thus, for every i = 1,..., r there must exist some
j = 1,..., N such that the minimal polynomial of T on is a power of q1.
Hence, Theorem 6.17 is the natural sort of uniqueness statement one would
expect.

Proof of 6.17: If V = W11 $ Wru(r), then Theorem 2.43 implies that


mT(X) = . .
It follows that f11 = m1,. . ,
. = mr. Also, for each
= 1,..., r, W11 $ ® V3 = ker q3(T)m1. The primary decomposition
theorem now implies W31 e $ = V1 = Z11 $ ® Zjp(j) for all
= 1,.. . , r. Thus, without loss of generality, we may assume r = 1, and argue
166 CANONICAL FORMS OF MATRICES

u(1)= p(l), and = for all j = 1,..., p(l). We simplify notation by


dropping the 1 and write V = $ with e1 ? ? and

Suppose the integers {e1,. . . , ej and {f1,...,


are not the same. Then there
is a first integer m, where em From our comments above, we know e1 = f1.
Therefore, 1 <m C min{u, p}. We can suppose em > with no loss in gener-
ality. Then e1 = f1,. . ,
.
= and em> "mS mT(X) = Since ? fi
for i nj, we know = 0 whenever i m. Thus, we have

6.18: = q(Tf"(W1) $... e


Now each W3 is a T-cyclic subspace of dimension A simple computation
shows dim(q(T)fm(WJ) = U(q)(f1 — for i = 1,..., m — 1. Thus, equation 6.18
implies

6.19: dim{q(T)fm(V)}
= E —

On the other hand, q(T)fm(V) ®T=1 q(T)fm(ZJ, and dim{q(T)fm(ZJ} =


— for i C m. Therefore, dim{q(T)"(V)} ? >Jti U(q)(e3 — fm). If we
now substitute this inequality into equation 6.19 and use the fact that e1 =
f1,. = cmi' we get em C This is impossible and the proof of
Theorem 6.17 is complete. J

The polynomials qi(X)et ,


. . . . . . , . . .
,
are called
the elementary divisors of T. Since each is the minimal polynomial of T on
Lemmas 2.8 and 6.3 imply Thus, we have the following
corollary to Theorem 6.13.

Corollary 6.20: Let TE have elementary divisors q?",...,


E F[X]. Then

p(i)
® ®
i=1 j=1

Note that Corollary 6.20 implies 1


= cT(X). This is why the
q1i are called elementary divisors. The matrix version of Theorem 6.17 is easily
stated. We leave the proof as an exercise at the end of this section.

Theorem 6.21: Two matrices A, BE are similar if and only if A and B


have the same elementary divisors. fl
We shall finish this section with another similarity result. Before stating it, we
need the following definition:
THE RATIONAL CANONICAL FORM 167

6.22: Let A e
Definition For each i = 1,..., n, let g1 be a greatest
common divisor of all i-rowed minors of — A. The polynomials g1,
.., are called the invariant factors of — A (or the invariant
factors of A).

We note that the usual theory of determinants implies g1 in F[X]. Thus,


each invariant factor is a well-defined polynomial in F[X]. The invariant factors
of A are important because they determine the similarity class of A.

Theorem 6.23: Let A, BE Then A and B are similar if and only if the
invariant factors. of XI — A and XI — B are the same.

Proof? The statement that the invariant factors are the same is of course up to
units, i.e. nonzero constants in F. If A and B are similar, then so are XI — A and
XI — B. It then easily follows that the invariant factors of XI — A and XI — B
are the same. Let us suppose A and B have the same set of invariant factors.
In we can perform the same elementary row (and column)
operations the reader is familiar with in We can interchange two rows
(or columns) of a matrix C e We can multiply one row (or column)
of C by a polynomial h(X) and add the result to another row (or column). Both
of these operations are performed by multiplying C on the left (or right) by
suitable invertible matrices in We can also multiply a row (or
column) of C by a nonzero constant from F.
Now by applying suitable row and column operations to XI — A, it is not
difficult to see that XI — A is equivalent to a diagonal matrix of the form
- g1). Hence, there exist invertible matrices
R, SE such that R(XI — A)S = .., We ask the
reader to provide a proof of this fact in the exercises at the end of this section.
Being equivalent is clearly an equivalence relation on Thus, if
XI — A and XI — B have the same invariant factors, then XI — A and XI — B
are equivalent. Hence, there exist invertible matrices P, Q e such
that P(XI — A)Q = XI — B.
We claim there exist invertible matrices p, q E such that
p(XI — A)q = XI — B. To see this, we first note that P and Q can be viewed as
polynomials in X with coefficients in the algebra We can then divide
XI — B into P and Q (in a process entirely analogous to 1.2) and write

qe Then a tedious computation shows XI— B = P(XI — A)Q =


(XI—B)R(XI—B)+p(XI—A)q. Here R=p1P 1 +Q 'q1 —p1(XI—A)q1.
If we carefully analyze the powers of X in this relation, we see R = 0. Therefore,

p(XI — A)q = XI — B. Again comparing powers of X, we see pq = I.


So, if XI — A and XI — B have the same invariant factors, then
p(XI — A)q = XI — B for invertible matrices p, q E JF). But we have seen in
the previous paragraph that q = Again comparing powers of X, we see
= B. Thus, A and B are similar. fl
168 CANONICAL FORMS OF MATRICES

We can now state the following interesting corollary to Theorem 6.23.

Corollary 6.24: Let F K be fields, and suppose A, BE JF). Then A and B


are similar in if and only if A and B are similar in

Proof Thus, if A and B are similar in they are


similar in
Suppose A and B are similar in By Theorem 6.23, XI — A and
XI — B have the same invariant factors in K[X]. But the invariant factors of
XI — A and XI — B when computed in F[X] are the same as those in K[X]. (See
Exercise 20 at the end of this section.) Hence, XI — A and XI — B have the same
invariant factors in F[X]. In particular, A and B are similar in fl
A couple of comments are in order here. We have shown in Corollary 6.24
that if A, B and B
a P1 E such that P1API' = B. It
is not in general true that P1 = P.
We have now discussed two different sets of polynomials in F[X] which can
be associated with a matrix A e JF). The first set is the set of elementary
divisors ., of the matrix A. These polynomials obviously depend
on the field F. The second set is the set of invariant factors {g1,. . ., '} of A.
This set depends only on A and not on the particular F for which A e
Both sets determine the similarity class of A. There are formulas connecting the
elementary divisors with the invariant factors of A since XI — A is similar to
XI — R. Here R is a rational canonical form of A. We invite the reader to
determine the relationships between these two sets.

EXERCISES FOR SECTION 6

(1) Find an invertible matrix P EM3 3(0) such that PAP' is the rational
canonical form of A when A is the matrix given in Example 6.15.
(2) Find a P EM4 4(0) such that PAP' = R for the matrix A in Example
6.16.
(3) In the proof of Theorem 6.17, we claimed that dim{q(T)fm(WJ} =
— for i m. Give a proof of this statement.
(4) Prove Theorem 6.21.
(5) Find the rational canonical form R of
EXERCISES FOR SECTION 6 169

Also, find P such that PAP -' = R.


(6) Suppose Ae M6 6(Q) with minimal polynomial mA(X) = (X — 2)
x (X2 + 1)2. Find all possible rational canonical forms and elementary
divisors for A.
(7) Show that A is similar to At for any Ae
(8) Find the rational canonical form of

(sin U —cos-O
sinO

(9) Show that the rational canonical form of a diagonal matrix is itself.
(10) A matrix Ae is said to be indecomposable if A is not similar to any
matrix of the form diag(A1, A2) for some smaller square matrices A1 and
A2. A is nonderogatory if m4X) = cA(X). Show that A is indecomposable if
and only if A is nonderogatory and mA(X) = q(X)c with q(X) irreducible in
F[X].
(11) Let T e t(V), and let T* e t(V*) denote the dual of T. Show that
cT(X) = cr(X) and mT(X) = mT4X).
(12) Let T e 1(V), and suppose dim(V) = 2. Show that V is T-cyclic or T =
for some xeF.
(13) Let A be the companion matrix of g(X) = Xm — cm_ 1Xtm_1 — — C0.
Show directly that cA(X) = g(X).
(14) Let p1(X),..., pjX) be a set of monic, primary polynomials, all of degree at
least one in F[X]. Show there exists a matrix A whose nontrivial
elementary divisors are p p is a
power of an irreducible polynomial].
(15) Let
[1 3 31
A=I 3 1
31
[—3 —3

Find a P such that P 'AP is in rational canonical form.


(16) Let Ae JC). Suppose 9'c(A) ftt Prove that A is similar to a matrix
in
(17) Suppose A e 110R). If A2 = — I, show n = 2k for some integer k. Prove
that A is similar over R to

'k
170 CANONICAL FORMS OF MATRICES

(18) Let T e t(V). Show that V is T-cyclic if and only if every Se t(V) that
Commutes with T is a polynomial in T.
(19) An endomorphism T e 1(V) is said to be semisimple if every T-invariant
subspace of V has a T-invariant complement. If mT(X) is irreducible in
F[X], prove that T is semisimple.
(20) In the proof of Corollary 6.24, we use the fact that if g = g.c.d.(f1,. .., in
F[X], then g = g.c.d.(f1, ..., in K[X]. Give a proof of this fact.
(21) In theorem 6.23, argue that XI — A is equivalent to . . ,gj.
Chapter IV

Normed Linear Vector Spaces

1. BASiC DEFINITIONS AND EXAMPLES

In this chapter and the next, we shall take a brief look at some of the more
important functions which are often present on a vector space V. A great deal of
what we have to say in the present chapter can be done over any suitably
ordered field F. However, we shall simplify our discussion arid assume
throughout that F = R, the field of real numbers. Hence, V will denote a real
vector space in this chapter. We do not assume V is finite dimensional over
Let us begin with the definition of a norm on V.
Definition 1.1: A norm on V is a function f: V —÷ 01 such that
(a) f(cx)>OiftxeV—{O}.
(b) f(xtx) = IxIf@) for all txeV and xell.
(c) f(cx + fi) f(cx) + f(fJ) for all cx, /1eV.

In Definition 1.1, the notation lxi means the absolute value of the real number
x. In previous portions of this text, we have used the notation iAi to denote the
cardinality of the set A. The use of the symbol will always be clear from the
context and will cause no confusion in the sequel. Note that 1.1(b) implies
f(O) = 0.
The most familiar example of a norm on a real vector space V is V = R, and
f(cx) = cxi. Weshall give more interesting examples in a moment. 1ff is a norm on
V, then we shall adopt the standard notation of the subject matter and write
f(tx) = icxli for all cx eV. Thus, the symbol 1111 indicates a real valued function on
V that satisfies (a), (b), and (c) of 1.1.
171
172 NORMED LINEAR VECTOR SPACES

Definition 1.2: A normed linear vector space is a real vector space V together
with some fixed norm III: V —. It
Obviously a given vector space V can be viewed in different ways as a normed
linear vector space by specifying different norms on V. Thus, a normed linear
vector space is actually an ordered pair (V, 1) consisting of a real vector space 1

V and a real valued function liii: V —÷ 11 satisfying the axioms of 1.1. When it is
not important to specify the exact nature of we shall drop this part of the
notation and simply refer to V itself as a normed linear space. Let us consider
some nontrivial examples.
Example 1.3: Let n be a positive integer, and set V = R". V has at least three
important norms whose definitions are given by the following equations:

1.4: (a) II(x1,. . ., Ixd


=
1/2
(b) x?)
i= 1

(c) Ij(x1,..., = = 1,..., n}


The fact that liii satisfy the axioms in definition 1.1 is a
straightforward exercise. [To argue II satisfies 1.1(c), we need to use Schwarz's 1

inequality in Chapter V]. The norm liii given in 1.4(b) is just the usual Euclidean
norm on I?. The norm 1 given in 1.4(c) is called the unVorm norm on I". It is
1

the easiest of the three norms to use when making computations. Note that
when n = 1, all three of these norms reduce to the absolute value on R. E
Example 13: Let V = C([a, b]) denote the set of continuous, real valued
functions on a closed interval [a, bJ. V also has at least three important norms
that are used in analysis:
1.6: (a) 11f11 =
(b) If II =
(c) = b]}

Once again the reader can check that 1.6 defines norms on C([a, b]). The norm
is usually called the uniform norm on C([a, b]). fl
Example 1.7: Let V = It A typical vector in V is an infinite sequence
1

= (x1, x2,. . .) having at most a finite number of nonzero components Thus,


the norms II and IIgiven in equation 1.4 can be extended to norms on
II

V in the obvious way:

1.8: (a) x2,..j111


(b) x2, . . . )1I = 1
x?)1/2
(c) II(x1, x2, . . . = sup{Ix1IIi = 1, 2,.. . }
BASIC DEFINITIONS AND EXAMPLES 173

Since any two vectors in V sit in for some n sufficiently large, it is clear that
the equations in 1.8 define norms on V. C
Example 1.9: Let V = {(x1, x2,. . . x? < oo}. We had seen in
Exercise 6, Section 1 of Chapter I that V is a subspace of We can define a
norm on V as follows:
1/2
X2,.. . )II
=

V with the norm defined in equation 1.10 is a well-known Hilbert space, usually
denoted z2. fl

We can construct other normed linear vector spaces by considering injective


maps into a given normed space (V, liii). If W is a subspace of V, then clearly
restricted to W is a norm on W. More generally, if T: X —÷ V is an injective linear
transformation, then iiaii' = 11T(a)ii defines a norm on X.
Now suppose (V, ii ii) is an arbitrary normed linear vector space. We can use
the norm on V to measure the distance d(a, /3) between two vectors a and /3 in
V. Let us introduce the following topological notions:
Definition 1.11: Let A be a subset of V, and let a and /3 be vectors in V. Then

(a) d(a,fl)= ha—/hi.


(b) Br(a) = {vI ha — vii <r}.
(c) We say A is bounded if A Br(0) for some positive number r.
(d) The distance d(a, A) between a and the set A is the number
d(a, A) = inf{iia — vii yeA}.

(e) a is said to be an interior point of A if there exists an r > 0 such that


Br(tX) A.
(f) The set of interior points of A will be denoted A°.
(g) A is said to be open if A° = A.
(h) A is said to be closed if the complement AC of A (i.e., AC = V — A) is open.

All these definitions depend on the particular norm being used in V. The
function d: V x V —÷ l1 defined in 1.11(a) is called the distance function (relative
to the norm iii). The reader can readily verify that d satisfies the following
properties:

(b) d(a, /3) = d(/3, a) for all a, /3EV.

(c) d(a, v) d(a, /3) + d(/3, v) for all a, /3, yeV.

Any set V together with a function d: V x V -.


R that satisfies the conditions
in 1.12 is called a metric space. Thus, any normed linear vector space is a metric
space with distance function given by the norm.
____________

174 NORMED LINEAR VECTOR SPACES

The set Br(cc) introduced in 1.11(b) is called the ball of radius r around cc. Its
exact shape, of course, depends on the particular norm being used.

Example 1.13: Let V = and consider the three norms given in equation 1.4.
If cc = 0 and r = 1, then the ball B1(O) is the following set:

1)
(-1, 1):::; 1)

(—1 0) 0)

..:.zfz.
(—1, —1) —1)

norm 1k norm liii norm 111100 El

The reader can easily check that Br(cc) is an open set in V. In fact, it is clear
from the definitions that a set A in V is open if and only if for every cc e A there
exists an r > 0 such that Br(tX) A. The collection of subsets 0/1 = {A A is open
in V} forms a topology on V. This means that q5 and V e 0/1, and finite
intersections and arbitrary unions of sets from 0/j are again sets in 0/1.
We can also introduce the familiar concepts of limits, continuity, and so on by
using the distance function d. In what follows, it is assumed that V and W are
normed linear vector spaces. We shall use the same symbol II to denote the 1

norm in both spaces V and W. It will always be clear from the context when the
symbol liii is being used to represent the norm on V and when it is being used to
represent the norm on W.

Definition 1.14: Let V and W be normed linear vector spaces and suppose A is a
subset of V. Let f: A —÷ W be a function and suppose cc e V.

(a) = fi if for every r > 0 there exists an s > 0 such that for every
<r.
(b) If cc e A, and = f(cc), then we say f is continuous at cc.
(c) f is continuous on A if f is continuous at every point of A.
(d) f is Lipschitz on A if there exists a positive constant c such that
— f(n)1I — for all

We shall mainly be considering functions whose domain is all of V. If


W is such a function, then 1.14(a) and 1.14(c) can be rewritten in terms of
f: V —'
open sets as follows:

1.15: (a) f(5) = fi if for every open set U in W containing fi, there exists an
open set U' In V containing cc such that f(U') c U.
BASIC DEFINITIONS AND EXAMPLES 175

(b) f is continuous on V(or just continuous) if for every open set U in W,


f'(U) is an open set in V.

Definitions 1.14(a)—1.14(c) describe ideas that are familiar from the calculus.
The notion of a function being Lipschitz is more peculiar to analysis coming
from normed linear spaces. Note that any f: A -÷ W that is Lipschitz on A is
certainly continuous on A. Our most important example of a Lipschitz function
is the norm itself.

Lemma 1.16: Let (V, III) be any normed linear vector space. Then the norm Ill
is a Lipschitz function from (V, liii) to (R, I). I

Proof If cx, fJ€V, then Therefore,


hail — II fill ha — fill. the roles of
Reversing cx and fi gives us
hifill — hail 11/3—all = ha—fill. Thus, hall — 11/3111 la—fill. This is pre-
cisely the statement that 1111 is Lipschitz on V. El

Since Lipschitz functions are continuous, we conclude from Lemma 1.16 that
the norm II his a continuous, real valued function on V. We shall use this fact
often in the sequel.
Now suppose T: V —÷ W is a linear transformation. It follows easily from the
definition that T is Lipschitz on V if and only if there exists a c > 0 such that
hi dl for all e V. Linear transformations that are Lipschitz on V are
1

very important in analysis and algebra alike. These types of transformations are
called bounded linear operators. Let us formally introduce the notation we shall
use for the set of bounded linear operators.
Definition 1.17: A bounded linear operator T: V -÷ W is a linear transformation
that is Lipschitz on V. The set of all bounded linear operators from V to W will
be denoted êB(V, W).
Thus, T e GB(V, W) if and only if T e W), and there exists a positive
constant c such that II for all e V. Let us consider a few examples
before continuing.
Example 1.18: Let V = I? with n 2, and consider the norm Ill1 given in
Example 1.3. Let and denote the canonical injections and projections of

Let l1 have the standard norm Ix II = lxi. Then e Homk(R, V). Since
= hh(O,...,x,...,0)ll1 = lxi = hlxlh,eachO1isaboundedlinearoperator.
1161(x)111
Therefore, e lfl.
Each projection ; is a linear transformation from I?' to It If (x1, . . , xj e .

then = 1x11 = ll(x1,. .., Thus, i;

The reader can easily verify that and are also bounded linear operators
with respect to the Euclidean norm 1111 and the uniform norm 1111 on
176 NORMED LINEAR VECTOR SPACES

Example 1.19: Let V = C([a, bJ) with norm Fl given in equation 1.6. For
every ft V, set T(f) = f(t) dt. Then T e HomR(V, Ilk). As usual, let the norm on R
be the absolute value I. Set m = 1Ff II = sup{If(t)I t e [a, bJ}. Then IT(f)I =
I

C C m(b — a) = (b — Thus, b]), Ilk). El

Let us give an example of a linear transformation that is not a bounded linear


operator.

Example 1.20: Let V = C([0, 1]) = W. Then T = the identity map on V, is a

as in 1.6). Then we claim T: (C([0, 1]), 11111) —* (C([0, 111), liii j


linear transformation from V to W. Norm V with 11111 and W with II IL (notation
is not a bounded
linear operator. To see this, suppose T were bounded. Then there would exist a
c > 0 such that JIT(f)jI C clif 1k for all ft C([0, 1J). 1ff = V', then this inequality
would become 1 c/(n + 1). That relation is clearly impossible for all n. Thus, T
is not bounded. El

Example 1.20 shows that W) is in general a proper subset of


W). Another example, a bit more algebraic, is the following:

Example 1.21: Set V = W = Ilk, and again let T be the identity map from
V to W. Norm V with liii and W with liii! (notation as in 1.8). If T were a
bounded linear operator, then there would exist a C > 0 such that Ihx C
for all txeV. For = (1,..., 1, 0. . .), this inequality implies n C c. This is clearly
impossible for all n. El

The reader will note that the last two examples were both infinite dimen-
sional. If both V and W are finite dimensional over Ilk, then every linear
transformation from V to W is a bounded linear operator. Thus,
W) = Hom(V, W) whenever dim(V), dim(W) c cc. This fact is not obvious.
We shall prove it in Section 3 of this chapter.
Now suppose V and W are arbitrary normed linear vector spaces, and let
T e Hom(V, W). The relation between being bounded and being continuous is
easily stated.

Theorem 1.22: Let V and W be normed linear vector spaces and suppose
T E Hom(V, W). Then the following are equivalent:

(a) T is continuous at one point of V.


(b) T is continuous on all of V.
(c) T is a bounded linear operator.

Proof Lipschitz maps are continuous. Thus, the implications (c) (b) (a) are
all obvious. Hence, it suffices to show (a) (c). Suppose T is continuous at cc e V.
Then there exists an s>0 such that — ccli — T(cc)lI < 1.
BASIC DEFINITIONS AND EXAMPLES 177

Now any vector /1eV can be written in the form /3 = — a (set = /1 + a).
Thus, II /311 <s IIT(P)ll <1. If /3 is any nonzero vector in V, then II 0, and
llsIl/211 fill II = s/2 c s. Therefore, IIT(sfi/211 P11)11 c 1. This last inequality is
equivalent to IIT(fi)ll <(2/s)ll fill. We conclude that T is bounded. El

Let T e W). Then there exists a positive constant c such that


IIT(a)ll ( cilall for all a e V. Such a number c is called a bound for the linear
operator T. In Example 1.19, for instance, we showed that b — a was a bound for
) dt.
If c is a bound for T, and if such that II'5l1 = 1, then c. In
particular, the set of bounds for T is a subset of R that is bounded below by the
nonnegative number We conclude that the number inf{c c is a bound
for T} exists. This number is commonly called the norm of T. We shall
henceforth denote it 11Th. Thus, we have the following definition:

Definition 1.23: Let Te W). Then lIT II = inf{clc is a bound for T}.

There are a couple of alternative definitions for the norm of T that are worth
recording here.

Lemma 1.24: Let T e W). Then

(a) = sup{IIT(a)Il/hIalI JaeV — {0}}.


11Th

(b) IlTil = sup{IIT(a)Il IlalI = 1}. El

The proofs of (a) and (b) in 1.24 are straightforward. We leave them for
exercises at the end of this section. Let us consider an example.

Example 1.25: We return to Example 1.19. We have already noted that b — a is


a bound for T = )dt. Thus, IIT1I C b — a. Consider the constant function
leC([a,b]). and T(1)=fl'ldt=b—a. Thus, IIT(1)lI =Ib—aI=
b — a. In particular, Lemma 1.24(b) implies IITII b — a. Therefore,
IITII=b—a. El

Now suppose S and T are bounded linear operators from V to W. If c1 and


c2 are bounds for S and T, respectively, then for all x, y e R, we have
Il(xS + yT)(a)Il = hIxS(a) + yT(a)hI IxI IIS(a)II + I)'I IIT(a)Il +
Thus, xS + yT is a bounded linear operator with bound c1lxl + c2IyI. In
particular, GJ(V, W) is a subspace of Hom(V, W). We can use the definition of
IIT1I given in 1.23 to put a norm on êB(V, W).

Theorem 1.26: a?8(V, W) is a normed linear vector space if lIT II is defined as in


1.23.
178 NORMED LINEAR VECTOR SPACES

Proof We have already noted that W) is a vector space under the usual
operations xS + yT in Hom(V, W). It remains to verify that axioms (a), (b), and
(c) of Definition 1.1 are satisfied.

(a) Let W). Lemma 1.24(b) implies 11Th 0. Suppose 11Th = 0. Then
1.24(a) implies 1IT(a)ll = 0 for all a e V. But then T(a) =0 for all a e V and,
consequently, T = 0.
(b) Let T e GJ(V, W) and x e R. Using 1.24(b), we have

lixTil = sup{hhxT(a)Ii hail = 1} I


= sup{lxl IJT(a)ll hail = 1}
= Ixlsup{I1T(a)il Jail = 1} = lxi 11Th.

(c) Let 5, T E £?8(V, W). Again using 1.24(b), we have

hIS + Til = sup{hi(S + T)(a)il hail = 1}


= sup{llS(a) + T(a)li hail = 1}
sup{flS(a)fl + IIT(a)ll hail = 1}
sup{ilS(a)li I hail = 1} + sup{iIT(a)IJ hail = 1}
= lISlI + 11Th S
The norm introduced in Definition 1.23 is usually referred to as the uniform
norm on W). Note that if T e W) has uniform norm liT then
IIT(a) II c liT la for all a e V. We conclude this section with another useful fact
Ii ii

about the uniform norm.

Theorem 1.27: If T e W), and 5€ Z), then STE Z). Furthermore,


115Th 11511 11Th.

Proof For any aeV, we have iiST(aHI = ilS(T(a))il 1511 IIT(a)ii


iiSli 11Th hail. 5

EXERCISES FOR SECTION 1

(1) Show that equations 1.4(a) and 1.4(c) and 1.6(a) and 1.6(c) define norms on
R" and C([a, b]), respectively.
(2) Let W denote a real vector space. A function f: W -÷ R is called a seminorm
on W if f satisfies the following properties:
(a) f(a)?OforallaeW.
(b) f(xa) = lxlf(a) for all xeR and aeW.
(c) all a,fleW.
EXERCISES FOR SECTION 1 179

Show that if f is a seminorm on W and T e Hom(V, W), then fT is a


seminorm on V. Use this fact to construct a seminorm on W that is not a
norm.
In the rest of these exercises, V and W will denote normed linear vector
spaces.

(3) Lipschitz functions on V need not be linear transformations. Give at least


two examples.
(4) Show that BrOX) is an open set in V.
(5) Show that = {AIA is open in V} is closed under arbitrary unions and
finite intersections. Give an example in that shows that arbitrary
intersections of open sets need not be open.
(6) Give an example of a set A in V such that A is neither open nor closed.
(7) Let A V. A vector cx is said to be in the closure of A if Br(cz) n A 4) for all
r > 0. Let A = {x e V I is in the closure of A}. A is called the closure of the
set A.
(a) Show that A is the smallest closed set in V containing A.
(b) Give an example where A A.
(c) Show that e A if and only if d(cx, A) = 0.
(8) The boundary A° of a set A in V is the difference between the closure of A
and its interior. Thus, Aa = A — A°.
(a) Show czeAä if and only if for all r>O, and
Br(tX) n AC
(b) Compute the boundary of B1(O) in the examples in 1.13.
(9) Show that Br(cX) + = Br+sOX + /1).

(10) Give a detailed proof of the assertion that 1.15(a) and 1.15(b) are equivalent
to 1.14(a) and 1.14(c), respectively.
(11) Show that and it1 are bounded linear operators when is replaced by
either III or III (notation as in 1.4) in
(12) Let Show for all
e V, and consider the map W) -÷ W given by EéT) =
Show that is a bounded linear operator.
(14) Let e V. Show that there exists an fe R) such that II = 1 and
=
(15) Use Exercise 14 to show that = in Exercise 13.
(16) If define IAII 1,...,n}. Show
that this equation defines a norm on for which PABI! DAD IBIP
for all matrices A and B.
180 NORM ED LINEAR VECTOR SPACES

(17) Consider (W, UI! J. Let T e If A = then show


= hAil. Here hAil is the norm given in Exercise 16.
(18) Prove Lemma 1.24. (You will need Exercise 12).

(19) Prove that every ball Br(cx) in V is a convex set. (A set S in V is convex if /3,
+ (1 — x)c5eS for all xe[0, 1].)
(20) Consider (W, and let cx = (x1,..., xj e Define a map T: -÷
by T(y1,..., = Show that l1), and compute liii.
(21) Use Exercise 9 to show that the sum of any two bounded sets is bounded.
(22) Formulate the appropriate notion of a function f: V -+ W being Lipschitz at
a point cx e V. Consider the function f: R —* R given by f(x) = xft'2. Show
that f is Lipschitz at 1, but is not Lipschitz at 0.

2. PRODUCT NORMS AND EQUIVALENCE

Two normed linear vector spaces V and W are said to be norm isomorphic if
there exists an isomorphism T: V W such that T and T1 are bounded linear
operators, that is, T e V). For example, we have seen in
Exercise 17 of Section 1 that l?1) and are norm isomorphic when
W is given the uniform norm If two normed linear vector spaces are norm
hi

isomorphic, they are for all practical purposes identical. Thus, in the sequel, we
shall identify spaces that are norm isomorphic whenever it is convenient to do
so.
For two different norms on the same space V, we have the following
important definition:

Definition 2.1: Two norms ii ii and on the same real vector space V are said
II

to be equivalent if there exist positive constants a and b such that ihcxhh ahlcxhh',
and jlcxhh' c blhcxhh for all cxeV.

Thus, two norms and Ifi' on V are equivalent if the identity map is a norm
isomorphism from (V, liii) to (V, h'). We have already seen in Example 1.20 or
1.21 that two norms need not be equivalent. We should also point out the trivial
case of R itself. Definition 1.1(b) implies that any norm on R is equivalent to the
absolute value I I. We shall prove in Section 3 that if dim(V) c ccc, then any two
norms on V are equivalent.
The equivalence of norms is an important idea in analysis and topology alike.
If two norms are equivalent on V, then they generate the same topology. By this
we mean the open sets relative to the two norms are the same collection of sets.
More precisely, we have the following theorem:
PRODUCT NORMS AND EQUIVALENCE 181

Theorem 2.2: Let II and be equivalent norms on V. A subset A of V is open


with respect to lii if and only if A is open with respect to
Proof Suppose A is open with respect to the norm 1. Then for every
1

there exists an rOx) > 0 such that Br(u)(X) = —I


<r(tx)} A. Clearly,
A= Br(a)@). Thus, to argue that A is an open set with respect to j', it
suffices to show that any ball = 115
I
— xl <r} is an open set in (V, III').
Let us suppose for every xeV. Let Since Br(X) is an
open subset of(V, 1) (see exercise 4 of Section 1], there exists an s > 0 such that
Br@). Set D = — PIV < s/b}. D is just an open ball in the
norm around If then — — cb(s/b) = s. Hence,
D Br(tX). We have now shown that Br(x) is an open set in (V,
If we reverse the roles of and j', we get every ball in the II II'-norm is an
1

open set in (V, Ifl). This completes the proof of the theorem. fl
Theorem 2.2 implies that all topological notions remain the same when
switching between equivalent norms. This being the case, one should try to
choose an equivalent norm in which the given problem becomes easier
computationally. For example, in II? the Euclidean norm is equivalent to the
uniform norm (see Exercise 1 at the end of this section). It often happens in
specific problems that is easier to handle than the Euclidean norm when
doing arithmetic. Thus we often switch from to Ill when dealing with
problems in W1. Since these norms are equivalent, there is no loss in generality
from a topological point of view in making this switch.
An immediate corollary to Theorem 2.2 is the following remark, whose proof
we leave to the reader.
Corollary 2.3: Let V and W be normed linear vector spaces. Then the set
iM(V, W) in HomR(V, W) remains the same if either the norm in V or W is
replaced by an equivalent norm. Changing norms in V and W to equivalent
norms results in equivalent uniform norms on iJ(V, W). El

Now suppose we consider a product V x W of two normed linear vector


spaces (V, fi and (W, 1
We want to discuss what norms are available on
V x W. One norm which readily comes to mind is the so called sum norm
j(x, ThUS = + II The reader can easily verify that is a norm on
1

V x W. For example, the norm ii given in equation 1.4(a) is clearly the sum
norm on
The sum norm on V x W has the property that the canonical injections
and projections it1 associated with V x W are all bounded linear operators.
Thus,
2.4: V V

x
182 NORMED LINEAR VECTOR SPACES

is a diagram of bounded linear operators when V x W is given the sum norm


To see this, we have 17r2(x,fl)112 = fill2 C UxVi + = Thus,
and similarly it1, is bounded. We also have = lI@' = kil1
+ 110112 = Thus, 01, and similarly 02, is bounded. Note that lisa bound for
all four maps in 2.4.
The norms on V x W which have some intrinsic relation with the given
norms on V and W turn out to be the most useful norms to study on products.
This prompts the following definition.

Definition 2.5: Let V and W be normed linear vector spaces. A norm on the
product V x W is called a product norm if the canonical maps in Diagram 2.4
are all bounded linear operators.

Thus, a norm II 1 on V x W is a product norm if there exist four positive


constants a, b, c, and d such that the following inequalities are satisfied:

2.6: (1)
(2)
forallczeV fleW
(3) P)li
(4) 1n2(x, /3)112 dIl(x,

The sum norm JJ)ll, = lIcxlj1 + 11111k is a typical example of a product


norm on V x W. In fact, up to equivalence, the sum norm is the only product
norm on V x W.

Theorem 2.7: Let (V, II 1) and (W, 1112) be normed linear vector spaces. Then
any product norm on V x W is equivalent to the sum norm
lKcx, = lalli +
Proof? Let II be any product norm on V x W. Since
1
and it1 are bounded,
there exists constants a, b, c, and d satisfying the inequalities in 2.6. Let
e = max{a, b}. Then fl)li = lKcx, 0) + (0, + 11(0, lOll =
+ 1102(13)11 + bO /3l12 c eflhx01 + = ell(x, Thus, Il@'

c + d. Then 1 #02 = ThILi + Th02


cllx, + =
We conclude that and are equivalent. D

We can generalize Theorem 2.7 to n factors in the obvious way. We state the
result and leave the proof for the reader.

Theorem 2.8: Let (Vᵢ, ‖·‖ᵢ), i = 1,…, n, be a finite number of normed linear
vector spaces. Then any product norm on V₁ × ⋯ × Vₙ is equivalent to the sum
norm ‖·‖_s given in (a) below. The formulas in (b) and (c) also define product
norms on V₁ × ⋯ × Vₙ.

(a) ‖(α₁,…, αₙ)‖_s = Σⁿᵢ₌₁ ‖αᵢ‖ᵢ.
(b) ‖(α₁,…, αₙ)‖₂ = (Σⁿᵢ₌₁ ‖αᵢ‖ᵢ²)^{1/2}.
(c) ‖(α₁,…, αₙ)‖_∞ = max{‖αᵢ‖ᵢ | i = 1,…, n}.

In particular, these three norms are all equivalent. □

In Theorem 2.8, ‖·‖_s is the sum norm on V₁ × ⋯ × Vₙ; ‖·‖₂ is called the
Euclidean norm and ‖·‖_∞ the uniform norm.
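As a hedged numerical sketch (mine, not the text's): treating each factor norm ‖αᵢ‖ᵢ simply as a nonnegative number, the three product norms of Theorem 2.8 can be computed side by side and seen to bound one another with the standard constants.

    import math

    def sum_norm(parts):
        # (a): sum of the factor norms
        return sum(parts)

    def euclidean_norm(parts):
        # (b): square root of the sum of squared factor norms
        return math.sqrt(sum(p * p for p in parts))

    def uniform_norm(parts):
        # (c): largest factor norm
        return max(parts)

    # factor norms ||alpha_i||_i of some hypothetical vector (alpha_1, ..., alpha_4)
    parts = [3.0, 0.5, 2.0, 1.0]
    s, e, u = sum_norm(parts), euclidean_norm(parts), uniform_norm(parts)
    n = len(parts)
    # standard sandwich: ||.||_inf <= ||.||_2 <= ||.||_s <= n * ||.||_inf
    assert u <= e <= s <= n * u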
The definitions in this section allow us to make some comments about
addition and scalar multiplication in a normed linear vector space (V, ‖·‖). We
can think of vector addition in V as a linear transformation T: V × V → V given
by T(α, β) = α + β. We claim T is a continuous function with respect to any
product norm on V × V. By Theorem 2.7, it suffices to show T is a bounded
linear operator with respect to the sum norm on V × V. Since
‖T(α, β)‖ = ‖α + β‖ ≤ ‖α‖ + ‖β‖ = ‖(α, β)‖_s, we see T is bounded by 1. We
have now proved the first part of the following lemma:

Lemma 2.9: Let V be a normed linear vector space. The operation of addition is
a bounded linear operator from V × V to V. The operation of scalar multiplica-
tion is a continuous function from ℝ × V to V. □

The map f: ℝ × V → V given by f(x, α) = xα is not a linear transformation.
However, it is easy to show that f is a continuous map on ℝ × V with respect to
any product norm. We leave this point as an exercise at the end of this section.
We have discussed what sorts of norms are available on a product
V₁ × ⋯ × Vₙ of a finite number of normed linear vector spaces (Vᵢ, ‖·‖ᵢ),
i = 1,…, n. If V = V₁ × ⋯ × Vₙ is given any product norm, then the pro-
jections πᵢ: V → Vᵢ are all bounded linear operators. If we identify Vᵢ with θᵢ(Vᵢ)
(these spaces are norm isomorphic), then V is an internal direct sum of the Vᵢ,
and the projections of V onto the Vᵢ are all bounded. In the remainder of this
section, we shall extend these remarks to internal direct sums in general.
Suppose (V, ‖·‖) is a normed linear vector space, and let V₁,…, Vₙ be
subspaces of V. We assume that V = V₁ ⊕ ⋯ ⊕ Vₙ. Then every vector α ∈ V can
be written uniquely in the form α = α₁ + ⋯ + αₙ, where αᵢ ∈ Vᵢ for each
i = 1,…, n. Recall that the map sending α to αᵢ is a well-defined linear
transformation from V to Vᵢ. We call this map the ith projection and denote it
by Pᵢ. Thus, Pᵢ ∈ Hom_ℝ(V, Vᵢ).
Now we can restrict the norm ‖·‖ to each subspace Vᵢ. Then each Vᵢ is a
normed linear vector space in its own right, and we can consider the sum norm
on the product V₁ × ⋯ × Vₙ. This leads to the following definition:

Definition 2.10: Let (V, ‖·‖) be a normed linear vector space with subspaces
V₁,…, Vₙ. We say that V is a norm direct sum of the Vᵢ if

(a) V = V₁ ⊕ ⋯ ⊕ Vₙ, and
(b) the map S: V₁ × ⋯ × Vₙ → V given by S(α₁,…, αₙ) = α₁ + ⋯ + αₙ is a
norm isomorphism.

In this definition, each Vᵢ is normed with ‖·‖ restricted to Vᵢ. The norm on
V₁ × ⋯ × Vₙ is the sum norm ‖(α₁,…, αₙ)‖_s = Σⁿᵢ₌₁ ‖αᵢ‖. Since V is the direct
sum of the Vᵢ, the map S in (b) is an isomorphism. Thus, V = V₁ ⊕ ⋯ ⊕ Vₙ is a
norm direct sum if and only if S and S⁻¹ are bounded linear operators. A typical
example to keep in mind is an external direct product V₁ × ⋯ × Vₙ (with
sum norm) of normed linear vector spaces (Vᵢ, ‖·‖ᵢ) and subspaces θᵢ(Vᵢ).
Suppose V = V₁ ⊕ ⋯ ⊕ Vₙ is any (internal) direct sum. Then
‖S(α₁,…, αₙ)‖ = ‖α₁ + ⋯ + αₙ‖ ≤ Σⁿᵢ₌₁ ‖αᵢ‖ = ‖(α₁,…, αₙ)‖_s. Thus, the map
S given in Definition 2.10(b) is always a bounded linear operator. In particular, V
is a norm direct sum of the Vᵢ if S⁻¹ is a bounded linear operator. This last
remark can be said in a slightly different way.

Theorem 2.11: Suppose V is a normed linear vector space and an internal direct
sum V = V₁ ⊕ ⋯ ⊕ Vₙ of subspaces V₁,…, Vₙ. Let Pᵢ denote the ith projection of
V onto Vᵢ. Then V is a norm direct sum of the Vᵢ if and only if each Pᵢ is a
bounded linear operator.

Proof: From our discussion above, we know V is a norm direct sum of the Vᵢ if
and only if S⁻¹ is a bounded linear operator. S⁻¹ is bounded if and only if there exists a
c > 0 such that ‖S⁻¹(α)‖_s ≤ c‖α‖ for all α ∈ V. This last inequality means
Σⁿᵢ₌₁ ‖Pᵢ(α)‖ ≤ c‖α‖.
Now suppose V is the norm direct sum of the Vᵢ. Let α ∈ V and write
α = α₁ + ⋯ + αₙ with αⱼ ∈ Vⱼ for j = 1,…, n. For any fixed i, we have
‖Pᵢ(α)‖ = ‖αᵢ‖ ≤ Σⁿⱼ₌₁ ‖αⱼ‖ = ‖S⁻¹(α)‖_s ≤ c‖α‖. Thus, Pᵢ is a bounded linear
operator.
Conversely, suppose each Pᵢ is bounded. Then there exists a kᵢ > 0 such
that ‖Pᵢ(α)‖ ≤ kᵢ‖α‖ for all α ∈ V. Let c = Σⁿᵢ₌₁ kᵢ. Then
‖S⁻¹(α)‖_s = Σⁿᵢ₌₁ ‖Pᵢ(α)‖ ≤ c‖α‖. Thus, S⁻¹ is bounded. Consequently, V is a
norm direct sum of the Vᵢ. □

EXERCISES FOR SECTION 2

(1) Show that the norms ‖·‖₁, ‖·‖₂, and ‖·‖_∞ given in 1.4 are all equivalent (and
hence product norms) on ℝⁿ.
(2) Is Exercise 1 true for V = 1

(3) Give a detailed proof of Corollary 2.3.



(4) Let V and W denote normed linear vector spaces, and consider the
projection π₁: V × W → V. Show that relative to any product norm on
V × W, π₁ is an open map [i.e., U open in V × W ⇒ π₁(U) open in V].
(5) Prove Theorem 2.8.
(6) Let V be a normed linear vector space and define f: ℝ × V → V by
f(x, α) = xα. Show that f is continuous relative to any product norm on
ℝ × V.
(7) Let Vᵢ, i = 1, 2, 3, be normed linear vector spaces. Show that the map
χ: ℬ(V₁, V₂) × ℬ(V₂, V₃) → ℬ(V₁, V₃) given by χ(S, T) = TS is a con-
tinuous map. Here the norm on each ℬ(Vᵢ, Vⱼ) is the uniform norm given
in equation 1.23, and the norm on the product is any product norm.
(8) Give an example of a normed linear space V and two subspaces V₁ and V₂
of V such that V = V₁ ⊕ V₂, but V is not the norm direct sum of V₁ and
V₂.

(9) Suppose V = V₁ ⊕ V₂ is a norm direct sum. Show that each Vᵢ is a closed
subset of V.
(10) Let f: V → ℝ be a seminorm on V (Exercise 2 of Section 1). Set
N = {α ∈ V | f(α) = 0}. Prove the following assertions:
(a) N is a subspace of V.
(b) f is constant on each coset α + N of N.
(c) f induces a norm f̄ on the quotient space V/N [given by
f̄(α + N) = f(α)] such that f̄ ∘ π = f.
Here π is the natural map π(α) = α + N.


(11) Show that lV') and are norm isomorphic.
(12) Let V and W be normed linear vector spaces, and suppose ω: V × W → ℝ is
a bounded bilinear map. This means there exists a positive constant b such
that |ω(α, β)| ≤ b‖α‖ ‖β‖ for all α ∈ V and β ∈ W. Let T: V → W* be defined
by T(α)(β) = ω(α, β). Show that T is a bounded linear map from V to
ℬ(W, ℝ), and compute ‖T‖.
(13) Let N be a closed subspace of a normed linear vector space (V, ‖·‖). Define
a real-valued function on V/N by ‖α + N‖ = inf{‖α + η‖ | η ∈ N}. Show
that with this function V/N becomes a normed linear vector space. Prove that the
natural map α ↦ α + N is a bounded linear map from V to V/N.

(14) Suppose V and W are normed linear vector spaces, and T ∈ ℬ(V, W). Let N
be a closed subspace of V contained in ker(T). Let T̄ be the induced map on
V/N given by T̄(α + N) = T(α). If we norm V/N as in Exercise 13, show T̄ is
a bounded linear map from V/N to W such that ‖T̄‖ = ‖T‖.
(15) Let V and W be normed linear vector spaces, and let T ∈ ℬ(V, W). T is
called an isometry if ‖T(α)‖ = ‖α‖ for all α ∈ V. Examine the bounded maps
in this section and decide which are isometries.

3. SEQUENTIAL COMPACTNESS AND THE EQUIVALENCE OF NORMS

In this section, we shall prove that any two norms on a finite-dimensional vector
space V are equivalent. In order to do this, we need to develop a certain number
of topological facts that are true in any metric space. However, we shall state all
the results we need in the language of normed linear vector spaces. Throughout
this section, V will denote a normed linear vector space with some fixed norm ‖·‖.
We begin with the notion of a sequence of vectors in V.

Definition 3.1: (a) A sequence {αₙ} in V is a function f: ℕ → V such that f(n) = αₙ
for all n ∈ ℕ.
(b) A sequence {αₙ} is said to have a limit β ∈ V if for every r > 0
there exists an m ∈ ℕ such that k ≥ m ⇒ ‖αₖ − β‖ < r.

If a sequence {αₙ} has a limit β ∈ V, then β is clearly unique. Thus, we can refer
to β as the limit of the sequence {αₙ}. In this case, we shall say {αₙ} converges to β
and write {αₙ} → β. We say a sequence is convergent if the sequence has a limit in
V. Note that 3.1(b) can be rewritten in the following form:

3.2: {αₙ} → β if and only if for every r > 0 there exists an m ∈ ℕ such that
k ≥ m ⇒ αₖ ∈ B_r(β).
Having introduced the notion of sequences, we now explore the relationships


between these functions and some of the other ideas we have been discussing.
Our first lemma says that addition and scalar multiplication preserve limits.

Lemma 3.3: Suppose {αₙ} → α and {βₙ} → β. Then for any x, y ∈ ℝ,
{xαₙ + yβₙ} → xα + yβ.

Proof: We first show that {αₙ + βₙ} → α + β. Let r > 0. Then there exist
natural numbers m₁ and m₂ such that k ≥ m₁ ⇒ ‖αₖ − α‖ < r/2, and
k ≥ m₂ ⇒ ‖βₖ − β‖ < r/2. Let m = max{m₁, m₂}. If k ≥ m, then ‖(αₖ + βₖ)
− (α + β)‖ ≤ ‖αₖ − α‖ + ‖βₖ − β‖ < r/2 + r/2 = r. Thus, {αₙ + βₙ} → α + β.

To prove the lemma, it now suffices to show that {xαₙ} → xα. This is
clear if x = 0. Thus, we assume x ≠ 0. Let r > 0. Since {αₙ} → α, there exists an
m ∈ ℕ such that k ≥ m ⇒ ‖αₖ − α‖ < r/|x|. Then for k ≥ m, we have
‖xαₖ − xα‖ = |x| ‖αₖ − α‖ < r. Thus, {xαₙ} → xα, and the proof of the lemma is
complete. □

Our next lemma gives us a sequential characterization of the closure of a set


in V.

Lemma 3.4: Let A be a subset of V, and denote the closure of A by Ā. Then α ∈ Ā
if and only if there exists a sequence {αₙ} in A such that {αₙ} → α.

Proof: Recall from Exercise 7 of Section 1 that α ∈ Ā if and only if B_r(α) ∩ A ≠ ∅
for all r > 0. So, let us first suppose α ∈ Ā. Then for every n ∈ ℕ, there exists an
αₙ ∈ B_{1/n}(α) ∩ A. Clearly, {αₙ} → α.

Conversely, suppose a sequence {αₙ} in A converges to α. Let r > 0. Then
there exists a natural number m such that k ≥ m ⇒ αₖ ∈ B_r(α). In particular,
B_r(α) ∩ A ≠ ∅. Thus, α ∈ Ā. □

Lemma 3.4 says that the smallest closed set in V containing A is precisely the
set of all limits of sequences in A. For instance, the closure of B₁(0) in each of the
three norms given in Example 1.13 is the corresponding closed unit ball
{α | ‖α‖ ≤ 1}.

The boundary, B̄₁(0) − B₁(0), of B₁(0) in the three norms consists of the
three curves pictured below:

3.6: [figure: the boundaries of the unit balls in the three norms of Example 1.13; not reproduced here]
We can characterize continuous functions on normed linear vector spaces


using sequences.

Lemma 3.7: Let V and W be normed linear vector spaces, and let A be a subset
of V. Suppose f: A → W is a function. Let α ∈ A. Then f is continuous at α if and
only if for every sequence {αₙ} in A converging to α, {f(αₙ)} → f(α).

Proof: Suppose f is continuous at α, and let {αₙ} be a sequence in A that
converges to α. Let r > 0. Since f is continuous at α, there exists an s > 0 such
that δ ∈ A and ‖δ − α‖ < s ⇒ ‖f(δ) − f(α)‖ < r. Since {αₙ} → α, there exists an
m ∈ ℕ such that k ≥ m ⇒ ‖αₖ − α‖ < s. In particular, k ≥ m ⇒ ‖f(αₖ) − f(α)‖ < r.
Thus, {f(αₙ)} → f(α).
For the converse, suppose f fails to be continuous at α. Then there exists an
r > 0 such that for every s > 0 there is a β ∈ A such that ‖β − α‖ < s, but
‖f(β) − f(α)‖ ≥ r. In particular, for every n ∈ ℕ, there exists an αₙ ∈ A such that
‖αₙ − α‖ < 1/n, but ‖f(αₙ) − f(α)‖ ≥ r. We now have a contradiction: {αₙ} → α in
A, but {f(αₙ)} does not converge to f(α). We conclude that f must be continuous
at α. □
The reader will recall that two norms ‖·‖ and ‖·‖′ are equivalent on V if the
identity map (V, ‖·‖) → (V, ‖·‖′) is a norm isomorphism, that is, continuous in
both directions. Using the sequential characterization of continuity given in
Lemma 3.7, we get the following corollary:

Corollary 3.8: Two norms on V are equivalent if and only if the set of
convergent sequences in V relative to one of the norms is precisely the same as
the set of convergent sequences relative to the other norm. □

We also get a sequential characterization of product norms using Lemma 3.7.

Corollary 3.9: Let V and W be normed linear vector spaces. A norm on V × W
is a product norm if and only if the following property (P) is satisfied:

(P): {(αₙ, βₙ)} → (α, β) in V × W if and only if {αₙ} → α in V and {βₙ} → β in W.

Proof: Let ‖·‖ be a norm on V × W. We first note that the sum norm
‖(α, β)‖_s = ‖α‖₁ + ‖β‖₂ clearly satisfies property (P). If ‖·‖ is a product norm on
V × W, then ‖·‖ is equivalent to ‖·‖_s by Theorem 2.7. Hence, there exist
constants a, b > 0 such that a‖(ξ, η)‖_s ≤ ‖(ξ, η)‖ ≤ b‖(ξ, η)‖_s for all
(ξ, η) ∈ V × W. We then have the following inequalities: ‖αₙ − α‖₁ ≤
‖(αₙ, βₙ) − (α, β)‖_s ≤ (1/a)‖(αₙ, βₙ) − (α, β)‖. Similarly, ‖βₙ − β‖₂ ≤
(1/a)‖(αₙ, βₙ) − (α, β)‖, and ‖(αₙ, βₙ) − (α, β)‖ ≤ b(‖αₙ − α‖₁ +
‖βₙ − β‖₂). These inequalities readily imply that ‖·‖ satisfies (P).
Conversely, suppose ‖·‖ satisfies (P). Then {(αₙ, βₙ)} → (α, β) relative to ‖·‖ in
V × W if and only if {αₙ} → α relative to ‖·‖₁ in V, and {βₙ} → β relative to ‖·‖₂
in W. The same statement is true for the sum norm ‖·‖_s. We thus conclude that
the sets of convergent sequences relative to ‖·‖ and ‖·‖_s in V × W are precisely
the same. Hence, ‖·‖ is equivalent to ‖·‖_s by Corollary 3.8, and so ‖·‖ is a
product norm on V × W. □

We can now introduce the central ideas of this section. In order to discuss
sequential compactness, we need the notion of a subsequence of {αₙ}.

Definition 3.10: A sequence {βₙ} is a subsequence of {αₙ} if there exists a strictly
increasing function f: ℕ → ℕ such that βₙ = α_{f(n)} for all n ∈ ℕ.

If we set f(k) = nₖ in Definition 3.10, then n₁ < n₂ < n₃ < ⋯, and βₖ = α_{nₖ}
for all k ∈ ℕ. For this reason, we often use the notation {α_{nₖ}} to indicate a
subsequence of {αₙ}. We shall also use the notation {βₙ} ↪ {αₙ} to indicate that
{βₙ} is a subsequence of {αₙ}.
There are a few elementary remarks concerning subsequences that we shall
use implicitly throughout the rest of this section. We gather these remarks
together in the following lemma:

Lemma 3.11: (a) If {αₙ} → α, and {βₙ} ↪ {αₙ}, then {βₙ} → α.
(b) If a sequence {αₙ} does not converge to some vector α, then
there exists a subsequence {βₙ} ↪ {αₙ} such that no sub-
sequence of {βₙ} converges to α.
(c) Suppose {αₙ} is a sequence in V, and α ∈ V. If every sub-
sequence of {αₙ} has a subsequence that converges to α, then
{αₙ} → α.

Proof: (a) This proof is trivial. We leave it to the reader.
(b) Suppose {αₙ} does not converge to α. Then there exists an r > 0 such
that for any m ∈ ℕ there exists an n ≥ m with ‖αₙ − α‖ ≥ r. By letting
m get large, we can construct a strictly increasing sequence
n₁ < n₂ < ⋯ such that ‖α_{nₖ} − α‖ ≥ r for all k. Clearly, the subsequence {α_{nₖ}}
has no subsequence converging to α since every term of {α_{nₖ}} is a
distance at least r from α.
(c) This statement follows immediately from (b). If the sequence {αₙ} does
not converge to α, then by (b), {αₙ} has a subsequence {βₙ} such that
no subsequence of {βₙ} converges to α. But this is contrary to our
assumptions in (c). Hence, {αₙ} → α. □
Definition 3.12: Let (V, ‖·‖) be a normed linear vector space. Suppose A is a
subset of V. We say A is sequentially compact if every sequence in A has a
subsequence that converges to a vector in A.

Thus, if A is sequentially compact, and {αₙ} is a sequence in A, then there
exists an α ∈ A and a subsequence {α_{nₖ}} of {αₙ} such that {α_{nₖ}} → α. The notion of
sequential compactness of course depends on the specific norm in V. However,
Corollary 3.8 implies that if two norms are equivalent, then a set A is
sequentially compact with respect to one norm if and only if A is sequentially
compact with respect to the other norm. Let us consider some simple examples
in R before continuing.

Example 3.13: Let V = ℝ with the absolute value |·|. Any finite set A in ℝ must be
sequentially compact since any sequence in A necessarily contains a constant
subsequence.
An infinite set in ℝ need not be sequentially compact. If A = ℤ, for example,
then the sequence {n} contains no convergent subsequence. A more interesting
example is the set C = {1/n | n ∈ ℕ}. Now any sequence in C has a convergent
subsequence. The sequence {1/n} is contained in C and has the property that
any subsequence of {1/n} converges to 0. Since 0 is not in C, we conclude that C
is not sequentially compact. □

Our first property of a sequentially compact subset of V is perhaps the most


important one of all.

Theorem 3.14: Let A be a subset of a normed linear vector space (V, ‖·‖). If A is
sequentially compact, then A is closed and bounded.

Proof: We have seen in Exercise 7 of Section 1 that the closure Ā of A is the
smallest closed set in V containing A. Thus, to show that A is closed, it suffices to
show that Ā ⊆ A. Let α ∈ Ā. Then Lemma 3.4 implies that there exists a sequence
{αₙ} in A such that {αₙ} → α. Since A is sequentially compact, there exists a
subsequence {α_{nₖ}} of {αₙ} and an element β ∈ A such that {α_{nₖ}} → β. But Lemma
3.11(a) implies {α_{nₖ}} → α. Thus, α = β. In particular, α ∈ A. Hence, Ā ⊆ A, and A
is closed.
Recall that A is bounded if A ⊆ B_r(0) for some r > 0. Suppose A is not
bounded. Then for every n ∈ ℕ, A is not contained in Bₙ(0). Thus, for every n ∈ ℕ,
there exists an αₙ ∈ A such that ‖αₙ‖ ≥ n. Since A is sequentially compact, there
exists a subsequence of {αₙ}, say {α_{nₖ}}, and a vector β ∈ A such that {α_{nₖ}} → β.
Now we have noted in Lemma 1.16 that the norm is a continuous function from
V to ℝ. Lemma 3.7 then implies {‖α_{nₖ}‖} → ‖β‖ in ℝ. But from our construction
of the sequence {αₙ}, {‖α_{nₖ}‖} → +∞. Thus, ‖β‖ = +∞, which of course is
impossible. We conclude that A is a bounded subset of V. □

At this point, it may be a good idea to say a few words about a possible
converse of Theorem 3.14. We shall show later on in this section that if
dim_ℝ(V) < ∞, then the converse of 3.14 is true. For the time being, we merely
note that neither hypothesis, closed nor bounded, alone implies sequential
compactness. ℤ is a closed subset (but not bounded) of ℝ, and ℤ is not
sequentially compact. C = {1/n | n ∈ ℕ} is a bounded subset (but not closed) of ℝ,
and C is not sequentially compact.

Another important property that a sequentially compact subset has is that a


continuous, real valued function on such a set obtains both a maximum and
minimum value somewhere on the set. This remark easily follows from our next
theorem.

Theorem 3.15: Let V and W be normed linear vector spaces, and let A be a
subset of V. Suppose f: A → W is a continuous function. If A is a sequentially
compact subset of V, then f(A) is a sequentially compact subset of W.

Proof: Let {γₙ} be a sequence in f(A). Then for every n, there exists an αₙ ∈ A with
f(αₙ) = γₙ. Since A is sequentially compact, {αₙ} has a convergent subsequence
{βₙ} → β ∈ A. Since f is continuous, {f(βₙ)} → f(β) by Lemma 3.7. Since β ∈ A,
f(β) ∈ f(A). Thus, the sequence {γₙ} has a convergent subsequence {f(βₙ)} → f(β)
in f(A). We conclude that f(A) is sequentially compact. □

Corollary 3.16: Let A be a sequentially compact subset of V and suppose
f: A → ℝ is a continuous function. Then f is bounded on A. Furthermore, f
assumes both a maximum and a minimum value on A.

Proof: Before proving the corollary, let us discuss its meaning. We say f is
bounded on A if there exists a positive number b such that |f(δ)| < b for all δ ∈ A.
To say that f assumes a maximum value on A means there exists an α ∈ A such
that f(δ) ≤ f(α) for all δ ∈ A. Similarly, f assumes a minimum value on A if there
exists a β ∈ A such that f(δ) ≥ f(β) for all δ ∈ A.
Now by Theorem 3.15, f(A) is a sequentially compact subset of ℝ. Thus, by
Theorem 3.14, f(A) is a bounded subset of ℝ. This of course means f is bounded
on A.
Let x = sup f(A) = sup{f(δ) | δ ∈ A}. Let y = inf f(A). Since f(A) is a bounded
subset of ℝ, both x and y exist. Theorem 3.14 also implies that f(A) is a closed
subset of ℝ. The reader can easily check that any closed (and bounded) subset of
ℝ contains both its infimum and supremum. In particular, x, y ∈ f(A). If α ∈ A is
such that f(α) = x, then clearly f assumes a maximum value x on A at α. Similarly,
if β ∈ A is such that f(β) = y, then f assumes a minimum value y on A at β. □
We now turn our attention to ℝ. Since any norm on ℝ is equivalent to the
absolute value |·|, we can, with no loss in generality, state all our results relative
to |·|. We first remind the reader of some familiar definitions from the calculus.

Definition 3.17: Let {xₙ} be a sequence in ℝ.

(a) We say {xₙ} is increasing if xₙ ≤ xₙ₊₁ for all n.
(b) {xₙ} is decreasing if xₙ ≥ xₙ₊₁ for all n.
(c) {xₙ} is monotone if {xₙ} is either increasing or decreasing.
(d) {xₙ} is bounded if there exists a b > 0 such that |xₙ| ≤ b for all n.

The important facts about monotone sequences are contained in the
following two lemmas:

Lemma 3.18: Any bounded, monotone sequence in ℝ converges.

Proof: Let {xₙ} be an increasing, bounded sequence. Since {xₙ} is bounded,
x = sup{xₙ | n ∈ ℕ} exists. We claim {xₙ} → x. To see this, let r > 0. Then x − r is
not an upper bound of the set {xₙ | n ∈ ℕ}. Hence, there exists an m such that
xₘ > x − r. Since {xₙ} is increasing, x − r < xₘ ≤ xₙ ≤ x for all n ≥ m. In
particular, n ≥ m ⇒ |xₙ − x| < r. Thus, {xₙ} → x.
If {xₙ} is decreasing and y = inf{xₙ | n ∈ ℕ}, then a similar proof shows
{xₙ} → y. □

Lemma 3.19: Every sequence in ℝ has a monotone subsequence.

Proof: Let {xₙ} be an arbitrary sequence in ℝ. Let us call a term xₖ of {xₙ} a peak
term if xₖ ≥ xₖ₊ⱼ for all j ≥ 1. Suppose {xₙ} has infinitely many peak terms, say
x_{n₁}, x_{n₂}, … . Here we label those terms with n₁ < n₂ < ⋯. Clearly, {x_{nₖ}} is a
decreasing subsequence of {xₙ}.
Suppose {xₙ} has only finitely many peak terms (possibly none). Then there is
a last peak term, say x_{n₀}. Every term xₖ of {xₙ} after x_{n₀} is not a peak term and,
so, is strictly less than some later term. In particular, we can find a strictly increasing
sequence n₀ < n₁ < n₂ < ⋯ such that x_{n₁} < x_{n₂} < ⋯. Thus, {xₙ} has a strictly
increasing subsequence. □
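The peak-term argument is effectively an algorithm on any finite prefix of a sequence. Below is a hedged Python sketch of that case split (mine, not the text's); it returns the indices of a monotone subsequence of a finite list, using either the peak terms or a greedy climb.

    def monotone_subsequence(xs):
        # Indices of peak terms: k is a peak if xs[k] >= xs[j] for every j > k.
        peaks, running_max = [], float("-inf")
        for k in range(len(xs) - 1, -1, -1):       # scan from the right
            if xs[k] >= running_max:
                peaks.append(k)
                running_max = xs[k]
        peaks.reverse()                             # xs at these indices is non-increasing
        # Greedy strictly increasing run starting at the first element.
        climb = [0] if xs else []
        for k in range(1, len(xs)):
            if xs[k] > xs[climb[-1]]:
                climb.append(k)
        # Either candidate is monotone; return the longer one, mirroring the lemma's cases.
        return peaks if len(peaks) >= len(climb) else climb

    print(monotone_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))   # [0, 2, 4, 5]: values 3, 4, 5, 9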

We can combine Lemmas 3.18 and 3.19 into the following important
theorem:

Theorem 3.20: Every bounded sequence in ℝ has a convergent subsequence.

Proof: Let {xₙ} be a bounded sequence in ℝ. By Lemma 3.19, {xₙ} has a
monotone subsequence {x_{nₖ}}. Clearly, {x_{nₖ}} is bounded since {xₙ} is. Hence,
Lemma 3.18 implies {x_{nₖ}} converges. □

We can generalize Theorem 3.20 to ℝⁿ and any product norm as follows:

Theorem 3.21: Let ‖·‖ be any product norm on ℝⁿ. Then any bounded sequence
in (ℝⁿ, ‖·‖) has a convergent subsequence.

Proof: From Theorem 2.8 and Corollary 3.8, we may assume that ‖·‖ is the sum
norm ‖(x₁,…, xₙ)‖₁ = Σⁿᵢ₌₁ |xᵢ|. We proceed by induction on n, the case n = 1
having been proved in 3.20.
Suppose {αₖ} is a bounded sequence in ℝⁿ. Then there exists a constant c > 0
such that ‖αₖ‖₁ ≤ c for all k. Let us write αₖ = (x_{1k},…, x_{nk}) for all k ∈ ℕ, and set
βₖ = (x_{1k},…, x_{n−1,k}).
Then αₖ = (βₖ, x_{nk}). Since ‖αₖ‖₁ = ‖βₖ‖₁ + |x_{nk}| ≥ ‖βₖ‖₁, |x_{nk}|, both {βₖ} and
{x_{nk}} are bounded sequences in ℝⁿ⁻¹ and ℝ, respectively. By our induc-
tion hypothesis, {βₖ} contains a convergent subsequence {β_{kⱼ}}. Suppose
{β_{kⱼ}} → β.
Now consider the corresponding subsequence {x_{nkⱼ}} of {x_{nk}}. Since {x_{nk}} is
bounded, {x_{nkⱼ}} is bounded. Thus, Theorem 3.20 implies {x_{nkⱼ}} has a convergent
subsequence. Recall that this means there exists a strictly increasing
function f: ℕ → {k₁, k₂, k₃,…} such that {x_{nf(j)}} converges. Suppose
{x_{nf(j)}} → y. For each j ∈ ℕ, set γⱼ = β_{f(j)}. Then {γⱼ} is a subsequence of {β_{kⱼ}}. At this
point, a diagram of the sequences we have constructed may be helpful.

3.22: {x_{nf(j)}} ↪ {x_{nkⱼ}} ↪ {x_{nk}}
      {γⱼ} ↪ {β_{kⱼ}} ↪ {βₖ}

Since {β_{kⱼ}} → β, {γⱼ} → β. We now claim {(γⱼ, x_{nf(j)})} converges to (β, y). We first
observe that ‖(γⱼ, x_{nf(j)}) − (β, y)‖₁ = ‖γⱼ − β‖₁ + |x_{nf(j)} − y|. Since {γⱼ} → β, and
{x_{nf(j)}} → y, this last equation implies {(γⱼ, x_{nf(j)})} → (β, y). The sequence
{(γⱼ, x_{nf(j)})} = {α_{f(j)}} is clearly a subsequence of {αₖ}, and, consequently, {αₖ} contains
a convergent subsequence. This completes the proof of 3.21. □

One important corollary to Theorem 3.21 is the converse of Theorem 3.14 for
product norms on ℝⁿ.

Theorem 3.23: Let ‖·‖ be any product norm on ℝⁿ. A set A in (ℝⁿ, ‖·‖) is
sequentially compact if and only if A is closed and bounded.

Proof: If A is sequentially compact, then we have seen in Theorem 3.14 that A is
closed and bounded.
Suppose, conversely, that A is closed and bounded. Let {αₖ} be a sequence in
A. Since A is bounded, clearly {αₖ} is a bounded sequence in ℝⁿ. Hence,
Theorem 3.21 implies {αₖ} has a convergent subsequence {βₖ}. Suppose
{βₖ} → β. Since each vector βₖ lies in A, and {βₖ} → β, we conclude
B_r(β) ∩ A ≠ ∅ for any r > 0. Thus, β ∈ Ā. But Ā = A since A is closed.
Therefore, β ∈ A. We have now shown that every sequence in A has a
subsequence that converges to a vector in A. Thus, A is sequentially
compact. □

At this point, we have developed enough material to be able to state and
prove the principal result of this section, namely, that any two norms on ℝⁿ are
equivalent. In proving this assertion, it clearly suffices to show that an arbitrary
norm is equivalent to the sum norm ‖·‖₁.

Theorem 3.24: Let ‖·‖ be an arbitrary norm on ℝⁿ. Then ‖·‖ is equivalent to the
sum norm ‖·‖₁.

Proof: Let δ = {δ₁,…, δₙ} be the canonical basis of ℝⁿ. Set a = max{‖δ₁‖,…, ‖δₙ‖}.
Then a > 0, and for every α = (x₁,…, xₙ) ∈ ℝⁿ, we
have ‖α‖ = ‖Σⁿᵢ₌₁ xᵢδᵢ‖ ≤ Σⁿᵢ₌₁ |xᵢ| ‖δᵢ‖ ≤ a Σⁿᵢ₌₁ |xᵢ| = a‖α‖₁. We have thus es-
tablished one of the two inequalities we need in order to show ‖·‖ is equivalent
to ‖·‖₁.
For any α, β ∈ ℝⁿ, we have |‖α‖ − ‖β‖| ≤ ‖α − β‖ ≤ a‖α − β‖₁. If we think
of ‖·‖ as a real-valued function from ℝⁿ to ℝ, then this last inequality
implies ‖·‖ is a continuous function on the normed linear vector space (ℝⁿ, ‖·‖₁).
Set S = {α ∈ ℝⁿ | ‖α‖₁ = 1}. The reader can easily check that S is a closed and
bounded subset of (ℝⁿ, ‖·‖₁). In particular, Theorem 3.23 implies that S is a
sequentially compact subset of (ℝⁿ, ‖·‖₁).
Since ‖·‖ is a continuous function on ℝⁿ, certainly ‖·‖ is a continuous function
on S. We can now apply Corollary 3.16 to the continuous map ‖·‖: S → ℝ. We
conclude that ‖·‖ assumes a minimum value m on S. Thus, there exists a γ ∈ S
such that ‖γ‖ = m, and ‖α‖ ≥ m for all α ∈ S. Note that m > 0. For if m ≤ 0,
then ‖γ‖ ≤ 0. Since ‖·‖ is a norm on ℝⁿ, we would then conclude that γ = 0. This
is impossible since 0 is not in S.
We have now constructed a positive constant m such that ‖α‖ ≥ m for all
α ∈ S. We can rewrite this last inequality as ‖α‖ ≥ m‖α‖₁ for all α ∈ S. Let
β ∈ ℝⁿ − {0}. Then β/‖β‖₁ ∈ S. Consequently, ‖β/‖β‖₁‖ ≥ m. Thus,
‖β‖ ≥ m‖β‖₁. This last inequality also holds when β = 0. Thus, setting
b = 1/m, we have shown ‖α‖₁ ≤ b‖α‖ for all α ∈ ℝⁿ. Since we had previously
argued that ‖α‖ ≤ a‖α‖₁, we conclude that ‖·‖ is equivalent to ‖·‖₁. □
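To make the constants in this proof concrete, here is a small numerical sketch (an illustration I am adding, not part of the proof): for the Euclidean norm on ℝ², it estimates m = min{‖α‖₂ : ‖α‖₁ = 1} by sampling the ‖·‖₁-sphere and then checks the resulting sandwich m‖α‖₁ ≤ ‖α‖₂ ≤ a‖α‖₁ with a = max ‖δᵢ‖₂ = 1.

    import math, random

    def norm1(v): return sum(abs(x) for x in v)
    def norm2(v): return math.sqrt(sum(x * x for x in v))

    # Estimate m = min{ ||v||_2 : ||v||_1 = 1 } in R^2 by sampling the ||.||_1-sphere S.
    best = float("inf")
    for _ in range(20000):
        v = [random.uniform(-1, 1), random.uniform(-1, 1)]
        s = norm1(v)
        if s > 0:
            best = min(best, norm2([x / s for x in v]))
    print(best)                     # hovers just above the true minimum 1/sqrt(2)

    # With m = 1/sqrt(2) and a = 1 the sandwich m*||v||_1 <= ||v||_2 <= a*||v||_1 holds.
    m, a = 1 / math.sqrt(2), 1.0
    for _ in range(1000):
        v = [random.uniform(-5, 5), random.uniform(-5, 5)]
        assert m * norm1(v) <= norm2(v) + 1e-12
        assert norm2(v) <= a * norm1(v) + 1e-12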

In the rest of this section, we shall develop the important corollaries that
come from Theorem 3.24. We have already mentioned our first corollary.

Corollary 3.25: Any two norms on ℝⁿ are equivalent. □

Notice that any norm on ℝⁿ, being equivalent to ‖·‖₁, is automatically a
product norm. Hence, we can drop the adjective "product" when dealing with
norms on ℝⁿ: any norm on ℝⁿ is a product norm.

Corollary 3.26: Let V be a finite-dimensional vector space over ℝ. Then any two
norms on V are equivalent.

Proof: Suppose dim(V) = n. Then any coordinate map gives us an isomorphism
T: ℝⁿ → V. Suppose ‖·‖ and ‖·‖′ are two norms on V. Then f(α) = ‖T(α)‖ and
g(α) = ‖T(α)‖′ define two norms on ℝⁿ. By Corollary 3.25, f and g are equivalent.
Since T is surjective, the equivalence of f and g immediately implies ‖·‖ and ‖·‖′
are equivalent on V. □

There is an important application of Corollary 3.26, which we list as another


corollary.

Corollary 3.27: Let (V, ‖·‖) be a finite-dimensional, normed linear vector space.
If dim_ℝ(V) = n, then (V, ‖·‖) is norm isomorphic to (ℝⁿ, ‖·‖₁).

Proof: Suppose α = {α₁,…, αₙ} is a basis of V over ℝ. Then we have an
isomorphism S: V → ℝⁿ given by S(δ) = (x₁,…, xₙ), where Σⁿᵢ₌₁ xᵢαᵢ = δ. We can
define a new norm ‖·‖′ on V by the equation ‖δ‖′ = ‖S(δ)‖₁. It is a simple matter
to check that S is now a norm isomorphism between (V, ‖·‖′) and (ℝⁿ, ‖·‖₁). By
Corollary 3.26, ‖·‖ and ‖·‖′ are equivalent. This means that the identity map
from (V, ‖·‖) to (V, ‖·‖′) is a norm isomorphism. Composing these two norm
isomorphisms, we get that (V, ‖·‖) and (ℝⁿ, ‖·‖₁) are norm isomorphic. □

Thus, for finite-dimensional, normed linear vector spaces, the theory is
particularly easy. We can always assume that our space is (ℝⁿ, ‖·‖₁), up to norm
isomorphism.
Returning to a remark made earlier in this section, we can now prove the
following generalization of Theorem 3.23:

Corollary 3.28: Let (V, ‖·‖) be a finite-dimensional, normed linear vector space.
Then a subset A of V is sequentially compact if and only if A is closed and
bounded.

Proof: (V, ‖·‖) is norm isomorphic to (ℝⁿ, ‖·‖₁) for some n. So the result follows
from Theorem 3.23. □

We should point out here that Corollary 3.28 is really a theorem about ℝⁿ. It
is not true in general. If (V, ‖·‖) is an infinite-dimensional, normed linear vector
space, and we set B = {α ∈ V | ‖α‖ ≤ 1}, then B is closed and bounded in V.
However, B is never sequentially compact. We ask the reader to provide a proof
of this assertion in Exercise 10 at the end of this section.
We can also use Corollary 3.26 to show that all linear transformations on
finite-dimensional spaces are bounded.
Corollary 3.29: Let V and W denote finite-dimensional, normed linear vector
spaces. Then ℬ(V, W) = Hom_ℝ(V, W).

Proof: Let T ∈ Hom_ℝ(V, W). Since any two norms on V (as well as W) are
equivalent, it suffices to argue that T is bounded with respect to a specific choice
of norms on V and W.
Suppose dim(V) = n and dim(W) = m. Let α = {α₁,…, αₙ} be a basis of V
and β = {β₁,…, βₘ} a basis of W. Then we have the following commutative
diagram:

3.30:            T
           V --------→ W
           |f          |g
           ↓           ↓
           ℝⁿ --------→ ℝᵐ
                 S

The vertical maps in 3.30 are the usual coordinate isomorphisms:
f(α) = (x₁,…, xₙ), where Σⁿᵢ₌₁ xᵢαᵢ = α, and g(β) = (y₁,…, yₘ), where
Σᵐⱼ₌₁ yⱼβⱼ = β. If Γ(α, β)(T) = (c_{ij}), then S in 3.30 is the linear transformation
given by

S((x₁,…, xₙ)) = (Σⁿᵢ₌₁ xᵢc_{i1},…, Σⁿᵢ₌₁ xᵢc_{im})

Now let us norm ℝⁿ with the usual sum norm ‖·‖₁. Then ‖α‖ = ‖f(α)‖₁ defines
a norm on V. Let us norm ℝᵐ with the uniform norm ‖·‖_∞ (notation as in
Example 1.3). Then ‖β‖′ = ‖g(β)‖_∞ is a norm on W. We now have the following
commutative diagram of normed linear vector spaces:

3.31:                 T
           (V, ‖·‖) --------→ (W, ‖·‖′)
              |f                  |g
              ↓                   ↓
           (ℝⁿ, ‖·‖₁) ---------→ (ℝᵐ, ‖·‖_∞)
                        S

It suffices to argue that T is bounded with respect to the norms ‖·‖ and ‖·‖′.
Set b = max{|c_{ij}| | 1 ≤ i ≤ n, 1 ≤ j ≤ m}. For any vector γ = (x₁,…, xₙ) ∈ ℝⁿ,
we have the following inequalities:

3.32: ‖S(γ)‖_∞ = max_j |Σⁿᵢ₌₁ xᵢc_{ij}| ≤ max_j Σⁿᵢ₌₁ |xᵢ| |c_{ij}| ≤ b Σⁿᵢ₌₁ |xᵢ| = b‖γ‖₁

Thus, the map S is a bounded linear operator. Since Diagram 3.31 is
commutative, we have

3.33: ‖T(α)‖′ = ‖g(T(α))‖_∞ = ‖S(f(α))‖_∞ ≤ b‖f(α)‖₁ = b‖α‖

This is precisely the statement that T is a bounded linear operator from
(V, ‖·‖) to (W, ‖·‖′). □
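A hedged numerical companion to inequality 3.32 (an illustration only; the matrix is my own example): for a concrete matrix (c_{ij}), the constant b = max |c_{ij}| really does bound the map from (ℝⁿ, ‖·‖₁) to (ℝᵐ, ‖·‖_∞).

    import random

    C = [[2.0, -1.0, 0.5],      # an n x m matrix (here n = 2, m = 3), rows indexed by i
         [0.0,  3.0, -2.5]]

    def S(x):
        # right multiplication by C: jth coordinate is sum_i x_i * c_ij
        return [sum(x[i] * C[i][j] for i in range(len(C))) for j in range(len(C[0]))]

    b = max(abs(c) for row in C for c in row)

    for _ in range(1000):
        x = [random.uniform(-10, 10) for _ in range(len(C))]
        lhs = max(abs(y) for y in S(x))            # ||S(x)||_inf
        rhs = b * sum(abs(t) for t in x)           # b * ||x||_1
        assert lhs <= rhs + 1e-9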

The conclusion we can draw from Corollaries 3.27 and 3.29 is that when
dealing with finite-dimensional, normed linear vector spaces V and W (and the
linear transformations between them), we can make the following assumptions
up to equivalence:

(a) V = ℝⁿ and W = ℝᵐ.
(b) The norms on V and W can be any we choose for computational
convenience.
(c) The linear operator between V and W is given by multiplication on the
right by an n × m matrix.

Our last topic in this section is another corollary that has important
applications in least-squares problems.

Corollary 3.34: Let (V, ‖·‖) be a normed linear vector space, and suppose W is a
finite-dimensional subspace of V. Then for any vector β ∈ V, there exists an α ∈ W
such that d(β, W) = ‖α − β‖.

Proof: Recall that d(β, W) = inf{‖ξ − β‖ | ξ ∈ W} is called the distance (in the
‖·‖-norm) between β and W. Since d = d(β, W) is the infimum of the set
{‖ξ − β‖ | ξ ∈ W}, there exists a sequence {αₙ} in W such that {‖αₙ − β‖} → d.
We restrict the norm to W and claim that {αₙ} is a bounded sequence in W. To
see this, suppose {αₙ} is not bounded in W. Then there exists a subsequence {α_{nₖ}}
of {αₙ} such that {‖α_{nₖ}‖} → +∞. Now {‖α_{nₖ} − β‖} is a subsequence of
{‖αₙ − β‖}. Therefore, {‖α_{nₖ} − β‖} → d. On the other hand, |‖α_{nₖ}‖ − ‖β‖| ≤
‖α_{nₖ} − β‖. These last two facts imply that ‖α_{nₖ}‖ cannot be approaching
+∞. This is a contradiction. Thus, we conclude that {αₙ} is a bounded sequence
in W.
Now W is finite dimensional. Hence, it follows from Corollary 3.27 that the
normed linear space (W, ‖·‖) is norm isomorphic to (ℝⁿ, ‖·‖₁) for some n. In
particular, Theorem 3.21 implies {αₙ} has a subsequence {α_{nₖ}} that converges to
some vector α ∈ W. Clearly, {α_{nₖ} − β} → α − β in V. Since the norm is a
continuous function, we have d = lim{‖α_{nₖ} − β‖} = ‖α − β‖. This completes
the proof of 3.34. □

A few comments about Corollary 3.34 are in order here. Suppose W is an
arbitrary subspace of some normed linear vector space (V, ‖·‖). Let β ∈ V. One of
the central problems of linear algebra is how to find a vector in W (if such a
vector exists) that is closest to β. Thus, we seek a vector α ∈ W such that
‖α − β‖ = d(β, W). In the case where the norm is induced from an inner product
on V (see Section 1 of Chapter V), the search for α usually amounts to
minimizing a sum of squares. Hence, these types of problems are called least-
squares problems.
Corollary 3.34 guarantees that the least-squares problem is always solvable if
W is finite dimensional. In this case, we can always find an α ∈ W such that
‖α − β‖ = d(β, W). In particular, if V itself is finite dimensional, then a vector in
W closest to β always exists.
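When the norm does come from the standard inner product on ℝⁿ, the closest vector promised by Corollary 3.34 can be computed explicitly. The following hedged sketch (mine, not the text's; the matrix and vector are illustrative choices) uses numpy's least-squares routine to find the vector in W = the column space of A closest to β in the Euclidean norm.

    import numpy as np

    # W = column space of A, a 2-dimensional subspace of R^3
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])
    beta = np.array([1.0, 2.0, 3.0])

    # Solve min_x ||A x - beta||_2; then alpha = A x is the point of W closest to beta.
    x, *_ = np.linalg.lstsq(A, beta, rcond=None)
    alpha = A @ x
    dist = np.linalg.norm(alpha - beta)

    # alpha - beta is orthogonal to every column of A (the normal equations)
    assert np.allclose(A.T @ (alpha - beta), 0.0)
    print(alpha, dist)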
If W is not finite dimensional, then a vector α ∈ W such that
‖α − β‖ = d(β, W) may not exist. We complete this section with an example
that illustrates this point.

Example 3.35: Let V = {f ∈ C([0, 1]) | f(0) = 0}. Clearly, V is a subspace of
C([0, 1]). We norm V with the uniform norm ‖·‖_∞ given in equation 1.6(c). Let
W = {f ∈ V | ∫₀¹ f(t) dt = 0}. Since the integral is a bounded linear operator on V,
W is a closed subspace of V. W is not finite dimensional over ℝ.
Let β be any function in V such that ∫₀¹ β(t) dt = 1. If ξ ∈ W, then
‖ξ − β‖_∞ ≥ ∫₀¹ |ξ(t) − β(t)| dt ≥ |∫₀¹ (ξ(t) − β(t)) dt| = |∫₀¹ β(t) dt| = 1. Thus,
d(β, W) ≥ 1.
We next claim that we can find a vector γ ∈ W such that ‖β − γ‖_∞ is as close to
1 as we please. To see this, let h be any function in V such that ∫₀¹ h(t) dt ≠ 0. Set
c = 1/∫₀¹ h(t) dt. Then γ = β − ch is a vector in W. ‖β − γ‖_∞ = ‖ch‖_∞ =
sup{|h(t)| | t ∈ [0, 1]}/|∫₀¹ h(t) dt|. We can certainly choose h such that this last
quotient is as close to 1 as we want. For example,

3.36: [figure: a function h in V with h(0) = 0, sup |h| = 1, and ∫₀¹ h(t) dt close to 1; not reproduced here]

We conclude that d(β, W) = 1. To complete the example, we now argue that
‖α − β‖_∞ ≠ 1 for any α ∈ W. Consequently, d(β, W) < ‖α − β‖_∞ for any α in W.
Suppose there exists an α ∈ W such that ‖α − β‖_∞ = 1. Set h(t) = β(t) − α(t). Then
h(t) is a continuous function on [0, 1], h(0) = 0, ∫₀¹ h(t) dt = 1, and
sup{|h(t)| | t ∈ [0, 1]} = 1. If you try to draw the graph of h, you will immediately
see that no such function can exist. Hence, there is no α ∈ W such that
‖α − β‖_∞ = 1. □

EXERCISES FOR SECTION 3

(1) If a sequence {αₙ} in V converges to two vectors β and β′, show that β = β′.
(2) Show that the sum norm ‖(α, β)‖_s = ‖α‖₁ + ‖β‖₂ satisfies property (P) in
Corollary 3.9.
(3) Let A be a closed and bounded subset of ℝ. Show that inf(A) and sup(A) are
elements in A.
(4) Let V and W be normed linear vector spaces, and suppose f: A → W is a
continuous function on a sequentially compact subset A of V. If f is a
bijection from A to f(A), show f⁻¹: f(A) → A is continuous.
(5) Construct a sequence {xₙ} in [0, 1] such that every y ∈ [0, 1] is the limit of
some subsequence of {xₙ}.

(6) If A and B are sequentially compact subsets of a normed linear vector space
V, show A + B is also sequentially compact.

(7) Unlike sequential compactness, the sum of two closed sets in V need not be
closed. Exhibit an example of this fact.

(8) Let V be a normed linear vector space. If W is a subspace of V, show that
the closure W̄ of W is also a subspace of V.
(9) Modify the argument in Corollary 3.34 to show that any finite-dimensional
subspace of V is a closed set in V.
(10) Suppose V is an infinite-dimensional, normed linear vector space. Show
that the subset B = {α ∈ V | ‖α‖ ≤ 1} is closed and bounded, but not
sequentially compact. (Hint: Show that B cannot be covered by a finite
number of open balls of radius 1/2.)
(11) Suppose W is a proper closed subspace of a normed linear vector space V.
Let 0 < r < 1. Show there exists a vector β ∈ V such that ‖β‖ = 1, and
d(β, W) > 1 − r.

(12) In Exercise 11, suppose we assume dim(V) < ∞. Show there exists a vector
β ∈ V such that ‖β‖ = d(β, W) = 1.
(13) Suppose V and W are normed linear vector spaces, and let T ∈ ℬ(V, W).
Show that ker(T) is a closed subspace of V. Is Im(T) a closed subspace of
W?
(14) Consider the function f: ℝ² → ℝ defined as follows: f(x, y) = xy/(x² + y²) if
(x, y) ≠ (0, 0), and f(0, 0) = 0. Use Lemma 3.7 to prove the following
assertions:
(a) For all x ∈ ℝ, f(x, ·): ℝ → ℝ is continuous at 0.
(b) For all y ∈ ℝ, f(·, y): ℝ → ℝ is continuous at 0.
(c) f is not continuous at (0, 0).

(15) Suppose V and W are normed linear vector spaces, and let A and B be
sequentially compact subsets of V and W, respectively. Prove that A X B is
a sequentially compact subset of V X W relative to any product norm on
VXW.
(16) Let V be a normed linear vector space. A subset A of V is said to be dense in
V if Ā = V. Give an example of a proper subset of ℝⁿ that is dense in ℝⁿ.
Suppose W is a second normed linear vector space and f and g are two
continuous functions from V to W. Suppose f = g on some dense subset of
V. Prove f = g.
(17) Let V and W be normed linear vector spaces. Let f: A → W be a function
from a subset A of V to W. We say f is uniformly continuous on A if
for every r > 0, there exists an s > 0 such that for all α, β ∈ A,
‖α − β‖ < s ⇒ ‖f(α) − f(β)‖ < r.
(a) If f is uniformly continuous on A, prove that f is continuous on A.
Show that the converse is false.
(b) If A is sequentially compact, and f is continuous on A, prove that f is
uniformly continuous.

4. BANACH SPACES

In this section, we take a brief look at Banach spaces. Our goal is to introduce
the terminology most frequently used in the literature and prove that any
normed linear vector space is contained in some Banach space. As usual, (V, ‖·‖)
will denote some normed linear vector space over ℝ. We first remind the reader
about a definition familiar from the calculus.

Definition 4.1: A sequence {αₙ} in V is said to be Cauchy (or is called a Cauchy
sequence) if for every r > 0, there exists an m ∈ ℕ such that n, k ≥ m ⇒
‖αₙ − αₖ‖ < r.
There are several easy facts about Cauchy sequences that we shall need in the
sequel. We gather these facts together in our first lemma. We leave the proof for
the exercises at the end of this section.
Lemma 4.2: (a) If {αₙ} → α, then {αₙ} is Cauchy.
(b) Any Cauchy sequence is bounded.
(c) If {αₙ} is Cauchy and contains a subsequence converging to α, say
{α_{nₖ}} → α, then {αₙ} → α.
(d) A sequence {αₙ} is Cauchy if and only if
lim_{n,m→∞} ‖αₙ − αₘ‖ = 0.
(e) Let V and W be normed linear vector spaces, and suppose A is
a subset of V. Let f: A → W be a function. If f is Lipschitz on A,
and {αₙ} is a Cauchy sequence in A, then {f(αₙ)} is a Cauchy
sequence in W.
(f) With the same notation as in (e), suppose f is uniformly
continuous on A. If {αₙ} is a Cauchy sequence in A, then {f(αₙ)}
is a Cauchy sequence in W. □

We note that 4.2(f) is not true in general for continuous functions. See
Exercise 2 at the end of this section. We can now introduce the central definition
of this section.

Definition 4.3: A normed linear vector space V is said to be complete if every


Cauchy sequence in V converges.

Again we remind the reader that this definition depends on the particular
norm on V. It is more precise to say (V, ‖·‖) is complete. If ‖·‖ and ‖·‖′ are two
equivalent norms on V, then clearly a sequence {αₙ} is Cauchy in the ‖·‖-norm if
and only if {αₙ} is Cauchy in the ‖·‖′-norm. In particular, (V, ‖·‖) is complete if
and only if (V, ‖·‖′) is complete.
A complete normed linear vector space is called a Banach space in honor of
the great analyst Stefan Banach. One of the most important examples of a
Banach space is ℝ itself.

Theorem 4.4: ℝ is a Banach space.

Proof: Let {xₙ} be a Cauchy sequence in ℝ. Then {xₙ} is bounded by Lemma
4.2(b). Then Theorem 3.20 implies {xₙ} contains a convergent subsequence {x_{nₖ}}.
But then 4.2(c) implies {xₙ} converges. □

We can follow the same sort of argument given in Theorem 3.21, and get the
following corollary to Theorem 4.4:

Corollary 4.5: ℝⁿ is a Banach space.

Proof: Any norm on ℝⁿ is equivalent to the sum norm ‖·‖₁. Hence, it suffices to
show (ℝⁿ, ‖·‖₁) is complete. If {αₖ = (x_{1k},…, x_{nk})} is a Cauchy sequence in ℝⁿ,
then {βₖ = (x_{1k},…, x_{n−1,k})} is a Cauchy sequence in ℝⁿ⁻¹, and {x_{nk}} is a
Cauchy sequence in ℝ. Induction then implies that {βₖ} → β for some β ∈ ℝⁿ⁻¹,
and {x_{nk}} → x for some x ∈ ℝ. Then the reader can easily check that
{αₖ} → (β, x). □

Corollary 4.6: If V is any finite-dimensional, normed linear vector space, then V
is a Banach space.

Proof: We have seen in Corollary 3.27 that V is norm isomorphic to (ℝⁿ, ‖·‖₁)
for some integer n. The result now follows from 4.5. □

If V is not finite dimensional, then V need not be complete. Consider the
following example:

Example 4.7: Let V = ⊕_{i=1}^∞ ℝ, and consider the norm ‖·‖_∞ given in equation
1.8(c). We claim that V is not complete in the ‖·‖_∞-norm. To see this, let
αₙ = (1, 1/2, 1/3,…, 1/n, 0, 0, 0,…). If n > m, then ‖αₙ − αₘ‖_∞ = 1/(m + 1).
Thus, Lemma 4.2(d) implies {αₙ} is a Cauchy sequence in V. Clearly, {αₙ} has no
limit in V. Hence, V is not complete. □
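A hedged numerical illustration of this example (the finite-list representation is my own modelling choice, not the text's): eventually-zero sequences can be stored as finite lists padded with zeros; the computation confirms ‖αₙ − αₘ‖_∞ = 1/(m + 1), while the coordinatewise limit (1, 1/2, 1/3, …) has infinitely many nonzero entries and so lies outside V.

    def sup_diff(a, b):
        # uniform norm of the difference of two eventually-zero sequences,
        # each represented by its finite list of leading entries
        n = max(len(a), len(b))
        a = a + [0.0] * (n - len(a))
        b = b + [0.0] * (n - len(b))
        return max(abs(x - y) for x, y in zip(a, b))

    def alpha(n):
        return [1.0 / k for k in range(1, n + 1)]

    for m in range(1, 20):
        n = m + 7                     # any n > m
        assert abs(sup_diff(alpha(n), alpha(m)) - 1.0 / (m + 1)) < 1e-12

    # The would-be limit needs the entry 1/k for every k, so it is not eventually
    # zero and hence does not belong to V.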

We do not intend to embark on a full-scale study of Banach spaces. We shall


present only one last theorem, which says that every normed linear vector space
can be imbedded in a Banach space. In order to prove this assertion, we need
one preliminary result.

Lemma 4.8: Let (V, ‖·‖) be a normed linear vector space, and suppose A is a
subset of V such that Ā = V. If every Cauchy sequence in A has a limit in V, then
V is complete.

Proof: Let {αₙ} be a Cauchy sequence in V. We must argue {αₙ} has a limit in V.
Since Ā = V, each αₙ ∈ Ā. In particular, for each n ∈ ℕ, there exists a βₙ ∈ A such
that ‖αₙ − βₙ‖ < 1/n. Then ‖βₙ − βₘ‖ ≤ ‖βₙ − αₙ‖ + ‖αₙ − αₘ‖ + ‖αₘ − βₘ‖
≤ 1/n + 1/m + ‖αₙ − αₘ‖. Since {αₙ} is Cauchy, we conclude from these in-
equalities that {βₙ} is Cauchy. Our hypotheses now imply that there exists a
β ∈ V such that {βₙ} → β. But then ‖αₙ − β‖ ≤ ‖αₙ − βₙ‖ + ‖βₙ − β‖ implies
{αₙ} → β. Thus, V is complete. □
We can now state our main result.

Theorem 4.9: Let (V, ‖·‖) be a normed linear vector space. Then there exists a
Banach space (V′, ‖·‖′) and a monomorphism θ: V → V′ such that the following
properties are satisfied:

(a) ‖θ(α)‖′ = ‖α‖ for all α ∈ V.
(b) The image of θ is dense in V′, i.e., the closure of θ(V) in V′ is V′.

Furthermore, if (V″, ‖·‖″) is another Banach space admitting a monomorphism
ψ: V → V″ satisfying (a) and (b), then there exists a norm isomorphism χ: V′ → V″
such that the following diagram is commutative:

4.10:        θ
        V ------→ V′
          \       |
         ψ \      | χ
            ↘     ↓
              V″

Let us say a few words concerning Theorem 4.9 before giving its proof. A
Banach space (V′, ‖·‖′) satisfying the conditions (a) and (b) in 4.9 is called a
completion of (V, ‖·‖). The theorem guarantees that every normed linear vector
space has a completion. The second half of the theorem says that any
completion of V is unique up to norm isomorphism. Hence, we may refer to the
completion of (V, ‖·‖).
A linear transformation between normed linear vector spaces which preserves
distances is called an isometry. Thus, 4.9(a) says V is isometrically imbedded in
its completion, and 4.9(b) says that, via θ, V sits in its completion as a dense subset.

Proof of 4.9: Consider the vector space V^ℕ given in Example 1.6 of Chapter I.
V^ℕ is nothing but the set of all sequences in V with addition and scalar
multiplication defined pointwise. Let S = {{αₙ} ∈ V^ℕ | {αₙ} is Cauchy}. If {αₙ}
and {βₙ} are Cauchy sequences in V, and x, y ∈ ℝ, then clearly x{αₙ} + y{βₙ} is a
Cauchy sequence in V. Thus, S is a subspace of V^ℕ.
Let {αₙ} ∈ S. The inequality |‖αₙ‖ − ‖αₘ‖| ≤ ‖αₙ − αₘ‖ implies that {‖αₙ‖} is
a Cauchy sequence in ℝ. Since ℝ is complete, the sequence {‖αₙ‖} has a limit in
ℝ. In particular, it makes sense to talk about the limit, lim ‖αₙ‖, of the sequence
{‖αₙ‖} for any vector {αₙ} ∈ S. We can then define a function p: S → ℝ by
p({αₙ}) = lim ‖αₙ‖. The reader can easily verify that the function p satisfies the
following properties:

4.11: (1) p({αₙ}) ≥ 0
      (2) p(x{αₙ}) = |x| p({αₙ})
      (3) p({αₙ} + {βₙ}) ≤ p({αₙ}) + p({βₙ})

These inequalities hold for all {αₙ}, {βₙ} ∈ S, and all x ∈ ℝ. In the language of
Exercise 2 of Section 1, p is a seminorm on S. Note that p({αₙ}) = 0 does not
mean {αₙ} = 0. So, p is not a norm on S.
We now follow the ideas laid out in Exercise 10 of Section 2. Set
N = {{αₙ} ∈ S | p({αₙ}) = 0}. N is precisely those Cauchy sequences in V whose
norms ‖αₙ‖ have limit zero. By Exercise 10, N is a subspace of S,
and S/N is a normed linear vector space with norm given by
‖{αₙ} + N‖′ = p({αₙ}) = lim ‖αₙ‖. Set V′ = S/N. We claim (V′, ‖·‖′) is a
Banach space satisfying (a) and (b).
We first define a map θ: V → V′ by the equation θ(α) = {αₙ} + N, where {αₙ}
is the constant sequence αₙ = α for all n ∈ ℕ. We shall need some special
notation for constant sequences in this proof. We shall let {α} denote the
constant sequence in V every term of which is α. Clearly, any constant sequence
{α} is Cauchy, and, thus, {α} ∈ S. The map θ is now given by θ(α) = {α} + N.
Clearly, θ is a linear transformation from V to V′.
Suppose θ(α) = 0 for some α ∈ V. Then {α} + N = N in V′. Thus, {α} ∈ N. But
then ‖α‖ = lim ‖α‖ = 0. Therefore, α = 0 since ‖·‖ is a norm on V. We conclude
that θ is a monomorphism from V to V′. Since ‖θ(α)‖′ = ‖{α} + N‖′ =
lim ‖α‖ = ‖α‖, we see θ is an isometry. Thus, we have established (a).
In order to establish (b), let us first observe that we have the following
commutative diagram of seminormed vector spaces and linear maps:

4.12:          θ₀
        (V, ‖·‖) ------→ (S, p)
                \         |
               θ \        | π
                  ↘       ↓
                   (V′, ‖·‖′)

In this diagram, θ₀ is the constant sequence map given by θ₀(α) = {α}. π is the
natural projection given by π({αₙ}) = {αₙ} + N. Note that these maps are
"isometries" of the seminorm spaces in question. We have p(θ₀(α)) = ‖α‖ for all
α ∈ V. Also, ‖π({αₙ})‖′ = p({αₙ}) for any {αₙ} ∈ S.
To argue 4.9(b), we must show that every ball B_r({αₙ} + N) around a point
{αₙ} + N in V′ has a nontrivial intersection with Im(θ). We claim that it is
enough to prove this assertion on the θ₀-level of diagram 4.12. Hence, we claim
that 4.9(b) follows from our next assertion.

4.13: For every {αₙ} ∈ S, and for every r > 0, there exists a vector η ∈ V such that
p(θ₀(η) − {αₙ}) ≤ r.

To prove 4.13, fix r > 0 and {αₙ} ∈ S. Since {αₙ} is a Cauchy sequence in V,
there exists an m ∈ ℕ such that n ≥ m ⇒ ‖αₙ − αₘ‖ < r. Set η = αₘ. Then
p(θ₀(η) − {αₙ}) = p({αₘ − αₙ}) = lim ‖αₘ − αₙ‖ ≤ r. (The limit here is
as n → ∞.) This proves 4.13 and completes the proof of 4.9(b).
It now remains to show that (V′, ‖·‖′) is complete. For this, it suffices by
Lemma 4.8 to show that every Cauchy sequence in θ(V) has a limit in V′. Let
{δₙ} be a Cauchy sequence in θ(V), say δₙ = θ(αₙ) = {αₙ} + N, where α₁, α₂, α₃, … are
vectors in V. Now ‖δₙ − δₘ‖′ = ‖θ(αₙ − αₘ)‖′ = ‖π(θ₀(αₙ − αₘ))‖′ =
p(θ₀(αₙ) − θ₀(αₘ)). Thus, {θ₀(αₙ)} is a Cauchy sequence in the seminorm space S.
[For the seminorm space (S, p), a sequence {ξₙ} in S is said to be Cauchy if for
every r > 0, there exists an m such that s, t ≥ m ⇒ p(ξₛ − ξₜ) < r.] Each θ₀(αₙ) is
the constant sequence {αₙ} in V. Thus, {θ₀(αₙ)} is a Cauchy sequence of
constant sequences in S. We need the following fact:

4.14: Every Cauchy sequence {θ₀(αₙ)} in θ₀(V) has a limit in S, that is, there
exists a vector {βₖ} ∈ S such that lim_{n→∞} p(θ₀(αₙ) − {βₖ}) = 0.

To prove 4.14, we first note that {θ₀(αₙ)} being Cauchy in S means
0 = lim_{n,m→∞} p(θ₀(αₙ) − θ₀(αₘ)) = lim_{n,m→∞} ‖αₙ − αₘ‖. In particular, {αₙ} is a
Cauchy sequence in V. Thus, γ = (α₁, α₂, α₃, …) ∈ S. We claim the sequence {θ₀(αₘ)}
converges to γ. For any fixed m ∈ ℕ, we have p(θ₀(αₘ) − γ) =
lim_{n→∞} ‖αₘ − αₙ‖. Since {αₙ} is a Cauchy sequence in V, this last
limit can be made as small as we please by choosing m large. Therefore,
{θ₀(αₘ)} → γ. This completes the proof of 4.14.
If we now apply 4.14 to the sequence {θ₀(αₙ)}, we see {θ₀(αₙ)} → γ in S.
Using diagram 4.12, it easily follows that {δₙ} → γ + N in V′. Thus, the
hypotheses of Lemma 4.8 are satisfied. We conclude that (V′, ‖·‖′) is complete.
The fact that a completion of V is unique up to a norm isomorphism χ
satisfying 4.10 is straightforward. We leave the details to the exercises at the end
of this section. □

EXERCISES FOR SECTION 4

(1) Prove the six assertions in Lemma 4.2.

(2) Give an example in R showing 4.2(f) is not true in general for continuous
functions.
(3) Fill out the details in the proof of Corollary 4.5.
(4) Show that C([0, 1]) is a Banach space with respect to the uniform norm
‖f‖_∞ = sup{|f(t)| | t ∈ [0, 1]}.

(5) Suppose (V₁, ‖·‖₁), …, (Vₙ, ‖·‖ₙ) are a finite number of Banach spaces.
Show that the product V₁ × ⋯ × Vₙ is a Banach space relative to any
product norm.

(6) Suppose V is a normed linear vector space and W is a Banach space. Show
that ℬ(V, W) is a Banach space with respect to the uniform norm (see
Definition 1.23).
(7) In the proof of Theorem 4.9, show that assertion 4.13 indeed implies the
closure of θ(V) in V′ is all of V′.
(8) Complete the proof of Theorem 4.9 by showing that the completion
(V′, ‖·‖′) of V is unique up to a norm isomorphism satisfying 4.10.

(9) Suppose (V, ‖·‖) is a Banach space, and V itself is also an algebra over ℝ.
We say (V, ‖·‖) is a Banach algebra if the following two properties are
satisfied:
(i) ‖αβ‖ ≤ ‖α‖ ‖β‖ for all α, β ∈ V.
(ii) ‖1‖ = 1.
If V is a Banach space, show that ℬ(V, V) is a Banach algebra with
respect to the uniform norm.
(10) Suppose (V, ‖·‖) is a Banach algebra.
(a) If α ∈ V has ‖α‖ < 1, then show 1 − α is invertible in V. More precisely,
show (1 − α)⁻¹ = Σ_{n=0}^∞ αⁿ. (Recall that an element α in an algebra V is
invertible if there exists an element β ∈ V such that αβ = βα = 1.)
(b) Let U = {α ∈ V | α is invertible}. Show that U is a nonempty, open
subset of V. Is U a subspace of V?
(c) With U as above, show the map α ↦ α⁻¹ is a continuous map on U.
(d) Deduce the following theorem: if V is a Banach space, then the
invertible transformations in ℬ(V, V) form a nonempty, open subset.
The map T ↦ T⁻¹ is continuous on this open set.
(11) Suppose (V, ‖·‖) is a Banach algebra. Show that the multiplication map
(α, β) ↦ αβ is a bounded bilinear map from V × V to V.

(12) Show that (C([0, 1]), ‖·‖) is not a Banach space (‖·‖ as in 1.6(b)).

(13) Let (V, ‖·‖) be a Banach space. Suppose W is a closed subspace of V. Prove
that (W, ‖·‖) is a Banach space.
(14) Suppose (V, ‖·‖) is a normed linear vector space. If V is sequentially
compact, prove that (V, ‖·‖) is a Banach space. Is the converse true?
(15) Let (V, ‖·‖) be a normed linear vector space. Let {αₙ} be a sequence in V.
We say the infinite series Σ_{n=1}^∞ αₙ converges (to β ∈ V, say) if {sₙ} → β. Here
{sₙ} is the usual sequence of partial sums given by sₙ = Σ_{k=1}^n αₖ. We say
Σ_{n=1}^∞ αₙ is absolutely convergent if Σ_{n=1}^∞ ‖αₙ‖ converges. If (V, ‖·‖) is a
Banach space, prove that every absolutely convergent series converges.
(16) Prove the converse of Exercise 15: If every absolutely convergent series in V
is convergent, then V is a Banach space.
(17) Use Exercise 16 to show that if N is a closed subspace of a Banach space V,
then V/N is a Banach space. The norm on V/N is given in Exercise 13 of
Section 2.
Chapter V

Inner Product Spaces

1. REAL INNER PRODUCT SPACES

In this chapter, we return to the material of Section 7 of Chapter I. We want to


study inner products for both real and complex vector spaces. In this section and
the next, we shall concentrate solely on real inner product spaces. Later, we shall
modify our definitions and results for complex spaces. Throughout this section
then, V will denote a vector space over the real numbers R. We do not assume V
is finite dimensional over R.
The reader will recall that an inner product on V is a bilinear form
ω: V × V → ℝ, which is symmetric and whose associated quadratic form is
positive definite. If ω is an inner product on V, then we shall shorten our
notation and write ω(α, β) = ⟨α, β⟩. Thus, we can rewrite Definition 7.12 of
Chapter I using this new notation as follows:

Definition 1.1: Let V be a real vector space. An inner product on V is a function
⟨ , ⟩: V × V → ℝ satisfying the following conditions:

(a) ⟨xα + yβ, γ⟩ = x⟨α, γ⟩ + y⟨β, γ⟩.
(b) ⟨γ, xα + yβ⟩ = x⟨γ, α⟩ + y⟨γ, β⟩.
(c) ⟨α, β⟩ = ⟨β, α⟩.
(d) ⟨α, α⟩ is a positive real number for any α ≠ 0.

Conditions (a)–(d) in 1.1 are to hold for all vectors α, β, γ in V and for all x,
y ∈ ℝ. Note that (a) and (d) imply that ⟨α, α⟩ ≥ 0 with equality if and only if
α = 0.

A vector space V together with some inner product ⟨ , ⟩ on V will be called
an inner product space. Of course, the same space V may be regarded as an inner
product space in many different ways. More precisely then, an inner product
space is an ordered pair (V, ⟨ , ⟩) consisting of a real vector space V and a real
valued function ⟨ , ⟩: V × V → ℝ satisfying the conditions in 1.1. Let us review
some of the examples from Chapter I.

Example 1.2: Let V = ℝⁿ, and set ⟨α, β⟩ = Σⁿᵢ₌₁ xᵢyᵢ, where α = (x₁,…, xₙ)
and β = (y₁,…, yₙ). It is a simple matter to check that ⟨ , ⟩ satisfies
conditions (a)–(d) in 1.1. We shall refer to this particular inner product as the
standard inner product on ℝⁿ. □

Example 1.3: Let V = ⊕_{i=1}^∞ ℝ. Define an inner product on V by setting
⟨α, β⟩ = Σ_{i=1}^∞ xᵢyᵢ. Here α = (x₁, x₂,…) and β = (y₁, y₂,…). Since any vector
in V has only finitely many nonzero components, ⟨ , ⟩ is clearly an inner
product on V. Thus, V is an example of an infinite-dimensional inner product
space. □

Example 1.4: Let V = C([a, b]). Set ⟨f, g⟩ = ∫_a^b f(x)g(x) dx. An easy computation
shows ⟨ , ⟩ is an inner product on V. □

A less familiar example is the normed linear vector space mentioned in
Example 1.9 of Chapter IV.

Example 1.5: Let V = {(x₁, x₂,…) | Σ_{i=1}^∞ xᵢ² < ∞}. We can define an inner
product on V by setting ⟨α, β⟩ = Σ_{i=1}^∞ xᵢyᵢ. We ask the reader to verify
that with this definition of ⟨ , ⟩, (V, ⟨ , ⟩) is an inner product space. This
space is usually denoted by ℓ² in the literature. □

Let (V, ⟨ , ⟩) be an inner product space. If T: W → V is an injective linear
transformation, then we can define an inner product on W by setting
⟨α, β⟩′ = ⟨T(α), T(β)⟩ for all α, β ∈ W. In this way, we can produce many new
examples from the examples we already have. A special case of this procedure is
that in which W is a subspace of V. If we restrict ⟨ , ⟩ to W × W, then
(W, ⟨ , ⟩) becomes an inner product space in its own right. For instance, ℝⁿ is a
subspace of V in Example 1.3. When we restrict ⟨ , ⟩ to ℝⁿ × ℝⁿ, we get the
standard inner product on ℝⁿ.
Our first general fact about an inner product space (V, ⟨ , ⟩) is that V has a
natural normed linear vector space structure that is intimately related to the
inner product ⟨ , ⟩. To see this, we need an inequality known as Schwarz's
inequality.

Lemma 1.6: Let V be an inner product space. Then |⟨α, β⟩| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2}
for all α, β ∈ V.

Proof: Fix α and β in V. For each real number t, let p(t) = ⟨α − tβ, α − tβ⟩.
Then 1.1 implies that p(t) is a quadratic function in t such that p(t) ≥ 0 for all
t ∈ ℝ. It follows that the discriminant, 4⟨α, β⟩² − 4⟨α, α⟩⟨β, β⟩, of p(t) must be
negative or zero. Thus, ⟨α, β⟩² ≤ ⟨α, α⟩⟨β, β⟩. Taking square roots gives us the
desired conclusion. □

We take this opportunity to point out that although Schwarz's inequality is
easy to prove, its conclusions in specific examples are not at all obvious. In
Example 1.2, for instance, the inequality becomes

1.7: (Σⁿᵢ₌₁ xᵢyᵢ)² ≤ (Σⁿᵢ₌₁ xᵢ²)(Σⁿᵢ₌₁ yᵢ²).

In 1.7, x₁, y₁,…, xₙ, yₙ are arbitrary real numbers.
In Example 1.4, Schwarz's inequality becomes

1.8: (∫_a^b f(x)g(x) dx)² ≤ (∫_a^b f(x)² dx)(∫_a^b g(x)² dx).
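A quick numerical sanity check of inequality 1.7 (an aside I am adding, not part of the text): generate random real tuples and confirm the squared dot product never exceeds the product of the sums of squares.

    import random

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    n = 6
    for _ in range(10000):
        x = [random.uniform(-5, 5) for _ in range(n)]
        y = [random.uniform(-5, 5) for _ in range(n)]
        assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9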
Our most important application of Lemma 1.6 is the following corollary:

Corollary 1.9: Let (V, ⟨ , ⟩) be an inner product space. Then ‖α‖ = ⟨α, α⟩^{1/2} is
a norm on V.

Proof: We must verify that ‖·‖ satisfies the conditions in Definition 1.1 of
Chapter IV. Of these conditions, only the triangle inequality,
‖α + β‖ ≤ ‖α‖ + ‖β‖, requires any proof. Fix α and β in V. Using Schwarz's
inequality, we have ‖α + β‖² = ⟨α + β, α + β⟩ = ⟨α, α⟩ + 2⟨α, β⟩ + ⟨β, β⟩
≤ ‖α‖² + 2|⟨α, β⟩| + ‖β‖² ≤ ‖α‖² + 2‖α‖ ‖β‖ + ‖β‖² = (‖α‖ + ‖β‖)². If we
now take the square root of both sides of this inequality, we get ‖α + β‖ ≤
‖α‖ + ‖β‖. □

Corollary 1.9 implies that every inner product space (V, ⟨ , ⟩) is a normed
linear vector space via ‖α‖ = ⟨α, α⟩^{1/2}. We shall call the norm given in 1.9 the
norm associated with the inner product ⟨ , ⟩. In Example 1.2, for instance, the
norm associated with the standard inner product is just the ordinary Euclidean
norm given in equation 1.4(b) of Chapter IV. In Example 1.3, the norm
associated with ⟨ , ⟩ is the natural extension of the Euclidean norm to ⊕_{i=1}^∞ ℝ
[1.8(b) of Chapter IV]. In Example 1.4, the norm associated with the inner
product there is given by 1.6(b) of Chapter IV.
Since any inner product space (V, ⟨ , ⟩) is a normed linear vector space with
respect to the norm associated with ⟨ , ⟩, we have all the topological
machinery from Chapter IV at our disposal. In particular, we can talk about the
distance between two vectors, open and closed sets, continuity, limits,
and so on. It will be understood that these notions are all relative to the
ness,
norm associated with the inner product ( , >. Thus, when we speak of the
distance between two vectors and fi, for instance, we mean

When dealing with inner product spaces, we shall always use the symbol II
to represent the norm associated with the inner product, that is, = 2>1/2.
In terms of the associated norm, Schwarz's inequality can be rewritten as
follows:

Theorem 1.10 (Schwarz's Inequality): Let (V, ( , )) be an inner product space


with associated norm 1
Then (cx, /1>1 fl
There is an important corollary that follows from Theorem 1.10.
Corollary 1.11: Let (V, ⟨ , ⟩) be an inner product space with associated norm
‖ ‖. Then the inner product ⟨ , ⟩: V × V → ℝ is a continuous function with
respect to any product norm on V × V.

Proof: By Theorem 2.7 of Chapter IV, any two product norms on V × V are
equivalent. So, we might as well prove the assertion for the sum norm
‖(α, β)‖ = ‖α‖ + ‖β‖ on V × V. In order to show ⟨ , ⟩ is continuous, it
suffices by Lemma 3.7 of Chapter IV to show that if {(αₙ, βₙ)} → (α, β) in V × V,
then {⟨αₙ, βₙ⟩} → ⟨α, β⟩ in ℝ.

So, suppose {(αₙ, βₙ)} → (α, β) in V × V. The convergence here is relative to
the sum norm. Consequently, {αₙ} → α and {βₙ} → β in V. These statements
follow from Corollary 3.9 of Chapter IV. Since {αₙ} converges, the
sequence {‖αₙ‖} is bounded. Suppose ‖αₙ‖ ≤ c for all n ∈ ℕ. Applying Schwarz's
inequality, we have

|⟨αₙ, βₙ⟩ − ⟨α, β⟩| ≤ |⟨αₙ, βₙ⟩ − ⟨αₙ, β⟩| + |⟨αₙ, β⟩ − ⟨α, β⟩|
≤ ‖αₙ‖‖βₙ − β‖ + ‖αₙ − α‖‖β‖ ≤ c‖βₙ − β‖ + ‖αₙ − α‖‖β‖

From this inequality, it is now obvious that {⟨αₙ, βₙ⟩} → ⟨α, β⟩ in ℝ. □


In the sequel, we shall often use the following special case of 1.11. Suppose in
some inner product space (V, ⟨ , ⟩) we have {αₙ} → α. Here it is understood that the
convergence is relative to the norm associated to ⟨ , ⟩. Then for any β ∈ V,
{⟨αₙ, β⟩} → ⟨α, β⟩ in ℝ. This conclusion follows immediately from applying the
continuous function ⟨ , ⟩ to the convergent sequence {(αₙ, β)} → (α, β) in
V × V.
We have seen that any inner product space is a normed linear vector space.
Spaces of this type are called pre-Hilbert spaces. Let us introduce the following
more precise definition:

Definition 1.12: A normed linear vector space (V, ‖·‖) is called a pre-Hilbert
space if there exists an inner product ⟨ , ⟩ on V such that ‖α‖ = ⟨α, α⟩^{1/2} for
all α ∈ V.

Thus, (ℝⁿ, ‖·‖) (notation as in 1.4 of Chapter IV) is a finite-dimensional pre-Hilbert
space. An example of an infinite-dimensional pre-Hilbert space is given
by (C([a, b]), ‖·‖) (notation as in equation 1.6 of Chapter IV). Note that the
notion of a space being pre-Hilbert depends on the particular norm being
discussed. Unlike most of the ideas discussed in Chapter IV, the property of
being pre-Hilbert is not invariant with respect to equivalence of norms. For
example, the sum norm ‖·‖₁ on ℝⁿ is equivalent to the Euclidean norm ‖·‖.
However, (ℝⁿ, ‖·‖₁) is not a pre-Hilbert space. To see this, suppose
‖α‖₁ = ⟨α, α⟩^{1/2} for some inner product ⟨ , ⟩ on ℝⁿ. As usual, let
δ = {δ₁,..., δₙ} denote the canonical basis of ℝⁿ. Set aᵢⱼ = ⟨δᵢ, δⱼ⟩, and
let S denote the boundary of the unit ball in (ℝⁿ, ‖·‖₁). Then
S = {α = (x₁,..., xₙ) | ‖α‖₁ = 1} = {(x₁,..., xₙ) | |x₁| + ··· + |xₙ| = 1}. On the other
hand, ‖α‖₁² = ⟨α, α⟩ = Σ_{i,j} aᵢⱼxᵢxⱼ. Thus, S is the set of zeros in ℝⁿ of the
quadratic polynomial Σ_{i,j} aᵢⱼXᵢXⱼ − 1. This is clearly impossible. S has too
many corners to be the set of zeros of any polynomial. Thus, the sum norm ‖·‖₁
cannot be the associated norm of any inner product on ℝⁿ. In particular, the
normed linear vector space (ℝⁿ, ‖·‖₁) is not a pre-Hilbert space.
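An alternative way to verify this (a standard check, not the argument just given) uses the Parallelogram Law proved in Lemma 1.17(a) below, which every norm associated with an inner product must satisfy. For the sum norm on ℝⁿ with n ≥ 2, take α = δ₁ and β = δ₂. Then

‖α + β‖₁² + ‖α − β‖₁² = 2² + 2² = 8, while 2(‖α‖₁² + ‖β‖₁²) = 2(1 + 1) = 4,

so the identity ‖α + β‖² + ‖α − β‖² = 2(‖α‖² + ‖β‖²) fails for ‖·‖₁.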
Definition 1.13: A pre-Hilbert space (V, ‖·‖) is called a Hilbert space if V is
complete with respect to ‖·‖.

Thus, a Hilbert space is a Banach space whose norm is given by an inner
product. For example, (ℝⁿ, ‖·‖) is a Hilbert space. More generally, Corollaries
3.27 and 4.5 of Chapter IV imply that any finite-dimensional pre-Hilbert space is
in fact a Hilbert space. For an example of an infinite-dimensional Hilbert space,
we return to Example 1.5. We ask the reader to confirm that ℓ₂ is a Hilbert space,
infinite dimensional over ℝ. (See Exercise 2 at the end of this section.)
An important point here when dealing with pre-Hilbert spaces is the analog
of Theorem 4.9 of Chapter IV.
Theorem 1.14: Let (V, ‖·‖) be a pre-Hilbert space. Then there exists a Hilbert
space (V′, ‖·‖′) and an isometry θ: V → V′ such that the closure of θ(V) in V′ is all
of V′. Furthermore, if (V″, ‖·‖″) is a second Hilbert space admitting an isometry
ψ: V → V″ such that ψ(V) is dense in V″, then there exists a norm isomorphism
χ: V′ → V″ such that χθ = ψ.

Proof: We shall not use this theorem in the rest of this text. Hence, we define V′
and leave the rest of the details to the exercises. Let (V′, ‖·‖′) denote the
completion of V constructed in Theorem 4.9 of Chapter IV. We define an inner
product on V′ = S/N by the following formula:

1.15: ⟨{αₙ} + N, {βₙ} + N⟩ = lim_{n→∞} ⟨αₙ, βₙ⟩

The reader can easily argue that this formula is well defined and gives an inner
product on V′ whose associated norm is ‖·‖′. □

Throughout the rest of this section, (V, ‖·‖) will denote a pre-Hilbert space.
Let us recall a few familiar definitions from the calculus.

Definition 1.16: (a) Two vectors α and β in V are said to be orthogonal if
⟨α, β⟩ = 0. If α and β are orthogonal, we shall indicate this
by writing α ⊥ β.
(b) Two subsets A and B of V are orthogonal if α ⊥ β for all
α ∈ A and β ∈ B. In this case, we shall write A ⊥ B.
(c) If A ⊆ V, then A⊥ = {α ∈ V | ⟨α, β⟩ = 0 for all β ∈ A}.
(d) A set of vectors {αᵢ | i ∈ I} is said to be pairwise orthogonal if
⟨αᵢ, αⱼ⟩ = 0 whenever i ≠ j.
(e) A collection of subsets {Aᵢ | i ∈ I} is said to be pairwise
orthogonal if Aᵢ ⊥ Aⱼ whenever i ≠ j.

Note that A⊥ is a subspace of V such that A⊥ ∩ L(A) = 0. In fact, A⊥ meets
even the closure of L(A) only in 0. For suppose α lies in the closure of L(A), and β ∈ A⊥. By Lemma 3.4 of
Chapter IV, there exists a sequence {αₙ} in L(A) such that {αₙ} → α. Using the
continuity of the inner product, we have {⟨αₙ, β⟩} → ⟨α, β⟩. But ⟨αₙ, β⟩ = 0
for all n ∈ ℕ. Hence, ⟨α, β⟩ = 0. In particular, if α ∈ A⊥ lies in the closure of L(A), then ⟨α, α⟩ = 0.
Thus α = 0.
Vectors that are orthogonal behave nicely with respect to length formulas.

Lemma 1.17 (Parallelogram Law): Let α, β ∈ V. Then

(a) ‖α + β‖² + ‖α − β‖² = 2(‖α‖² + ‖β‖²).

(b) α ⊥ β if and only if ‖α + β‖² = ‖α‖² + ‖β‖².

(c) If {αᵢ | i = 1,..., n} is pairwise orthogonal, then ‖Σ_{i=1}^n αᵢ‖² = Σ_{i=1}^n ‖αᵢ‖².

Proof: (a) ‖α + β‖² + ‖α − β‖² = ⟨α + β, α + β⟩ + ⟨α − β, α − β⟩ = 2⟨α, α⟩
+ 2⟨β, β⟩ = 2(‖α‖² + ‖β‖²).
(b) ‖α + β‖² = ⟨α + β, α + β⟩ = ‖α‖² + 2⟨α, β⟩ + ‖β‖². Thus, ‖α + β‖²
= ‖α‖² + ‖β‖² if and only if ⟨α, β⟩ = 0.
(c) This assertion follows from (b) and induction on n. □

The Parallelogram Law has an interesting corollary:

Corollary 1.18: Suppose A = {αᵢ | i ∈ I} is a set of pairwise orthogonal, nonzero
vectors in V. Then A is linearly independent over ℝ.

Proof: Suppose x₁α₁ + ··· + xₙαₙ = 0 is a linear combination of distinct vectors
α₁,..., αₙ from A. By Lemma 1.17(c), we have 0 = ‖Σ_{j=1}^n xⱼαⱼ‖² = Σ_{j=1}^n xⱼ²‖αⱼ‖². Since no
αⱼ is zero, we conclude that every xⱼ = 0. Thus, α₁,..., αₙ are linearly independent.
In particular, A is linearly independent over ℝ. □

At this point, we return to the study of least-squares problems in the context
of a pre-Hilbert space V. Suppose W is a subspace of V. Let β ∈ V. We want to
decide when the distance, d(β, W), between β and W is given by ‖α − β‖ for
some α ∈ W. We first note that d(β, W) may not equal ‖α − β‖ for any vector
α ∈ W. We had seen an instance of this in Example 3.35 of Chapter IV for the
normed linear vector space (V = {f ∈ C([0, 1]) | f(0) = 0}, ‖·‖). Unfortunately,
that space is not a pre-Hilbert space. The reader can easily argue that its norm is not
the norm associated with any inner product on V. To produce an example that
fits our present context, we can return to Example 1.5. If we set
W = {{xₙ} ∈ ℓ₂ | xₙ = 0 for all n sufficiently large}, then W is a subspace of ℓ₂. Let
β = {1/n}. Then β ∈ ℓ₂ − W. The reader can easily check that d(β, W) = 0. (In
fact, the closure of W is ℓ₂.) Thus, d(β, W) = 0, but 0 ≠ ‖α − β‖ for any α ∈ W. We ask the
reader to verify these remarks in Exercise 7 at the end of this section.

Thus, in a pre-Hilbert space V, a given subspace W may contain no vector α
that is closest to β in the sense that d(β, W) = ‖α − β‖. However, if W does
contain a vector α such that ‖α − β‖ = d(β, W), then we can give a nice
geometric characterization of α.

Theorem 1.19: Let (V, ‖·‖) be a pre-Hilbert space. Suppose W is a subspace of V,
and let β ∈ V. Let α ∈ W. Then ‖α − β‖ = d(β, W) if and only if (α − β) ⊥ W.

Proof: Suppose (α − β) ⊥ W. Let δ ∈ W − {α}. Then ‖δ − β‖² = ‖(δ − α)
+ (α − β)‖² = ‖δ − α‖² + ‖α − β‖². Since α − β is orthogonal to W, this last
equality comes from 1.17(b). Taking square roots, we see ‖δ − β‖ > ‖α − β‖. In
particular, d(β, W) = inf{‖γ − β‖ | γ ∈ W} = ‖α − β‖.

Conversely, suppose ‖α − β‖ = d(β, W). Fix a vector δ ∈ W − {0}. Then for
any real number t, we have α + tδ ∈ W. Thus, ‖α − β‖² ≤ ‖α + tδ − β‖² =
⟨α − β + tδ, α − β + tδ⟩ = ‖α − β‖² + 2t⟨α − β, δ⟩ + t²‖δ‖². Thus, the
quadratic form q(t) = 2t⟨α − β, δ⟩ + t²‖δ‖² is nonnegative on ℝ. This can
only happen if the discriminant of q(t) is not positive. Thus, 4⟨α − β, δ⟩²
− 4‖δ‖²(0) ≤ 0. Hence, ⟨α − β, δ⟩ = 0. If δ = 0, then clearly ⟨α − β, δ⟩ = 0.
We conclude that α − β is orthogonal to W. □

Let us make a couple of observations about the proof of Theorem 1.19. If
β ∈ W, then of course α = β, and the result is trivial. Suppose β is not in W. If W
contains a vector α such that ‖α − β‖ = d(β, W), then α is the unique vector in
W with this property. For we have seen in the proof of 1.19 that if δ ∈ W − {α},
then ‖δ − β‖ > ‖α − β‖. Hence, if W contains a vector closest to β, then that
vector is unique. This point is a characteristic feature of pre-Hilbert space
theory. If W is a subspace of an arbitrary normed linear vector space V, then W
may contain several vectors that are closest to a given vector β. In pre-Hilbert
spaces, if W contains a vector α closest to β in the ‖·‖-norm, then α is unique. We
want to give a special name to this vector when it exists.

Definition 1.20: Let W be a subspace of the pre-Hilbert space V. Let β ∈ V. If W
contains a vector α such that (α − β) ⊥ W, then α will be called the orthogonal
projection of β onto W. In this case, we shall use the notation α = Proj_W(β) to
indicate that α is the orthogonal projection of β onto W.

We caution the reader that Proj_W(β) does not always exist. By Theorem 1.19,
Proj_W(β) [when it exists] is the unique vector in W closest to β in the ‖·‖-norm.
We have seen an example (Exercise 7 at the end of this section) that shows that in
general there is no vector in W closest to β. If Proj_W(β) does exist, then
Proj_W(β) − β is orthogonal to W. Notice also that Proj_W(β) = β if and only if
β ∈ W.
There is one important case in which Proj_W(β) always exists.

Theorem 1.21: Let W be a subspace of the pre-Hilbert space (V, ‖·‖). Suppose
(W, ‖·‖) is a Banach space. Then Proj_W(β) exists for every β ∈ V. In this case,
V = W ⊕ W⊥.

Proof: Let β ∈ V. If β ∈ W, then Proj_W(β) = β, and there is nothing to prove.
Suppose β is not in W. Set d = d(β, W). Then there exists a sequence {αₙ} in W
such that {‖αₙ − β‖} → d. We claim {αₙ} is a Cauchy sequence in W. To see
this, we first apply the Parallelogram Law. We have ‖αₙ − αₘ‖² =
‖(β − αₙ) − (β − αₘ)‖² = 2(‖β − αₙ‖² + ‖β − αₘ‖²) − ‖2β − (αₙ + αₘ)‖².
Since ‖2β − (αₙ + αₘ)‖² = 4‖β − (αₙ + αₘ)/2‖² ≥ 4d², we have ‖αₙ − αₘ‖²
≤ 2(‖β − αₙ‖² + ‖β − αₘ‖²) − 4d². The limit of this last expression is zero as
m, n go to infinity. We conclude that lim ‖αₙ − αₘ‖ = 0. This proves that
{αₙ} is a Cauchy sequence in W.

Since W is complete, there exists a vector α ∈ W such that {αₙ} → α. Then
continuity of the norm implies {‖αₙ − β‖} → ‖α − β‖. Thus, d = ‖α − β‖.
Theorem 1.19 now implies that Proj_W(β) = α.

As for the second assertion, we always have W ∩ W⊥ = 0. We need to argue
that V = W + W⊥. Let β ∈ V. From the first part of this argument, we know
α = Proj_W(β) exists. The vector α − β is an element of W⊥. Thus, β − α ∈ W⊥.
Since β = α + (β − α) ∈ W + W⊥, we conclude that V = W ⊕ W⊥. □
Note that Theorem 1.21 is a generalization of Corollary 3.34 of Chapter IV
when (V, ‖·‖) is a pre-Hilbert space. For if W is a finite-dimensional subspace of
V, then (W, ‖·‖) is norm isomorphic to (ℝⁿ, ‖·‖₁) for some n. Thus, W is complete
by Corollary 4.5 of Chapter IV. Some of the most important applications of
Theorem 1.21 arise when V itself is finite dimensional. Then every subspace W of
V is a Banach space, and, consequently, Proj_W(β) exists for every vector β ∈ V.

Let us discuss a well-known example of the above results. Suppose V is the
Hilbert space (ℝⁿ, ‖·‖) given in Example 1.2. Let W be a subspace of V, and let
β ∈ V. In this discussion, it will be convenient to identify ℝⁿ with the space of all
n × 1 matrices Mₙₓ₁(ℝ). The standard inner product on ℝⁿ is then given by the
following formula: ⟨α, β⟩ = αᵗβ. Here αᵗ is the transpose of the n × 1 matrix α,
and αᵗβ is the matrix product of αᵗ with β.

Suppose {α₁,..., αₘ} is a basis of W. Let us write each column vector αⱼ as
αⱼ = (a₁ⱼ,..., aₙⱼ)ᵗ and form the n × m matrix A = (aᵢⱼ) = (α₁ | ··· | αₘ). Then W is
just the column space of A.
Let β = (b₁,..., bₙ)ᵗ. Then finding the orthogonal projection, Proj_W(β), of β
onto W is equivalent to determining the least-squares solution to the following
system of linear equations:

1.22: AX = β

In 1.22, X = (x₁,..., xₘ)ᵗ is a column vector in Mₘₓ₁(ℝ). If β ∈ W, then the linear
system in 1.22 is consistent. In this case, equation 1.22 has a unique solution Z
since rk(A) = m. If β is not in W, then the linear system in 1.22 is inconsistent. In
either case, the words "least-squares solution" mean a Z in Mₘₓ₁(ℝ) for which
‖AZ − β‖ is as small as possible.

Now inf{‖AX − β‖ | X ∈ Mₘₓ₁(ℝ)} = inf{‖γ − β‖ | γ ∈ W} = d(β, W). Thus,
by Theorem 1.19, the least-squares solution to 1.22 is a vector Z ∈ Mₘₓ₁(ℝ) such
that AZ = Proj_W(β). Since W is finite dimensional, we know from Theorem 1.21
that Proj_W(β) exists. Since the rank of A is m, there exists a unique Z ∈ Mₘₓ₁(ℝ)
such that AZ = Proj_W(β). It is an easy matter to find a formula for Z. Since
AZ − β is orthogonal to W, we must have (AX)ᵗ(AZ − β) = 0 for every
X ∈ Mₘₓ₁(ℝ). Thus, Xᵗ(AᵗAZ − Aᵗβ) = 0 for every X. This implies that
AᵗAZ − Aᵗβ = 0. At this point, we need the following fact:

1.23: Let A ∈ Mₙₓₘ(ℝ). If the rank of A is m, then AᵗA is a nonsingular,
symmetric m × m matrix.

A proof of 1.23 is easy. We leave this as an exercise at the end of this section.
Returning to our computation, we see Z = (AᵗA)⁻¹Aᵗβ and Proj_W(β) =
A(AᵗA)⁻¹Aᵗβ. Let us summarize our results in the following theorem:

Theorem 1.24: Let A ∈ Mₙₓₘ(ℝ) with rk(A) = m. Let β ∈ ℝⁿ. Then the
least-squares solution to the linear system AX = β is given by

1.25: Z = (AᵗA)⁻¹Aᵗβ

The orthogonal projection of β onto the column space W of A is given by

1.26: Proj_W(β) = A(AᵗA)⁻¹Aᵗβ □
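The computation in Theorem 1.24 is easy to carry out numerically. The following sketch is only an illustration (it assumes the NumPy library and uses a small, hypothetical full-column-rank matrix); it computes Z from formula 1.25 and checks it against a library least-squares routine.

    import numpy as np

    A = np.array([[1., 0.],
                  [1., 1.],
                  [1., 2.],
                  [1., 3.]])          # hypothetical 4 x 2 matrix with rank 2
    beta = np.array([0., 1., 1., 3.])

    Z = np.linalg.solve(A.T @ A, A.T @ beta)   # Z = (A^t A)^{-1} A^t beta (1.25)
    proj = A @ Z                               # Proj_W(beta) = A Z (1.26)

    # The residual is orthogonal to the column space of A (Theorem 1.19).
    assert np.allclose(A.T @ (proj - beta), 0)

    # The library routine returns the same least-squares solution.
    Z_lib = np.linalg.lstsq(A, beta, rcond=None)[0]
    assert np.allclose(Z, Z_lib)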



If we look back at our discussion preceding Theorem 1.24, we see that the
hypothesis rk(A) = m was used in 1.23 to conclude that AᵗA was invertible. If A
is an arbitrary n × m matrix, then the linear system AX = β still has a least-squares
solution Z in Mₘₓ₁(ℝ). Z is not necessarily unique, but the same
analysis as before shows that Z must satisfy the following equation:

1.27: AᵗAZ = Aᵗβ

Equation 1.27 is known as the "normal equation" of the least-squares solution
to AX = β. If the rank of A is m, then equation 1.27 specializes to equation 1.25.
If the rank of A is less than m, then we need the theory of the pseudoinverse of A
to construct Z.
Let us now return to our general setting of an arbitrary pre-Hilbert space
(V, ‖·‖). Suppose W is a subspace of V for which Proj_W(β) exists for every β ∈ V.
For example, W could be a finite-dimensional subspace of V. Then Theorem
1.21 implies that V = W ⊕ W⊥. Thus, the map Proj_W(·): V → W is just the
natural projection of V onto W determined by the direct sum decomposition
V = W ⊕ W⊥ (see Section 4 of Chapter I). In particular, Proj_W(·) satisfies the
usual properties of a projection map:

1.28: (a) Proj_W(·) ∈ Hom_ℝ(V, W).
(b) Proj_W(β) = β if and only if β ∈ W.
(c) Proj_W(·) is an idempotent endomorphism of V.

Thus, if Proj_W(β) exists for every β ∈ V, then W⊥ is a complement of W in V. The
converse of this statement is true also. If W⊥ is a complement of W, that is,
V = W ⊕ W⊥, then Proj_W(β) exists for every β ∈ V. To see this, let β ∈ V. Write
β = α + δ with α ∈ W and δ ∈ W⊥. Then α − β = −δ ∈ W⊥. Thus, α − β is
orthogonal to W. Therefore, Proj_W(β) = α by 1.20.
A careful examination of the coefficients of Proj_W(β) relative to some basis in
W leads to the theory of Fourier coefficients. We need two preliminary results.

Lemma 1.29: Let (V, ‖·‖) be a pre-Hilbert space. If W₁,..., Wₙ are pairwise
orthogonal subspaces of V, then W₁ + ··· + Wₙ = W₁ ⊕ ··· ⊕ Wₙ. In addition,
suppose Proj_{Wᵢ}(β) exists for every β ∈ V and every i = 1,..., n. Set
W = W₁ + ··· + Wₙ. Then Proj_W(β) exists and is given by the following
formula:

1.30: Proj_W(β) = Σ_{i=1}^n Proj_{Wᵢ}(β)

Proof: In order to show W₁ + ··· + Wₙ = W₁ ⊕ ··· ⊕ Wₙ, we must argue that
Wᵢ ∩ {Σ_{j≠i} Wⱼ} = 0 for every i = 1,..., n. This will follow from the fact that
the Wₖ are pairwise orthogonal. Fix i, and let γ ∈ Wᵢ ∩ {Σ_{j≠i} Wⱼ}. Then
γ = Σ_{j≠i} γⱼ. Here γⱼ ∈ Wⱼ for all j ≠ i. Then ⟨γ, γ⟩ =
Σ_{j≠i} ⟨γ, γⱼ⟩ = 0. Hence γ = 0.

To prove 1.30, let β ∈ V. For each i = 1,..., n, set αᵢ = Proj_{Wᵢ}(β). Then
set α = Σ_{i=1}^n αᵢ. We claim (α − β) ⊥ W. To see this, let
γ ∈ W. Then γ = γ₁ + ··· + γₙ, where γᵢ ∈ Wᵢ for all i = 1,..., n. So,
⟨α − β, γ⟩ = Σ_{i=1}^n ⟨α − β, γᵢ⟩. Here ⟨αⱼ, γᵢ⟩ = 0 whenever j ≠ i because the subspaces Wₖ are
pairwise orthogonal. Hence ⟨α − β, γᵢ⟩ = ⟨αᵢ − β, γᵢ⟩ = 0 because (αᵢ − β) is orthogonal to Wᵢ for all
i = 1,..., n. Thus, (α − β) ⊥ W. This means that Proj_W(β) exists and equals α. Therefore,
Proj_W(β) = α = Σ_{i=1}^n Proj_{Wᵢ}(β). This establishes formula 1.30 and completes the
proof of the lemma. □

Lemma 1.31: Let α be a nonzero vector in a pre-Hilbert space V. Then the
orthogonal projection of V onto ℝα is given by the following formula:

1.32: Proj_{ℝα}(β) = (⟨β, α⟩/⟨α, α⟩)α

Proof: The projection of β onto ℝα is xα for some x ∈ ℝ. We also know that
⟨xα − β, α⟩ = 0. Solving this equation for x gives the desired result. □

The scalar ⟨β, α⟩/⟨α, α⟩ appearing in 1.32 is called the Fourier coefficient of
β with respect to α. We can combine the last two lemmas in the following
theorem:

Theorem 1.33: Let {αᵢ | i = 1,..., n} be a collection of pairwise orthogonal,
nonzero vectors in the pre-Hilbert space V. Set W = L({α₁,..., αₙ}). Then the
orthogonal projection of V onto W is given by the following formula:

1.34: Proj_W(β) = Σ_{i=1}^n (⟨β, αᵢ⟩/⟨αᵢ, αᵢ⟩)αᵢ

Proof: Since W = ℝα₁ ⊕ ··· ⊕ ℝαₙ, we can apply the last two lemmas and get
the result. □
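As a small numerical illustration of formula 1.34 (a sketch only; it assumes NumPy and uses two arbitrarily chosen orthogonal vectors in ℝ³), the projection is assembled directly from the Fourier coefficients:

    import numpy as np

    a1 = np.array([1., 1., 0.])        # pairwise orthogonal, nonzero vectors
    a2 = np.array([1., -1., 0.])
    beta = np.array([2., 3., 5.])

    # Formula 1.34: sum over i of (<beta, a_i>/<a_i, a_i>) a_i.
    proj = sum((np.dot(beta, a) / np.dot(a, a)) * a for a in (a1, a2))

    # beta - proj is orthogonal to W = L({a1, a2}), as Theorem 1.19 requires.
    assert np.allclose(np.dot(beta - proj, a1), 0)
    assert np.allclose(np.dot(beta - proj, a2), 0)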

Note then that if W has a basis {α₁,..., αₙ} consisting of pairwise orthogonal
vectors, then the vector in W closest to a given vector β is obtained by using the
Fourier coefficients of β with respect to α₁,..., αₙ. The formula for Proj_W(β)
given in 1.34 makes it clear that orthogonal bases of W are very useful. Let us
introduce the following definition:

Definition 1.35: A set of pairwise orthogonal vectors {αᵢ | i ∈ I} is said to be
orthonormal if ‖αᵢ‖ = 1 for every i ∈ I.

A basis {αᵢ | i ∈ I} of some subspace W is said to be an orthonormal basis
if the set is orthonormal. Thus, an orthonormal basis of W is a basis consisting
of pairwise orthogonal vectors whose lengths are one. Any finite-dimensional
subspace of V has an orthonormal basis. In fact, we have the following slightly
stronger result, which is called the Gram–Schmidt theorem:

Theorem 1.36: Let {αₙ} be a finite or infinite sequence of linearly independent
vectors in the pre-Hilbert space V. Then there exists an orthonormal set
{φₙ} such that L({α₁,..., αₙ}) = L({φ₁,..., φₙ}) for every n.

Proof: The finite sequence case is included in the infinite argument. Hence, we
assume {αₙ} is an infinite sequence of linearly independent vectors in V. We shall
construct a sequence of pairwise orthogonal, nonzero vectors {ηₙ} such that
L({α₁,..., αₙ}) = L({η₁,..., ηₙ}) for every n ∈ ℕ. Having done this, then
{φₙ = ηₙ/‖ηₙ‖} is the required orthonormal set in the theorem.

To construct the sequence {ηₙ}, we proceed as follows: Fix n ∈ ℕ. Define n
vectors η₁,..., ηₙ in V inductively by the formulas below.

1.37: η₁ = α₁
η₂ = α₂ − (⟨α₂, η₁⟩/⟨η₁, η₁⟩)η₁
⋮
ηₙ = αₙ − Σ_{i=1}^{n−1} (⟨αₙ, ηᵢ⟩/⟨ηᵢ, ηᵢ⟩)ηᵢ

It is obvious from the nature of the equations in 1.37 that L({α₁,..., αᵢ})
= L({η₁,..., ηᵢ}) for i = 1,..., n. In particular, the vectors ηᵢ are
nonzero and linearly independent for all i = 1,..., n. It is also obvious that
each ηᵢ is orthogonal to the ηₖ defined before it. This implies that
η₁,..., ηₙ are pairwise orthogonal.

Now in the last paragraph, n was a fixed natural number. Strictly speaking,
the vectors η₁,..., ηₙ constructed in equation 1.37 depend on n and should be
labeled more explicitly as η₁ₙ,..., ηₙₙ. Thus, for each n ∈ ℕ, we have constructed
a function gₙ: {1,..., n} → V given by gₙ(i) = ηᵢₙ for i = 1,..., n. The important
point to note here is that when n ≤ m, the vectors η₁ₘ,..., ηₙₘ are precisely the
same as η₁ₙ,..., ηₙₙ. Thus, the functions gₙ and gₘ agree on their common
domain. Hence, it makes sense to define a function f: ℕ → V by f(i) = gₙ(i) for any
n ≥ i. Let the sequence {ηᵢ} be the function f. Thus, ηᵢ = f(i), i ∈ ℕ. Then
the first n vectors of {ηᵢ} are precisely the vectors listed in equation 1.37. So, for
every n ∈ ℕ, we have L({α₁,..., αₙ}) = L({η₁,..., ηₙ}). This completes the proof
of the theorem. □
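The construction in equations 1.37 translates directly into code. The following sketch (assuming NumPy; the input vectors are hypothetical and assumed linearly independent) produces the orthonormal vectors φᵢ = ηᵢ/‖ηᵢ‖ of Theorem 1.36 for vectors in ℝⁿ:

    import numpy as np

    def gram_schmidt(vectors):
        # 'vectors' is a list of linearly independent 1-D arrays.
        etas = []
        for a in vectors:
            # eta_n = alpha_n - sum of (<alpha_n, eta_i>/<eta_i, eta_i>) eta_i  (1.37)
            eta = a - sum((np.dot(a, e) / np.dot(e, e)) * e for e in etas)
            etas.append(eta)
        # phi_i = eta_i / ||eta_i||
        return [e / np.linalg.norm(e) for e in etas]

    phis = gram_schmidt([np.array([1., 1., 0.]),
                         np.array([1., 0., 1.]),
                         np.array([0., 1., 1.])])
    G = np.array([[np.dot(p, q) for q in phis] for p in phis])
    assert np.allclose(G, np.eye(3))   # the phi_i are orthonormal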

Of course, Theorem 1.36 implies that every finite-dimensional subspace W of
a pre-Hilbert space V has an orthonormal basis {φ₁,..., φₙ}. In terms of this
basis, the orthogonal projection of V onto W is given by the following formula:

1.38: Proj_W(β) = Σ_{i=1}^n ⟨β, φᵢ⟩φᵢ

We shall finish this section with a brief description of what analysts call a
"basis" when dealing with Hilbert spaces. We need a formal name for the type of
sequence constructed in Theorem 1.36.

Definition 1.39: A sequence {φᵢ} in a pre-Hilbert space V is called an
orthonormal sequence if the set {φᵢ | i ∈ ℕ} is an orthonormal set.

Thus, a sequence {φᵢ} is orthonormal if ‖φᵢ‖ = 1 for every i ∈ ℕ, and
{φᵢ | i ∈ ℕ} is a set of pairwise orthogonal vectors in V. In particular, if {φᵢ} is an
orthonormal sequence in V, then no φᵢ is zero, and the set {φᵢ | i ∈ ℕ} is linearly
independent over ℝ. Hence, V contains an orthonormal sequence if and only if
dim_ℝ(V) = ∞. There are two famous results about orthonormal sequences,
which are used far more in analysis than in algebra.

Theorem 1.40: Let {φᵢ} be an orthonormal sequence in the pre-Hilbert space V.
Let δ ∈ V, and set xᵢ = ⟨δ, φᵢ⟩ for every i ∈ ℕ. Then

(a) (Bessel's Inequality): Σ_{i=1}^∞ xᵢ² ≤ ‖δ‖².

(b) (Parseval's Equation): δ = Σ_{i=1}^∞ xᵢφᵢ if and only if Σ_{i=1}^∞ xᵢ² = ‖δ‖².

Proof: (a) For each n ∈ ℕ, set aₙ = Σ_{i=1}^n xᵢφᵢ. Then by equation 1.38,
(δ − aₙ) ⊥ L({φ₁,..., φₙ}). In particular, δ − aₙ and aₙ are orthogonal.
Hence, the Parallelogram Law (1.17(b)) implies ‖δ‖² =
‖(δ − aₙ) + aₙ‖² = ‖δ − aₙ‖² + ‖aₙ‖² = ‖δ − aₙ‖² + Σ_{i=1}^n xᵢ². We
conclude that Σ_{i=1}^n xᵢ² ≤ ‖δ‖² for every n. Thus, the infinite series
Σ_{i=1}^∞ xᵢ² is absolutely convergent with sum no bigger than ‖δ‖². In
symbols, Σ_{i=1}^∞ xᵢ² ≤ ‖δ‖².
(b) For Parseval's equation, we first note that δ = Σ_{i=1}^∞ xᵢφᵢ means the
sequence {aₙ} converges to δ. Now we had established in the proof of
(a) that ‖δ‖² = ‖δ − aₙ‖² + Σ_{i=1}^n xᵢ². Thus, {aₙ} → δ if and only if
Σ_{i=1}^∞ xᵢ² = lim Σ_{i=1}^n xᵢ² = ‖δ‖². □
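For a concrete illustration (this is not one of the text's numbered examples), take V = C([−π, π]) with the inner product of Example 1.4 and the orthonormal sequence φₙ(x) = sin(nx)/√π. For δ(x) = x, the Fourier coefficients are xₙ = ⟨δ, φₙ⟩ = (1/√π)∫_{−π}^{π} x sin(nx) dx = 2√π(−1)^{n+1}/n, so that, using the classical sum Σ_{n=1}^∞ 1/n² = π²/6,

Σ_{n=1}^∞ xₙ² = 4π Σ_{n=1}^∞ 1/n² = 2π³/3 = ∫_{−π}^{π} x² dx = ‖δ‖².

Bessel's inequality thus holds here with equality, and by Parseval's equation the partial sums Σ_{n=1}^N xₙφₙ converge to δ in the associated norm.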
The infinite series Σ_{i=1}^∞ ⟨δ, φᵢ⟩φᵢ in Theorem 1.40 is called the Fourier series
of δ (relative to the orthonormal sequence {φᵢ}). Parseval's equation allows us to
test when the Fourier series of δ converges to δ.

Definition 1.41: Suppose {φᵢ} is an orthonormal sequence in the pre-Hilbert
space V. The set {φᵢ | i ∈ ℕ} is called a "basis" of V if δ = Σ_{i=1}^∞ ⟨δ, φᵢ⟩φᵢ for every
vector δ ∈ V.

Thus, if every vector δ in V is equal to its Fourier series (relative to {φᵢ}), then
the set {φᵢ | i ∈ ℕ} is called a "basis" of V. The reader will note that the word
"basis" is included in quotation marks here. This is because an orthonormal
sequence that is a "basis" of V is not in general a vector space basis in the sense
of Chapter I. Consider the following example:

Example 1.42: We return to the Hilbert space ℓ₂ given in Example 1.5. For each
i ∈ ℕ, let φᵢ denote the sequence that is zero except in the ith position, where it is
one. Thus, φᵢ = (0,..., 0, 1, 0, 0,...). Clearly, the set {φᵢ | i ∈ ℕ} is linearly
independent over ℝ. This set is not a basis of ℓ₂ since, for instance, η = {1/n} is
not a finite linear combination of the vectors in {φᵢ | i ∈ ℕ}.

The sequence {φᵢ} is clearly an orthonormal sequence in ℓ₂. If δ = {xₙ} ∈ ℓ₂,
then the Fourier coefficient of δ relative to φᵢ is ⟨δ, φᵢ⟩ = xᵢ. Thus, the Fourier
series of δ is Σ_{i=1}^∞ xᵢφᵢ. The nth partial sum, aₙ, of this series is clearly
(x₁, x₂,..., xₙ, 0, 0,...). Thus, ‖δ − aₙ‖² = Σ_{i=n+1}^∞ xᵢ² → 0, since Σ_{i=1}^∞ xᵢ²
converges. Therefore, δ = Σ_{i=1}^∞ xᵢφᵢ.
Since δ is arbitrary, we conclude that {φᵢ | i ∈ ℕ} is a "basis" of ℓ₂. □

When dealing with infinite-dimensional pre-Hilbert spaces V, analysts


usually use the word basis to mean an orthonormal sequence of vectors
satisfying 1.41 (i.e., a "basis"). A vector space basis of V is usually called a Hamel
basis. Algebraists, in general, do not use these words, and, so, the reader is
advised to use some caution when dealing with this terminology. In this text, we
have used the word basis consistently to mean a collection of linearly
independent vectors whose linear span is all of V. We shall use quotation marks
(i.e., "basis") around the word basis when referring to an orthonormal sequence
satisfying 1.41. Remember, a "basis" of V (if V has a "basis") is not in general a
vector space basis of V.
It is an easy matter to decide when an orthonormal sequence gives us a
"basis" of V.

Theorem 1.43: Let {φᵢ} be an orthonormal sequence in the pre-Hilbert space V.
Then {φᵢ | i ∈ ℕ} is a "basis" of V if and only if the closure of the subspace
L({φᵢ | i ∈ ℕ}) is all of V.

Proof: Suppose {φᵢ} is a "basis" of V. Then δ = Σ_{i=1}^∞ ⟨δ, φᵢ⟩φᵢ for every δ ∈ V.
Set aₙ = Σ_{i=1}^n ⟨δ, φᵢ⟩φᵢ. Then {aₙ} → δ. Each vector aₙ is in L({φᵢ | i ∈ ℕ}). It
now follows from Lemma 3.4 [Chapter IV] that δ is in the closure of
L({φᵢ | i ∈ ℕ}). Thus, V is the closure of L({φᵢ | i ∈ ℕ}).

Conversely, suppose the closure of L({φᵢ | i ∈ ℕ}) is V. Let δ ∈ V. Set xᵢ = ⟨δ, φᵢ⟩ and
aₙ = Σ_{i=1}^n xᵢφᵢ. We want to show the sequence {aₙ} converges to δ. Let r > 0.
Since δ lies in the closure of L({φᵢ | i ∈ ℕ}), d(δ, L({φᵢ | i ∈ ℕ})) = 0. Hence, there exist scalars
y₁,..., yₘ in ℝ such that ‖Σ_{i=1}^m yᵢφᵢ − δ‖ < r. Now by equation 1.38,
‖Σ_{i=1}^n xᵢφᵢ − δ‖ is the distance between δ and L({φ₁,..., φₙ}) for any n ∈ ℕ. In
particular, for any n ≥ m, ‖aₙ − δ‖ ≤ ‖Σ_{i=1}^m yᵢφᵢ − δ‖ < r. This proves that
{aₙ} → δ and completes the proof of the theorem. □
If V is a Hilbert space, then the criterion in Theorem 1.43 can be simplified
significantly.

Corollary 1.44: Let {φᵢ} be an orthonormal sequence in the Hilbert space V.
Then {φᵢ | i ∈ ℕ} is a "basis" of V if and only if {φᵢ | i ∈ ℕ}⊥ = (0).

Proof: If {φᵢ} is a "basis" for V, then V is the closure of L({φᵢ | i ∈ ℕ}) by
Theorem 1.43. Hence {φᵢ | i ∈ ℕ}⊥ = L({φᵢ | i ∈ ℕ})⊥ = V⊥ = (0).

Conversely, suppose {φᵢ | i ∈ ℕ}⊥ = (0). Set W equal to the closure of
L({φᵢ | i ∈ ℕ}). Then W is a
closed subspace of V for which W⊥ = {φᵢ | i ∈ ℕ}⊥ = (0). Since V is complete, and W is closed,
W is complete. (See Exercise 13, Section 4, Chapter IV.) By Theorem 1.21,
V = W ⊕ W⊥ = W. Hence, Theorem 1.43 implies {φᵢ | i ∈ ℕ} is a "basis" of
V. □

EXERCISES FOR SECTION 1

(1) Verify that the definition given in 1.1 is the same as that given in 7.12 of
Chapter I.
(2) Show that ℓ₂ is a Hilbert space.
(3) Show that the example given in 1.4 is a pre-Hilbert space but not a Hilbert
space.
(4) Occasionally it is convenient to relax condition (d) in Definition 1.1. A
function f: V × V → ℝ is called a semiscalar product if f satisfies the
following conditions:
(a) f is bilinear.
(b) f(α, β) = f(β, α) for all α, β ∈ V.
(c) f(α, α) ≥ 0 for all α ∈ V.
Give an example of a semiscalar product that is not an inner product on V.
Show that Schwarz's inequality remains valid for any semiscalar product f.
(5) Let V = {f ∈ C([a, b]) | f′ exists and is continuous on [a, b]}. Define ⟨f, g⟩ = f(a)g(a) +
∫ₐᵇ f′(t)g′(t) dt. Show that ⟨ , ⟩ is an inner product on V.
(6) Provide the technical details for the proof of Theorem 1.14.
(7) Let V = {f ∈ C([0, 1]) | f(0) = 0}. Let ‖·‖ be the norm defined in equation
1.6 of Chapter IV. Let W = {{xₙ} ∈ ℓ₂ | xₙ = 0 for all n sufficiently large}.
(a) Show that the normed linear vector space (V, ‖·‖) is not a pre-Hilbert
space.
(b) Let β = {1/n}. Show that β ∈ ℓ₂, d(β, W) = 0, and 0 ≠ ‖α − β‖ for any
α ∈ W. Thus, W contains no vector closest to β.
(8) Give an example of a normed linear vector space (V, ‖·‖), a subspace W,
and a vector β such that W contains more than one vector α such that
‖α − β‖ = d(β, W). There is such an example in Chapter IV.

(9) If |⟨α, β⟩| = ‖α‖‖β‖ in Theorem 1.10, prove that α and β are linearly
dependent.
(10) If (V, ⟨ , ⟩) is a finite-dimensional, inner product space, and W is a
subspace of V, show directly that V = W ⊕ W⊥.
(11) In Exercise 10, show that W⊥⊥ = W. Give an example of an infinite-dimensional
inner product space V and a subspace W such that W⊥⊥ ≠ W.
(12) Let (V, ⟨ , ⟩) be a finite-dimensional inner product space. Let {α₁,..., αₙ}
be a basis of V. Let c₁,..., cₙ ∈ ℝ. Show there exists a unique vector α ∈ V
such that ⟨α, αᵢ⟩ = cᵢ for all i = 1,..., n.
(13) Let V = Mₙₓₙ(ℝ). If A, B ∈ V, set ⟨A, B⟩ = Tr(ABᵗ).
(a) Show that (V, ⟨ , ⟩) is an inner product space.
(b) Find the orthogonal complement of the subspace of all diagonal
matrices in V.
(14) Let W be a finite-dimensional subspace of an inner product space V. Show
that ‖Proj_W(β)‖ ≤ ‖β‖ for all β ∈ V.
(15) In Exercise 14, suppose T is an idempotent endomorphism of V with Im(T) = W. If
‖T(β)‖ ≤ ‖β‖ for all β ∈ V, prove T = Proj_W(·).

(16) Prove the assertion in 1.23.


(17) Find the linear equation y = mx + b that "best" fits the data
(x₁, y₁),..., (xₙ, yₙ) by solving the normal equation. Here x₁,..., xₙ are
assumed all distinct.


(18) Find the polynomial p(X) = bₘXᵐ + bₘ₋₁Xᵐ⁻¹ + ··· + b₀ that "best" fits
the data in Exercise 17. Here we assume n ≫ m.
(19) Give an example of a pre-Hilbert space in which a "basis" turns out to be a
vector space basis of V also.
(20) Here is a calculus problem that can be solved using Schwarz's inequality:
Suppose a positive term series Σ aₙ converges. Show that the series
Σ √aₙ/n converges.
(21) Consider the vector space given in Exercise 1, Section 2 of Chapter I.
Here we assume F = ℝ. Define an inner product by setting
⟨f, g⟩ = ∫₋₁¹ f(X)g(X) dX. Apply the Gram–Schmidt process to the vectors 1,
X, X², X³, and X⁴ to produce the first five Legendre polynomials.
(22) Let V = C([−1, 1]). Define an inner product on V by setting
⟨f, g⟩ = ∫₋₁¹ f(x)g(x)/(1 − x²)^{1/2} dx. Repeat Exercise 21 in this setting.
The polynomials thus formed are the first five Chebyshev polynomials of
the first kind.

2. SELF-ADJOINT TRANSFORMATIONS

As in Section 1, we suppose (V, ⟨ , ⟩) is a real inner product space. The reader
will recall from Section 6 of Chapter I that the dual V* of V is the vector space
Hom_ℝ(V, ℝ). If T: V → W is a linear transformation, then the adjoint of T is the
linear transformation T*: W* → V* given by T*(f) = fT.

If V is an infinite-dimensional pre-Hilbert space, then V* is too large to be of
any interest. Recall that dim_ℝ(V) = ∞ implies that dim_ℝ(V*) > dim_ℝ(V). We
confine our attention to the bounded linear maps, ℬ(V, ℝ), in V*. Recall that
T ∈ ℬ(V, ℝ) if and only if there exists a positive constant c such that |T(α)| ≤ c‖α‖
for all α ∈ V. If V is finite dimensional, then we had seen in Chapter IV that
ℬ(V, ℝ) = V*. If V is any Hilbert space, finite or infinite dimensional, we shall
see that ℬ(V, ℝ) is isomorphic to V in a natural way.

Any pre-Hilbert space (V, ‖·‖) admits a linear transformation θ: V → ℬ(V, ℝ),
which we formally define as follows:

Definition 2.1: Let (V, ‖·‖) be a pre-Hilbert space with inner product ⟨ , ⟩. Let
θ: V → V* denote the linear transformation defined by θ(β)(α) = ⟨α, β⟩.

Thus, for any β ∈ V, θ(β) is the real-valued function on V whose value at α is
⟨α, β⟩. We can easily check that θ(β) is a linear transformation from V to ℝ. If
α, α′ ∈ V and x, y ∈ ℝ, then we have θ(β)(xα + yα′) = ⟨xα + yα′, β⟩ =
x⟨α, β⟩ + y⟨α′, β⟩ = xθ(β)(α) + yθ(β)(α′). Thus, θ is a well-defined map from
V to V*. We next note that θ is a linear transformation from V to V*. We
have θ(xβ + yβ′)(α) = ⟨α, xβ + yβ′⟩ = x⟨α, β⟩ + y⟨α, β′⟩ = xθ(β)(α) + yθ(β′)(α) =
[xθ(β) + yθ(β′)](α). Since α is arbitrary, we conclude that θ(xβ + yβ′) =
xθ(β) + yθ(β′). Hence, θ is a linear map from V to V*. Let us also note that θ is
an injective linear transformation. For suppose θ(β) = 0. Then, in particular,
0 = θ(β)(β) = ⟨β, β⟩. Thus, β = 0.

The linear transformation θ gives an imbedding of V into V*. We claim that
the image of θ actually lies in ℬ(V, ℝ). As usual for statements of this kind, we
regard ℝ as a normed linear vector space via the absolute value | |, and V as a
normed linear vector space via the norm ‖·‖ associated with ⟨ , ⟩. We claim
θ(β) ∈ ℬ(V, ℝ) for every β ∈ V. This follows from Schwarz's inequality. If α ∈ V,
then |θ(β)(α)| = |⟨α, β⟩| ≤ ‖β‖‖α‖. Thus, θ(β) is a bounded linear operator on V.
A bound for θ(β) is ‖β‖. We have now shown that θ is an injective linear map
from V to ℬ(V, ℝ).
Now recall that ℬ(V, ℝ) is a normed linear vector space relative to the
uniform norm ‖T‖ = inf{c | c is a bound of T}. Thus, in the last
paragraph, we showed that ‖θ(β)‖ ≤ ‖β‖ for all β ∈ V. On the other hand, if
β ≠ 0, then α = β/‖β‖ has length one, and |θ(β)(α)| = |⟨β/‖β‖, β⟩| = ‖β‖. In
particular, ‖θ(β)‖ ≥ ‖β‖ by Lemma 1.24(b) of Chapter IV. We conclude that
‖θ(β)‖ = ‖β‖ for every β ∈ V. The reader will recall that a bounded linear map
between normed linear vector spaces that preserves lengths is called an isometry.
Hence, θ is an isometry of V into ℬ(V, ℝ). We have now proved the first part of
the following theorem:

Theorem 2.2: Let (V, ‖·‖) be a pre-Hilbert space with inner product ⟨ , ⟩. The
map θ given by θ(β)(α) = ⟨α, β⟩ is an isometry of V into the Banach space
ℬ(V, ℝ).

Proof: We have already established the fact that θ: V → ℬ(V, ℝ) is an isometry.
It remains to show that ℬ(V, ℝ) is a Banach space with respect to the uniform
norm. This is a special case of Exercise 6, Section 4 of Chapter IV. We sketch a
brief proof of this special case here.

Suppose {Tₙ} is a Cauchy sequence in ℬ(V, ℝ). We want to find a T in
ℬ(V, ℝ) such that {Tₙ} → T. Fix α ∈ V. Then |Tₙ(α) − Tₘ(α)| = |(Tₙ − Tₘ)(α)|
≤ ‖Tₙ − Tₘ‖‖α‖. Since {Tₙ} is Cauchy, lim ‖Tₙ − Tₘ‖ = 0. We conclude
that {Tₙ(α)} is a Cauchy sequence in ℝ. Since ℝ is complete, the sequence
converges in ℝ. We can thus define a function T: V → ℝ by T(α) = lim Tₙ(α). It
is an easy matter to show that T ∈ ℬ(V, ℝ) and that {Tₙ} → T. □
If dim_ℝ(V) < ∞, then ℬ(V, ℝ) = V*, and the isometry θ is surjective by
Theorem 3.33(b) of Chapter I. Thus, when V is finite dimensional over ℝ,
θ: V → ℬ(V, ℝ) = V* is an isomorphism. If V is infinite dimensional over ℝ,
then, in general, θ: V → ℬ(V, ℝ) is not surjective. However, we can tell precisely
when θ is surjective.

Theorem 2.3: Let (V, ‖·‖) be a pre-Hilbert space. The isometry θ: V → ℬ(V, ℝ) is
surjective if and only if (V, ‖·‖) is a Hilbert space.

Proof: Suppose θ is surjective. Then V is isometric via θ to the Banach space
ℬ(V, ℝ). This implies that (V, ‖·‖) is a Banach space. Since (V, ‖·‖) is a pre-Hilbert
space, we conclude that (V, ‖·‖) is a Hilbert space.

Conversely, suppose (V, ‖·‖) is a Hilbert space. Let f ∈ ℬ(V, ℝ) − {0}. Set
W = ker(f). Since f is bounded, f is a continuous map from V to ℝ. {0} is a closed
subset of ℝ. Therefore, W = f⁻¹(0) is a closed subspace of V. We have seen that a
closed subspace of a complete space is itself complete (Exercise 13 of Section 4 of
Chapter IV). Thus, W is a Banach space. It now follows from Theorem 1.21 that
V = W ⊕ W⊥. Since f ≠ 0, Im(f) = ℝ. Therefore, W⊥ ≅ V/W ≅ Im(f) = ℝ. We
conclude that there exists a nonzero vector α ∈ W⊥ such that V = W ⊕ ℝα.
Since α is not in W, f(α) ≠ 0. Set β = (f(α)/‖α‖²)α. Then ℝα = ℝβ, and
consequently, V = W ⊕ ℝβ. We claim θ(β) = f. To see this, let γ ∈ V. Write
γ = δ + xβ for some δ ∈ W and x ∈ ℝ. Then f(γ) = f(δ) + xf(β) =
xf(β) = xf(α)²/‖α‖². On the other hand, θ(β)(γ) = ⟨δ + xβ, β⟩ = x⟨β, β⟩ =
x(f(α)/‖α‖²)²⟨α, α⟩ = xf(α)²/‖α‖². Thus, f(γ) = θ(β)(γ)
for all γ ∈ V. We conclude that θ(β) = f, and the map θ is surjective. □

Now suppose T is a bounded endomorphism of the pre-Hilbert space V.
Thus, T ∈ ℬ(V, V). We shall refer to T as an operator on V. Let T* denote the
adjoint of T. Thus, T* ∈ Hom_ℝ(V*, V*). Then we can consider the restriction of
T* to the subspace ℬ(V, ℝ). Suppose f ∈ ℬ(V, ℝ). Then T*(f) = fT is a bounded
linear map by Theorem 1.27 of Chapter IV. In fact, ‖T*(f)‖ ≤ ‖T‖‖f‖. Thus, T*,
when restricted to ℬ(V, ℝ), is a bounded linear operator from ℬ(V, ℝ) to
ℬ(V, ℝ). We have the following diagram:

2.4:
                      T*
           V*    ←─────────    V*
           ↑ i                  ↑ i
      ℬ(V, ℝ)  ←─────────  ℬ(V, ℝ)
                      T*|
           ↑ θ                  ↑ θ
           V                    V

In diagram 2.4, the map T*| denotes the restriction of T* to the subspace
ℬ(V, ℝ). The map i is the inclusion of ℬ(V, ℝ) into V*. If
(V, ⟨ , ⟩) is a Hilbert space, the map θ in 2.4 is an isomorphism by
Theorem 2.3. In this case, the composite map S = θ⁻¹(T*|)θ is a bounded linear
operator from V to V. There is a simple relationship between S and T.

2.5: ⟨Tα, β⟩ = ⟨α, Sβ⟩ for all α, β ∈ V

In equation 2.5 and much of the sequel, we take the parentheses off the symbols
T(α) and S(β) to simplify the notation. To prove 2.5, we compare both sides. On
the left, we have ⟨Tα, β⟩ = θ(β)(Tα). On the right, we have ⟨α, Sβ⟩ = θ(Sβ)(α) =
[θ(θ⁻¹T*θ)(β)](α) = [T*θ(β)](α) = θ(β)(Tα). Thus, ⟨Tα, β⟩ = ⟨α, Sβ⟩.

We also note that S is the only bounded linear operator on V that satisfies
equation 2.5. For suppose S′ ∈ ℬ(V, V), and ⟨Tα, β⟩ = ⟨α, S′β⟩ for all α, β ∈ V.
Then ⟨α, (S′ − S)β⟩ = ⟨α, S′β⟩ − ⟨α, Sβ⟩ = ⟨Tα, β⟩ − ⟨Tα, β⟩ = 0. In
particular, ‖(S′ − S)β‖² = ⟨(S′ − S)β, (S′ − S)β⟩ = 0 for all β ∈ V. We conclude that
S′ = S.
We have now proved the following theorem:
Theorem 2.6: Let (V, ⟨ , ⟩) be a Hilbert space. For every bounded linear
operator T ∈ ℬ(V, V), there exists a unique bounded linear operator S ∈ ℬ(V, V)
such that ⟨Tα, β⟩ = ⟨α, Sβ⟩ for all α, β ∈ V. □

In Section 1, we saw an instance where the terminology used in Hilbert space
theory diverges from what is normally used in algebra. Here is a second and
more important instance of the same phenomenon. In Hilbert space theory, the
map S given in Theorem 2.6 is called the adjoint of T and written T*. We shall
follow this convention also. Thus, when (V, ⟨ , ⟩) is a Hilbert space and
T ∈ ℬ(V, V), then the adjoint of T will be the unique, bounded linear operator
T* ∈ ℬ(V, V) such that

2.7: ⟨Tα, β⟩ = ⟨α, T*β⟩ for all α, β ∈ V


The reader is warned that we have changed our definition of the adjoint when
dealing with bounded linear operators on Hilbert spaces. When we need to refer
back to our old usage of the word "adjoint" (Section 6 of Chapter I), we shall use
the words "algebra adjoint." Thus, if V is a Hilbert space and T ∈ ℬ(V, V), then
the "algebra adjoint" is the induced map on V* given by f → fT. The adjoint of T
is the operator T* ∈ ℬ(V, V) that satisfies equation 2.7. When dim_ℝ(V) < ∞, then
ℬ(V, ℝ) = V* by Corollary 3.29 of Chapter IV. Thus, in this case, the adjoint of
T only differs from the "algebra adjoint" of T by the isometry θ.

Now let (V, ⟨ , ⟩) be an arbitrary Hilbert space. There are two types of
bounded linear operators on V that we want to study in the remainder of this
section.

Definition 2.8: Let (V, ⟨ , ⟩) be a Hilbert space. Let T ∈ ℬ(V, V).

(a) We say T is self-adjoint if T = T*.

(b) We say T is orthogonal if T*T = 1_V.

We can give alternative definitions of self-adjoint and orthogonal operators
by using equation 2.7. We have the following equivalent definition to 2.8:

2.8′: (a) T is self-adjoint if and only if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.
(b) T is orthogonal if and only if ⟨Tα, Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V.

If T is self-adjoint, then ⟨Tα, β⟩ = ⟨α, T*β⟩ = ⟨α, Tβ⟩. Conversely, if
⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V, then ⟨α, Tβ⟩ = ⟨α, T*β⟩. In particular,
‖(T − T*)β‖² = ⟨(T − T*)β, (T − T*)β⟩ = 0. Thus, T = T*, and T is self-adjoint.
If T is orthogonal, then ⟨Tα, Tβ⟩ = ⟨α, T*Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V.
Conversely, if ⟨Tα, Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V, then ⟨α, T*Tβ⟩ = ⟨α, β⟩. This
implies that T*Tβ = β for all β ∈ V. Consequently, T*T = 1_V, and T is
orthogonal. In either case, we have shown that 2.8 and 2.8′ are equivalent
definitions.

Note that Definition 2.8 is for bounded operators on V. Thus, an endomorphism
T of V is said to be self-adjoint or orthogonal only if T is a bounded map on
V satisfying 2.8.

If T is orthogonal, then T is left invertible. In particular, T is a monomorphism
of V. If dim_ℝ(V) < ∞, then T is orthogonal if and only if T* = T⁻¹. In any
case, ⟨Tα, Tβ⟩ = ⟨α, β⟩ implies ‖Tα‖ = ‖α‖ for all α ∈ V. Thus, an orthogonal
map is always an isometry of V into V. Let us consider examples in (ℝⁿ, ‖·‖).

Example 2.9: Suppose V = ℝⁿ with the standard inner product. Let
φ = {φ₁,..., φₙ} be an orthonormal basis of ℝⁿ. Then a linear transformation
T: ℝⁿ → ℝⁿ is an orthogonal transformation if and only if {T(φ₁),..., T(φₙ)} is
an orthonormal basis of ℝⁿ. This remark is easy to prove. We leave it to the
exercises.

In terms of matrices, we have T is orthogonal if and only if A = Γ(φ, φ)(T) is
an n × n matrix such that AᵗA = Iₙ. Matrices with this property are said to be
orthogonal.

If ψ = {ψ₁,..., ψₙ} is a second orthonormal basis of ℝⁿ, then there exists a
T ∈ Hom_ℝ(ℝⁿ, ℝⁿ) such that T(φᵢ) = ψᵢ, i = 1,..., n. T then is orthogonal. In
particular, Γ(φ, φ)(T) = M(ψ, φ) is an orthogonal matrix. Thus, a change of basis
matrix between orthonormal bases of ℝⁿ is orthogonal. □
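A quick numerical illustration of these remarks (a sketch only, assuming NumPy; the angle is arbitrary): a rotation of ℝ² carries the canonical basis to another orthonormal basis, and its matrix Q satisfies QᵗQ = I₂.

    import numpy as np

    t = 0.7                                    # an arbitrary rotation angle
    Q = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])    # rotation of R^2 through t

    assert np.allclose(Q.T @ Q, np.eye(2))            # Q is an orthogonal matrix
    assert np.allclose(np.linalg.norm(Q[:, 0]), 1.0)  # columns have length one
    assert np.allclose(np.dot(Q[:, 0], Q[:, 1]), 0.)  # and are orthogonal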

To produce examples of self-adjoint operators, we look at symmetric
matrices. We have the following general result:

Lemma 2.10: Suppose (V, ⟨ , ⟩) is a finite-dimensional Hilbert space. Let
φ = {φ₁,..., φₙ} be an orthonormal basis for V. A linear transformation
T: V → V is self-adjoint if and only if the matrix Γ(φ, φ)(T) is symmetric.

Proof: Set A = Γ(φ, φ)(T) = (aᵢⱼ). Then T(φⱼ) = Σ_{k=1}^n aₖⱼφₖ for all j = 1,..., n,
so aᵢⱼ = ⟨T(φⱼ), φᵢ⟩ for all i and j.
Suppose T is self-adjoint. Then for all i and j, we have aᵢⱼ = ⟨Tφⱼ, φᵢ⟩ =
⟨φⱼ, Tφᵢ⟩ = ⟨Tφᵢ, φⱼ⟩ = aⱼᵢ. Thus, Aᵗ = A, and A is symmetric.
Conversely, suppose A is symmetric. Then Aᵗ = A. For each i and j, we have
⟨Tφⱼ, φᵢ⟩ = aᵢⱼ = aⱼᵢ = ⟨Tφᵢ, φⱼ⟩ = ⟨φⱼ, Tφᵢ⟩. Since ⟨ , ⟩ is bilinear, it now readily
follows that ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V. Hence, T is self-adjoint. □

Since the canonical basis δ = {δ₁,..., δₙ} of ℝⁿ is orthonormal, Lemma 2.10
implies that a linear transformation T: ℝⁿ → ℝⁿ is self-adjoint if and only if
A = Γ(δ, δ)(T) is symmetric, that is, Aᵗ = A.
Self-adjoint and orthogonal transformations are special cases of what we
shall call normal operators in Section 4. In the rest of this section, we shall prove
a special case of the spectral theorem for self-adjoint operators on a real Hilbert
space. In Section 4, we shall use different techniques to give a proof of the
spectral theorem for normal operators on a complex inner product space.

Let (V, ⟨ , ⟩) be a Hilbert space. Suppose T ∈ ℬ(V, V) is a self-adjoint
operator. We say T is nonnegative if ⟨Tα, α⟩ ≥ 0 for every α ∈ V. We shall need
the following lemma:

Lemma 2.11: Let V be a Hilbert space, and T ∈ ℬ(V, V). Suppose T is self-adjoint
and nonnegative. Then

(a) ‖T(α)‖ ≤ ‖T‖^{1/2}⟨Tα, α⟩^{1/2} for all α ∈ V.

(b) ⟨Tα, α⟩ = 0 if and only if T(α) = 0.
(c) {⟨Tαₙ, αₙ⟩} → 0 if and only if {T(αₙ)} → 0.

Proof: (a) We define a semiscalar product [ , ] on V by setting
[α, β] = ⟨Tα, β⟩. Clearly, [ , ] is a bilinear function on V × V.
Since T is self-adjoint, we have [β, α] = ⟨Tβ, α⟩ = ⟨β, T*α⟩ =
⟨β, Tα⟩ = ⟨Tα, β⟩ = [α, β]. Thus, [ , ] is symmetric. Since T is
nonnegative, [α, α] = ⟨Tα, α⟩ ≥ 0. Thus, [ , ] is a semiscalar product
on V.
We had seen in Exercise 4 of Section 1 that any semiscalar product
satisfies Schwarz's inequality. Therefore, we have

2.12: |⟨Tα, β⟩| = |[α, β]| ≤ [α, α]^{1/2}[β, β]^{1/2} = ⟨Tα, α⟩^{1/2}⟨Tβ, β⟩^{1/2}

If we set β = Tα in the inequality 2.12, then we have

2.13: ⟨Tα, Tα⟩ ≤ ⟨Tα, α⟩^{1/2}⟨T²α, Tα⟩^{1/2} for all α ∈ V

Now by Schwarz's inequality again, we have ⟨T²α, Tα⟩^{1/2} ≤
⟨T²α, T²α⟩^{1/4}⟨Tα, Tα⟩^{1/4} = ‖T²α‖^{1/2}‖Tα‖^{1/2} ≤ ‖T‖^{1/2}‖Tα‖^{1/2}‖Tα‖^{1/2} =
‖T‖^{1/2}‖Tα‖. Substituting this inequality into 2.13 gives us ‖Tα‖² ≤
⟨Tα, α⟩^{1/2}‖T‖^{1/2}‖Tα‖. We can assume T(α) ≠ 0. Consequently,
‖Tα‖ ≤ ‖T‖^{1/2}⟨Tα, α⟩^{1/2}. This completes the proof of (a). The assertions
in (b) and (c) follow trivially from (a). □
We can now state and prove the spectral theorem for a self-adjoint operator
on a finite-dimensional, real Hilbert space.
Theorem 2.14: Let (V, ⟨ , ⟩) be a finite-dimensional Hilbert space. Let T be a
self-adjoint linear transformation on V. Then V has an orthonormal basis
consisting entirely of eigenvectors of T.

Proof: The first order of business is to argue that T has an eigenvector. To this
end, consider the following function: f(ξ) = ⟨Tξ, ξ⟩. Then f gives us
a real-valued function on V. Since T is continuous on V and ⟨ , ⟩ is continuous
on V × V, we conclude that f is continuous on V.

Set S = {ξ ∈ V | ‖ξ‖ = 1}. We had seen in Chapter IV that S is a closed and
bounded subset of V. Hence, S is sequentially compact by Corollary 3.28 of
Chapter IV. Note that f is a bounded function on S. For if ξ ∈ S, then
|f(ξ)| = |⟨Tξ, ξ⟩| ≤ ‖Tξ‖‖ξ‖ ≤ ‖T‖‖ξ‖² = ‖T‖.

Set m = sup{f(ξ) | ξ ∈ S}. It follows from Corollary 3.16 of Chapter IV
that there exists a vector α ∈ S such that f(α) = m. Now set T₁ = m1_V − T. The
map T₁ is the difference of two self-adjoint operators, and, consequently, is
self-adjoint. If δ ∈ V − {0}, then ⟨T₁δ, δ⟩ = ⟨mδ − Tδ, δ⟩ = m‖δ‖² − ⟨Tδ, δ⟩ =
‖δ‖²(m − f(δ/‖δ‖)) ≥ 0, and clearly ⟨T₁0, 0⟩ = 0. We can now conclude that
⟨T₁δ, δ⟩ ≥ 0 for all δ ∈ V. In particular, T₁ is a nonnegative, self-adjoint
operator on V.

Now ⟨T₁α, α⟩ = ⟨mα − Tα, α⟩ = m‖α‖² − f(α) = m − m = 0. But then,
Lemma 2.11 implies that T₁(α) = 0. Thus, T(α) = mα, and we have found a unit
eigenvector α ∈ V with eigenvalue m.

If dim_ℝ(V) = 1, then {α} is an orthonormal basis of V, and we are done. If
dim_ℝ(V) > 1, we proceed by induction on the dimension of V. We know from
Theorem 1.21 that V = ℝα ⊕ (ℝα)⊥. Set V₂ = (ℝα)⊥. If δ ∈ V₂, then
⟨Tδ, α⟩ = ⟨δ, Tα⟩ = ⟨δ, mα⟩ = m⟨δ, α⟩ = 0. Therefore, T(V₂) ⊆ V₂.
Thus, V₂ is a T-invariant subspace of V. If we restrict T to V₂, we get a self-adjoint
operator on the Hilbert space V₂. Since dim_ℝ(V₂) < dim_ℝ(V), our
induction hypothesis implies that V₂ has an orthonormal basis, say {α₂,..., αₙ},
consisting of eigenvectors of T. Then {α, α₂,..., αₙ} is an orthonormal basis of V
which consists of eigenvectors of T. □
Let us say a few words about the construction given in the proof of Theorem
2.14. To find an eigenvalue of T, form the function f(ξ) = ⟨Tξ, ξ⟩. The real
number m = sup{f(ξ) | ξ ∈ S} is an eigenvalue for T. A vector α ∈ S such that
f(α) = m is an eigenvector for T associated with m. We then pass to V₂ = (ℝα)⊥.
The proof shows that m₂ = sup{f(ξ) | ξ ∈ S ∩ V₂} is an eigenvalue of the
restriction of T to V₂. Of course, then m₂ is an eigenvalue of T. Note that
m ≥ m₂. Suppose dim_ℝ(V) = n. Then n applications of this argument produce a
decreasing sequence of eigenvalues m ≥ m₂ ≥ ··· ≥ mₙ, and an associated
sequence α, α₂,..., αₙ of eigenvectors such that α = {α, α₂,..., αₙ} is an
orthonormal basis of V. In particular, Γ(α, α)(T) = diag(m, m₂,..., mₙ).

Now in general, the mᵢ are not all distinct. Suppose c₁,..., c_r are the distinct
real numbers in the set {m, m₂,..., mₙ}. We can label the cᵢ so that
c₁ > c₂ > ··· > c_r. Then {c₁, c₂,..., c_r} is precisely the set of eigenvalues of T.
Furthermore, if nᵢ equals the number of vectors in α that are eigenvectors for cᵢ, then each cᵢ is repeated
precisely nᵢ times on the diagonal of Γ(α, α)(T). In particular, the characteristic
polynomial of T is c_T(X) = Π_{i=1}^r (X − cᵢ)^{nᵢ}. The minimal polynomial of T is given
by m_T(X) = Π_{i=1}^r (X − cᵢ) (see 4.22 of Chapter III).

Let us set Yᵢ = ker(T − cᵢ) for i = 1,..., r. Then each Yᵢ is spanned by nᵢ of
the vectors in α, and T restricted to Yᵢ is just multiplication by cᵢ. The Yᵢ are clearly pairwise
orthogonal subspaces of V. Hence, another version of Theorem 2.14 is as
follows:

Corollary 2.15: Suppose T is a self-adjoint linear transformation on a finite-dimensional
Hilbert space V. Then there exist real numbers c₁ > c₂ > ··· > c_r
and pairwise orthogonal subspaces Y₁,..., Y_r such that V = Y₁ ⊕ ··· ⊕ Y_r and
the restriction of T to Yᵢ is multiplication by cᵢ. □

The reader can argue that the subspaces Y₁,..., Y_r of V in 2.15 are unique.
We leave this point as an exercise at the end of this section. We have seen in
Example 2.9 that a change of basis matrix between orthonormal bases of ℝⁿ is
an orthogonal matrix. Hence, the matrix version of Theorem 2.14 is as follows:
Corollary 2.16: Let A ∈ Mₙₓₙ(ℝ) be symmetric. Then there exists an orthogonal
matrix P such that PAP⁻¹ is diagonal. □
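Numerically, Corollary 2.16 is exactly what the routine numpy.linalg.eigh computes. The sketch below (assuming NumPy) diagonalizes the symmetric matrix of Exercise 5 at the end of this section; the rows of P form an orthonormal basis of eigenvectors.

    import numpy as np

    A = np.array([[ 4., -1.,  1.],
                  [-1.,  4., -1.],
                  [ 1., -1.,  4.]])    # a symmetric matrix (Exercise 5 below)

    eigenvalues, Q = np.linalg.eigh(A) # columns of Q: orthonormal eigenvectors
    P = Q.T

    assert np.allclose(P @ P.T, np.eye(3))                 # P is orthogonal, P^{-1} = P^t
    assert np.allclose(P @ A @ P.T, np.diag(eigenvalues))  # P A P^{-1} is diagonal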
Corollary 2.16 has many applications in applied mathematics. There are
many problems in which we need to compute Aʳα, where A ∈ Mₙₓₙ(ℝ),
α = (x₁,..., xₙ)ᵗ, and r is a large positive integer. If the matrix A is
diagonalizable (e.g., if A is symmetric), then we can easily compute Aʳα. There
exists a matrix P such that PAP⁻¹ = diag(m₁,..., mₙ). Then Aʳα =
(P⁻¹ diag(m₁,..., mₙ)P)ʳα = P⁻¹ diag(m₁ʳ,..., mₙʳ)Pα. Thus, a potentially difficult
computation becomes easy. Let us consider a specific example of these
ideas.

Example 2.17: Consider the following system of differential equations:

2.18: xᵢ′ = aᵢ₁x₁ + ··· + aᵢₙxₙ, i = 1,..., n

We assume here that the matrix A = (aᵢⱼ) of this system is symmetric. Then
Corollary 2.16 implies that there exists an orthogonal matrix P such that
PAP⁻¹ = D = diag(m₁,..., mₙ) for some m₁,..., mₙ ∈ ℝ. We had seen in
Section 5 of Chapter III that any solution of 2.18 has the form x = e^{tA}C, where
C = (x₁(0),..., xₙ(0))ᵗ. Since A = P⁻¹DP, we have e^{tA} = P⁻¹e^{tD}P =
P⁻¹ diag(e^{m₁t},..., e^{mₙt})P.

The orthogonal matrix P is constructed by finding an orthonormal basis α of
ℝⁿ consisting of eigenvectors of A. The columns of P⁻¹ are then the vectors in
α. Such a basis exists by Theorem 2.14. Thus, if α = {α₁,..., αₙ}, then
P⁻¹ = M(α, δ), and PAP⁻¹ = diag(m₁,..., mₙ). A complete solution to 2.18 is
given by the following equation:

2.19: x = M(α, δ) diag(e^{m₁t},..., e^{mₙt}) M(α, δ)⁻¹C

For instance, consider the following 2 × 2 system:

2.20: x₁′ = −2x₁ + x₂

      x₂′ = x₁ − 2x₂

Then

A = [ −2    1 ]
    [  1   −2 ]

is symmetric. The characteristic polynomial of A is given by
c_A(X) = X² + 4X + 3. Thus, the eigenvalues of A are −1 and −3. An
orthonormal basis of ℝ² consisting of eigenvectors of A is easily seen to be
α = {α₁, α₂}, where α₁ = (1/√2, 1/√2)ᵗ and α₂ = (1/√2, −1/√2)ᵗ. Thus,

M(α, δ) = (1/√2) [ 1    1 ]
                 [ 1   −1 ]

and M(α, δ)⁻¹ = M(α, δ). Equation 2.19 then becomes

2.21: x = (1/2) [ e^{−t} + e^{−3t}    e^{−t} − e^{−3t} ] C   □
                [ e^{−t} − e^{−3t}    e^{−t} + e^{−3t} ]

The computations in Example 2.17 can be carried out for any matrix A that is
similar to a diagonal matrix. The main point here is that symmetric matrices are
always diagonalizable, and they are easy to recognize.
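A numerical version of Example 2.17 (a sketch only; it assumes NumPy and hypothetical initial values) builds the solution x(t) = e^{tA}C from the orthogonal diagonalization of A, exactly as in 2.19:

    import numpy as np

    A = np.array([[-2., 1.],
                  [ 1., -2.]])       # the symmetric coefficient matrix of 2.20
    C = np.array([1., 0.])           # hypothetical initial values x1(0), x2(0)

    m, Q = np.linalg.eigh(A)         # eigenvalues m_i and orthonormal eigenvectors

    def x(t):
        # x(t) = Q diag(e^{m_1 t}, e^{m_2 t}) Q^t C, as in equation 2.19
        return Q @ np.diag(np.exp(m * t)) @ Q.T @ C

    # Spot check: x'(0) should equal A C, i.e., x satisfies the system 2.18.
    h = 1e-6
    assert np.allclose((x(h) - x(-h)) / (2 * h), A @ C, atol=1e-4)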
There is a third corollary to Theorem 2.14 that is worth mentioning here.
Corollary 2.22: Let (V, ⟨ , ⟩) be a finite-dimensional Hilbert space. Suppose
T ∈ ℬ(V, V) is an isomorphism. Then T = RS, where R is orthogonal and S
is a positive, self-adjoint operator.
Proof: A self-adjoint operator S on V is said to be positive if all the eigenvalues
of S are positive. To prove the corollary, consider the adjoint T* of T. Since
(T*T)* = T*T** = T*T, we see that T*T is self-adjoint. By Theorem 2.14, V has
an orthonormal basis α = {α₁,..., αₙ} consisting of eigenvectors of T*T.
Suppose T*T(αᵢ) = mᵢαᵢ for i = 1,..., n. Since T is an isomorphism, T(αᵢ) ≠ 0.
Therefore, 0 < ‖T(αᵢ)‖² = ⟨Tαᵢ, Tαᵢ⟩ = ⟨αᵢ, T*Tαᵢ⟩ = ⟨αᵢ, mᵢαᵢ⟩ = mᵢ‖αᵢ‖² = mᵢ.
Thus, each mᵢ is a positive real number. Set kᵢ = mᵢ^{1/2}.
We can define a linear transformation S: V → V by setting S(αᵢ) = kᵢαᵢ for all
i = 1,..., n. Then S² = T*T. The reader can easily check that S is self-adjoint.
Hence, S is a positive self-adjoint operator whose square is T*T.
Set P = ST⁻¹. Since ⟨Pα, Pβ⟩ = ⟨ST⁻¹α, ST⁻¹β⟩ = ⟨T⁻¹α, S²T⁻¹β⟩ =
⟨T⁻¹α, T*TT⁻¹β⟩ = ⟨T⁻¹α, T*β⟩ = ⟨TT⁻¹α, β⟩ = ⟨α, β⟩, P is orthogonal.
In particular, P⁻¹ is also orthogonal. Set R = P⁻¹. Then T = RS with R
orthogonal, and S a positive, self-adjoint operator. □
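The proof of Corollary 2.22 is constructive, and the construction is easy to carry out for matrices. The sketch below (assuming NumPy; the nonsingular matrix is hypothetical) builds S as the positive square root of TᵗT from an orthonormal eigenvector basis and recovers the orthogonal factor R:

    import numpy as np

    T = np.array([[2., 1.],
                  [0., 1.]])               # a hypothetical nonsingular matrix

    m, Q = np.linalg.eigh(T.T @ T)          # eigenvalues m_i > 0 of T^t T
    S = Q @ np.diag(np.sqrt(m)) @ Q.T       # S(alpha_i) = m_i^{1/2} alpha_i, so S^2 = T^t T
    R = T @ np.linalg.inv(S)                # R = T S^{-1}

    assert np.allclose(R.T @ R, np.eye(2))  # R is orthogonal
    assert np.allclose(R @ S, T)            # T = R S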

If we combine Corollaries 2.22 and 2.16, we get what is known as the UDV-decomposition
of a nonsingular matrix.

Corollary 2.23: Let A ∈ Mₙₓₙ(ℝ) be nonsingular. Then A = UDV, where D is
diagonal and U and V are orthogonal matrices. □

In the last part of this section, we shall discuss a generalization of Theorem
2.14. If (V, ⟨ , ⟩) is an infinite-dimensional Hilbert space and T a self-adjoint
operator on V, then T may not have enough eigenvectors to span V even in the
"basis" sense discussed in Section 1. Thus, the infinite analog of Theorem 2.14
is not true in general. However, if T is a compact operator, then we can recover
much of 2.14. The theorem we shall present is true for any pre-Hilbert space.
Since we only defined a self-adjoint operator for Hilbert spaces, we need the
following definition:

Definition 2.24: Let (V, ⟨ , ⟩) be a pre-Hilbert space, and suppose T ∈ ℬ(V, V).
We say T is self-adjoint if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.

Obviously, our new definition agrees with the old one when V is a Hilbert
space. We had argued this point in 2.8′(a). Note that Lemma 2.11 is still valid for
any pre-Hilbert space V and any nonnegative, self-adjoint operator T. The
proof is precisely the same. The definition of a compact operator is as follows:

Definition 2.25: Let (V, ‖·‖) be a pre-Hilbert space, and T ∈ ℬ(V, V). Set
S = {α ∈ V | ‖α‖ = 1}. We say T is a compact operator if the closure of T(S) in V is
sequentially compact.

If dim_ℝ(V) < ∞, then every linear transformation T on V is a compact
operator. This follows from Corollary 3.29 of Chapter IV. From this corollary,
we know T is bounded. Hence, for all α ∈ S, we have ‖T(α)‖ ≤ ‖T‖‖α‖ = ‖T‖.
Consider the closure of T(S) in V. If β lies in this closure, then there exists a sequence
{αₙ} contained in T(S) such that {αₙ} → β (Lemma 3.4 of Chapter IV). We have
just seen that ‖αₙ‖ ≤ ‖T‖ for all n ∈ ℕ. Since the norm is continuous, it follows
that ‖β‖ ≤ ‖T‖. In particular, the closure of T(S) is a bounded subset of V. It now follows from
Corollary 3.28 of Chapter IV that the closure of T(S) is sequentially compact. Thus, T is a
compact operator.

In general, suppose T ∈ ℬ(V, V). Then the same reasoning as above shows
that the closure of T(S) is a closed and bounded subset of V. However, when dim_ℝ(V) = ∞,
closed and bounded subsets of V are not necessarily sequentially compact.
(Recall Exercise 10, Section 3 of Chapter IV.) Hence, we cannot conclude that
the closure of T(S) is sequentially compact. Those operators for which the closure of T(S) is sequentially
compact are called compact operators. The generalization of Theorem 2.14 to
possibly infinite-dimensional pre-Hilbert spaces is the following statement
about compact operators:

Theorem 2.26: Let V be a pre-Hilbert space, and T ∈ ℬ(V, V). Suppose T is a self-adjoint,
compact operator. Let W = Im(T).
If dim_ℝ(W) < ∞, then W has an orthonormal basis consisting of eigenvectors
of T.
If dim_ℝ(W) = ∞, then there exists an orthonormal sequence {φᵢ} in W
consisting entirely of eigenvectors of T. Furthermore, the set {φᵢ | i ∈ ℕ} is a
"basis" of W. The corresponding sequence of eigenvalues {cᵢ} associated with {φᵢ}
converges to 0 in ℝ.

Before we give the proof of Theorem 2.26, let us discuss why this theorem is a
generalization of Theorem 2.14. Suppose V is finite dimensional. Then T is just a
self-adjoint, linear transformation from V to V. We need the following lemma:

Lemma 2.27: Let V be a finite-dimensional Hilbert space. Let T be a self-adjoint
operator on V. Then V is the orthogonal direct sum of ker(T) and Im(T).

Proof: We first show ker(T) ∩ Im(T) = (0). Let α ∈ ker(T) ∩ Im(T). Then α = T(γ)
for some γ ∈ V. Since α ∈ ker(T), we have 0 = T(α) = T²(γ). But then,
0 = ⟨T²γ, γ⟩ = ⟨Tγ, Tγ⟩ = ‖Tγ‖². We conclude that α = T(γ) = 0.
To show ker(T) + Im(T) = V, we can use the first isomorphism theorem
and count dimensions. We have V/ker(T) ≅ Im(T). Therefore, dim(ker(T))
+ dim(Im(T)) = dim(V). Since ker(T) ∩ Im(T) = (0), the union of a basis from
ker(T) with a basis from Im(T) is a set of linearly independent vectors in V. Since
the dimensions add up right, the union of bases from ker(T) and Im(T) is in fact a
basis of V. Therefore, ker(T) + Im(T) = V.
We have now shown that V = ker(T) ⊕ Im(T). It remains to show that these
two subspaces are orthogonal. To see this, let α ∈ ker(T), and β ∈ Im(T). Then
β = T(γ) for some γ ∈ V. We have ⟨α, β⟩ = ⟨α, Tγ⟩ = ⟨Tα, γ⟩ = ⟨0, γ⟩ = 0.
Thus, ker(T) and Im(T) are orthogonal, and the proof of the lemma is
complete. □

We can now apply Lemma 2.27 to our discussion. The subspace ker(T) has an
orthonormal basis α by the Gram–Schmidt theorem. The vectors in α are all
eigenvectors of T with eigenvalue 0. If α′ is an orthonormal basis of Im(T), then
Lemma 2.27 implies that the union α ∪ α′ is an orthonormal basis of V. Hence, V
has an orthonormal basis consisting of eigenvectors of T if and only if Im(T) has
an orthonormal basis consisting of eigenvectors of T. In particular, Theorem
2.26 implies Theorem 2.14.

Proof of Theorem 2.26: Set r₁ = ‖T‖ = sup{‖T(α)‖ | α ∈ S}. If r₁ = 0, then
T = 0, and the theorem is trivial. Hence, we assume r₁ ≠ 0. Then there
exists a sequence {αₙ} in S such that {‖T(αₙ)‖} → r₁. Note that ⟨T²αₙ, αₙ⟩ =
⟨Tαₙ, Tαₙ⟩ = ‖T(αₙ)‖². Since
lim(r₁² − ‖T(αₙ)‖²) = 0, we get lim⟨(r₁² − T²)αₙ, αₙ⟩ = 0. Now the reader can easily
check that r₁² − T² is a nonnegative, self-adjoint operator on V. The proof is
the same as that given in Theorem 2.14. Hence, Lemma 2.11(c) implies that
{(r₁² − T²)(αₙ)} → 0.
T is compact, and, hence, the sequence {T(αₙ)} has a convergent
subsequence. Replacing {αₙ} by the corresponding subsequence, we can assume
{T(αₙ)} → β for some β ∈ V. Since T is continuous, we have {T²(αₙ)} → T(β). We
also have {r₁²αₙ} → T(β), since {(r₁² − T²)(αₙ)} → 0. In particular, {αₙ} → T(β)/r₁²,
and applying T once more, {T(αₙ)} → T²(β)/r₁². Since also {T(αₙ)} → β and
‖β‖ = lim ‖T(αₙ)‖ = r₁ > 0, we have found
a nonzero vector β such that T²(β) = r₁²β. Set α = β/‖β‖.
We have found a vector α ∈ S such that (r₁² − T²)(α) = 0. Thus,
[(r₁ − T)(r₁ + T)](α) = 0. If (r₁ + T)(α) = 0, then T(α) = −r₁α. Suppose
(r₁ + T)(α) = γ ≠ 0. Then (r₁ − T)(γ) = 0. Thus, T(γ) = r₁γ. We could then divide
γ by its length and produce a vector γ′ ∈ S such that T(γ′) = r₁γ′. In either case,
we have found a vector φ₁ ∈ V such that ‖φ₁‖ = 1 and T(φ₁) = c₁φ₁, where
|c₁| = r₁. Since r₁ ≠ 0, c₁ ≠ 0. In particular, φ₁ = T(φ₁)/c₁ ∈ W.
We now proceed as in the proof of Theorem 2.14. Set V₂ = (ℝφ₁)⊥. If α ∈ V₂,
then ⟨Tα, φ₁⟩ = ⟨α, Tφ₁⟩ = ⟨α, c₁φ₁⟩ = c₁⟨α, φ₁⟩ = 0. Thus, V₂ is a T-invariant
subspace of V. Let T|_{V₂} denote the restriction of T to V₂. Clearly, T|_{V₂} is a
self-adjoint operator on V₂. We claim T|_{V₂} is compact as well. To ease notation,
let us call T|_{V₂} just T. We must argue that the closure of T(S ∩ V₂) is sequentially
compact in V₂. Let us denote the closure of T(S ∩ V₂) in V₂ by Y. Then the
vectors in Y are the limits in V₂ of all sequences from T(S ∩ V₂). Suppose {αₙ} is
a sequence in Y. Since Y is contained in the closure of T(S), the sequence has a subsequence which
converges to a vector β in the closure of T(S). We can replace {αₙ} by its subsequence and
assume {αₙ} → β. Since each ⟨αₙ, φ₁⟩ = 0, Corollary 1.11 implies β ∈ V₂.
Since Y is closed in V₂, and {αₙ} is a sequence in Y converging to β (in V₂), β ∈ Y.
We have shown that any sequence in Y has a subsequence which converges to a
vector in Y. Thus, Y is sequentially compact, and T|_{V₂} is compact.
Set r₂ = ‖T|_{V₂}‖. There are two cases to consider here. Either r₂ = 0, or r₂ ≠ 0.
Before proceeding with these cases, we need to make the following remark:
ker(T) = W⊥. To see this, let α ∈ ker(T), and β ∈ W. Then β = T(γ) for some γ ∈ V.
So, ⟨α, β⟩ = ⟨α, Tγ⟩ = ⟨Tα, γ⟩ = ⟨0, γ⟩ = 0. Thus, ker(T) ⊆ W⊥. For the other
inclusion, let α ∈ W⊥. If β ∈ V, then 0 = ⟨α, Tβ⟩ = ⟨Tα, β⟩. Since β here is
arbitrary, we conclude that T(α) = 0. Hence, α ∈ ker(T), and ker(T) = W⊥.
Suppose r₂ = ‖T|_{V₂}‖ = 0. Then T|_{V₂} = 0, and thus, V₂ ⊆ ker(T). Since
φ₁ ∈ W, ker(T) = W⊥ ⊆ (ℝφ₁)⊥ = V₂. Thus, (ℝφ₁)⊥ = ker(T). We now apply
Theorem 1.21 to the Banach space ℝφ₁. We get V = ℝφ₁ ⊕ (ℝφ₁)⊥ =
ℝφ₁ ⊕ ker(T). In particular, W = Im(T) = T(V) = T(ℝφ₁) = ℝφ₁. Thus,
dim(W) = 1, and {φ₁} is an orthonormal basis of W consisting entirely of
eigenvectors of T. Hence, if r₂ = 0, the proof of 2.26 is complete.
Suppose r_2 ≠ 0. Then we can repeat the argument given in the first three
paragraphs of this proof for the map T|_{V_2}. We would then construct a vector
φ_2 ∈ V_2 such that ‖φ_2‖ = 1 and T(φ_2) = c_2φ_2, where |c_2| = r_2. Then
φ_2 = T(φ_2)/c_2 ∈ W, and {φ_1, φ_2} is an orthonormal subset of W. Note also that
|c_1| ≥ |c_2|.
Suppose we repeat this argument n − 1 times, obtaining an orthonormal
sequence {φ_1, ..., φ_{n−1}} ⊆ W, a sequence r_1 ≥ r_2 ≥ ··· ≥ r_{n−1} in ℝ, and
subspaces V_i = L({φ_1, ..., φ_{i−1}})^⊥ such that r_i = ‖T|_{V_i}‖ and T(φ_i) = c_iφ_i, where
|c_i| = r_i. Here i = 1, ..., n − 1, and V_1 = V. We suppose r_{n−1} > 0. Set
V_n = L({φ_1, ..., φ_{n−1}})^⊥ and r_n = ‖T|_{V_n}‖. Again if r_n = 0, then V_n ⊆ ker(T) = W^⊥ ⊆
V_n, so ker(T) = V_n. Thus, applying Theorem 1.21, V = L({φ_1, ..., φ_{n−1}}) ⊕ ker(T),
and W = L({φ_1, ..., φ_{n−1}}). So, the argument is complete whenever our
construction process produces an r_n = 0. In this case, dim(W) < ∞. In any case,
our construction produces orthonormal sequences {φ_1, ..., φ_n} in W. Thus, if
dim(W) < ∞, we produce an orthonormal basis of W consisting of eigenvectors
of T in a finite number of steps. Therefore, the proof is now complete when
dim(W) < ∞.
Suppose dim(W) = ∞. Then no r_n can be zero, and our construction process
continues ad infinitum. Inductively, we obtain an orthonormal sequence
{φ_i} ⊆ W and a sequence {c_i} ⊆ ℝ such that T(φ_i) = c_iφ_i, |c_i| = r_i = ‖T|_{V_i}‖, and
|c_1| ≥ |c_2| ≥ ··· . Here V_i = L({φ_1, ..., φ_{i−1}})^⊥ with V_1 = V. We also
know that c_i ≠ 0 for any i. We claim that lim c_i = 0, and the set {φ_i | i ∈ ℕ} is a
"basis" of W.

Suppose {c_i} does not converge to 0. Then there exists a positive number b
such that |c_i| ≥ b for all i ∈ ℕ. We have ‖T(φ_i) − T(φ_j)‖² = ‖c_iφ_i − c_jφ_j‖² =
‖c_iφ_i‖² + ‖c_jφ_j‖² = |c_i|² + |c_j|² ≥ 2b² whenever i ≠ j. This inequality implies
the sequence {T(φ_i)} has no convergent subsequence. Since the closure of T(S) is sequentially
compact, this is impossible. Hence, {c_i} → 0.

It remains to show that {φ_i | i ∈ ℕ} is a "basis" of W. Recall that this means
β = Σ_{i=1}^∞ ⟨β, φ_i⟩φ_i for every β ∈ W. So, let β ∈ W. Then β = T(α) for some α ∈ V.
Set b_i = ⟨β, φ_i⟩ and a_i = ⟨α, φ_i⟩ for every i. Then b_i = ⟨β, φ_i⟩ = ⟨Tα, φ_i⟩ =
⟨α, Tφ_i⟩ = ⟨α, c_iφ_i⟩ = c_ia_i. In particular, T(a_iφ_i) = a_iT(φ_i) = a_ic_iφ_i = b_iφ_i.
Thus, T(α − Σ_{i=1}^n a_iφ_i) = β − Σ_{i=1}^n b_iφ_i. Since the a_i are the Fourier coef-
ficients of α (with respect to {φ_i}), α − Σ_{i=1}^n a_iφ_i is a vector in V_{n+1} by Theorem
1.33. The norm of T on V_{n+1} is r_{n+1}. Thus, we have ‖T(α − Σ_{i=1}^n a_iφ_i)‖ ≤
r_{n+1}‖α − Σ_{i=1}^n a_iφ_i‖. Since Σ_{i=1}^n a_iφ_i is orthogonal to α − Σ_{i=1}^n a_iφ_i, we
have ‖α‖² = ‖α − Σ_{i=1}^n a_iφ_i‖² + ‖Σ_{i=1}^n a_iφ_i‖². In particular, ‖α − Σ_{i=1}^n a_iφ_i‖
≤ ‖α‖. Putting this all together, we have ‖β − Σ_{i=1}^n b_iφ_i‖ = ‖T(α − Σ_{i=1}^n a_iφ_i)‖ ≤ r_{n+1}‖α‖.
Since r_{n+1} = |c_{n+1}| → 0, we conclude that
{Σ_{i=1}^n b_iφ_i} → β. Thus, β = Σ_{i=1}^∞ b_iφ_i. This completes the proof of
Theorem 2.26. □

EXERCiSES FOR SECTION 2

(1) Let T ∈ Hom_ℝ(ℝⁿ, ℝⁿ). Show that T is orthogonal if and only if T(φ) is an
orthonormal basis of ℝⁿ. Here the inner product is assumed to be the
standard inner product on ℝⁿ, and φ is an orthonormal basis of ℝⁿ.
(2) If T is a self-adjoint operator on a Hilbert space V and p(X) e R[X], show
p(T) is self-adjoint.
(3) Show that the subspaces Y1,... in Corollary 2.15 are unique up to
permutations.
(4) Prove Corollary 2.16.
(5) Let

        [ 4  -1   1 ]
    A = [-1   4  -1 ]
        [ 1  -1   4 ]

Find an orthogonal matrix P such that PAP⁻¹ is diagonal.


(6) Prove Corollary 2.23.
(7) Write a specific formula for the UDV factorization of a nonsingular matrix
A. The formula should involve A and the eigenvectors of AᵗA.
(8) Show that any nilpotent, self-adjoint operator is zero.

(9) Let V be a Hilbert space, and let f: V × V → ℝ be a bounded bilinear form.
Thus, there exists a positive constant b such that |f(α, β)| ≤ b‖α‖ ‖β‖ for all
α, β ∈ V. Show that there exists a unique bounded linear operator T on V such that
f(α, β) = ⟨α, Tβ⟩. Show T is self-adjoint if and only if f is symmetric.
(10) Show that the factorization T = RS given in Corollary 2.22 is unique.
(11) If A ∈ Mₙₓₙ(ℝ) is orthogonal, show the rows of A are an orthonormal basis
of ℝⁿ.
(12) In 2.25, show T is compact if and only if for every bounded sequence
{α_n} ⊆ V, {T(α_n)} has a convergent subsequence.
(13) Consider the inner product space in Exercise 13 of Section 1. Let
C ∈ Mₙₓₙ(ℝ). Compute the adjoint of the linear map T(A) = CA.
(14) Suppose C is nonsingular in Exercise 13. Compute the adjoint of the linear
transformation S(A) = C⁻¹AC.
(15) Let V = ℝ[X]. Define an inner product ⟨ , ⟩ on V by setting
⟨f, g⟩ = ∫ f(t)g(t) dt. Let f be a fixed polynomial in ℝ[X]. Define a linear
transformation T on V by setting T(g) = fg. Show that T has an adjoint,
that is, there exists a map S ∈ ℒ(V) such that ⟨Tg, h⟩ = ⟨g, Sh⟩ for all
g, h ∈ V.
(16) Let (V, ⟨ , ⟩) be the inner product space in Exercise 15. Let D: V → V be
differentiation. Show that D has no adjoint. Note that V is not a Hilbert
space.
(17) If V is a finite-dimensional Hilbert space and T ∈ ℒ(V) is an isomorphism,
prove that T* is an isomorphism, and (T⁻¹)* = (T*)⁻¹. Is this true if
dim V = ∞?
(18) Suppose T and S are self-adjoint operators on a Hilbert space V. Show that
ST is self-adjoint if and only if ST = TS.
(19) Let V be a Hilbert space. An operator T ∈ ℒ(V) is said to be positive if T is
self-adjoint and ⟨Tα, α⟩ > 0 for all α ∈ V − {0}. If S and T are two positive
operators on V, show that S + T is positive, but ST need not be positive.
(20) Let (V, ⟨ , ⟩) be a finite-dimensional inner product space. Let T ∈ ℒ(V).
Prove that T is positive if and only if T = S*S for some isomorphism
S ∈ ℒ(V).
(21) In Exercise 20, suppose α is an orthonormal basis of V. Let T be a positive
operator on V. Show that every entry on the main diagonal of Γ(α, α)(T) is
positive.
(22) A variant of Corollary 2.23 is as follows: Let A ∈ M_{m×n}(ℝ) be nonzero.
Show there exist orthogonal matrices U and V such that

        UAV = [ D  0 ]
              [ 0  0 ]

where D is a diagonal matrix of the form D = diag(x_1, ..., x_r) with
x_1 ≥ x_2 ≥ ··· ≥ x_r > 0. [Hint: Apply Theorem 2.14 to the matrix AᵗA.]
This factorization is called the singular value decomposition of A.
(23) Let A ∈ M_{m×n}(ℝ) be nonzero, and suppose A has singular value decom-
position as in Exercise 22. Let β ∈ M_{m×1}(ℝ). Show that the vector

        X = V [ D⁻¹  0 ] Uβ
              [ 0    0 ]

is a least-squares solution to the equation AX = β. Thus, ‖β − AX‖ is as
small as possible. (Notice that the sizes of the zero matrices have been
changed here.)
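The singular value decomposition and the least-squares recipe of Exercises 22 and 23 are easy to try out numerically. The following NumPy sketch is an illustration added here, not part of the original text; it uses NumPy's convention A = U diag(s) Vᵗ (which differs slightly from the arrangement of the orthogonal factors in Exercise 22), and the matrix A and vector β below are arbitrary sample data.

```python
# Illustrative check of the SVD-based least-squares solution (sample data only).
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])           # a 3 x 2 real matrix of rank 2
beta = np.array([1.0, 0.0, 1.0])

U, s, Vt = np.linalg.svd(A)          # numpy convention: A = U @ Sigma @ Vt
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s) # block form diag(x_1, ..., x_r) padded with zeros
assert np.allclose(U @ Sigma @ Vt, A)

# Least-squares solution built from the factorization (the "D inverse" block).
Sigma_plus = np.zeros(A.shape[::-1])
Sigma_plus[:len(s), :len(s)] = np.diag(1.0 / s)
X = Vt.T @ Sigma_plus @ U.T @ beta
assert np.allclose(X, np.linalg.lstsq(A, beta, rcond=None)[0])
print("least-squares solution:", X)
```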

3. COMPLEX INNER PRODUCT SPACES

In this section, we want to extend our discussion of inner product spaces to


include vector spaces over the complex numbers ℂ. Suppose V is a vector space
over ℂ. We want a complex inner product on V to be a function
⟨ , ⟩: V × V → ℂ that satisfies conditions similar to those in Definition 1.1. It is
obvious that some changes in the definition are going to have to be made. ℂ, for
instance, is not an ordered field. (There is no order relation on ℂ that behaves
nicely with respect to addition and multiplication.) Therefore, Definition 1.1(d)
makes no sense unless we demand that ⟨α, α⟩ ∈ ℝ for every α ∈ V. Whatever
definition we decide on, we should like ‖α‖ = ⟨α, α⟩^{1/2} to behave like a norm on
V.
To motivate the definition we shall use, consider the complex vector space ℂⁿ.
The analog of the standard inner product (on ℝⁿ) for ℂⁿ is the bilinear map
⟨α, β⟩ = Σ_{k=1}^n z_kw_k. Here α = (z_1, ..., z_n), β = (w_1, ..., w_n), and z_k, w_k ∈ ℂ for
all k = 1, ..., n. This bilinear form does not work well as a candidate for an
inner product on ℂⁿ. If α = (1, i, 0, ..., 0), for example, then α ≠ 0, but
⟨α, α⟩ = 0. We can fix this problem by defining ⟨α, β⟩ = Σ_{k=1}^n z_kw̄_k. Here w̄_k
denotes the complex conjugate of w_k. Now if α = (z_1, ..., z_n) ∈ ℂⁿ, then
⟨α, α⟩ = Σ_{k=1}^n z_kz̄_k = Σ_{k=1}^n |z_k|². Here the notation |z| indicates the modulus of
the complex number z. The reader will recall that the modulus of a complex
number z = a + bi [a, b ∈ ℝ] is defined to be the positive square root of a² + b².
Thus, |z| = (a² + b²)^{1/2}. The modulus is a function from ℂ to the nonnegative
real numbers. It agrees with the ordinary absolute value on ℝ. Hence, we have
chosen the same notation | | for the absolute value on ℝ and the modulus on ℂ.
The reader can easily check that the modulus | |: ℂ → ℝ is a norm on the real
vector space ℂ. We also have |zz'| = |z| |z'|, and zz̄ = |z|² for all z ∈ ℂ. In
particular, ⟨α, α⟩ is a nonnegative real number for every α ∈ ℂⁿ. Also, it is clear
that ⟨α, α⟩ = 0 if and only if α = 0.
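The point just made is easy to see numerically. The short NumPy sketch below is an illustration added here, not part of the original text; it uses the vector α = (1, i, 0) discussed above.

```python
# Why the conjugate is needed in a complex inner product (illustration only).
import numpy as np

alpha = np.array([1.0, 1j, 0.0])

# The naive bilinear form sum z_k w_k fails: a nonzero vector pairs to 0 with itself.
print(np.sum(alpha * alpha))             # 0j, even though alpha != 0

# The Hermitian form sum z_k * conj(w_k) gives a positive real "length squared".
print(np.sum(alpha * np.conj(alpha)))    # (2+0j) = |1|^2 + |i|^2
print(np.vdot(alpha, alpha))             # numpy's vdot conjugates its first argument
```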
We give up something here with this new definition of ⟨ , ⟩. The function
⟨α, β⟩ = Σ_{k=1}^n z_kw̄_k is not a symmetric, bilinear form on ℂⁿ × ℂⁿ. Instead,
⟨ , ⟩ satisfies the following conditions:

3.1: (a) ⟨zα + z'β, γ⟩ = z⟨α, γ⟩ + z'⟨β, γ⟩.
     (b) ⟨γ, zα + z'β⟩ = z̄⟨γ, α⟩ + z̄'⟨γ, β⟩.
     (c) ⟨α, β⟩ is the complex conjugate of ⟨β, α⟩.
     (d) ⟨α, α⟩ is a positive real number for all α ∈ ℂⁿ − {0}.

The reader can easily verify that these equations hold for all α, β, and γ in ℂⁿ and
all z, z' in ℂ. This function ⟨ , ⟩ has the desired length properties and
furthermore reduces to the standard inner product on ℝⁿ when restricted to
ℝⁿ × ℝⁿ. Finally, we note that the conditions listed in 3.1 make sense for any
complex vector space V, and any function f: V × V → ℂ. Hence, we adopt these
conditions as our definition of a complex inner product.
Definition 3.2: Let V be a vector space over ℂ. By a complex inner product on V,
we shall mean a complex valued function ⟨ , ⟩: V × V → ℂ that satisfies the
following conditions:

(a) ⟨zα + z'β, γ⟩ = z⟨α, γ⟩ + z'⟨β, γ⟩.
(b) ⟨α, β⟩ is the complex conjugate of ⟨β, α⟩.
(c) ⟨α, α⟩ is a positive, real number for every α ∈ V − {0}.

These conditions are to hold for all α, β, γ ∈ V, and all z, z' ∈ ℂ.

A complex vector space V together with an inner product ⟨ , ⟩ on V will be
called a complex inner product space and written (V, ⟨ , ⟩). As with real inner
product spaces, a given complex vector space may admit several complex inner
products. Thus, a complex inner product space is an ordered pair consisting of a
vector space V over ℂ and a complex valued function ⟨ , ⟩: V × V → ℂ that
satisfies the conditions in 3.2.

Suppose (V, ⟨ , ⟩) is a complex inner product space. The reader has
undoubtedly noted that 3.1(b) follows from 3.1(a) and 3.1(c). The same remark
can be made in any V. We have ⟨γ, zα + z'β⟩ = z̄⟨γ, α⟩ + z̄'⟨γ, β⟩ for all α, β,
γ ∈ V, and all z, z' ∈ ℂ. This readily follows from 3.2(a) and 3.2(b). A function
f: V → ℂ is said to be conjugate linear if f(zα + z'β) = z̄f(α) + z̄'f(β) for all α, β ∈ V
and z, z' ∈ ℂ. Thus, for any β ∈ V, the function ⟨β, · ⟩ is conjugate linear on V,
and the function ⟨ · , β⟩ is linear. Note also that ⟨0, β⟩ = ⟨β, 0⟩ = 0 for all β ∈ V.
Thus, for every α ∈ V, we have ⟨α, α⟩ ≥ 0 with equality only if α = 0.
The first four examples in Section 1 all have complex analogs. We begin with
the example that motivates the definition:

Example 3.3: Set V = ℂⁿ. Define ⟨α, β⟩ = Σ_{k=1}^n z_kw̄_k, where α = (z_1, ..., z_n)
and β = (w_1, ..., w_n). Then (ℂⁿ, ⟨ , ⟩) is a complex inner product space.
We shall refer to this particular inner product on ℂⁿ as the standard inner
product. □

Notice that when n = 1 in Example 3.3, ⟨z, z⟩ = zz̄ = |z|². Hence, the
modulus function | |: ℂ → ℝ is the norm given by the standard inner product on
ℂ. We had mentioned that a given V might support more than one inner
product. Here is a second inner product on ℂ².

Example 3.4: V = ℂ². Define ⟨ , ⟩' by the following formula:
⟨(z_1, z_2), (w_1, w_2)⟩' = 2z_1w̄_1 + z_1w̄_2 + z_2w̄_1 + z_2w̄_2. The reader can easily
verify that ⟨ , ⟩' satisfies conditions (a)–(c) in Definition 3.2. Hence, (ℂ², ⟨ , ⟩')
is a complex inner product space. □
Example 3.5: Let V be the set of all continuous, complex valued functions on
the closed interval [a, b] ⊆ ℝ. Clearly, V is a complex vector space via pointwise
addition and scalar multiplication. V becomes a complex inner product space
when we define ⟨f, g⟩ = ∫ₐᵇ f(t)ḡ(t) dt. □

Example 3.6: Let V = ⊕_{i∈ℕ} ℂ, the set of all sequences of complex numbers
having only finitely many nonzero components. V becomes an infinite-dimensional, complex
inner product space via ⟨α, β⟩ = Σ_{k=1}^∞ z_kw̄_k. Here α = (z_1, z_2, ...) and β = (w_1,
w_2, ...). Since α and β have at most finitely many nonzero components, the
formula for ⟨α, β⟩ makes perfectly good sense. □

A more general example is the complex analog of l².

Example 3.7: Let V = {{z_k} ∈ ℂ^ℕ | Σ_{k=1}^∞ |z_k|² < ∞}. V is a complex vector space
via componentwise addition and scalar multiplication. We can define an inner
product on V by setting ⟨α, β⟩ = Σ_{k=1}^∞ z_kw̄_k. Here α = {z_k}, and β = {w_k}.
This space is the complex analog of l² in Example 1.5 of Section 1. In the
literature, it is also called l². □

The reader will note that several of these examples are the same as the
corresponding real inner product spaces. We have simply enlarged the base field
from ℝ to ℂ. There is a standard way to produce complex inner product spaces
from real ones. Namely, pass to the complexification.
Let (W, ⟨ , ⟩) be a real inner product space. Consider the complexification
W^ℂ = W ⊗_ℝ ℂ of W (see Section 2 of Chapter II). Recall that if we identify a
vector α ∈ W with its image α ⊗ 1 in W^ℂ, then W^ℂ is spanned by W. Thus, every
vector in W^ℂ can be written in the form z_1α_1 + ··· + z_nα_n, where α_1, ..., α_n ∈ W
and z_1, ..., z_n ∈ ℂ. Any basis α of W over ℝ is also a basis of W^ℂ over ℂ. Also,
any vector in W^ℂ can be written uniquely in the form α + iβ for vectors α and
β ∈ W. In terms of this representation, addition and scalar multiplication in W^ℂ
are given by the following formulas:

3.8: (α + iβ) + (μ + iλ) = (α + μ) + i(β + λ)
     (a + bi)(α + iβ) = (aα − bβ) + i(aβ + bα)

In equation 3.8, α, β, μ, λ ∈ W and a, b ∈ ℝ.



It is also useful to note that the complexification W^ℂ has a natural ℝ-
isomorphism on it given by α + iβ → α − iβ. If γ = α + iβ, then α − iβ is called
the conjugate of γ and written γ̄. The map γ → γ̄ is a conjugate linear
isomorphism of W^ℂ.
Now the real inner product ⟨ , ⟩ on W can be extended to a complex inner
product ⟨ , ⟩_1 on W^ℂ in a natural way such that the following diagram is
commutative:

3.9:      W × W    ──⟨ , ⟩──>   ℝ
            |                    |
            v                    v
       W^ℂ × W^ℂ  ──⟨ , ⟩_1──>   ℂ

(The vertical maps are the natural inclusions.) The formula for ⟨ , ⟩_1 is as follows:

3.10: ⟨μ_1 + iλ_1, μ_2 + iλ_2⟩_1 = ⟨μ_1, μ_2⟩ − i⟨μ_1, λ_2⟩ + i⟨λ_1, μ_2⟩ + ⟨λ_1, λ_2⟩

Equation 3.10 is the only definition of ⟨ , ⟩_1 possible if ⟨ , ⟩_1 is to be a
complex inner product on W^ℂ making diagram 3.9 commute. The fact that 3.9 is
indeed commutative is obvious. The fact that ⟨ , ⟩_1 satisfies the conditions in
Definition 3.2 is a tedious but straightforward exercise. Let us summarize what
we have said here in the following theorem:

Theorem 3.11: Let (W, ⟨ , ⟩) be a real inner product space. Then there is a
unique complex inner product ⟨ , ⟩_1 on the complexification W^ℂ of W such
that diagram 3.9 is commutative. Furthermore, ⟨ , ⟩_1 is defined by equation
3.10. □
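Formula 3.10 can be checked numerically once W is identified with ℝⁿ and W^ℂ with ℂⁿ. The following NumPy sketch is an illustration added here, not part of the original text; the random vectors are arbitrary sample data.

```python
# Numerical check of formula 3.10 when W = R^5 and W^C is identified with C^5.
import numpy as np

rng = np.random.default_rng(0)
mu1, lam1, mu2, lam2 = rng.standard_normal((4, 5))   # four real vectors in R^5

# Left side: formula 3.10 built from real inner products on W.
lhs = (mu1 @ mu2) - 1j * (mu1 @ lam2) + 1j * (lam1 @ mu2) + (lam1 @ lam2)
# Right side: the standard complex inner product sum z_k * conj(w_k) on C^5.
rhs = np.sum((mu1 + 1j * lam1) * np.conj(mu2 + 1j * lam2))
assert np.allclose(lhs, rhs)
```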
Since ⟨ , ⟩_1 is the natural extension of ⟨ , ⟩, we shall drop any notational
differences in the symbols and simply write ⟨ , ⟩ for the inner product on both
W and W^ℂ. As an illustration of Theorem 3.11, the reader should check that the
standard inner product on ℂⁿ is the complexification of the standard inner
product on ℝⁿ. Also, Example 3.6 is the complexification of Example 1.3.
Again suppose (V, ⟨ , ⟩) is a complex inner product space. We claim that V
is a normed linear vector space over ℝ relative to ‖α‖ = ⟨α, α⟩^{1/2}. Here and
elsewhere in this section, we use the symbol x^{1/2} to mean the nonnegative real
number whose square is x. We need Schwarz's inequality for complex inner
product spaces.

Lemma 3.12: Let (V, ⟨ , ⟩) be a complex inner product space. Then
|⟨α, β⟩| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2} for all α, β ∈ V.

Before proving Lemma 3.12, let us make a couple of comments about the
quantities appearing in the inequality. If α and β are vectors in V, then ⟨α, β⟩ is a
complex number. Thus, the left side of 3.12 is the modulus of the complex
number ⟨α, β⟩. By 3.2(c), ⟨α, α⟩ and ⟨β, β⟩ are nonnegative real numbers. Thus,
the right side of 3.12 is the product of the (nonnegative) square roots of these
quantities. In the lemma, we compare these three real numbers.

Proof of 3.12: Fix α and β in V. If ⟨α, β⟩ is a real number, then the proof of the
inequality is the same as in Lemma 1.6.
So, we assume z = ⟨α, β⟩ ∈ ℂ − ℝ. Then z ≠ 0, and z⁻¹α ∈ V. Since
⟨z⁻¹α, β⟩ = z⁻¹⟨α, β⟩ = 1, a real number, Schwarz's inequality is true for the
vectors z⁻¹α and β. Thus, 1 ≤ ⟨z⁻¹α, z⁻¹α⟩^{1/2}⟨β, β⟩^{1/2}. But
⟨z⁻¹α, z⁻¹α⟩ = z⁻¹(z̄)⁻¹⟨α, α⟩ = |z|⁻²⟨α, α⟩. Therefore, ⟨z⁻¹α, z⁻¹α⟩^{1/2} =
|z|⁻¹⟨α, α⟩^{1/2}. Thus, |z| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2}, and the proof is complete. □

We can now define a norm on any complex inner product space.

Corollary 3.13: Let (V, ⟨ , ⟩) be a complex inner product space. Then
‖α‖ = ⟨α, α⟩^{1/2} defines a real valued function on V that satisfies the following
conditions:

(a) ‖α‖ > 0 for all α ∈ V − {0}.
(b) ‖zα‖ = |z| ‖α‖ for all α ∈ V, and z ∈ ℂ.
(c) ‖α + β‖ ≤ ‖α‖ + ‖β‖ for all α, β ∈ V.

Proof: (a) Definition 3.2(c) implies that ‖α‖ > 0 for every α ∈ V − {0}.
Note also that ‖α‖ = 0 if and only if α = 0.
(b) ‖zα‖ = ⟨zα, zα⟩^{1/2} = (zz̄⟨α, α⟩)^{1/2} = (|z|²⟨α, α⟩)^{1/2} = |z| ‖α‖.
(c) In order to prove the triangle inequality, we need to recall the
definition of the real part, Re(z), of a complex number z. If z = a + bi,
then Re(z) = a. Note that z + z̄ = 2 Re(z) and Re(z) ≤ |z|.
Now suppose α and β are vectors in V. Then ‖α + β‖² = ⟨α + β, α + β⟩ =
⟨α, α⟩ + ⟨α, β⟩ + ⟨β, α⟩ + ⟨β, β⟩ = ‖α‖² + 2 Re(⟨α, β⟩) + ‖β‖² ≤ ‖α‖²
+ 2|⟨α, β⟩| + ‖β‖². By Schwarz's inequality,
‖α‖² + 2|⟨α, β⟩| + ‖β‖² ≤ ‖α‖² + 2‖α‖ ‖β‖ + ‖β‖² = (‖α‖ + ‖β‖)². Thus,
‖α + β‖² ≤ (‖α‖ + ‖β‖)². Taking square roots gives us (c). □

Any complex inner product space is of course a vector space over ℝ since
ℝ ⊆ ℂ. Corollary 3.13 says that any complex inner product space is a normed
linear vector space over ℝ. Actually, 3.13(b) is a stronger statement than what is
required in Definition 1.1(b) of Chapter IV. A complex vector space V together
with a real valued function ‖ ‖: V → ℝ satisfying conditions (a)–(c) in 3.13 is
called a complex, normed linear vector space. Thus, Corollary 3.13 says that any
complex inner product space (V, ⟨ , ⟩) is a complex, normed linear vector
space relative to ‖α‖ = ⟨α, α⟩^{1/2}.

As usual, we shall call the norm ‖α‖ = ⟨α, α⟩^{1/2} defined by the inner product
on V the norm associated with ⟨ , ⟩. Topological statements about a complex
inner product space will always be relative to the associated norm on V. We can
rewrite Schwarz's inequality as follows:

3.14: |⟨α, β⟩| ≤ ‖α‖ ‖β‖ for all α, β ∈ V

We now introduce the same definitions discussed in Section 1 for real inner
product spaces.

Definition 3.15: Let (V, ‖ ‖) be a complex, normed linear vector space. We say
(V, ‖ ‖) is a pre-Hilbert space if there exists a complex inner product ⟨ , ⟩ on V such
that ‖α‖ = ⟨α, α⟩^{1/2} for all α ∈ V. If the pre-Hilbert space (V, ‖ ‖) is complete,
then (V, ‖ ‖) is called a Hilbert space.

The inner product spaces ℂⁿ and l² are (complex) Hilbert spaces. The space
⊕_{i∈ℕ} ℂ of Example 3.6 is a pre-Hilbert space, but not a Hilbert space.
It is not our intention at this point to give the complex analog of every
theorem in Section 1 for complex pre-Hilbert spaces. We shall say just a few
words about some of these results. Let (V, ⟨ , ⟩) be a complex inner product
space. Two vectors α and β in V are said to be orthogonal if ⟨α, β⟩ = 0. If α and β
are orthogonal, we shall write α ⊥ β. Since ⟨α, β⟩ is the conjugate of ⟨β, α⟩, we see α ⊥ β if and
only if β ⊥ α. We shall use the same terminology introduced in Definition 1.16 for
complex inner product spaces.
The parallelogram law, Corollary 1.18, the Gram–Schmidt theorem, Bessel's
inequality, and so on are all true in any complex inner product space (with only
minor changes in their statements and proofs). For example, Bessel's inequality
becomes the following statement in a complex inner product space: Let {φ_k} be
an orthonormal sequence in V, and set z_k = ⟨α, φ_k⟩ for all k ∈ ℕ. Then
Σ_{k=1}^∞ |z_k|² ≤ ‖α‖². We shall cover these results in the exercises at the end of this
section.
We shall be interested in finite-dimensional, complex inner product spaces in
the next section. For these spaces, the complex analog of Theorem 1.21 can be
proved purely algebraically.

Theorem 3.16: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. Let W be a subspace of V. Then V = W ⊕ W^⊥.

Proof: Applying the Gram–Schmidt theorem to W, we can find an orthonormal
basis α = {α_1, ..., α_r} of W. If β ∈ V, then Σ_{k=1}^r ⟨β, α_k⟩α_k is a vector
in W such that β = Σ_{k=1}^r ⟨β, α_k⟩α_k + (β − Σ_{k=1}^r ⟨β, α_k⟩α_k) ∈ W + W^⊥.
Therefore, V = W + W^⊥. Since we always have W ∩ W^⊥ = (0), V = W ⊕ W^⊥. □

It easily follows from Theorem 3.16 that W^⊥⊥ = W for any subspace W of V.
We leave this as an exercise at the end of this section.

EXERCISES FOR SECTION 3

(1) Verify that the standard inner product on ℂⁿ indeed satisfies the conditions
listed in 3.1.
(2) Show that the map < , >' given in Example 3.4 is a complex inner product
on C2.
(3) Do the same for the map given in Example 3.7.
(4) Show that ⟨ , ⟩_1 given in equation 3.10 is a complex inner product on
W^ℂ.
(5) Show that Examples 3.3 and 3.6 are the complexifications of Examples 1.2
and 1.3, respectively.
(6) Let (V, ⟨ , ⟩) be a complex inner product space. Prove the Gram–
Schmidt theorem in this setting. Thus, if {α_k} is a finite or infinite sequence
of linearly independent vectors in V, show that there exists an orthonormal
sequence {φ_k} in V such that L({α_1, ..., α_j}) = L({φ_1, ..., φ_j}) for all j.

(7) Show that the parallelogram law is valid in any complex inner product
space V. Thus, ‖α + β‖² + ‖α − β‖² = 2(‖α‖² + ‖β‖²).
(8) The complex analog of Lemma 1.17(b) is not true. If α and β are
orthogonal, show ‖α + β‖² = ‖α‖² + ‖β‖². Give an example that shows
that the converse of this statement is false.


(9) Show that two vectors α and β in a complex inner product space are
orthogonal if and only if ‖zα + z'β‖² = ‖zα‖² + ‖z'β‖² for all z, z' ∈ ℂ.
(10) If two vectors α and β in a real inner product space have the property that
‖α‖ = ‖β‖, then α + β and α − β are orthogonal (prove!). Discuss the
corresponding statement for complex inner product spaces.
(11) State and prove the complex analogs of Bessel's inequality and Parseval's
equation.
(12) Suppose (V, ⟨ , ⟩) is a complex inner product space. Let W be a subspace
of V, and α ∈ V. If ⟨α, β⟩ + ⟨β, α⟩ ≤ ⟨β, β⟩ for all β ∈ W, prove that α is
orthogonal to W.

(13) Suppose (V, ‖ ‖) is a complex pre-Hilbert space. Let W be a finite-
dimensional subspace of V. Suppose β ∈ V. Show that there exists a vector
in W closest to β in the ‖ ‖-norm. Find a formula for such a vector.
(14) Let (V, ‖ ‖) be a complex pre-Hilbert space. Show that the map
⟨ , ⟩: V × V → ℂ is continuous.
(15) Describe in detail the complexification of the real inner product space
given in Exercise 13 of Section 1.

(16) Find an orthonormal basis of the complex inner product space given in
Exercise 15.
(17) What do all complex inner products on C look like?
(18) Let ⟨ , ⟩ denote the standard inner product on ℂ². Show there is no
nonzero matrix A ∈ M₂ₓ₂(ℂ) such that ⟨α, αA⟩ = 0 for all α ∈ ℂ².

4. NORMAL OPERATORS

In this section, we shall assume that (V, ⟨ , ⟩) is a finite-dimensional, complex
inner product space. We shall let ℒ(V) = Hom_ℂ(V, V). The reader will recall
from Chapter I that ℒ(V) is a finite-dimensional algebra over ℂ. In the literature,
a map in ℒ(V) may be called a linear transformation, an endomorphism of V, or
a linear operator on V. All three names are used in various places. In this section,
we shall usually refer to a map T ∈ ℒ(V) as a linear operator on V. Our first order
of business is to construct an adjoint of T. We begin with the following lemma:

Lemma 4.1: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. Then for every f ∈ V*, there exists a unique vector γ ∈ V such that
f(α) = ⟨α, γ⟩ for all α ∈ V.

Proof: Let f ∈ V* = Hom_ℂ(V, ℂ). Let α = {α_1, ..., α_n} be an orthonormal basis
of V. Set γ = Σ_{k=1}^n c̄_kα_k, where c_k = f(α_k). Define a complex valued function g on V by
g(α) = ⟨α, γ⟩. We had seen in Section 3 that g ∈ V*.
We want to argue that f = g. This is true if and only if f and g agree on α. So,
fix j ∈ {1, ..., n}. Then g(α_j) = ⟨α_j, Σ_{k=1}^n c̄_kα_k⟩ = Σ_{k=1}^n c_k⟨α_j, α_k⟩ = c_j =
f(α_j). Thus, f = g, and, in particular, f(α) = ⟨α, γ⟩ for all α ∈ V.
To show that γ is unique, suppose f(α) = ⟨α, γ'⟩ for all α ∈ V. Then
⟨α, γ − γ'⟩ = 0 for every α ∈ V. In particular, ‖γ − γ'‖² = ⟨γ − γ', γ − γ'⟩ = 0. Hence,
γ = γ'. □
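The construction in this proof is easy to carry out numerically. The following NumPy sketch is an illustration added here, not part of the original text; it takes V = ℂⁿ with its standard inner product and standard orthonormal basis, and the functional f (given by an arbitrary coefficient vector c) is sample data.

```python
# Numerical sketch of Lemma 4.1: gamma = sum_k conj(f(alpha_k)) * alpha_k represents f.
import numpy as np

n = 4
rng = np.random.default_rng(1)
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
f = lambda v: c @ v                      # a linear functional f(v) = sum c_j v_j

basis = np.eye(n, dtype=complex)         # the standard orthonormal basis of C^n
gamma = sum(np.conj(f(basis[k])) * basis[k] for k in range(n))

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.allclose(f(v), np.sum(v * np.conj(gamma)))   # f(v) = <v, gamma>
```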

Corollary 4.2: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. Then for every T ∈ ℒ(V), and every β ∈ V, there exists a unique vector γ ∈ V
such that ⟨Tα, β⟩ = ⟨α, γ⟩ for all α ∈ V.

Proof: Let T ∈ ℒ(V) and β ∈ V. Then the map f(α) = ⟨Tα, β⟩ is clearly a linear
transformation from V to ℂ. Thus, f ∈ V*. By Lemma 4.1, there exists a unique
vector γ ∈ V such that f(α) = ⟨α, γ⟩ for all α ∈ V. Thus, ⟨Tα, β⟩ = ⟨α, γ⟩ for all
α. □

We can use Corollary 4.2 to construct an adjoint of a linear operator T.



Theorem 4.3: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. Let T ∈ ℒ(V). Then there exists a unique linear operator T* ∈ ℒ(V) such
that ⟨Tα, β⟩ = ⟨α, T*β⟩ for all α, β ∈ V.

Proof: Let T ∈ ℒ(V). By Corollary 4.2, there exists a function T* from V to V
such that ⟨Tα, β⟩ = ⟨α, T*β⟩ for all α, β ∈ V. For a fixed β, T*(β) is the γ in 4.2
for which ⟨Tα, β⟩ = ⟨α, γ⟩ for all α ∈ V. We claim that T* is a linear operator on
V.
Suppose β_1 and β_2 are vectors in V. Then for any α ∈ V, we have
⟨Tα, β_1⟩ = ⟨α, T*β_1⟩, and ⟨Tα, β_2⟩ = ⟨α, T*β_2⟩. Thus, ⟨Tα, β_1 + β_2⟩ =
⟨Tα, β_1⟩ + ⟨Tα, β_2⟩ = ⟨α, T*β_1⟩ + ⟨α, T*β_2⟩ = ⟨α, T*β_1 + T*β_2⟩. On the
other hand, T*(β_1 + β_2) is the unique vector in V such that ⟨Tα, β_1 + β_2⟩ =
⟨α, T*(β_1 + β_2)⟩ for all α. We conclude that T*(β_1 + β_2) = T*(β_1) + T*(β_2).
Thus, T* is an additive map on V.
Let β ∈ V, and z ∈ ℂ. Then for every α ∈ V, ⟨Tα, zβ⟩ = ⟨α, T*(zβ)⟩, and
⟨Tα, β⟩ = ⟨α, T*β⟩. Thus, ⟨α, zT*(β)⟩ = z̄⟨α, T*β⟩ = z̄⟨Tα, β⟩ = ⟨Tα, zβ⟩ =
⟨α, T*(zβ)⟩. Since α is arbitrary, we conclude that T*(zβ) = zT*(β).
We have now shown that T* ∈ ℒ(V). The fact that T* is unique is obvious. □
The operator T* constructed in Theorem 4.3 satisfies the same functional
equation as the adjoint of a real operator, that is, equation 2.7. In complex
theory, the map T* is called the Hermitian adjoint of T. Thus, we have the
following definition:

Definition 4.4: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space, and T ∈ ℒ(V). The unique T* ∈ ℒ(V) for which ⟨Tα, β⟩ = ⟨α, T*β⟩ for all
α, β ∈ V is called the Hermitian adjoint of T.

Thus, T* is the complex analog of the adjoint of a real operator on a real
inner product space. In fact, we have the following relation between the
Hermitian adjoint and the adjoint:

Theorem 4.5: Suppose (W, ⟨ , ⟩) is a finite-dimensional, real inner product
space. Let T ∈ Hom_ℝ(W, W). Let T* denote the adjoint of T. Then the Hermitian
adjoint of T^ℂ on W^ℂ is (T*)^ℂ.

Proof: Recall from Chapter II that if T ∈ Hom_ℝ(W, W), then T^ℂ = T ⊗ 1 is
the ℂ-linear transformation on W^ℂ given by T^ℂ(Σ z_kα_k) = Σ z_kT(α_k). In
this equation, z_1, ..., z_n ∈ ℂ and α_1, ..., α_n ∈ W. The adjoint T* of T is the unique
map in Hom_ℝ(W, W) for which ⟨Tα, β⟩ = ⟨α, T*β⟩ for all α, β ∈ W. The inner
product ⟨ , ⟩ on W^ℂ is given by equation 3.10. Thus, to prove the theorem, we
need to show that the following equation in ℂ is valid:

4.6: ⟨T^ℂα_1, α_2⟩ = ⟨α_1, (T*)^ℂα_2⟩ for all α_1, α_2 ∈ W^ℂ

To see this, write α_1 = μ_1 + iλ_1 and α_2 = μ_2 + iλ_2 with μ_1, μ_2, λ_1, λ_2 in W. Using equa-
tions 3.10 and 2.7, we have ⟨T^ℂα_1, α_2⟩ = ⟨Tμ_1 + iTλ_1, α_2⟩ = ⟨Tμ_1, μ_2⟩
− i⟨Tμ_1, λ_2⟩ + i⟨Tλ_1, μ_2⟩ + ⟨Tλ_1, λ_2⟩ =
⟨μ_1, T*μ_2⟩ − i⟨μ_1, T*λ_2⟩ + i⟨λ_1, T*μ_2⟩ + ⟨λ_1, T*λ_2⟩ = ⟨μ_1 + iλ_1, T*μ_2 +
iT*λ_2⟩ = ⟨α_1, (T*)^ℂα_2⟩. □

Equation 4.6 says the Hermitian adjoint of the complexification of a real
operator T is the complexification of the adjoint of T. This statement can be
represented in symbols as follows: (T^ℂ)* = (T*)^ℂ.
Let us return to the general setting of a finite-dimensional, complex inner
product space (V, ⟨ , ⟩). The map on ℒ(V) which sends T to T* is a conjugate
linear isomorphism. This follows from the first three properties listed in our next
lemma.
Lemma 4.7: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. The map on ℒ(V) which sends T to T* satisfies the following properties:

(a) (T*)* = T.
(b) (S + T)* = S* + T*.
(c) (zT)* = z̄T*.
(d) (ST)* = T*S*.
(e) If α is an orthonormal basis of V, then Γ(α, α)(T*) is the conjugate
transpose of Γ(α, α)(T).

Proof: All the above assertions follow immediately from the functional
equation:

4.8: ⟨Tα, β⟩ = ⟨α, T*β⟩ for all α, β ∈ V

We prove (e) and leave the rest as exercises at the end of this section. Suppose
α = {α_1, ..., α_n} is an orthonormal basis of V. Set Γ(α, α)(T) = (z_{kj}) ∈ M_{n×n}(ℂ).
Then T(α_j) = Σ_{k=1}^n z_{kj}α_k for all j = 1, ..., n. If T*(α_j) = Σ_{k=1}^n w_{kj}α_k, then we
have w_{ij} = ⟨T*(α_j), α_i⟩, and ⟨T*(α_j), α_i⟩ is the conjugate of ⟨α_i, T*(α_j)⟩ =
⟨T(α_i), α_j⟩ = z_{ji}. Thus, w_{ij} is the conjugate of z_{ji}, and Γ(α, α)(T*) is the con-
jugate transpose of Γ(α, α)(T). □

Although we did not make it explicit in Section 2, the map sending T → T*
for real inner product spaces also satisfies the same properties (a)–(e) (without
the conjugate) in Lemma 4.7.
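Lemma 4.7(e) and equation 4.8 are simple to check numerically for V = ℂⁿ with the standard inner product and standard basis. The following NumPy sketch is an illustration added here, not part of the original text; the matrix A stands for Γ(α, α)(T) and is arbitrary sample data.

```python
# Numerical check of Lemma 4.7(e): the matrix of T* is the conjugate transpose of T's.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))   # matrix of T
A_star = A.conj().T                                                   # matrix of T*

a = rng.standard_normal(3) + 1j * rng.standard_normal(3)
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)
inner = lambda x, y: np.sum(x * np.conj(y))          # standard inner product on C^3

assert np.allclose(inner(A @ a, b), inner(a, A_star @ b))   # equation 4.8
```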
We can now introduce the definitions of normal, Hermitian, and unitary
operators.

Definition 4.9: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product
space. Let T ∈ ℒ(V).

(a) We say T is normal if TT* = T*T.
(b) T is Hermitian if T* = T.
(c) T is unitary if T*T = 1_V.

Obviously a Hermitian operator is normal. Since dim_ℂ(V) < ∞, T*T = 1_V if
and only if TT* = 1_V. Thus, a unitary operator is normal also. Hermitian and
unitary operators are the complex analogs of self-adjoint and orthogonal
operators on real inner product spaces. In fact, we have the following theorem:

Theorem 4.10: Let (W, ⟨ , ⟩) be a finite-dimensional, real inner product space.
Let T ∈ Hom_ℝ(W, W).

(a) If T is self-adjoint, then T^ℂ is Hermitian.
(b) If T is orthogonal, then T^ℂ is unitary.

Proof: These results follow immediately from Theorem 4.5 and the fact that the
complexification of a product of two endomorphisms is the product of
their complexifications. Thus, if T is self-adjoint, then T = T*. Hence,
(T^ℂ)* = (T*)^ℂ = T^ℂ. Therefore, T^ℂ is Hermitian. If T is orthogonal, then
T*T = 1_W. Let 1 denote the identity map on the complexification W^ℂ. Then
1 = (1_W)^ℂ = (T*T)^ℂ = (T*)^ℂT^ℂ = (T^ℂ)*T^ℂ. Therefore, T^ℂ is unitary. □
In terms of the complex inner product on V, the definitions of Hermitian and
unitary can be rewritten as follows:

4.11: (a) T is Hermitian if and only if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.
(b) T is unitary if and only if ⟨Tα, Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V.

The proof of 4.11 is completely analogous to 2.8' in Section 2.


Since Hermitian and unitary operators satisfy the same functional relations
as their real analogs, self-adjoint and orthogonal operators, they should have
the same names. Unfortunately, they do not. Here is a handy chart to help you
remember the names and definitions of the real objects and their corresponding
complex analogs:

4.12:   Real Inner Product Spaces             Complex Inner Product Spaces

(a)  ⟨Tα, β⟩ = ⟨α, T*β⟩                       ⟨Tα, β⟩ = ⟨α, T*β⟩
     T* is the adjoint of T                    T* is the Hermitian adjoint of T

(b)  ⟨Tα, β⟩ = ⟨α, Tβ⟩                        ⟨Tα, β⟩ = ⟨α, Tβ⟩
     T is self-adjoint                         T is Hermitian

(c)  ⟨Tα, Tβ⟩ = ⟨α, β⟩                        ⟨Tα, Tβ⟩ = ⟨α, β⟩
     T is orthogonal                           T is unitary

In the last part of this section, we discuss normal operators. Since Hermitian
and unitary operators are both normal, whatever we say applies to both types of
linear transformations. Throughout the rest of this section, (V, ⟨ , ⟩) will
denote a finite-dimensional, complex inner product space, and T is a linear operator
on V.

Lemma 4.13: T is unitary if and only if there exists an orthonormal basis α of V
such that T(α) is an orthonormal basis of V.

Proof: Let α = {α_1, ..., α_n} be an orthonormal basis of V. Suppose T is unitary.
Then ⟨Tα_k, Tα_j⟩ = ⟨α_k, α_j⟩ = 1 if j = k, and 0 otherwise. It readily follows that
T(α) = {T(α_1), ..., T(α_n)} is an orthonormal basis of V.
Conversely, suppose α = {α_1, ..., α_n} is an orthonormal basis of V such that
{T(α_1), ..., T(α_n)} is also an orthonormal basis of V. Let α, β ∈ V, and write
α = Σ_{k=1}^n z_kα_k and β = Σ_{k=1}^n w_kα_k. Then ⟨α, β⟩ = ⟨Σ_{k=1}^n z_kα_k, Σ_{k=1}^n w_kα_k⟩ =
Σ_{k=1}^n z_kw̄_k. This last equation follows from the fact that {α_1, ..., α_n} is a set of ortho-
normal vectors. Since T(α) is also a set of orthonormal vectors in V, we have
⟨Tα, Tβ⟩ = ⟨Σ_{k=1}^n z_kT(α_k), Σ_{k=1}^n w_kT(α_k)⟩ = Σ_{k=1}^n z_kw̄_k. Thus, ⟨Tα, Tβ⟩ =
⟨α, β⟩ for all α, β ∈ V, and T is unitary. □
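The criterion of Lemma 4.13 is easy to test numerically: for the standard basis of ℂⁿ, the images are simply the columns of the matrix of T. The following NumPy sketch is an illustration added here, not part of the original text; the particular matrix (a unimodular multiple of a plane rotation) is arbitrary sample data.

```python
# Lemma 4.13 in coordinates: unitary <=> the columns form an orthonormal set.
import numpy as np

theta = 0.3
U = np.exp(1j * 0.5) * np.array([[np.cos(theta), -np.sin(theta)],
                                 [np.sin(theta),  np.cos(theta)]], dtype=complex)

images = U @ np.eye(2)                   # images of the standard orthonormal basis
gram = images.conj().T @ images          # Gram matrix of the image vectors
assert np.allclose(gram, np.eye(2))      # the images are orthonormal
assert np.allclose(U.conj().T @ U, np.eye(2))   # equivalently, U*U = I
```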

An operator T ∈ ℒ(V) is said to be skew-Hermitian if T* = −T. Clearly, a
skew-Hermitian operator is normal. A typical example of a skew-Hermitian
operator is T − T*, where T is any operator. A nonzero, skew-Hermitian
operator cannot be Hermitian. In general, skew-Hermitian operators are not
unitary either. In particular, a normal operator need not be Hermitian or
unitary. If T is an arbitrary endomorphism of V, then
T = (T + T*)/2 + (T − T*)/2. Since (T + T*)/2 is Hermitian, and (T − T*)/2 is
skew-Hermitian, we have the following lemma:

Lemma 4.14: Every operator T ∈ Hom_ℂ(V, V) is the sum of two normal
operators. □
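The decomposition just described is immediate to verify numerically. The following NumPy sketch is an illustration added here, not part of the original text; the matrix T is arbitrary sample data.

```python
# T = (T + T*)/2 + (T - T*)/2: Hermitian part plus skew-Hermitian part.
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H = (T + T.conj().T) / 2        # Hermitian part
K = (T - T.conj().T) / 2        # skew-Hermitian part

assert np.allclose(H, H.conj().T)        # H* = H
assert np.allclose(K, -K.conj().T)       # K* = -K
assert np.allclose(H + K, T)             # the two normal pieces sum to T
```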
Since ℂ is an algebraically closed field, any operator T on V has a
characteristic polynomial of the following form:

4.15: c_T(X) = Π_{k=1}^r (X − z_k)^{n_k}

Here {z_1, ..., z_r} = 𝒮_ℂ(T), the spectrum of T. We also have
Σ_{k=1}^r n_k = n = dim_ℂ(V).

Lemma 4.16: If T is Hermitian, then 𝒮_ℂ(T) ⊆ ℝ.

Proof: Let z be an eigenvalue of T. Then there exists a nonzero vector α ∈ V such
that T(α) = zα. We have z⟨α, α⟩ = ⟨zα, α⟩ = ⟨Tα, α⟩ = ⟨α, Tα⟩ =
⟨α, zα⟩ = z̄⟨α, α⟩. Thus, (z − z̄)‖α‖² = 0. Since α ≠ 0, we conclude that z = z̄.
Hence, z is a real number. □

Thus, the eigenvalues of a Hermitian operator are always real. Our next
lemma says that nilpotent Hermitian operators are zero.

Lemma 4.17: If T is Hermitian, and T^k(α) = 0 for some k ≥ 1 and some vector α ∈ V, then
T(α) = 0.

Proof: We can assume that T^{2^m}(α) = 0 for some m ≥ 1. Set S = T^{2^{m−1}}. Then S is
Hermitian by Lemma 4.7(d). Since SS* = S² = T^{2^m}, we see that SS*(α) = 0. But
then, 0 = ⟨SS*α, α⟩ = ⟨S*α, S*α⟩ = ⟨Sα, Sα⟩ = ‖S(α)‖². We conclude that
S(α) = 0. We can now repeat this argument. We finally get T²(α) = 0. Then
0 = ⟨T²α, α⟩ = ⟨Tα, Tα⟩ = ‖Tα‖². Therefore, T(α) = 0. □

We can extend this result to normal operators in general.

Corollary 4.18: If T is normal, and T^k(α) = 0 for some vector α in V, then
T(α) = 0.

Proof: Set S = T*T. Then S* = (T*T)* = T*T** = T*T = S. Thus, S is
Hermitian. Since T is normal, T commutes with T*. In particular, we have
S^k(α) = (T*T)^k(α) = (T*)^kT^k(α) = 0. By Lemma 4.17, S(α) = 0. But then, we have
0 = ⟨Sα, α⟩ = ⟨T*Tα, α⟩ = ⟨Tα, Tα⟩ = ‖T(α)‖². We conclude that
T(α) = 0. □

Corollary 4.18 is important for what is to follow. If T is a nilpotent, normal


operator, then T = 0. More generally, suppose T is a normal operator on V. If
p(X) e C[X], then clearly p(T) is a normal operator on V. Thus, if p(T) is
nilpotent, we can conclude that p(T) = 0.

Lemma 4.19: If T is a normal operator on V, then ‖T(α)‖ = ‖T*(α)‖ for all α in
V.

Proof: ‖T(α)‖² = ⟨Tα, Tα⟩ = ⟨α, T*Tα⟩ = ⟨α, TT*α⟩ = ⟨T*α, T*α⟩ = ‖T*(α)‖².
Taking square roots gives us the desired result. □

One immediate application of Lemma 4.19 is the fact that ker(T) = ker(T*)
for any normal operator T. An important special case of this equality is the
following corollary:

Corollary 4.20: Suppose T is a normal operator on V. Then T(α) = zα for some
α ∈ V and z ∈ ℂ if and only if T*(α) = z̄α.

Proof: Since T is normal, T − z1_V is also normal. The Hermitian adjoint of
T − z1_V is given by (T − z1_V)* = T* − z̄1_V. By Lemma 4.19,
ker(T − z1_V) = ker(T* − z̄1_V). Thus, T(α) = zα if and only if T*(α) = z̄α. □

Thus, if T is a normal operator on V with spectrum given by
𝒮_ℂ(T) = {z_1, ..., z_r}, then T* is a normal operator on V with spectrum given by
𝒮_ℂ(T*) = {z̄_1, ..., z̄_r}. Furthermore, α is an eigenvector for T associated with z_k
if and only if α is an eigenvector for T* associated with z̄_k.
We can use these results to say something about the spectrum of a unitary
operator.
Corollary 4.21: If T is a unitary operator on V and z ∈ 𝒮_ℂ(T), then |z| = 1.

Proof: Let α be an eigenvector of T associated with z. Then T(α) = zα. Since
unitary operators are normal, Corollary 4.20 implies T*(α) = z̄α. Therefore,
α = 1_V(α) = T*T(α) = T*(zα) = zT*(α) = zz̄α. Since α ≠ 0, we conclude that zz̄ = 1.
Thus, |z| = 1. □

Corollary 4.21 of course says that a unitary operator has all of its eigenvalues
lying on the unit circle in the complex plane C.
We need one more set of ideas before presenting the spectral theorem for
normal operators. The reader will recall that an endomorphism T ∈ ℒ(V) is
called idempotent if T² = T. A typical example of an idempotent map is the
projection P_W( · ) of V onto a subspace W. If T is idempotent, then
V = ker(T) ⊕ Im(T). To see this, first note that any vector α ∈ V can be written in
the form α = T(α) + (α − T(α)). Clearly, T(α) ∈ Im(T). Since T(α − T(α)) =
T(α) − T²(α) = T(α) − T(α) = 0, we see α − T(α) ∈ ker(T). Therefore,
V = ker(T) + Im(T). If α ∈ ker(T) ∩ Im(T), then α = T(β) for some β ∈ V. Then
α = T(β) = T²(β) = T(α) = 0. This last equality comes from the fact that
α ∈ ker(T). Thus, ker(T) ∩ Im(T) = (0), and V is the direct sum of ker(T) and
Im(T).
Now, in general, when T is idempotent, the subspaces ker(T) and Im(T) are
not orthogonal. Thus, V is not in general an orthogonal direct sum of ker(T) and
Im(T). Those idempotent T for which ker(T) ⊥ Im(T) are called orthogonal
projections. We can construct examples of orthogonal projections by following
the same procedure as in Section 2. Suppose W is a subspace of V. Let
α = {α_1, ..., α_r} be an orthonormal basis of W. We can define an operator
P_W( · ) on V by setting P_W(β) = Σ_{k=1}^r ⟨β, α_k⟩α_k. The reader can easily check
that P_W(P_W(β)) = P_W(β). Thus, the map P_W( · ) is an idempotent
operator on V. The image of P_W( · ) is clearly W. One can also check that
β − P_W(β) is orthogonal to W. It easily follows from this fact that
ker(P_W( · )) is orthogonal to Im(P_W( · )). Thus, P_W( · ) is an orthogonal
projection of V onto W.
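The orthogonal projection P_W is easy to compute once an orthonormal basis of W is in hand. The following NumPy sketch is an illustration added here, not part of the original text; it builds an orthonormal basis of a sample 2-dimensional subspace of ℂ⁴ with a QR factorization and checks the two properties just noted.

```python
# Orthogonal projection onto W = span of the orthonormal columns of Q.
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2)))
inner = lambda x, y: np.sum(x * np.conj(y))          # standard inner product on C^4

def proj_W(beta):
    # P_W(beta) = sum_k <beta, alpha_k> alpha_k, with alpha_k the columns of Q.
    return sum(inner(beta, Q[:, k]) * Q[:, k] for k in range(Q.shape[1]))

beta = rng.standard_normal(4) + 1j * rng.standard_normal(4)
p = proj_W(beta)
assert np.allclose(proj_W(p), p)                                       # idempotent
assert np.allclose([inner(beta - p, Q[:, k]) for k in range(2)], 0)    # beta - P_W(beta) orthogonal to W
```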
We shall need the following result about orthogonal projections:

Lemma 4.22: Suppose T is idempotent and normal. Then T is a Hermitian,
orthogonal projection.

Proof: We first note that T* is idempotent. We have ⟨α, T*β⟩ = ⟨Tα,
β⟩ = ⟨T²α, β⟩ = ⟨α, (T*)²β⟩. We conclude from these equations that T* =
(T*)². Our comments before this lemma now imply that V = ker(T) ⊕ Im(T) =
ker(T*) ⊕ Im(T*).
Since T is normal, Lemma 4.19 implies that ker(T) = ker(T*). We claim that
Im(T) = Im(T*). Since T commutes with T*, we have T*(Im(T)) ⊆ Im(T). Thus,
Im(T*) = T*(V) = T*(ker(T) + Im(T)) = T*(ker(T*) + Im(T)) = T*(Im(T)) ⊆
Im(T). Reversing the roles of T and T* gives us the other inclusion
Im(T) ⊆ Im(T*). Thus, Im(T) = Im(T*).
We now claim that T = T*. Since both maps are idempotent, they are both
the identity map on Im(T) = Im(T*). Let α ∈ V. We can write α as α = α_1 + α_2
where α_1 ∈ ker(T) = ker(T*), and α_2 ∈ Im(T) = Im(T*). Then T(α) = T(α_1)
+ T(α_2) = T(α_2) = T*(α_2) = T*(α_1) + T*(α_2) = T*(α). We have now es-
tablished that T is Hermitian.
For any Hermitian operator on V, we have ker(T) = Im(T)^⊥. This argument is
exactly the same as the self-adjoint argument. If α ∈ ker(T), and β ∈ Im(T), then
β = T(γ) for some γ ∈ V. In particular, we have ⟨α, β⟩ = ⟨α, Tγ⟩ = ⟨Tα, γ⟩ = 0.
Therefore, ker(T) ⊆ Im(T)^⊥. If α ∈ Im(T)^⊥, and β is arbitrary, then
0 = ⟨α, Tβ⟩ = ⟨Tα, β⟩. We conclude that T(α) = 0. Thus, Im(T)^⊥ ⊆ ker(T).
Since ker(T) = Im(T)^⊥, in particular, ker(T) ⊥ Im(T). Hence, T is an orthogonal
projection. □
We can now state the spectral theorem for normal operators. With the
lemmas we have proved in this section, the proof of the Spectral Theorem is a
simple consequence of Theorem 4.23 of Chapter III. The reader is advised to
review Theorem 4.23 before proceeding further.
Theorem 4.23 (Spectral Theorem): Let (V, ⟨ , ⟩) be a finite-dimensional,
complex inner product space. Let T ∈ Hom_ℂ(V, V) be a normal operator.
Suppose the characteristic polynomial of T is c_T(X) = Π_{k=1}^r (X − z_k)^{n_k}. Then
there exists a set of pairwise orthogonal idempotents {P_1, ..., P_r} ⊆
Hom_ℂ(V, V) having the following properties:

(a) P_1 + ··· + P_r = 1_V.
(b) Σ_{k=1}^r z_kP_k = T.
(c) For each k = 1, ..., r, Im(P_k) = {α ∈ V | T(α) = z_kα}.

If we set V_k = Im(P_k) for each k = 1, ..., r, then we also have

(d) dim_ℂ(V_k) = n_k.
(e) V = V_1 ⊕ ··· ⊕ V_r.
(f) V_k ⊥ V_j for all 1 ≤ k ≠ j ≤ r.
(g) m_T(X) = Π_{k=1}^r (X − z_k).

Proof: Let P_1, ..., P_r and N_1, ..., N_r be the endomorphisms in ℒ(V) given by
Theorem 4.23 of Chapter III. Each N_k is a nilpotent operator on V and is also a
polynomial in T. Since T is normal, our comments after Corollary 4.18 imply
N_1 = ··· = N_r = 0. It now follows from Theorem 4.23(d) of Chapter III that
Σ_{k=1}^r z_kP_k = T. We had also proved in Theorem 4.23 that P_1, ..., P_r are
pairwise orthogonal idempotents whose sum is 1_V. Hence, we have established
(a) and (b).
In Theorem 4.23 of Chapter III, we had also established that
V_k = Im(P_k) = ker((T − z_k1_V)^{n_k}) for each k = 1, ..., r. Since T − z_k1_V is normal,
Corollary 4.18 implies ker((T − z_k1_V)^{n_k}) = ker(T − z_k1_V). This proves (c). The as-
sertions in (d) and (e) were also established in 4.23 of Chapter III. The assertion
in (g) follows from Corollary 4.22 of Chapter III. The only thing that remains to
be proved is the statement in (f).
Each P_k is a polynomial in T. Thus, each P_k is a normal operator on V.
Lemma 4.22 implies each P_k is a Hermitian, orthogonal projection of V onto V_k.
Suppose k ≠ j, and let α ∈ V_k and β ∈ V_j. Then α = P_k(α'), and β = P_j(β') for
some α', β' ∈ V. Since P_kP_j = 0, we have ⟨α, β⟩ = ⟨P_k(α'), P_j(β')⟩ =
⟨α', P_kP_j(β')⟩ = ⟨α', 0⟩ = 0. Thus, V_k ⊥ V_j, and (f) is proved. □
The spectral theorem says that V decomposes into an orthogonal direct sum
of the eigenspaces V_k. If we choose an orthonormal basis (by Gram–Schmidt) of
each V_k and take their union, we get an orthonormal basis of V consisting
entirely of eigenvectors of T. Hence, we can restate Theorem 4.23 as follows:

Corollary 4.24: Let T be a normal operator on a finite-dimensional, complex
inner product space V. Then V has an orthonormal basis consisting of
eigenvectors of T. □

The matrix version of Theorem 4.23 is easy to state. The Hermitian adjoint
A* of a complex matrix A is its conjugate transpose. Thus, A* = (Ā)ᵗ. A matrix
U is unitary if U*U = I. A matrix A is Hermitian if A* = A. A change in
orthonormal bases in ℂⁿ is given by a unitary matrix. Thus, the matrix version of
4.23 is as follows:

Corollary 4.25: Let A ∈ M_{n×n}(ℂ) be a normal matrix, that is, AA* = A*A. Then
there exists a unitary matrix U such that UAU⁻¹ is diagonal. □
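Corollary 4.25 is easy to illustrate numerically. The following NumPy sketch is an addition, not part of the original text; the matrix chosen is skew-Hermitian, hence normal but neither Hermitian nor unitary, and because its eigenvalues are distinct the eigenvector matrix returned below happens to be unitary.

```python
# A normal (skew-Hermitian) matrix is unitarily diagonalizable.
import numpy as np

A = np.array([[0.0, -2.0],
              [2.0,  0.0]], dtype=complex)
assert np.allclose(A @ A.conj().T, A.conj().T @ A)          # A is normal

w, U = np.linalg.eig(A)                  # eigenvalues +-2i and unit eigenvectors
assert np.allclose(U.conj().T @ U, np.eye(2))               # the eigenvectors are orthonormal here
assert np.allclose(U.conj().T @ A @ U, np.diag(w))          # U^{-1} A U = U* A U is diagonal
```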

The two most important examples of normal operators on V are Hermitian
and unitary operators. Thus, the most important applications of the spectral
theorem are the following two special cases:

Corollary 4.26: If T is a Hermitian operator on a finite-dimensional, complex
inner product space V, then V has an orthonormal basis consisting of
eigenvectors of T. Equivalently, if A ∈ M_{n×n}(ℂ) is Hermitian, then there exists a
unitary matrix U such that UAU⁻¹ is diagonal. □

Corollary 4.27: If T is a unitary operator on a finite-dimensional, complex inner
product space V, then V has an orthonormal basis consisting of eigenvectors of
T. Equivalently, if A ∈ M_{n×n}(ℂ) is a unitary matrix, then there exists a second
unitary matrix U such that UAU⁻¹ is diagonal. □

EXERCISES FOR SECTION 4

(1) Prove assertions (a)–(d) in Lemma 4.7.

(2) Show that the assertions in 4.11 are correct.

(3) Exhibit a normal operator on ℂ² which is neither Hermitian nor unitary.

(4) If T is a normal operator on V and p(X) ∈ ℂ[X], compute the Hermitian
adjoint of p(T). Use this answer to show that p(T) is normal.

(5) Prove Corollary 4.25.

(6) Generalize Corollary 4.25 as follows: If A_1, ..., A_s are a finite number of
normal, commuting matrices in M_{n×n}(ℂ), then there exists a unitary
matrix U such that UA_kU⁻¹ is diagonal for all k = 1, ..., s.
(7) Show that Exercise 6 is false if A1 and A2 do not commute.
(8) Use the spectral theorem to show that for every symmetric matrix
A ∈ M_{n×n}(ℝ) there exists an orthogonal matrix P ∈ M_{n×n}(ℝ) such that
PAP⁻¹ is diagonal. (Hint: Pass to the complexification, and argue that the
eigenvalues and eigenvectors you need are all real.)
(9) Let V be a complex inner product space, and let T ∈ ℒ(V). Suppose T is
normal. Show that
(a) T is Hermitian if and only if 𝒮_ℂ(T) ⊆ ℝ.
(b) T is unitary if and only if 𝒮_ℂ(T) ⊆ {z ∈ ℂ : |z| = 1}.

(10) Return to the example you gave in Exercise 3, and find an orthonormal
basis of ℂ² consisting of eigenvectors for your normal operator.
(11) Suppose T is a nonzero, skew-Hermitian operator on V. Show the
eigenvalues of T are purely imaginary.
(12) Suppose T is a normal operator on V. Show that the Hermitian adjoint T*
of T is a polynomial in T.
(13) Let

        [ 1  0  0  0 ]
    A = [ 0  0  1  0 ]
        [ 0  1  0  0 ]
        [ 0  0  0  1 ]

Find a unitary matrix U such that UAU⁻¹ is diagonal.
(14) Let A ∈ M_{n×n}(ℂ). Show there exists a unitary matrix U such that UAU⁻¹ is
upper triangular.

(15) Show that an n × n matrix A is normal if and only if A commutes with
AA*.

(16) Let A ∈ M_{n×n}(ℂ). Suppose the characteristic polynomial of A is given by
c_A(X) = Π_{k=1}^r (X − z_k)^{n_k}. Let w_1, ..., w_n be a list of the z_k, each repeated n_k
times. Thus, z_1 appears n_1 times, z_2 appears n_2 times, etc. Show that
Σ_{k=1}^n |w_k|² ≤ Tr(A*A). Show that A is normal if and only if we have
equality here. In this problem, Tr(A*A) denotes the trace of the matrix
A*A.

(17) If A, B ∈ M_{n×n}(ℂ) and AB = 0, then BA need not be zero [Example!]. What
happens when A and B are both normal? Does AB = 0 imply BA = 0?
(18) Write down the complex analog of Exercise 15 of Section 2. This is a little
more interesting than the real case.
(19) Return to the complex inner product space given in Exercise 15 of Section
3. Compute the Hermitian adjoint of the map T(A) = CA for any
C ∈ M_{n×n}(ℂ).

(20) In Exercise 19, suppose C is nonsingular. Compute the Hermitian adjoint
of the map S(A) = C⁻¹AC.
(21) Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product space. Let
T ∈ ℒ(V). Prove that the following assertions are equivalent:
(a) T is normal.
(b) T* = g(T) for some g(X) ∈ ℂ[X].
(c) ‖T(α)‖ = ‖T*(α)‖ for all α ∈ V.
(d) Every T-invariant subspace of V is T*-invariant.
Glossary of Notation

F an arbitrary field, 1
0 the field of rationals numbers, 2
l1 the field of real numbers, 2
C the field of complex numbers, 2
the field of p elements, 2
7 the integers, 2
V an arbitrary vector space, 2
N the natural numbers, 3
F n-tuples of elements from F, 3
B" all functions from A to B, 3
VI' all functions from {1,. . , n} to V, 3
a matrix,3
Mmxn(F) the set of m x n matrices over F, 3
F [X] polynomials in X over F, 3
C(I) the set of continuous functions on I c R, 4
Ck(I) k-times differentiable functions on I, 4
Riemann integrable functions on A, 4
At the transpose of a matrix A, 4
the sum of the Wi, 5
US) the linear span of 5, 6
the set of all subsets of V, 6


the set of all subspaces of V, 6


|A| the cardinality of the set A, 8
(A, a set A and relation on A, 8
dim V the dimension of V, 12
abasisofV,14
the x-skeleton, 14
M(3, change of basis matrix, 14
Col1(M) ith column of M, 14
T a linear transformation 17
Hom(V, W) the linear transformations from V to W, 17
ker T the kernel of T, 18
Im T the image of T, 18
an isomorphism, 18
transpose of []!' 19
the n x n identity matrix, 20
Vx xV a finite product ofV, 20
F(x, fl)(T) the matrix representation of T relative to oc, /3,
22
rk(A) the rank of a matrix A, 24
C= {(V1, dJ} a chain complex, 26
the identity map on V, 28

JJ V1 the product of the V1, 30


ieA
a A-tuple, 30

$jetS
V1 the direct sum of the 33

V1 e a finite direct sum, 34


the algebra of endomorphisms of V, 36
an equivalence relation, 39
the equivalence class containing x, 39
V/W the set of equivalence classes of V modulo W, 39
α + W a coset of W, 40
d(V) the set of all affine subspaces of V, 40
translation through x, 41
AffF(V, V') the set of affine transformations from V to V', 41
V* the dual of V, 46
= the dual basis of oc = 46
co(x, /3) a bilinear map, 47
the annihilator of A, 48

T* the adjoint of T, 49
rk{T} the rank of T, 50
Tr(A) the trace of A, 53
a quadratic form, 54
4' a multilinear mapping, 59
infinitely differentiable functions on I, 60
Mu1F(Vl x x Vi,, V) the set of multilinear maps from V1 x x
toY, 61
V1 ®F V,, the tensor product of V1,..., 64

A®B the Kronecker product of A and B, 67

the tensor algebra of V, 68


T1 ® the tensor product of the maps 71
the complexification of V, 79
the complexification of the map T, 79
S,, the group of permutations of {1,. . . , n}, 83
(i1,.. . ,ir) an r-cycle, 85
sgn(c) the sign of the permutation a, 86
Alt(Ø) the alternating map formed from 4', 86
W) the space of alternating maps from V" to W,
87
the n-th exterior power of V, 89
A acosetinifl(V),89
the binomial coefficient N over n, 91
A"(T) the n-th exterior power of T, 92
A(V) the exterior algebra of V, 93
W) symmetric multilinear maps from V" to W, 94
5(4') the symmetric map formed from 4', 94
the n-th symmetric power of V, 95
... a coset in 95
the n-th symmetric power of T, 96
F[X1,. . . , Xn] the polynomial algebra in n variables, 97
S(V) the symmetric algebra of V, 97
5(f) the degree of f(X), 99
fig f divides g, 99
g.c.d.(f1,. . ., a greatest common divisor of f1,.. 100
R(f) the roots of f, 102
F the algebraic closure of F, 103

Yv®F 103
TT=T®FIp, 103
l.c.m.(f1,. . , fj a least common multiple of f1, , 104
mT(X) the minimal polynomial of T, 105
the inclusion map, 107
vi' VKV®FK, 107
T"=T®FIK, 107
Mm JF[X]) m x n matrices with coefficients in F[X], 110
adj(A) the adjoint of A, 110
cA(X) the characteristic polynomial of A, 111
&°F(T) the eigenvalues of T in F, 118
diag(a1,..., aj n x n diagonal matrix, 121
a k x k subdiagonal matrix, 123
if the conjugate of /3 in yC, 143
the exponential of A, 153
C(g(X)) the companion matrix of g(X), 161
|x| the absolute value of x, 171
‖ ‖ a norm, 171
(V, liii) a normed linear vector space, 172
the Hilbert space of square summable
sequences, 173
d(α, β) the distance between α and β, 173
B_r(α) a ball of radius r about α, 173
d(β, A) the distance between β and A, 173
A° the interior of A, 173
AC the complement of A, 173
lim f

the of bounded linear operators from V to


W, 175
the uniform norm on W), 177
A the closure of the set A, 179
Aa the boundary of the set A, 179
Ills the sum norm, 181
{α_n} a sequence in V, 186
{α_n} → α the sequence {α_n} converges to α, 186
{α_{n_k}} a subsequence of {α_n}, 189
an inner product, 206
(V, < , >) an inner product space, 207

α ⊥ β α and β are orthogonal, 211
A ⊥ B A and B are orthogonal, 211
A^⊥ the vectors orthogonal to A, 211
the orthogonal projection of β onto W, 213
|z| the modulus of a complex number z, 236
Subject Index

Absolute value, 171 rational, 164


Adjoint, 49 real Jordan, 150

Hermitian, 244 Cauchy sequence, 200


matrix, 110 Cayley—Hamilton, 111
Affine subspace, 40 Chain complex, 26
Affine transformation, 41 Change of basis matrix, 14
Algebra, 36 Class, 39
exterior, 93 Closed set, 173
symmetric, 97 Closure, 179
tensor, 68 Cochain complex, 52
Algebraically closed, 102 Codimension, 44
Alternating map, 86 Column space, 16
Associative law, 1, 2 Commutative diagram, 15
Augmented matrix, 16 Commutative law, 1,2
Compact, sequentially, 189
Ball, 174 Compact operator, 231
Banach space, 200 Companion matrix, 161
Basis, 9 Complement, 13
dual, 46 Complete, 200
orthonormal, 216 Completion, 202
pre-Hilbert, 219 Complex conjugation, 79
Bessel's inequality, 218 Complexification, 79
Bijective, 18 Complex numbers, 2
Bilinear, 47 Congruent, 39
Boundary, 179 Continuous, 174
Bounded linear map, 175 Convergent sequence, 186
Bounded set, 173 Coordinate map, 14
Coset, 40
Canonical basis, 10 Cyclic subspace, 126, 159
Canonical form(s):
Jordan, 134 Degree, 99


Dense set, 199 Homomorphism, 17


Dependent set, 9 bijective, 18
Determinant, 110 epimorphism, 18
Diagonalizable, 122 image, 18
Diagonal matrix, 121 injective, 18
Dimension, 12 isomorphism, 18
Direct sum: kernel, 18
external, 34 surjective, 18
internal, 35 Hyperplane, 45
Distance, between vectors, 173
Division algorithm, 99 Idempotent, 36
Divisor, elementary, 166 Image, 18
Dyad, 51 Increasing sequence, 191
Inequality:
Eigenvalue, 117 Bessel's, 218
Eigenvector, 119 Schwarz's, 209
Elementary divisor, 166 Infinite dimensional, 12
Equivalence class, 39 Injective, 18
Equivalence relation, 38 Inner product, 56
Equivalent norms, 180 Interior, 173
Euclidean norm, 172 Internal direct sum, 35
Even permutation, 86 Intersection, subspaces, 5
Exact, 27 Invariant factor, 167
Exponential matrix, 153 Invariant subspace, 114
Exterior power, 89 Irreducible polynomial, 100
External direct sum, 34 Isomorphism, 18
norm, 171
Factorization, 101
Field, 1 Jordan:
algebraically closed, 102 block, 133
complex, 2 form, 134
rational, 2
real, 2 Kernel, 18
subfield, 16 Kronecker product, 67
Finite dimensional, 12
Flat, 40 Laplace expansion, 110
Form(s): Least common multiple, 104
Jordan, 134 Limit, 174
rational, 164 sequence, 186
real Jordan, 150 Line, 45
real quadratic, 54 Linear:
Fourier: dependence, 9
coefficient, 216 independence, 9
series, 218 map, 17
Fundamental theorem of algebra, 102 operator, 175, 243
span, 6
Gram-Schmidt Theorem, 216 transformation, 17
Greatest common divisor, 100 Lipschitz, 174

Hamilton—Cayley Theorem, 111 Map, 17


Hermitian operator, 245 Mapping cone, 30
Hilbert space, 210 Matrix, 3
Hom (V, W), 17 companion, 161
Homogeneous polynomial, 8 diagonal, 121

Hermitian, 251 minimal, 105


identity, 20 monic, 105
indecomposable, 169 roots, 102
normal, 251 Positive definite, 56
orthogonal, 226 Positive transformation, 235
skew-symmetric, 8 Pre-Hilbert space, 209
symmetric, 8 Primary decomposition, 142
trace of, 53 Primary polynomial, 169
transpose, 4 Product:
unitary, 251 inner, 56
Metric space, 173 Kronecker, 67
Minimal polynomial, 105 norm, 182
Monic polynomial, 105 tensor, 64
Monomial, 7 vector space, 30
Monotone sequence, 191 Projection, 35
Multilinear map, 59 orthogonal, 249
alternating, 86
symmetric, 94 Quadratic form, 54
definite, 56
Natural map, 48 negative definite, 56
Negative definite, 56 positive definite, 56
Nilpotent, 122 Quotient space, 40
Nonderogatory, 169
Nonnegative transformation, 226 Rank, 24
Norm, 171 Rational:
Normal equation, 215 canonical form, 164
Normal map, 245 field, 2
Normed linear space, 172 Real field, 2
Null space, 16 Real vector space, 54
Reflexive, 38
Odd permutation, 86 Relation, 8
Open set, 173 equivalence, 38
Orthogonal: reflexive, 38
matrix, 226 symmetric, 38
projection, 249 transitive, 38
sets, 211 Relatively prime, 102
vectors, 211 Root, 102
Orthonormal:
basis, 216 Scalar product, 220
sequence, 218 Schwarz's inequality, 209
vectors, 216 Self-adjoint, 225
Seminorm, 178
Parallelogram law, 211 Semiscalar product, 220
Parseval's equation, 218 Sequence, 186
Permutation: bounded, 191
even, 86 Cauchy, 200
odd, 86 convergent, 186
Perpendicular, 211 decreasing, 191
Plane, 45 increasing, 191
Polynomial, 3 monotone, 191
characteristic, 111 orthonormal, 218
degree, 99 subsequence, 189
division algorithm, 99 Sign, permutation, 86
irreducible, 100 Signature, 56

Similar matrices, 25 image of, 18


Skeleton, 14 injective, 18

Skew-Hermitian, 247 isometry, 186


Skew-symmetric, 8 isomorphism, 18
Space(s): kernel of, 18
Banach, 200 monomorphism, 18
complex, 236 normal, 245
dual, 46 orthogonal, 225
Hilbert, 210 positive, 235
inner product, 207 self-adjoint, 225
metric, 173 set of, 17

normed linear, 172 surjective, 18

pre-Hilbert, 209 unitary, 245

quotient, 40 Transitive, 38

real, 54 Transpose, 4
Span, linear, 6 Transposition, 85
Spectral theorem, 227, 250 Triangular matrix, 117
Spectrum, 118
Standard basis, 10 Uniform:
Subfield, 16 continuity, 199
Subsequence, 189 norm, 177
Subspace, 4 Unique factorization, 101
cyclic, 126, 159 Unitary map, 245
independent, 34 Upper triangular, 117
invariant, 114
Sum, 5
Values, characteristic, 117
external, 34
Vectors:
internal, 35
characteristic, 120
Symmetric:
independent, 9
matrix, 8
Vector space(s):
power, 95
Banach, 200
relation, 38
complete, 200
complex, 236
Tensor product, 64
dual, 46
Topology, 174
Hilbert, 210
Trace, 53
inner product, 207
Transformation(s):
normed linear, 172
adjoint, 49
pre-Hilbert, 209
algebra of, 36
18
quotient, 40
bijective,
real, 54
bounded, 175
Hermitian, 245
identity, 28 Wedge product, 87
