A complete list of titles in this series is available from the publisher upon request.
The Theory of Matrices
Second Edition
with Applications
Peter Lancaster
Department of Mathematics
University of Calgary
Calgary, Alberta, Canada
Miron Tismenetsky
IBM Scientific Center
Technion City
Haifa, Israel
Academic Press
San Diego New York Boston
London Sydney Tokyo Toronto
This is an Academic Press reprint reproduced directly from the pages of a
title for which type, plates, or film no longer exist. Although not up to the
standards of the original, this method of reproduction makes it possible to
provide copies of books which would otherwise be out of print.
ACADEMIC PRESS
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, CA 92101-4495
Preface xiii
1 Matrix Algebra
9 Functions of Matrices
9.1 Functions Defined on the Spectrum of a Matrix 305
9.2 Interpolatory Polynomials 306
9.3 Definition of a Function of a Matrix 308
9.4 Properties of Functions of Matrices 310
9.5 Spectral Resolution of f(A) 314
9.6 Component Matrices and Invariant Subspaces 320
9.7 Further Properties of Functions of Matrices 322
11 Perturbation Theory
11.1 Perturbations in the Solution of Linear Equations 383
11.2 Perturbations of the Eigenvalues of a Simple Matrix 387
11.3 Analytic Perturbations 391
11.4 Perturbation of the Component Matrices 393
11.5 Perturbation of an Unrepeated Eigenvalue 395
11.6 Evaluation of the Perturbation Coefficients 397
11.7 Perturbation of a Multiple Eigenvalue 399
13 Stability Problems
13.1 The Lyapunov Stability Theory and Its Extensions 441
13.2 Stability with Respect to the Unit Circle 451
14 Matrix Polynomials
14.1 Linearization of a Matrix Polynomial 490
14.2 Standard Triples and Pairs 493
14.3 The Structure of Jordan Triples 500
14.4 Applications to Differential Equations 506
14.5 General Solutions of Differential Equations 509
14.6 Difference Equations 512
14.7 A Representation Theorem 516
14.8 Multiples and Divisors 518
14.9 Solvents of Monic Matrix Polynomials 520
15 Nonnegative Matrices
15.1 Irreducible Matrices 528
15.2 Nonnegative Matrices and Nonnegative Inverses 530
15.3 The Perron-Frobenius Theorem (I) 532
15.4 The Perron-Frobenius Theorem (II) 538
15.5 Reducible Matrices 543
15.6 Primitive and Imprimitive Matrices 544
15.7 Stochastic Matrices 547
15.8 Markov Chains 550
Index 563
Preface
In this book the authors try to bridge the gap between the treatments of matrix
theory and linear algebra to be found in current textbooks and the mastery of
these topics required to use and apply our subject matter in several important
areas of application, as well as in mathematics itself. At the same time we
present a treatment that is as self-contained as is reasonably possible, beginning
with the most fundamental ideas and definitions. In order to accomplish this
double purpose, the first few chapters include a complete treatment of material to
be found in standard courses on matrices and linear algebra. This part includes
development of a computational algebraic development (in the spirit of the first
edition) and also development of the abstract methods of finite-dimensional
linear spaces. Indeed, a balance is maintained through the book between the
two powerful techniques of matrix algebra and the theory of linear spaces and
transformations.
The later chapters of the book are devoted to the development of material that
is widely useful in a great variety of applications. Much of this has become a part
of the language and methodology commonly used in modern science and en-
gineering. This material includes variational methods, perturbation theory,
generalized inverses, stability theory, and so on, and has innumerable applica-
tions in engineering, physics, economics, and statistics, to mention a few.
Beginning in Chapter 4 a few areas of application are developed in some
detail. First and foremost we refer to the solution of constant-coefficient systems
of differential and difference equations. There are also careful developments of
the first steps in the theory of vibrating systems, Markov processes, and systems
theory, for example.
The book will be useful for readers in two broad categories. One consists of
those interested in a thorough reference work on matrices and linear algebra for
use in their scientific work, whether in diverse applications or in mathematics
There are many exercises and examples throughout the book. These range
from computational exercises to assist the reader in fixing ideas, to extensions of
the theory not developed in the text. In some cases complete solutions are given,
and in others hints for solution are provided. These are seen as an integral part of
the book and the serious reader is expected to absorb the information in them as
well as that in the text.
In comparison with the 1969 edition of "The Theory of Matrices" by the first
author, this volume is more comprehensive. First, the treatment of material in the
first seven chapters (four chapters in the 1969 edition) is completely rewritten
and includes a more thorough development of the theory of linear spaces and
transformations, as well as the theory of determinants.
Chapters 8-11 and 15 (on variational methods, functions of matrices, norms,
perturbation theory, and nonnegative matrices) retain the character and form of
chapters of the first edition, with improvements in exposition and some addition-
al material. Chapters 12-14 are essentially extra material and include some quite
recent ideas and developments in the theory of matrices. A treatment of linear
equations in matrices and generalized inverses that is sufficiently detailed for
most applications is the subject of Chapter 12. It includes a complete description
of commuting matrices. Chapter 13 is a thorough treatment of stability questions
for matrices and scalar polynomials. The classical polynomial criteria of the
nineteenth century are developed in a systematic and self-contained way from the
more recent inertia theory of matrices. Chapter 14 contains an introduction to the
recently developed spectral theory of matrix polynomials in sufficient depth for
many applications, as well as providing access to the more general theory of
matrix polynomials.
The greater part of this book was written while the second author was a
Research Fellow in the Department of Mathematics and Statistics at the Universi-
ty of Calgary. Both authors are pleased to acknowledge support during this
period from the University of Calgary. Many useful comments on the first edition
are embodied in the second, and we are grateful to many colleagues and readers
for providing them. Much of our work has been influenced by the enthusiasms of
co-workers I. Gohberg, L. Rodman, and L. Lerer, and it is a pleasure to
acknowledge our continued indebtedness to them. We would like to thank H. K.
Wimmer for several constructive suggestions on an early draft of the second
edition, as well as other colleagues, too numerous to mention by name, who
made helpful comments.
The secretarial staff of the Department of Mathematics and Statistics at the
University of Calgary has been consistently helpful and skillful in preparing the
typescript for this second edition. However, Pat Dalgetty bore the brunt of this
work, and we are especially grateful to her. During the period of production we
have also benefitted from the skills and patience demonstrated by the staff of
Academic Press. It has been a pleasure to work with them in this enterprise.
P. Lancaster M. Tismenetsky
Calgary Haifa
CHAPTER 1
Matrix Algebra
A = [a11  a12  ...  a1n
     a21  a22  ...  a2n
      .               .
     am1  am2  ...  amn],

where a_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) denotes the element of the matrix lying on
the intersection of the ith row and the jth column of A.
Two matrices having the same number of rows (m) and columns (n) are
matrices of the same size.

[Displays of lower- and upper-triangular matrices, in which all elements above
(respectively, below) the main diagonal are zero, appear here; the entries are
illegible in this reproduction.]
If a11 = a22 = ... = ann = a, then the diagonal matrix A is called a scalar
matrix;

A = [a  0  ...  0
     0  a  ...  0
     .       .  .
     0  0  ...  a]  =  diag[a, a, ..., a].

In particular, if a = 1, the matrix A becomes the unit, or identity, matrix

I = [1  0  ...  0
     0  1  ...  0
     .       .  .
     0  0  ...  1]  =  diag[1, 1, ..., 1].
It is clear that, in the case of a real matrix (i.e., consisting of real numbers),
the notions of Hermitian and symmetric matrices coincide.
Returning to rectangular matrices, note particularly those matrices having
only one column (column-matrix) or one row (row-matrix) of length, or
size, n:
b = [b1
     b2
     .
     bn],        c^T = [c1  c2  ...  cn].
The reason for the T symbol, denoting a row-matrix, will be made clear in
Section 1.5.
Such n x 1 and 1 x n matrices are also referred to as vectors or ordered
n-tuples, and in the cases n = 1, 2, 3 they have an obvious geometrical
Since vectors are special cases of matrices, the operations on matrices will
be defined in such a way that, in the particular cases of column matrices and
of row matrices, they correspond to the familiar operations on position
vectors. Recall that, in three-dimensional Euclidean space, the sum of two
position vectors is introduced as
[x1  y1  z1] + [x2  y2  z2] = [x1 + x2   y1 + y2   z1 + z2].
This definition yields the parallelogram law of vector addition, illustrated
in Fig. 1.2.
[Fig. 1.2: the points Q(x2, y2, z2) and R(x1 + x2, y1 + y2, z1 + z2), illustrating
the parallelogram law of vector addition.]
or

[x1]   [y1]   [x1 + y1]
[x2] + [y2] = [x2 + y2]
[ .]   [ .]   [   .   ]
[xn]   [yn]   [xn + yn].
That is, the elements (or components, or coordinates) of the resulting vector
are merely the sums of the corresponding elements of the vectors. Note that
only vectors of the same size may be added.
Now the following definition of the sum of two matrices A = [a_ij]_{i,j=1}^{m,n}
and B = [b_ij]_{i,j=1}^{m,n} of the same size is natural: A + B = [a_ij + b_ij]_{i,j=1}^{m,n}.
These rules allow easy definition and computation of the sum of several
matrices of the same size. In particular, it is clear that the sum of any
X + B = A. Obviously,

A - B = [a_ij - b_ij]_{i,j=1}^{m,n},

where A = [a_ij]_{i,j=1}^{m,n} and B = [b_ij]_{i,j=1}^{m,n}.
It is clear that the zero-matrix plays the role of the zero in numbers: a
matrix does not change if the zero-matrix is added to it or subtracted from it.
Before introducing the operation of multiplication of a matrix by a scalar,
recall the corresponding definition for (position) vectors in three-dimensional
Euclidean space: If a^T = [a1  a2  a3] and α denotes a real number, then
the vector αa^T is defined by

αa^T = [αa1  αa2  αa3].
Thus, in the product of a vector with a scalar, each element of the vector is
multiplied by this scalar.
This operation has a simple geometrical meaning for real vectors and
scalars (see Fig. 1.3). That is, the length of the vector αa^T is |α| times the length
of the vector a^T, and its orientation does not change if α > 0 and it reverses
if α < 0.
Passing from (position) vectors to the general case, the product of the
matrix A = [a_ij] with a scalar α is the matrix C with elements c_ij = αa_ij,
that is, C = [αa_ij]. We also write C = αA. The following properties of scalar
tively, are known, then the computation of the scalar product is simple:

a · b = a1b1 + a2b2 + a3b3.     (2)
AB = [a1^T b1   a1^T b2   ...   a1^T bn
      a2^T b1   a2^T b2   ...   a2^T bn
        .                         .
      am^T b1   am^T b2   ...   am^T bn]  =  [a_i^T b_j]_{i,j=1}^{m,n}.
Using (4) we get the multiplication formula

AB = [ Σ_{k=1}^{l} a_ik b_kj ]_{i,j=1}^{m,n}.     (5)
Exercise 1. Show that
[the product of a particular 2 × 3 matrix and 3 × 2 matrix; the numerical
entries are illegible in this reproduction]
It is important to note that the product of two matrices is defined if and
only if the number of columns of the first factor is equal to the number of
rows of the second. When this is the case, A and B are said to be conformable
(with respect to matrix multiplication).
Exercise 2. If a is a column matrix of size m and b^T is a row matrix of size
n, show that ab^T is an m × n matrix. (This matrix is sometimes referred to
as an outer product of a and b.) ❑
The above definition of matrix multiplication is standard; it focusses on
the rows of the first factor and the columns of the second. There is another
representation of the product AB that is frequently useful and, in contrast,
is written in terms of the columns of the first factor and the rows of the second.
First note carefully the conclusion of Exercise 2. Then let A be m × l and
B be l × n (so that A and B are conformable for matrix multiplication).
Suppose that the columns of A are a1, a2, ..., al and the rows of B are
b1^T, b2^T, ..., bl^T. Then it is claimed that

AB = Σ_{k=1}^{l} a_k b_k^T.     (6)
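The claim in (6) is easy to test numerically. The following sketch (the matrices are arbitrary examples, not taken from the text, and NumPy is assumed) compares the ordinary product with the sum of outer products of the columns of A and the rows of B:

```python
import numpy as np

# A is m x l, B is l x n; claim (6): AB equals the sum of the outer
# products of the columns of A with the corresponding rows of B.
m, l, n = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(m, l))
B = rng.integers(-5, 5, size=(l, n))

outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(l))
assert np.array_equal(A @ B, outer_sum)
```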
Some types of matrices having special properties with regard to matrix
multiplication are presented in this section. As indicated in Exercise 1.3.5,
two matrices do not generally commute: AB ≠ BA. But when equality
does hold, we say that the matrix A commutes with B, a situation to be
studied in some depth in Section 12.4.
Exercise 1. Find all matrices that commute with the matrices

(a) [ 0  1        (b) [0  1  0
     -1  0],           0  0  1
                       0  0  0].

Answers.

(a) [d  -c        (b) [a  b  c
     c   d],           0  a  b
                       0  0  a],

where a, b, c, d denote arbitrary scalars. ❑
It follows from Exercise 1.3.4 that the identity matrix commutes with any
square matrix. It turns out that there exist other matrices having this property.
A general description of such matrices is given in the next exercise. Following
that, other special cases of commuting matrices are illustrated.
Exercise 2. Show that a matrix commutes with any square conformable
matrix if and only if it is a scalar matrix.
Exercise 3. Prove that diagonal matrices of the same size commute.
Exercise 4. Show that (nonnegative integer) powers of the same square
matrix commute.
Exercise 5. Prove that if a matrix A commutes with a diagonal matrix
diag[a1, a2, ..., an], where a_i ≠ a_j whenever i ≠ j, then A is diagonal. ❑
Another important class of matrices is formed by those that satisfy the
equation
A² = A.     (1)
Such a matrix A is said to be idempotent.
Exercise 6. Check that the matrices

[two specific numerical matrices, illegible in this reproduction]

are idempotent.

[Further displays, including matrices A1 and A2 and a parameter ε = ±1, are
illegible here.]
Exercise 16. Show that if A is an involutory matrix, then the matrix
B = ½(I + A) is idempotent. ❑
In this section some possibilities for forming new matrices from a given
one are discussed.
Given an m × n matrix A = [a_ij], the n × m matrix obtained by inter-
changing the rows and columns of A is called the transpose of the matrix A
and is denoted by A^T. In more detail, if A = [a_ij], then A^T is the n × m
matrix whose (j, i)th element is a_ij.
It is clear that for a real matrix A the notions of transpose and conjugate
transpose are equivalent.
Exercise 9. Check that the conjugate transposes of the matrices

A = [2  -3       and    B = [i   3
     0   1                   0   2i + 1]
     1   0]

are the matrices

A* = A^T = [ 2  0  1      and    B* = [-i        0
            -3  1  0]                   3   -2i + 1].
Observe that the 1 × 1 complex matrices (i.e., those with only one scalar
element) correspond precisely to the set of all complex numbers. Further-
more, in this correspondence the 1 × 1 Hermitian matrices correspond to
the real numbers. More generally, it is useful to consider the embedding of
the Hermitian matrices in C^{n×n} as a generalization of the natural embedding
of R in C. In particular, note the analogy between the following representa-
tion of a complex square matrix and the familiar Cartesian decomposition
of complex numbers.
Exercise 14. Show that every square matrix A can be written uniquely in
the form A = A1 + iA2, where A1 and A2 are Hermitian matrices. ❑
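A small numerical sketch of this decomposition (not part of the text; an arbitrary complex matrix and NumPy are assumed) takes A1 = (A + A*)/2 and A2 = (A - A*)/(2i) and checks that both are Hermitian and that A = A1 + iA2:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

A1 = (A + A.conj().T) / 2          # Hermitian part
A2 = (A - A.conj().T) / (2j)       # also Hermitian

assert np.allclose(A1, A1.conj().T)
assert np.allclose(A2, A2.conj().T)
assert np.allclose(A, A1 + 1j * A2)
```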
is skew-Hermitian.
Exercise 16. Show that the diagonal elements of a skew-Hermitian matrix
are pure imaginary. ❑
A = [2   3  -2   4
     0   1   1   0
     2  -1   0   0],

then the matrices

[2  3      [1],    [2  3  -2  4],    [2
 0  1],                              0
                                     2]

are some of the submatrices of A. ❑
There is a special way of dividing a matrix into submatrices by inserting
dividing lines between specified rows and between specified columns (each
dividing line running the full width or height of the matrix array). Such a
division is referred to as a partition of the matrix.
Example 2. The matrix
A = [2   3  -2   4
     0   1   1   0
     2  -1   0   0]
can be partitioned as

A = [2   3 | -2   4            A = [2 | 3  -2 | 4
     0   1 |  1   0     or          0 | 1   1 | 0
     ------+-------                 --+-------+--
     2  -1 |  0   0]                2 | -1  0 | 0]
and in several other ways. In the first case, the matrix is partitioned into
submatrices (or blocks) labeled
A = [A11  A12                (1)
     A21  A22],

where

A11 = [2  3      A12 = [-2  4      A21 = [2  -1],    A22 = [0  0].
       0  1],             1  0],
where
A11 = [2],   A12 = [3  -2],   A13 = [4],
A = [A11  A12        and      B = [B11
      0   A22]                     B21]

(the numerical entries are illegible in this reproduction). Check by a direct
computation that

AB = [A11 B11 + A12 B21
          A22 B21].
The above examples show that in certain cases the rules of matrix addition
and multiplication carry over to block-matrices. The next exercises give the
general results.
Exercise 7. Prove that if each pair of blocks A_ij and B_ij of the matrices A
and B, respectively, are of the same size, then

A + B = [A_ij + B_ij],      A - B = [A_ij - B_ij].
Exercise 8. Prove that if the m × l matrix A has blocks A_ik of sizes
m_i × l_k, where 1 ≤ i ≤ r and 1 ≤ k ≤ t, and the l × n matrix B consists of
blocks B_kj of sizes l_k × n_j, where 1 ≤ j ≤ s, then [see Eq. (1.3.5)]

AB = [ Σ_{k=1}^{t} A_ik B_kj ]   (i = 1, 2, ..., r;  j = 1, 2, ..., s). ❑
In the expression p(λ) = q(λ)d(λ) + r(λ), if r(λ) is the zero polynomial or
has degree less than the degree of d(λ), then q(λ) and r(λ) are, respectively,
the quotient and remainder obtained on division of p(λ) by d(λ). In the case
r(λ) = 0, d(λ) is a divisor of p(λ). Exercise 2 shows that such a representation
carries over to polynomials in a matrix and, in particular, implies that if
d(λ) is a divisor of p(λ), then d(A) is a divisor of p(A) in the sense that p(A) =
Q d(A) for some square matrix Q of the same order as A.
However, the well-known and important fact that a scalar polynomial of
degree l has exactly l complex zeros does not generalize nicely for a poly-
nomial in a matrix. The next example illustrates this.
Example 3. Let

p(λ) = (λ - 1)(λ + 2)

and let A be an arbitrary 2 × 2 matrix. Then in view of Exercise 2,

p(A) = (A - I)(A + 2I) = A² + A - 2I.     (1)

In contrast to the scalar polynomial p(λ), the polynomial p(A) has more than
two zeros, that is, matrices A for which p(A) = O. Indeed, since the product
of the two matrices A - I and A + 2I may be the zero matrix even in the case
of nonzero factors (see Exercise 1.3.6), the polynomial (1), in addition to the
zeros† I and -2I, may have many others. In fact, it is not difficult to check
that the matrix

A = [1   0
     a  -2]

is also a zero of p(A) for any number a.
† These zeros of p(A) are scalar matrices. Obviously p(A) has no other scalar matrices as
zeros.
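The family of zeros noted in the example is easy to test numerically; the short sketch below (NumPy assumed, values of a chosen arbitrarily) evaluates p(A) = A² + A - 2I at A = [1 0; a -2]:

```python
import numpy as np

def p(A):
    # p(A) = (A - I)(A + 2I) = A^2 + A - 2I
    I = np.eye(A.shape[0])
    return A @ A + A - 2 * I

for a in (-3.0, 0.0, 1.5, 7.0):
    A = np.array([[1.0, 0.0],
                  [a, -2.0]])
    assert np.allclose(p(A), np.zeros((2, 2)))
```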
1.8 Miscellaneous Exercises
1. Let a square matrix A ∈ R^{n×n} with nonnegative elements be such that the
sum of all elements of each of its rows (each row sum) is equal to one.
Such a matrix is referred to as a stochastic matrix. Prove that the product
of two stochastic matrices is a stochastic matrix.
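A quick numerical check of this exercise (the random matrices and NumPy are assumptions of the sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_stochastic(n):
    # Nonnegative entries with each row summing to one.
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

A, B = random_stochastic(4), random_stochastic(4)
C = A @ B
assert np.all(C >= 0)
assert np.allclose(C.sum(axis=1), 1.0)
```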
3. The trace of a matrix A ∈ C^{n×n}, written tr A, is defined as the sum of all
the elements lying on the main diagonal of A. Prove the following.
(a) tr(αA + βB) = α tr A + β tr B for any A, B ∈ C^{n×n} and α, β ∈ C.
(b) If A ∈ C^{m×n} and B ∈ C^{n×m}, then tr AB = tr BA.
(c) If A = [a_ij]_{i,j=1}^{n} ∈ C^{n×n}, then tr AA* = tr A*A = Σ_{i,j=1}^{n} |a_ij|².
(d) If A, B ∈ C^{n×n} and A is idempotent, then tr(AB) = tr(ABA).
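Parts (b) and (c) can be spot-checked numerically; the sketch below uses arbitrary complex matrices (an assumption of the example) and NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# (b) tr AB = tr BA for A (m x n), B (n x m)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# (c) tr AA* = tr A*A = sum of |a_ij|^2
assert np.isclose(np.trace(A @ A.conj().T), (np.abs(A) ** 2).sum())
assert np.isclose(np.trace(A.conj().T @ A), (np.abs(A) ** 2).sum())
```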
4. Prove that if n ≥ 2, there are no n × n matrices A and B such that

AB - BA = I_n.

Hint. Use Exercise 3.
5. Use induction to prove that if X, Y ∈ F^{n×n}, then for r = 1, 2, ...,

X^r - Y^r = Σ_{j=0}^{r-1} Y^j (X - Y) X^{r-j-1}.
CHAPTER 2

Determinants, Inverse Matrices, and Rank

a11 x + a12 y = b1,
a21 x + a22 y = b2      (1)

for x, y, and

a11 x + a12 y + a13 z = b1,
a21 x + a22 y + a23 z = b2,      (2)
a31 x + a32 y + a33 z = b3

for x, y, z.
It is easily seen that in the case a11a22 - a12a21 ≠ 0 the solution (x, y)
of (1) can be written in the form

x = (b1a22 - b2a12)/(a11a22 - a12a21),     y = (b2a11 - b1a21)/(a11a22 - a12a21),     (3)

where the denominator in the above expressions is said to be the determinant
of the coefficient matrix

A = [a11  a12
     a21  a22]

of system (1). Thus, by definition,

det [a11  a12     = a11a22 - a12a21.
     a21  a22]

Note also that the numerators in (3) can be written as determinants of the
matrices
of 1, 2, ... , n.
Let A = [a_ij]_{i,j=1}^{n} denote an arbitrary square matrix. Keeping in mind the
structure of determinants of small-sized matrices, we define a diagonal of A
as a sequence of n elements of the matrix containing one and only one element
from each row of A and one and only one element from each column of A.
A diagonal of A is always assumed to be ordered according to the row indices;
therefore it can be written in the form
denoted briefly by t(j). This number is uniquely defined if the inversions are
counted successively, in the way illustrated in the next example. The permu-
tation is called odd or even according to whether the number t(j) is odd or
even. This property is known as the parity of the permutation.
Exercise 2. Find t(j), where j = (2, 4, 3, 1, 5).
det A = a12 a24 a33 a41 a55.
Exercise 8. Check that

det diag[a11, a22, ..., ann] = a11 a22 ... ann.
det Â = k det A. ❑
Recall that a function f is homogeneous over a field F if f(kx) = kf(x) for
every k ∈ F. The assertion of Exercise 1 thus shows that the determinant
is a homogeneous function of the ith row (or column) of the matrix A.
Applying this property n times, it is easily seen that for any scalar k and
n × n matrix A,

det(kA) = k^n det A.     (1)
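A one-line numerical check of (1) (arbitrary matrix and scalar, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 4, -2.5
A = rng.standard_normal((n, n))
assert np.isclose(np.linalg.det(k * A), k ** n * np.linalg.det(A))
```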
is in the (j, i)th position in A^T, the second indices in (2) determine the row of
A^T in which each element lies. To arrive at a term of det A^T, the elements
must be permuted so that the second indices are in the natural order. Suppose
that this reordering of the elements in (2) produces the diagonal

a_{k1,1}, a_{k2,2}, ..., a_{kn,n}.     (4)
Then the sign to be attached to the term (4) in det A^T is (-1)^{t(k)}, where k
denotes the permutation

k = (k1, k2, ..., kn).     (5)

To establish Property 2 it remains to show only that permutations (3) and
(5) are both odd or both even. Consider pairs of indices

(1, j1), (2, j2), ..., (n, jn)     (6)

corresponding to a term of det A and pairs

(k1, 1), (k2, 2), ..., (kn, n)     (7)

obtained from (6) by a reordering according to the second entries. Observe
that each interchange of pairs in (6) yields a simultaneous interchange of
numbers in the permutations

j0 = (1, 2, ..., n)  and  j = (j1, j2, ..., jn).
Thus, a sequence of interchanges applied to j to bring it to the natural order
j0 (refer to Exercise 2.1.5) will simultaneously transform j0 to k. But then the
same sequence of interchanges applied in the reverse order will transform k
to j. Thus, using Exercise 2.1.5, it is found that t(j) = t(k). This completes the
proof. ■
Note that this result establishes an equivalence between the rows and
columns concerning properties of determinants. This allows the derivation
of properties of determinants involving columns from those involving rows
of the transposed matrix and vice versa. Taking advantage of this, subsequent
proofs will be only of properties concerning the rows of a matrix.
Property 3. If the n x n matrix B is obtained by interchanging two rows
(or columns) of A, then det B = —det A.
PROOF. Observe that the terms of det A and det B consist of the same
factors, taking one and only one from each row and each column. It suffices,
therefore, to show that the signs of each term are changed.
We first prove this result when B is obtained by interchanging the first two
rows of A. We have
b_{1j1} = a_{2j1},   b_{2j2} = a_{1j2},   b_{kjk} = a_{kjk}   (k = 3, 4, ..., n),

and so the expansion for det B becomes

det B = Σ_j (-1)^{t(j)} b_{1j1} b_{2j2} ... b_{njn}
A = [ 4  -5  10   3
      1  -1   3   1
      2  -4   5   2
     -3   2  -7  -1],
interchange the first two rows. Next add —4 (respectively, —2 and 3) times
the first row of the new matrix to the second (respectively, third and fourth)
rows to derive, using Properties 3 and 5,
det A = -det [1  -1   3   1
              0  -1  -2  -1
              0  -2  -1   0
              0  -1   2   2].
Now add -2 (respectively, -1) times the second row to the third (respec-
tively, fourth) to obtain

det A = -det [1  -1   3   1
              0  -1  -2  -1
              0   0   3   2
              0   0   4   3].
It remains now to add the third row multiplied by -4/3 to the last row to get
the desired (upper-) triangular form:

det A = -det [1  -1   3   1
              0  -1  -2  -1
              0   0   3   2
              0   0   0  1/3].

Thus, by Exercise 2.1.10, det A = 1.
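The conclusion of this example can be confirmed directly (NumPy assumed; this is a check of the example, not part of the original exposition):

```python
import numpy as np

A = np.array([[ 4, -5, 10,  3],
              [ 1, -1,  3,  1],
              [ 2, -4,  5,  2],
              [-3,  2, -7, -1]], dtype=float)
# -det A equals the product of the triangular pivots 1 * (-1) * 3 * (1/3) = -1,
# so det A should be 1.
assert np.isclose(np.linalg.det(A), 1.0)
```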
Exercise 5. If A is an n × n matrix and Ā denotes its complex conjugate,
prove that det Ā is the complex conjugate of det A.

Exercise 6. Prove that the determinant of any Hermitian matrix is a real
number.

Exercise 7. Show that for any matrix A, det(A*A) is a nonnegative number.
Exercise 8. Show that the determinant of any skew-symmetric matrix of
odd order is equal to zero.
det [-I_n  B     =  det [-I_n   0
       A   0]             A    AB],

where I_n stands for the n × n identity matrix.

Hint. Use Property 5 and add b_jk times the jth column (1 ≤ j ≤ n) of the
matrix on the left to the (n + k)th column, k = 1, 2, ..., m. ❑
A = [ 3  12  -1  0
     -1   0   5  6
      0   3   0  0
      0   4  -1  2]

can be easily evaluated by use of the cofactor expansion along the third row.
Indeed, the third row contains only one nonzero element and, therefore, in
view of (1),

det A = 3A_32 = 3(-1)^{3+2} det [ 3  -1  0
                                 -1   5  6
                                  0  -1  2]  =  (-3)46 = -138. ❑
det [ 1   1               0
     -1   1   1
         -1   1   .
              .   .   1
      0          -1   1]

is exactly the nth term in the Fibonacci sequence

1, 2, 3, 5, 8, 13, ... = {a_n}_{n=1}^{∞},

where a_n = a_{n-1} + a_{n-2}  (n ≥ 3).
Exercise 5. (The Fibonacci matrix is an example of a "tridiagonal"
matrix. This exercise concerns general tridiagonal matrices.) Let J_n denote
the Jacobi matrix of order n:

J_n = [a1  b1                       0
       c1  a2   b2
        0  c2   a3   b3
                .    .    .
                     .    .    b_{n-1}
        0            c_{n-1}   a_n].

Show that

|J_n| = a_n |J_{n-1}| - b_{n-1} c_{n-1} |J_{n-2}|    (n ≥ 3),

where the abbreviation |A|, as well as det A, denotes the determinant of A.
Exercise 6. Confirm that

det [1          1          ...   1
     x_1        x_2        ...   x_n
     x_1²       x_2²       ...   x_n²
     .                            .
     x_1^{n-1}  x_2^{n-1}  ...   x_n^{n-1}]  =  ∏_{1 ≤ j < i ≤ n} (x_i - x_j).    (8)

The matrix in (8), written V_n, is called the Vandermonde matrix of order n.
The relation in (8) states that its determinant is equal to the product of all
differences of the form x_i - x_j (1 ≤ j < i ≤ n).

Hint. Use Property 5 of Section 2.2 and add the ith row multiplied by -x_1
to the row i + 1 (i = n - 1, n - 2, ..., 1). Apply a column cofactor expan-
sion formula for the reduction of det V_n to det V_{n-1}.
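A small numerical check of (8) (the sample points and NumPy are assumptions of the sketch):

```python
import numpy as np
from itertools import combinations

x = np.array([0.5, -1.0, 2.0, 3.0])
n = len(x)
V = np.vander(x, increasing=True).T     # rows 1, x_i, x_i^2, ..., as in (8)
product = np.prod([x[i] - x[j] for j, i in combinations(range(n), 2)])
assert np.isclose(np.linalg.det(V), product)
```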
A (2 3 5 | 1 2 4) = [a21  a22  a24
                     a31  a32  a34
                     a51  a52  a54].
PROOF. The proof of this theorem can be arranged similarly to that of the
cofactor expansion formula. However, we prefer here the following slightly
different approach. First we prove that the product of the p x p leading
principal minor with its complementary minor provides p!(n — p)! elements
of det A. Indeed, if
(-1)^{t(j′)} a_{1j1} a_{2j2} ... a_{pjp},     (-1)^{t(j″)} a_{p+1, j_{p+1}} a_{p+2, j_{p+2}} ... a_{n, j_n}

are any terms of the given minor and its cofactor, then the product
is a term of det A, because j_k (1 ≤ k ≤ p) is less than j_r (p + 1 ≤ r ≤ n).
Therefore the number t(j′) + t(j″) is the number of inversions in the permu-
tation j = (j1, j2, ..., jn), so it is equal to t(j).
Since a p × p minor has p! terms and the (n - p) × (n - p) corresponding
complementary cofactor has (n - p)! terms, their product provides p!(n - p)!
elements of det A.
Now consider an arbitrary minor

A (i1 i2 ... ip | j1 j2 ... jp)     (5)
lying in the chosen rows of A. Starting with its i1th row, we successively
interchange the rows of A with the preceding ones so that eventually the
minor in (5) lies in the first p rows of A. Note that Σ_{k=1}^{p} i_k - Σ_{k=1}^{p} k inter-
changes of rows are performed. Similarly, the shifting of (5) to the position
of the p × p leading submatrix of A requires Σ_{k=1}^{p} j_k - Σ_{k=1}^{p} k interchanges
of columns. Observe that the complementary minor to (5) did not change
and that, using the part of the theorem already proved and Property 3
of determinants, its product with the minor provides p!(n - p)! terms of
(-1)^s det A, where s = Σ_{k=1}^{p} i_k + Σ_{k=1}^{p} j_k.
It suffices now to replace the complementary minor by the corresponding
complementary cofactor. Thus, each of the C_{n,p} distinct products of minors lying
in the chosen p rows with their complementary cofactors provides p!(n - p)!
distinct elements of det A. Consequently, the sum of these products gives
exactly

C_{n,p} p!(n - p)! = n!

distinct terms of det A.
The proof of (3) is completed. The formula in (4) follows from (3) by use
of Property 2 of determinants. ■
Exercise 2. If A1 and A4 are square matrices, show that

det [A1  0      = (det A1)(det A4).
     A3  A4]
Hint. Observe that the submatrix -D possesses only one nonzero minor
of order n - m, and use (4).
Exercise 6. Show that if all minors of order k (1 ≤ k ≤ min(m, n)) of an
m × n matrix A are equal to zero, then any minor of A of order greater than
k is also zero.
Hint. Use Laplace's theorem for finding a larger minor by expansion along
any k rows (or columns). ❑
det C = Σ_{1 ≤ j1 < j2 < ... < jm ≤ n} A (1 2 ... m | j1 j2 ... jm) B (j1 j2 ... jm | 1 2 ... m).     (1)

That is, the determinant of the product AB is equal to the sum of the products
of all possible minors of (the maximal) order m of A with corresponding minors
of B of the same order.†
for at least one j_r > n (1 ≤ r ≤ m), it follows, using Laplace's theorem, that

det C = Σ_{1 ≤ j1 < j2 < ... < jm ≤ n} A (1 2 ... m | j1 j2 ... jm) C̃ (1 2 ... m | j1 j2 ... jm).     (3)

Recalling the definition of C̃ and the result of Exercise 2.4.5, it is not difficult
to see that

C̃ (1 2 ... m | j1 j2 ... jm) = B (j1 j2 ... jm | 1 2 ... m).     (4)

det C = A(1 2 | 1 2) B(1 2 | 1 2) + A(1 2 | 1 3) B(1 3 | 1 2) + A(1 2 | 2 3) B(2 3 | 1 2)
      [the numerical 2 × 2 determinants of this example are illegible in this reproduction]
      = 2.
Exercise 2. Check that for any k × n matrix A, k ≤ n,

det AA* = Σ_{1 ≤ j1 < j2 < ... < jk ≤ n} | A (1 2 ... k | j1 j2 ... jk) |²  ≥ 0.
Exercise 3. Applying the Binet-Cauchy formula, prove that

det [ Σ_{i=1}^{n} a_i c_i    Σ_{i=1}^{n} a_i d_i
      Σ_{i=1}^{n} b_i c_i    Σ_{i=1}^{n} b_i d_i ]
   =  Σ_{1 ≤ j < k ≤ n}  det [ a_j  a_k     det [ c_j  c_k
                               b_j  b_k ]         d_j  d_k ].
Exercise 4. Verify the inequality

( Σ_{i=1}^{n} a_i b_i )²  ≤  ( Σ_{i=1}^{n} a_i² ) ( Σ_{i=1}^{n} b_i² )

in real numbers.†
Hint. Use Exercise 3.
Exercise 5. Check that the determinant of the product of any (finite)
number of square matrices of the same order is equal to the product of the
determinants of the factors. ❑
† For a straightforward proof of this inequality (in a more general context), see Theorem
3.11.1.
then

adj A = [the transposed matrix of cofactors of A; the numerical entries of
this example are illegible in this reproduction]. ❑
Some properties of adjoint matrices follow immediately from the
definition.
Exercise 2. Check that for any n × n matrix A and any scalar k,

adj(A^T) = (adj A)^T,     adj(A*) = (adj A)*,     adj I = I,     adj(kA) = k^{n-1} adj A. ❑
The following striking result, and its consequences, are the main reason
for interest in the adjoint of a matrix.
Theorem 1. For any n × n matrix A,

A(adj A) = (det A)I.     (1)
PROOF. Computing the product on the left by use of the row expansion
formula (2.3.1) and the result of Exercise 2.3.3, we obtain a matrix with the
number det A in each position on its main diagonal and zeros elsewhere,
that is, (det A)I. •
Exercise 3. Prove that
(adj A)A = (det A)I. ❑ (2)
Theorem 1 has some important consequences. For example, if A is n x n
and det A # 0, then
det(adj A) = (det A)^{n-1}.     (3)
In fact, using the property det AB = (det A)(det B) with Eq. (1) and the
known determinant of the diagonal matrix, the result follows.
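Both Theorem 1 and the consequence (3) can be verified numerically from the definition of the adjoint. The sketch below (an arbitrary 4 × 4 matrix and NumPy are assumptions of the example) builds adj A as the transposed matrix of cofactors:

```python
import numpy as np

def adjugate(A):
    # Transposed matrix of cofactors, computed directly from the definition.
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

rng = np.random.default_rng(6)
A = rng.integers(-4, 5, size=(4, 4)).astype(float)
adjA = adjugate(A)
d = np.linalg.det(A)

assert np.allclose(A @ adjA, d * np.eye(4))            # Theorem 1
assert np.allclose(np.linalg.det(adjA), d ** 3)        # Eq. (3) with n = 4
```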
The vanishing or nonvanishing of det A turns out to be a vitally important
property of a square matrix and warrants a special definition: A square
matrix A is said to be singular or nonsingular according to whether det A is
zero or nonzero.
B = (1/det A) adj A.     (5)
A^{-1}A(B1 - B2) = O,
and the required equality B1 = B2 is obtained.
Similarly, the uniqueness of the left inverse of a nonsingular matrix is
established. Hence if det A ≠ 0, then both one-sided inverses are unique.
Furthermore, since they are both given by Eq. (5), they are equal.
det [A  B     = det(D - CA^{-1}B) det A.     (6)
     C  D]

The matrix D - CA^{-1}B is called the Schur complement of A in [A B; C D].
Exercise 16. A matrix A ∈ R^{n×n} is said to be orthogonal if A^T A = I. Show
that:
(a) An orthogonal matrix A is invertible, det A = ±1, and A^{-1} = A^T.
2.7 Elementary Operations on Matrices

E^{(1)} = the identity matrix with rows i1 and i2 interchanged.     (1)
E^{(2)} = diag[1, ..., 1, k, 1, ..., 1]     (k in the ith diagonal position).     (2)
Finally, adding k times row i2 to row i1 of A is equivalent to multiplication
of A from the left by the matrix

E^{(3)} = the identity matrix with an additional element k in the (i1, i2) position,     (3)

the extra entry k lying above or below the main diagonal according to
whether i1 < i2 or i1 > i2. Again note that the operation to
be performed on A is applied to I in order to produce E^{(3)}.
Similarly, the multiplication of A on the right by the appropriate matrices
E^{(1)}, E^{(2)}, or E^{(3)} leads to analogous changes in columns.
The matrices E^{(1)}, E^{(2)}, and E^{(3)} are called elementary matrices of types 1, 2,
and 3, respectively.
Properties 1, 3, and 5 of determinants may now be rewritten as follows:

det E^{(1)}A = det AE^{(1)} = -det A,
det E^{(2)}A = det AE^{(2)} = k det A,     (4)
det E^{(3)}A = det AE^{(3)} = det A.
Exercise 1. Check that

det E^{(1)} = -1,     det E^{(2)} = k,     det E^{(3)} = 1.
Exercise 2. Check that the inverses of the elementary matrices are also
elementary of the same type. ❑
The application of elementary operations admits the reduction of a general
m x n matrix to a simpler one, which contains, in particular, more zeros
than the original matrix. This approach is known as the Gauss reduction
(or elimination) process and is illustrated below. In fact, it was used in Example
2.2.4 for computing a determinant.
Example 3. Consider the matrix
A = [ 0   1   2   -1
     -2   0   0    6
      4  -2  -4  -10].

Multiplying the second row by -1/2, we obtain

A1 = E_1^{(2)} A = [0   1   2   -1
                    1   0   0   -3
                    4  -2  -4  -10],

where E_1^{(2)} is a 3 × 3 matrix of type 2 with k = -1/2, i = 2.
Multiplying the second row by -4 and adding it to the third, we deduce

A2 = E_2^{(3)} A1 = [0   1   2  -1
                     1   0   0  -3
                     0  -2  -4   2],

where E_2^{(3)} is a 3 × 3 elementary matrix of the third type with k = -4,
i1 = 3, i2 = 2.
The interchange of the first two rows provides

A3 = E_3^{(1)} A2 = [1   0   0  -3
                     0   1   2  -1
                     0  -2  -4   2],

and the first column of the resulting matrix consists only of zeros except for
one element equal to 1.
Employing a 3 × 3 matrix of type 3, with k = 2, i1 = 3, i2 = 2, we obtain

A4 = E_4^{(3)} A3 = [1   0   0  -3
                     0   1   2  -1          (5)
                     0   0   0   0],
and it is clear that A4 is the simplest form of the given matrix A that can be
achieved by row elementary operations. This is the "reduced row-echelon
form" for A, which is widely used in solving linear systems (see Section 2.9).
Performing appropriate elementary column operations, it is not difficult
to see that the matrix A can be further reduced to the diagonal form

A5 = A4 E_5^{(3)} E_6^{(3)} E_7^{(3)} = [1  0  0  0
                                          0  1  0  0
                                          0  0  0  0],

where E_5^{(3)}, E_6^{(3)}, E_7^{(3)} denote the following matrices of type 3:

E_5^{(3)} = [1  0  0  3      E_6^{(3)} = [1  0   0  0      E_7^{(3)} = [1  0  0  0
             0  1  0  0                   0  1  -2  0                   0  1  0  1
             0  0  1  0                   0  0   1  0                   0  0  1  0
             0  0  0  1],                 0  0   0  1],                 0  0  0  1].
[I_r 0; 0 0],     [I_m  0],     [I_n; 0],     I_n.     (8)
Since elementary matrices are nonsingular (see Exercise 1), their product
is also and, therefore, the matrix in Eq. (9) can be written in the form PAQ,
where P and Q are nonsingular. This point of view is important enough to
justify a formal statement.
Corollary 1. For any m x n matrix A there exist a nonsingular m x m
matrix P and a nonsingular n x n matrix Q such that PAQ is one of the matrices
in (8).
Consider now the set of all nonzero m x n matrices with fixed m and n.
Since the number of matrices of the types in (8) is strictly limited, there must
exist many different matrices having the same matrix from (8) as their reduced
form.
Consider two such matrices A and B. In view of Theorem 3, there are two
sequences of elementary matrices reducing A and B to the same matrix.
Hence
E_k E_{k-1} ... E_1 A E_{k+1} ... E_{k+s} = Ẽ_l Ẽ_{l-1} ... Ẽ_1 B Ẽ_{l+1} ... Ẽ_{l+r}.
rank[I_m  0] = m,     rank[I_n; 0] = n,     rank[I_r 0; 0 0] = r,     rank I_n = n.
A case in which inequality occurs in Exercise 7 is illustrated in the next
exercise.
Exercise 8. The rank of the matrix

[A11  A12      [1  0 | 0  0
  0   A22]  =   0  0 | 0  1
                ------+------
                0  0 | 0  0
                0  0 | 1  0]

is 3, although

rank A11 + rank A22 = 2. ❑
The rank of a matrix has now been investigated from two points of view:
that of reduction by elementary operations and that of determinants. The
concept of rank will be studied once more in Section 3.7, from a third point
of view.
if and only if D = CA^{-1}B; that is, if and only if the Schur complement of A
is the zero matrix.
Hint. See Eq. (2.6.7). ❑
and using the definition of the matrix product, we can rewrite (1) in the
abbreviated (matrix) form
Ax = b. (4)
Obviously, solving (1) is equivalent to finding all vectors (if any) satisfying
(4). Such vectors are referred to as the solution vectors (or just solutions) of
the (matrix) equation (4), and together they form the solution set.
Note also that the system (1) is determined uniquely by its so-called
augmented matrix
[A b]
of order m x (n + 1).
Recall now that the basic elimination (Gauss) method of solving linear
systems relies on the replacement of the given system by a simpler one
having the same set of solutions, that is, on the transition to an equivalent
system. It is easily seen that the following operations on (1) transform the
system to an equivalent one:
(1) Interchanging equations in the system;
(2) Multiplying an equation by a nonzero constant;
(3) Adding one equation, multiplied by a number, to another.
Recalling the effect of multiplication of a matrix from the left by ele-
mentary matrices, we see that the operations 1 through 3 correspond to left
multiplication of the augmented matrix [A b] by elementary matrices
E^{(1)}, E^{(2)}, E^{(3)}, respectively, in fact, by multiplication of both sides of (4) by
such matrices on the left. In other words, for any elementary matrix E of
order m, the systems (4) and

EAx = Eb
are equivalent. Moreover, since any nonsingular matrix E can be decomposed
into a product of elementary matrices (see Exercise 2.7.8), the original system
(4) is equivalent to the system EAx = Eb for any nonsingular m x m
matrix E.
Referring to Theorem 2.7.1, any matrix can be reduced by elementary row
operations to reduced row-echelon form. Therefore the system (4) can be
transformed by multiplication on the left by a nonsingular matrix Ẽ to a
simpler equivalent form, say,

ẼAx = Ẽb,     (5)

where [ẼA  Ẽb] has reduced row-echelon form.
This is the reduced row-echelon form of the system, which admits calcula-
tion of the solutions very easily. The argument above establishes the following
theorem.
Theorem 1. Any system of m linear equations in n unknowns has an equivalent
system in which the augmented matrix has reduced row-echelon form.
In practice, the reduction of the system to its reduced row-echelon form is
carried out by use of the Gauss elimination process illustrated in Example
2.7.3.
Exercise 1. Let us solve the system
x2 + 2x3 = -1,
-2x1 = 6,
4x1 - 2x2 - 4x3 = -10,

having the augmented matrix

[ 0   1   2   -1
 -2   0   0    6
  4  -2  -4  -10]
considered in Example 2.7.3. It was shown there that its reduced row-
echelon form is
[1   0   0  -3
 0   1   2  -1
 0   0   0   0],
(see Eq. 2.7.5) which means that the system
1·x1 + 0·x2 + 0·x3 = -3,
0·x1 + 1·x2 + 2·x3 = -1
is equivalent to the original one. Hence the relations
x1 = —3, x2 = -1 - 2t, x3 = t,
where t may take any value, describe all of the (infinitely many) solutions of
the given system. ❑
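The parametric description of the solution set can be verified directly; the sketch below (NumPy assumed, values of t chosen arbitrarily) substitutes a few of the solutions back into the original system:

```python
import numpy as np

A = np.array([[ 0,  1,  2],
              [-2,  0,  0],
              [ 4, -2, -4]], dtype=float)
b = np.array([-1, 6, -10], dtype=float)

for t in (-2.0, 0.0, 3.5):
    x = np.array([-3.0, -1.0 - 2.0 * t, t])
    assert np.allclose(A @ x, b)
```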
and
x1 = 1,   x2 = -2,   x3 = -1. ❑
Another approach to solving the linear system Ax = b with nonsingular
A relies on the formula (2.6.5) for finding the inverse A -1 . Thus
x = A^{-1} b = (1/det A)(adj A) b.
In more detail, if A_ij denotes the cofactor of a_ij in A, then
Consider Theorem 2.7.3 once more, and suppose that the matrix A is
square of size n × n and, in addition, is such that the reduction to canonical
form can be completed without elementary operations of type 1. Thus, no
row or column interchanges are required. In this case, the canonical form is

D0 = [I_r  0
       0   0]

if r < n, and we interpret D0 as I_n if r = n. Thus, Theorem 2.7.3 gives

(E_k E_{k-1} ... E_1) A (E_{k+1} E_{k+2} ... E_{k+s}) = D0,
A(1 | 1),   A(1 2 | 1 2),   ...,   A(1 2 ... n-1 | 1 2 ... n-1)     (2)

are all nonzero. Then there is a unique lower triangular matrix L with diagonal
elements all equal to one and a unique upper triangular matrix U such that
A = LU.
Furthermore, det A = det U = u11 u22 ... u_nn.
PROOF. The proof is by induction on n, the size of the matrix A. Clearly, if
n = 1, then a11 = 1 · a11 provides the unique factorization of the theorem.
Now assume the theorem true for square matrices of size (n - 1) × (n - 1),
and partition A as
A = [A_{n-1}   a1
     a2^T     a_nn],     (3)

where A_{n-1} is the leading (n - 1) × (n - 1) submatrix of A and a1, a2 ∈ C^{n-1}.
By the induction hypothesis, A_{n-1} has a factorization A_{n-1} = L_{n-1} U_{n-1} of
the required form. Write

L = [L_{n-1}  0        and     U = [U_{n-1}   c
     d^T      1]                     0       u_nn],     (4)
where c and d are vectors in C^{n-1} to be determined. Form the product

LU = [L_{n-1} U_{n-1}     L_{n-1} c
      d^T U_{n-1}        d^T c + u_nn]
and compare the result with Eq. (3). It is clear that if c and d are chosen so
that

L_{n-1} c = a1,     d^T U_{n-1} = a2^T,     (5)

then, with u_nn = a_nn - d^T c, the matrices of (4) will give the required
factorization. But Eqs. (5) are uniquely solvable for c and d because L_{n-1}
and U_{n-1} are nonsingular. Thus,

c = L_{n-1}^{-1} a1     and     d^T = a2^T U_{n-1}^{-1},

and substitution in (4) determines the unique triangular factors of A required by
the theorem.
For the determinantal property, recall Exercise 2.1.10 and Corollary 2.5.1
to deduce that det L = 1 and det A = (det L)(det U). Hence det A =
det U. ■
Exercise 1. Under the hypotheses of Theorem 1, show that there are unique
lower- and upper-triangular matrices L and U with all diagonal elements
equal to one (for both matrices) and a unique diagonal matrix D such that
A = LDU.
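The construction used in the proof can be turned into a short computation. The sketch below is a Doolittle-style elimination (an assumption of the example, not the book's algorithm as such), valid when the leading principal minors are nonzero so that no pivoting is needed; it also confirms det A = u11 u22 ... unn:

```python
import numpy as np

def lu_unit_lower(A):
    # L unit lower triangular, U upper triangular, A = L @ U.
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for i in range(n):
        U[i, i:] = A[i, i:] - L[i, :i] @ U[:i, i:]
        L[i+1:, i] = (A[i+1:, i] - L[i+1:, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_unit_lower(A)
assert np.allclose(L @ U, A)
assert np.isclose(np.linalg.det(A), np.prod(np.diag(U)))
```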
A = [x + λ    x      ...    x
      x     x + λ    ...    x
      .                .    .
      x       x      ...  x + λ].

(a) Prove that det A = λ^{n-1}(nx + λ).
(b) Prove that if A^{-1} exists, it is of the same form as A.
Hint. For part (a), add all columns to the first and then subtract the
first row from each of the others. For part (b), observe that a matrix A
is of the above form if and only if PAP^T = A for any n × n permutation
matrix P. (An n × n matrix P is a permutation matrix if it is obtained from
In by interchanges of rows or columns.)
... + det[a1  a2  ...  da_n/dx].

|det A|²  ≤  ∏_{i=1}^{n} Σ_{j=1}^{n} |a_ij|².
H_n = [s_0      s_1    ...   s_{n-1}
       s_1      s_2    ...   s_n
       .                      .
       s_{n-1}  s_n    ...   s_{2n-2}].
14. Evaluate the determinant of the circulant matrix

C = [c_0      c_1      c_2    ...   c_{n-1}
     c_{n-1}  c_0      c_1    ...   c_{n-2}
     .                         .     .
     c_1      c_2      c_3    ...   c_0].

Answer. det C = ∏_{i=1}^{n} f(ε_i), where f(x) = Σ_{i=0}^{n-1} c_i x^i and ε_1, ε_2, ..., ε_n
denote the distinct nth roots of unity.

Hint. CV_n = V_n diag[f(ε_1), f(ε_2), ..., f(ε_n)], where V_n is the Vander-
monde matrix constructed from the ε_i (i = 1, 2, ..., n).
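The stated answer is easy to confirm for a specific circulant; the sketch below (the coefficients and NumPy are assumptions of the example) compares the determinant with the product of f evaluated at the nth roots of unity:

```python
import numpy as np

c = np.array([2.0, -1.0, 3.0, 0.5])              # c_0, c_1, ..., c_{n-1}
n = len(c)
C = np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

eps = np.exp(2j * np.pi * np.arange(n) / n)      # the distinct nth roots of unity
f = lambda x: sum(c[k] * x ** k for k in range(n))
assert np.isclose(np.linalg.det(C), np.prod([f(e) for e in eps]))
```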
15. Prove that the determinant of a generalized circulant matrix Ĉ,

Ĉ = [c_0       c_1       c_2     ...   c_{n-1}
     αc_{n-1}  c_0       c_1     ...   c_{n-2}
     αc_{n-2}  αc_{n-1}  c_0     ...   c_{n-3}
     .                           .      .
     αc_1      αc_2      αc_3    ...   c_0],

is equal to ∏_{i=1}^{n} f(ε_i), where f(x) = Σ_{i=0}^{n-1} c_i x^i and ε_1, ε_2, ..., ε_n are all
the nth roots of α.
16. Check that the inverses of the n × n matrices

A1 = [1  -1                      A2 = [ 1
         1  -1        0                -1   1
             .   .        and               .   .
      0         1  -1                   0      -1   1]
                    1]

are matrices with zeros below (respectively, above) the main diagonal
and ones elsewhere.
17. Check that the inverses of the n × n matrices

A = [ 1                              B = [ 2  -1
     -2   1              0                -1   2  -1
      1  -2   1              and               .   .   .
          .   .   .                        0      -1   2  -1
      0   1  -2   1]                              -1   1]

are the matrices

A^{-1} = [1                          B^{-1} = [1  1  1  ...  1
          2   1            0                   1  2  2  ...  2
          3   2   1            and             1  2  3  ...  3
          .        .                           .            .
          n  n-1  ...  2  1]                   1  2  3  ...  n].
18. The n × n matrix

A = [a_0      a_{-1}   ...   a_{-n+1}
     a_1      a_0      ...   a_{-n+2}
     .                        .
     a_{n-1}  a_{n-2}  ...   a_0]

is said to be a Toeplitz matrix, and it is said to be generated by the quasi-
polynomial

p(λ) = a_{n-1}λ^{n-1} + ... + a_1λ + a_0 + a_{-1}λ^{-1} + ... + a_{-n+1}λ^{-n+1}.

(For example, matrices A and B of Exercise 17 are Toeplitz matrices.)
Show that an invertible matrix A = [a_{i-j}]_{i,j=1}^{n} is Toeplitz if and only if
there are companion matrices of the forms

C^{(1)} = [-p_{n-1}  -p_{n-2}  ...  -p_1  -p_0
             1          0      ...    0     0
             0          1      ...    0     0
             .                  .            .
             0          0      ...    1     0]

and

C^{(2)} = [0  0  ...  0  -p_0
           1  0  ...  0  -p_1
           .          .    .
           0  0  ...  1  -p_{n-1}]

such that

C^{(1)} A = A C^{(2)}.

Hint.  [-p_{n-1}  -p_{n-2}  ...  -p_0] = [a_{-1}  a_{-2}  ...  a_{-n}] A^{-1}

for any a_{-n}.
19. Let the polynomial p(λ) = 1 + aλ + bλ² (b ≠ 0) with real coefficients
have a pair of complex conjugate roots re^{±iθ}. Confirm the decomposition†

[1                    [1                    [1
 a  1                  c_2  1                d_2  1
 b  a  1           =        c_3  1      ·         d_3  1
    .  .  .                      .  .                .   .
 0  b  a  1]           0        c_n  1]      0         d_n  1],

where

c_k = sin (k - 2)θ / (r sin (k - 1)θ),     d_k = sin kθ / (r sin (k - 1)θ)     (k = 2, 3, ..., n).
20. Consider the n × n triangular Toeplitz matrix

A = [1
     a_1      1                 0
     a_2      a_1     1
     .                     .
     a_{n-1}  ...  a_2  a_1  1];

then

[the displayed result, expressing an associated matrix in terms of a generalized
Vandermonde matrix and the quantities x_1^{-1}, x_2^{-1}, ..., x_s^{-1} with multiplicities
k_1, k_2, ..., k_s, is illegible in this reproduction]
† Observe that this is the factorization of the polynomial q(λ) = λⁿ p(1/λ) into linear and
quadratic factors irreducible in R.
where V̂_n denotes the generalized Vandermonde matrix V̂_n = [W_1  W_2  ...  W_s].
CHAPTER 3

Linear, Euclidean, and Unitary Spaces
The notion of an abstract linear space has its origin in the familiar
three-dimensional space with its well-known geometrical properties.
Consequently, we start the discussion of linear spaces by keeping in mind
properties of this particular and important space.
Thus, the properties Al-A5 and Sl-S5 are basic in the sense that they are
sufficient to obtain computational rules on the set of position vectors which
extend those of arithmetic in a natural way.
It turns out that there are many sets with operations defined on them like
vector addition and scalar multiplication and for which these conditions
make sense. This justifies the important unifying concept called a "linear
space." We now make a formal definition, which will be followed by a
few examples.
Let S be a set on which a closed binary operation + (like vector addition)
is defined. Let F be a field and let a binary operation (scalar multiplication)
be defined from F × S to S. If, in this context, the axioms A1-A5 and
S1-S5 are satisfied for any a, b, c ∈ S and any α, β ∈ F, then S is said to be
a linear space over F.
Thus, the concept of a linear space admits the study of structures such
as those in Exercise 3 from a unified viewpoint and the development of a
general approach for the investigation of such systems. This investigation
will be performed in subsequent sections.
Before leaving this section, we make two remarks. The first concerns
sets and operations that fail to generate linear spaces; the second concerns
the geometrical approach to n-tuples.
Exercise 4. Confirm that the following sets fail to be linear spaces under
the familiar operations defined on them over the appropriate field:
(a) the set of all position vectors of unit length in three-dimensional
space;
(b) the set of all pairs of real numbers of the form (1, a);
(c) the set of all real matrices of the form

[the displayed matrix form is illegible in this reproduction]

with a fixed b_0 ≠ 0;
(d) the set of all polynomials of a fixed degree n;
(e) the set of all Hermitian matrices over C;
(f) the set of all rotations of a rigid body about a fixed point, over R.
In each case pick out at least one axiom that fails to hold. ❑
It should be noted that a set can be a linear space under certain operations
of addition and scalar multiplication but may fail to generate a linear
space (over the same field) under other operations. For instance, the set R²
is not a linear space over R under the operations defined by

(a1, a2) + (b1, b2) = (a1 + b1 - 1, a2 + b2),
α(a1, a2) = (αa1, αa2)

for any (a1, a2), (b1, b2) ∈ R² and any α ∈ R.
To avoid any such confusion, we shall always assume that the operations
on the sets of n-tuples and matrices are defined as in Chapter 1.
In conclusion, note also that a triple (a1, a2, a3), which was referred to
as a vector with initial point at the origin and terminal point at the point P
with Cartesian coordinates (a1, a2, a3) (this approach was fruitful, for
instance, for the definition of operations on triples), can be considered as a
point in three-dimensional space. Similarly, an n-tuple (a1, a2, ..., an)
(a_i ∈ F, i = 1, 2, ..., n) may be referred to as a "point" in the "n-dimensional"
linear space F^n.
3.2 Subspaces
Let S denote a linear space over a field F and consider a subset S0 of
elements from S. The operations of addition and scalar multiplication are
defined for all elements of S and, in particular, for those belonging to S0.
The results of these operations on the elements of S0 are elements of S.
It may happen, however, that these operations are closed in S0, that is, when
both operations are applied to arbitrary elements of S0, the resulting
elements also belong to S0. In this case we say that S0 is a subspace of S.
Thus, a nonempty subset S0 of a linear space S over F is a subspace of S
if, for every a, b ∈ S0 and any α ∈ F,
(1) a + b ∈ S0,
(2) αa ∈ S0.
(Observe that here, and in general, boldface letters are used for members
of a linear space if some other notation is not already established.)
It is not difficult to see that S0 is itself a linear space under the operations
defined in S and hence the name "subspace."
If S is a linear space, it is easily seen that if 0 is the zero element of S, then
the singleton {0} and the whole space S are subspaces of S. These are known
as the trivial subspaces and any other subspace is said to be nontrivial.
It is also important to note that the zero element of S is necessarily the
zero element of any subspace S0 of S. (This follows from Exercise 3.1.2(a).)
It is easy to verify that criteria (1) and (2) can be condensed into a single
statement that characterizes a subspace. Thus, a nonempty subset S0 of a
linear space S over F is a subspace of S if and only if for every a, b ∈ S0
and every α, β ∈ F, αa + βb ∈ S0.
Example 1. (a) Any straight line passing through the origin, that is, the
set of triples

{x = (x1, x2, x3) : x1 = at, x2 = bt, x3 = ct  (-∞ < t < ∞)},

is a subspace of R³.
(b) Any plane in R³ passing through the origin, that is, the set of triples

{x = (x1, x2, x3) : ax1 + bx2 + cx3 = 0, a² + b² + c² > 0},

for fixed real numbers a, b, c, is a subspace of R³.
(c) The set of all n × n upper- (lower-) triangular matrices with elements
from R is a subspace of R^{n×n}.
(d) The set of all real n × n matrices having zero main diagonal is a
subspace of R^{n×n}.
(e) The set of all symmetric real n × n matrices is a subspace of R^{n×n}. ❑
Note again that any subspace is itself a linear space and therefore must
contain the zero element. For this reason, any plane in R³ that does not pass
through the origin fails to be a subspace of R³.
Exercise 2. Prove that if S1 and S2 are subspaces of a linear space S,
then the set S1 ∩ S2 is also a subspace of S. Give an example to show that
S1 ∪ S2 is not necessarily a subspace of S.
Exercise 3. Let A ∈ F^{m×n}. Prove that the set of all solutions of the homo-
geneous equation Ax = 0 forms a subspace of F^n. ❑

The set of all vectors x for which Ax = 0 is the null space or kernel of the
matrix A and is written either N(A) or Ker A. Observe that Exercise 3
simply states that Ker A is a subspace.

Exercise 4. Find Ker A (in R³) if

A = [1  -1  0
     1   1  1].

Answer. Ker A = {α[1  1  -2]^T : α ∈ R}. ❑
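Kernels of small matrices like this one can also be computed numerically. The sketch below (SciPy's null_space helper is an assumption of the example, not part of the text) recovers a basis vector proportional to [1, 1, -2]^T:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, -1.0, 0.0],
              [1.0,  1.0, 1.0]])
N = null_space(A)                 # orthonormal basis for Ker A
assert N.shape == (3, 1)          # the kernel is one-dimensional
assert np.allclose(A @ N, 0)
# The basis vector is proportional to [1, 1, -2]^T (zero cross product).
assert np.allclose(np.cross(N[:, 0], [1.0, 1.0, -2.0]), 0)
```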
Returning to the problem formulated above and making use of the new
terminology, we conclude that any subspace containing the given elements
a1, ..., an must contain all their linear combinations.
It is easily verified that the set of all linear combinations (over F) of the
elements a1, a2, ..., an belonging to a linear space S generates a subspace
S0 of S. Obviously, this subspace solves the proposed problem: it contains
the given elements themselves and is contained in any other subspace
S1 of S such that a1, a2, ..., an ∈ S1.
Theorem 1. If S is a linear space and a1, a2, ..., an ∈ S, then span{a1, ..., an}
is the minimal (in the sense of inclusion) subspace containing a1, a2, ..., an.
Example 2. (a) If a1 and a2 stand for two nonparallel position vectors
in R³, then their linear hull is the set of all vectors lying in the plane containing
a1 and a2 and passing through the origin. This is obviously the minimal
subspace containing the given vectors.
(b) Let a1 and a2 be parallel position vectors in R³. Then the linear hull
of both of them, or of only one of them, is the same, namely, the line parallel
to each of a1 and a2 and passing through the origin (in the language of
Example 1(a)).
Let the elements a1, a2, ..., an from a linear space S over F span the
subspace S0 = span{a1, a2, ..., an}, and suppose that S0 can be spanned
by a proper subset of the given elements. For example, suppose that
S0 = span{a2, a3, ..., an}. This means that the spaces consisting of linear
combinations of the n elements a1, a2, ..., an and the n - 1 elements a2,
a3, ..., an are identical. In other words, for any fixed linear combination of
a1, a2, ..., an there exists a linear combination of the last n - 1 elements
such that they coincide.
For the investigation of this situation we take an element a of the subspace
S0 expressed as

a = Σ_{i=1}^{n} α_i a_i,

where α_1, α_2, ..., α_n ∈ F and α_1 ≠ 0. Since S0 is also the span of a_2, a_3, ..., a_n,
then necessarily a = Σ_{i=2}^{n} β_i a_i, where β_2, β_3, ..., β_n ∈ F. Hence

Σ_{i=1}^{n} α_i a_i = Σ_{i=2}^{n} β_i a_i.

More generally, if γ_1, γ_2, ..., γ_n ∈ F are such that Σ_{i=1}^{n} γ_i a_i = 0 and γ_i ≠ 0, then

a_i = -(γ_1/γ_i) a_1 - ... - (γ_{i-1}/γ_i) a_{i-1} - (γ_{i+1}/γ_i) a_{i+1} - ... - (γ_n/γ_i) a_n,

and consequently, a_i is a linear combination of the remaining elements of
{a_1, a_2, ..., a_n}.
Furthermore, let a ∈ span{a_1, a_2, ..., a_n}, so that a is a linear combination
of the spanning elements a_1, a_2, ..., a_n. Since, for some j, the element a_j is
a linear combination of a_1, ..., a_{j-1}, a_{j+1}, ..., a_n, then so is a. Hence

Σ_{i=1}^{n} γ_i a_i = 0,
[1  2  0        [0  0  0        [0  0  0
 0  1  0],       0  0  1],       0  0  0
 0  0  0]        0  0  1]        0  1  0];
(c) the set of m × n matrices E_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) with 1 in the
(i, j) position and zeros elsewhere;
(d) the set of polynomials
p1(x) = 1 + 2x,   p2(x) = x²,   p3(x) = 3 + 5x + 2x²;
(e) the set of polynomials 1, x, x², ..., x^n;
(f) any subset of a set of linearly independent elements. ❑
Let the elements a1, a2, ..., am (not all zero) belong to a linear space S
over the field F, and let

S0 = span{a1, a2, ..., am}.

If the elements a1, a2, ..., am are linearly independent, that is, if each of
them fails to be a linear combination of the others, then (Theorem 3.4.1) all
a_i (i = 1, 2, ..., m) are necessary for spanning S0. In the case of linearly
and no a_i (1 ≤ i ≤ n) in Eq. (3) can be discarded. In this case the elements
in (1) are referred to as basis elements of S0.
Recalling our discussion of linear hulls in the first paragraph of this
section, we conclude that any space generated by a set of m elements, not
all zero, has a basis consisting of n elements, where 1 ≤ n ≤ m. Some examples
of bases are presented in the next exercise; the results of Example 3.3.1 and
Exercise 3.4.2 are used.
Exercise 1. Verify the following statements:
(a) The set of n unit vectors e1, e2, ..., en ∈ F^n defined in Example
3.3.1(b) is a basis for F^n. This is known as the standard basis of the space.
(b) The set {1, x, x², ..., x^n} is a basis for the space of polynomials
(including zero) with degree less than or equal to n.
(c) The mn matrices E_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) defined in Example
3.3.1(c) generate a basis for the space F^{m×n}. ❑
Note that a basis necessarily consists of only finitely many vectors.
Accordingly, the linear spaces they generate are said to be finite dimensional;
the properties of such spaces pervade the analysis of most of this book. If a
linear space (containing nonzero vectors) has no basis, it is said to be of
infinite dimension. These spaces are also of great importance in analysis and
applications but are not generally within the scope of this book.
Proposition 1. Any element of a (finite-dimensional) space S is expressed
uniquely as a linear combination of the elements of a fixed basis.
In other words, given a ∈ S and a basis {a1, a2, ..., an} for S, there is one
and only one ordered set of scalars α1, α2, ..., αn ∈ F such that the repre-
sentation (2) is valid.
PROOF. If

a = Σ_{i=1}^{n} α_i a_i = Σ_{i=1}^{n} β_i a_i,

then

Σ_{i=1}^{n} (α_i - β_i) a_i = 0.

a = Σ_{i=1}^{n} α_i a_i
The following example illustrates the fact that, in general, a linear space
does not have a unique basis.
Exercise 3. Check that the vectors

a1 = [1  1  0]^T,     a2 = [0  1  1]^T,     a3 = [1  0  1]^T

form a basis for R³, as do the unit vectors e1, e2, e3.

Hint. Observe that if a = [α1  α2  α3]^T ∈ R³, then its representation with
respect to the basis {a1, a2, a3} is

½[α1 + α2 - α3    -α1 + α2 + α3    α1 - α2 + α3]^T

and differs from the representation [α1  α2  α3]^T of a with respect to the
standard basis. ❑
In view of Exercise 3, the question of whether or not all bases for S have
the same number of elements naturally arises. The answer is affirmative.

Theorem 1. All bases of a finite-dimensional linear space S have the same
number of elements.
PROOF. Let {b1, b2, ..., bn} and {c1, c2, ..., cm} be two bases in S. Our
aim is to show that n = m. Let us assume, on the contrary, that n < m.
First, since {b1, b2, ..., bn} constitutes a basis for S, this system consists
of linearly independent elements. Second, every element of S and, in
particular, each c_j (1 ≤ j ≤ m) is a linear combination of the b_i (i = 1, 2,
..., n), say,

c_j = Σ_{i=1}^{n} β_ij b_i,     1 ≤ j ≤ m.     (4)

We shall show that Eq. (4) yields a contradiction of the assumed linear
independence of the set {c1, ..., cm}. In fact, let

Σ_{j=1}^{m} γ_j c_j = 0.     (5)
Then

Σ_{j=1}^{m} γ_j ( Σ_{i=1}^{n} β_ij b_i ) = Σ_{i=1}^{n} ( Σ_{j=1}^{m} γ_j β_ij ) b_i = 0.

The linear independence of the system {b1, b2, ..., bn} now implies that

Σ_{j=1}^{m} γ_j β_ij = 0,     i = 1, 2, ..., n,     (6)

and, in view of Exercise 2.9.5, our assumption m > n yields the existence
of a nontrivial solution (γ1, γ2, ..., γm) of (6). This means [see Eq. (5)]
that the elements c1, c2, ..., cm are linearly dependent, and a contradiction
is obtained.
If m < n, exchanging roles of the bases again leads to a contradiction.
Thus m = n and all bases of S possess the same number of elements. ■
The number of basis elements of a finite-dimensional space is, therefore,
a characteristic of the space that is invariant under different choices of basis.
We formally define the number of basis elements of a (finite-dimensional)
space S to be the dimension of the space, and write dim S for this number.
In other words, if dim S = n (in which case S is called an n-dimensional
space), then S has n linearly independent elements and each set of n + 1
elements of S is necessarily linearly dependent. Note that if S = {0} then,
by definition, dim S = 0.
The terminology "finite-dimensional space" can now be clarified as
meaning a space having finite dimension. The "dimension" of an infinite-
dimensional space is infinite in the sense that the span of any finite number
of elements from the space is only a proper subspace of the space.
Exercise 4. Show that any n linearly independent elements b_1, b_2, ..., b_n
in an n-dimensional linear space S constitute a basis for S.
SOLUTION. If, on the contrary, span{b_1, b_2, ..., b_n} is not the whole space,
then there is an a ∈ S such that a, b_1, b_2, ..., b_n are linearly independent.
Hence dim S ≥ n + 1 and a contradiction is achieved.
Exercise 5. Check that if S_0 is a nontrivial subspace of the linear space S,
then

    dim S_0 < dim S.  □

Let S be an n-dimensional linear space. It turns out that any system of r
linearly independent elements from S (1 ≤ r ≤ n − 1) can be extended to
form a basis for the space.
Proposition 2. If the elements {a_i}_{i=1}^{r} (1 ≤ r ≤ n − 1) from a space S of
dimension n are linearly independent, then there exist n − r elements
a_{r+1}, a_{r+2}, ..., a_n such that {a_i}_{i=1}^{n} constitutes a basis in S.
PROOF. Since dim S = n > r, the subspace S_1 = span{a_1, a_2, ..., a_r} is
a proper subspace of S. Hence there is an element a_{r+1} ∈ S that does not
belong to S_1. The elements {a_i}_{i=1}^{r+1} are consequently linearly independent,
since otherwise a_{r+1} would be a linear combination of {a_i}_{i=1}^{r} and would
belong to S_1. If r + 1 = n, then in view of Exercise 4 the proof is completed.
If r + 1 < n the argument can be applied repeatedly, beginning with
span{a_1, a_2, ..., a_{r+1}}, until n linearly independent elements are con-
structed.  ■
Let a_1, a_2, ..., a_r (1 ≤ r ≤ n − 1) belong to S, with dim S = n. Denote
S_1 = span{a_1, a_2, ..., a_r}. Proposition 2 asserts the existence of basis
elements a_{r+1}, a_{r+2}, ..., a_n spanning a subspace S_2 in S such that the
union of bases in S_1 and S_2 gives a basis in S. In view of Eq. (2), this provides
a representation of any element a ∈ S as a sum of an element of S_1 and an
element of S_2.
Let S be a finite-dimensional linear space and let S_1 and S_2 be subspaces
of S. The sum S_1 + S_2 of the subspaces S_1 and S_2 is defined to be the
set consisting of all sums of the form a_1 + a_2, where a_1 ∈ S_1 and a_2 ∈ S_2.
It is easily seen that S_1 + S_2 is also a subspace of S and that the operation
of addition of subspaces satisfies the axioms A1–A4 of a linear space (see
Section 3.1), but does not satisfy axiom A5.
Note also that the notion of the sum of several subspaces can be defined
similarly.
Exercise 1. Show that if a_1 = [1 0 1]^T, a_2 = [0 2 −1]^T, b_1 =
[2 2 1]^T, b_2 = [0 0 1]^T, and

    S_1 = span{a_1, a_2},   S_2 = span{b_1, b_2},

then S_1 + S_2 = R^3. Observe that dim S_1 = dim S_2 = 2, while dim R^3
= 3.  □
A general relationship between the dimension of a sum of subspaces
and that of its summands is given below.
Proposition 1. For arbitrary subspaces S_1 and S_2 of a finite-dimensional
space,

    dim(S_1 + S_2) + dim(S_1 ∩ S_2) = dim S_1 + dim S_2.   (1)

The proof of (1) is easily derived by extending a basis of S_1 ∩ S_2 (see
Proposition 3.5.2) to bases for S_1 and S_2.
Obviously, Eq. (1) is valid for the subspaces considered in Exercise 1.
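Indeed, all four dimensions in Eq. (1) can be checked numerically for those subspaces, since the dimension of a span is the rank of the matrix whose columns span it. A small sketch (NumPy; the intersection dimension is obtained from Eq. (1) itself, and the code is illustrative only):

import numpy as np

a1, a2 = np.array([1., 0., 1.]), np.array([0., 2., -1.])
b1, b2 = np.array([2., 2., 1.]), np.array([0., 0., 1.])

S1 = np.column_stack([a1, a2])          # spanning vectors of S1 as columns
S2 = np.column_stack([b1, b2])          # spanning vectors of S2 as columns

dim_S1 = np.linalg.matrix_rank(S1)
dim_S2 = np.linalg.matrix_rank(S2)
dim_sum = np.linalg.matrix_rank(np.column_stack([S1, S2]))   # dim(S1 + S2)

# Eq. (1) then yields the dimension of the intersection:
dim_int = dim_S1 + dim_S2 - dim_sum
print(dim_S1, dim_S2, dim_sum, dim_int)   # 2 2 3 1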
More generally, the following statement is a consequence of Eq. (1).
Exercise 2. Show that for any positive integer k,
PROOF. We shall give details of the proof only for the case k = 2, from
which the general statement is easily derived by induction.
Suppose, preserving the previous notation, that

    ℰ = {b_1, b_2, ..., b_r, c_1, c_2, ..., c_m}

and

    Σ_{i=1}^{r} α_i b_i + Σ_{j=1}^{m} δ_j c_j = 0.

The linear independence of the basis elements for S_1 and S_2 now yields
α_i = 0 (i = 1, 2, ..., r) and δ_j = 0 (j = 1, 2, ..., m) and, consequently, the
linear independence of the elements of ℰ. Thus the system ℰ is a basis for
S_0.  ■
In the next three exercises it is assumed that S_0 is the sum of subspaces
S_1, S_2, ..., S_k, and necessary and sufficient conditions are formulated for
the sum to be direct.
Exercise 5. Show that S_0 = S_1 ⊕ S_2 ⊕ ··· ⊕ S_k if and only if

    dim S_0 = Σ_{i=1}^{k} dim S_i.  □

for i = 2, 3, ..., k.
Exercise 7. Confirm that S_0 = S_1 ⊕ S_2 ⊕ ··· ⊕ S_k if and only if for any
nonzero a_i ∈ S_i (i = 1, 2, ..., k), the set {a_1, a_2, ..., a_k} is linearly independent.
Hint. Observe that linear dependence of {a_1, ..., a_k} would conflict with
(5).  □
Let S denote an m-dimensional linear space over F with a fixed basis
{a_1, a_2, ..., a_m}. An important relationship between a set of elements from S
and a matrix associated with the set is revealed in this section. This will
allow us to apply matrix methods to the investigation of the set, especially
for determining the dimension of its linear hull.
Recall that if b ∈ S and b = Σ_{i=1}^{m} α_i a_i, then the vector [α_1 α_2 ··· α_m]^T
was referred to in Section 3.5 as the representation of b with respect to the
basis {a_i}_{i=1}^{m}. Now we generalize this notion for a system of several elements.
The relation

    Σ_{j=1}^{p} β_j b_j = 0   (3)

implies, in view of (2),

    Σ_{j=1}^{p} β_j ( Σ_{i=1}^{m} a_ij a_i ) = Σ_{i=1}^{m} ( Σ_{j=1}^{p} a_ij β_j ) a_i = 0,

and conversely from this last relation we may deduce the relation (3). Hence
the system {b_j}_{j=1}^{p} and the system of the corresponding columns of the
representation are both linearly dependent or both linearly independent.
Rearranging the order of elements, this conclusion is easily obtained for any
p elements from {b_1, b_2, ..., b_n} and their representations (1 ≤ p ≤ n).
Hence, we have proved the following.
Proposition 1. The number of linearly independent elements in the system
{b_1, b_2, ..., b_n} is equal to the dimension of the column space of its (matrix)
representation with respect to a fixed basis.
In other words, if C_A is the column space of a matrix representation A
of the system {b_1, b_2, ..., b_n} with respect to some basis, then

    dim(span{b_1, b_2, ..., b_n}) = dim C_A = dim(Im A).   (5)
The significance of this result lies in the fact that the problem of determining
the dimension of the linear hull of elements from an arbitrary finite-dimen-
sional space is reduced to that of finding the dimension of the column space
of a matrix.
Note also that formula (5) is valid for a representation of the system with
respect to any basis for the space S. Consequently, all representations of the
given system have column spaces of the same dimension.
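In computational terms, Proposition 1 says that the dimension of span{b_1, ..., b_n} can be read off as the rank of the matrix whose columns are the representations of the b_i with respect to any convenient basis. A brief sketch (NumPy; the column vectors below are made-up illustrations):

import numpy as np

# Representations of b1, b2, b3 (as columns) with respect to some fixed basis.
A = np.array([[1., 2., 3.],
              [0., 1., 1.],
              [1., 3., 4.]])            # third column = first + second

dim_hull = np.linalg.matrix_rank(A)     # dim span{b1, b2, b3} = dim(Im A)
print(dim_hull)                          # 2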
The next proposition asserts a striking equality between the dimension
of the column space C_A of a matrix A and its rank (as defined by the theory of
determinants in Section 2.8).
Theorem 1. For any matrix A,

    dim C_A = rank A.

PROOF. We need to show that the dimension of the column space is equal to
the order of the largest nonvanishing minor of A ∈ F^{m×n} (see Theorem 2.8.1).
For convenience, let us assume that such a minor is the determinant of the
submatrix of A lying in the upper left corner of A. (Otherwise, by rearranging
columns and rows of A we proceed to an equivalent matrix having this
property and the same rank.) Let the order of this submatrix be r. Consider
the (r + 1) × (r + 1) submatrix

    [a_11  ···  a_1r  a_1j]
    [ ⋮          ⋮     ⋮  ]
    [a_r1  ···  a_rr  a_rj]
    [a_i1  ···  a_ir  a_ij]
Let A be a matrix from F^{m×n}. It follows from Exercise 2.8.2 and Corollary
3.7.1 that, if m = n and A is nonsingular, then rank A = n, Ker A = {0},
and hence, when det A ≠ 0,

    rank A + dim(Ker A) = n.

This relation is, in fact, true for any m × n matrix.
Proposition 1. For any A ∈ F^{m×n},

    rank A + dim(Ker A) = n.   (1)

PROOF. Let rank A = r and, without loss of generality, let it be assumed
that the r × r submatrix A_11 in the upper left corner of A is nonsingular:†

    A = [A_11  A_12]
        [A_21  A_22],

where A_11 ∈ F^{r×r} and det A_11 ≠ 0. By virtue of Theorem 2.7.4, the given
matrix A can be reduced to the form

    Ã = P^{-1} A Q^{-1} = [I_r  0]
                          [0    0],   (2)

where P and Q are invertible m × m and n × n matrices, respectively.
† Recall (see Section 2.8) that the rank is not changed under, in particular, the elementary
operations of interchanging columns or rows.
Note that r = rank A = rank Ã and observe that Ker Ã (being the linear
hull of the unit vectors e_{r+1}, e_{r+2}, ..., e_n) has dimension n − r.
It remains now to show that the dimension of Ker Ã, or dim(Ker P^{-1}AQ^{-1}),
PROOF. Consider the subspace S_0 spanned by all the column vectors of both
A and B. Clearly, dim S_0 ≤ rank A + rank B. On the other hand, the
column space of A + B is contained in S_0 and hence, in view of Theorem
3.7.1,

    rank(A + B) ≤ dim S_0.

The assertion follows.  ■
The next proposition shows that any nonzero matrix can be represented
as a product of two matrices of full rank.
PROOF. Let two nonsingular matrices P and Q be defined as in Eq. (2). Writing

    P = [P_1  P_2]   and   Q = [Q_1]
                               [Q_2],

where P_1 ∈ F^{m×r}, P_2 ∈ F^{m×(m−r)}, Q_1 ∈ F^{r×n}, Q_2 ∈ F^{(n−r)×n}, it is easy to
see that the blocks P_2 and Q_2 do not appear in the final result:

    A = P [I_r  0] Q = P_1 I_r Q_1 = P_1 Q_1.   (3)
          [0    0]

The desired properties of P_1 and Q_1 are provided by Exercise 1.  ■
Note that Eq. (3) is known as a rank decomposition of A.
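A rank decomposition is easy to produce numerically, although the factors are not unique. A short sketch (NumPy); it builds the factors from the singular value decomposition rather than from the matrices P and Q of Theorem 2.7.4, so it is one possible construction, not the book's:

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])            # rank 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))              # numerical rank

P1 = U[:, :r] * s[:r]                   # m x r, full column rank
Q1 = Vt[:r, :]                          # r x n, full row rank

print(r, np.allclose(A, P1 @ Q1))       # 2 True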
Exercise 4. Show that any matrix of rank r can be written as a sum of r
matrices of rank 1.
Exercise 5. If A and B are two matrices such that the matrix product AB
is defined, prove that

    rank(AB) ≤ min(rank A, rank B).

SOLUTION. Each column of AB is a linear combination of columns of A.
Hence C_AB ⊆ C_A, and the inequality rank(AB) ≤ rank A follows from
Exercise 3.5.5 and Theorem 3.7.1. Passing to the transpose and using
Exercise 2.8.5, the second inequality is easily established.
Exercise 6. Let A be a nonsingular matrix from F^{n×n}. Prove that for any
B ∈ F^{n×k} and C ∈ F^{k×n},

    rank(AB) = rank B,   rank(CA) = rank C.

Hint. Use Exercise 3.5.6 and Theorem 3.7.1.  □
Let S denote an m-dimensional space and let b ∈ S. In this section we will
study the relationship between the representations β and β' of b with respect
to the bases ℰ = {a_1, a_2, ..., a_m} and ℰ' = {a'_1, a'_2, ..., a'_m} for S. For this
purpose, we first consider the representation P ∈ F^{m×m} of the system ℰ
with respect to the basis ℰ'. Recall that by definition,

    P = [p_ij]_{i,j=1}^{m},

provided

    a_j = Σ_{i=1}^{m} p_ij a'_i,   j = 1, 2, ..., m.

Now we are in a position to obtain the desired relation between the repre-
sentations of a vector in two different bases. Preserving the previous nota-
tion, the representations β and β' of b ∈ S are written

    β = [β_1  β_2  ···  β_m]^T,   β' = [β'_1  β'_2  ···  β'_m]^T,

where

    b = Σ_{i=1}^{m} β_i a_i = Σ_{j=1}^{m} β'_j a'_j.

Then

    b = Σ_{j=1}^{m} β_j a_j = Σ_{j=1}^{m} β_j ( Σ_{i=1}^{m} p_ij a'_i ) = Σ_{i=1}^{m} ( Σ_{j=1}^{m} p_ij β_j ) a'_i,

and therefore

    β'_i = Σ_{j=1}^{m} p_ij β_j,   i = 1, 2, ..., m.

Rewriting the last equality in matrix form, we finally obtain the following:
Theorem 1. Let b denote an element from the m-dimensional linear space S
over F. If β and β' (∈ F^m) are the representations of b with respect to bases
ℰ and ℰ' for S, respectively, then

    β' = Pβ,   (2)

where P is the transition matrix from ℰ to ℰ'.
Exercise 2. Check the relation (2) for the representations of the vector
b = [1 2 −1]^T with respect to the bases {e_1, e_2, e_3} and {a_1, a_2, a_3} defined
in Exercise 3.7.1.
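In coordinates, the transition matrix from the standard basis ℰ = {e_1, e_2, e_3} to a basis ℰ' = {a_1, a_2, a_3} can be computed as P = A^{-1}, where A has the a_i as its columns (column j of P is then the representation of e_j with respect to ℰ'), and Eq. (2) can be verified directly. A small sketch (NumPy); since Exercise 3.7.1 is not reproduced here, the a_i below are taken from Exercise 3.5.3 purely for illustration:

import numpy as np

# Basis E' = {a1, a2, a3} as the columns of Ap; E is the standard basis of R^3.
Ap = np.array([[1., 0., 1.],
               [1., 1., 0.],
               [0., 1., 1.]])

P = np.linalg.inv(Ap)                 # transition matrix from E to E'
beta = np.array([1., 2., -1.])        # representation of b with respect to E

beta_prime = P @ beta                 # Eq. (2): beta' = P beta
print(beta_prime)
print(np.allclose(Ap @ beta_prime, beta))   # True: sum beta'_i a_i reproduces b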
Exercise 3. Let the m × n matrices A and A' denote the representations
of the system {b_1, b_2, ..., b_n} with respect to the bases ℰ and ℰ' for S
(dim S = m), respectively. Show that

    A' = PA,   (3)

where P is the transition matrix from ℰ to ℰ'.
Hint. Apply Eq. (2) to each element of the system {b_1, b_2, ..., b_n}.  □
In Exercise 3 we again see the equivalence of the representations of the
same system with respect to different bases. (See the remark after Theorem
3.7.2.)
Exercise 4. Check Eq. (3) for the matrices found in Exercise 1 in this
section and Exercise 3.7.1.  □
The relation (2) is characteristic for the transition matrix in the following
sense:
Proposition 1. Let Eq. (2) hold for the representations β and β' of every
element b from S (dim S = m) with respect to the fixed bases ℰ and ℰ',
respectively. Then P is the transition matrix from ℰ to ℰ'.
PROOF. Let A' denote the representation of the elements of ℰ with respect
to ℰ'. Since the representation of these elements with respect to ℰ is the iden-
tity matrix I_m and (2) for any b ∈ S yields (3), it follows that A' = P. It re-
mains to note that the representation A' of the elements of ℰ with respect
to ℰ' coincides with the transition matrix from ℰ to ℰ'.  ■
Note that the phrase "every element b from S" in the proposition can be
replaced by "each element of a basis ℰ."
We have seen in Exercise 1 that the exchange of the roles of the bases yields
the transition from the matrix P to its inverse. More generally, consider three
bases {a_i}_{i=1}^{m}, {a'_i}_{i=1}^{m}, and {a''_i}_{i=1}^{m} for S. Denote by P (respectively, P') the
transition matrix from the first basis to the second (respectively, from the
second to the third).
Proposition 2. If P'' denotes the transition matrix from the first basis to
the third then, in the previous notation,

    P'' = P'P.

PROOF. In fact, for the representations β, β', and β'' of an arbitrarily chosen
element b from S with respect to the above bases, we have (by using Eq. (2)
twice)

    β'' = P'β' = P'(Pβ) = P'Pβ.

Hence, applying Proposition 1, the assertion follows.  ■
Corollary 1. If P is the transition matrix from the basis ℰ to the basis ℰ',
then P^{-1} is the transition matrix from ℰ' to ℰ.
Example 5. Describe the solution set of the system given in the matrix form

    Ax = [ 1  −2   1   1] [x_1]   [ 0]
         [ 2   0   3   0] [x_2] = [−1] = b.
         [ 3  −2   4   1] [x_3]   [−1]
         [−1  −2  −2   1] [x_4]   [ 1]

SOLUTION. It is easily seen that the last two rows of A are linear combina-
tions of the first two (linearly independent) rows. Hence rank A = 2. Since
b is a linear combination of the first and the third columns of A, then rank
[A  b] = 2 also, and the given equation is solvable by Theorem 1. Observe,
furthermore, that x^0 = [1  0  −1  0]^T is a particular solution of the system
and that Ax' = 0 has n − r = 2 linearly independent solutions. To find
them, first observe that the 2 × 2 minor in the upper left corner of A is not
zero, and so we consider the first two equations of Ax' = 0:

    [1  −2] [x'_1]   [1  1] [x'_3]   [0]
    [2   0] [x'_2] + [3  0] [x'_4] = [0],

or
Linear spaces with inner products are important enough to justify their
own name: A complex (respectively, real) linear space S together with an
inner product from S × S to C (respectively, R) is referred to as a unitary
(respectively, Euclidean) space.
To distinguish between unitary spaces generated by the same S but
different inner products ( , )_1 and ( , )_2, we shall write U_1 = S( , )_1 and
U_2 = S( , )_2, respectively. Recall that any n-dimensional space over the
field of complex (or real) numbers can be considered a unitary (or Euclidean)
space by applying an inner product of the type in Eq. (2), or of the type in
Eq. (3), or of some other type.
Exercise 5. Check that the length of the element x ∈ S having the repre-
sentation [α_1 α_2 ··· α_n]^T with respect to a basis {a_1, ..., a_n} for S is
given by

    ‖x‖² = Σ_{j,k=1}^{n} α_j ᾱ_k (a_j, a_k).   (4)
The notion of angle is not usually considered in unitary spaces for which
the expression on the right side of (6) is complex-valued. Nevertheless,
extending some geometrical language from the Euclidean situation, we
say that nonzero elements x and y from a unitary (or Euclidean) space are
orthogonal if and only if (x, y) = 0. In particular, using the standard scalar
product in C^n of Exercise 1, nonzero vectors x, y in C^n are orthogonal if and
only if y*x = x*y = 0.
Exercise 8. Check that any two unit vectors e_i and e_j (1 ≤ i < j ≤ n) in C^n
(or R^n) are orthogonal in the standard inner product.
Exercise 9. (Pythagoras's Theorem) Prove that if x and y are orthogonal
members of a unitary space, then

    ‖x ± y‖² = ‖x‖² + ‖y‖².  □   (7)

Note that if x, y denote position vectors in R³, then ‖x − y‖ is the length
of the third side of the triangle determined by x and y. Thus, the customary
Pythagoras theorem follows from Eq. (7). The next result is sometimes
described as Apollonius's theorem or as the parallelogram theorem.
Exercise 10. Prove that for any x, y ∈ U,

    ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
It is quite clear that the procedure we have just described can be used to
generate a basis of orthonormal vectors for a unitary space, that is, an
orthonormal basis. In fact, we have the following important result.
Theorem 1. Every finite-dimensional unitary space has an orthonormal basis.
PROOF. Starting with any basis for the space, an orthogonal set of n
elements, where n = dim U, can be obtained by the Gram–Schmidt process.
Normalizing the set and using Exercises 1 and 3.5.4, an orthonormal basis
for U is constructed.  ■
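The Gram–Schmidt process invoked in the proof is easily written out. A minimal sketch (NumPy, standard inner product on C^n); this is the classical, numerically naive version, given only to illustrate the construction:

import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of `vectors` (standard inner product)."""
    basis = []
    for x in vectors:
        y = x.astype(complex)
        for q in basis:
            y = y - np.vdot(q, x) * q          # subtract the projection (x, q) q
        norm = np.linalg.norm(y)
        if norm > 1e-12:                        # skip (numerically) dependent vectors
            basis.append(y / norm)
    return np.array(basis)

X = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
Q = gram_schmidt(X)
print(np.allclose(Q @ Q.conj().T, np.eye(len(Q))))   # True: the rows are orthonormal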
Compared to other bases, orthonormal bases have the great advantage
that the representation of an arbitrary element from the space is easily found,
as is the inner product of any two elements. The next exercises give the
details.
Exercise 3. Let {a_1, a_2, ..., a_n} be an orthonormal basis for the unitary
space U and let x = Σ_{i=1}^{n} α_i a_i be the decomposition of an x ∈ U in this basis.
Show that the expansion coefficients α_i are given by α_i = (x, a_i), i = 1, 2, ..., n.
Exercise 4. Let {a_1, a_2, ..., a_n} be a basis for the unitary space U and let
x = Σ_{i=1}^{n} α_i a_i, y = Σ_{i=1}^{n} β_i a_i.
Prove that (x, y) = Σ_{i=1}^{n} α_i β̄_i for any x, y ∈ U if and only if the basis {a_i}_{i=1}^{n}
is orthonormal.  □
Hint. Use Exercises 3-5, and recall the comment on Exercise 3.11.4. ❑
It is instructive to consider another approach to the orthogonalization
problem. Once again a linearly independent set {x_1, x_2, ..., x_r} is given
and we try to construct an orthogonal set {y_1, ..., y_r} with property (1).
We again take y_1 = x_1, and for p = 2, 3, ..., r we write

    y_p = x_p + Σ_{j=1}^{p−1} α_{jp} x_j,

where α_{1p}, ..., α_{p−1,p} are coefficients to be determined. We then impose
the orthogonality conditions in the form (x_i, y_p) = 0 for i = 1, 2, ..., p − 1,
and note that these conditions also ensure the orthogonality of y_p with
y_1, ..., y_{p−1}. The result is a set of equations for the α's that can be written
as one matrix equation:

    G α_p = −g_p,   (3)

PROOF. Observe first that the Gram matrix is Hermitian and, consequently,
the Gramian is real (Exercise 2.2.6).
Taking advantage of Theorem 1, let y_1, y_2, ..., y_n be an orthonormal basis
for the space and, for j = 1, 2, ..., r, let x_j = Σ_{k=1}^{n} α_{jk} y_k. Define the r × n
matrix A = [α_{jk}] and note that A has full rank if and only if x_1, ..., x_r are
linearly independent (see Theorem 3.7.2).
Now observe that
Exercise 7. Let A ∈ C^{m×n} with m ≥ n and let rank A = n. Confirm the
representation A = QR, where R is an n × n nonsingular upper triangular
matrix, Q ∈ C^{m×n}, and Q*Q = I_n (i.e., the columns of Q form an ortho-
normal system).
Hint. If z_p = Σ_{j=1}^{p} t_{jp} A_{*j} (p = 1, 2, ..., n) are the orthonormal vectors
constructed from the columns A_{*j} of A, show that Q = [z_1  z_2  ···  z_n]
and R = ([t_{jp}]_{j,p=1}^{n})^{-1}, where t_{jp} = 0 for j > p, satisfy the required conditions.  □
Observe that when the QR decomposition of a square matrix A is known,
the equation Ax = b is then equivalent to the equation Rx = Q*b in which
the coefficient matrix is triangular. Thus, in some circumstances the QR
decomposition is the basis of a useful equation-solving algorithm.
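As a concrete illustration of the last remark, once A = QR is available, Ax = b reduces to the triangular system Rx = Q*b, which is solved by back substitution. A minimal sketch (NumPy; numpy.linalg.qr is used here in place of the Gram–Schmidt construction of Exercise 7, and the matrix is an arbitrary example):

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
b = np.array([3., 5., 6.])

Q, R = np.linalg.qr(A)          # Q*Q = I, R upper triangular
y = Q.conj().T @ b              # right-hand side of R x = Q* b

# Back substitution on the triangular system R x = y.
n = len(y)
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]

print(np.allclose(A @ x, b))    # True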
Exercise 8. Let {u_1, ..., u_n} be a basis for a unitary space U and let v ∈ U.
Show that v is uniquely determined by the n numbers (v, u_j), j = 1, 2, ..., n.  □
Let S_1 and S_2 be two subsets of elements from the unitary space U =
S( , ). We say that S_1 and S_2 are (mutually) orthogonal, written S_1 ⊥ S_2,
if and only if (x, y) = 0 for every x ∈ S_1, y ∈ S_2.
Exercise 1. Prove that two nontrivial subspaces S_1 and S_2 of U are
orthogonal if and only if each basis element of one of the subspaces is
orthogonal to all basis elements of the second.
Exercise 2. Show that (x, y) = 0 for all y ∈ U implies x = 0.
Exercise 3. Let S_0 be a subspace of the unitary space U. Show that the
set of all elements of U orthogonal to S_0 is a subspace of U.  □
The subspace defined in Exercise 3 is said to be the complementary ortho-
gonal subspace of S_0 in U and is written S_0^⊥. Note that Exercise 2 shows that
U^⊥ = {0} and {0}^⊥ = U (in U).
Exercise 4. Confirm the following properties of complementary orthogonal
subspaces in U:
(a) (S_0^⊥)^⊥ = S_0,
(b) If S_1 ⊆ S_2, then S_1^⊥ ⊇ S_2^⊥,
(c) (S_1 + S_2)^⊥ = S_1^⊥ ∩ S_2^⊥,
(d) (S_1 ∩ S_2)^⊥ = S_1^⊥ + S_2^⊥.  □
In view of Eq. (1), each element x of U can be written uniquely in the form

    x = x_1 + x_2,   (2)

where x_1 ∈ S_0 and x_1 ⊥ x_2. The representation (2) reminds one of the de-
composition of a vector in R³ into the sum of two mutually orthogonal
vectors. Hence, the element x_1 in Eq. (2) is called the orthogonal projection
of x onto S_0.
Exercise 6. Show that if U is a unitary space,

    U = Σ_{i=1}^{n} ⊕ S_i,

where S_i = span{a_i}, 1 ≤ i ≤ n, and {a_1, a_2, ..., a_n} is an orthonormal
basis for U.
Exercise 7. Let

    U_1 = Σ_{i=1}^{k} ⊕ S_i

    (P, Q) ≐ ∫_{−1}^{1} P(x)Q(x) dx,   P, Q ∈ P.
Linear Transformations
and Matrices
Let us denote by L(S_1, S_2) the set of all linear transformations mapping
the linear space S_1 into the linear space S_2. Both spaces are assumed
throughout the book to be finite-dimensional and over the same field F.
When S_1 = S_2 = S, we write L(S) instead of L(S, S).
Note that, combining the conditions in (1), a transformation T ∈ L(S_1, S_2)
can be characterized by a single condition:

    T(αx_1 + βx_2) = αT(x_1) + βT(x_2)   (2)

for any elements x_1, x_2 from S_1 and α, β ∈ F. This implies a more general
relation:

    T( Σ_{i=1}^{k} α_i x_i ) = Σ_{i=1}^{k} α_i T(x_i),

where x_i ∈ S_1, α_i ∈ F (i = 1, 2, ..., k). In particular,

    T(0) = 0,   T(−x) = −T(x),   T(x_1 − x_2) = T(x_1) − T(x_2).
    T [x_1]   [x_1 − x_2 + x_3]
      [x_2] = [x_2 − x_3 + x_4]
      [x_3]   [       0       ]
      [x_4]

Exercise 7. Check that, with respect to the operations introduced, the set
L(S_1, S_2) is a linear space over F.
Exercise 8. Let {x_1, x_2, ..., x_n} and {y_1, y_2, ..., y_m} denote bases in S_1
and S_2, respectively. The transformations T_ij ∈ L(S_1, S_2), i = 1, 2, ..., n
and j = 1, 2, ..., m, are defined by the rule

    T_ij(x_i) = y_j   and   T_ij(x_k) = 0 if k ≠ i;

then T_ij(x) is defined for any x ∈ S_1 by linearity. Show that the T_ij's constitute
a basis in L(S_1, S_2). In particular, dim L(S_1, S_2) = mn.  □
Consider three linear spaces S_1, S_2, and S_3 over the field F and let
T_1 ∈ L(S_1, S_2), T_2 ∈ L(S_2, S_3). If x ∈ S_1, then T_1 maps x onto an element
defined by I(x) ≐ x for all x ∈ S has the property that IT = TI = T for all
T ∈ L(S).
It was shown in Exercise 3.1.3(b) that F^{n×n} is a linear space. The space
F^{n×n} together with the operation of matrix multiplication is another
example of an algebra.
Polynomials in a transformation are defined as for those in square matrices
(see Section 1.7).
Exercise 12. Show that for any T ∈ L(S) there is a nonzero scalar poly-
nomial p(λ) = Σ_{i=0}^{l} p_i λ^i with coefficients from F such that p(T) = Σ_{i=0}^{l} p_i T^i
is the zero transformation.
Let T ∈ L(S_1, S_2), where dim S_1 = n and dim S_2 = m. We have seen
in Example 4.1.3 that, in the particular case S_1 = F^n and S_2 = F^m, any
m × n matrix defines a transformation T ∈ L(F^n, F^m). Now we show that
any transformation T ∈ L(S_1, S_2) is associated with a set of m × n matrices.
Let ℰ = {x_1, x_2, ..., x_n} and 𝒢 = {y_1, y_2, ..., y_m} be any bases in the
spaces S_1 and S_2 over F, respectively. Evaluating the images T(x_j)
(j = 1, 2, ..., n) and finding the representation of the system {T(x_1),
T(x_2), ..., T(x_n)} with respect to the basis 𝒢 (see Section 3.7), we obtain

    T(x_j) = Σ_{i=1}^{m} a_ij y_i,   j = 1, 2, ..., n.   (1)

The matrix A_{ℰ,𝒢} = [a_ij] ∈ F^{m×n} is defined by this relation and is said
to be the (matrix) representation of T with respect to the bases (ℰ, 𝒢). Note
that the columns of the representation A_{ℰ,𝒢} of T are just the coordinates of
T(x_1), T(x_2), ..., T(x_n) with respect to the basis 𝒢 and that, as indicated
in Section 3.7, the representation of the system {T(x_j)}_{j=1}^{n} with respect to
the basis 𝒢 is unique. Hence, after also fixing the basis 𝒢, the (matrix) representa-
tion of T with respect to the pair (ℰ, 𝒢) is uniquely determined.
Exercise 1 considers the transformation

    T [x_1]   [x_2 + x_3]
      [x_2] = [x_1 + x_2],   x = [x_1  x_2  x_3]^T ∈ F³,
      [x_3]   [   x_1   ]
              [x_1 − x_2]

whose standard representation is

    A = [0   1  1]
        [1   1  0]
        [1   0  0]
        [1  −1  0].
The representation of a linear transformation T ∈ L(F^n, F^m) with respect
to the standard bases of the two spaces is referred to as the standard (matrix)
representation of T.
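The standard representation is obtained column by column: the jth column is T(e_j) written in the standard basis of the target space. A small sketch (NumPy) for the transformation of Exercise 1 above; the function T below is an assumed encoding of that transformation, not code from the original text:

import numpy as np

def T(x):
    """The transformation of Exercise 1: F^3 -> F^4."""
    x1, x2, x3 = x
    return np.array([x2 + x3, x1 + x2, x1, x1 - x2])

# Columns of the standard representation are the images of the unit vectors.
A = np.column_stack([T(e) for e in np.eye(3)])
print(A)
# [[ 0.  1.  1.]
#  [ 1.  1.  0.]
#  [ 1.  0.  0.]
#  [ 1. -1.  0.]]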
In view of Exercise 3.7.1, it is clear that the transition from one basis in S_2
to another changes the representation of the system {T(x_j)}_{j=1}^{n}. There is
also a change of the representation when a basis in S_1 is replaced by another.
Exercise 2. For the same T ∈ L(F³, F⁴) defined in Exercise 1, verify that
the representation with respect to the bases

    x_1 = [1  1  0]^T,   x_2 = [0  0  1]^T,   x_3 = [0  1  0]^T

for F³, and

    y_1 = [1 −1 0 0]^T,   y_2 = [1 0 1 0]^T,   y_3 = [0 1 0 0]^T,   y_4 = [0 0 1 1]^T

for F⁴, is

    A' = [0  1   0]
         [1  0   1]
         [2  1   1]
         [0  0  −1].

Hint. Calculate T(x_j) (j = 1, 2, 3). The solution of the jth system of type (1)
(with m = 4) gives the jth column of A'.  □
Hint. Define a transformation T on the elements of ℰ by Eq. (1) and then
extend T to S_1 by linearity:

    T(x) = Σ_{i=1}^{n} α_i T(x_i),

provided x = Σ_{i=1}^{n} α_i x_i.  □

    T_k(x_j) = Σ_{i=1}^{m} a_ij^{(k)} y_i,

    αT_k(x_j) = Σ_{i=1}^{m} ( α a_ij^{(k)} ) y_i,
Note that the relation in (4) obviously holds for transformations T_1 and
T_2 both acting on S_1 and their representations with respect to the pairs of
bases (ℬ, 𝒢) and (𝒢, ℛ), where ℬ, 𝒢, and ℛ are all bases for S_1. This general
approach simplifies considerably when we consider transformations acting
on a single space and their representations with respect to one basis (that is,
from the general point of view, with respect to a pair of identical bases).
Clearly, the correspondences in Eqs. (2), (3), and (4) are valid for transforma-
tions T_1 and T_2 acting on the linear space S and their representations with
respect to one basis ℰ (= 𝒢 = ℛ) for S. Using Exercise 3 and supposing
the basis ℰ to be fixed, we say that there is a one-to-one correspondence
between algebraic operations involving linear transformations in L(S)
and those involving square matrices. Such a correspondence, given by
Along with the established isomorphism L(S_1, S_2) ↔ F^{m×n} between the
spaces L(S_1, S_2) and F^{m×n}, there are isomorphisms S_1 ↔ F^n and S_2 ↔ F^m
between pairs of linear spaces of the same dimension (see Exercise 3.14.4).
For instance, these isomorphisms are given by the correspondence x ↔ α
(respectively, y ↔ β), where α (respectively, β) denotes the representation
of x ∈ S_1 (respectively, y ∈ S_2) with respect to a fixed basis ℰ for S_1
(respectively, 𝒢 for S_2). It turns out that these isomorphic correspondences
carry over to the relations y = T(x) and β = A_{ℰ,𝒢} α.
Theorem 2. In the previous notation, let the bases ℰ, 𝒢 be fixed and x ↔ α,
y ↔ β, T ↔ A_{ℰ,𝒢}. Then

    y = T(x)   (5)

if and only if

    β = A_{ℰ,𝒢} α.   (6)

PROOF. Let T ∈ L(S_1, S_2). Assume Eq. (5) holds and that x ↔ α and
y ↔ β, where α = [α_1  α_2  ···  α_n]^T and β = [β_1  β_2  ···  β_m]^T. Then,
using Eq. (1),

    y = T(x) = Σ_{j=1}^{n} α_j T(x_j) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_ij α_j ) y_i.

Therefore β_i = Σ_{j=1}^{n} a_ij α_j for each i, which is Eq. (6).
Exercise 5. Verify that the matrices

    [2  0  0]        [1  −1  1]
    [1  0  0]  and   [0   1  0]
    [0  1  1]        [1   1  1]

are representations of the same transformation T ∈ L(R³) with respect to the
standard basis and the basis [1  1  0]^T, [0  1  1]^T, [1  0  1]^T, respectively.  □
Let T ∈ L(S_1, S_2). To describe the set 𝒜_T of all matrices that are repre-
sentations of T in different pairs of bases, we shall use the results of Section
3.9 about how a change of bases influences the representation of a system.
As before, assume that dim S_1 = n and dim S_2 = m.
Let 𝒢 and 𝒢' be two bases in S_2. If ℰ is a basis in S_1, then the representa-
tions A_{ℰ,𝒢} and A_{ℰ,𝒢'} of T are related by Exercise 3.9.3 as

    A_{ℰ,𝒢'} = P A_{ℰ,𝒢},   (1)

where P ∈ F^{m×m} is the transition matrix from the basis 𝒢 to the basis 𝒢'.
Now consider along with ℰ a basis ℰ' in S_1 and let Q denote the transition
matrix from ℰ to ℰ'. Then we will show that

    A_{ℰ,𝒢'} = A_{ℰ',𝒢'} Q.   (2)

Indeed, if α and α' denote the representations of x ∈ S_1 with respect to the
bases ℰ and ℰ', respectively, then, by Theorem 3.9.1,

    α' = Qα.   (3)

Let y = T(x) and let β' denote the representation of y with respect to the basis
𝒢'. By Theorem 4.2.2 we have β' = A_{ℰ,𝒢'}α and also β' = A_{ℰ',𝒢'}α' =
A_{ℰ',𝒢'}Qα; since this holds for every x, Eq. (2) follows. Combining (1) and (2),

    A_{ℰ',𝒢'} = P A_{ℰ,𝒢} Q^{-1},

where P and Q denote the transition matrices from 𝒢 to 𝒢' and from ℰ to ℰ',
respectively.
Recalling the definition of equivalent matrices (Section 2.7), the last
theorem can be restated in another way:
Theorem 1'. Any two matrices that are representations of the same linear
transformation with respect to different pairs of bases are equivalent.
It is important to note that the last proposition admits a converse state-
ment.
Exercise 2. Show that if A is a representation of T ∈ L(S_1, S_2), then any
matrix A' that is equivalent to A is also a representation of T (with respect
to some other pair of bases).
Hint. Given A' = PAQ^{-1}, use the matrices P and Q as transition matrices
and construct a pair of bases with respect to which A' is a representation
of T.  □
Thus, any transformation T ∈ L(S_1, S_2) (dim S_1 = n, dim S_2 = m) gen-
erates a whole class 𝒜_T of mutually equivalent m × n matrices. Furthermore,
this is an equivalence class of matrices within F^{m×n} (see Section 2.7). Given
T ∈ L(S_1, S_2), it is now our objective to find the "simplest" matrix in this
class since, if it turns out to be simple enough, it should reveal at a glance
some important features of the "parent" transformation.
Let T ∈ L(S_1, S_2), T ≠ O. The class of equivalent matrices generated
by T is uniquely defined by the rank r of an arbitrary matrix belonging to
the class (see Section 2.8). Referring to Section 2.7, we see that the simplest
matrix of this equivalence class is of one of the forms

    [I_r  0]      [I_r]
    [0    0],     [0  ],     [I_r  0],     I_r.   (6)

Denote the set of matrix representations of T (of the last theorem) by 𝒜_T.
In contrast to the problem of finding the "simplest" matrix in the equivalence
class 𝒜_T of equivalent matrices, the problem of determining the "simplest"
matrix in the equivalence class of similar matrices (that is, the canonical
form of the representations of T with respect to similarity) is much more
complicated. For the general case, this problem will be solved in Chapter 6,
while its solution for an especially simple but nonetheless important kind of
transformation is found in Section 4.8.
    p(A) = Σ_{i=0}^{l} p_i (PBP^{-1})^i = P ( Σ_{i=0}^{l} p_i B^i ) P^{-1} = P p(B) P^{-1}.

When A and B are similar, the matrix P of the relation A = PBP^{-1} is
said to be a transforming matrix (of B to A). It should be pointed out that this
transforming matrix P is not unique. For example, P may be replaced by PP_1,
where P_1 is any nonsingular matrix that commutes with B.
Exercise 2. Show that the matrices

    [A_1  0 ]        [A_2  0 ]
    [0   A_2]  and   [0   A_1],     A_1 ∈ F^{k×k},  A_2 ∈ F^{m×m},

are similar, and that nonsingular matrices of the form

    [0    P_2]
    [P_1   0 ],     P_1 ∈ F^{k×k},  P_2 ∈ F^{m×m},

where P_1 commutes with A_1 and P_2 commutes with A_2, are some of the
transforming matrices.
described in terms of row and column operations depending on the type (see
Section 2.7) of elementary matrix. The following exercise is established in this
way.
    A = [1  1]   and   I = [1  0]
        [0  1]             [0  1]

are not similar, although they have the same rank, determinant, and trace.
SOLUTION. Representing T as

    T [x_1]   [x_2 + x_3]         [0]        [1]        [1]
      [x_2] = [x_1 + x_2] = x_1  [1] + x_2  [1] + x_3  [0],
      [x_3]   [   x_1   ]         [1]        [0]        [0]
              [x_1 − x_2]         [1]       [−1]        [0]

it is not difficult to see that Im T coincides with span{v_1, v_2, v_3}, where
v_1 = [0  1  1  1]^T, v_2 = [1  1  0  −1]^T, v_3 = [1  0  0  0]^T. The vec-
tors v_i (i = 1, 2, 3) are linearly independent and, therefore, rank T = 3.
Exercise 2. Show that, for any linear subspace S_0 of S_1,

    dim T(S_0) ≤ dim S_0,   (1)

and hence that rank T ≤ dim S_1.
Hint. Use Exercise 4.1.5. Compare this result with that of Exercise 3.5.6.
Exercise 3. Check that for any T_1, T_2 ∈ L(S_1, S_2),

    Im(T_1 + T_2) ⊆ Im T_1 + Im T_2.

Then prove the inequalities

    |rank T_1 − rank T_2| ≤ rank(T_1 + T_2) ≤ rank T_1 + rank T_2.

Hint. Use Exercise 3.6.2; compare this with the assertion of Proposition
3.8.2.
PROOF. For any x_1, x_2 ∈ Ker T and any α, β ∈ F, the element αx_1 + βx_2
also belongs to the kernel of T:

    T(αx_1 + βx_2) = αT(x_1) + βT(x_2) = α0 + β0 = 0.

This completes the proof.  ■
The dimension of the subspace Ker T is referred to as the defect of the
transformation and is denoted by def T.
Exercise 5. Describe the kernel of the transformation defined in Exercise
4.2.1 and determine its defect.
SOLUTION. To find the elements x ∈ S_1 such that T(x) = 0, we solve the
system

    x_2 + x_3 = 0,   x_1 + x_2 = 0,   x_1 = 0,   x_1 − x_2 = 0.

    Σ_{j=1}^{n} α_j ( Σ_{i=1}^{m} a_ij y_i ) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_ij α_j ) y_i = 0.

The linear independence of the elements y_1, y_2, ..., y_m from the basis 𝒢
implies that A_{ℰ,𝒢} α = 0, where A_{ℰ,𝒢} = [a_ij]. It remains only to apply
Proposition 3.7.1.  ■
Using the notion of the defect for matrices, we say that def T = def A
for any A that is a representation of T. It should be emphasized that, in
general, the subspaces Im T and Im A, as well as Ker T and Ker A, lie in
different spaces and are essentially different.
Theorems 2 and 4, combined with the relation (3.8.1), yield the following
useful result:
Theorem 5. For any T ∈ L(S_1, S_2),

    rank T + def T = dim S_1.   (2)

This result is in fact a particular case of the next theorem, which is inspired
by the observation that Ker T consists of all elements of S_1 that are carried
by T onto the zero element of S_2. Hence, it is reasonable to expect that
Im T is spanned by images of the basis elements in a complement to Ker T.
Theorem 6. Let T ∈ L(S_1, S_2) and let S_0 be a direct complement to Ker T
in S_1:

    S_1 = Ker T ⊕ S_0.   (3)

If x_1, x_2, ..., x_r form a basis for S_0, then T(x_1), T(x_2), ..., T(x_r) form a
basis for Im T.
PROOF. We first observe that the elements T(x_1), ..., T(x_r) are linearly
independent. Indeed, if

    Σ_{i=1}^{r} α_i T(x_i) = T( Σ_{i=1}^{r} α_i x_i ) = 0,

then the element Σ_{i=1}^{r} α_i x_i is in (Ker T) ∩ S_0 and hence is the zero element.
y_i = T(x_i) for i = 1, 2, ..., r. (We say x_i is a preimage of y_i.) Confirm that
S_0 = span{x_1, x_2, ..., x_r} satisfies Eq. (3).  □
(c) Apply the inequality (5) with S = Ker(T_2 T_1) and T = T_1 to show
that

    def(T_2 T_1) ≤ def T_1 + def T_2.   (6)

Exercise 12. Check that for any T ∈ L(S),

    Ker T ⊆ Ker T² ⊆ Ker T³ ⊆ ···,
    Im T ⊇ Im T² ⊇ Im T³ ⊇ ···,  □

where k denotes a positive integer between 1 and n, n = dim S, and the
inclusions are strict.
Theorem 1. Let T ∈ L(S_1, S_2) and let dim S_1 = n, dim S_2 = m. The
following statements are equivalent:
(a) T is left invertible;
(b) Ker T = {0};
(c) T defines an isomorphism between S_1 and Im T;
(d) n ≤ m and rank T = n;
(e) Every representation of T is an m × n matrix with n ≤ m and full
rank.
PROOF. If Eq. (1) holds and T(x) = 0, then x = I_1(x) = T_1(T(x)) = 0;
hence Ker T = {0} and (a) ⇒ (b). To deduce (c) from (b) it suffices, in view
of the linearity of T, to establish a one-to-one correspondence between
S_1 and Im T. Let y = T(x_1) = T(x_2), where x_i ∈ S_1 (i = 1, 2). Then
T(x_1 − x_2) = 0 and the condition Ker T = {0} yields x_1 = x_2 and, con-
sequently, the required uniqueness of x. To prove the implication (c) ⇒ (a),
we observe that the existence of a one-to-one map (namely, T) from S_1
onto Im T permits the definition of the transformation T_1 ∈ L(Im T, S_1)
by the rule: if y = T(x) then T_1(y) = x. Obviously T_1(T(x)) = T_1(y) = x
for all x ∈ S_1, or, what is equivalent, T_1 T = I_1. Thus, (c) ⇒ (a). We have
shown that (a), (b), and (c) are equivalent.
Now we note that the equivalence (b) ⇔ (d) follows from Eq. (4.5.2), and
Theorem 4.5.2 gives the equivalence (d) ⇔ (e). This completes the proof.  ■
Exercise 1. Check the equivalence of the following statements:
(a) T ∈ L(S_1, S_2) is left invertible.
(b) For any linear subspace S_0 ⊆ S_1,

    dim T(S_0) = dim S_0.

(c) Any linearly independent system in S_1 is carried by T into another
linearly independent system in S_2.  □
The notion of right invertibility of a transformation is defined similarly
by saying that T ∈ L(S_1, S_2) is right invertible if one can find a transforma-
tion T_2 ∈ L(S_2, S_1) such that

    TT_2 = I_2,   (2)

where I_2 denotes the identity transformation in S_2. The transformation
T_2 in Eq. (2) is referred to as a right inverse of T.
Theorem 2. Let T ∈ L(S_1, S_2) and let dim S_1 = n, dim S_2 = m. The
following statements are equivalent:
(a) T is right invertible;
(b) rank T = m;
PROOF. (a) ⇒ (b). Let Eq. (2) hold and let y ∈ S_2. Then y = TT_2(y) =
T(T_2(y)) and the element x = T_2(y) ∈ S_1 is a pre-image of y. Thus S_2 =
Im T and part (b) follows from part (a). To prove the implication (b) ⇒ (c),
we consider a subspace S_0 such that S_1 = Ker T ⊕ S_0 and show that
T provides a one-to-one correspondence between S_0 and S_2. Indeed, if
y = T(x_1) = T(x_2), where x_i ∈ S_0 (i = 1, 2), then x_1 − x_2 ∈ S_0 ∩ Ker T
= {0}. Hence x_1 = x_2 and the required isomorphism between S_0 and S_2
follows from the linearity of T. Now observe that this isomorphism guarantees
the existence of a transformation T_2 ∈ L(S_2, S_0) defined by the rule: if
y = T(x) with x ∈ S_0, then T_2(y) = x.
Clearly, T(T_2(y)) = T(x) = y for all y ∈ S_2 and the relation (2) holds.
Thus, the equivalence of (a), (b), and (c) is established.
Note that the equivalences (b) ⇔ (d) and (d) ⇔ (e) are immediate conse-
quences of Theorems 4.5.5 and 4.5.2, respectively.  ■
The notions of left (or right) invertible matrices and their one-sided inverses
are defined similarly to those for transformations. As pointed out in Section
2.6, a one-sided inverse of a square matrix is necessarily an inverse matrix,
that is, both a left and a right inverse.
Let T ∈ L(S_1, S_2) and let S_0 be a linear subspace in S_1. The transforma-
tion T̃ : S_0 → S_2 defined by T̃(x) = T(x) for every x ∈ S_0 is referred to as the
restriction of T to S_0 and is denoted by T|_{S_0}. Obviously, T|_{S_0} ∈ L(S_0, S_2).
In contrast, a transformation T̂ ∈ L(S_1, S_2) coinciding with T̃ on S_0 ⊆ S_1
is called an extension of T̃.
Exercise 1. Let T ∈ L(S_1, S_2). Check that for any linear subspace S_0,

    Im(T|_{S_0}) = T(S_0),   Ker(T|_{S_0}) = S_0 ∩ Ker T.  □

The second relation of Exercise 1 shows that if S_0 is any direct complement
to Ker T in S_1 (thus S_0 ⊕ Ker T = S_1), then T|_{S_0} has the trivial sub-
space {0} as its kernel. In this case, we deduce from Theorem 4.5.6 that
Im T = T(S_0). So the action of a linear transformation T is completely
determined by its action on a direct complement to Ker T, that is, by the
restriction of T to such a subspace.
We now consider an arbitrary decomposition of the domain into a direct
sum of subspaces (as defined in Section 3.6).
Exercise 2. Let T ∈ L(S_1, S_2) and let S_1 = S_0 ⊕ S'_0. Write x = x_1 + x_2,
where x_1 ∈ S_0, x_2 ∈ S'_0, and show that, if T_1 ∈ L(S_0, S_2) and T_2 ∈ L(S'_0, S_2),
the equality

    T(x) = T_1(x_1) + T_2(x_2)   (1)
Obviously, {0} and the whole space are invariant under any transformation.
These spaces are referred to as the trivial invariant subspaces and usually are
excluded from further considerations.
The next exercise shows that linear transformations that are not invertible
always have nontrivial invariant subspaces.
Exercise 3. Let T ∈ L(S). Show that Ker T and Im T are T-invariant.
Show also that any subspace containing Im T is T-invariant.
SOLUTION. For the first subspace, let x ∈ Ker T. Then T(x) = 0 and, since
0 ∈ Ker T, the condition (2) is valid. The proof of the T-invariance of Im T is
similar.
Exercise 4. Show that S_0 is T-invariant if and only if T(x_i) ∈ S_0 (i = 1,
2, ..., k), where {x_1, x_2, ..., x_k} is a basis for S_0.
Exercise 5. Check that if T_1 T_2 = T_2 T_1, then Ker T_2 and Im T_2 are
T_1-invariant.
Exercise 6. Show that sums and intersections of T-invariant subspaces are
T-invariant. ❑
implies that the subspace of the representations of the elements of S_0 with
respect to ℰ is A_ℰ-invariant. Clearly, the dimension of this subspace is equal
to k.  ■
Example 7. Let T ∈ L(F²) be defined as follows:

    T [x_1]   [x_1 + x_2]
      [x_2] = [   x_2   ].

Let T ∈ L(S) and let 𝒜_T denote the equivalence class of similar matrices
associated with T. In searching for the simplest matrices in this class, we
first study the representations of T in the case when the space S is decom-
posed into a direct sum of subspaces, at least one of which is T-invariant.
Let S = S_1 ⊕ S_2, where S_1 is T-invariant, assuming for the moment
that such a decomposition is always possible. Now we construct a represen-
tation of T that is most "appropriate" for the given decomposition of the
space; namely, we build up the representation of T with respect to the basis

    {x_1, x_2, ..., x_k, x_{k+1}, ..., x_n}   (1)

in S, where k = dim S_1 and the first k elements constitute a basis in S_1.
Since S_1 is T-invariant, T(x_j) ∈ S_1 (j = 1, 2, ..., k) and Eq. (4.2.1) gives

    T(x_j) = Σ_{i=1}^{k} a_ij x_i,   j = 1, 2, ..., k.   (2)

Hence the matrix A = [a_ij]_{i,j=1}^{n}, where the a_ij (1 ≤ i, j ≤ n) are found
from the equation T(x_j) = Σ_{i=1}^{n} a_ij x_i (j = 1, 2, ..., n), is of the form

    A = [A_1  A_3]
        [0    A_2],   (3)

where A_1 ∈ F^{k×k} and A_2 ∈ F^{(n−k)×(n−k)}. By definition, the matrix (3) is the
representation of T with respect to the basis (1). Thus, a decomposition of
the space into a direct sum of subspaces, one of which is T-invariant, admits
a representation of T in block-triangular form (3).
Exercise 1. Let T ∈ L(S), where S = S_1 ⊕ S_2 and S_2 is T-invariant.
Check that the representation of T with respect to a basis in which the basis
elements of S_2 are in the last places is a lower block-triangular matrix.  □
In view of Exercise 4.7.4, the subspaces S_1 = span{x_j}_{j=1}^{k} and S_2 =
span{x_j}_{j=k+1}^{n} are T-invariant and, obviously, S_1 ⊕ S_2 = S. By analogy,
the matrix A in (4) is said to be a direct sum of the matrices A_1 and A_2 and
is written A = A_1 ⊕ A_2. A similar definition and notation, A = A_1 ⊕
A_2 ⊕ ··· ⊕ A_p or, briefly, A = Σ_{i=1}^{p} ⊕ A_i, is given for the direct sum of p
matrices. We summarize this discussion.
Theorem 1. A transformation T ∈ L(S) has a representation A ∈ 𝒜_T in
block-diagonal form, consisting of p blocks, if and only if T can be decomposed
into a direct sum of p linear transformations.
In symbols, if A ∈ 𝒜_T, then A = Σ_{i=1}^{p} ⊕ A_i if and only if T = Σ_{i=1}^{p} ⊕ T_i.
In this case the matrix A_i is the representation of T_i (1 ≤ i ≤ p).
Recall that Example 4.7.7 provides an example of a transformation T from
L(F²) that cannot be written T = T_1 ⊕ T_2. Hence, there are linear trans-
formations having no block-diagonal representations (with more than one
block). Such transformations can be referred to as one-block transformations.
In contrast, a particularly important case arises when a transformation from
L(S) can be decomposed into a direct sum of transformations acting on
one-dimensional spaces. Such transformations are called simple. Note the
following consequence of Theorem 1 regarding simple transformations.
Theorem 2. Let 𝒜_T denote the equivalence class of matrix representations
of the transformation T ∈ L(S). Any matrix A ∈ 𝒜_T is similar to a diagonal
matrix D if and only if T is simple.
By analogy, a matrix A ∈ F^{n×n} is simple if it is a representation of a simple
transformation T acting on an n-dimensional space. Using this notion, the
previous result can be partially restated.
Theorem 2'. Any matrix A ∈ F^{n×n} is simple if and only if it is similar to a
diagonal matrix D.
The meaning of the diagonal elements in D, as well as the structure of the
matrix that transforms A into D, will be revealed later (Theorem 4.10.2),
after the effect of a linear transformation on a one-dimensional space is
studied.
Exercise 2. Check that any circulant is a simple matrix.
Exercise 3. Show that if the nonsingular matrix A is simple, so are A^{-1} and
adj A.
Exercise 4. Let A ∈ F^{n×n} and let there be a k-dimensional subspace of
F^n that is A-invariant (1 ≤ k < n). Show that the relation

    A = P [A_1  A_3] P^{-1}   (5)
          [0    A_2]

is valid for some A_1 ∈ F^{k×k} and a nonsingular P ∈ F^{n×n}.
Hint. Choose a basis b_1, b_2, ..., b_n for F^n by extending a basis b_1, ..., b_k
for the A-invariant subspace. Then define P to be the matrix whose columns
are b_1, b_2, ..., b_n.
Thus, the nonzero elements x satisfying Eq. (2) play a crucial role in the
problem of existence and construction of one-dimensional T-invariant
subspaces. We need a formal definition. If T ∈ L(S), a nonzero element
x_0 ∈ S satisfying Eq. (2) is said to be an eigenvector of T corresponding to
the eigenvalue λ of T. Thus, the existence of T-invariant subspaces of
dimension 1 is equivalent to the existence of eigenvectors of T.
If we now observe that Eq. (2) is equivalent to

    (T − λI)x_0 = 0,   x_0 ≠ 0,

we can characterize eigenvalues in another way.
Exercise 2. Show that λ ∈ F is an eigenvalue of T if and only if the trans-
formation T − λI fails to be invertible.  □
The next proposition uses the concept of an algebraically closed field,
which is introduced in Appendix 1. For our purposes, it is important to
note that C is algebraically closed and R is not.
Proposition 2. Let S be a finite-dimensional linear space over an algebraically
closed field. Then any linear transformation T ∈ L(S) has at least one eigen-
value.
PROOF. By virtue of Exercise 4.1.12, there is a monic polynomial p(λ) with
coefficients from F such that p(T) = O. Since F is an algebraically closed
field, there exists a decomposition of p(λ) into a product of linear factors:

    p(λ) = Π_{i=1}^{l} (λ − λ_i),

where λ_i ∈ F (i = 1, 2, ..., l). Then p(T) = Π_{i=1}^{l} (T − λ_i I) is the zero
transformation; hence at least one of the factors T − λ_i I (1 ≤ i ≤ l) must
be noninvertible (see Exercise 4.6.4). Now the required assertion follows from
Exercise 2.  ■
Obviously, if x is an eigenvector of T corresponding to λ, then it follows
from T(αx) = λ(αx) for any α ∈ F that any nonzero element from the one-
dimensional space spanned by x is an eigenvector of T corresponding to the
same eigenvalue λ. This also follows from the results of Proposition 1 and
Exercise 2. Moreover, every nonzero linear combination of (linearly depen-
dent or independent) eigenvectors corresponding to the same eigenvalue
λ is an eigenvector associated with λ. Indeed, if T(x_i) = λx_i (x_i ≠ 0,
i = 1, 2, ..., k), then the linearity of T implies

    T( Σ_{i=1}^{k} α_i x_i ) = Σ_{i=1}^{k} α_i T(x_i) = λ ( Σ_{i=1}^{k} α_i x_i ).

SOLUTION. Solving the system T(x) = λx, which becomes in this case

    2x_1 = λx_1,   x_1 = λx_2,   x_2 + x_3 = λx_3,

we obtain

    λ = 2 and x_1 = 2x_2 = 2x_3;
    λ = 0 and x_1 = 0, x_2 = −x_3;
    λ = 1 and x_1 = 0, x_2 = 0.

Thus, the transformation has three one-dimensional eigenspaces spanned
by the elements [2  1  1]^T, [0  1  −1]^T, [0  0  1]^T and corresponding
to the eigenvalues 2, 0, and 1, respectively.  □
It is important to note that, in contrast to Exercise 4, a transformation
acting on an n-dimensional space may have fewer than n distinct eigenvalues.
Exercise 5. Check that the transformation T ∈ L(F²) defined in Example
4.7.7 has only one eigenvalue, λ = 1, associated with the eigenspace spanned
by [1  0]^T.  □
Furthermore, it may happen that the transformation has two or more
linearly independent eigenvectors corresponding to the same eigenvalue,
as the next exercise shows.
Exercise 6. Check that the transformation T ∈ L(F³) defined by

    T [x_1]   [2x_1 + x_3]
      [x_2] = [   2x_2   ]
      [x_3]   [   3x_3   ]

has eigenvalues λ_1 = λ_2 = 2 and λ_3 = 3 associated with eigenspaces
span{[1  0  0]^T, [0  1  0]^T} and span{[1  0  1]^T}, respectively.  □

    Σ_{i=1}^{s} α_i x_i = 0.   (3)

To prove that α_i = 0 (i = 1, 2, ..., s), we first multiply both sides of Eq. (3)
on the left by T − λ_1 I, noting that (T − λ_1 I)x_1 = 0 and T(x_i) = λ_i x_i,
i = 1, 2, ..., s. Then
PROOF. If T ∈ L(S) and T has k distinct eigenvalues, k > n = dim S, then
the n-dimensional space S has more than n linearly independent elements
(eigenvectors of T, in this case). This is a contradiction.  ■
In this event we also say that the eigenvectors of T span the space S, or that
T has an eigenbasis in S. Note that Exercise 6 shows that a transformation
with fewer than n distinct eigenvalues can still have an eigenbasis. On the other
hand, not every transformation (see Exercise 5) has enough eigenvectors
to span the space. It turns out that the existence of an eigenbasis of T in S is
equivalent to saying that T is a simple transformation.
PROOF. In view of Proposition 1, it suffices to show that Eq. (2) holds for
at least one matrix A from 𝒜_T. Let A ∈ 𝒜_T and

    T(x) = λx,   x ≠ 0.   (3)

PROOF. Rewriting Eq. (4) in the form AP = PD and comparing the columns
of the matrices AP and PD, it is easily found that

    Ax_j = λ_j x_j,   j = 1, 2, ..., n,   (5)

where x_j (1 ≤ j ≤ n) is the jth column of P and λ_j (1 ≤ j ≤ n) is the jth
diagonal element of D. Since P is nonsingular, none of its columns can be
the zero element in F^n and hence x_j in (5) is an eigenvector of A correspond-
ing to λ_j.  ■
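Numerically, the content of this proof and of Eq. (4) can be checked with any simple matrix: numpy.linalg.eig returns the eigenvalues (the diagonal of D) and the eigenvectors as the columns of P, and one verifies AP = PD, equivalently A = PDP^{-1}. A brief sketch (the matrix is an illustrative example, taken from Exercise 4.11.11(i)):

import numpy as np

A = np.array([[1., -1., 0.],
              [2.,  3., 2.],
              [1.,  1., 2.]])          # a simple matrix (eigenvalues 1, 2, 3)

evals, P = np.linalg.eig(A)           # columns of P are eigenvectors
D = np.diag(evals)

print(np.allclose(A @ P, P @ D))                      # True: A x_j = lambda_j x_j
print(np.allclose(A, P @ D @ np.linalg.inv(P)))       # True: A = P D P^{-1}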
Theorem 2 can be reformulated in an illuminating way. Define the matrix
Q = (P^{-1})^T and write Q = [y_1  y_2  ···  y_n]. Then Q^T P = P^{-1}P = I and,
since Eq. (4) gives A^T = QDQ^{-1} (that is, A^T Q = QD), the columns of Q are
eigenvectors for A^T. They are also known as left eigenvectors of A since they
satisfy y_j^T A = λ_j y_j^T.
Now, to recast Theorem 2, observe that Eq. (4) implies, using Eq. (1.3.6),

    A = PDQ^T = [x_1  x_2  ···  x_n] D [y_1  y_2  ···  y_n]^T = Σ_{j=1}^{n} λ_j x_j y_j^T.

Thus, A is expressed as a sum of matrices of rank one (or zero if λ_j = 0),
and each of these is defined in terms of the spectral properties of A, that is,
in terms of its eigenvalues, eigenvectors, and left eigenvectors. Consequently,
the next result is called a spectral theorem.
Theorem 3. Let A ∈ F^{n×n} be a simple matrix with eigenvalues λ_1, λ_2, ..., λ_n
and associated eigenvectors x_1, x_2, ..., x_n. Then there are left eigenvectors
y_1, y_2, ..., y_n for which y_j^T x_k = δ_jk, 1 ≤ j, k ≤ n, and

    A = Σ_{j=1}^{n} λ_j x_j y_j^T.   (7)
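The rank-one expansion (7) can also be verified numerically: with P from the eigendecomposition, the rows of Q^T = P^{-1} supply left eigenvectors y_j normalized so that y_j^T x_k = δ_jk. A short sketch (NumPy; the matrix is an illustrative example, taken from Exercise 4.11.11(iii)):

import numpy as np

A = np.array([[ 5.,  8.,  16.],
              [ 4.,  1.,   8.],
              [-4., -4., -11.]])       # simple, eigenvalues 1, -3, -3

evals, P = np.linalg.eig(A)           # right eigenvectors x_j as columns of P
Qt = np.linalg.inv(P)                 # rows of Qt are the left eigenvectors y_j^T

print(np.allclose(Qt @ P, np.eye(3)))                 # True: biorthogonality
A_rebuilt = sum(evals[j] * np.outer(P[:, j], Qt[j, :]) for j in range(3))
print(np.allclose(A, A_rebuilt))                      # True: Eq. (7)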
Exercise 1. Consider a matrix A ∈ F^{n×n} with eigenvectors x_1, x_2, ..., x_n
corresponding to the eigenvalues λ_1, λ_2, ..., λ_n, respectively. Show that any
subspace of F^n generated by a set of eigenvectors

    x_{j_1}, x_{j_2}, ..., x_{j_k}   (1 ≤ j_1 < j_2 < ··· < j_k ≤ n)

of A is A-invariant.
Exercise 2. Show that the spectrum of the square matrix

    A = [A_1  A_3]
        [0    A_2],

where A_1 and A_2 are square, is the union of the spectra of A_1 and A_2.
We now bring the ideas of the theory of determinants to bear on the prob-
lem of finding the eigenvalues of an n × n matrix A ∈ F^{n×n}. As in the
situation for transformations (see Exercise 4.9.3), it is easily shown that
λ ∈ σ(A) if and only if the matrix λI − A is not invertible, or equivalently
(Theorem 2.6.2),

    det(λI − A) = 0.   (1)

Observe now that the expression on the left side of Eq. (1) is a monic poly-
nomial of degree n in λ:

    det(λI − A) = c_0 + c_1 λ + ··· + c_{n−1} λ^{n−1} + c_n λ^n,   c_n = 1.   (2)

The polynomial c(λ) ≐ Σ_{i=0}^{n} c_i λ^i in Eq. (2) is referred to as the character-
istic polynomial of the matrix A. These ideas are combined to give Theorem 1.
Theorem 1. A scalar λ ∈ F is an eigenvalue of A ∈ F^{n×n} if and only if λ is a
zero of the characteristic polynomial of A.
The significance of this result is that the evaluation of the eigenvalues of a
matrix is reduced to finding the zeros of a scalar polynomial and is separated
from the problem of finding (simultaneously) its eigenvectors.
Note that if the field F is algebraically closed (for definition and discussion
see Appendix 1), then all the zeros of the scalar polynomial c(λ) belong to
F. Hence the spectrum of any A ∈ F^{n×n} is nonempty and consists of at
most n points, properties that are already familiar. They are established
in Theorem 4.10.5 and, as indicated in Theorem 4.10.1, are also properties
of the underlying linear transformation.
Observe also that, if A = PBP^{-1}, then

    c_A(λ) = det(λI − A) = det(P(λI − B)P^{-1}) = det(λI − B) = c_B(λ).

Thus the characteristic polynomial is invariant under similarity, and hence
all matrices in the equivalence class 𝒜_T have the same characteristic poly-
nomial. In other words, this polynomial is a characteristic of the transforma-
tion T itself and therefore can be referred to as the characteristic polynomial
of the transformation.
Proceeding to an investigation of the coefficients of the characteristic
polynomial of a matrix A or, equivalently, of a transformation T having A
as a representation, we first formulate a lemma.
Lemma 1. If A ∈ F^{n×n} and distinct columns i_1, i_2, ..., i_r (1 ≤ r < n) of A
are the unit vectors e_{i_1}, e_{i_2}, ..., e_{i_r}, then det A is equal to the principal minor
of A obtained by striking out rows and columns i_1, i_2, ..., i_r.
PROOF. Expanding det A by column i_1, we obtain

    det A = (−1)^{i_1 + i_1} A( 1  2  ···  i_1−1  i_1+1  ···  n ; 1  2  ···  i_1−1  i_1+1  ···  n ),

the minor obtained by deleting row and column i_1. Now evaluate this minor
by expansion on the column numbered i_2. It is found that det A is the
principal minor of A obtained by striking out rows and columns i_1 and i_2.
Continuing in this way with columns i_3, i_4, ..., i_r, the result is obtained.  ■
Theorem 2. Let A ∈ F^{n×n} and let c(λ) = Σ_{i=0}^{n} c_i λ^i denote the characteristic
polynomial of A. Then the scalar c_r (0 ≤ r < n) is equal to the sum of all prin-
cipal minors of order n − r of A, multiplied by (−1)^{n−r}.
PROOF. Let a_1, a_2, ..., a_n be the columns of A. Then, using the unit vectors
in F^n, we write

    (−1)^n c(λ) = det(A − λI) = det[a_1 − λe_1   a_2 − λe_2   ···   a_n − λe_n].

Expanding by linearity in each column, this determinant is a sum of deter-
minants, a typical one involving just r columns of the form −λe_i for some i.
There are precisely C_{n,r} of these, obtained by replacing r of the columns of
A by (−λ) × (a unit vector) in every possible way. Thus, from the lemma,
each of these determinants is of the form (−λ)^r times a principal minor of A
of order n − r. Furthermore, every principal minor of order n − r appears in
this summation. The result follows on putting r = 0, 1, 2, ..., n − 1.  ■
In particular, by setting r = n − 1 and r = 0 in Theorem 2, we obtain

    −c_{n−1} = Σ_{i=1}^{n} a_ii = tr A   and   c_0 = (−1)^n det A.

If λ_1, λ_2, ..., λ_n are eigenvalues of A (not necessarily distinct), then Theorem 2
gives the important special cases:

    tr A = Σ_{r=1}^{n} λ_r   and   det A = Π_{r=1}^{n} λ_r.   (3)
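Relations (3) are easy to confirm numerically for any matrix over C. A minimal sketch (NumPy; the illustrative matrix is the one of Exercise 10 below):

import numpy as np

A = np.array([[13.,  4., -2.],
              [ 4., 13., -2.],
              [-2., -2., 10.]])

evals = np.linalg.eigvals(A)
print(np.isclose(np.trace(A), evals.sum()))        # True: tr A = sum of eigenvalues
print(np.isclose(np.linalg.det(A), evals.prod()))  # True: det A = product of eigenvalues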
We have seen that any n × n matrix has a monic polynomial for its charac-
teristic polynomial. It turns out that, conversely, if c(λ) is a monic polynomial
over F of degree n, then there exists a matrix A ∈ F^{n×n} such that its charac-
teristic polynomial is c(λ).
Exercise 3. Check that the n × n companion matrix

    [  0     1     0    ···    0   ]
    [  0     0     1    ···    0   ]
    [  ⋮                 ⋱     ⋮   ]
    [  0     0     0    ···    1   ]
    [−c_0  −c_1  −c_2   ···  −c_{n−1}]

has c(λ) = λ^n + Σ_{i=0}^{n−1} c_i λ^i for its characteristic polynomial. (See Exercise
2.2.9.)  □
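A quick numerical check of Exercise 3: build the companion matrix of a given monic c(λ) and confirm that its eigenvalues are exactly the roots of c. A sketch (NumPy; the polynomial is an arbitrary illustration):

import numpy as np

# c(lambda) = lambda^3 + c2 lambda^2 + c1 lambda + c0, with roots -1, -2, -3.
c0, c1, c2 = 6., 11., 6.

C = np.array([[  0.,   1.,   0.],
              [  0.,   0.,   1.],
              [-c0,  -c1,  -c2]])       # companion matrix of c

eigs = np.sort(np.linalg.eigvals(C))
roots = np.sort(np.roots([1., c2, c1, c0]))
print(np.allclose(eigs, roots))          # True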
The last result implies, in particular, that for any set σ_0 of k points in C
(1 ≤ k ≤ n), there is an n × n matrix A such that σ(A) = σ_0.
It was mentioned earlier that similar matrices have the same characteristic
polynomial. The converse is not true.
Exercise 4. Check that the matrices

    [1  1]        [1  0]
    [0  1]  and   [0  1]

are not similar and that they have the same characteristic polynomial,
c(λ) = (λ − 1)².
Exercise 5. Prove that two simple matrices are similar if and only if their
characteristic polynomials coincide.
Exercise 6. Show that
(a) The characteristic polynomial of the matrix

    [A_1  A_3]
    [0    A_2]
Exercise 10. Show that the eigenvalues of the real symmetric matrix

    [13   4  −2]
    [ 4  13  −2]
    [−2  −2  10]

are 9, 9, and 18.
Exercise 11. Show that the following matrices are simple and find corre-
sponding matrices D and P as described in Theorem 4.10.2:

    (i) [1  −1  0]    (ii) [0  0   1]    (iii) [ 5   8   16]
        [2   3  2]         [1  0  −1]          [ 4   1    8]
        [1   1  2],        [0  1   1],         [−4  −4  −11].

Hint. Show first that the characteristic polynomials are

    (λ − 1)(λ − 2)(λ − 3),   (λ − 1)(λ² + 1),   and   (λ − 1)(λ + 3)²,

respectively.  □
A=
A
0 p.l
L
The characteristic polynomial c(A) of is c(A) = A 2 and therefore the
algebraic multiplicity of the zero eigenvalue A. = 0 is 2. On the other hand,
dim(Ker(A — AA) = dim(Ker A) = 1 and the geometric multiplicity of
Ao =0is1. ❑
Theorem 1. The geometric multiplicity of an eigenvalue does not exceed its
algebraic multiplicity.
PROOF. Let λ be an eigenvalue of the n × n matrix A and let
$m_\lambda = \dim(\operatorname{Ker}(A - \lambda I)).$
If r denotes the rank of A − λI then, by Eq. (4.5.2), we have $r = n - m_\lambda$ and all the minors of A − λI of order greater than $n - m_\lambda$ are zero. Hence, if $\hat c(\mu) = \sum_{i=0}^{n} \hat c_i\mu^i$ is the characteristic polynomial of A − λI, Theorem 4.11.2 implies that $\hat c_0 = \hat c_1 = \cdots = \hat c_{m_\lambda - 1} = 0$. Therefore
$\hat c(\mu) = \sum_{i=m_\lambda}^{n} \hat c_i\mu^i$
and $\hat c$ must have a zero at µ = 0 of multiplicity greater than or equal to $m_\lambda$. It remains to observe that, if c(µ) is the characteristic polynomial of A, then $c(\mu) = \hat c(\mu - \lambda)$. ∎
Denote by $\mu_a$ (respectively, $\mu_g$) the sum of the algebraic (respectively, geometric) multiplicities of the eigenvalues of an n × n complex matrix A. Then Theorem 1 implies
$\mu_g \le \mu_a = n. \qquad (2)$
In the case $\mu_g = n$, the relations (2) imply that the algebraic and geometric multiplicities of each eigenvalue of A must coincide. On the other hand, using the results of Section 4.10, the equality $\mu_g = n$ also means that the space $\mathbb{C}^n$ can be spanned by nonzero vectors from the eigenspaces of the matrix, that is, by its eigenvectors. Hence if $\mu_g = n$, then the matrix A is simple.
It is easily seen that the converse also holds, and a criterion for a matrix to be simple is obtained.
Theorem 2. Any matrix A with entries from an algebraically closed field
is simple if and only if the algebraic and geometric multiplicities of each eigeo!
value of A coincide.
For a simple matrix A we can say rather more than the conclusion of Exercise 1, for in this case we have a basis for $\mathbb{C}^n$ consisting of eigenvectors of A. In fact, if $x_1, \ldots, x_n$ is such a basis with associated eigenvalues $\lambda_1, \ldots, \lambda_n$, respectively, it is easily verified that the n vector-valued functions $x_j e^{\lambda_j t}$, j = 1, 2, ..., n, are solutions, and they generate an n-dimensional subspace of solutions, $\mathscr{S}_0$. We are to show that $\mathscr{S}_0$ is, in fact, the whole solution set.
More generally, let us first consider the inhomogeneous system
z(t) = Ax(t) + f(t) (2)
(where ft) is a given integrable function). It will be shown that Eq. (2) can
be reduced to a set of n independent scalar equations. To see this, let eigen-
vectors, x 1, ... , x" and left eigenvectors y l, ... , y" be defined for A as re-
quired by Theorem 4.10.3. If P = [xi x„] and Q= [y i y,],
then the biorthogonality properties of the eigenvectors (see Eq. (4.10.6)) are
summarized in the relation
QT P =1, (3)
and the representation for A itself (see Eq. (4.10.7)) is
A = PDQT, (4)
where D = diag[x a, ... , 2,"].
The reduction of the system (2) to a set of scalar equations is now accom-
plished if we substitute Eq. (4) in Eq. (2), multiply on the left by QT, and
take advantage of Eq. (3) to obtain
$Q^T\dot x(t) = DQ^T x(t) + Q^T f(t).$
Writing $z(t) = Q^T x(t)$, this is a set of n independent scalar equations, $\dot z_j(t) = \lambda_j z_j(t) + (Q^T f(t))_j$. In the homogeneous case $f(t) \equiv 0$, that is, for Eq. (1), each has the general solution $z_j(t) = c_j e^{\lambda_j t}$ for some constant $c_j$. Thus, in this case, $z(t) = \operatorname{diag}[e^{\lambda_1 t}, \ldots, e^{\lambda_n t}]c$ and so every solution of Eq. (1) has the form
$x(t) = P\operatorname{diag}[e^{\lambda_1 t}, \ldots, e^{\lambda_n t}]c \qquad (6)$
for some $c \in \mathbb{C}^n$, $c = [c_1\; c_2\; \cdots\; c_n]^T$.
Observe that Eq. (6) just means that any solution of Eq. (1) is a linear
combination of the primitive solutions x j exo, j = 1, 2, ... , n. Since these
primitive solutions are linearly independent (see Exercise 1), it follows that
if .9' is the solution set of Eq. (1) and A is a simple n x n matrix, then
dim .5o=n.
In Chapter 9 we will return to the discussion of Eq. (2) and remove our
hypothesis that A be simple; also we will consider the case when f(t) # 0
more closely.
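Formula (6) can be turned directly into a computation. The sketch below (numpy/scipy, not part of the text; the matrix and initial condition are arbitrary illustrative choices) builds a solution of ẋ = Ax from the eigenvectors of a simple matrix and compares it with the matrix exponential.

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])           # a simple matrix, chosen only for illustration
x0 = np.array([1.0, 1.0])            # initial condition x(0)

lam, P = np.linalg.eig(A)            # columns of P are right eigenvectors
c = np.linalg.solve(P, x0)           # x(0) = P c determines c

def x(t):
    # Eq. (6): x(t) = P diag(e^{lambda_j t}) c
    return P @ (np.exp(lam * t) * c)

print(np.allclose(x(0.7), expm(0.7 * A) @ x0))   # True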
Exercise 2. Describe the solutions of the following system of differential equations with constant coefficients ($x_j = x_j(t)$, j = 1, 2, 3):
$\dot x_1 = 2x_1,$
$\dot x_2 = 11x_1 + 4x_2 + 5x_3, \qquad (7)$
$\dot x_3 = 7x_1 - x_2 - 2x_3.$
T(x) = xi,
where x = x 1 + x2 and x 1 e x2 E .°2 Prove that.
(a) T e 2(.9');
(b) .5°1 and Y2 are T-invariant;
(c) T satisfies the condition T2 = T(i.e., T is idempotent).
The transformation T is called a projector of Y on .91 parallel to Y2 .
Observe that T le, is the identity transformation on Y 1.
2. Consider a simple linear transformation T e 9(Y) where dim .50 = n,
At, A2, A. are its eigenvalues, and x1, x2 , ... , x„ are corresponding
. . . ,
associated with
4. The matrix RA = (A I — A) -1 called the resolvent of A, is defined at
,
E 1 GI, A o(A).
RA =
I- IA - AJ
Hint. First use Eqs. (4.13.3) and (4.13.4) to write RA = P(AI - D) -1 QT•
5. Prove that if dim .9' = n, any transformation T e 2(.9') has an invariant
subspace of dimension n — 1.
Hint. Let A e o(T) and use Eq. (4.5.2) for T — AI to confirm the exis-
tence of the k-dimensional subspace Im(T — AI) invariant under 7
theorem).
8. Prove that if A is a simple matrix with characteristic polynomial c(λ), then c(A) = O. (This is a special case of the Cayley-Hamilton theorem, to be proved in full generality in Sections 6.2 and 7.2.)
Hint. Use Theorem 4.10.2.
9. Show that the inverse of a nonsingular simple matrix $A \in \mathbb{C}^{n\times n}$ is given by the formula
$A^{-1} = -\frac{1}{c_0}\bigl(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I\bigr),$
where $c(\lambda) = \lambda^n + \sum_{i=0}^{n-1} c_i\lambda^i$ is the characteristic polynomial of A.
Hint. Observe that $A^{-1}c(A) = O$.
10. Check the following statements.
(a) If A or B is nonsingular, then AB and BA are similar. The example
$A = \begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}$
shows that this may fail when both matrices are singular.
11. Prove that if AB = BA, then A and B have at least one common eigen-
vector. Furthermore, if A and B are simple, then there exists a basis in
the space consisting of their common eigenvectors.
Hint. Consider span{x, Bx..... Bsx, ...}, where Ax = Ax, x # 0, and
use Exercise 4.9.8 to establish the first statement.
12. Let A e ,F"". Show that, in comparison with Exercise 4.5.12,
(a) If the zero eigenvalue of A has the same algebraic and geometric
multiplicities, then Ker A = Ker A2 ;
D"
_ sin(n + 1)6
sin n6
(b) Find the eigenvalues of A".
Note that D"(c) is the Chebyshev polynomial of the second kind, U"(c).
The matrix A" with c = — 1 arises in Rayleigh's finite-dimensional ap-
proximation to a vibrating string.
Hint. For part (b), observe that A" — AI preserves the form of A. for
small A to obtain the eigenvalues of A„:
k = 1, 2, ... , n.
Ak = 2(c — cos n lot ),
`
16. Find the eigenvalues and eigenvectors of the n x n circulant matrix
with the first row [a 0 a l a„_1).
CHAPTER 5
Linear Transformations in Unitary Spaces and Simple Matrices
5.1 Adjoint Transformations
in Exercise 3.12.8, it is sufficient to know these inner products for all vectors y in a basis for $\mathscr{V}$. It follows that the action of T is determined by the values of all the numbers $(T(x), y)_2$, where x, y are any vectors from $\mathscr{U}$ and $\mathscr{V}$, respectively. We ask whether this same set of numbers is also attained by first forming the image of y under some $T_1 \in \mathscr{L}(\mathscr{V}, \mathscr{U})$, and then taking the inner products $(x, T_1(y))_1$.
= αT*(y) + βT*(z),
by definition (2) of T*. Thus $T^* \in \mathscr{L}(\mathscr{V}, \mathscr{U})$. Our first theorem will show that T*, so defined, does indeed have the property (1) that we seek. It will also show that our definition of T* is, in fact, independent of the choice of orthonormal basis for $\mathscr{U}$.
First we should note the important special case of this discussion in which
= 7r and the inner products also coincide. In this case, the definition (2)
reads
(x, T* (Y))i = E
E (x, uI)IUi, J=1 (y, T(ui))auj
l,- I /
Since (y, T(u i))2 = (T(u 1), Y) 2, Eq. (4) follows on comparing Eq. (6) with
Eq. (5).
For the uniqueness, let (T(x), Y)2 = (x, T1(y)) 1 for some Ti e 20;'W)
and all x e*, y e i% Then, on comparison with Eq. (4),
(x, TA(Y))I = (x, T*(Y))1
for all x e'W and y e i! But this clearly means that (x, (T 1 — T*)(y)) 1 = 0
for all x E ' and y e lY ; and this implies (by Exercise 3.13.2) that T 1 — T* =
0, that is, T1 = T*.
The proof of the converse statement uses similar ideas and is left as an
exercise. •
It is interesting to compare Eq. (4) with the result of Exercise 3.14.8. The
similarity between these statements will be explained in our next theorem.
Exercise 1. Let $T \in \mathscr{L}(\mathscr{F}^3, \mathscr{F}^4)$ be defined as in Exercise 4.2.1. Check that the transformation
$T_1(y) = \begin{bmatrix} y_2 + y_3 + y_4\\ y_1 + y_2 - y_4\\ y_1\end{bmatrix}$
for all $y = [y_1\; y_2\; y_3\; y_4]^T \in \mathscr{F}^4$, is the adjoint of T with respect to the standard inner products in $\mathscr{F}^3$ and $\mathscr{F}^4$. Confirm Eq. (4) for the given transformations. ❑
Some important properties of the adjoint are indicated next.
Exercise 2. Prove the following statements:
(a) I* = I, 0* = 0;
(b) (T*)* = T;
(c) (aT)* = diT* for any a E .r;
Theorem 2. Let $T \in \mathscr{L}(\mathscr{U}, \mathscr{V})$ and let T* denote the adjoint of T. If $A \in \mathscr{F}^{m\times n}$ is the representation of T with respect to the orthonormal bases $(\mathscr{E}, \mathscr{G})$, then the conjugate transpose matrix $A^* \in \mathscr{F}^{n\times m}$ is the representation of T* with respect to the pair of bases $(\mathscr{G}, \mathscr{E})$.
PROOF. Let $\mathscr{E} = \{u_1, u_2, \ldots, u_n\}$ and $\mathscr{G} = \{e_1, e_2, \ldots, e_m\}$ denote the orthonormal bases of $\mathscr{U}$ and $\mathscr{V}$, respectively. Then
$T^*(e_i) = \sum_{j=1}^{n} b_{ji}u_j, \qquad i = 1, 2, \ldots, m,$
implies
$b_{ji} = (e_i, T(u_j))_2 = \overline{(T(u_j), e_i)_2} = \bar a_{ij}, \qquad 1 \le i \le m,\; 1 \le j \le n,$
and the assertion follows. ∎
Exercise 4. Check the asse rt ion of Theorem 2 for the transformations T
and T1 = T* considered in Exercise 1.
Exercise S. Deduce from Theorem 2 properties of the conjugate transpose
similar to those of the adjoint presented in Exercises 2 and 3.
Exercise 6. Check that for any Te 2('l,
rank T = rank T*,
and that, if 'l = V ; then in addition
def T = def T*.
Exercise 7. Let T e 2('W, r) and let T* denote the adjoint of T. Show that
Wl = Ker T®ImT*,fir = Ker T*® lin T.
(See the remark at the end of Section 5.8.)
Hint. Note that (x, T(y))2 = 0 for all y e 'fl yields x e Ker T.
Exercise 8. Let T e AV). Check that if a subspace 'Plo c 'W is T-invariant,
then the orthogonal complement to flo in 'Pl is T*-invariant. ❑
The next example indicates that the relation between the representations
of T and T* with respect to a pair of nonorthonormal bases is more com-
plicated than the one pointed out in Theorem 2.
In particular, the relation (8) shows that A' = A* provided the basis d
is orthonormal. However, it turns out that the assertion of Theorem 2 re-
mains true if 'Yl = 7r and the two sets of basis vectors are biorthogonal
in 'fl (see Section 3.13).
Theorem 3. Let T e '('W) and let (d, 8') denote a pair of biorthogonal bases
in*. If A is the representation of T with respect to d, then A* is the representa-
tion of T* with respect to d'.
PROOF. Let $\mathscr{E} = \{u_1, u_2, \ldots, u_n\}$, $\mathscr{E}' = \{u'_1, u'_2, \ldots, u'_n\}$, and $A = [a_{ij}]_{i,j=1}^{n}$. Since $T(u_j) = \sum_{i=1}^{n} a_{ij}u_i$ $(1 \le j \le n)$ and $(u_k, u'_i) = \delta_{ki}$ $(1 \le i, k \le n)$, it follows that $(T(u_j), u'_i) = a_{ij}$. Computing the entries $b_{ij}$ of the representation B of T* with respect to $\mathscr{E}'$ in a similar way, we obtain $b_{ij} = (T^*(u'_j), u_i)$ $(1 \le i, j \le n)$. Hence for all possible i and j,
$b_{ij} = (T^*(u'_j), u_i) = \overline{(u_i, T^*(u'_j))} = \overline{(T(u_i), u'_j)} = \bar a_{ji},$
and therefore B = A*. ∎
Exercise 10. Check that if T e .2'('W) then 2 e or(T) if and only if .[ a o(T*).
Hint. See Exercise 4.11.9.
Exercise 11. Prove that the transformation T e AV) is simple if and only
if T* is simple.
Hint. Use Theorem 3.
Exercise 12. If T e 2'(W) is simple, show that there exist eigenbases of
T and T* in 'Y6 that are biorthogonal.
Hint. Consider span{x 1, x2r ... , x;_ I, xj+I, ... , x„}, 1 < j S n, where
{x;}7. 1 is an eigenbasis of T in', and use Exercise 8.
Exercise 13. If T e ^('YW) is simple and {x 1, x2 , ... , x„} is its eigenbasis in
V, prove that a system {yi, y2 , ...,y„) in 'Yl that is biorthogonal to {xI };= I
is an eigenbasis of T* in 'Yl. Moreover, show that T(x ;) = 1; x; yields T*(y;) _
X1Y1,1< _ i5_n.
Hint. Use Eq. (4).
Exercise 14. Check that the eigenspaces of T and T* associated with non-
complex-conjugate eigenvalues are orthogonal. ❑
tions x in ill!! are to be found. This is a general result giving a condition for
existence of solutions for any choice of b e V in terms of the adjoint operator
T*. It is known as the Fredholm alternative. (Compare with Exercise 3.14.10.)
Theorem 4. Let T E 2(4l 1). Either the equation T(x) = b is solvable for
,
PROOF. Let $A \in \mathscr{F}^{n\times n}$ and let $\mathscr{U}$ be the unitary space $\mathscr{F}^n$ equipped with the standard inner product. Then, in the natural way, A determines a transformation $T \in \mathscr{L}(\mathscr{U})$. By Exercise 4.14.6 there exists a chain of T-invariant subspaces
$\{0\} = \mathscr{U}_0 \subset \mathscr{U}_1 \subset \cdots \subset \mathscr{U}_n = \mathscr{U}$
such that $\mathscr{U}_i$ has dimension i for each i. It follows that, using the Gram-Schmidt procedure, an orthonormal basis $\{u_1, \ldots, u_n\}$ can be constructed for $\mathscr{U}$ with the properties that $u_i \in \mathscr{U}_i$ and $u_i \perp \mathscr{U}_{i-1}$ for i = 1, 2, ..., n. Then
the T-invariance of the chain of subspaces implies that the representation of
T with respect to this orthonormal basis must be an upper-triangular matrix,
B say (see Section 4.8).
But the matrix A is simply the representation of T with respect to the
standard basis for .9"" (which is also orthonormal), and so A and B must be
unitarily similar. •
The Schur-Toeplitz theorem plays a crucial role in proving the next
beautiful characterization of normal matrices; it is, in fact, often used as the
definition of a normal matrix.
Theorem 3. An n x n matrix A is normal if and only if
AA* = A*A. (2)
PROOF. If A is normal, then by Eq. (1) there is a unitary matrix U and a diagonal matrix D for which A = UDU*. Then
$AA^* = (UDU^*)(U\bar DU^*) = UD\bar DU^* = U\bar DDU^* = (U\bar DU^*)(UDU^*) = A^*A,$
so that A satisfies Eq. (2).
Conversely, we suppose AA* = A*A and apply Theorem 2. Thus, there is a unitary matrix U and an upper triangular matrix $S = [s_{ij}]_{i,j=1}^{n}$ for which A = USU*. It is easily seen that AA* = A*A if and only if SS* = S*S. The 1,1 element of the last equation now reads
$|s_{11}|^2 + |s_{12}|^2 + \cdots + |s_{1n}|^2 = |s_{11}|^2.$
† Math. Annalen 66 (1909), 488–510.
‡ Math. Zeitschrift 2 (1918), 187–197.
Hence $s_{1j} = 0$ for j = 2, 3, ..., n. The 2,2 element of the equation SS* = S*S yields
$|s_{22}|^2 + |s_{23}|^2 + \cdots + |s_{2n}|^2 = |s_{12}|^2 + |s_{22}|^2,$
and since we have proved $s_{12} = 0$ it follows that $s_{2j} = 0$ for j = 3, 4, ..., n. Continuing in this way we find that $s_{jk} = 0$ whenever $j \ne k$ and $1 \le j, k \le n$. Thus, S must be a diagonal matrix and A = USU*. Now use Theorem 1. ∎
Note that by virtue of Exercise 1 and the relation (4.2.4), a similar result
holds for transformations.
Theorem 3'. A transformation $T \in \mathscr{L}(\mathscr{U})$ is normal if and only if
TT* = T*T. (3)
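The criterion of Theorem 3 is simple to test numerically. The following sketch (numpy; not part of the text, and the skew-symmetric matrix is just an illustrative choice of a normal matrix) checks Eq. (2) and exhibits the unitary diagonalization A = UDU*.

import numpy as np

A = np.array([[0.0, -2.0],
              [2.0,  0.0]])          # real skew-symmetric, hence normal

print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # Eq. (2): AA* = A*A

d, U = np.linalg.eig(A)              # for a normal matrix with distinct eigenvalues
print(np.allclose(U @ np.diag(d) @ U.conj().T, A))   # A = U D U*
print(np.allclose(U.conj().T @ U, np.eye(2)))        # U is unitary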
Proceeding to special kinds of normal transformations, we first define
a transformation Te 2('&) to be se f adjoint if T = T*, and to be unitary
if TT* =T*T=I.
Exercise 6. Check that a transformation T e 2('W) is self-adjoint (re-
spectively, unitary) if and only if its representation with respect to any
orthonormal basis is a Hermitian (respectively, unitary) matrix. ❑
Exercise S. Show that for any (even rectangular) matrix A, the matrices
A*A and AA* are positive semi-definite.
Hint. Let A be m x n and consider (A*Ax, x) for x e .F".
izing the above discussion, note that $\lambda \in \sigma(H^{1/2})$ if and only if $\lambda^2 \in \sigma(H)$, and the corresponding eigenspaces of $H^{1/2}$ and H coincide. The concept of a square root of a positive semi-definite matrix allows us to introduce a spectral characteristic for rectangular matrices.
Consider an arbitrary m × n matrix A. The n × n matrix A*A is (generally) positive semi-definite (see Exercise 5.3.5). Therefore by Theorem 1 the matrix A*A has a positive semi-definite square root $H_1$ such that $A^*A = H_1^2$. The eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the matrix $H_1 = (A^*A)^{1/2}$ are referred to as the singular values $s_1, s_2, \ldots, s_n$ of the (generally rectangular) matrix A. Thus, for i = 1, 2, ..., n,
$s_i(A) = \lambda_i\bigl((A^*A)^{1/2}\bigr).$
Obviously, the singular values of a matrix are nonnegative numbers.
Exercise 2. Check that $s_1 = \sqrt 2$, $s_2 = 1$ are the singular values of the matrix
$A = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 0\end{bmatrix}. \qquad ❑$
Note that the singular values of A are sometimes defined as eigenvalues a1
the matrix (AA*)'/ 2 of order m. It follows from the next fact that the differ
ence in definition is not highly significant.
Theorem 2. The nonzero eigenvalues of the matrices (A*A) 112 and (AA*) ii1
coincide.
PROOF. First we observe that it suffices to prove the assertion of the theorem for the matrices A*A and AA*. Furthermore, we select eigenvectors $x_1, x_2, \ldots, x_n$ of A*A corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ such that $\{x_1, x_2, \ldots, x_n\}$ forms an orthonormal basis in $\mathscr{F}^n$. We have
$(A^*Ax_i, x_j) = \lambda_i(x_i, x_j) = \lambda_i\delta_{ij}, \qquad 1 \le i, j \le n.$
On the other hand, $(A^*Ax_i, x_i) = (Ax_i, Ax_i)$, and comparison shows that $(Ax_i, Ax_i) = \lambda_i$, i = 1, 2, ..., n. Thus $Ax_i = 0$ $(1 \le i \le n)$ if and only if $\lambda_i = 0$. Since
$AA^*(Ax_i) = A(A^*Ax_i) = \lambda_i(Ax_i), \qquad 1 \le i \le n,$
the preceding remark shows that for $\lambda_i \ne 0$, the vector $Ax_i$ is an eigenvector of AA*. Hence if a nonzero $\lambda_i \in \sigma(A^*A)$ is associated with the eigenvector
Ordering the eigenvalues so that the first s scalars $\lambda_1, \lambda_2, \ldots, \lambda_s$ on the main diagonal of D are positive and the next r − s numbers $\lambda_{s+1}, \lambda_{s+2}, \ldots, \lambda_r$ are negative, we may write $D = U_1D_1D_0D_1U_1^*$, where
$D_1 = \operatorname{diag}\bigl[\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_s}, \sqrt{|\lambda_{s+1}|}, \ldots, \sqrt{|\lambda_r|}, 0, \ldots, 0\bigr],$
$D_0$ is given by Eq. (2), and $U_1$ is a permutation (and therefore a unitary) matrix. Hence, substituting into (3), we obtain $H = PD_0P^*$, where $P = UU_1D_1$. The theorem is established. ∎
Corollary 1. A Hermitian matrix H e .9""" is positive definite if and only
if it is congruent to the identity matrix.
In other words, H > 0 if and only if H = PP* for some nonsingular
matrix P.
Corollary 2. A Hermitian matrix $H \in \mathscr{F}^{n\times n}$ of rank r is positive semi-definite if and only if it is congruent to a matrix of the form
$\begin{bmatrix} I_r & 0\\ 0 & 0_{n-r}\end{bmatrix}.$
Thus $H \ge 0$ if and only if H = P*P for some P.
Corollary 3. Two Hermitian matrices A, B e .54-118 " are congruent rand only
if rank A = rank B and the number of positive eigenvalues (counting multi-
plicities) of both matrices is the same.
We already know that a Hermitian matrix H can be reduced by a con-
gruence transformation to a diagonal matrix. If several Hermitian matrices
are given, then each of them can be reduced to diagonal form by means of
(generally distinct) congruence transformations.
The import ant problem of the simultaneous reduction of two Hermitian
Matrices to diagonal form using the same congruence transformation will
now be discussed. This turns out to be possible provided one of the Hermitian
matrices is positive (or negative) definite. If this condition is relaxed, the
question of canonical reduction becomes significantly more complicated.
Theorem 2. If $H_1$ and $H_2$ are two n × n Hermitian matrices and at least one of them, say $H_1$, is positive definite, then there exists a nonsingular matrix Q such that
$Q^*H_1Q = I \quad\text{and}\quad Q^*H_2Q = D, \qquad (4)$
where $D = \operatorname{diag}[\lambda_1, \ldots, \lambda_n]$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $H_1^{-1}H_2$.
Since $H = H_1^{1/2}(H_1^{-1}H_2)H_1^{-1/2}$, the matrices H and $H_1^{-1}H_2$ are similar. It follows that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are also the eigenvalues of $H_1^{-1}H_2$ and are real.
Now define $Q = H_1^{-1/2}U$; we have
$Q^*H_1Q = U^*H_1^{-1/2}H_1H_1^{-1/2}U = U^*U = I,$
$Q^*H_2Q = U^*H_1^{-1/2}H_2H_1^{-1/2}U = U^*HU = D, \qquad (6)$
as required for the first part of the theorem.
It is clear from Eqs. (6) that
$(Q^*H_1Q)D - Q^*H_2Q = 0,$
and this implies $H_1QD - H_2Q = 0$. The jth column of this relation is Eq. (5). ∎
Exercise 1. Show that if the matrices H, and HZ of Theorem 2 are real,
then there is a real transforming matrix Q with the required properties.
Exercise 2. Prove that the eigenvalues of H , in Theorem 2 are real by
first defining an inner product <x, y> = y*H,x and then showing that
H 1 112 is Hermitian in this inner product. ❑
Q
* 0 1
Q and Q* 0 O Q
1 0 0 1
are diagonal. ❑
In the study of differential equations, the analysis of stability of solutions
requires information on the location of the eigenvalues of matrices in rela-
tion to the imaginary axis. Many important conclusions can be deduced by
knowing only how many eigenvalues are located in the right and left half-
planes and on the imaginary axis itself. These numbers are summarized in
the notion of "inertia." The inertia of a square matrix $A \in \mathbb{C}^{n\times n}$, written In A, is the triple of integers
$\operatorname{In} A = \{\pi(A), \nu(A), \delta(A)\},$
where $\pi(A)$, $\nu(A)$, and $\delta(A)$ denote the number of eigenvalues of A, counted with their algebraic multiplicities, lying in the open right half-plane, in the open left half-plane, and on the imaginary axis, respectively. Obviously,
$\pi(A) + \nu(A) + \delta(A) = n,$
and the matrix A is nonsingular if $\delta(A) = 0$.
In the particular case of a Hermitian matrix H, the number $\pi(H)$ (respectively, $\nu(H)$) merely denotes the number of positive (respectively, negative) eigenvalues counted with their multiplicities. Furthermore, $\delta(H)$ is equal to the number of zero eigenvalues of H, and hence H is nonsingular if and only if $\delta(H) = 0$. Note also that for a Hermitian matrix H,
$\pi(H) + \nu(H) = \operatorname{rank} H.$
The difference $\pi(H) - \nu(H)$, written sig H, is referred to as the signature of H.
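For a Hermitian matrix the inertia can be read off from the eigenvalues, as the following sketch shows (numpy; not part of the text, and the diagonal test matrix is an arbitrary illustration).

import numpy as np

def inertia(H, tol=1e-10):
    # (pi, nu, delta) for a Hermitian matrix H
    eigs = np.linalg.eigvalsh(H)
    return (int(np.sum(eigs >  tol)),
            int(np.sum(eigs < -tol)),
            int(np.sum(np.abs(eigs) <= tol)))

H = np.diag([3.0, -1.0, 0.0, 2.0])
print(inertia(H))        # (2, 1, 1); pi + nu + delta = n = 4, pi + nu = rank H = 3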
The next result gives a useful sufficient condition for two matrices to have
the same inertia. The corollary is a classical result that can be proved more
readily by a direct calculation. The interest of the theorem lies in the fact that
matrix M of the hypothesis may be singular.
Theorem 3. Let A and B be n x n Hermitian matrices of the same rank r.
If A = MBM* for some matrix M, then In A = In B.
PROOF. By Theorem 1 there exist nonsingular matrices P and Q such that
r-a
x*Dô)x = (R*x)*Dg'(R *x) = - E IY112 S 0,
1= 1
and a contradiction with Eq. (8) is achieved.
Similarly, interchanging the roles of Dg) and DP, it is found that t < s is
impossible. Hence s = t and the theorem is proved. •
Corollary 1. (Sylvester's law of inertia). Congruent Hermitian matrices
have the same inertia characteristics.
PROOF. If A = PBP* and P is nonsingular, then rank A = rank B and the
result follows. •
Exercise 4. Show that the inertia of the Hermitian matrix
$\begin{bmatrix} 0 & I_n\\ I_n & 0\end{bmatrix}$
is (n, n, 0). ❑
The converse of this statement is also true for normal matrices. (It is
an analog of Theorem 5.3.2 for Hermitian mat rices).
Theorem 2. A normal matrix $A \in \mathscr{F}^{n\times n}$ is unitary if all its eigenvalues lie on the unit circle.
PROOF. Let $Ax_i = \lambda_ix_i$, where $|\lambda_i| = 1$ and $(x_i, x_j) = \delta_{ij}$ $(1 \le i, j \le n)$. For any vector $x \in \mathscr{F}^n$, write $x = \sum_{i=1}^{n}\alpha_ix_i$; then it is found that
Thus A*Ax = x for every x e .fir" and therefore A*A = I. The relation,
AA* = I now follows since a left inverse is also a right inverse (see Section
2.6). Hence the matrix A is unitary. •
Thus, a normal matrix is unitary if and only if its eigenvalues lie on the unit
circle. Unitary matrices can also be described in terms of the standard inner
product. In particular, the next result shows that if U is unitary, then for any
z the vectors x and Ux have the same Euclidean length or norm.
Theorem 3. A matrix U e ;Ir" x " is unitary if and only if
(Ux, Uy) _ (x, y) (2)
for all x, y e .F".
PROOF. If U*U = I, then
(Ux, Uy) = (U * Ux, y) = (x, y)
and Eq. (2) holds. Conversely, from Eq. (2) it follows that (U*Ux, y) = (x, y)
or, equivalently, ((U*U — I)x, y) = 0 for all x, y e .F". Referring to Exercise
3:13.2, we immediately obtain U*U = I. Consequently, UU* = I also,
and U is a unitary matrix. •
Observe that Eq. (2) holds for every x, y e Sr" if and only if it is true for
the vectors of a basis in 9r". (Compare the statement of the next Corollary
With Exercise 3.14.12.)
Corollary 1. A matrix U e Ar" x" is unitary if and only if it transforms an
,;ï
brthonormal basis for .F into an orthonormal basis.
The following result has its origin in the familiar polar form of a complex number: $\lambda = \lambda_0e^{i\theta}$, where $\lambda_0 \ge 0$ and $0 \le \theta < 2\pi$. It states that any square matrix A can be written in the form
A = HU,
where H is positive semi-definite and U is unitary.
PROOF. Let
Note the structure of the matrices U and V in Eqs. (7) and (9): the columns
of U (respectively, V) viewed as vectors from Sr" (respectively, .9") constitute
an orthonormal eigenbasis of AA* in SF" (respectively, of A*A in E"). Thus,
a singular-value decomposition of A can be obtained by solving the eigen-
value-eigenvector problem for the matrices AA* and A*A.
Example 4. Consider the matrix A defined in Exercise 5.4.2; it has singular values $s_1 = \sqrt 2$ and $s_2 = 1$. To find a singular-value decomposition of A, we compute A*A and AA* and construct orthonormal eigenbases for these matrices. The standard basis in $\mathscr{F}^2$ can be used for A*A and the system $\{[a\;0\;a]^T, [0\;1\;0]^T, [a\;0\;-a]^T\}$, where $a = 1/\sqrt 2$, can be used for AA*. Hence
$A = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 0\end{bmatrix} = \begin{bmatrix} a & 0 & a\\ 0 & 1 & 0\\ a & 0 & -a\end{bmatrix}\begin{bmatrix} \sqrt 2 & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}$
is a singular-value decomposition of A. ❑
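The same decomposition can be recovered numerically (a numpy sketch, not part of the text) for the 3 × 2 matrix of Example 4; it also confirms that the singular values are the eigenvalues of $(A^*A)^{1/2}$.

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])

U, s, Vh = np.linalg.svd(A, full_matrices=True)   # A = U D V*
print(s)                                          # [sqrt(2), 1]

D = np.zeros((3, 2))
D[:2, :2] = np.diag(s)
print(np.allclose(U @ D @ Vh, A))                 # True

# singular values = eigenvalues of (A*A)^{1/2}  (returned here in ascending order)
print(np.sqrt(np.linalg.eigvalsh(A.conj().T @ A)))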
The relation (7) gives rise to the general notion of unitary equivalence
Of two m x n matrices A and B. We say that A and B are unitarily equivalent
if there exist unitary matrices U and V such that A = UBV*. It is clear that
unitary equivalence is an equivalence relation and that Theorem 2 can be
interpreted as asserting the existence of a canonical form (the matrix D) with
respect to unitary equivalence in each equivalence class. Now the next result
is to be expected.
Proposition 2. Two m x n matrices are unitarily equivalent if and only if
(hey have the same singular values.
Moor'. Assume that the singular values of the matrices A, B e .Srm""
coincide. Writing a singular-value decomposition of the form (7) for each
Matrix (with the same D), the unitary equivalence readily follows.
Conversely, let A and B be unitarily equivalent, and let D, and D2 be the
matrices of singular values of A and B, respectively, that appear in the
Singular-value decompositions of the matrices. Without loss of generality,
It may be assumed that the singular values on the diagonals of D , and D2
are in nondecreasing order. It is easily found that the equivalence of A and
a.. implies that D, and D2 are also unitarily equivalent: D2 = U,D,Vt.
Viewing the last relation as the singular-value decomposition of D2 and
ptiting that the middle factor D, in such a representation is uniquely deter-
,Vned, we conclude that D I = D2. •
Corollary 1. Two m x n matrices A and B are unitarily equivalent if and only
the matrices A*A and B*B are similar.
Conversely, the similarity of A*A and B*B yields, in particular, that they
have the same spectrum. Hence A and B have the same singular values and,
consequently, are unitarily equivalent. •
In view of this result, we shall say that the idempotent matrix P performs
the projection of the space .^" on the subspace .1' 2 parallel to 6°1, or onto .9's
along .P,. Hence the name "projector" as an alternative to "idempotent
matrix."
We now go on to study the spectral properties of projectors.
Theorem 3. An idempotent matrix is simple.
PROOF. We have already seen in Exercise 4.11.7 that the eigenvalues of an
idempotent matrix are all equal to 1 or 0. It follows then that the eigenvectors
of P all belong to either Im P (2 = 1) or Ker P (2 = 0). In view of Theorem 2,
the eigenvectors of P must span Sr" and hence P is a simple matrix. ■
The spectral theorem (Theorem 4.10.3) can now be applied.
Corollary 1. An n × n projector P admits the representation
$P = \sum_{j=1}^{r} x_jy_j^*, \qquad (1)$
where r = rank P and $\{x_1, \ldots, x_r\}$, $\{y_1, \ldots, y_r\}$ are biorthogonal systems in $\mathscr{F}^n$.†
† Strictly speaking, this construction first defines a linear transformation from $\mathscr{F}^n$ to $\mathscr{F}^n$. The matrix P is then its representation in the standard basis.
$P = \sum_{j=1}^{r} x_jx_j^*,$
and for any $x \in \mathscr{F}^n$,
$Px = \sum_{j=1}^{r} x_jx_j^*x = \sum_{j=1}^{r} (x, x_j)x_j.$
matrix of another set of basis vectors, (and k < n) then there is a nonsingular
k x k matrix M such that A= XM, whence
P = XM(M*X*XM) -1M*X*
= X MOW- 1(X*X)-1(M*)-1)M*X*
= X(X*X) -1 X*.
Hence, we get the same projector P whatever the choice of basis vectors for $\mathscr{S}$ may be. In particular, if we choose an orthonormal basis for $\mathscr{S}$, then X*X = I and
$P = XX^* = \sum_{j=1}^{k} x_jx_j^*,$
and we have returned to the form for P that we had before.
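The formula $P = X(X^*X)^{-1}X^*$ is easy to exercise numerically. In the sketch below (numpy, not part of the text; the subspace is an arbitrary example spanned by two columns) the result is idempotent and Hermitian, and the same projector is obtained from an orthonormal basis of the subspace.

import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])                            # columns span a 2-dimensional subspace

P = X @ np.linalg.inv(X.conj().T @ X) @ X.conj().T    # P = X (X*X)^{-1} X*

print(np.allclose(P @ P, P))          # idempotent: a projector
print(np.allclose(P, P.conj().T))     # Hermitian: an orthogonal projector
print(np.linalg.matrix_rank(P))       # 2 = dimension of the subspace

Q, _ = np.linalg.qr(X)                # orthonormal basis of the same subspace
print(np.allclose(Q @ Q.conj().T, P)) # P = QQ*, the same projector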
Exercise 1. If P e SF" x " is a projector on 91 along .9'2 , show that:
(a) P* is a projector on .91 along .91- (see Section 3.13);
(b) For any nonsingular H e,$_" x ", the matrix H- 'PH is a projector.
Describe its image and kernel. ❑
The next results concern properties of invariant subspaces (see Section
4.7), which can be readily described with the help of projectors.
Theorem 4. Let A e ."x ". A subspace .P of ..iw" is A-invariant if and only
IT PAP = AP for any projector P of 3r" onto 5°
PROOF. Let .9' be A-invariant, so that Axe SP for all x e SP If P is any
projector onto ,P then (I — P)Ax = Ax — Ax = 0 for all x e 9 Since for
any y e Sr" the vector x = Py is in .9 it follows that (I — P)APy = 0 for all
ire Sr", and we have PAP = AP, as required.
6 Conversely, if P is a projector onto .P and PAP = AP, then for any x e So
But Ax 2 e = Im P, so
However, it is important to bear in mind that Eq. (4) is not generally true for
:square matrices, as the example
A = [00 o1]
quickly shows.
The general property (3) can be seen as one manifestation of the results in
Exercise 5.1.7. These results are widely useful and are presented here as a
formal proposition about matrices, including some of the above remarks.
Proposition 1. If A e Cm x " then
C" = Ker A ®Im A*, Cm = Ker A* ® Im A.
Exercise 2. Use Theorem 5.1.2 to prove Proposition 1.
Exercise 3. Verify that the matrix
$\frac{1}{3}\begin{bmatrix} 2 & 1 & -1\\ 1 & 2 & 1\\ -1 & 1 & 2\end{bmatrix}$
is an orthogonal projector and find bases for its image and kernel.
nÉ.tercise 4. Let P and Q be commuting projectors.
(a) Show that PQ is a projector and
Im PQ = (Im P) n (Im Q), Ker PQ = (Ker P) + (Ker Q).
`(b) Show that if we define P ® Q = P + Q — PQ (the "Boolean sum"
t P and Q), then P ® Q is a projector and
,Tm(P ®Q) = (Im P) + (Im Q), Ker(P ® Q) = (Ker P) n (Ker Q).
flint. For part (b) obse rve that I — (P W Q) = (I — P)(1 — Q) and
nitirt (a).
gt4fircise S. Let P and Q be projectors. Show that QP = Q if and only if
P c Ker Q.
Let A e R"". Clearly, all statements made above for an arbitrary fie ]
pi' ,vER, O. (
L-v
(See also the matrix in Example 4.1.2, in which y is not an integer multiple
n.) On the other hand, if the spectrum of a real matrix is entirely real, th
there is no essential difference between the real and complex cases.
Now we discuss a process known as the method of complexification, whit
reduces the study of real matrices to that of matrices with complex entri
First we observe the familiar Cartesian decomposition z = x + iv, wher
z E C" and x, y E R". Hence if {al, a2i ... , a"} is a basis in R",
Thus C" can be viewed as the set of all vectors having complex coordinates;
in this basis, while R" is the set of all vectors with real coordinates wi
respect to the same basis. The last remark becomes obvious if the basis
R" is standard. Note that the vectors x in R" can be viewed as vectors of t
form x + 10 in C".
Proceeding to a matrix A e R" x ", we extend its action to vectors from
by the rule
Âz= Ax+ iAy
for t = x + iy, x,y E R". Note that the matrices A and d are in fact the
and therefore have the same characteristic polynomial with real coefficie
It should also be emphasized that only real numbers are admitted as eig
values of A, while only vectors with real entries (i.e., real vectors) are admi
as eigenvectors of A.
Let 20 a a(A) and Ax a = 20 x0 , where xo e R" and x o # O. By Eq.
Âx o = Axo = 20 x o and the matrix I has the same eigenvalue and ei
vector. If 20 is a complex zero of the characteristic polynomial c(2) of A
Let A e .ice" x " and x,y e 9". Consider the function fA defined on 9" x ,
by
fA(x, Y) °— (Ax, y). (1
Recalling the appropriate properties of the inner product, it is clear tha
for a fixed A, the function fA in Eq. (1) is additive with respect to both vecto
variables (the subscript A is omitted):
f(xi + x2, Y) =f(x1,Y) +f(x2,Y),
f(x,Y i + Y2) = f(x, Y1) + f(x, Y2),
and homogeneous with respect to the first vector variable:
fax, Y) _ «f (x, Y), ae .^
(3
If F = It th e function is also homogeneous with respect to the seco
vector variable:
I (x, Py) = Rf(x,Y), fl e I, (4
while in the case = C, the function f fA is conjugate-homogeneous wit
respect to the second argument:
= Qe
Now we formally define a function f(x, y), defined on pairs of vecto
from 9", to be bilinear (respectively, conjugate bilinear) if it satisfies
conditions (1) through (4) (respectively, (1) through (3) and (4')). Consid
the conjugate bilinear function fA(x, y) defined by Eq. (1) on C" x . 0
If e=(x„ x2 ,...,x")isa basis in C" and x = aixi , y=
then
fA(x, Y) _ (Ax, y) _ E ”
(Ax,, x;)ar A; =
n
Yüaji;,
where
Ycj =(Ax„xj), i,j=1, 2, ... ,n.
When expressed in coordinates as in Eq. (5), fA is generally called a c
jugate bilinear form or simply a bilinear form according as = C or
respectively. The form is said to be generated by the matrix A, and the mat!
J=1 is said to be the matrix of the form f A with respect to the basil
cl' Our first objective is to investigate th e connection between matrices ..
the bilinear form (1) with respect to different bases and, in particular, betty
them and the generating matrix A.
We first note that the matrix A is, in fact, the matrix of the form (1) with
respect to the standard basis (e 1, e2 , ... , e"} in C. Furthermore, if P denotes
the transition matrix from the standard basis to a basis {x1 , x2,.. , x"}
it► C" then, by Eq. (3.9.3), xi = Pet, i = 1, 2, ... , n. Hence for i, j = 1, 2, ... , n,
yu = (Axt , xi) = (APe; , Pei) = (P*APe 1 , ej ).
Recalling that the standard basis is orthonormal in the standard inner
roduct, we find that the y o are just the elements of the matrix P*AP. Thus
= P*AP and matrices of the same conjugate bilinear form with respect
lb different bases must be congruent.
Theorem 1. Matrices of a conjugate bilinear form with respect to different
-bits
' in C" generate an equivalence class of congruent matrices.
Now consider some special kinds of conjugate bilinear forms. A conjugate
bilinear form fA(x, y) = (Ax, y) is Hermitian if fA(x, y) = f4(y, x) for all
x, yeC".
Exercise 1. Check that a conjugate bilinear form is Hermitian if and only
if the generating matrix A is Hermitian. Moreover, all matrices of such a
than are Hermitian matrices having the same inertia characteristics.
.Hint. Use Theorem 1 and Sylvester's law of inertia. ❑
Particularly interesting functions generated by conjugate bilinear forms
Ate the quadratic forms obtained by substituting y = x in fA(y, x), that is, the
orals fA (x) = (Ax, x) defined on Sr" with values in .F:
Exercise 2. By using the identity
$f(x, y) = \tfrac14\bigl(f(x+y, x+y) + i f(x+iy, x+iy) - i f(x-iy, x-iy) - f(x-y, x-y)\bigr),$
show that any conjugate bilinear form is uniquely determined by the cor-
'- responding quadratic form.
.ercise 3. Show that the conjugate bilinear form fA(x, y) = (Ax, y) is
ermitian if and only if the corresponding quadratic form is a real-valued
Unction. ❑
The rest of this section is devoted to the study of quadratic forms generated
by a Hermitian matrix, therefore called Hermitian forms. Clearly, such a
t`m is defined on .r" but now takes values in R. In this case
where yi; = ÿ;; (4./ = 1, 2, ... , n) and the ai (i = 1, 2, ... , n) are the coordi
ates of x with respe ct to some basis in C". In view of Theorem 1, the Hermit
matrices it = [y, 7; J_ 1 defined by Eq. (7) are congruent. Consequent!
, ,
Theorem 5.5.1 asserts the existen ce of a basis in C" with respect to whi
there is a matrix of the form do = diag[I„ — Ir_ 0]. In this (canonic
„
à 'x can be ascribed to the form that it generates. Also, the simultaneous
#tction of two Hermitian matrices to diagonal matrices by the same
uent transformation, as discussed in Section 5.5, is equivalent to a
ultaneous reduction of the corresponding Hermitian forms to sums of
res.
cise 5. Consider a Hermitian form f(x) defined on R3. The set of
is in R3 (if any) that satisfies the equation f(x l, x2 , x3) = 1 is said to
a central quadric. Show that if the equation of the quadric is referred
suitable new axes, it is reduced to the form
21z? + 22 z1 + 23z3 = 1. (10)
new coordinate axes are called the principal axes of the quadric. (Note
the nature of the quadric surfa ce is determined by the relative magnitudes
Signs of 21, 22, 20
t . Use f (x 1 , x2 , x3) = x*Hx and Eq. (9).
else 6. Verify that 21 S 22 5 23 in Eq. (10) are the eigenvalues of H and
for H > O, the quadric surface (10) is an ellipsoid with the point
: 21 , 0, 0) farthest from the origin.
h,zercise 7.Examine the nature of the central quadric whose equation is
3x2 + 3x3 — 2x2 x3 — 4x 1x3 — 4x1x2 = 1.
t. Diagonalize the matrix of the Hermitian form on the left using
rem 5.2.1. ❑
(Ax, x) = ^ 2, I ar 12,
^ =I
where r = rank A and ,l„ ... , A,. denote the nonzero eigenvalues of A cor$
responding (perhaps, after reordering) to the eigenvectors 2,, ... , âr . How- ,
ever, the difficulties involved in finding the full spectral properties of a matrix :
f(x) _ „
E
Q=1
Yo,a,
Œ2 ,..., a,, .
Case 1. Assume that not all of y u are zero. Then assuming, for instan t
that y l I # 0, we rewrite Eq. (1) in the form
Thus, we have proved that any real Hermitian form can be reduced to the
form (2). Applying the same process to f2(x,) and so on, the reduction is
completed in at most n — 1 such steps. •
Exercise 1. Find the rank and inertia of the form f (x) = 2Œ1Œ2 + 2a2 a3 +
a3 and determine a transformation of the coordinates a,, a 2 , a3 that reduces
f to a sum of squares.
Another method (due to Jacobi) for finding the inertia of A or, equivalently,
Of the corresponding Hermitian form will be discussed in Section 8.6.
Exercise 2. Find the rank and inertia of the following Hermitian forms and
find transformations reducing them to sums of squares:
(a) tai — a3 + Œ 1a2 + a 1a3 i
(b) ai + 2(a1Œ2 + 012 a3 + Œ3a4 + Œ4 as + alas);
(c) 2Œ1a2 — Œ1a3 + Œ1a4 — Œ2 a3 + Œ 2 a4 — 2Œ3 a4.
I.
Answers. (a) r = 3, n = 1, v = 2; (b) r = 5, n = 3, y = 2; (c) r
n = 1, y = 2. The respective transforming mat rices are
ti I
1
0
1 0
1-1
0 1
0 1
+ -1 #
—1 1 0 0
01 1 , 0 0 1 1 —1 , [7
0 0 11 '
00 1 0 0 0 1 —2
0 0 0 1
0 0 0 0 1
P dQ . (1)
dx (PQ) = dz Q +
We now view p1 i p 2 i ... , p", p1, p2, . . . , p" as a set of 2n independent
yvariables and we obtain
aT 1 apT _ T ap
Ap p A
api 2 api + api,
= 2 (ei Ap + pT Aei)
=
e Ais, denotes the ith row of A. Similarly,
av
api = C1*p
Equations of motion for the system (with no external forces acting) are obtained by Lagrange's method in the following form:
$\frac{d}{dt}\Bigl(\frac{\partial T}{\partial\dot p_i}\Bigr) - \frac{\partial T}{\partial p_i} + \frac{\partial V}{\partial p_i} = 0, \qquad i = 1, 2, \ldots, n.$
From the above expressions for the derivatives we find that the equations of motion are
$A_{i*}\ddot p = -C_{i*}p, \qquad i = 1, 2, \ldots, n,$
or, in matrix form,
$A\ddot p + Cp = 0. \qquad (2)$
Following the methods in Section 4.13, and anticipating sinusoidal
lutions, we look for a solution p(t) of Eq. (2) of the form p(t) = qe i°",
ere q is independent of t and w is real. Then p = — w2 geiwi. and if we write
or
and
V = 3pTCp = 14TW24
= 4001C; + ... + ru^^^)•
Thus, when expressed in terms of the normal coordinates, both the kinetic
and potential energies are expressed as sums of squares of the appropriate
variables.
We can obtain some information about the normal modes by noting that when the system vibrates in a normal mode of frequency $\omega_i$, we have $\xi = e_i$, and so the solution for the ith normal-mode vector $q_i$ is
$q_i = (A^{-1/2}U)e_i = \text{ith column of } A^{-1/2}U.$
Hence the matrix of normal-mode vectors, $Q = [q_1\; q_2\; \cdots\; q_n]$, is given by $Q = A^{-1/2}U$. Furthermore,
$Q^TAQ = I, \qquad Q^TCQ = W^2, \qquad (5)$
which may be described as the biorthogonality properties of the normal-mode vectors.
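In practice the frequencies and normal-mode vectors are found from the generalized eigenvalue problem $Cq = \omega^2Aq$. A hedged sketch (numpy/scipy, not part of the text; the 2 × 2 matrices A and C are purely hypothetical data) that also checks the relations (5):

import numpy as np
from scipy.linalg import eigh

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # hypothetical positive definite inertia matrix
C = np.array([[ 3.0, -1.0],
              [-1.0,  2.0]])        # hypothetical stiffness matrix

w2, Q = eigh(C, A)                  # solves C q = w^2 A q, with Q^T A Q = I
omega = np.sqrt(w2)                 # natural frequencies

print(np.allclose(Q.T @ A @ Q, np.eye(2)))     # Q^T A Q = I    (Eq. (5))
print(np.allclose(Q.T @ C @ Q, np.diag(w2)))   # Q^T C Q = W^2  (Eq. (5))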
Now we briefly examine the implications of this analysis for a system to
which external forces are applied. We suppose that, as in the above analysis,
the elastic system makes only small displacements from equilibrium but
that, in addition, there are prescribed forces fi (t), ... , f„(t) applied in the
coordinates pi, p2, ... p„. Defining ft) = [ fi(t) f„(t)] T, we now have
,
iately reduced to
^ + W2C = QTf(t),
or
= g;f(Txt — T) d-r.
0
1^
41
^
wo^
is a solution of Eq. (6).
This result can be used to discuss the resonance phenomenon. If we imagine
the applied frequency w to be varied continuously in a neighbourhood of
natural frequency w;, Eq. (7) indicates that in general the magnitude oft
solution vector (or response) tends to infinity as w — ► w; . However, titi
singularity may be avoided if q1 fo = 0 for all q; associated with w;.
Exercise 1. Establish Eq. (7).
Exercise 2. Show that if C is singular, that is, C is nonnegative (but si
positive) definite, then there may be normal coordinates ,(t) for whi
I i(t)I — ► co as t oo.
Exercise 3. Consider variations in the "stiffness" matrix C of the fo ile
C + kl, where k is a real parameter. What is the effect on the natural fr
quencies of such a change when k > 0 and when k < 0? ❑
$K_2 = \operatorname{Ker}\begin{bmatrix} X\\ XT\end{bmatrix}.$
More generally, for j = 1, 2, ..., define
$K_j = \operatorname{Ker}\begin{bmatrix} X\\ XT\\ XT^2\\ \vdots\\ XT^{j-1}\end{bmatrix},$
and it is seen that we have a "nested" sequence of subspaces of $\mathbb{C}^p$:
$K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots.$
Lemma 1. Let (X, T) be an admissible pair of order p and let the sequence $K_1, K_2, \ldots$ of subspaces of $\mathbb{C}^p$ be defined as above. If $K_j = K_{j+1}$, then $K_j = K_{j+1} = K_{j+2} = \cdots$.
PROOF. The matrices
$\begin{bmatrix} X\\ XT\\ \vdots\\ XT^{j-1}\end{bmatrix} \quad\text{and}\quad \begin{bmatrix} X\\ XT\\ \vdots\\ XT^{j}\end{bmatrix} \qquad (1)$
have sizes jr × p and (j + 1)r × p, respectively. If $K_j = K_{j+1}$ then, using Proposition 3.8.1, these two matrices are seen to have the same rank. Thinking in terms of "row rank," this means that the rows of $XT^j$ are linear combinations of the rows of the first matrix in (1). Thus, there is an r × jr matrix E such that
$XT^j = E\begin{bmatrix} X\\ XT\\ \vdots\\ XT^{j-1}\end{bmatrix}. \qquad (2)$
The next result is the form in which the observability of (X, T) is usually
presented. Note that the value of the index of the pair (X, T) does not appear
explicitly in the statement.
Theorem 1. The admissible pair (X, T) of order p is observable if and only if
$\operatorname{rank}\begin{bmatrix} X\\ XT\\ \vdots\\ XT^{p-1}\end{bmatrix} = p. \qquad (3)$
PROOF. If condition (3) holds, then in the above notation, the kernel $K_p = \{0\}$. Thus, the unobservable subspace is trivial and (X, T) is an observable pair.
Conversely, if (X, T) is observable then, by Exercise 3, the index s cannot exceed p. Hence $K_p = K_s = \{0\}$, and this implies the rank condition (3). ∎
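The rank test (3) translates directly into a computation. The sketch below (numpy, not part of the text; the pair (X, T) is an arbitrary illustrative example) stacks X, XT, ..., XT^{p-1} and checks the rank.

import numpy as np

def observable(X, T):
    # rank [X; XT; ...; XT^{p-1}] = p, as in Eq. (3)
    p = T.shape[0]
    blocks = [X @ np.linalg.matrix_power(T, k) for k in range(p)]
    return np.linalg.matrix_rank(np.vstack(blocks)) == p

T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
X = np.array([[1.0, 0.0, 0.0]])      # a 1 x 3 "observation" row
print(observable(X, T))              # True: the rows of X, XT, XT^2 span C^3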
The results described so far in this section have a dual formulation. The
discussion could begin with consideration of the subspaces of C",
R , =Im X*, R2=Im X* + Im T*X*,
and the trivial observation that R 1 c R2 c CP. But there is a quicker way.
Define Ri = D =I Im((T*)r-'X*) and observe that
Ri = Im[X* T*X* • • • T*' - ' XI']. (4)
Then Proposition 5.8.1 shows that
X
Ker XT ® Im[X* T* X* • .. T*; - ' Xe] = CP.
XT; -1
Consequently, we have the relation
KJ ® R1 = CP (5)
for j = 1, 2, .... The properties of the sequence K 1, K2, ... now tell us im-
mediately that R, e R2 e • • • e Cp with proper inclusions up to R,_ 1 e R„
where s is the index of (X, T), and thereafter R, = R, + 1 = • • • • Furthermore,
R, = Cp if the pair (X, T) is observable.
In this perspective, it is said that the pair (T*, X*) is controllable when
(X, T) is observable, and in this case K, = {0} and R, = CP. If R, # Cp,
then R, is called the controllable subspace and the pair (T*, X*) is uncontrol-
lable.
Note that, using Eq. (4) and the fact that s S p, we may write
a-1 p- 1
R, =E
k=0
Im((T*) IX*) = E
k=0
Im((T*) kX*). (6)
1. For which real values of a are the following matrices positive definite:
(a) $\begin{bmatrix} 1 & a & a\\ a & 1 & a\\ a & a & 1\end{bmatrix}$,  (b) $\begin{bmatrix} 1 & 1 & a\\ 1 & 1 & 1\\ a & 1 & 1\end{bmatrix}$?
Answer. (a) $-\tfrac12 < a < 1$; (b) none.
2. Let A E C"" and suppose Ax = Ax and A*y = µy (x, y # 0). Prove
that if A # µ, then x 1. y, and that if x = y, then A = p.
3. Check that the matrix H z 0 is positive definite if and only if det H # O.
4. Let A, B e C""" and define
B—B
R— LA Ai
Show that A is an eigenvalue of R only if — A is an eigenvalue of R.
Hint. Observe that PR + RP = 0 where
0 11
P
[ -1 .1 0 .
5. Verify that a Hermitian unitary matrix may have only + 1 and —1 as its
eigenvalues.
6. If P is an orthogonal projector and a l , a2 are nonzero real numbers,
prove that the matrix aiP + az(I P) is positive definite.
—
dependent.
16. Let x,y be elements of a unitary space qi with a basis S = {x 1, x2 , ... , x"}.
Prove that, if a and l are representations of x and y, respectively, with
respect to d, then
:
(x, y) _ (a, GP), (1)
where G is the Gram matrix of e.
Conversely, show that for any G > 0, the rule in Eq. (1) gives an inner
product in efi and that G is the Gram matrix of the system d with respect
to this inner product.
17. Let A, B, C eV" be positive definite.
(a) Prove that all 2n zeros A i of det(2 2A + .1B + C) have negative ;
real parts.
Hint. Consider the scalar polynomial (x, (A A + .1 1 B + C)x) for
x e Ker(.l?A + .t i B + C) and use Exercise 5.10.4.
(b) Show that all zeros of det(.1 2A + AB — C) are real.
(c) Prove the conclusion of pa rt (a) under the weaker hypotheses ;
that A and C are positive definite and B has a positive definite real part;
18. Let A e C""". Prove that if there exists a positive definite n x n matriX.
PROOF. Consider the case k = 2. To show that the given sum is direct, it suffices to show that $\operatorname{Ker} p_1(T) \cap \operatorname{Ker} p_2(T) = \{0\}$. But if simultaneously $p_1(T)(x) = 0$ and $p_2(T)(x) = 0$ then, since $p_1(\lambda)$ and $p_2(\lambda)$ are relatively prime (see Theorem 3 in Appendix 1),
$x = I(x) = c_1(T)p_1(T)(x) + c_2(T)p_2(T)(x) = 0,$
and consequently the required result: x = 0. An induction argument like that of Theorem 1 completes the proof. ∎
Note that the hypothesis on the polynomials in Theorem 2 is stronger than
the condition that the set {p,(2)}i s1 be relatively prime. (See Exercise 4 in
Appendix 1 and recall that the polynomials being considered are assumed
to be monic.)
Now we define a nonzero scalar polynomial p(.1) over Sr to be an
annihilating polynomial of the transformation T if p(T) is the zero trans-
formation: p(T) = O. It is assumed here, and elsewhere, that Te .r°(.°) and
is over X
'Exercise S. Show that if p(,) is an annihilating polynomial for T, then so is
the polynomial p(2)q(.1.) for any q(2) over SF.
= xercise 6. Check that if p(2) is an annihilating polynomial of T and
e a(T), .1 0 e .51r, then p().o) = O. (See Exercise 4.) 0
In other words, Exercise 6 shows that the spectrum of a transformation T
is'included in the set of zeros of any annihilating polynomial of T. Note that,
in view of Exercise 5, the converse cannot be true.
Theorem 3. For any $T \in \mathscr{L}(\mathscr{S})$ there exists an annihilating polynomial of degree $l \le n^2$, where $n = \dim\mathscr{S}$.
PROOF. We first observe that the existence of a (monic) annihilating polynomial p(λ) for T is provided by Exercise 4.1.12 and is equivalent to the condition that $T^l$ be a linear combination of the transformations $T^i$, i = 0, 1, ..., l − 1, viewed as elements in $\mathscr{L}(\mathscr{S})$. (We define $T^0 = I$.) Thus, for some $p_0, p_1, \ldots, p_{l-1} \in \mathscr{F}$,
$T^l = -p_0I - p_1T - \cdots - p_{l-1}T^{l-1}. \qquad (3)$
To verify Eq. (3), we examine the sequence $I, T, T^2, \ldots$. Since the dimension of the linear space $\mathscr{L}(\mathscr{S})$ is $n^2$ (see Exercise 4.1.8), at most $n^2$ elements in this sequence are linearly independent. Thus there exists a positive integer l $(1 \le l \le n^2)$ such that $I, T, \ldots, T^{l-1}$, viewed as elements in $\mathscr{L}(\mathscr{S})$, are linearly independent, while $I, T, \ldots, T^l$ form a linearly dependent system. Hence $T^l$ is a linear combination of $T^i$, i = 0, 1, ..., l − 1. This proves the theorem. ∎
The main result follows readily from Theorem 2, bearing in mind that if p(T) = 0 then $\operatorname{Ker}(p(T)) = \mathscr{S}$.
then
Show that the conclusion does not hold with the choice of factory
q1(2) = 92(A) = (1— 21X2 — 22 for p(2). ❑ )
It is clear that a fixed linear transformation Te 2(0) has many annih ilat
ing polynomials. It is also clear that there is a lower bound for the degree
of all annihilating polynomials for a fixed T. It is natural to seek a monii
annihilating polynomial of the least possible degree, called a minimà)
polynomial for Tand written mT(A), or just m(1). It will be shown next that till
set of all annihilating polynomials for T is "well ordered" by the degrees
the polynomials and, hence, there is a unique minimal polynomial.
PROOF. In view of Exercise 6.1.6, it suffices to show that any zero of the
minimal polynomial m(A) of T belongs to the spectrum o(T) of T.
Let m(A) = q(A) F .1 (A Ai), where q(2) is irreducible over Sr.Assume
-
required. •
In this chapter the nature of the field F has not played an important role.
OW, and for the rest of the chapter, it is necessary to assume that F is
0gebraically closed (see Appendix 1). In fact, we assume for simplicity that
= C, although any algebraically closed field could take the place of C.
$m(\lambda) = \prod_{i=1}^{s}(\lambda - \lambda_i)^{m_i} \qquad (2)$
for some positive integers $m_i$ (i = 1, 2, ..., s).
The exponent $m_i$ associated with the eigenvalue $\lambda_i \in \sigma(A)$ is clearly unique (see Corollary 1 of Theorem 1). It plays an important role in this and subsequent chapters and is known as the index of $\lambda_i$.
Applying Theorem 6.1.4 to m(λ) written in the form (2), we arrive at the following important decomposition of a linear transformation.
Theorem 3. Let $T \in \mathscr{L}(\mathscr{S})$, let $\sigma(T) = \{\lambda_1, \lambda_2, \ldots, \lambda_s\}$, and let $\lambda_i$ have index $m_i$ for each i. Then
$T = \sum_{i=1}^{s} T_i, \qquad (3)$
is a T-invariant subspace.
PROOF, By Eq. (6.1.5) we have, on putting AO) = (A — AJ', i = 1, 2, ... , s
.9' = • gi . (5) .
i =1
It will be shown later (see Proposition 6.6.1) that the numbers q 1, q 2 , .... , t:
are, in fact, the algebraic multiplicities of the eigenvalues of T (or of A);
Note that each matrix A; in Eq. (6) can also be described as a representa- ;
tion of the transformation T = Ti $, with respect to an arbitrary basis in go
Corollary 1 above can be restated in the following, equivalent form.
Note that the positive integer s in Eq. (6) can be equal to 1, in which case:;
the spectrum of T consists of a single point.
Exercise 2. Let A be an eigenvalue of two similar matri ce s A and B. Show
that the indi ce s of A with respect to A and to B are equal. D
The final theorem of the section is known as the Cayley-Hamilton theorem and plays an important part in matrix theory. Another proof will appear in Section 7.2 in terms of matrices rather than linear transformations.
$m(T) = \prod_{j=1}^{s}(T - \lambda_jI)^{m_j}$
Exercise 3. Use the Cayley-Hamilton theorem to check that the matrix of Exercise 4.11.1 satisfies the equation
$A^3 = 4A^2 - A + 6I.$
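The Cayley-Hamilton theorem itself is straightforward to test numerically. The sketch below (numpy, not part of the text; the matrix is a random example rather than the matrix of Exercise 4.11.1) evaluates the characteristic polynomial at the matrix by Horner's scheme and checks that the result vanishes.

import numpy as np

np.random.seed(1)
A = np.random.randn(4, 4)
c = np.poly(A)                 # [1, c_3, c_2, c_1, c_0]: characteristic polynomial of A

cA = np.zeros_like(A)          # Horner evaluation of c(A)
for coeff in c:
    cA = cA @ A + coeff * np.eye(4)

print(np.allclose(cA, 0))      # True: c(A) = 0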
PROOF. Let $x_1, x_2, \ldots, x_r$ be a Jordan chain and observe that Eqs. (3) imply
$x_{r-1} = (T - \lambda_0I)x_r,$
$\vdots$
$x_2 = (T - \lambda_0I)^{r-2}x_r,$
$x_1 = (T - \lambda_0I)^{r-1}x_r. \qquad (5)$
Using the binomial theorem, it is clear that the Jordan subspace generated
by x,, x2 , .. . , x, is also generated by x„ Tx„ ... , Tr - t x,. Since the Jordan
subspace is T-invariant (Exercise 5) and x, is in this subspace, it follows that
the subspace is cyclic with respect to T. •
Observe that Eqs. (5) show that if x r is a generalized eigenvector of order r,
then (T aI)`x, is a generalized eigenvector of order r — i, for i = 0, 1, ... ,
—
r — 1.
^D = L . + E•lp-1+...+ E
1=1 i=1 a=1
where /(1 ) (1 S j 5 p, 1 5 i < t i) is a cyclic Jordan subspace of dimension ji
x(;),
PROOF. Bearing in mind the relationships (6.3.1), let 4 ), x(2), ... , x(1`,p)
denote linearly independent elements in .9'p such that
.9',,_, + span(xpl ), xp2j, ... , x(y) = S9p . (2)
Obviously, the vectors 41j, ... , x p°P) in Eq. (2) are generalized eigenvectors
Of T of order p. It is claimed that the ptp vectors of the form
(T — I)k(4 )), (T — 2 I)1`(4)), (T — AI)k(40), (3)
C = 0, 1, ... , p — 1, are linearly independent. Indeed, if
p - 1 tp
a,a(T —.10(4)
.10(4) = 0, E
(4)
k=0 !=1
(^ aukxp') I =
(T — my-1(E o
\ 1 111
and L,,,=1 c4) €.50 _ 1, which contradicts (2) again unless aik = a2k =
acpk = 0, so that Eq. (5) follows.
'Now consider the elements .1_ 1 = (T — MXxpe), i = 1, 2, ... , tp , from
which are generalized eigenvectors of T of order p — 1 (Exercise 6.3.2). It
flows from above that 4!_ i are linearly independent and that by Eq. (5),
?'\ .Vp- 2 n span{Xp 1 . .422. 1 .... , xp°-) 1} = {O} •
Hence there is a set of linearly independent elements {4, 0 1 }si? ;'D- in Pp_ t`
such that
Yp_2 + span{xp) 1}i? j" - ' = 9'p_ . (6)
Applying the previous argument to Eq. (6) as we did with Eq. (2), we obtain
a system of elements of the form (3), where p is replaced by p — 1, and having
similar properties. It remains to denote
^P! = span{(T — A.It(4,_+ ) )}f =o, 1 5 i S ti,_ 1,
1
Exercise 2. Check that the number $k_m$ of Jordan chains in (1) of order m is
$k_m = 2i_m - i_{m-1} - i_{m+1},$
where $i_l = \dim(\operatorname{Ker}(T - \lambda I)^l)$.
_
^s
s the minimal polynomial for T and, by Theorem 6.1.4,
T(xi) = E u,1x,,
d= 1
j = 1, 2, ..., r,
we obtain
$J_i = \begin{bmatrix} \lambda_i & 1 & & 0\\ & \lambda_i & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda_i\end{bmatrix}, \qquad (3)$
where all superdiagonal elements are ones and all unmarked elements are zeros. The matrix in (3) is referred to as a Jordan block (or cell) of order
corresponding to the eigenvalue A i . Thus, the Jordan block (3) is the repre-
sentation of Thy with respect to the basis (1), which is a Jordan chain
generating f
Exercise 1. Show that the Jordan block in (3) cannot be similar to a block-
diagonal matrix diag[A 1 , A2 , ... , Ai ], where k > 2. _
SOLUTION. If, for some nonsingular P E C'" ',
J = P diag[A 1, A 2]P -1 ,
then the dimension of the eigenspace for J is greater than or equal to 2 (one
for each diagonal block). This contradicts the result of Exercise 6.3.6. ❑
Thus, a Jordan block cannot be "decomposed" into smaller blocks by any
similarity transformation.
Let T E 2(b°) with . = C, and write a(T) = {Ai, Â.2 , ... , As). By Theorem
,
6.23, T = ^ z l' T where Ti = Ta g,, and g( = Ker(T — .t,If' (1 5 i 5 s).
Choosing in each g i a Jordan basis of the form (6.4.1), we obtain a Jordan
basis for the whole space 9' as a union of Jordan bases; one for each general-
ized eigenspace. This is described as a T-Jordan basis for 9'. We now appeal
to Theorem 6.4.1.
Theorem 1. In the notation of the preceding paragraph, the representation of
T with respect to a T-Jordan basis is the matrix
J = diag[J(.I), J(22), ... , J(2 )], , (4)
'Where for 1 S i < s,
J(^O = diag[J (D,) , ... , JP + Jp-1, ... ,
(t) (i) (!) (q ^)
Jp-1+ ... , J l , ... , Ji ], (5)
it which the j x j matrix Jr of the form (3) appears t5 0 times in Eq. (5). The
/utmbers tr (i = 1, 2, ... , s; j = 1, 2, ... , p) are uniquely determined by the
ansformation T.
It is possible that t;`) = 0 for some j < m1 , in which case no blocks of size j
appear in Eq. (5).
Note that the decomposition of J in Eqs. (4) and (5) is the best possible in the sense of Exercise 1. The representation J in Eqs. (4) and (5) is called a Jordan matrix (or Jordan canonical form)† of the transformation T. We now apply Corollary 2 of Theorem 6.2.4 to obtain the following.
Corollary 1. Any complex square matrix A is similar to a Jordan matrix J, which is uniquely determined by A up to the order of the matrices $J(\lambda_i)$ in Eq. (4) and the Jordan blocks in Eq. (5).
† Jordan, C., Traité des substitutions et des équations algébriques. Paris, 1870 (p. 125).
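For small matrices with exact entries, a Jordan matrix and a transforming matrix can be computed symbolically. The sketch below (sympy, not part of the text) does this for the 4 × 4 matrix that appears in the example of Section 6.6 below; it produces two 2 × 2 Jordan blocks with eigenvalue 2.

import sympy as sp

A = sp.Matrix([[ 2, 0, 0, 0],
               [-3, 2, 0, 1],
               [ 0, 0, 2, 1],
               [ 0, 0, 0, 2]])

P, J = A.jordan_form()               # A = P J P^{-1}
sp.pprint(J)                         # two 2 x 2 Jordan blocks with eigenvalue 2
print(sp.simplify(P * J * P.inv() - A) == sp.zeros(4))   # True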
Exercise 4. Show that if A = SJS - ' and A = TJT -', then T = SU for
some nonsingular U commuting with J.
Conversely, if U is nonsingular and commutes with J, A = SJS' 1, and
if we define T = SU, then A = TJT - '. ❑
Let A e C" x" and let J, given by Eqs. (6.5.4) and (6.5.5) be a Jordan
canonical form of A, that is, A = PJP -1 for some invertible P. In this section
we shall indicate some relations between the multiplicities of an eigenvalue
and the parameters of the corresponding submatrices J(A,) of J.
Proposition 1. The dimension of the generalized eigenspace of A e C"x"
associated with the eigenvalue A l e o(A) [that is, the order of the matrix
J(A;) in Eq. (6.5.4)] is equal to the algebraic multiplicity of A.
PROOF. Let A = PJP -1 , where J is given by Eq. (6.5.4). The characteristic
polynomials of A and J coincide (Section 4.11) and hence
p
J = diagLL
1 ], [1], 0 2],
C2] ,
C2]] . o
O
In cases different from those discussed above, the orders of the Jordan blocks are usually found by use of Eq. (6.4.8) and Exercise 6.4.2, which require the determination of $\dim(\operatorname{Ker}(A - \lambda_iI)^s)$ for certain positive integers s.
$A = \begin{bmatrix} 2 & 0 & 0 & 0\\ -3 & 2 & 0 & 1\\ 0 & 0 & 2 & 1\\ 0 & 0 & 0 & 2\end{bmatrix}.$
$J = \operatorname{diag}\left[\begin{bmatrix} 2 & 1\\ 0 & 2\end{bmatrix}, \begin{bmatrix} 2 & 1\\ 0 & 2\end{bmatrix}\right]. \qquad ❑$
† Segre, C., Atti Accad. naz. Lincei, Mem. III, 19 (1884), 127–148.
Let A e I1" " " If A is viewed as a complex matrix, then it follows from
.
Theorem 6.5.1 that A is similar to a Jordan matrix as defined in Eqs. (6.5.4) '
and (6.5.5). However, this Jordan matrix J, and the matrix P for which .
P - `AP = J, will generally be complex. In this section we describe a normal-
form for A that can be achieved by confining attention to real transforming
matrices P, and hence a real canonical form. To achieve this, we take advan-
tage of the (complex) Jordan form already derived and the fundamental fact
(see Section 5.9) that the eigenvalues of A may be real, and if not, that they
appear in complex conjugate pairs.
We shall use the construction of a (complex) Jordan basis for A from the
preceding sections to obtain a real. basis. The vectors of this basis will form
the columns of our real transforming matrix P.
First, if λ is a real eigenvalue of A, then there is a real Jordan basis for the generalized eigenspace of λ, as described in (6.4.1); these vectors will form a part of our new real basis for ℝⁿ. If λ ∈ σ(A) and λ is not real, then neither is λ̄ ∈ σ(A). Construct Jordan chains (necessarily non-real) to span the generalized eigenspace of λ as in (6.4.1). Thus, each chain in this spanning set satisfies relations of the form (6.3.3) (with T replaced by A). Taking complex conjugates in these relations, the complex conjugate vectors form chains spanning the generalized eigenspace of λ̄.
It will be sufficient to show how the vectors for a single chain of length r, together with those of the conjugate chain, are combined to produce the desired real basis for a 2r-dimensional real invariant subspace of A. Let λ be the eigenvalue in question and write λ = μ + iω, where μ, ω are real and ω ≠ 0. Let P be the n × r complex matrix whose columns are the vectors x₁, x₂, ..., x_r of a Jordan chain corresponding to λ. If J(λ) is the r × r Jordan block with eigenvalue λ, then AP = PJ(λ) and, taking conjugates, AP̄ = P̄J(λ̄). Thus,
    A[P  P̄] = [P  P̄] [ J(λ)   0   ]
                      [  0    J(λ̄) ] ,
and for any unitary matrix U of size 2r,
    A[P  P̄]U = [P  P̄]U U* [ J(λ)   0   ] U.
                            [  0    J(λ̄) ]
We claim that there is a unitary matrix U such that the 2r columns of [P  P̄]U are real and such that J_r(λ, λ̄) ≜ U* diag[J(λ), J(λ̄)] U is also real. The matrix J_r(λ, λ̄) then plays the same role in the real Jordan form for A that diag[J(λ), J(λ̄)] plays in the complex form.
which is real and of full rank since [P  P̄] has full rank and U is unitary. These vectors span a six-dimensional real invariant subspace of A associated with the non-real eigenvalue pair λ, λ̄ (see Theorem 5.9.1). This statement is clear from the relation AQ = QJ_r(λ, λ̄). Our discussion leads to the following conclusion.
Exercise 1. Describe the real Jordan form for a matrix that is real and
simple. O
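A small numerical illustration of the real form in the simple case of Exercise 1: each non-real conjugate pair μ ± iω is represented by a real 2 × 2 block [[μ, ω], [−ω, μ]], obtained from a complex eigenvector x through the real basis {Re x, Im x}. The sketch below is not from the book; it assumes numpy and a hypothetical 2 × 2 example.

```python
import numpy as np

# Hypothetical real matrix with the non-real eigenvalue pair +/- 2i.
A = np.array([[1.0, -5.0],
              [1.0, -1.0]])

w, V = np.linalg.eig(A)
x = V[:, 0]                                # eigenvector for lambda = mu + i*omega
mu, omega = w[0].real, w[0].imag

# Real transforming matrix built from Re x and Im x.
P = np.column_stack([x.real, x.imag])
J_real = np.linalg.inv(P) @ A @ P          # approximately [[mu, omega], [-omega, mu]]
print(np.round(J_real, 10))
```

Depending on which member of the conjugate pair the library returns first, the off-diagonal entries appear with either sign of ω; both conventions give a real canonical form.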
1. Prove that two simple matrices are similar if and only if they have the same characteristic polynomial.
2. Show that any square matrix is similar to its transpose.
3. Check that the Jordan canonical form of an idempotent matrix (a projector) is a diagonal matrix with zeros and ones in the diagonal positions.
4. Show that
   (a) the minimal and the characteristic polynomials of a Jordan block of order r are both (λ − λ₀)^r;
   (b) the minimal polynomial of a Jordan matrix is the least common multiple of the minimal polynomials of its Jordan blocks.
5. Show that a Jordan block is similar to the companion matrix associated with its characteristic (or minimal) polynomial.
   Hint. For a geometrical solution: show that the representation of T restricted to the chain, with respect to the basis {x, T(x), ..., T^{r−1}(x)}, for a Jordan
    [ a₀  a₁₂              ]        [ a₀  1          ]
    [     a₀   ⋱           ]  and   [     a₀   ⋱     ]
    [          ⋱  a_{r−1,r} ]        [          ⋱  1  ]
    [ 0             a₀     ]        [ 0           a₀ ]
are similar, provided a_{i,i+1} ≠ 0, i = 1, 2, ..., r − 1.
8. Check that for an r × r Jordan block J associated with an eigenvalue λ₀ and for any polynomial p(λ),
    p(J) = [ p(λ₀)  p′(λ₀)/1!   ⋯   p^{(r−1)}(λ₀)/(r−1)! ]
           [   0      p(λ₀)     ⋱         ⋮             ]
           [   ⋮                 ⋱      p′(λ₀)/1!        ]
           [   0        0       ⋯       p(λ₀)           ] .
Let A ∈ ℂⁿˣⁿ. Show that A = A⁻¹ if and only if there is an m ≤ n and a nonsingular S such that
    A = S [ I_m      0      ] S⁻¹.
          [  0   −I_{n−m}   ]
Hint. Use the Jordan form for A.
CHAPTER 7
Matrix Polynomials
and Normal Forms
and suppose that λ and the coefficients of the polynomials a_ij(λ) are from a field ℱ, so that when the elements of A(λ) are evaluated for a particular value of λ, say λ = λ₀, then A(λ₀) ∈ ℱⁿˣⁿ. The matrix A(λ) is known as a matrix polynomial or a lambda matrix, or a λ-matrix. Where the underlying field needs to be stressed we shall call A(λ) a λ-matrix over ℱ. Note that an example of a λ-matrix is provided by the matrix λI − A and that the matrices whose elements are scalars (i.e., polynomials of degree zero, or just zeros) can be viewed as a particular case of λ-matrices.
A λ-matrix A(λ) is said to be regular if the determinant det A(λ) is not identically zero. Note that det A(λ) is itself a polynomial with coefficients in ℱ, so A(λ) is not regular if and only if each coefficient of det A(λ) is the zero of ℱ. The degree of a λ-matrix A(λ) is defined to be the greatest degree of the polynomials appearing as entries of A(λ) and is denoted by deg A(λ). Thus, deg(λI − A) = 1 and deg A = 0 for any nonzero A ∈ ℱⁿˣⁿ.
Lambda matrices of the same order may be added and multiplied together in the usual way, and in each case the result is another λ-matrix. In particular, a λ-matrix A(λ) is said to be invertible if there is a λ-matrix B(λ) such that A(λ)B(λ) = B(λ)A(λ) = I. In other words, A(λ) is invertible if B(λ) = [A(λ)]⁻¹ exists and is also a λ-matrix.
Proposition 1. A λ-matrix A(λ) is invertible if and only if det A(λ) is a nonzero constant.
PROOF. If det A(λ) = c ≠ 0, then the entries of the inverse matrix are equal to the minors of A(λ) of order n − 1 divided by c ≠ 0 and hence are polynomials in λ. Thus [A(λ)]⁻¹ is a λ-matrix. Conversely, if A(λ) is invertible, then the equality A(λ)B(λ) = I yields
    det A(λ) det B(λ) = 1.
Thus, the product of the polynomials det A(λ) and det B(λ) is a nonzero constant. This is possible only if they are each nonzero constants. ∎
A λ-matrix with a nonzero constant determinant is also referred to as a unimodular λ-matrix. This notion permits the restatement of Proposition 1: A λ-matrix is invertible if and only if it is unimodular. Note that the degree of a unimodular matrix is arbitrary.
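Proposition 1 is easy to test symbolically. The sketch below is a hypothetical example, not from the book; it assumes sympy and checks unimodularity by asking whether det A(λ) is a nonzero constant, then verifies that the inverse is again a λ-matrix.

```python
import sympy as sp

lam = sp.symbols('lambda')

def is_unimodular(A):
    # Proposition 1 as a test: A(lambda) is invertible as a lambda-matrix
    # iff det A(lambda) is a nonzero constant.
    d = sp.expand(A.det())
    return d != 0 and sp.Poly(d, lam).degree() == 0

# Upper triangular with unit diagonal, hence det = 1 and unimodular.
A = sp.Matrix([[1, lam, -2*lam**2],
               [0, 1,   (lam - 1)**2],
               [0, 0,   1]])
print(is_unimodular(A))        # True
print(sp.simplify(A.inv()))    # entries are again polynomials in lambda
```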
Exercise 1. Check that the following A-matrices are unimodular:
1 A —2A2
RA - 1)2
A 1 (2) =[1:1 1
0 0 1
A4 ,
AZ(A) - L A - 2 1JA
where k = max(l, m). Thus, the degree of A(λ) + B(λ) does not exceed k.
PROOF. If l < m, we have only to put Q(λ) = 0 and R(λ) = A(λ) to obtain the result.
If l ≥ m, we first "divide by" the leading term of B(λ), namely, B_m λ^m. Observe that the term of highest degree of the λ-matrix A_l B_m⁻¹ λ^{l−m} B(λ) is just A_l λ^l. Hence
    A(λ) = A_l B_m⁻¹ λ^{l−m} B(λ) + A^{(1)}(λ),
where A^{(1)}(λ) is a matrix polynomial whose degree, l₁, does not exceed l − 1. Writing A^{(1)}(λ) in decreasing powers, let
    A^{(1)}(λ) = A^{(1)}_{l₁} λ^{l₁} + ⋯ + A^{(1)}_0,   A^{(1)}_{l₁} ≠ 0,   l₁ < l.
PROOF. Suppose that there exist matrix polynomials Q(λ), R(λ) and Q₁(λ), R₁(λ) such that
    A(λ) = Q(λ)B(λ) + R(λ)
and
    A(λ) = Q₁(λ)B(λ) + R₁(λ),
where R(λ), R₁(λ) each have degree less than m. Then
Exercise 1. Examine the right and left quotients and remainders of A(λ) on division by B(λ), where
4 +22 + 2 — 1 23 +22 +2+ 2
A(2) = 2
223 2 - 22+22 ]'
[12
B(2) _
SOLUTION. Note first of all that B(λ) has an invertible leading coefficient. It is found that
A(2)
= r22221 A 2 11 [22 2 1 22 + 2] + -52 2^ 2 3] [
= Q(2)B(2) + R( 2),
and
2
A (2
) = L 2 1 22 + 2] [2 22
-1 1 1] = B(24(2).
Thus, B(λ) is a left divisor of A(λ). □
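The division procedure used above can be mechanized. Below is a minimal sketch, assuming sympy; the helper names and the 2 × 2 example are ours, not the book's. It computes a right division A(λ) = Q(λ)B(λ) + R(λ) with deg R < deg B by repeatedly subtracting the term A_l B_m⁻¹ λ^{l−m} B(λ), exactly as in the proof.

```python
import sympy as sp

lam = sp.symbols('lambda')

def leading(M, x):
    # Degree l of the matrix polynomial M and its leading coefficient matrix M_l.
    l = max(sp.Poly(e, x).degree() for e in M)
    Ml = sp.Matrix(M.rows, M.cols,
                   lambda i, j: sp.Poly(M[i, j], x).coeff_monomial(x**l))
    return l, Ml

def right_divide(A, B, x):
    # Sketch of the division step above: A = Q*B + R with deg R < deg B,
    # assuming the leading coefficient of B is invertible.
    m, Bm = leading(B, x)
    Q, R = sp.zeros(A.rows, A.cols), A.expand()
    while not R.is_zero_matrix:
        l, Rl = leading(R, x)
        if l < m:
            break
        T = Rl * Bm.inv() * x**(l - m)      # the term A_l * B_m**(-1) * lambda**(l-m)
        Q = (Q + T).expand()
        R = (R - T * B).expand()
    return Q, R

# Hypothetical 2 x 2 example.
A = sp.Matrix([[lam**3 + 1, lam],
               [2*lam**2,   lam**2 + lam]])
B = sp.Matrix([[lam, 1],
               [0,   lam]])
Q, R = right_divide(A, B, lam)
print(Q, R, sp.simplify(A - Q*B - R))       # last term is the zero matrix
```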
The result now follows from the uniqueness of the right remainder on division of A(λ) by (λI − B).
The result for the left remainder is obtained by reversing the factors in the initial factorization, multiplying on the right by A_i, and summing. ∎
PROOF. Let A ∈ ℱⁿˣⁿ and let c(λ) denote the characteristic polynomial of A. Define the matrix B(λ) = adj(λI − A) and (see Section 2.6) observe
    E^(2) = [ 1            ]          E^(3) = [ 1            ]
            [    0   1     ]    or            [ b(λ)  1      ]
            [    1   0     ]                  [          ⋱   ]
            [            1 ] ,                [            1 ] .
As in Section 2.7, the mat rices above are said to be elementary matrices
of the first, second; or third kind. Observe that the elementary matrices can
be generated by performing the desired operation on the unit matrix, and
that postmultiplication by an appropriate elementary matrix produces an
elementary column operation on the matrix polynomial.
Exercise 1. Prove that elementary matrices are unimodular and that their
inverses are also elementary matrices. ❑
We can now make a formal definition of equivalence transformations: two matrix polynomials A(λ) and B(λ), over ℱ, are said to be equivalent, or to be connected by an equivalence transformation, if B(λ) can be obtained from A(λ) by a finite sequence of elementary operations. It follows from the equivalence of elementary operations and operations with elementary matrices that A(λ) and B(λ) are equivalent if and only if there are elementary matrices E₁(λ), E₂(λ), ..., E_k(λ), E_{k+1}(λ), ..., E_s(λ) such that
    B(λ) = E_k(λ) ⋯ E₁(λ) A(λ) E_{k+1}(λ) ⋯ E_s(λ)
or, equivalently,
    B(λ) = P(λ)A(λ)Q(λ),      (1)
where P(λ) = E_k(λ) ⋯ E₁(λ) and Q(λ) = E_{k+1}(λ) ⋯ E_s(λ) are matrix polynomials over ℱ. Exercise 1 implies that the matrices P(λ) and Q(λ) are unimodular and hence the following propositions.
Proposition 1. The matrix polynomials A(λ) and B(λ) are equivalent if and only if there are unimodular matrices P(λ) and Q(λ) such that Eq. (1) is valid.
The last fact readily implies that equivalence between matrix polynomials is an equivalence relation.
Exercise 2. Denoting the fact that A(λ) and B(λ) are equivalent by A ∼ B, prove that
(a) A ∼ A;
(b) A ∼ B implies B ∼ A;
(c) A ∼ B and B ∼ C implies A ∼ C.
Exercise 3. Prove that any unimodular matrix is equivalent to the identity matrix.
SOLUTION. If A(λ) is unimodular, then A(λ)[A(λ)]⁻¹ = I and hence relation (1) holds with P(λ) = I and Q(λ) = [A(λ)]⁻¹. □
Exercise 2 shows that the set of all n × n matrix polynomials splits into subsets of mutually equivalent matrices. Exercise 3 gives one of these classes, namely, the class of unimodular matrices.
are equivalent and find the transforming matrices P(A) and Q(A).
SOLUTION. To simplify A(λ), add −(λ − 1) times the first row to the second row, and then add −1 times the first column to the second. Finally, adding −λ times the second column of the new matrix to the first and interchanging columns, obtain the matrix B. Translating the operations performed into
elementary matrices and finding their product, it is easily found that
Q(A) = 01
[ 11 ^A J [1 1 A
and that Eq. (1) holds.
Exercise 6. Show that the following pairs of matrices are equivalent:
(a) [ 0       1      λ    ]        [ 1  0  0 ]
    [ λ       λ      1    ]  and   [ 0  1  0 ] ;
    [ λ²−λ   λ²−1   λ²−1  ]        [ 0  0  0 ]
(b) [ λ²    λ+1 ]        [ 1        0        ]
    [ λ−1   λ²  ]  and   [ 0   λ⁴ − λ² + 1   ] .
than that of the new a₁₁(λ). Clearly, since the degree of a₁₁(λ) is strictly decreasing at each step, we eventually reduce the λ-matrix to the form
    [ a₁₁(λ)    0      ⋯     0      ]
    [   0     a₂₂(λ)   ⋯   a₂ₙ(λ)   ]
    [   ⋮       ⋮              ⋮    ]      (3)
    [   0     aₙ₂(λ)   ⋯   aₙₙ(λ)   ] .
Step 2. In the form (3) there may now be nonzero elements a_ij(λ), 2 ≤ i, j ≤ n, whose degree is less than that of a₁₁(λ). If so, we repeat Step 1 again and arrive at another matrix of the form (3) but with the degree of a₁₁(λ) further reduced. Thus, by repeating Step 1 a sufficient number of times, we can find a matrix of the form (3) that is equivalent to A(λ) and for which a₁₁(λ) is a nonzero element of least degree.
Step 3. Having completed Step 2, we now ask whether there are nonzero elements that are not divisible by a₁₁(λ). If there is one such, say a_ij(λ), we add column j to column 1, find remainders and quotients of the new column 1 on division by a₁₁(λ), and go on to repeat Steps 1 and 2, winding up with a form (3), again with a₁₁(λ) replaced by a polynomial of smaller degree. Again, this process can continue only for a finite number of steps before we arrive at a matrix of the form
    [ b₁₁(λ)    0      ⋯     0      ]
    [   0     b₂₂(λ)   ⋯   b₂ₙ(λ)   ]
    [   ⋮       ⋮              ⋮    ]
    [   0     bₙ₂(λ)   ⋯   bₙₙ(λ)   ]
Note that a matrix polynomial is regular if and only if all the elements
al(1.), ... , a„(A) in Eq. (1) are nonzero polynomials.
Exercise 1. Prove that, by using left (right) elementary operations only,
a matrix polynomial can be reduced to an upper- (lower-) triangular matrix
polynomial B(A) with the property that if the degree of b fj(A) is f (j = 1,
2, ... , n), then
(a) = 0 implies that bkj = 0 (bjk = 0), k = 1, 2, ... , j — 1, and
(b) l > 0 implies that the degree of nonzero elements bk/A) (b,, k(A)) is
less than lj , k = 1, 2, ... , j — 1. O
The reduction of matrix polynomials described above takes on a simple
form in the important special case in which A(A) does not depend explicitly
on A at all, that is, when A(A) A E .F""": This observation can be used to
show that Theorem 1 implies Theorem 2.7.2 (for square matrices).
The uniqueness of the polynomials a l (A), ... , a„(A) appearing in Eq. (1)
is shown in the next section.
where p,(2) and q,(.1.) denote the appropriate minors of order j of the matrix
polynomials P(.1) and Q(.l), respectively. If now b(.t) is a nonzero minor of
B(.) of the greatest order r (that is, the rank of B(A) is r), then it follows from
Eq. (1) that at least one minor a,(.1) (of order r) is a nonzero polynomial and
hence the rank of B(A) does not exceed the rank of A(A). However, applying
the same argument to the equation A(A) = [POW 1B(A)[Q(A)] - ', we see
that rank A(A) <_ rank B(A). Thus, the ranks of equivalent matrix polynomials
coincide. ■
Suppose that the n × n matrix polynomial A(λ) over ℱ has rank r and let d_j(λ) be the greatest common divisor of all minors of A(λ) of order j, where j = 1, 2, ..., r. Clearly, any minor of order j ≥ 2 may be expressed as a linear combination of minors of order j − 1, so that d_{j−1}(λ) is necessarily a factor of d_j(λ). Hence, if we define d₀(λ) = 1, then in the sequence
    d₀(λ), d₁(λ), ..., d_r(λ),
d_j(λ) is divisible by d_{j−1}(λ), j = 1, 2, ..., r.
Exercise 1. Show that the polynomials d,{A) defined in the previous para-
graph for the canonical matrix polynomial A M(A) = diag[a 1(A), a2(2), ...,
ar(A), 0, ... , 0] are, respectively,
Proposition 2. Let the matrix polynomials A(A) and B(A) of rank r be equiva-
lent. Then, with the notation of the previous paragraph, the polynomials d i(A)
and 8.1(A) coincide (j = 1, 2, ... , r).
    i₁(λ) = d₁(λ)/d₀(λ),   i₂(λ) = d₂(λ)/d₁(λ),   ...,   i_r(λ) = d_r(λ)/d_{r−1}(λ).
In view of the divisibility of d_j(λ) by d_{j−1}(λ), the quotients i_j(λ) (j = 1, 2, ..., r) are polynomials. They are called the invariant polynomials of A(λ); the invariance refers to equivalence transformations. Note that for j = 1, 2, ..., r,
    d_j(λ) = i₁(λ)i₂(λ) ⋯ i_j(λ),      (3)
and for j = 2, 3, ..., r, i_j(λ) is divisible by i_{j−1}(λ).
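These definitions translate directly into a short computation. The sketch below (assuming sympy; the helper is ours, not the book's) forms d_j(λ) as the monic gcd of all j × j minors, then the quotients i_j(λ), and applies it to the 3 × 3 λ-matrix of the earlier exercises.

```python
import sympy as sp
from functools import reduce
from sympy.utilities.iterables import subsets

lam = sp.symbols('lambda')

def invariant_polynomials(A, x):
    # d_j = monic gcd of all j x j minors, d_0 = 1, i_j = d_j / d_{j-1}.
    n = A.rows
    d = [sp.Integer(1)]
    for j in range(1, n + 1):
        minors = [A[list(r), list(c)].det()
                  for r in subsets(range(n), j) for c in subsets(range(n), j)]
        nonzero = [sp.expand(m) for m in minors if sp.expand(m) != 0]
        if not nonzero:
            break                                  # rank of A(lambda) is j - 1
        g = reduce(sp.gcd, nonzero)
        d.append(sp.Poly(g, x).monic().as_expr())
    return [sp.cancel(d[j] / d[j - 1]) for j in range(1, len(d))]

A = sp.Matrix([[0,            1,           lam],
               [lam,          lam,         1],
               [lam**2 - lam, lam**2 - 1,  lam**2 - 1]])
print(invariant_polynomials(A, lam))               # [1, 1]  (rank 2)
```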
Exercise 2. Find (from their definition) the invariant polynomials of the matrix polynomials
(a) [ 0       1      λ    ]        (b) [ λ    λ²   0  ]
    [ λ       λ      1    ] ;          [ λ³   λ⁵   0  ] .
    [ λ²−λ   λ²−1   λ²−1  ]            [ 0    0    2λ ]
Answer. (a) i₁(λ) = 1, i₂(λ) = 1; (b) i₁(λ) = λ, i₂(λ) = λ, i₃(λ) = λ⁵ − λ⁴.
Recalling the result of Exercise 1 and observing that the invariant polynomials of the canonical matrix polynomial are merely the a_j(λ), we now use Proposition 2 to obtain the main result of the section.
Theorem 1. A matrix polynomial A(λ) of rank r is equivalent to the canonical matrix polynomial diag[i₁(λ), ..., i_r(λ), 0, ..., 0], where i₁(λ), i₂(λ), ..., i_r(λ)
they have the same Smith canonical form. The transitive property of equivalence relations (Exercise 7.3.2) then implies that they are equivalent. ∎
Exercise 3. Find the Smith canonical form of the matrices defined in Exercise 2.
Answer.
(a) [ 1  0  0 ]        (b) [ λ   0      0     ]
    [ 0  1  0 ] ;          [ 0   λ      0     ] . □
    [ 0  0  0 ]            [ 0   0   λ⁵ − λ⁴  ]
Exercise 3 illustrates the fact that the Smith canonical form of a matri
polynomial can be found either by performing elementary operations a,
described in the proof of Theorem 7.4.1 or by finding its invariant polynomials,
Exercise 4. Find the invariant polynomials of 11 C. where, as in Exerci se
—
Thus, using Theorem 7.5.1, A and B are similar if and only if λI − A and λI − B have the same invariant polynomials.
polynomials. Then they are equivalent and there exist unimodular matrix polynomials P(λ) and Q(λ) such that
    P(λ)(λI − A)Q(λ) = λI − B.
Thus, defining M(λ) = [P(λ)]⁻¹, we observe that M(λ) is a matrix polynomial and
    M(λ)(λI − B) = (λI − A)Q(λ).
Now division of M(λ) on the left by λI − A and of Q(λ) on the right by λI − B yields
    M(λ) = (λI − A)S(λ) + M₀,      (1)
    Q(λ) = R(λ)(λI − B) + Q₀,
where M₀, Q₀ are independent of λ. Then, substituting in the previous equation, we obtain
    ((λI − A)S(λ) + M₀)(λI − B) = (λI − A)(R(λ)(λI − B) + Q₀),
whence
    (λI − A){S(λ) − R(λ)}(λI − B) = (λI − A)Q₀ − M₀(λI − B).
But this is an identity between matrix polynomials, and since the degree of the polynomial on the right is 1, it follows that S(λ) = R(λ). (Otherwise the degree of the matrix polynomial on the left is at least 2.) Hence,
    M₀(λI − B) = (λI − A)Q₀,      (2)
so that
equivalence of λI − C₁ and A(λ); hence 1, ..., 1, i_s(λ), ..., i_n(λ) are the invariant polynomials of both λI − C₁ and λI − A. ∎
Exercise 1. Check that the first natural normal forms of the matrices
    A₁ = [  3  −1   0 ]         A₂ = [  6   2   2 ]
         [ −1   3   0 ] ,            [ −2   2   0 ]
         [  1  −1   2 ]              [  0   0   2 ]
are, respectively,
    C₁⁽¹⁾ = [ 2   0   0 ]         C₁⁽²⁾ = [  0    1    0 ]
            [ 0   0   1 ] ,               [  0    0    1 ] . □
            [ 0  −8   6 ]                 [ 32  −32   10 ]
where k₁ ≠ 0; λ₁, λ₂, ..., λ_s are the distinct latent roots of A(λ); and m_j ≥ 1 for each j. From the Smith canonical form we deduce that
    i_j(λ) = ∏_{k=1}^{s} (λ − λ_k)^{α_{jk}},   j = 1, 2, ..., r,      (1)
and
    ∑_{j=1}^{r} α_{jk} = m_k.
Each factor (λ − λ_k)^{α_{jk}} appearing in the factorizations (1) with α_{jk} > 0 is called an elementary divisor of A(λ). An elementary divisor for which α_{jk} = 1 is said to be linear; otherwise it is nonlinear. We may also refer to the elementary divisors (λ − λ_k)^{α_{jk}} as those associated with λ_k, with the obvious meaning. Note that (over ℂ) all elementary divisors are powers of a linear polynomial.
Exercise 1. Find the elementary divisors of the matrix polynomial defined in Exercise 7.5.2(b).
Answer. λ, λ, λ⁴, λ − 1. □
Note that an elementary divisor of a matrix polynomial may appear several times in the set of invariant polynomials of the matrix polynomial. Also, the factorizations (1) show that the system of all elementary divisors (along with the rank and order) of a matrix polynomial completely defines the set of its invariant polynomials and vice versa.
It therefore follows from Corollary 1 of Theorem 7.5.1 that the elementary divisors are invariant under equivalence transformations of the matrix polynomial.
Theorem 1. Two complex matrix polynomials are equivalent if and only if they have the same elementary divisors.
Exercise 2. Determine the invariant polynomials of a matrix polynomial of order 5 having rank 4 and elementary divisors λ, λ, λ³, λ − 2, (λ − 2)², λ + 5.
SOLUTION. Since the order of the matrix polynomial is 5, there are five invariant polynomials i_k(λ), k = 1, 2, ..., 5. And since the rank is 4, i₅(λ) = d₅(λ)/d₄(λ) ≡ 0. Now recall that the invariant polynomial i_j(λ) is divisible by i_{j−1}(λ) for j ≥ 2. Hence, i₄(λ) must contain elementary divisors associated with all eigenvalues of the matrix polynomial and those elementary divisors must be of the highest degrees. Thus, i₄(λ) = λ³(λ − 2)²(λ + 5). Applying the same argument to i₃(λ) and the remaining elementary divisors, we obtain i₃(λ) = λ(λ − 2) and then i₂(λ) = λ. All elementary divisors are now exhausted and hence i₁(λ) = 1. □
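The grouping argument in this solution is mechanical enough to script. The sketch below (our own helper, assuming sympy) distributes a given multiset of elementary divisors into invariant polynomials, assigning to i_r, i_{r−1}, ... the highest remaining power for each eigenvalue, and reproduces the answer of Exercise 2.

```python
import sympy as sp
from collections import defaultdict

lam = sp.symbols('lambda')

def invariants_from_elementary_divisors(order, rank, divisors):
    # divisors: list of (eigenvalue, exponent) pairs, e.g. (0, 3) for lambda**3.
    by_eig = defaultdict(list)
    for mu, a in divisors:
        by_eig[mu].append(a)
    for mu in by_eig:
        by_eig[mu].sort(reverse=True)               # largest exponents first
    # i_1, ..., i_rank start at 1; the remaining diagonal entries are 0.
    inv = [sp.Integer(1)] * rank + [sp.Integer(0)] * (order - rank)
    for j in range(rank - 1, -1, -1):               # fill i_rank, then i_{rank-1}, ...
        p = sp.Integer(1)
        for mu, exps in by_eig.items():
            if exps:
                p *= (lam - mu) ** exps.pop(0)
        inv[j] = p
    return inv

# Order 5, rank 4, elementary divisors lambda, lambda, lambda**3,
# (lambda - 2), (lambda - 2)**2, (lambda + 5):
divs = [(0, 1), (0, 1), (0, 3), (2, 1), (2, 2), (-5, 1)]
print(invariants_from_elementary_divisors(5, 4, divs))
# [1, lambda, lambda*(lambda - 2), lambda**3*(lambda - 2)**2*(lambda + 5), 0]
```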
The possibility of describing an equivalence class of matrix polynomials
in terms of the set of elementary divisors permits a reformulation of the
statement of Theorem 7.6.1.
Proposition 1. Two n × n matrices A and B with elements from ℂ are similar if and only if λI − A and λI − B have the same elementary divisors.
Before proceeding to the general reduction of matrices in ℂⁿˣⁿ, there is one more detail to be cleared up.
Theorem 2. If A(λ), B(λ) are matrix polynomials over ℂ, then the set of elementary divisors of the block-diagonal matrix
    C(λ) = [ A(λ)    0   ]
           [   0   B(λ)  ]
is the union of the sets of elementary divisors of A(λ) and B(λ).
PROOF. Let D₁(λ) and D₂(λ) be the Smith forms of A(λ) and B(λ), respectively. Then clearly
or, equivalently,
(b) ll = 1, lz = 21, 1p = pl, Ç+c = r, i Z 1,
where p = [r/1], the integer part of ill.
(c) The elementary divisors of AI A are —
kp_1
Hint. For part (c) use part (b) and Exercise 6.4.2 to obtain k 1 = . • •
= 0, kp = I q, 4 +1 = q (in the notation of Exercise 6.4.2). Also,
—
_
observe that in view of Exercise 4, the number k," gives the number of ele-
mentary divisors (A — A 0)" of A.
with A. D
7.8 The Second Normal Form and the Jordan Normal Form
Let A ∈ ℂⁿˣⁿ and let l₁(λ), l₂(λ), ..., l_p(λ) denote the elementary divisors of the matrix λI − A.
Theorem 1 (The second natural normal form). With the notation of the previous paragraph, the matrix A is similar to the block diagonal matrix
Exercise 1. Show that the first and second natural normal forms of the matrix
    [ 3  −1  −5   1 ]
    [ 1   1  −1   0 ]
    [ 0   0  −2  −1 ]
    [ 0   0   1   0 ]
are, respectively, the matrices
    [  0   1   0   0 ]         [  0   1   0   0 ]
    [  0   0   1   0 ]         [ −1  −2   0   0 ]
    [  0   0   0   1 ]   and   [  0   0   0   1 ] . □
    [ −4  −4   3   2 ]         [  0   0  −4   4 ]
Note that in contrast to the first natural normal form, the second normal form is defined only up to the order of companion matrices associated with the elementary divisors. For instance, the matrix
    [  0   1   0   0 ]
    [ −4   4   0   0 ]
    [  0   0   0   1 ]
    [  0   0  −1  −2 ]
is also a second natural normal form of the matrix in Exercise 1.
In the statement of Theorem 1, each elementary divisor has the form l_j(λ) = (λ − λ_j)^{α_j}. Observe also that there may be repetitions in the eigenvalues λ₁, λ₂, ..., λ_t, and that ∑_{j=1}^{t} α_j = n.
Now we apply Theorem 1 to meet one of the declared objectives of this
chapter, namely, the reduction of a matrix to the Jordan normal form.
Theorem 2 (Jordan normal form). If A E C" x and .11 — A has t elementary
"
Conversely, if all the elementary divisors of AI A are linear, then all the
—
    C(λ) ≜ (1/δ(λ)) B(λ),
known as the reduced adjoint of A, and observe that the elements of C(λ) are relatively prime. In view of Eq. (7.2.1) we have c(λ)I = δ(λ)(λI − A)C(λ). This equation implies that c(λ) is divisible by δ(λ) and, since both of them are monic, there exists a monic polynomial m̂(λ) such that c(λ) = δ(λ)m̂(λ). Thus
    m̂(λ)I = (λI − A)C(λ).      (2)
Our aim is to show that m̂(λ) = m(λ), that is, that m̂(λ) is the minimal polynomial of A. Indeed, the relation (2) shows that m̂(λ)I is divisible by λI − A and therefore, by the corollary to Theorem 7.2.3, m̂(A) = 0. Thus, m̂(λ) is an annihilating polynomial of A and it remains to check that it is an annihilating polynomial for A of the least degree. To see this, suppose that m̂(λ) = θ(λ)m(λ). Since m(A) = 0, the same corollary yields
    m(λ)I = (λI − A)Ĉ(λ)
for some matrix polynomial Ĉ(λ), and hence
    m̂(λ)I = (λI − A)θ(λ)Ĉ(λ).
Comparing this with Eq. (2) and recalling the uniqueness of the left quotient, we deduce that C(λ) = θ(λ)Ĉ(λ). But this would mean (if θ(λ) ≢ 1) that θ(λ) is a common divisor of the elements of C(λ), which contradicts the definition of δ(λ). Thus θ(λ) = 1 and m̂(λ) = m(λ). ∎
This result implies another useful description of the minimal polynomial.
Theorem 2. The minimal polynomial of a matrix A ∈ ℱⁿˣⁿ coincides with the invariant polynomial of λI − A of highest degree.
PROOF. Note that λI − A has rank n and that by Proposition 7.3.1 and Theorem 7.5.1,
    det(λI − A) = a i₁(λ)i₂(λ) ⋯ iₙ(λ),      (3)
in which none of i₁(λ), ..., iₙ(λ) is identically zero.
Also, a = 1 since the invariant polynomials, as well as det(λI − A), are monic. Recalling Eq. (7.5.3), we deduce from Eq. (3) that
    c(λ) = det(λI − A) = d_{n−1}(λ)iₙ(λ),      (4)
in which d_{n−1}(λ) is the (monic) greatest common divisor of the minors of λI − A of order n − 1 or, equivalently, the greatest common divisor of the elements of the adjoint matrix B(λ). Thus, d_{n−1}(λ) = δ(λ), where δ(λ) is defined in the proof of Theorem 1. This yields iₙ(λ) = m(λ) on comparison of Eqs. (4) and (1). ∎
Now we can give an independent proof of the result obtained in Proposition 6.6.2.
Corollary 1. A matrix A ∈ ℂⁿˣⁿ is simple if and only if its minimal polynomial has only simple zeros.
PROOF. By Corollary 7.8.1, A is simple if and only if all the elementary divisors of λI − A are linear, that is, if and only if the invariant polynomial of highest degree is ∏_{k=1}^{s} (λ − λ_k), where λ₁, λ₂, ..., λ_s are the distinct eigenvalues of A.
zeros, then .11 — A has at least two elementary divisors. Theorem 7.8.1 then
states that again A is decomposable.
Conversely, if A is nonderogatory and c(A) = (A — A a)", then A must be
indecomposable. Indeed, these conditions imply, repeating the previous
argument, that AI — A has only one elementary divisor. Since a matrix B is
decomposable only if the matrix Al — B has at least two elementary divisors,
the result follows. U
Corollary 1. The matrices C1k (k = 1, 2, ... , s) in Eq. (7.8.1) are indecom-
posable.
The proof relies on the fact that a companion matrix is nonderogatory
(Exercise 6.6.3) and on the form of its characteristic polynomial (see Exercise
4.11.3).
Corollary 2. Jordan blocks are indecomposable matrices.
To prove this see Exercises 6.6.2 and 7.7.4. Thus, these forms cannot be reduced further and are the best in this sense.
Exercise 1. The proof of Theorem 1 above [see especially Eq. (2)] shows that the reduced adjoint matrix C(λ) satisfies
    (λI − A)C(λ) = m(λ)I.
Use this relation to show that if λ₀ is an eigenvalue of A, then all nonzero columns of C(λ₀) are eigenvectors of A corresponding to λ₀. Obtain an analogous result for the adjoint matrix B(λ₀). □
for certain matrices L₀, L₁, ..., L_l ∈ ℂⁿˣⁿ, where L_l ≠ 0 and the indices on the vector x(t) denote componentwise derivatives.
Let 𝒞(ℂⁿ) be the set of all continuous vector-valued functions y(t) defined for all real t with values in ℂⁿ. Thus y(t) = [y₁(t) ⋯ yₙ(t)]ᵀ and y₁(t), ..., yₙ(t) are continuous for all real t. With the natural definitions of vector addition and scalar multiplication, 𝒞(ℂⁿ) is obviously an (infinite-dimensional) linear space.
Here and in subsequent chapters we will investigate the nature of the solution space of Eq. (1), and our first observation is that every solution is a member of 𝒞(ℂⁿ). Then it is easily verified that the solution set is, in fact, a linear subspace of 𝒞(ℂⁿ). In this section, we examine the dimension of this subspace.
The matrix polynomial L(λ) = ∑_{j=0}^{l} λʲL_j is associated with Eq. (1), which can be abbreviated to the form L(d/dt)x(t) = 0. Thus,
    L(d/dt)x(t) ≜ ∑_{j=0}^{l} L_j x^{(j)}(t) = 0.
To illustrate, if
L(2) = [A d: 1 ^12 ]
and x(t) — x2{t
) E
^(C2),
L 02x+2(xt)v)(01.
[xv)(xtr(t)x+
d x(t) _
(dc)
L(1) = Li(A)L2(Z)•
Show that
I f dt l (L2 (d )x(t)l. ❑
L(dt)x(t) L
With these conventions and the Smith normal form, we can now establish
the dimension of the solution space of Eq. (1) as a subspace of (C"). This
result is sometimes known as Chrystal's theorem. The proof is based on the
well-known fact that a scalar differential equation of the form (1) with order
I has a solution space of dimension I.
for some unimodular matrix polynomials P(λ) and Q(λ). Applying the result of Exercise 1, we may rewrite Eq. (1) in the form
    P(d/dt) D(d/dt) Q(d/dt) x(t) = 0.      (3)
Let y(t) = Q(d/dt)x(t) and multiply Eq. (3) on the left by P⁻¹(d/dt) (recall that P⁻¹(λ) is a polynomial) to obtain the system
    diag[i₁(d/dt), i₂(d/dt), ..., iₙ(d/dt)] y(t) = 0,      (4)
which is equivalent to Eq. (1). Clearly, the system in (4) splits into n independent scalar equations
The number of linearly independent solutions satisfying Eq. (4) is equal to the sum of the numbers of linearly independent solutions of the system (5). Thus the dimension of the solution space of Eq. (3) is d₁ + d₂ + ⋯ + dₙ, where d_k = deg i_k(λ), 1 ≤ k ≤ n. On the other hand, Eq. (2) shows that
    det L(λ) = a det D(λ) = a ∏_{k=1}^{n} i_k(λ),
    x(t) = Q⁻¹(d/dt) y(t)
now gives the required result. ∎
t Trans. Roy. Soc. Edin. 38 (1895), 163.
Corollary 1. If A ∈ ℂⁿˣⁿ, then the solution space of ẋ(t) = Ax(t) has dimension n.
Theorem 2. If det L(λ) ≢ 0, then the dimension of the solution space of Eq. (6) is equal to the degree of det L(λ).
PROOF. Using the decomposition (2), reduce Eq. (7) to the equivalent form
    D(E)y = 0,      (8)
where y = Q(E)x and 0, x, y ∈ 𝒮(ℂⁿ). Observe that Eq. (8) splits into n independent scalar equations. Since the assertion of the theorem holds true for scalar difference equations, it remains to apply the argument used in the proof of Theorem 1. ∎
Corollary 2. If A ∈ ℂⁿˣⁿ, then the solution space of the recurrence relation
    x_{j+1} = Ax_j,   j = 0, 1, 2, ...,
has dimension n.
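Chrystal's theorem reduces a dimension count to a degree count. The toy computation below (a hypothetical 2 × 2 second-order system, assuming sympy) evaluates deg det L(λ); by Theorem 1 that degree is the dimension of the solution space, and it can be smaller than nl when the leading coefficient is singular.

```python
import sympy as sp

lam = sp.symbols('lambda')

# L(lambda) = L0 + L1*lambda + L2*lambda**2 with a singular leading coefficient.
L0 = sp.Matrix([[2, 0], [0, 3]])
L1 = sp.Matrix([[0, 1], [1, 0]])
L2 = sp.Matrix([[1, 0], [0, 0]])
L = L0 + L1 * lam + L2 * lam**2

detL = sp.expand(L.det())
print(detL)                          # 2*lambda**2 + 6
print(sp.Poly(detL, lam).degree())   # 2: the solution space of L(d/dt)x = 0
                                     # has dimension 2, not n*l = 4
```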
0
0 0 a„,(A) 0 0
if m5 n,or
a l(A) 0 0
0
0
a"(A)
0
0 0
if m > n, where a l(A) is divisible by ,(A), j 2.
4. Show that if A e Qï""" is idempotent and A # I, A s O, then the minimal
polynomial of A is A 2 — A and the characteristic polynomial has the form
(A — 1)'A'. Prove that A is similar to the matrix diag[I„ O].
S. Prove that
(a) The geometric multiplicity of an eigenvalue is equal to the number
of elementary divisors associated with it;
(b) The algebraic multiplicity of an eigenvalue is equal to the sum of
degrees of all elementary divisors associated with it.
6. Define the multiplicity of a latent root Ao of A(A) as the number of times
the factor A — Ao appears in the factorization of det A(A) into linear
factors.
Let A(A) be a monic n x n matrix polynomial of degree I. Show that
A(t) has In latent roots (counted according to their multiplicities), and
that if the in latent roots are distinct, then all the elementary divisors oe-
A(A) are linear.
7. If Ao is a latent root of multiplicity a of the n x n matrix polynomial A(A),
prove that the elementary divisors of A(A) associated with Ao are all
linear if and only if dim(Ker A(A0)) = a.
8. If m(A) is the minimal polynomial of A e C"" and f(A) is a polynomial
over C, prove that f (A) is nonsingular if and only if m(A) and f(A) are
relatively prime.
9. Let A(λ) be a matrix polynomial and define a vector x ≠ 0 to be a latent vector of A(λ) associated with λ₀ if A(λ₀)x = 0.
If A(λ) = A₀ + λA₁ + λ²A₂ (det A₂ ≠ 0) is an n × n matrix polynomial, prove that x₀ is a latent vector of A(λ) associated with λ₀ if and only if
    ( λ₀ [ A₁  A₂ ] + [ A₀    0  ] ) [  x₀   ] = 0.
         [ A₂  0  ]   [  0  −A₂  ]   [ λ₀x₀  ]
Generalise this result for matrix polynomials with invertible leading
coefficients of general degree.
10. Let A(A) be an n x n matrix polynomial with latent roots A 1, A2 , ... , A„
and suppose that there exist linearly independent latent vectors
X1 i x2 , • • • , x„ of A(A) associated with A l, A2 , ... , A„, respectively. Prove
that if X = [x 1 x2 x„] and D = diag[A 1, A2 , ... , A„7, then
S = XDX" is a right solvent of A(A).
11. Let the matrix A ∈ ℂⁿˣⁿ have distinct eigenvalues λ₁, λ₂, ..., λ_s and suppose that the maximal degree of the elementary divisors associated with λ_k (1 ≤ k ≤ s) is m_k (the index of λ_k). Prove that
    ℂⁿ = ∑_{k=1}^{s} ⊕ Ker(λ_kI − A)^{m_k}.
12. Prove that if X e C"X" is a solvent of the n x n matrix polynomial A(A),
then each eigenvalue of X is a latent root of A(A).
Hint. Use the Corollary to Theorem 7.2.3.
13. Consider a matrix polynomial A(A) = n=o VA with det Ai # 0, and
define its companion matrix
0 I" 0 0
0 0 1"
CA 0 ,
0 0 0 1„
—Â 0 —Â 1 ... —Ai- 1
For normal matrices, the geometry of the field of values is now easily
characterized. The notion of the convex hull appearing in the next theorem
is defined and discussed in Appendix 2.
Theorem 2. The field of values of a normal matrix A ∈ ℂⁿˣⁿ coincides with the convex hull of its eigenvalues.
PROOF. If A is normal and its eigenvalues are λ₁, λ₂, ..., λₙ, then by Theorem 5.2.1 there is a unitary n × n matrix U such that A = UDU*, where D = diag[λ₁, λ₂, ..., λₙ]. Theorem 1 now asserts that F(A) = F(D) and so it suffices to show that the field of values of the diagonal matrix D coincides with the convex hull of the entries on its main diagonal.
Indeed, F(D) is the set of numbers
    α = (Dx, x) = ∑_{i=1}^{n} λ_i |x_i|²,
for x such that (x, x) = 1. Hence, C = 0 (see Exercise 3.13.2) and A = A*, that is, A is Hermitian. ∎
Corollary 1. If H is Hermitian with the eigenvalues λ₁ ≤ λ₂ ≤ ⋯ ≤ λₙ, then
    F(H) = [λ₁, λₙ].      (1)
Conversely, if F(A) = [λ₁, λₙ], then A is Hermitian and λ₁, λₙ are the minimal and maximal eigenvalues of A, respectively.
This result is an immediate consequence of Theorems 2 and 3.
Recalling the definition of F(H), we deduce from Eq. (1) the existence of vectors x₁ and xₙ on the unit sphere such that (Hx₁, x₁) = λ₁ and (Hxₙ, xₙ) = λₙ. Obviously,
    λ₁ = (Hx₁, x₁) = min_{(x, x)=1} (Hx, x),      (2)
and
    λₙ = (Hxₙ, xₙ) = max_{(x, x)=1} (Hx, x),      (3)
and it will now be shown that the other eigenvalues of H can be characterized in a way that generalizes statement (1).
Theorem 1. Let R(x) be the Rayleigh quotient defined by a Hermitian matrix H, and let the subspaces 𝒱₁, ..., 𝒱ₙ (associated with H) be as defined
    R(x) = ( ∑_{j=i}^{n} λ_j |α_j|² ) / ( ∑_{j=i}^{n} |α_j|² ).      (5)
Since λ_i ≤ λ_{i+1} ≤ ⋯ ≤ λₙ, it is clear that R(x) ≥ λ_i for all α_i, α_{i+1}, ..., αₙ (not all zero, and hence for all nonzero x ∈ 𝒱_i) and that R(x) = λ_i when α_i ≠ 0 and α_{i+1} = α_{i+2} = ⋯ = αₙ = 0. Thus relation (2) is established. Note that the minimizing vector in 𝒱_i is just α_i x_i, an eigenvector associated with λ_i. ∎
The following dual result is proved similarly.
Exercise 1. In the notation of Theorem 1, prove that
A, = max
R(x), i = 1, 2, ... , n,
_
0 *ate Y„
The result of Exercise 1, like that of Theorem 1, suffers from the disad•
vantage that A, (j > 1) is characterized with the aid of the eigenvectors
, x Thus in employing Theorem 1 we must compute the eigen-
values in nondecreasing order together with orthonormal bases for their
eigenspaces. The next theorem is a remarkable result allowing us to charac-
terize each eigenvalue with no reference to other eigenvalues or eigenvectors.
This is the famous Courant-Fischer theorem, which originated with E.
Fischert in 1905 but was rediscovered by R. Courant* in 1920. It was Courant
who saw, and made clear, the great importance of the result for problems in
mathematical physics. We start with a preliminary result.
Proposition 1. Let Soj (1 S j S n) denote an arbitrary (n — j + 1)-dimen-
sional subspace of C", and let H be a Hermitian matrix with eigenvalues 21 5 .
Thus obviously
1 1
Al E Iak I2 5 (Hxo, xo) 5 2j E Iak12
k= 1 k=1
In other words, Eq. (8) says that by taking the minimum of R(x) over all
nonzero vectors from an (n — j + 1)-dimensional subspace of C", and then
the maximum of these minima over all subspaces of this dimension in C", we
obtain the jth eigenvalue of H.
PROOF . To establish Eq. (8) it suffices, in view of the first statement of Eq.
(6), to show the existence of an (n — j + 1)-dimensional subspace such that
min R(x), where x varies over all nonzero vectors of this subspace, is equal
to Ai . Taking (gg = span{xj, xj+1, ..., x"}, we obtain such a subspace. In
fact, by (6),
min R(x) S A j,
0*xe if/
while by Theorem 1,
min R(x) = AJ.
0*xE Vi
This means that the equality (8) holds. Equation (9) is established similarly.
•
Exercise 2. Check that for H ∈ ℝ³ˣ³, H > 0, and j = 2, the geometrical interpretation of Eq. (8) (as developed in Exercises 5.10.5 and 5.10.6) is the following: the length of the major axis of any central cross section of an ellipsoid (which is an ellipse) is not less than the length of the second axis of the ellipsoid, and there is a cross section with the major axis equal to the second axis of the ellipsoid. □
those nonzero members of C" that satisfy an equation (or constraint) of the
form a*x = 0 where a* _ [a l a2 a"], or
a1x1 + a2 x2 +•••+ a"x" = 0,
where not all the aj are zero. Clearly, if aj # 0 we can use the constraint to
express xj as a linear combination of the other x's, substitute it into R(x),
and obtain a new quotient in n — 1 variables whose stationary values
(and n — 1 associated eigenvalues) can be examined directly without
reference to constraints. As it happens, this obvious line of attack is not the
most profitable and if we already know the properties of the unconstrained
system, we can say quite a lot about the constrained one. We shall call
eigenvalues and eigenvectors of a problem with constraints constrained
eigenvalues and constrained eigenvectors.
More generally, consider the eigenvalue problem subject to r constraints:
(x, ar) = a; x = 0, i = 1, 2, ... , r, (1)
wherea 1, a 2 , ... , a, are linearly independent vectors in C". Given a Hermitian
matrix H e C" x ", we wish to find the constrained eigenvalues and constrained
eigenvectors of H, that is, the scalars 2 e C and vectors x e C" such that
Hx = Ax, x # 0, and (x, a) = at x = 0, i = 1, 2, ... , r. (2)
Let 𝒜_r be the subspace generated by a₁, a₂, ..., a_r appearing in Eq. (1) and let 𝒞_{n−r} denote the orthogonal complement to 𝒜_r in ℂⁿ:
    𝒜_r ⊕ 𝒞_{n−r} = ℂⁿ.
If b₁, b₂, ..., b_{n−r} is an orthonormal basis in 𝒞_{n−r} and
    B = [b₁ b₂ ⋯ b_{n−r}],
then, obviously, B ∈ ℂ^{n×(n−r)}, B*B = I_{n−r}, and BB* is an orthogonal projector onto 𝒞_{n−r} (see Section 5.8).
The vector x ∈ ℂⁿ satisfies the constraints of Eq. (1) if and only if x ∈ 𝒞_{n−r}, that is, BB*x = x, and the vector y = B*x ∈ ℂ^{n−r} satisfies x = By. So, for x ≠ 0,
    R_H(x) = (Hx, x)/(x, x) = (HBy, By)/(By, By) = (B*HBy, y)/(B*By, y).
Since B*B = I_{n−r}, we obtain
    R_H(x) = R_{B*HB}(y)      (3)
and the problem in Eq. (2) is reduced to the standard form of finding the stationary values of the Rayleigh quotient for the (n − r) × (n − r) Hermitian matrix B*HB (Section 8.3). We have seen that these occur only at the eigenvalues of B*HB. Thus λ is a constrained eigenvalue of H with associated
although Rayleigh himself attributes it to Routh and it had been used even earlier by Cauchy. It is easily proved with the aid of the Courant-Fischer theorem.
Theorem 1. If λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} are the eigenvalues of the eigenvalue problem (2) for a Hermitian H ∈ ℂⁿˣⁿ with r constraints, then
    μ_i ≤ λ_i ≤ μ_{i+r},   i = 1, 2, ..., n − r.
PROOF. First recall the definition of the space 𝒳_i in Proposition 1 and observe that dim 𝒳_i = n − r − i + 1. It follows from Theorem 8.2.2 that
    μ_{i+r} = max_{𝒮_{i+r}} min_{0≠x∈𝒮_{i+r}} R(x) ≥ min_{0≠x∈𝒳_i} R(x).
By virtue of Proposition 1,
    μ_{i+r} ≥ min_{0≠x∈𝒳_i} R(x) = λ_i,
and the first inequality is established.
For the other inequality we consider the eigenvalues ν₁ ≤ ν₂ ≤ ⋯ ≤ νₙ of the matrix −H. Obviously, ν_i = −μ_{n+1−i}. Similarly, let the eigenvalues of −H under the given r constraints in (1) be ω₁ ≤ ω₂ ≤ ⋯ ≤ ω_{n−r}, where ω_i = −λ_{n−r+1−i} for i = 1, 2, ..., n − r.
Applying the result just proved for H to the matrix −H, we obtain
Applying the result just proved for H to the matrix —H, we obtain
V i+r
Note also that the eigenvalue problem for H with linear constraints (1) can be interpreted as an unconstrained problem for a leading principal submatrix of a matrix congruent to H.
More precisely, let H ∈ ℂⁿˣⁿ be a Hermitian matrix generating the Rayleigh quotient R(x) and let B denote the matrix introduced at the beginning of the section, that is, B = [b₁ b₂ ⋯ b_{n−r}], where {b_i}_{i=1}^{n−r} forms an orthonormal basis in 𝒞_{n−r}, the orthogonal complement of 𝒜_r in ℂⁿ. Consider the matrix C = [B A], where the columns of A form an orthonormal basis in 𝒜_r, and note that C is unitary and, therefore, the matrix C*HC has the same eigenvalues as H. Since
    C*HC = [B A]* H [B A] = [ B*HB   B*HA ]
                            [ A*HB   A*HA ] ,
    μ_i ≤ λ_i ≤ μ_{i+r}.      (1)
PROOF. If H_{n−r} is not the (n − r) × (n − r) leading principal submatrix of H, we apply a permutation matrix P so that it is in this position in the transformed matrix PᵀHP. Since P⁻¹ = Pᵀ, the matrix PᵀHP has the same spectrum as H, so without loss of generality we may assume that H_{n−r} is the (n − r) × (n − r) leading principal submatrix of H.
In view of the concluding remarks of the previous section, the eigenvalues of H_{n−r} are those of an eigenvalue problem for CHC* with r constraints, where C is some unitary matrix. This also follows from the easily checked relation B*(CHC*)B = H_{n−r}. Appealing to Theorem 8.4.1, we obtain
    μ̃_i ≤ λ_i ≤ μ̃_{i+r},   i = 1, 2, ..., n − r,
where μ̃₁, μ̃₂, ..., μ̃ₙ are the eigenvalues of CHC*. It remains now to observe that, because C is unitary, μ̃_j = μ_j for j = 1, 2, ..., n, and this completes the proof. ∎
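A quick numerical sanity check of these interlacing inequalities (not from the book; it assumes numpy): delete the last r rows and columns of a random Hermitian H and compare the sorted eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2
X = rng.standard_normal((n, n))
H = (X + X.T) / 2                        # random real symmetric (Hermitian) matrix
Hsub = H[:n - r, :n - r]                 # leading principal submatrix of order n - r

mu = np.sort(np.linalg.eigvalsh(H))      # mu_1 <= ... <= mu_n
lam = np.sort(np.linalg.eigvalsh(Hsub))  # lambda_1 <= ... <= lambda_{n-r}

# Check mu_i <= lambda_i <= mu_{i+r} for i = 1, ..., n - r  (Eq. (1))
print(all(mu[i] <= lam[i] <= mu[i + r] for i in range(n - r)))   # True
```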
Exercise 1. All principal minors of a positive definite matrix are positive
By Theorem 8.5.1, Ask) 5 4:1) s 4,1:)., 1 and two cases can occur:
Ant +1 < 0 or A V > 0.
The first inequality is impossible in the case dk > 0 and dk+ i > 0 presently
being considered. In fact, if
A(ik+1) 5AP5•••5A11) s, +1 ) <0<A;„+15•••5 ,{kk+11),
we have
sign dk+1 = sign(At+1W+1)... Xk+i)) _ ( - 1r+1,
sign dk = sign(04 . . . Ask)) = ( - 1)'",
a contradiction with the assumption sign dk + 1 = sign dk = 1. Thus, A„k+ i ) > 0
and Hk+ 1 has one more positive eigenvalue than Hk.
Other cases are proved similarly to complete an inductive proof of the
theorem. •
Exercise 1. Consider the matrix
    H = [ 2    ½    ½ ]
        [ ½    0    0 ]
        [ ½    0   −1 ]
associated with the real Hermitian form discussed in Exercise 5.11.2(a). Since
    d₁ = 2,   d₂ = det [ 2  ½ ] = −¼,   d₃ = det H = ¼,
                       [ ½  0 ]
the number of alterations of sign in the sequence 1, 2, −¼, ¼ or, what is equivalent, in the sequence of signs +, +, −, + is equal to 2, while the number of constancies of sign is 1. Thus, π(H) = 1, ν(H) = 2, and the first canonical form of the corresponding Hermitian form contains one positive and two negative squares. □
It is clear that this method of finding the inertia (if it works) is simpler
than the Lagrange method described in Section 5.11. However, the latter
also supplies the transforming matrix.
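When all leading principal minors are nonzero, the sign-counting rule is a few lines of code. The sketch below (our own example matrix, assuming numpy) counts constancies and alterations of sign in 1, d₁, ..., dₙ and compares the result with the signs of the eigenvalues.

```python
import numpy as np

H = np.array([[2.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, -1.0]])            # hypothetical real symmetric matrix
n = H.shape[0]
d = [1.0] + [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]

pi = sum(d[k - 1] * d[k] > 0 for k in range(1, n + 1))   # constancies -> pi(H)
nu = sum(d[k - 1] * d[k] < 0 for k in range(1, n + 1))   # alterations -> nu(H)
eigs = np.linalg.eigvalsh(H)
print(pi, nu, int((eigs > 0).sum()), int((eigs < 0).sum()))   # 1 2 1 2
```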
(between d_k and d_{k+1}) according as H_{k+1} has one more negative or positive eigenvalue than H_k.
Suppose that d_{k+1} = det H_{k+1} = 0. Then at least one eigenvalue λ_s^(k+1) must be zero. Since d_k ≠ 0 and d_{k+2} ≠ 0, there is exactly one zero eigenvalue of H_{k+1}. Indeed, if λ_i^(k+1) = λ_s^(k+1) = 0 (i < s), then by applying Theorem 8.5.1 there are zero eigenvalues of H_k and H_{k+2}, which yields d_k = d_{k+2} = 0.
Let λ_{m+1}^(k+1) = 0 (1 ≤ m ≤ k − 1). Then it follows from
    λ_m^(k) ≤ λ_{m+1}^(k+1) = 0 ≤ λ_{m+1}^(k)
and
    λ_{m+1}^(k+2) ≤ λ_{m+1}^(k+1) = 0 ≤ λ_{m+2}^(k+2)      (2)
that the number of negative eigenvalues of H_k is m and the number of negative eigenvalues of H_{k+2} is m + 1. Hence sign d_k = (−1)^m, sign d_{k+2} = (−1)^{m+1}, and therefore d_k d_{k+2} < 0.
If λ_1^(k+1) = 0, then all eigenvalues of H_k are positive and so sign d_k = 1. Also, we may put m = 0 in (2) and find that sign d_{k+2} = −1. Thus d_k d_{k+2} < 0 once more. A similar argument gives the same inequality when λ_{k+1}^(k+1) = 0.
It is now clear that, by assigning any sign to d_{k+1} = 0 or omitting it, we obtain one alteration of sign from d_k to d_{k+2}. This proves the theorem. ∎
Example 2. Consider the matrix
    H = [ 0  1  0 ]
        [ 1  0  1 ]
        [ 0  1  1 ]
associated with the quadratic form in Exercise 5.11.1. Compute the number of alterations and constancies of sign in the sequence
    1,   d₁ = 0,   d₂ = −1,   d₃ = −1,
or equivalently, in the sign sequences
    +, +, −, −   or   +, −, −, −.
Using the Gundelfinger rule, we deduce ν(H) = 1, π(H) = 2. □
Theorem 3 (G. Frobenius†). If, in the notation of Theorem 1, det Hₙ ≠ 0 and the sequence (1) contains no three successive zeros, then Jacobi's method can be applied by assigning the same sign to the zeros d_{i+1} = d_{i+2} = 0 if d_i d_{i+3} < 0, and different signs if d_i d_{i+3} > 0.
PROOF. We only sketch the proof, because it relies on the same argument
as that of the previous theorem.
Let dk+1 = dk+2 = 0, dk # 0, dk+3 # 0. As before, there is exactly one
zero eigenvalue of Hk+1, say, 4+1). Similarly, it follows from ,14k + 3) # 0
(i = 1, 2, ... , k + 3) and Theorem 8.5.1 that there exists exactly one zero,
,,+12) = 0 (0 S s 5 k + 1), of Hk+2. Using Theorem 8.5.1 to compare the
eigenvalues of Hk+ 1 and Hk+ 2 , it is easily seen that either s = m or s = m + 1.
The distribution of eigenvalues in the case s = m looks as follows:
Hence the matrices Hk, Hk+1, Hk+ 2, and Hk+ 3 have, respectively, m, m, m,
and m + 1 negative eigenvalues.
Thus, for s = m we have dkdk+3 < 0 and, in this case, Hk+ 3 has one more
negative eigenvalue than Hk. Assigning to the zeros dk +1, dk+ 2 the same sign,
we obtain only one alteration of sign in the sequence 4, dk+ 1, dk+2, 4+3,
which proves the theorem in this case.
It is similarly found that if s = m + 1, then dkdk+ 3 > 0, and the assertion
of the theorem is valid. •
It should be emphasized that, as illustrated in the next exercise, a further
improvement of Jacobi's theorem is impossible. In other words, if the
sequence (1) contains three or more successive zeros, then the signs of the
leading principal minors do not uniquely determine the inertia character-
istics of the matrix.
f Siizungsber. der preuss. Akad. Wiss. (1894), 241-256 Mira, 407-431 Mai.
Hl H2 =
= [1 2j, [1 i]'
while Hi — H2 fails to be a positive semi-definite matrix. ❑
Applying the Courant—Fischer theory, a relation between the eigenvalues
of matrices H1 and H2 satisfying the condition H, z H2 can be derived.
This can also be viewed as a perturbation result: What can be said about the
perturbations of eigenvalues of H2 under addition of the positive semi-
definite matrix H 1 — H2 ?
Theorem 1. Let H₁, H₂ ∈ ℂⁿˣⁿ be Hermitian matrices for which H₁ ≥ H₂. If μ₁^(k) ≤ μ₂^(k) ≤ ⋯ ≤ μₙ^(k) denote the eigenvalues of H_k (k = 1, 2) and the rank of the matrix H = H₁ − H₂ is r, then
(a) μ_i^(1) ≥ μ_i^(2) for i = 1, 2, ..., n;
(b) μ_j^(1) ≤ μ_{j+r}^(2) for j = 1, 2, ..., n − r.
(Hx, x) = E v/ lain.
1=1
Write y y = of 2x1 for/ = 1, 2, ... , r and use the fact that a l = (x, x j) (Exercise
3.12.3) to obtain
E v;1a;1 2 = E I(x, y l )1 2,
1=1 1= 1
and note that y 1 , ... , yr are obviously linearly independent. Since H 1 -'
H2 + H, we now have
M IX, x)
(Hlx, _ (H2 x, x) + E I(x,
t®1
yt)12, (2)
and we consider the eigenvalue problems for H l and 112 both with constraints
(x, yt) = 0, i = 1, 2, ... , r. Obviously, in view of Eq. (2), the two problems};
coincide and have the same eigenvalues Al S 12 5 • • • S A„_ r . Applying
Theorem 8.4.1 to both problems, we obtain
Pr5Aj541r , k= 1,2; j= 1,2,..., n —r,
which implies the desired result:
it?) 5 A! 5 /424 •
_r'
We return now to the problem discussed in Section 5.12. In that secti
we reduced a physical vibration problem to the problem of solving the matri `
x1
Notice also that if is an eigenvector of B, then gi = A _ 112x1 is a normal-
mode vector for the vibration problem and, as in Eqs. (5.12.5),
qi Aqk = a, qi Cqk = coP ik .
On applying each theorem of this chapter to B, it is easily seen that a general-
ization is obtained that is applicable to the zeros of det(AA — C), As an
example, we merely state the generalized form of the Courant-Fischer
theorem of Section 8.2.
Theorem 1. Given any linearly independent vectors a l , a2 ,...,ai_ 1 in
R", let Y I be the subspace consisting of all vectors in 01" that are orthogonal to
aI , ... , at_ 1 . Then
xTCx
AI = max min
y , = Ey
#o
, rxT Ax
The minimum is attained when a., = qj , j = 1, 2, ... , i — 1.
The physical interpretations of Theorems 8.4.1 and 8.7.1 are more obvious
in this setting. The addition of a constraint may correspond to holding one
point of the system so that there is no vibration at this point. The perturbation
considered in Theorem 8.7.1 corresponds to an increase in the stiffness of
the system. The theorem tells us that this can only result in an increase in
J e natural frequencies.
CHAPTER 9
Functions of Matrices
    p(A) ≜ ∑_{j=0}^{l} p_j Aʲ   if   p(λ) = ∑_{j=0}^{l} p_j λʲ.
Moreover, if m(λ) is the minimal polynomial for A, then for a polynomial p(λ) there are polynomials q(λ) and r(λ) such that
    p(λ) = m(λ)q(λ) + r(λ),
where r(λ) is the zero polynomial or has degree less than that of m(λ). Thus, since m(A) = 0, we have p(A) = r(A).
The more general functions f(λ) that we shall consider will retain this property; that is to say, given f(λ) and A, there is a polynomial r(λ) (of degree less than that of the minimal polynomial of A) such that f(A) = r(A). It is surprising but true that this constraint still leaves considerable freedom in the nature of the functions f(λ) that can be accommodated.
Let A ∈ ℂⁿˣⁿ and suppose that λ₁, λ₂, ..., λ_s are the distinct eigenvalues of A, so that
    m(λ) = (λ − λ₁)^{m₁}(λ − λ₂)^{m₂} ⋯ (λ − λ_s)^{m_s}      (1)
is the minimal polynomial of A with degree m = m₁ + m₂ + ⋯ + m_s. Recall that the multiplicity m_k of λ_k as a zero of the minimal polynomial is referred to as the index of the eigenvalue λ_k (Section 6.2) and is equal to the maximal degree of the elementary divisors associated with λ_k (1 ≤ k ≤ s).
Given a function f(λ), we say that it is defined on the spectrum of A if the numbers
    f(λ_k), f′(λ_k), ..., f^{(m_k−1)}(λ_k),   k = 1, 2, ..., s,      (2)
called the values of f(λ) on the spectrum of A, exist. Clearly, every polynomial over ℂ is defined on the spectrum of any matrix from ℂⁿˣⁿ. More can be said about minimal polynomials.
Exercise 1. Check that the values of the minimal polynomial of A on the spectrum of A are all zeros. □
The next result provides a criterion for two scalar polynomials p l (A) and
to produce the same matrix pi(A) = p2(A). These matrices are defined
112 (2)
as described in Section 1.7.
Proposition 1. If p 1 (A) and p z (A) are polynomials over C and A e C"",
then p 1(A) = p 2(A) if and only if the polynomials MA) and p 2(A) have the
same values on the spectrum of A.
PROOF. If P1(A) = P2(A) and Po(A) = Pi(A) — P2(21.), then obviously, Po(A)
= 0 and hence p o(A) is an annihilating polynomial for A. Thus (Theorem
6.2.1) po(A) is divisible by the minimal polynomial m(A) of A given by (1) and
there exists a polynomial q(A) such that po(A) = q(A)m(A). Computing the
values of po(A) on the spectrum of A and using Exercise 1, it is easily seen that
R'(Ak) =
for k = 1, 2, ... , s.
mk , and a set of numbers
fk. 0 f
1,
If • •
Y 1) (Ak) =
k = 1, 2, ... , s,
fk. mk-
,_.
►►tl,^
W k(A) = I 1 (A —
J= 1 .J*k
has degree less than m and satisfies the conditions
u -1)(A1) = 0
Pk(Ai) = P,1)(Aa = ... = pk"
for i # k and arbitrary at Of
ak.1f • • • ak, mk _ L . Hence the polynomial
P^J (Ak) = i
+± (»ok»w-02k)
for 1 5 k 5 s, 0 S j 5 mk 1. Using Eqs. (3) and recalling the definition of
—
*k j =1.1*k
fork=1,2,...,s. O
Using the concepts and results of the two preceding sections, a general definition of the matrix f(A) in terms of the matrix A and the function f can be made, provided f(λ) is defined on the spectrum of A. If p(λ) is any polynomial that assumes the same values as f(λ) on the spectrum of A, we simply define f(A) ≜ p(A). Proposition 9.2.1 assures us that such a polynomial exists, while Proposition 9.1.1 shows that, for the purpose of the formal definition, the choice of p(λ) is not important. Moreover, Eq. (9.2.8) indicates how p(λ) might be chosen with the least possible degree.
Exercise 1. If A is a simple matrix with distinct eigenvalues λ₁, ..., λ_s and f(λ) is any function that is well defined at the eigenvalues of A, show that
    f(A) = ∑_{k=1}^{s} f(λ_k) [ ∏_{j=1, j≠k}^{s} (A − λ_jI) ] / [ ∏_{j=1, j≠k}^{s} (λ_k − λ_j) ].
p(A)= f(AI) A, A Z —
(A2) A—A
,
3e 10 e6 — e 10
= e6
1[3e" — 3e6 —e10 + 3e6 ]
Exercise 3. If
2 0 0
A = 71 0 1 1 ,
2
0 0 1
prove that
0 0 0
4
sin A= 4n A —
n
A2 = 0 1 0 .
0 0 1
Exercise 4. If f(λ) and g(λ) are defined on the spectrum of A and h(λ) = αf(λ) + βg(λ), where α, β ∈ ℂ, and k(λ) = f(λ)g(λ), show that h(λ) and k(λ) are defined on the spectrum of A, and
    h(A) = αf(A) + βg(A),
    k(A) = f(A)g(A) = g(A)f(A).
Exercise 5. If f₁(λ) = λ⁻¹ and det A ≠ 0, show that f₁(A) = A⁻¹. Show also that if f₂(λ) = (α − λ)⁻¹ and α ∉ σ(A), then f₂(A) = (αI − A)⁻¹.
Hint: For the first part observe that if p(λ) is an interpolatory polynomial for f₁(λ), then the values of λp(λ) and λf₁(λ) = 1 coincide on the spectrum of A and hence Ap(A) = Af₁(A) = I.
Exercise 6. Use the ideas of this section to determine a positive definite square root for a positive definite matrix (see Theorem 5.4.1). Observe that there are, in general, many square roots. □
We conclude this section with the important observation that if f(λ) is a polynomial, then the matrix f(A) determined by the definition of Section 1.7 coincides with that obtained from the more general definition introduced in this section.
we have
    f(A) = p(A) = diag[p(A₁), p(A₂), ..., p(A_t)].
Second, since the spectrum of A_j (1 ≤ j ≤ t) is obviously a subset of the spectrum of A,
Theorem 3. Let A ∈ ℂⁿˣⁿ and let J = diag[J₁, J₂, ..., J_t] be the Jordan canonical form of A, where A = PJP⁻¹ and J_i is the ith Jordan block of J. Then
    f(A) = P diag[f(J₁), f(J₂), ..., f(J_t)] P⁻¹.
For the l × l Jordan block
    J₀ = [ λ₀   1           ]
         [      λ₀   ⋱      ]
         [           ⋱   1  ]
         [              λ₀  ] ,
    f(J₀) = [ f(λ₀)  f′(λ₀)/1!   ⋯   f^{(l−1)}(λ₀)/(l−1)! ]
            [   0      f(λ₀)     ⋱         ⋮             ]      (4)
            [   ⋮                 ⋱      f′(λ₀)/1!        ]
            [   0        0       ⋯       f(λ₀)           ] .
PROOF. The minimal polynomial of J₀ is (λ − λ₀)^l (see Exercise 7.7.4) and the values of f(λ) on the spectrum of J₀ are therefore f(λ₀), f′(λ₀), ..., f^{(l−1)}(λ₀). The interpolatory polynomial p(λ), defined by the values of f(λ) on the spectrum {λ₀} of J₀, is found by putting s = 1, m_k = l, λ₁ = λ₀, and ψ₁(λ) ≡ 1 in Eqs. (9.2.2) through (9.2.4). One obtains
    p(λ) = ∑_{i=0}^{l−1} (1/i!) f^{(i)}(λ₀)(λ − λ₀)^i.
The fact that the polynomial p(λ) solves the interpolation problem p^{(j)}(λ₀) = f^{(j)}(λ₀), 0 ≤ j ≤ l − 1, can also be easily checked by a straightforward calculation.
We then have f(J₀) = p(J₀) and hence
    f(J₀) = ∑_{i=0}^{l−1} (1/i!) f^{(i)}(λ₀)(J₀ − λ₀I)^i.
    (J₀ − λ₀I)^i = the l × l matrix with ones on the ith superdiagonal and zeros elsewhere.
If λ₁, λ₂, ..., λₙ are the eigenvalues of A and f(λ) is defined on the spectrum of A, then the eigenvalues of f(A) are f(λ₁), f(λ₂), ..., f(λₙ).
Exercise 2. Check that e^A is nonsingular for any A ∈ ℂⁿˣⁿ.
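Formula (4) is easy to verify symbolically for a single Jordan block. In the sketch below (assuming sympy; the block size and eigenvalue are arbitrary choices of ours) the upper triangular Toeplitz matrix built from the derivatives of f(λ) = e^λ is compared with the matrix exponential that sympy computes directly.

```python
import sympy as sp

lam0, r = sp.Rational(3), 4
x = sp.symbols('x')

# The r x r Jordan block J0 with eigenvalue lambda0.
J0 = lam0 * sp.eye(r) + sp.Matrix(r, r, lambda i, j: 1 if j == i + 1 else 0)

# f(J0) per Eq. (4), with f = exp: entry (i, j) is f^{(j-i)}(lambda0)/(j-i)!.
F = sp.Matrix(r, r, lambda i, j:
              sp.diff(sp.exp(x), x, j - i).subs(x, lam0) / sp.factorial(j - i)
              if j >= i else 0)

print(sp.simplify(F - J0.exp()))      # the zero matrix
```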
Exercise 3. Show that, if all the eigenvalues of the matrix A ∈ ℂⁿˣⁿ lie in the open unit disk, then A + I is nonsingular.
Exercise 4. If the eigenvalues of A ∈ ℂⁿˣⁿ lie in the open unit disk, show that Re λ < 0 for any λ ∈ σ(C), where C = (A + I)⁻¹(A − I). □
    f(A) = ∑_{k=1}^{s} ∑_{j=0}^{m_k−1} f_{k,j} Z_{kj}.      (1)
Moreover, the matrices Z_{kj} are linearly independent members of ℂⁿˣⁿ and commute with A and with each other.
PROOF. Recall that the function f(λ) and the polynomial p(λ) defined by Eq. (9.2.8) take the same values on the spectrum of A. Hence f(A) = p(A) and Eq. (1) is obtained on putting Z_{kj} = φ_{kj}(A). It also follows immediately that A and all of the Z_{kj} are mutually commutative.
It remains only to prove that these matrices are linearly independent. Suppose that
    ∑_{k=1}^{s} ∑_{j=0}^{m_k−1} c_{kj} Z_{kj} = 0
    f(A) = ∑_{k=1}^{s} f_k Z_{k0},
where f_k = f_{k,0}, k = 1, 2, ..., s.
    Z_{k0} = φ_{k0}(A) = ∏_{j=1, j≠k}^{s} (A − λ_jI) / ∏_{j=1, j≠k}^{s} (λ_k − λ_j).
(a) ∑_{k=1}^{s} Z_{k0} = I;
(b) Z_{k0} Z_{l0} = 0 if k ≠ l;
PROOF. The relation (a) was established in Exercise 1. However, this and the other properties readily follow from the structure of the components Z_{kj}, which we discuss next.
PROOF. The relation (a) was established in Exercise 1. However, this and the
other properties readily follow from the structure of the components Z kj,
which we discuss next.
First observe that if A = PJP - 1 , where J = diag[J 1, J2, ... , J,] is a
Jordan form for A, then, by Theorem 9.4.5,
,
Zkj = Vkj(A) = P diag[q,,(J1), ... ,'Pkj(JJ)]p -1 (2)
where (kV () (1 S i S t, 1 S k 5 s, 0 5 j 5 mk 1) is a matrix of the form
— .
where N,,, is the ni x ni Jordan block associated with the zero eigenvalue,
and it is assumed that for j = 0 the expression on the right in Eq. (3) is the
x n, identity matrix. Note also that of S mk (see Exercises 7.7.4 and n,
7.7.9).
Case 2. The Jordan block J, (of order ni) is associated with an eigenvalue
A, # Ak. Then again n, S m, and [as in Eq. (9.2.7)] we observe that (hp),
along with its first m, — 1 derivatives, is zero at A = A,. Using (9.4.4) again
we see that
(pkj(J,) = 0. (4)
To simplify the notation, consider the case k = 1. Substituting Eqs. (3) and (4) into (2), it is easily seen that
    Z_{1j} = (1/j!) P diag[N_{n₁}^j, N_{n₂}^j, ..., N_{n_q}^j, 0, ..., 0] P⁻¹      (5)
for j = 0, 1, ..., m₁ − 1, provided J₁, J₂, ..., J_q are the only Jordan blocks of J associated with λ₁. More specifically, Eq. (5) implies
    Z₁₀ = P diag[I_r, 0] P⁻¹,      (6)
where r = n₁ + n₂ + ⋯ + n_q.
The nature of the results for general λ_k should now be quite clear, and the required statements follow. ∎
function can be represented as in Eq. (1); only the coefficients f_{k,j} differ from one function to another.
When faced with the problem of finding the matrices Z_{kj}, the obvious technique is first to find the interpolatory polynomials φ_{kj}(λ) and then to put Z_{kj} = φ_{kj}(A). However, the computation of the polynomials φ_{kj}(λ) may be a very laborious task, and the following exercise suggests how this can be avoided.
Example 3 (see Exercises 6.2.4, 6.5.2, and 9.4.1). We want to find the component matrices of the matrix
    A = [  6   2   2 ]
        [ −2   2   0 ] .
        [  0   0   2 ]
We also want to find ln A using Theorem 1. We have seen that the minimal polynomial of A is (λ − 2)(λ − 4)² and so, for any f defined on the spectrum of A, Theorem 1 gives
    f(A) = f(2)Z₁₀ + f(4)Z₂₀ + f′(4)Z₂₁.      (7)
We substitute for f(λ) three judiciously chosen linearly independent polynomials and solve the resulting simultaneous equations for Z₁₀, Z₂₀, and Z₂₁. Thus, putting f(λ) = 1, f(λ) = λ − 4, and f(λ) = (λ − 4)² in turn, we obtain
    Z₁₀ + Z₂₀ = I,
    −2Z₁₀ + Z₂₁ = A − 4I = [  2   2   2 ]
                           [ −2  −2   0 ] ,
                           [  0   0  −2 ]
    4Z₁₀ = (A − 4I)² = [ 0   0   0 ]
                       [ 0   0  −4 ] .
                       [ 0   0   4 ]
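Solving these three equations is immediate, and once the component matrices are known, any admissible f can be evaluated from Eq. (7). A short sketch follows (assuming sympy; the check with f(λ) = λ² is ours, not the book's).

```python
import sympy as sp

A = sp.Matrix([[6, 2, 2],
               [-2, 2, 0],
               [0, 0, 2]])
I = sp.eye(3)

# From the three equations above:
Z10 = (A - 4*I)**2 / 4
Z20 = I - Z10
Z21 = (A - 4*I) + 2*Z10

x = sp.symbols('x')
def f_of_A(f):
    # Eq. (7): f(A) = f(2) Z10 + f(4) Z20 + f'(4) Z21
    return sp.simplify(f.subs(x, 2)*Z10 + f.subs(x, 4)*Z20
                       + sp.diff(f, x).subs(x, 4)*Z21)

print(f_of_A(sp.log(x)))                    # ln A
print(sp.simplify(f_of_A(x**2) - A**2))     # zero matrix: degree-2 polynomials are exact
```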
    Z_{kj} = (1/j!) (A − λ_kI)^j Z_{k0}.      (9)
PROOF. Observe first that the result is trivial for j = 0. Without loss of generality, we will again suppose k = 1 and use the associated notations introduced above. Since A − λ₁I = P(J − λ₁I)P⁻¹, it follows that
    J − λ₁I = P⁻¹(A − λ₁I)P = diag[N_{n₁}, ..., N_{n_q}, J_{q+1} − λ₁I, ..., J_t − λ₁I],      (10)
where J₁, J₂, ..., J_q are the only Jordan blocks of J associated with λ₁. Multiplying Eq. (10) on the right by P⁻¹Z_{1j}P and using Eq. (5), we obtain
    j! P⁻¹(A − λ₁I)Z_{1j}P = diag[N_{n₁}^{j+1}, ..., N_{n_q}^{j+1}, 0, ..., 0].
Using Eq. (5) again, we thus have
    j! P⁻¹(A − λ₁I)Z_{1j}P = (j + 1)! P⁻¹Z_{1,j+1}P
and
    Z_{1,j+1} = (1/(j + 1)) (A − λ₁I)Z_{1j}
for j = 0, 1, ..., m₁ − 1. Applying this result repeatedly, we obtain Eq. (9) for k = 1. ∎
Theorem 3 should now be complemented with some description of the
projectors Zko , k = 1, 2, ... , s (see also Section 9.6).
Theorem 4. With the previous notation,

Im Z_{k0} = Ker(A − λ_k I)^{m_k}.
PROOF. We will again consider only the case k = 1. If J is the Jordan form of A used in the above discussion, then (see Eq. (10))

(J − λ_1 I)^{m_1} = diag[0, ..., 0, (J_{q+1} − λ_1 I)^{m_1}, ..., (J_t − λ_1 I)^{m_1}],

since n_1, n_2, ..., n_q ≤ m_1. The blocks (J_r − λ_1 I)^{m_1} are easily seen to be nonsingular for r = q + 1, ..., t, and it follows that the kernel of (J − λ_1 I)^{m_1} is spanned by the unit vectors e_1, e_2, ..., e_r, where r = n_1 + n_2 + ··· + n_q. Let S_0 denote this space.
We also have (A − λ_1 I)^{m_1} = P(J − λ_1 I)^{m_1}P^{-1}, and it follows immediately that x ∈ Ker(A − λ_1 I)^{m_1} if and only if P^{-1}x ∈ S_0.
Using Eq. (6) we see that x ∈ Im Z_{10} if and only if P^{-1}x ∈ Im diag[I_r, 0] = S_0, and hence Ker(A − λ_1 I)^{m_1} = Im Z_{10}. ■
It turns out that the result of the theorem cannot be extended to give a n .
.91 is equal to the number of Jordan blocks associated with A k having order
mk .
Exercise 8. If x, y are, respectively, right and left eigenvectors (see Section
4.10) associated with A1 , prove that Z10 x = x and yTZ10 = yT•
Exercise 9. If A is Hermitian, prove that Zko is an orthogonal projector
onto the (right) eigenspace of Ak .
Exercise 10. Consider the n × n matrix A with all diagonal elements equal to a number a and all off-diagonal elements equal to 1. Show that the eigenvalues of A are λ_1 = n + a − 1 and λ_2 = a − 1 with λ_2 repeated n − 1 times. Show also that

Z_{10} = (1/n) [ 1  1  ···  1 ;  1  1  ···  1 ;  ⋮ ;  1  1  ···  1 ],    Z_{20} = (1/n) [ n−1  −1  ···  −1 ;  −1  n−1  ···  −1 ;  ⋮ ;  −1  −1  ···  n−1 ].
It was shown in Chapter 6 (see especially Theorem 6.2.3 and Section 6.3) that, given an arbitrary matrix A ∈ C^{n×n}, the space C^n is a direct sum of the generalized eigenspaces of A. Now, Theorem 9.5.4 states that each component matrix Z_{k0} is a projector onto the generalized eigenspace associated with the eigenvalue λ_k, k = 1, 2, ..., s. Furthermore, the result of Theorem 9.5.2(a), the resolution of the identity, shows that the complementary projector I − Z_{k0} can be expressed in the form

I − Z_{k0} = Σ_{j=1, j≠k}^{s} Z_{j0}.   (1)
Thus, both Im 40 and Ker Zko are A-invariant and the following result holds.
Theorem 1. The component matrix Z_{k0} associated with an eigenvalue λ_k of A ∈ C^{n×n} is a projector onto the generalized eigenspace of λ_k and along the sum of the generalized eigenspaces associated with all eigenvalues of A different from λ_k.

The discussion of Section 4.8 shows that if we choose bases in Im Z_{k0} and Ker Z_{k0}, they will determine a basis for C^n in which A has a representation diag[A_1, A_2], where the sizes of A_1 and A_2 are just the dimensions of Im Z_{k0} and Ker Z_{k0}, respectively. In particular, if these bases are made up of Jordan chains for A, as in Section 6.4, then A_1 and A_2 both have Jordan form.
For any A ∈ C^{n×n} and any z ∉ σ(A), the function f(λ) = (z − λ)^{-1} is defined on σ(A); hence the resolvent of A, (zI − A)^{-1}, exists. Holding A fixed, the resolvent can be considered to be a function of z on any domain in C not containing points of σ(A), and this point of view plays an important role in matrix analysis. Indeed, we shall develop it further in Section 9.9 and make substantial use of it in Chapter 11, in our introduction to perturbation theory.
A representation for the resolvent was obtained in Exercise 9.3.2:

(zI − A)^{-1} = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} j! Z_{kj} / (z − λ_k)^{j+1},
and, using Eq. (1) and Exercise 9.5.6, the product of the resolvent with I − Z_{10} has Im(I − Z_{10}) as its range. Now, as z → λ_1, elements of (zI − A)^{-1} will become unbounded. But, since the limit on the right of the equation exists, we see that multiplication of the resolvent by I − Z_{10} has the effect of eliminating the singular part of (zI − A)^{-1} at z = λ_1. In other words, the restriction of the resolvent to the subspace

Im(I − Z_{10}) = Σ_{j=2}^{s} Im Z_{j0}

has no singularity at z = λ_1.
A^ e a(A) with A # Ak . Show that fk(A) = Zko and use Exercise 9.3.4 to show
that Zko is idempotent.
Exercise 2. If λ = λ_1 is an eigenvalue of A and λ_1, ..., λ_s are the distinct eigenvalues of A, show that, for any z ∉ σ(A),

(zI − A)^{-1} = Σ_{j=0}^{m_1−1} j! Z_{1j} / (z − λ_1)^{j+1} + E(z),

where E(z) is a matrix that is analytic on a neighborhood of λ_1 and Z_{11}E(z) = E(z)Z_{11} = O. Write E = E(λ_1) and use this representation to obtain Theorem 2. ❑
(Note that, in general, the cosine and sine functions must be considered as
defined on C.) ❑
We can also discuss the possibility of taking pth roots of a matrix with the
aid of Theorem 1. Suppose that p is a positive integer and let f1(A) = A11 P ,
it follows that h(λ) is defined on the spectrum of A and assigns values given by the formulas in (1). Let r(µ) denote any polynomial that has the same values as g(µ) on the spectrum of B and consider the function r(f(λ)). If the values of h(λ) and r(f(λ)) coincide on the spectrum of A, then h(λ) − r(f(λ)) assigns zero values on the spectrum of A and, by Theorem 1,

h(A) = r(f(A)) = r(B) = g(B) = g(f(A)),

and the result is established.
Now observe that in view of (1) the values of h(λ) and r(f(λ)) coincide on the spectrum of A if

r^{(i)}(µ_k) = g^{(i)}(µ_k)   (2)

for i = 0, 1, ..., m_k − 1, k = 1, 2, ..., s.
Let us show that {g(µ_k), g'(µ_k), ..., g^{(m_k−1)}(µ_k)} contains all the values of g(µ) on the spectrum of B. This fact is equivalent to saying that the index n_k of the eigenvalue µ_k of B does not exceed m_k, or that the polynomial p(µ) = (µ − µ_1)^{m_1}(µ − µ_2)^{m_2} ··· (µ − µ_s)^{m_s} annihilates B (1 ≤ k ≤ s). To see this, observe that q(λ) = p(f(λ)) = (f(λ) − f(λ_1))^{m_1} ··· (f(λ) − f(λ_s))^{m_s} has zeros λ_k with multiplicity at least m_k (k = 1, 2, ..., s) and therefore q(λ) assigns zero values on the spectrum of A. Thus, by Theorem 1, p(B) = p(f(A)) = O and p(µ) is an annihilating polynomial for B.
Now, defining the polynomial r(µ) as satisfying the conditions (2), we obtain a polynomial with the same values as g(µ) on the spectrum of B and the proof is completed. Note that r(µ) is not necessarily the interpolatory polynomial for g(µ) on the spectrum of B of least possible degree. ■
Exercise 4. Prove that for any A ∈ C^{n×n},

sin 2A = 2 sin A cos A. ❑
C; In
(e) QA,Q -1 -• QAQ-1 ;
µ.. (d) diag[Ap , Bp] -+ diag[A, B].
,,Hint.These results follow from fundamental properties of sequences of
Complex numbers. For part (b), if a p -• a and fip -4 as p -+ co, write
gQ) p
ap fip -all= (Œp - a)Y + (Pp — a+(a, — a)( p - fi). ❑
We now consider the case in which the members of a sequence of matrices are defined as functions of A ∈ C^{n×n}. Thus, let the sequence of functions f_1(λ), f_2(λ), ... be defined on the spectrum of A, and define A_p = f_p(A), p = 1, 2, .... For each term of the sequence A_1, A_2, ..., Theorem 9.5.1 yields

A_p = f_p(A) = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} f_p^{(j)}(λ_k) Z_{kj}.

Now the component matrices Z_{kj} depend only on A and not on p, so that, if f_p^{(j)}(λ_k) → f^{(j)}(λ_k) as p → ∞ for some function f defined on the spectrum of A, then

lim_{p→∞} A_p = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} f^{(j)}(λ_k) Z_{kj} = f(A).   (3)
f_p^{(j)}(λ_k) = Σ_{i,r=1}^{n} γ_{ir} a_{ir}^{(p)},   (4)

where a_{ir}^{(p)} denotes the i, r element of A_p and the γ_{ir} (i, r = 1, 2, ..., n) depend on j and k but not on p. Now if lim A_p exists as p → ∞, then so does lim a_{ir}^{(p)}, and it follows from Eq. (4) that lim f_p^{(j)}(λ_k) also exists as p → ∞ for each j and k. This completes the first part of the theorem.
If f_p^{(j)}(λ_k) → f^{(j)}(λ_k) as p → ∞ for each j and k and for some f(λ), then Eq. (3) shows that A_p → f(A) as p → ∞. Conversely, the existence of lim A_p as p → ∞ provides the existence of lim f_p^{(j)}(λ_k) as p → ∞ and hence the equality (2). Thus, we may choose for f(λ) the interpolatory polynomial whose values on the spectrum of A are the numbers lim_{p→∞} f_p^{(j)}(λ_k), k = 1, 2, ..., s, j = 0, 1, ..., m_k − 1. The proof is complete. ■
We now wish to develop a similar theorem for series of matrices. First, the series of matrices Σ_{p=1}^{∞} A_p is said to converge to the matrix A (the sum of the series) if the sequence of partial sums converges to A; a series that does not converge is said to diverge.

Σ_{p=1}^{∞} u_p^{(j)}(λ_k) = f^{(j)}(λ_k),   j = 0, 1, ..., m_k − 1,  k = 1, 2, ..., s,

then

Σ_{p=1}^{∞} u_p(A) = f(A),

and vice versa.
We can now apply this theorem to obtain an important result that helps to justify our development of the theory of functions of matrices. The next theorem shows the generality of the definition made in Section 9.3, which may have seemed rather arbitrary at first reading. Indeed, the properties described in the next theorem are often used as a (more restrictive) definition of a function of a matrix.
f(A) = Σ_{p=0}^{∞} a_p (A − λ_0 I)^{p}.   (6)
PROOF. First we recall that f (A) exists if and only if f o(2k) exists for
j = 0, 1, ... , ink — 1 and k = 1, 2, ..., s. Furthermore, the function f(2)
given by Eq. (5) has derivatives of all orders at any point inside the circle
of convergence and these derivatives can be obtained by differentiating
the series for 1(2) term by term. Hence f(2) is obviously defined on the
spectrum of A if 1 2k — 20 1 < r for k = 1, 2, ... , s. Also, if 12k — 20 1 = r for
some k and the series obtained by ink — 1 differentiations of the series in
(5) is convergent at the point 2 = 4, then f (4), f'(2, ,, ... , rim- 1%)
exist and, consequently, f (A) is defined. Note that, starting with the existence
of f (A), the above argument can be reversed.
If we now write up(2) = ap(2 — 20)" for p = 0, 1, 2, ... , then up(2) is certainly
defined on the spectrum of A and under the hypotheses (a) or (b), the series
Ep 1 uP)(2k), k = 1, 2, ... , s, j = 0, 1, ... , mk — 1, all converge. It now
follows from Theorem 2 that rp . up(A) converges and has sum f(A)
and vice versa. ■
Note that for functions of the form (5), a function f (A) of A can be defined
by formula (6). Clearly, this definition cannot be applied for functions having
only a finite number of derivatives at the points At a a(A), as assumed in
Section 9.3.
If the Taylor series for f(A) about the origin converges for all points in the
complex plane (the radius of convergence is infinite), then f(A) is said to
be an entire function and the power series representation for f(A) will-
converge for all A e V". Among the most import ant functions of this kind
are the trigonometric and exponential functions. Thus, from the corresponding results for (complex) scalar functions, we deduce that for all A ∈ C^{n×n},

sin A = Σ_{p=0}^{∞} (−1)^p A^{2p+1}/(2p+1)!,    cos A = Σ_{p=0}^{∞} (−1)^p A^{2p}/(2p)!,

e^A = Σ_{p=0}^{∞} A^p/p!,    sinh A = Σ_{p=0}^{∞} A^{2p+1}/(2p+1)!,    cosh A = Σ_{p=0}^{∞} A^{2p}/(2p)!.
If the eigenvalues λ_1, λ_2, ..., λ_n of A satisfy the conditions |λ_j| < 1 for j = 1, 2, ..., n, then we deduce from the binomial theorem and the logarithmic series that

(I − A)^{-1} = Σ_{p=0}^{∞} A^p,    log(I + A) = Σ_{p=1}^{∞} (−1)^{p−1} A^p / p.   (7)
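As a quick numerical illustration of these series (a sketch only; NumPy and SciPy are assumptions, not part of the text, and the matrix is an arbitrary example with spectral radius less than 1):

```python
import numpy as np
from scipy.linalg import expm   # reference value for e^A

A = np.array([[0.2, 0.5],
              [0.1, 0.3]])
I = np.eye(2)

# Truncated exponential series e^A = sum A^p / p!
S, term = np.zeros_like(A), I.copy()
for p in range(1, 30):
    S += term
    term = term @ A / p
print(np.allclose(S, expm(A)))                   # True

# Neumann series (I - A)^{-1} = sum A^p, valid since rho(A) < 1
N, term = np.zeros_like(A), I.copy()
for p in range(60):
    N += term
    term = term @ A
print(np.allclose(N, np.linalg.inv(I - A)))      # True
```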
Exercise 2. Prove that if A ∈ C^{n×n}, then e^{A(s+t)} = e^{As}e^{At} for any complex numbers s and t.
Exercise 3. Prove that if A and B commute, then e^{A+B} = e^{A}e^{B}, and if e^{(A+B)t} = e^{At}e^{Bt} for all t, then A and B commute.
then
eAt = [ef ^ t
e'
t 1
tJ'
eBr = 1 O E-1( eU—
ea _ cos t sin t
— —sin t cos ti.
Exercise 5. Show that the sequence A, A^2, A^3, ... converges to the zero matrix if and only if |λ| < 1 for every λ ∈ σ(A). ❑
d/dt (A(t))^2 = Ȧ(t)A(t) + A(t)Ȧ(t),

and, in general,

d/dt (A(t))^2 ≠ 2A(t)Ȧ(t).
Exercise 3. Prove that, when p is a positive integer and the matrices exist,

d/dt (A(t))^p = Σ_{j=1}^{p} A^{j−1} (dA/dt) A^{p−j},    d/dt (A(t))^{−p} = −A^{−p} (d/dt A^p) A^{−p}. ❑
The notation

∫_L A(λ) dλ
will be used for the integral of A(A) taken in the positive direction along a
path L in the complex plane, which, for our purposes, will always be a simple
piecewise smooth closed contour, or a finite system of such contours, without
points of intersection.
Let A ∈ C^{n×n}. The matrix (λI − A)^{-1} is known as the resolvent of A, and it has already played an important part in our analysis. Viewed as a function of a complex variable λ, the resolvent is defined only at those points
λ that do not belong to the spectrum of A. Since the matrix A is fixed in this section, we use the notation R_λ = (λI − A)^{-1} for the resolvent of A.
Exercise 4. Verify that for λ ∉ σ(A),

d/dλ R_λ = −R_λ^2,

and, more generally,

d^p/dλ^p R_λ = (−1)^p p! R_λ^{p+1}. ❑

Hint. Use Exercise 4.14.4(a).
Theorem 1. The resolvent R_λ of A ∈ C^{n×n} is a rational function of λ with poles at the points of the spectrum of A and R_∞ = O. Moreover, each λ_k ∈ σ(A) is a pole of R_λ of order m_k, where m_k is the index of the eigenvalue λ_k.

PROOF. Let m(λ) = Σ_{l=0}^{m} α_l λ^l be the minimal polynomial of A. It is easily seen that

(m(λ) − m(µ))/(λ − µ) = Σ_{i,j=0}^{m−1} γ_{ij} λ^i µ^j,   λ ≠ µ.

Since m(A) = O, it follows that, for the values of λ for which m(λ) ≠ 0,

R_λ = (1/m(λ)) Σ_{j=0}^{m−1} m_j(λ) A^j,   (1)

where m_j(λ) (0 ≤ j ≤ m − 1) is a polynomial of degree not exceeding m − 1. The required results now follow from the representation (1). ■
Thus, using the terminology of complex analysis, the spectrum of a
matrix can be described in terms of its resolvent. This suggests the app lication
of results from complex analysis to the theory of matrices and vice versa.
In the next theorem we use the classical Cauchy integral theorem to obtain
its generalization for a matrix case, and this turns out to be an important and
Useful result in matrix theory.
First, let us agree to call a function f (A) of the complex variable A analytic
on an open set of points D of the complex plane iff (A) is continuously differ-
entiable at every point of D. Or, what is equivalent, f (A) is analytic if it has a
€1 Convergent Taylor series expansion about each point of D. If f (A) is analytic
and the result follows from the spectral resolution of Theorem 9.5.1. •
Note that the relation (3) is an exact analog of (2) in the case j = 0; the scalar factor (λ − λ_k)^{-1} is simply replaced by (λI − A)^{-1}. It should also be remarked that (3) is frequently used as a definition of f(A) for analytic functions f(λ).

Corollary 1. With the notation of Theorem 2,

I = (1/2πi) ∮_L R_λ dλ,    A = (1/2πi) ∮_L λ R_λ dλ.
∮_{L_p} (λ − λ_p)^{r} R_λ dλ = r! Z_{pr} (2πi),
We remark that, for those familiar with Laurent expansions, Eq. (4) expresses Z_{kj} as a coefficient of the Laurent expansion for the resolvent R_λ about the point λ_k.
Using the results of Theorems 9.5.2 and 9.5.4, and that of Theorem 3
for j = 0, the following useful result is obtained (see also Theorem 9.6.1).
Theorem 4. With the previous notation, the integral

(1/2πi) ∮_{γ_k} R_λ dλ = (1/2πi) ∮_{γ_k} (λI − A)^{-1} dλ = Z_{k0},   1 ≤ k ≤ s,

where γ_k is a simple closed contour enclosing λ_k but no other point of σ(A).
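As an illustration of this contour-integral representation, here is a small numerical sketch (NumPy assumed, not part of the text; the matrix is the one from Example 9.5.3 and the circular contour and its discretization are illustrative choices):

```python
import numpy as np

A = np.array([[6., 2., 2.],
              [-2., 2., 0.],
              [0., 0., 2.]])          # eigenvalues 2 (simple) and 4 (index 2)
I = np.eye(3)

def component_projector(A, center, radius, n_points=400):
    """Trapezoidal approximation of (1/(2*pi*i)) * contour integral of the resolvent."""
    Z = np.zeros_like(A, dtype=complex)
    for t in np.linspace(0.0, 2*np.pi, n_points, endpoint=False):
        lam = center + radius*np.exp(1j*t)
        dlam = 1j*radius*np.exp(1j*t) * (2*np.pi/n_points)
        Z += np.linalg.solve(lam*I - A, I) * dlam     # (lam*I - A)^{-1} d(lam)
    return Z / (2j*np.pi)

Z10 = component_projector(A, center=2.0, radius=0.5)  # circle enclosing only lambda = 2
print(np.allclose(Z10 @ Z10, Z10))                    # projector: idempotent
```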
where A ∈ C^{n×n} and f(t) is a prescribed function of time. We will now analyse these problems with no hypotheses on A ∈ C^{n×n}.
It was seen in Section 4.13 that primitive solutions of the form x_0 e^{λ_0 t}, where λ_0 is an eigenvalue and x_0 an associated eigenvector, play an important part
x_0(t) = (t^{k−1}/(k−1)!) x_1 + ··· + t x_{k−1} + x_k,   (4)
Theorem 1. Let A ∈ C^{n×n} and let λ_1, λ_2, ..., λ_s be the distinct eigenvalues of A of algebraic multiplicity α_1, α_2, ..., α_s, respectively. If the Jordan chains for λ_k (1 ≤ k ≤ s) are

x^{(k)}_{1,1}, ..., x^{(k)}_{1,ν_{k1}};   x^{(k)}_{2,1}, ..., x^{(k)}_{2,ν_{k2}};   ...;   x^{(k)}_{r_k,1}, ..., x^{(k)}_{r_k,ν_{k r_k}},

where Σ_{i=1}^{r_k} ν_{ki} = α_k, then any solution x(t) of (1) is a linear combination of the n vector-valued functions

[ (t^{j−1}/(j−1)!) x^{(k)}_{i,1} + ··· + t x^{(k)}_{i,j−1} + x^{(k)}_{i,j} ] e^{λ_k t},   (6)

where j = 1, 2, ..., ν_{ki}, i = 1, 2, ..., r_k, k = 1, 2, ..., s.

In other words, the expressions in (6) form a basis in the solution space of (1) and therefore are referred to as the fundamental solutions of (1). Note also that λ_k has geometric multiplicity r_k and x^{(k)}_{1,1}, ..., x^{(k)}_{r_k,1} span Ker(A − λ_k I).
Exercise 1. Find the general form of the solutions (the general solution) of the system

ẋ(t) = [ ẋ_1 ; ẋ_2 ; ẋ_3 ] = [ 6  2  2 ;  −2  2  0 ;  0  0  2 ] [ x_1 ; x_2 ; x_3 ] = Ax(t).

SOLUTION. It was found in Exercise 6.5.2 that the matrix A has an eigenvalue λ_1 = 2 with an associated eigenvector [0  −1  1]^T and an eigenvalue λ_2 = 4 associated with the eigenvector [2  −2  0]^T and generalised eigenvector:
E rpx,(0) = xo
pa l
1 0 0
Now we show how the solution of (8) can be found by use of the notion of a function of A. Start by using the idea in the beginning of the section but applying it directly to the matrix differential equation ẋ = Ax. We look for solutions in the form e^{At}z with z ∈ C^n. Exercise 9.9.1 clearly shows that the vector-valued function x(t) = e^{At}z is a solution of (1) for any z ∈ C^n.
Let x(t) be a solution of (1). Computing x(0), we see from Theorem 3 that x(t) = e^{At}x(0). Hence we have a further refinement:

Theorem 4. Any solution x(t) of (1) can be represented in the form x(t) = e^{At}z for some z ∈ C^n.
It is a remarkable and noteworthy fact that all the complications of
Theorem 1 can be streamlined into the simple statement of Theorem 4. This -
demonstrates the power of the "functional calculus" developed in this
chapter.
Another representation for solutions of (1) is derived on putting f (A) = e1
in Theorem 9.5.1 and using the spectral resolution for e"`. Hence, denoting
x(0) = xo , we use Theorem 3 to derive
x(t) = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} Z_{kj} t^j e^{λ_k t} x_0 = Σ_{k=1}^{s} (z_{k0} + z_{k1}t + ··· + z_{k,m_k−1}t^{m_k−1}) e^{λ_k t},

where z_{kj} = Z_{kj}x_0 and is independent of t. Thus, we see again that the solution
is a combination of polynomials in t with exponential functions. If A is
a simple matrix, the solution reduces to
x(t) = Σ_{k=1}^{s} z_{k0} e^{λ_k t},
and we see that each element of x(t) is a linear combination of exponential
functions with no polynomial contributions (see Section 4.13).
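For illustration, a minimal numerical sketch of this spectral-resolution form of the solution (NumPy and SciPy are assumptions, not part of the text; the matrix and its component matrices are those of Example 9.5.3, and x0 below is an arbitrary illustrative initial vector):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[6., 2., 2.],
              [-2., 2., 0.],
              [0., 0., 2.]])
I = np.eye(3)
Z10 = 0.25 * (A - 4*I) @ (A - 4*I)
Z20 = I - Z10
Z21 = (A - 4*I) + 2*Z10

x0 = np.array([1., 0., -1.])
t = 0.7
# f(lambda) = exp(lambda*t): f(2) = e^{2t}, f(4) = e^{4t}, f'(4) = t e^{4t}
x_t = np.exp(2*t)*Z10 @ x0 + np.exp(4*t)*Z20 @ x0 + t*np.exp(4*t)*Z21 @ x0
print(np.allclose(x_t, expm(A*t) @ x0))   # True: agrees with x(t) = e^{At} x0
```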
Exercise 3. By means of component mat rices, find the solution of the initial
value problem discussed in Exercise 2. ❑
Finally, consider the inhomogeneous Eq. (2) in which the elements of f are prescribed piecewise-continuous functions f_1(t), ..., f_n(t) of t that are not all identically zero. Let 𝒮 be the space of all solutions of the associated homogeneous problem ẋ = Ax and let 𝒯 be the set of all solutions of ẋ = Ax + f. If we are given just one solution y ∈ 𝒯, then (compare with Theorem 3.10.2) z ∈ 𝒯 if and only if z = y + w for some w ∈ 𝒮. The verification of this result is left as an exercise. Thus, to obtain 𝒯 we need to know just one solution of the inhomogeneous equation, known as a particular integral, and add to it all the solutions of the homogeneous equation, each of which we express in the form e^{At}x for some x ∈ C^n.
We look for a solution of ẋ = Ax + f in the form

x(t) = e^{At}z(t).

If there is such a solution, then ẋ = Ax + e^{At}ż, and so e^{At}ż = f; hence

z(t) = e^{−At_0}x_0 + ∫_{t_0}^{t} e^{−Aτ} f(τ) dτ,
x(t) = e^{A(t−t_0)}x_0 + ∫_{t_0}^{t} e^{A(t−τ)} f(τ) dτ.
The integrals

a ∫_0^t e^{4(t−τ)}e^{4τ} dτ,    a ∫_0^t (t − τ)e^{4(t−τ)}e^{4τ} dτ,    a ∫_0^t e^{2(t−τ)}e^{4τ} dτ

are easily found to be ate^{4t}, ½at^2e^{4t}, and ½ae^{2t}(e^{2t} − 1), respectively. Hence

y(t) = ate^{4t}Z_{10}e_1 + ½at^2e^{4t}Z_{11}e_1 + ½ae^{2t}(e^{2t} − 1)Z_{20}e_1 = [ a(1 + t)te^{4t} ;  −at^2e^{4t} ;  0 ].

The solution is then given by x(t) = e^{At}x_0 + y(t).
y(t) = Cx(t).
Thus x(t) has values in C" determined by the initial-value problem (supposing
xo a C" is given) and is called the state vector. Then y(t) e Cr for each t Z 0
and is the observed or output vector. It represents the information about the
state of the system that can be recorded after "filtering" through the matrix
C. The system (1) is said to be observable if y(t) __ 0 occurs only when
xo = O. Thus, for the system to be observable all nonzero initial vectors
must give rise to a solution which can be observed, in the sense that the output
vector is not identically zero.
The connection with observable pairs, as introduced in Section 5.13, is
established by the first theorem. For convenience, write
C
Q = Q(C, A) = C.A. ,
CA" -1
a matrix of size nr x n.
Theorem 1. The system (1) is observable if and only if Ker Q(C, A) _ {0},
that is, if and only if the pair (C, A) is observable.
Before proving the theorem we establish a technical lemma that will
simplify subsequent arguments involving limiting processes. An entire
function is one that is analytic in the whole complex plane; the only possible
singularity is at infinity.
Lemma 1. For any matrix A ∈ C^{n×n} there exist entire functions ψ_1(t), ..., ψ_n(t) such that

e^{At} = Σ_{r=1}^{n} ψ_r(t) A^{r−1}.   (2)

PROOF. For any fixed t, the function f(λ) = e^{λt} is defined on the spectrum of any n × n matrix A. Since f^{(j)}(λ) = t^j e^{λt}, it follows from Theorem 9.5.1 that

f(A) = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} t^j e^{λ_k t} Z_{kj}.   (3)

Since Z_{kj} = φ_{kj}(A) for each k, j and φ_{kj} is a polynomial whose degree does not exceed n [see the discussion preceding Eqs. (9.2.6) and (9.2.7)], and because the φ_{kj} depend only on A, the right-hand term in Eq. (3) can be rearranged as in Eq. (2). Furthermore, each coefficient ψ_r(t) is just a linear combination of functions of the form t^j e^{λ_k t} and is therefore entire. ■
PROOF OF THEOREM 1. If Ker Q(C, A) ≠ {0} then there is a nonzero x_0 such that CA^j x_0 = 0 for j = 0, 1, ..., n − 1. It follows from Lemma 1 that Ce^{At}x_0 = 0 for any real t. But then Theorem 9.10.3 and Eqs. (1) yield

y(t) = Ce^{At}x_0 ≡ 0

and the system is not observable. Thus, if the system is observable, the pair (C, A) must be observable.
Conversely, if the system is not observable, there is an x_0 ≠ 0 such that y(t) = Ce^{At}x_0 ≡ 0. Differentiating repeatedly with respect to t, it is found that

Q(C, A)e^{At}x_0 ≡ 0

and Ker Q(C, A) ≠ {0}. Consequently, observability of the pair (C, A) must imply observability of the system. ■
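A minimal numerical sketch of this rank test for observability (NumPy assumed, not part of the text; the matrices C and A below are illustrative, not taken from the book):

```python
import numpy as np

def observability_matrix(C, A):
    """Stack C, CA, ..., CA^{n-1} into the nr x n matrix Q(C, A)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

A = np.array([[0., 1.],
              [-2., -3.]])
C = np.array([[1., 0.]])     # observe only the first state component

Q = observability_matrix(C, A)
# Ker Q = {0} exactly when Q has full column rank n
print(np.linalg.matrix_rank(Q) == A.shape[0])   # True: the system is observable
```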
It has been seen in Section 5.13 that there is a notion of "controllability"
of matrix pairs that is dual to that of observability. We now develop this
notion in the context of systems theory. Let A e C""" and B e C""m and
consider a differential equation with initial condition
1(t) = Ax(t) + Bu(t), x(0) = xo . (4)
Consistency demands that x(t) e C" and u(t) e Cm for each t. We adopt the
point of view that u(t) is a vector function, called the control, that can be
used to guide, or control, the solution x(t) of the resulting initial-value
problem to meet some objective. The objective that we consider is that of
controlling the solution x(t) in such a way that after some time t, the function
x(t) takes some preassigned vector value in C".
To make these ideas precise we first define a class of admissible control functions u(t). We define 𝒰_m to be the linear space of all C^m-valued functions that are defined and piecewise continuous on the domain t ≥ 0, and we assume that u ∈ 𝒰_m. Then observe that, using Eq. (9.10.10), when u ∈ 𝒰_m is chosen, the solution of our initial-value problem can be written explicitly in the form

x(t; x_0, u) = e^{At}x_0 + ∫_0^t e^{A(t−s)} B u(s) ds.   (5)
where z ,(t) takes values in Ck, then z,(t) = A iz i(t) + B,u(t) is a controllable
system.
Exercise 4. Establish the following analogue of Lemma 1. If A ∈ C^{n×n}, then there are functions φ_1(λ), ..., φ_n(λ), analytic in any neighborhood not containing points of σ(A), such that

(λI − A)^{-1} = Σ_{j=1}^{n} φ_j(λ) A^{j−1}. ❑
A(A) — rl — + 1— A.]
A—A2 A '
A^{-1} = −∫_0^∞ e^{At} dt,
10. Prove that

Z_{kj}Z_{kl} = C(j+l, j) Z_{k,j+l},   0 ≤ j, l ≤ m_k − 1,

with the convention that Z_{kl} = 0 if l ≥ m_k.
Hint. Use Eq. (9.5.9).
11. Let L(λ) = Σ_{i=0}^{l} λ^i L_i be an n × n λ-matrix over C, let A ∈ C^{n×n}, and let L(A) = Σ_{i=0}^{l} L_i A^i. Prove that

(a) L(A) = Σ_{k=1}^{s} Σ_{j=0}^{m_k−1} L^{(j)}(λ_k) Z_{kj}.

Hint. Proceed as in Section 9.2, and substitute A for λ. For the second part use Exercise 9.5.1(b).
14. If A, B ∈ C^{n×n} and X = X(t) is an n × n matrix dependent on t, check that a solution of

Ẋ = AX + XB,   X(0) = C,

is given by X = e^{At}Ce^{Bt}.
15. Consider the matrix differential equations

(a) Ẋ = AX,   (b) Ẋ = AX + XB,

in which A, B ∈ C^{n×n} are skew-Hermitian matrices and X = X(t). Show that in each case there is a solution matrix X(t) that is unitary for each t.
Hint. Use Exercise 14 with C = I for part (b).
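A small numerical check of the idea behind Exercises 14 and 15 (NumPy and SciPy assumed, not part of the text; the skew-Hermitian matrices below are randomly generated illustrations): with C = I, the solution X(t) = e^{At}e^{Bt} of Ẋ = AX + XB is unitary for each t.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))
A = M - M.conj().T            # skew-Hermitian: A* = -A
N = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))
B = N - N.conj().T

t = 1.3
X = expm(A*t) @ expm(B*t)     # solution with X(0) = C = I
print(np.allclose(X.conj().T @ X, np.eye(3)))   # True: X(t) is unitary
```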
16. Let A(t), B(t) a C" x" for each t e [t,, t 2] and suppose that any solution ':
Consider a linear space .9" over the field of complex or real numbers. A
real-valued function defined on all elements x from .So is called a norm (on a) .
(b) ‖x‖_1 = Σ_{j=1}^{n} |x_j|;

Hint. For part (d) the Minkowski inequality should be assumed, that is, for any p ≥ 1 and any complex n-tuples (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n),

(Σ_{j=1}^{n} |x_j + y_j|^p)^{1/p} ≤ (Σ_{j=1}^{n} |x_j|^p)^{1/p} + (Σ_{j=1}^{n} |y_j|^p)^{1/p}.
Exercise 3. Let {b_1, b_2, ..., b_n} be a fixed basis for a linear space 𝒮, and for any x ∈ 𝒮, write x = Σ_{j=1}^{n} α_j b_j.
(a) Show that ‖x‖_∞ = max_{1≤j≤n} |α_j| defines a norm on 𝒮.
(b) If p ≥ 1, show that ‖x‖_p = (Σ_{j=1}^{n} |α_j|^p)^{1/p} defines a norm on 𝒮. ❑
The euclidean norm of Exercise 2(c) is often denoted by ‖·‖_E and is just one of the Hölder norms of part (d). It is clear from these exercises that many different norms can be defined on a given finite-dimensional linear space, and a linear space combined with one norm produces a different "normed linear space" from the same linear space combined with a different norm. For example, C^n with ‖·‖_∞ is not the same normed linear space as C^n with ‖·‖_2. For this reason it is sometimes useful to denote a normed linear space by a pair, 𝒮, ‖·‖.
It is also instructive to consider the examples of Exercise 2 as special cases of Exercise 3. This is done by introducing the basis of unit coordinate vectors into C^n, or R^n, in the role of the general basis {b_1, ..., b_n} of Exercise 3.
The notation ‖·‖_∞ used for the norm in Exercise 3(a) suggests that it might be obtained as a limiting case of the Hölder norms ‖·‖_p. This can be justified in a limiting sense in the following way: if x = Σ_{j=1}^{n} α_j b_j and there are ν components α_i (1 ≤ i ≤ n) of absolute value ‖x‖_∞ = ρ = max_{1≤j≤n} |α_j|, then

‖x‖_p = (|α_1|^p + |α_2|^p + ··· + |α_n|^p)^{1/p} = ρ(ν + ε_p)^{1/p},

where ε_p (the sum of the remaining ratios |α_i/ρ|^p) tends to zero, so that ‖x‖_p → ρ = ‖x‖_∞ as p → ∞.
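A quick numerical illustration of this limit (NumPy assumed, not part of the text; the vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0, 4.0])
for p in (1, 2, 4, 16, 64):
    print(p, np.linalg.norm(x, p))      # the p-norms decrease toward 4.0
print("inf", np.linalg.norm(x, np.inf)) # the infinity-norm, here 4.0
```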
are bounded and closed in 𝒮 (see Appendix 2). It follows that the greatest lower bounds

γ_1 = inf_{x ∈ C_1} ‖x‖_2,    γ_2 = inf_{x ∈ C_2} ‖x‖_1   (2)

are positive, and that

γ_1 ≤ ‖x‖_2/‖x‖_1    and    γ_2 ≤ ‖x‖_1/‖x‖_2

for every nonzero x. Taking r_1 = γ_2 and r_2 = 1/γ_1, inequalities (1) are obtained and the theorem is proved. ■
Exercise 8. Verify that equivalence of norms on .0 is an equivalence
relation on the set of all norms on So. ❑
Theorem 2 can be useful since it can be applied to reduce the study of ,
In Section 10.1 the emphasis placed on a norm was its usefulness as a measure of the size of members of a normed linear space. It is also useful as a measure of distance, for the norm axioms correspond to the axioms for a metric, that is, a function defined on pairs of elements of the set 𝒮 that is a mathematically sensible measure of distance.
The set of all vectors in a normed linear space whose distance from a fixed
point xo is r > 0 is known as the sphere with center xo and radius r. Thus,
written explicitly in terme of the norm, this sphere is
(x E.9':11x — xoll = r).
The set of x e 9 for which II x — x, S r is the ball with centre x o and radius
r. The sphere with centre 0 and radius r = 1 is called the unit sphere and plays
an important part in the analysis of normed linear spaces, as it did in the
proof of Theorem 10.1.2. The unit ball is, of course, {x a So: Ilxll 5 1).
It is instructive to examine the unit sphere in R^2 with the norms ‖·‖_∞, ‖·‖_2, and ‖·‖_1 introduced in Exercise 10.1.2. It is easy to verify that the portions of the unit spheres in the first quadrant for these norms (described by p = ∞, 2, 1, respectively) are as sketched in Fig. 10.1. It is also instructive to examine the nature of the unit spheres for Hölder norms with intermediate values of p and to try to visualize corresponding unit spheres in R^3.

Exercise 1. Show that the function ‖·‖ defined on C^n by

‖x‖ = |x_1| + ··· + |x_{n−1}| + ···,

where x = [x_1  x_2  ···  x_n]^T, is a norm on C^n (or R^n). Sketch the unit spheres in R^2 and R^3. ❑
Fig. 10.1  Unit spheres in R^2 for p-norms.
Norms can be defined on a linear space in a great variety of ways and give
rise to unit balls with varied geometries. A property that all unit balls have in
common, however, is that of convexity. (See Appendix 2 for a general dis-
cussion of convexity.)
Theorem 1. The unit ball in a normed linear space is a convex set.
PROOF. Let 𝒮, ‖·‖ be a normed linear space with unit ball B. If x, y ∈ B, then for any t ∈ [0, 1], the vector z = tx + (1 − t)y is a member of the line segment joining x and y. We have only to prove that z ∈ B. But the second and third norm axioms immediately give
x_k = Σ_{j=1}^{n} α_j^{(k)} b_j,   k = 1, 2, ....
Exercise 3. Show that, if x_k → x_0, y_k → y_0, and α_k → α_0 as k → ∞, where α_0 and the α_k are scalars, then ‖x_k − x_0‖ → 0 and α_k x_k + y_k → α_0 x_0 + y_0 as k → ∞.
Exercise 4. Show that if {xi}:1 1 converges in one norm to xo , then it
converges to xo in any other norm.
Hint. Use Theorem 10.1.2. ❑
The next theorem shows that coordinatewise convergence an d convergence
in norm for infinite sequences ( an d hence series) in a finite-dimensional
space are equivalent. However, the result is not true if the space is not of
finite dimension. Indeed, the possibility of different forms of convergen ce is
a major issue in the study of infinite-dimensional linear spaces.
Theorem 2. Let (x,,),T, be a sequence in a finite-dimensional linear space .9'
The sequence converges coordinatewise to x o a .9' if and only if it converges
to xo in (any) norm.
PROOF. Let {b_1, b_2, ..., b_n} be a basis in the linear space 𝒮 and write x_k = Σ_{j=1}^{n} α_j^{(k)} b_j, x_0 = Σ_{j=1}^{n} α_j^{(0)} b_j, so that coordinatewise convergence gives ‖x_k − x_0‖ → 0 as k → ∞, since n and the basis are fixed. Hence, we have convergence in norm.
Conversely,

‖x_k − x_0‖_∞ = ‖Σ_{j=1}^{n} (α_j^{(k)} − α_j^{(0)}) b_j‖_∞ = max_{1≤j≤n} |α_j^{(k)} − α_j^{(0)}|.

Thus, ‖x_k − x_0‖ → 0 implies ‖x_k − x_0‖_∞ → 0 and hence

max_{1≤j≤n} |α_j^{(k)} − α_j^{(0)}| → 0

as k → ∞. This clearly implies coordinatewise convergence. ■
We start this section by confining our attention ,.to norms on the linear.
space C""", that is, to norms of square matrices. Obviously, all properties
of a general norm discussed in Sections 1 and 2 remain valid for this particular
case. For instance, we have the following.
Exercise 1. Verify that any norm on C""" depends continuously on the
matrix elements.
Hint. Consider the standard basis {E, J}i J=1 for C"x" (see Exercise 3.5.1)
and use Theorem 10.1.1. 0
The possibility of multiplying any two matrices A, B ∈ C^{n×n} gives rise to an important question about the relation between the norms ‖A‖, ‖B‖ and the norm ‖AB‖ of their product. We say that the norm ‖·‖ on C^{n×n} is a matrix norm or is submultiplicative (in contrast to a norm of a matrix as a member of the normed space C^{n×n}) if

‖AB‖ ≤ ‖A‖ ‖B‖   (1)

for all A, B ∈ C^{n×n}.
The most useful matrix norms are often those that are easily defined in
terms of the elements of the matrix argument. It is conceivable that a measure
of magnitude for matrices could be based on the magnitudes of the eigen-
values, although this may turn out to be of no great practical utility,
since the eigenvalues are often hard to compute. However, if λ_1, ..., λ_n are the eigenvalues of A, then µ_A = max_{1≤j≤n} |λ_j| is such a measure of magnitude and is known as the spectral radius of A.
Consider the matrix N of Exercise 6.8.6. It is easily seen that all of its
eigenvalues are zero and hence the spectral radius of N is zero. Since N is
not the zero-matrix, we see immediately that the spectral radius does not
satisfy the first norm axiom. However, although the spectral radius is not a
norm, it is intimately connected with the magnitude of matrix norms. Indeed,
matrix norms are often used to obtain bounds for the spectral radius. The
next theorem is a result of this kind.
Theorem 1. If A ∈ C^{n×n} and µ_A is the spectral radius of A then, for any matrix norm,

‖A‖ ≥ µ_A.
‖Ax‖_2 ≤ ‖A‖ ‖x‖_2,

which is equivalent to Eq. (4).
Exercise 10. Prove that the matrix norm defined in Exercise 7(a) is compatible with the Hölder norms on C^n for p = 1, 2, ∞.
Exercise 11. Let ‖·‖_v be a norm on C^n and let S ∈ C^{n×n} be nonsingular. Show that ‖x‖_S = ‖Sx‖_v is also a norm on C^n. Then show that ‖·‖_S and
Hint. Let f(z) = z^r and use Theorem 9.5.1. (The hypothesis that λ_1 has index one is made for convenience. Otherwise, the limit on the right of (5) is a little more complicated.)

Exercise 13 (The power method for calculating eigenvalues). Let A ∈ C^{n×n} have a dominant eigenvalue λ_1 of index one (as in Exercise 12). Let x_0 be any vector for which x_0 ∉ Ker A^n, ‖x_0‖ = 1 in the euclidean norm, and Z_{10}x_0 ≠ 0. Define a sequence of vectors x_0, x_1, x_2, ... of norm one by

x_r = (1/k_r) A x_{r−1},   r = 1, 2, ....

Show that:
(a) the sequence is well-defined (i.e., k_r ≠ 0 for any r);
(b) as r → ∞, k_r → |λ_1|.

(Note that by normalizing the vectors x_r to have euclidean norm equal to one, they are only defined to within a unimodular scalar multiple. Thus, the presence of σ_r in (6) is to be expected. In numerical practice a different normalization is likely to be used, with ‖·‖_1 for example. But as in so many
Ax_{r−1} = (1/(k_{r−1}···k_2 k_1)) A^r x_0.

Since Ker A^r = Ker A^n for all r ≥ n (see Exercise 4.5.14), x_0 ∉ Ker A^r, and so Ax_{r−1} ≠ 0. Hence k_r ≠ 0 and (a) follows by induction.
Then observe that

k_r = ‖A^r x_0‖ / ‖A^{r−1} x_0‖.   (7)

Define p_1 = α^{-1} Z_{10}x_0, where α = ‖Z_{10}x_0‖ ≠ 0, and use the preceding exercise to show that |λ_1|^{-r} ‖A^r x_0‖ → α. Combine with (7) to obtain part (b).
Use the fact that

x_r = (1/(k_r···k_2 k_1)) A^r x_0

together with Theorem 9.5.1 to show that (k_r···k_2 k_1 / λ_1^r) x_r → α p_1,
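For illustration, a minimal sketch of the power iteration described in Exercise 13 (NumPy assumed, not part of the text; the symmetric matrix and the starting vector are illustrative choices):

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
x = np.array([1., 1., 1.])
x = x / np.linalg.norm(x)
for _ in range(100):
    y = A @ x
    k = np.linalg.norm(y)     # k_r = ||A x_{r-1}||
    x = y / k                 # x_r has euclidean norm one
# k approximates |lambda_1|, the dominant eigenvalue in absolute value
print(k, np.max(np.abs(np.linalg.eigvals(A))))
```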
We now investigate a method for obtaining a matrix norm from any given vector norm. If ‖·‖_v is a vector norm on C^n and A ∈ C^{n×n}, we consider the quotient ‖Ax‖_v/‖x‖_v, defined for each nonzero x ∈ C^n. The quotient is obviously nonnegative; we consider the set of all numbers obtained when x takes all possible nonzero values in C^n and define

f(A) = sup_{x ∈ C^n, x ≠ 0} ‖Ax‖_v / ‖x‖_v.
f(A) = sup_{x ∈ C^n, x ≠ 0} ‖Ax‖_v / ‖x‖_v = max_{x ∈ C^n, ‖x‖_v ≤ 1} ‖Ax‖_v
for all nonzero x ∈ C^n and any A ∈ C^{n×n}. Using Eq. (3) once more (with respect to B), we have

f(AB) ≤ f(A)‖Bx_0‖_v ≤ f(A)f(B)‖x_0‖_v = f(A)f(B),

and the property (10.3.1) holds. Part (a) of the theorem is thus proved and we may write f(A) = ‖A‖.
Observe that part (b) of the theorem is just the relation (3). For part (c) we have, with an element x_0 defined as in Exercise 1(b),

‖A‖ = f(A) = ‖Ax_0‖_v ≤ ‖A‖′ ‖x_0‖_v = ‖A‖′,

for any matrix norm ‖·‖′ compatible with ‖·‖_v. ■
Exercise 2. Verify that ‖I‖ = 1 for any induced matrix norm.
Exercise 3. Let N_n ∈ C^{n×n} denote a matrix with ones in the i, i + 1 positions (1 ≤ i ≤ n − 1) and zeros elsewhere. Check that there are induced norms such that

(a) ‖N_n‖ = n;   (b) ‖N_n‖ = 1;   (c) ‖N_n‖ < 1.

Hint. Consider the vector norm of Exercise 10.2.1 for part (a) and the norms

‖x‖_k = (1/k)^{n−1}|x_1| + (1/k)^{n−2}|x_2| + ··· + (1/k)|x_{n−1}| + |x_n|

with k = 1 and k > 1 for parts (b) and (c), respectively. ❑
Some of the most important and useful matrix norms are the induced norms discussed in the next exercises. In particular, the matrix norm induced
by the euclidean vector norm is known as the spectral norm, and it is probably
the most widely used norm in matrix analysis.
Exercise 4. If A is a unitary matrix, show that the spectral norm of A is 1.
Exercise 5. Let A ∈ C^{n×n} and let µ_A be the spectral radius of A*A. If ‖·‖_s denotes the spectral norm, show that ‖A‖_s = µ_A^{1/2}. (In other words, ‖A‖_s is just the largest singular value of A.)

SOLUTION. The matrix A*A is positive semi-definite and therefore its eigenvalues are nonnegative numbers. Hence µ_A is actually an eigenvalue of A*A.
Let x_1, x_2, ..., x_n be a set of orthonormal eigenvectors of A*A with associated eigenvalues 0 ≤ λ_1 ≤ λ_2 ≤ ··· ≤ λ_n. Note that µ_A = λ_n, and if x = Σ_{i=1}^{n} α_i x_i, then A*Ax = Σ_{i=1}^{n} α_i λ_i x_i. Let ‖·‖_E denote the euclidean vector norm on C^n. Then

‖Ax‖_E^2 = (Ax, Ax) = (x, A*Ax) = Σ_{i=1}^{n} |α_i|^2 λ_i.

For x ∈ C^n such that ‖x‖_E = 1, that is, Σ_{i=1}^{n} |α_i|^2 = 1, the expression above does not exceed λ_n and attains this value if α_1 = α_2 = ··· = α_{n−1} = 0.
'Exercise 7. If A, U e C"'" and U is unitary, prove that, for both the euclidean
matrix and spectral norms, IIAII = II UA II = lIA U II and hence that, for these
two norms, IIAII is invariant under unitary similarity tr an sformations.
int. For the euclidean norm use Exercise 10.3.7(c); for the spectral norm
use Exercises 4 and 10.1.6.
xercise 8. If A is a normal matrix in V" and II II, denotes the spectral
orm, prove that IIAII = pa, the spectral radius of A. (Compare this result
.
(Note that we form the row sums of the absolute values of the matrix elements, and then take the greatest of these sums.)

SOLUTION. Let x = [x_1  x_2  ···  x_n]^T be a vector in C^n with ‖x‖_∞ = 1. Then

max_{‖x‖_∞ = 1} ‖Ax‖_∞ ≤ ‖A‖_∞,

and so the norm induced by ‖·‖_∞ cannot exceed the greatest row sum ‖A‖_∞. To prove that they are equal we have only to show that there is an x_0 ∈ C^n with ‖x_0‖_∞ = 1 and ‖Ax_0‖_∞ = ‖A‖_∞. It is easy to see that if ‖A‖_∞ = Σ_{j=1}^{n} |a_{kj}| (the greatest row sum, attained in row k), then the vector x_0 = [α_1  α_2  ···  α_n]^T, where

α_j = |a_{kj}|/a_{kj}  if a_{kj} ≠ 0,    α_j = 0  if a_{kj} = 0,

j = 1, 2, ..., n, satisfies the above conditions.
Exercise 10. Show that the matrix norm induced by the vector norm ‖·‖_1 of Exercise 10.1.2 is given by

‖A‖ = max_{1≤j≤n} Σ_{k=1}^{n} |a_{kj}|.

(Note that we form the column sums of the absolute values of the matrix elements, and then take the greatest of these sums.)
Hint. Use the fact that this norm of A equals the row-sum norm of A*.
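A small numerical check of these two induced norms (NumPy assumed, not part of the text; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[ 1., -2.,  3.],
              [ 0.,  4., -1.],
              [-5.,  2.,  0.]])
row_sums = np.abs(A).sum(axis=1).max()   # greatest absolute row sum
col_sums = np.abs(A).sum(axis=0).max()   # greatest absolute column sum
print(np.isclose(np.linalg.norm(A, np.inf), row_sums))  # norm induced by the infinity-norm
print(np.isclose(np.linalg.norm(A, 1), col_sums))       # norm induced by the 1-norm
```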
Exercise 11. If n II denotes the spectral norm and II II P and II Il y are at
,
Before leaving the idea of inducing matrix norms from vector norms, we ask the converse question: Given a matrix norm ‖·‖, can we use it to induce a compatible vector norm? This can be done by defining ‖·‖_v by

‖x‖_v = ‖xa^T‖   (4)

for some fixed nonzero vector a.
Exercise 12. Let

A = [ 3/2  −1/2 ;  −1/2  1 ].

Use Exercise 11 to show that ‖A‖_s ≤ 2 and Exercise 8 to show that ‖A‖_s = (5 + √5)/4 ≈ 1.8.
Almost all the vector norms we have introduced in the exercises depend
only on the absolute values of the elements of the vector argument. Such
norms are called absolute vector norms. The general theory of norms that we
have developed up to now does not assume that this is necessarily the case.
We shall devote the first part of this section to proving two properties of
absolute vector norms.
One of the properties of an absolute vector norm that we shall establish is called the monotonic property. Let us denote by |x| the vector whose elements are the absolute values of the components of x, and if x_1, x_2 ∈ R^n, we shall say that x_1 ≤ x_2 (or x_2 ≥ x_1) if this relation holds for corresponding components of x_1 and x_2. A vector norm ‖·‖_v on C^n is said to be monotonic if |x_1| ≤ |x_2| implies ‖x_1‖_v ≤ ‖x_2‖_v for any x_1, x_2 ∈ C^n.
The second property we will develop concerns the induced matrix norm. If a matrix norm is evaluated at a diagonal matrix

D = diag[d_1, d_2, ..., d_n],

we think highly of the norm if it has the property ‖D‖ = max_j |d_j|. We will show that an induced norm has this property if and only if it is induced by an absolute vector norm.
Theorem 1 (F. L. Bauer, J. Stoer, and C. Witzgall†). If ‖·‖_v is a vector norm in C^n and ‖·‖ denotes the matrix norm induced by ‖·‖_v, then the following conditions are equivalent:

(a) ‖·‖_v is an absolute vector norm;
(b) ‖·‖_v is monotonic;
(c) ‖D‖ = max_{1≤j≤n} |d_j| for any diagonal matrix D = diag[d_1, d_2, ..., d_n] from C^{n×n}.
PROOF. (a) ⇒ (b) We shall prove that if ‖·‖_v is an absolute vector norm and ‖x‖_v = F(x_1, x_2, ..., x_n), where x = [x_1  x_2  ···  x_n]^T, then F is a nondecreasing function of the absolute values of x_1, x_2, ..., x_n. Let us focus attention on F as a function of the first component, x_1. The same argument will apply to each of the n variables. Thus, we suppose that x_2, x_3, ..., x_n are fixed, define

f(x) = F(x, x_2, x_3, ..., x_n),

and consider the behaviour of f(x) as x varies through the real numbers. If the statement in part (b) is false, there exist nonnegative numbers p, q for which p < q and f(p) > f(q). Now since f(x) = f(|x|), we have f(−q) = f(q), and so the vectors x_1 = [−q  x_2  ···  x_n]^T and x_2 = [q  x_2  ···  x_n]^T belong to the ball consisting of vectors x for which ‖x‖_v ≤ f(q). This ball is convex (Exercise 10.2.2) and since the vector x_0 = [p  x_2  ···  x_n]^T can be written in the form x_0 = tx_1 + (1 − t)x_2 for some t ∈ [0, 1], then x_0 also belongs to the ball. Thus ‖x_0‖_v ≤ f(q). But ‖x_0‖_v = f(p) > f(q), so we arrive at a contradiction and the implication (a) ⇒ (b) is proved.
(b) ⇒ (c) Suppose without loss of generality that D ≠ O and define d = max_{1≤j≤n} |d_j|. Clearly, |Dx| ≤ |dx| for all x ∈ C^n and so condition (b) implies that ‖Dx‖_v ≤ ‖dx‖_v = d‖x‖_v. Thus

‖D‖ = sup_{x≠0} ‖Dx‖_v/‖x‖_v ≤ d.

† Numerische Math. 3 (1961), 257–264.
Theorem 2. If ‖·‖ is the matrix norm induced by the vector norm ‖·‖_v and l is the corresponding lower bound, then

l(A) = 1/‖A^{-1}‖  if det A ≠ 0,    l(A) = 0  if det A = 0.

PROOF. If det A ≠ 0, define a vector y in terms of a vector x by y = Ax. Then for any x ≠ 0, y ≠ 0,
vector norm. ❑
Note that by combining the result of Exercise 2 with Theorem 10.3.1, WE
can define an annular region of the complex plane within which the eigen-
values of A must li e.
Exercise 3. Show that if A is nonsingular then, in the notation of Theorem
lIAIxII, z l(A)Ilxliv• ❑
Σ_{k=1}^{n} a_{jk} x_k = λ x_j,   j = 1, 2, ..., n.
disk. Since the eigenvalues of B(t) are continuous functions of t, and the jth
disk is always isolated from the rest, it follows that there is one and only one
eigenvalue of B(t) in the jth disk for all t e [0, 1]. In particular, this is the
case when t = 1 and B(1) = A.
This proves the second part of the theorem for in = 1. The completion
of the proof for any m < n is left as an exercise. •
Note that by applying the theorem to AT, a similar result is obtained using
column sums of A, rather than row sums, to define the radii of the new disks.
The n disks in (1) are often referred to as the Gerlgorin disks.
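A minimal numerical sketch of the Geršgorin disks (NumPy assumed, not part of the text; the matrix is an illustrative example, not one from the book): each eigenvalue lies in at least one disk centered at a diagonal entry with radius equal to the corresponding off-diagonal row sum.

```python
import numpy as np

A = np.array([[4.0,  1.0, 0.5],
              [0.2, -3.0, 1.0],
              [0.1,  0.1, 1.0 + 2.0j]])
centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # off-diagonal absolute row sums

for lam in np.linalg.eigvals(A):
    in_some_disk = np.any(np.abs(lam - centers) <= radii + 1e-12)
    print(lam, in_some_disk)   # True for every eigenvalue
```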
Exercise 1. Sketch the Gerilgorin disks for the following matrix and for its
transpose:
0 1 0 i
_ 1 6 1 1
A
#i 1 51 0
0i 2 -
A similar proof can be performed for column sums, and the result
follows. ❑
Note that ρ in Eq. (2) can be viewed as the matrix norm induced by the vector norm ‖x‖_∞ (Exercise 10.4.9) and similarly for ν (Exercise 10.4.10). This shows that the result of Exercise 2 is a particular case of Theorem 10.3.1 (for a special norm).

Exercise 3. With the notation of Eq. (2), prove that

|det A| ≤ min(ρ^n, ν^n)

for any A ∈ C^{n×n}. ❑
IaiiI > I= pj
k=1
In this section two useful results are presented that can be seen as refine-
ments of Ger8gorin's theorem. However, they both involve the further hypo-
thesis that the matrix in question be "irreducible." This is a property that is
easily seen to hold in many practical problems. We make a formal definition
here and return for further discussion of the concept in Chapter 15.
If n ≥ 2, the matrix A ∈ C^{n×n} is said to be reducible if there is an n × n permutation matrix P such that

P^T A P = [ A_{11}  A_{12} ;  0  A_{22} ],   (1)

where A_{11} and A_{22} are square matrices of order less than n. If no such P exists then A is irreducible.
If A is considered to be a representation of a linear transformation Ton
an n-dimensional space .5°, the fact that A is irreducible simply means that,
among the basis vectors for .9' chosen for this representation, there is no
proper subset that spans an inva riant subspace of A (see Section 4.8 and Eq.
(4.8.3) in particular).
Exercise 1. If A e 08"" is reducible, prove that for any positive integer p,
the matrix AP is reducible. Also, show that any polynomial in A is a reducible
matrix.
Exercise 2. Check that the square matrices A and AT are both reducible
or both irreducible. ❑
The first result concerns a test for a matrix A ∈ C^{n×n} to be nonsingular. Clearly, Theorem 10.6.1 provides a test (or class of tests) for nonsingularity (see Exercise 10.6.4): A is certainly nonsingular if it is diagonally dominant. In other words, if |a_{jj}| > Σ_{k≠j} |a_{jk}| for j = 1, 2, ..., n, then the origin of the complex plane is excluded from the union of the Geršgorin disks in which σ(A) must lie. These inequalities can be weakened at the expense of the irreducibility hypothesis.
= Ik = E a jk xk I
1,ksj
and comparing with Eq. (5), we obtain equalities in (5) for j = 1, 2, ..., r. Hence it follows from Eq. (6) that

Σ_{k=r+1}^{n} |a_{jk}| = 0,   j = 1, 2, ..., r,
n 1 1 1 2-1 0 0
1 2 0 0 —1 2 —1
1 0 3 ; , 0 0
0 , —1
1 0 0 n 0 0 -1 2
are nonsingular. O
The next result shows that either all eigenvalues of an irreducible matrix A lie in the interior of the union of Geršgorin's disks or an eigenvalue of A is a common boundary point of all n disks.

Theorem 2 (O. Taussky). Let A = [a_{jk}]_{j,k=1}^{n} be a complex irreducible matrix and let λ be an eigenvalue of A lying on the boundary of the union of the Geršgorin disks. Then λ is on the boundary of each of the n Geršgorin disks.

PROOF. Given the hypotheses of the theorem, let |λ − a_{ii}| = ρ_i for some i, 1 ≤ i ≤ n. If λ lies outside the remaining n − 1 Geršgorin disks, then |λ − a_{jj}| > ρ_j for all j ≠ i. Therefore by Theorem 1 the matrix λI − A must be nonsingular. This contradicts the assumption that λ is an eigenvalue of A. Thus, since λ must lie on the boundary of the union of the disks, it must be a point of intersection of m circumferences where 2 ≤ m ≤ n. Thus there are m equalities in the system
if A* = A, then C = O and all the eigenvalues of A are real. This suggests that
a norm of C may give an upper bound for the magnitudes of the imaginary
parts of the eigenvalues of A. Similarly, if A* = —A, then B = O and all the
eigenvalues of A have zero real parts, so a norm of B might be expected to
give an upper bound for the magnitudes of the real parts of the eigenvalues
of A. Schur's theorem contains results of this kind. In the following statement
9.e 2 and .!m 2 will denote the real and imaginary parts of the complex
number ^l.
Σ_{i=1}^{n} |λ_i|^2 ≤ ‖A‖^2,    Σ_{i=1}^{n} |ℛe λ_i|^2 ≤ ‖B‖^2,    Σ_{i=1}^{n} |ℐm λ_i|^2 ≤ ‖C‖^2,

‖B‖^2 = Σ_{i=1}^{n} |(d_{ii} + d̄_{ii})/2|^2 + Σ_{r≠s} |(d_{rs} + d̄_{sr})/2|^2 ≥ Σ_{i=1}^{n} |ℛe d_{ii}|^2 = Σ_{i=1}^{n} |ℛe λ_i|^2.
Σ_{i=1}^{n} |ℐm λ_i|^2 ≤ Σ_{r,s=1, r≠s}^{n} |(a_{rs} − a_{sr})/2|^2 ≤ τ^2 n(n − 1).

Since A is real, all of its complex eigenvalues arise in complex conjugate pairs, so that every nonzero term in the sum on the left appears at least twice. Hence 2|ℐm λ_k|^2 ≤ τ^2 n(n − 1) and the result follows. ■
Before leaving these results, we should remark that the bounds obtained
in the corollaries are frequently less sharp than the corresponding bounds
obtained from Gerigorin's theorem. This is well illustrated on applying
Corollary 1 to Exercise 10.6.1.
Exercise 1. Prove that |det A| ≤ n^{n/2} ρ^n, where ρ = max_{i,j} |a_{ij}|.

SOLUTION. First of all note the inequality of the arithmetic and geometric means: namely, for any nonnegative numbers a_1, a_2, ..., a_n,

(a_1 a_2 ··· a_n)^{1/n} ≤ (a_1 + a_2 + ··· + a_n)/n.

Hence

|det A|^2 = Π_{i=1}^{n} |λ_i|^2 ≤ ((1/n) Σ_{i=1}^{n} |λ_i|^2)^n ≤ ((1/n) ‖A‖^2)^n ≤ (nρ^2)^n,

and the result follows on taking square roots.
Acta Math. 25 (1902), 359-365.
E IA1I2 s iE
lai =1
µi,
6. If II II is the vector norm of Exercise 10.1.7, and II II and II II. denote the
„
where (1/g) + (1/p) = 1. By use of part (b) for these norms, obtain the
Holder inequality.
8. If A = ab* and ‖·‖ is the matrix norm induced by a vector norm ‖·‖_v, prove that ‖A‖ = ‖a‖_v ‖b‖_d, where ‖·‖_d is defined in Eq. (1).
9. If x ≠ 0, prove that there is a y ∈ C^n such that y*x = ‖x‖_v and ‖y‖_d = 1.
10. Prove that if ‖·‖_{dd} is the dual norm of ‖·‖_d, then ‖x‖_{dd} = ‖x‖_v for all x ∈ C^n. The norm ‖·‖_d is defined in Eq. (1).
11. Let II Q be a norm in C""" (not necessarily a matrix norm). Define
[ F* Ij
together with Exercise 10.4.5.
_ [F* I ][ 0 I — F*F ][ 0I ]
16. Let A E C" 1", B e C"" 'and U E Co" with orthonormal columns, i.e. CPU
=4.If A = UBU* show that MII, = 11611.
CHAPTER 11
Perturbation Theory
Given A ∈ C^{n×n}, consider the equation Ax = b (x, b ∈ C^n) and assume that det A ≠ 0, so that x is the unique solution of the equation.
In general terms, a problem is said to be stable, or well conditioned, if "small" causes (perturbations in A or b) give rise to "small" effects (perturbations in x). The degree of smallness of the quantities in question is usually measured in relation to their magnitudes in the unperturbed state. Thus if x + y is the solution of the problem after the perturbation, we measure the magnitude of the perturbation in the solution vector by ‖y‖_v/‖x‖_v for some vector norm ‖·‖_v. We call this quotient the relative perturbation of x.
Before investigating the problem with a general coefficient matrix A, it is very useful to examine the prototype problem of variations in the coefficients of the equation x = b. Thus, we are led to investigate solutions of (I + M)ξ = b, where M is "small" compared to I. The departure of ξ from x will obviously depend on the nature of (I + M)^{-1}, if this inverse
exists. In this connection we have one of the most useful results of linear analysis:

Theorem 1. If ‖·‖ denotes any matrix norm for which ‖I‖ = 1 and if ‖M‖ < 1, then (I + M)^{-1} exists,

(I + M)^{-1} = I − M + M^2 − ···,

and

‖(I + M)^{-1}‖ ≤ 1/(1 − ‖M‖).

Note that in view of Theorem 10.3.1, the hypothesis of the theorem can be replaced by a condition on the spectral radius: µ_M < 1.

PROOF. Theorem 10.3.1 implies that for any eigenvalue λ_0 of M, |λ_0| ≤ ‖M‖ < 1. Moreover, the function f defined by f(λ) = (1 + λ)^{-1} has a power series expansion about λ = 0 with radius of convergence equal to 1. Hence (see Theorem 9.8.3), f(M) = (I + M)^{-1} exists and

(I + M)^{-1} = I − M + M^2 − ···.
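A minimal numerical sketch of Theorem 1 (NumPy assumed, not part of the text; M is a random illustrative matrix with ‖M‖ < 1): the alternating Neumann series converges to (I + M)^{-1} and the stated bound holds.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 0.1 * rng.standard_normal((4, 4))
I = np.eye(4)
norm_M = np.linalg.norm(M, 2)          # spectral norm, with ||I|| = 1

S, term = np.zeros_like(M), I.copy()
for _ in range(200):
    S += term
    term = term @ (-M)                 # partial sums of I - M + M^2 - ...
print(np.allclose(S, np.linalg.inv(I + M)))                       # True
print(np.linalg.norm(np.linalg.inv(I + M), 2) <= 1/(1 - norm_M))  # True
```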
Ax = b,

with det A ≠ 0, and we suppose that b is changed to b + k, that A is changed to A + F, and that x + y satisfies

(A + F)(x + y) = b + k.   (1)

We will compute an upper bound for the relative perturbation ‖y‖_v/‖x‖_v. Subtracting the first equation from the second, we have

(A + F)y + Fx = k,

or

A(I + A^{-1}F)y = k − Fx.

We assume now that δ = ‖A^{-1}‖ ‖F‖ < 1. Then, since ‖A^{-1}F‖ ≤ ‖A^{-1}‖ ‖F‖ < 1, Theorem 1 implies that (I + A^{-1}F)^{-1} exists and that ‖(I + A^{-1}F)^{-1}‖ < (1 − δ)^{-1}. Thus

y = (I + A^{-1}F)^{-1}A^{-1}k − (I + A^{-1}F)^{-1}A^{-1}Fx,

and, if ‖·‖_v is any vector norm compatible with ‖·‖,

‖y‖_v ≤ (‖A^{-1}‖/(1 − δ)) ‖k‖_v + (δ/(1 − δ)) ‖x‖_v.

Writing δ = ‖A^{-1}‖ ‖F‖ = κ(A) ‖F‖/‖A‖, where κ(A) = ‖A‖ ‖A^{-1}‖,
Thus, we obtain finally
Exercise 1. Prove that if ‖A^{-1}‖ ‖F‖ < 1, then ‖F‖ < ‖A‖.
1
If, in addition,
b— , 0.1
0.2
k—
—e
0
F
e e
00
e
0
, ,
1.0 0 00 0
show that
if 20a
HxL 1 + e'
Exercise 3. Prove that for any A ∈ C^{n×n} and any matrix norm, the condition number κ(A) satisfies the inequality κ(A) ≥ 1.
Exercise 4. If κ(A) is the spectral condition number of A, that is, the condition number obtained using the spectral norm, and s_1 and s_n are the largest and smallest singular values of A, respectively, prove that κ(A) = s_1/s_n.
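A quick numerical check of Exercise 4 (NumPy assumed, not part of the text; the matrix is arbitrary): the spectral condition number equals the ratio of the extreme singular values.

```python
import numpy as np

A = np.array([[3., 1., 0.],
              [1., 2., 1.],
              [0., 1., 1.]])
s = np.linalg.svd(A, compute_uv=False)
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(np.isclose(kappa, s[0] / s[-1]))        # True
print(np.isclose(kappa, np.linalg.cond(A, 2)))  # agrees with the built-in condition number
```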
Exercise 5. If κ(A) is the spectral condition number of a normal nonsingular matrix A ∈ C^{n×n}, and λ_1 and λ_n are (in absolute value) the largest and smallest eigenvalues of A, respectively, prove that κ(A) = |λ_1|/|λ_n|.
Exercise 6. Prove that the spectral condition number κ(A) is equal to 1 if and only if αA is a unitary matrix for some α ≠ 0.
Exercise 7. If the linear equation Ax = b, det A ≠ 0, is replaced by the equivalent equation Bx = A*b, where B = A*A, prove that this may be done at the expense of the condition number; that is, that κ(B) ≥ κ(A).
Exercise 8. Use Theorem 1 to prove that M^p → O as p → ∞. More precisely, prove that M^p → O as p → ∞ if and only if the spectral radius of M is less than 1.
Exercise 9. If A ∈ C^{n×n} satisfies the condition ‖A^p‖ < k for some matrix norm ‖·‖, all nonnegative integers p, and some constant k independent of p, show that A also satisfies the resolvent condition: for all complex z with |z| > 1, the matrix zI − A is nonsingular and

‖(zI − A)^{-1}‖ ≤ k/(|z| − 1).

(This is a "worst case" in which the general error bound of (2) is actually attained. Note the result of Exercise 10.4.8.) ❑
if we define ν(A) = inf_P κ(P), the infimum being taken with respect to all matrices P for which P^{-1}AP is diagonal. This result may be viewed as a lower bound for the spectral radius of A; coupled with Theorem 10.3.1, it yields the first half of the following result.

Theorem 1. If A is a simple matrix, ‖·‖ is a matrix norm induced by an absolute vector norm, and ν(A) = inf_P κ(P), where P^{-1}AP is diagonal, then

‖A‖/ν(A) ≤ max_{1≤j≤n} |λ_j| ≤ ‖A‖,
and   (1)
l(A) ≤ min_{1≤j≤n} |λ_j| ≤ ν(A) l(A),

where l(A) is defined in Eq. (10.5.1).

† Numerische Math. 3 (1961), 241–246.
PROOF. The first inequality in (1) is just the result of Exercise 10.5.2. For the second inequality, the equation A = PDP^{-1} and Corollaries 10.5.1 and 10.5.2 imply that

l(A) ≥ l(P) l(P^{-1}) min_{1≤j≤n} |λ_j|.

Theorem 10.5.2 then gives

l(A) ≥ (κ(P))^{-1} min_{1≤j≤n} |λ_j|,
where x = P^{-1}y ≠ 0. Thus,

(µI − D)x = P^{-1}BPx   (2)

and hence

l(µI − D) ≤ ‖(µI − D)x‖_v/‖x‖_v = ‖P^{-1}BPx‖_v/‖x‖_v ≤ ‖P^{-1}BP‖ ≤ κ(P) ‖B‖.

Therefore we have

min_{1≤j≤n} |µ − λ_j| ≤ κ(P) ‖B‖.
We can easily obtain a stronger form of this theorem. Observe that P will diagonalize A if and only if P will diagonalize αI + A for any complex number α, and similarly for Q, B, and −αI + B. Since

(αI + A) + (−αI + B) = A + B,

we can now apply Theorem 3 to αI + A and −αI + B.
B has all its elements equal to e, prove that the eigenvalues of A + B lie in
the disks
1, 2, ..., n.
{z:^z — 2 + ^J
(
11 5 ;1E11, j=
Exercise 3. If A is a normal matrix and B = diag[b_1, ..., b_n] is real, prove that the eigenvalues of A + B lie in the disks

{z : |z − (α + λ_j)| ≤ β},   j = 1, 2, ..., n,

where λ_1, λ_2, ..., λ_n are the eigenvalues of A and

2α = max_{1≤j≤n} b_j + min_{1≤j≤n} b_j    and    2β = max_{1≤j≤n} b_j − min_{1≤j≤n} b_j.
Exercise 4. Let A r` x ", and let µ i, ... , p,, be the eigenvalues of t(A + AT ).
that is, λ_1(ζ) = ζ^{3/2} (defined by a particular branch of the function ζ^{1/2}) and λ_2(ζ) = −λ_1(ζ). Thus, we see that the eigenvalues are certainly not analytic in a neighbourhood of ζ = 0. ❑
It should be noted that det(λI − A(ζ)) is a polynomial in λ whose coefficients are analytic functions of ζ. It then follows from the theory of algebraic functions that

(a) If λ_i is an unrepeated eigenvalue of A, then λ_i(ζ) is analytic in a neighbourhood of ζ = 0;
(b) If λ_i has algebraic multiplicity m and λ_k(ζ) → λ_i for k = a_1, ..., a_m, then λ_k(ζ) is an analytic function of ζ^{1/l} in a neighborhood of ζ = 0, where l ≤ m and ζ^{1/l} is one of the l branches of the function ζ^{1/l}. Thus λ_k(ζ) has a power-series expansion in ζ^{1/l}.

We shall prove result (a) together with a result concerning the associated eigenvectors. Note that, in result (b), we may have l = 1 and m > 1, in which case λ_k(ζ) could be analytic. As might be expected, it is case (b) with m > 1 that gives rise to difficulties in the analysis. The expansions in fractional powers of ζ described in (b) are known as Puiseux series, and their validity will be assumed in the discussions of Section 11.7.
It is when we try to follow the behavior of eigenvectors that the situation becomes complicated. The matrix
A(C) = [0 0]
serves to show that even though the eigenvalues may be analytic, the eigenvectors need not be continuous at ζ = 0.
Y(ζ) = Σ_{k=1}^{p} Z_{k0}(ζ),

Y(ζ) = Z + ζC_1 + ζ^2 C_2 + ···,    A(ζ)Y(ζ) = AZ + ζB_1 + ζ^2 B_2 + ···,

for matrices C_r, B_r independent of ζ, r = 1, 2, ..., where

C_r = (1/2πi) ∮_L R_z^{(r)} dz,   r = 1, 2, ....
R_z(ζ) = (z^2 − ζ^3)^{-1} [ z  ζ^2 ;  ζ  z ].
Partial fraction expansions now give
R_z(ζ) = (1/2) [ 1/(z − ζ^{3/2}) + 1/(z + ζ^{3/2})    ζ^{1/2}(1/(z − ζ^{3/2}) − 1/(z + ζ^{3/2})) ;
                 ζ^{-1/2}(1/(z − ζ^{3/2}) − 1/(z + ζ^{3/2}))    1/(z − ζ^{3/2}) + 1/(z + ζ^{3/2}) ].
x(ζ) = Y(ζ)x / (y^T Y(ζ)x).

Since Y(ζ) is analytic in a neighborhood of ζ = 0 and y^T Y(0)x = 1, it follows that x(ζ) is analytic in a neighborhood of ζ = 0.
Finally, Eq. (5) also implies x^T(ζ)y = 1, so the relation y(ζ)x^T(ζ) = Y^T(ζ) implies y(ζ) = Y^T(ζ)y, which is also analytic in a neighborhood of ζ = 0. ■
We have now proved the existence of power series expansions in ζ for an unrepeated eigenvalue and its eigenvectors. The next step in the analysis is the discussion of methods for computing the coefficients of these expansions.
We shall first discuss a particular problem that will arise repeatedly later on. If λ is an eigenvalue of the matrix A ∈ C^{n×n} and b ≠ 0, we want to solve the equation

(λI − A)x = b.   (1)

Given that a solution exists (i.e., b ∈ Im(λI − A)), we aim to express a solution x of (1) in terms of the spectral properties of A. If λ_1, λ_2, ..., λ_s are the distinct eigenvalues of A and λ = λ_1, we first define the matrix

E = Σ_{k=2}^{s} Σ_{j=0}^{m_k−1} j! Z_{kj} / (λ − λ_k)^{j+1},   (2)

just as in Eq. (9.6.2) but replacing λ_1 and E_1 by λ and E. Multiply Eq. (1) on the left by E and use Theorem 9.6.2 to obtain

(I − Z)x = Eb.   (3)

(With the convention of Sections 4 and 5 we write Z for Z_{10}.) Since ZE = O it is easily seen that the vector x = Eb satisfies Eq. (3), and we would like
398 11 PERTURBATION THEORY
Thus, we complete the solution for the first-order perturba tion coeffi-
cients as follows:
dit ) = yTA (I)x, xt ' ) = EA(')x (8)
The argument can easily be extended to admit the recursive calculation
of At') and x(') for any r. The coefficients of C' in Eq. (4) yield
Ax" + At' )xt' -1) + . • • + At' )x = Axt') + At' )xt'' ) + • • +
or
( - A)x (') = (A(u - A(')I)xt'-I) + ... + (A (') -
If we now assume that A" ), ... , A(' -1) and xt' ), ... , xt' - ' are known, then
)
for small enough ICI (Theorem 11.4.1), the separate component matrices may
have singularities at C = 0 (Exercise 11.4.1).
Suppose now that we impose the further condition that A(C) be Hermitian
and analytic on a neighborhood A' of C = Co in the complex plane. Let
alt(C) be an element of A(C) with j # k, and let z = x + iy with x and y real.
Then define ajt (x + iy) = u(x, y) + iv(x, y), where u and y are real-valued
functions.
With these definitions, the Cauchy-Riemann equations imply that
au_ av au _ av
and
ax ay ay ^ ax '
But aki(C) = ajk(C) = u(x, y) — iv(x, y), which is also analytic on .N: Hence
au _ _ aU au _ ay
and
Ox ôy ôy ax '
These equations together imply that aik(C) is a constant on .N; and the
parameter dependence has been lost. This means that analytic dependence
in the sense of complex variables is too strong a hypothesis, and for the
analysis of matrices that are both analytic and Hermitian we must confine
attention to a real parameter C and assume that for each) and k the real and
imaginary parts of aik are real analytic functions of the real va riable C.
For this reason the final theorem of this chapter (concerning normal
matrices) must be stated in terms of a real parameter C. When applied to
Hermitian matrices this means that the perturbed eigenvalues remain real
and the component matrices (as studied in Theorem 11.4.1) will be Hermitian.
It can also be shown that in this case the perturbed eigenvalues are analytic
in C, whatever the multiplicity of the unperturbed eigenvalue may be. The
perturbed component matrices are also analytic and hence the eigenvectors
of A(C) can be supposed to be analytic and orthonormal throughout a neigh-
borhood of C = 0. These important results were first proved by F. Rellich
in the early 1930s. Rellich's first proof depends primarily on the fact that the
perturbed eigenvalues are real. We shall prove a result (Theorem 2) that
contains these results by means of a more recent technique due to Kato.
(See the references in Appendix 3.)
With this preamble, we now return to a perturbation problem, making
no assumption with regard to symmetry of the matrix A(0). Until we make
some hypothesis concerning the symmetry of A(C), we retain the assumption
of dependence on a complex parameter;. The formulation of corresponding
results for the case of a real variable C is left as an exercise. We first prove that .
Thus, in the Puiseux expansion for A1(C) described in result (b) of Section 11.3,
the coefficients of Co are zero for j = 1, 2, ... ,1— 1.
As in Section 11.4, the eigenvalue A of A = A(0) is supposed to "split"
into eigenvalues A 1(O, ... , Ap(0 for ICI sufficiently small. Also Am=
E,7-0 "A1"1, the matrix Z is the component matrix of A, and Y(0) is the sum
of component matrices Zko(C) of A(C) for 1 5 k 5 p. Thus, Y(0 a Z as -
ICI o.
Lemma 1. With the conventions and hypotheses of the previous paragraphs,
the function
B(C) ° C-1W A(D))Y()
—
— (1)
has a Taylor expansion about C = 0 of the form
= — 1 f (z — A) E CrR;r 1 dz.
21C î L r o m
Now, Y(0) = Z and, since A has index 1, (AI — A)Z = O. Hence the coeffi-
cient of CO on both sides of the last equation must vanish and we find that
.13 .) = (z — A)Rr 1) dz
21Cî fL
and is independent of C. This confirms the existence of the Taylor exp ansion
for B(C).
402 11 PERTURBATION THEORY
— (
-_
+
(z ? E(z)/ A`i) z — A
+ E(z)).
B(0) _
Hence, using Cauchy's theorem of residues,
2
^i f L jtZA
= —ZA( )Z.
(
(function
.
I + analytic near z = A) . dz.
)))
Let xj(C) be a right eigenvector of A(C) associated with one of the eigen-
values A/C), 1 5 j S p. Then Y(C)xi(C) = xj(C) and we see immediately from
Eq. (1) that
B(C)x,(C) = C- 1(A - A)(C))x,(C)• (2)
Thus, C- 10, - A/Ç) is an eigenvalue of B(0, say $(C). But, since B(C) has a
Taylor expansion at C = 0, /3(C) also has an expansion in fractional powers
of C. Consequently, for small enough ICI and some integer 1,
C- 1 (A — Ai(0) = Po + Min + r2 C 2 " + • .
and /o is an eigenvalue of the unperturbed matrix —ZA (1)Z. This implies
A, (C)=A —p o C — fiICI+IIm p2 CI+(2m —
(3)
—..
In other words,
A/C) = A + a 1C + O(ICI' +um)
as ICI 0, and a., is an eigenvalue of ZA (1)Z.
We have proved the first pa rt of the following theorem.
Theorem 1. Let A(C) be a matrix that is analytic in C on a neighborhood of
C = 0, and suppose A(0) = A. Let A be an eigenvalue of A of index 1 and multi-
plicity m, and let A,(C) be an eigenvalue of A(C)for which A j(0) = A. Then there
is a number a j and a positive integer 1 S m such that
A/C) = A + ai l + 0(ICI'+(Iii)) (q)
[Ca —1) 0
Exercise 3. Prove that if ai is an eigenvalue of ZA (11Z of index 1, then there
is a number b1 such that
AA(C) = A + a; C + b1C2 + 0(ICl 2+t,m ), (6)
and that b1 is an eigenvalue of Z(At 11EA(1) + AI21)Z with right eigenvector
x. ❑
Consider the matrix function B(C) of Lemma 1 once more. Since Y(C)
is just a function of A(C) (for a fixed C), A(C) and Y(C) commute. Also Y(C)2 =
Y(C) so we can write, instead of Eq. (I),
B(C) = C Y(C)(AI — A(C))Y(C), (7)
and this representation has a clear symmetry. In particular, if A and Ao 1
happen to be Hermitian, then Y(0) = Z and hence B(0) is also Hermitian.
Thus, B(0) has all eigenvalues of index one and repetition of the process
described above is legitimate, thus guaranteeing dependence of the perturbed
eigenvalues on C like that described by Eq. (6). Furthermore, it looks as
though we can repeat the process indefinitely, if all the derivatives of A(C)
at C = 0 are Hermitian, and we conclude that Ai(C) has a Taylor expansion
about C = O.
But once we bring symmetry into the problem, the considerations of the
preamble to this section come into play. Thus, it will now be assumed that
C is a real parameter, and that A(C) is normal for every real C in some open
interval containing C = 0. Now it follows that whenever A(C) is normal,
Y(C) and Z are Hermitian (i.e., they are orthogonal projectors), and B(C)
404 I I PERTURBATION THEORY
Exercise 4. Show that if C is real and A(C) is normal, then B(C) is normal.
Exercise S. Let A(O be analytic and Hermitian throughout a real neigh-
borhood of C = 0, and suppose that all eigenvalues A I(0), ... , A,(0) of A(0)
are distinct. Show that there is a real neighborhood X of C = 0, real functions
AI (r), ... , A„(C) that are analytic on .N, and a matrix-valued function U(C)
that is analytic on .N' such that, for C e
There are expansions of the form (5) for eigenvectors even when the un-
perturbed eigenvalue has index greater than 1. Indeed, Exercise 6 demon-
strates the existence of a continuous eigenvector. However, even this cannot
be guaranteed if the hypothesis of analytic dependence on the parameter is
removed. This is illustrated in Exercise 7, due to F. Rellich, in which the matrix
is infinitely differentiable at C = 0 but not analytic, and it has no eigenvector
at t = 0 that can be extended in a continuous way to eigenvectors for non-
zero values of C.
Exercise 6. Adopt the notation and hypotheses of Theorem 1, except that
the eigenvalue A of A may have any index m. Show that there is an eigenvector
xi(C) of A(l;) such that the expansion (5) holds for ICI small enough.
SOLUTION. Using statement (b) of Section 11.3, we may assume that there
is an eigenvalue Aq(i;) of A(() that has a power series expansion in powers of
Co for ICI small enough. Then the matrix Da) = Aq(O/ — A(i;) has elementg
with power series expansions of the same kind.
11.7 PERTURBATION OF A MULTIPLE EIOENVALUB 405
1. 1 12
Obviously, 1 is a function of C with a power series expansion in ( 1"I for
IC I small enough. Furthermore, D(C)1(C) = 0, so that 1(C) is an eigenvector
with the required property. To see this, let DJ* be the jth row of D(C). Then
1 2 r
D1./ = k dik =
k=1 D (1 2 r r+ 1
This is zero for each j because if 1 5 j 5 r we have a determinant with two
equal rows, and if j Z r + 1 we have a minor of D of order r + 1.
Exercise 7 (Rellich). Verify that the matrix A(x) defined for all real x by
cos(2/x) sin(2/x)1
A(0) = 0, A(x) = e- tIx =
^ sin(2/x) —cos(2/x)
is continuous (even infinitely differentiable) at x = 0 and that there is no
eigenvector s(x) continuous near x = 0 with v(0) # O. D
CHAPTER 12
406
12.1 Ins NOTION OF A KRONECKER PRODUCT 407
then
cl ub" a11b12 a12b11 a12b12
A ® B = auto" a11b22 a12b21 a12b22
a21b 11 az3blz a22bi1 a22b12
a21b21 a21b22 a22b21 a22b22
Proposition 1. If the orders of the matrices involved are such that all the
operations below are defined, then
(a) If p E f, (14A) ®B = A ®(µB) = µ(A B);
(b) (A+B)®C=(A®C)+(B®C);
(c) A ®(B + C) = (A ®B) + (A ®C);
(d) A®(B®C)=(A®B)®C;
(e) (A ® B)T = AT ®BT.
We need a further property that is less obvious and will be found very
usefuL
Proposition 2. If A, C E Fm"M and B, D e .9"x", then
(A ® BxC ® D) = AC ® BD. (1)
PROOF. Write A 0 B = [a1 B]i s 1, C ® D = [c1jDr 1=1 . The i, jth block
of (A®BxC® D) is
m
Fii = (arkBxcktD) = E Rik ck+BD, for 1 5 i, j 5 m.
k1 k= 1
On the other hand, by the definition of the Kronecker product, AC ® BD =
(7UBD]; i. i, where yu is the i, jth element of AC and is given by the formula
yu = Fk"`=1 a acku . Thus Fu = yu BD for each possible pair (i, j), and the
result follows. •
Proposition 2 has some important consequences, which we list in the
following corollaries.
Corollary 1. If A E FM' m, B E Sr'," then
(a) A ® B = (A ® 1"xIm ® B) = (Im ® B)(A ® I„);
(b) (A ®B)-' = A -1 ®B', provided that A -1 and B - ' exist.
Note that the assertion of Corollary 1(a) states the commutativity of
A® I„ and 1„, B.
Corollary 2. If A 1 , A2, .. , Ap e .9m' m and B 1 , B2 , ... , Bp E F" 'I', then
(Al o BIXA2 ®B2) ... (dip ® Bp) _ (A1A2 ... AO 0(B1B2 ... B p).
The Kronecker product is, in general, not commutative. However, the
next Proposition shows that A ® B and B ® A are intimately connected.
Proposition 3. If A E Srm"m, B E "'"", then there exists a permutation
matrix P e Rm" x m" such that
PT(AOB)P=B ® A.
12.1 Tus NOTION OF A KRONECKER PRODUCT 409
A*"
is said to be the vec function of A and is written vec A. It is the vector formed
by "stacking" the columns of A into one long vector. Note that the vec-
function is linear:
vec(aA + BBB) = a vec A + fi vec B,
for any A, B e Om" and a, fi E F Also, the matrices A 1 , A 2 , ... , Ak from
Cm"" are linearly independent as members of .gym"" if and only if vec A1,
410 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
vec A2 , ... , vec Ak are linearly independent members of SP'. The next
result indicates a close relationship between the vec-function and the
Kronecker product.
Proposition 4. If A c Be xi', X e .9"m x ", then
vec(AXB) = (BT ® A) vec X.
PROOF For j = 1, 2, ... , n, the f-column (AXB), 0 can be expressed
A Ai
(AXB)*i = AXB *i = E bk,(AX)wk = 1. (bk,A)X*k,
k=1 k=1
One of the main reasons for interest in the Kronecker product is a beauti-
fully simple connection between the eigenvalues of the matrices A and B and
A ® B. We obtain this relation from a more general result concerning com-
posite matrices of Kronecker products.
Consider a polynomial p in two vari ables, x and y, with complex co-
efficients. Thus, for certain complex numbers cjj and a positive integer 1,
p(x, y) = E 0c^Jx1y'.
t.J=
If A e Cm"m and B e C"'", we consider matrices in Cm""m" of the form
where J 1 and J2 are matrices in Jordan normal form (Section 6.5). It is clear
that J; is an upper-triangular matrix having ,ti, ... , 2f,, as its main diagonal
Jour. Math. pures app!. V 6 (1900), 73 - 128.
412 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
elements and similarly p i ... p„ are the main diagonal elements of the
, ,
,
upper-triangular matrix J.
Furthermore, from Exercise 12.1.5 it follows that J; ®J is also an upper-
triangular matrix. It is easily checked that its diagonal elements a re A; pi
for r = 1, 2, ... , m, s _ 1, 2, ... , n. Hence the matrix p(J i ; J2) is upper-
triangular and has diagonal elements p(4, pi). But the diagonal elements of
an upper-triangular matrix are its eigenvalues; hence p(J 1 ; J2) has eigen-
values p(A„ p,) for r = 1, 2, ... , m, s = 1, 2, ... , n.
Now we have only to show that p(J i ; J2 ) and p(A; B) have the same
eigenvalues. On applying Corollary 2 to Proposition 12.1.2, we find that
Ji®J2=PA`P - 1 ®QBJQ -1
_ (P ® Q)(A' ® Bi)(P- i ® Q 1),
and by Corollary 1 to Proposition 12.1.2, P -1 ®Q -1 = (P ® Q) -1 . Thus,
Ji® J2 =(P® QXA'®B'(P®Q) -1 ,
and hence
p(J1; J 2 ) = (P ® Q)p(A; BXP ® Q) -1.
This shows that p(J i ; J2) and p(A; B) are similar an d so they must have the
same eigenvalues (Section 4.10). •
The two following special cases (for p(x, y) = xy an d p(x, y) = x + y) are
probably the most frequently used.
Corollary 1. The eigenvalues of A ® B are the mn numbers A, p„ r = 1,
2,...,m,s= 1,2, . . . , n.
Corollary 2. The eigenvalues of (I„ ® A) + (B ® I„,) (or, what is equivalent
in view of Exercise 4.11.9, of (1, ® AT) + ( BT ® 1„ a) are the mn numbers
2,+p„r= 1,2, ... ,m s = 1,2,...,n.
,
m(n — 1) zeros.
Hint. Express B as a Kronecker product.
12.3 APPLICATIONS OF THE KRONECKER PRODUCT 413
•
recall that a matrix is nonsingular if and only if all its eigenvalues are non-
zero.
There is a special case in which we can obtain an explicit form for the
solution matrix X of Eq. (3).
Theorem 3. If all eigenvalues of the matrices A E C "1 and B e C"" have
negative real parts (that is, A and B are stable), then the unique solution X of
Eq. (3) is given by
PROOF. First observe that the conditions of the theorem imply that A and .
J0 0
Hence assuming, in addition, that
Z(co) = lim e'Ce r` = 0, (6)
CO
itis found that the matrix given by Eq. (4) is the (unique) solution of Eq. (3).
Thus it suffices to check that, for stable A and B, the assumptions we have
had to make are valid. To this end, write the spectral resolution (Eq. (9.5.1))
for the function e4` (0 < t < co):
Il Mk - 1
- (7)
eA` = E E t'e'k"Zkj,
k=1 j=0
radii pA and pia , respectively, and µA µ8 < 1, show that the equation
X = AXB + C (8)
has a unique solution X and that this solution is given by
X=>A'CB'.
J=o
Exercise 3. Show that Eq. (8) has a unique solution if 2p # 1 for all .
s.t =1
undefined elements.
Hint. Observe that if A, = A„ then a„ = min(k„ k1). ❑
I-
Thus, X i, X2 , ..., X. constitute a basis in the solution space of AX = X
12.4 COMMUTING MATRICES 419
Exercise 4. Show that if A e C"" has all its eigenvalues distinct, then
every matrix that commutes with A is simple. ❑
12.5 Solutions of AX + XB = C
Lo I )' 0 — B, (1)
commute. Thus, applying Theorem 12.4.1 to these two matrices, the next
result readily follows.
Theorem 1. Let A e Cm"m, B E C""" and let A = PJ 1P' 1 , B = QJ2 Q -1,
where
J1 = diag[t11Ik, + Nk,, • • . , 2PIkp + Nky]
and
J2 = diag[p 11„ + N,,, . . . , 119 1u + N,41
are Jordan canonical forms of A and B, respectively. Then any solution X of
AX + XB = 0 (2)
is of the form X = PYQ -1 , where Y = [Y.J;;r 1 is the general solution of
J1 Y + YJ2 = 0, with Y„ a Ck• x s = 1, 2, ... , P, t = 1, 2, ... , q. Therefore
Y has the properties Y„ = 0 if 2, # —µt , and Y„ is a matrix of the form
described in Theorem 12.4.1 for A. = — µt
The argument used in Section 12.4 can also be exploited here.
Corollary 1. The general form of solutions of Eq. (2) is given by the formula
X = (3)
!-1
are similar.
[0 B] —
and
[ 0 —B ] (6)
t Proc. Amer. Math. Soc. 3 (1952), 392-396.
I SIAM !our. Appl. Math. 32 (1977), 707-710.
12.5 SOLUTIONS OF AX + XB = C 423
PROOF. If Eq. (5) has a solution X, it is easily checked that the matrices in
(6) are similar, with the transforming matrix
Lo I ].
Conversely, let
[A p_ c
0 —B = 1 0 —B]P
] [
(7)
1;)1
Tl(Z) _ [ — Z[ —B],
T2(Z) _ [ 0 B]z.z[0
—B]'
where Z e C(m+") x lm+"), and obse rve that it suffices to find a matrix in
Ker T2 of the form
[0 _I].
Indeed, in this case a straightforward calculation shows that the matrix U
(8)
,])
q,
Thus, So c Im . Since the reverse inclusion is obvious, Im 91 = .i Now
we have Im 9 2 c = Im 91. But, on the other hand. (see Eq. (4.5.2)),
dim(Ker q,,) + dim(Im q,,) = dim(Ker TI), i = 1, 2,
and therefore Eqs. (9) and (10) imply that dim(Im 91) = dim(Im 2) and,
consequently, that
Im 91 =Img 2 =9 (11)
Now observe that
such that
Ro
VA .2 y°
Wo])
This means that VO = 0 and W° = I. Thus we have found a matrix in
Ker T2 of the desired form (8). This completes the proof. ■
—
is solvable. To describe all solutions of Eq. (2), that is, the set of all left
inverses of A, we first apply a permutation matrix P a .9"x" such that the
n x n nonsingular submatrix Al in PA occupies the top n rows, that is,
XAX X. (2)
Therefore we define a generalized inverse of A e SP"'" to be a matrix
X e.r"M" for which Eq. (1) and (2) hold, and we write X = Ai for such a
matrix.
Proposition 1. A generalized inverse A' exists for any matrix A e
PROOF Observe first that them x n zero matrix has the n x m zero matrix
as its generalized inverse. Then let r # 0 be the rank of A. Then (see Section
2.7)
A = Y (3)
0 0
for some nonsingular matrices R e V e .9"1 ". It is easily checked
that mat rice s of the form
1. BI
Al = V -I R-1 ,
(4)
BZ BZB1
for any B1 E F"`°" -'), 82 e Pm." satisfy the conditions (1) and (2). it
12.7 GaNeiwaz® INvenses 429
Thus, a generalized inverse always exists and is,. generally, not unique.
Bearing in mind the decomposition (3), it is easy to deduce the following
criterion for uniqueness of the generalized inverse.
Exercise 1. Prove that the generalized inverse of A is unique if and only if
A =0 or A is a square and nonsingular matrix. ❑
Finally, combining these results with the fact that AA' is a projector it is
found that
.gym = Im(AA') -J- Ker(AA') = Im A -j - Ker A',
which is equation (5).
Since the defining equations for A' are symmetric in A, A' we can replace
A, A l in (5) by A', A, respectively, and obtain (6). . ■
Corollary 1. If A' is a generalized inverse of A, then AA' is a projector onto
Im A along Ker A', and A'A is a projector onto Im A' along Ker A.
Note also the following property of the generalized inverse of A that follows
immediately from the Corollary, and develops further the relationship with
properties of the ordinary inverse.
Exercise 2. If A e 9m", prove that
Then there is a generalized inverse A' of A such that Ker A' = and
ImA'=.Y.
PROOF. Let A have rank r and write A in the form (3). It iseasily seen that
r
V(Ker A) = Kerl ;' 1, R -1 (Im A) 0]
LLL
and then, using (7), that
]
12.7 GENERALIZED INVERSES 431
Consequently, for some r x (m — r) matrix X and (n — r) x r matrix Y,
Im[1],
=R
Now define A' by
AI. V-i rl• -X ] R1
LY —YX
and it is easily verified that A' is a generalized inverse of A with the required
properties. ■
This section is to be concluded with a result showing how any generalized
inverse can be used to construct the orthogonal projector onto Im A; this
should be compared with Corollary 1 above in which AA' is shown to be a
projector onto Im A, but not necessarily an orthogonal projector, and also
with a construction of Section 5.8. The following exercise will be useful in
the proof. It concerns situations in which left, or right, cancellation of matrices
is permissible.
Exercise 3. Prove that whenever the matrix products exist,
(a) A*AB = A*AC implies 'AB = AC
(b) BA*A = CA*A implies BA* = CA*.
Hint. Use Exercise 5.3.6 to establish part (a) and then obtain part (b) from
(a). D
Pro position 5. If A e .9""" and (A*A)' is any generalized inverse of A*A,
then P = A(A*A)'A* is the orthogonal projector of .5F" onto Im A.
PROOF. Equation (1) yields
A*A(A*A)'A*A = A*A,
and the results of Exercise 3 above show that
A(A*A)'A*A = A, A*A(A*AYA* = A*. (8)
Using either equation it is seen immediately that P2 = P.
Using Proposition 2(c) one may write
P* = A((A*A)')*A* = A(A*A)'A*,
where (A*A).' is (possibly) another generalized inverse of A*A. It can be
shown that P* = P by proving (P — P*)(P — P*)* = O. Write
(P — P*)(P — P*)* = PP* + P*P — P 2 —
432 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
and use equation (8) (for either inverse) to show that the sum on the right
reduces to zero.
Obviously, P is a projector of F into Im A. To show that the projection
is onto Im A, let y e Im A. Then y = Ax for some x e fir"' and, using the first
equality in (8), we obtain
y = A(A*A) IA*(Ax) = Pz,
where z = Ax, and the result follows. •
PROOF. Let X 1 and X2 be two matrices satisfying the Eqs. (3) through (6).
By Eqs. (4) and (5), we have
Xi = Xi A*X? = AX,Xi , i = 1, 2,
and hence XÎ — XZ = A(X I X? — X2 XI). Thus
Im(Xl — X2) c Im A. (7)
On the other hand, using Eqs. (3) and (6), it is found that
A* = A*X? A* = X IAA*, i = 1, 2,
which implies (X 1 — X 2)AA* = 0 or AA*(Xi — Xz) = O. Hence
Im(Xt — X?) c Ker AA* = Ker A*
(by Exercise 5.3.6). But Ker A* is orthogonal to Im A (see Exercise 5.1.7)
and therefore it follows from Eq. (7) that Xt — X2 = O. ■
It was pointed out in Exercise 5.1.7 that for any A e 'a,
Im A ® Ker A* =
Ker A EDIm A*= .^".
Comparing this with Eqs. (1) and (2) the following proposition is obtained
immediately.
Proposition 2. For any matrix A
Ker A + = Ker A*, Im A + = Im A.
A straightforward computation can be applied to determine A + from A.
Let r = rank A and let
A = FR*, F E.Fmxr, e frx"
R* (8)
be a rank decomposition of A, that is, rank F = rank R* = r. Noting that
the r x r matrices F*F and R*R are of full rank, and therefore invertible, we
define the n x n matrix
X = R(R*R) -1(F*F)-1 F*. (9)
It is easy to check that this is the Moore-Penrose inverse of A, that X = A.
Note that the linearly independent columns of A can be chosen to be the
columns of the matrix F in Eq. (8). This has the useful consequences of the
next exercise.
Exercise 1. Let A be m x n and have full rank. Show that if m 5 n, then
A + = A*(AA*) -1 and if m Z n, then A + _ (A*A) -1A*. ❑
434 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
Now recall that in the proofs of Theorems 5.7.1 and 5.7.2, we introduced
a pair of orthonormal eigenbases {x 1 , x2 , ..., x„} and {y 1 , y2 , ..., y,,) of
the matrices A*A and AA*, respectively, such that
if i =1,2,...,r,
AxI = llJsi Yr (10 )
0 if i = r + 1, . . . , n,
where it is assumed that the nonzero singular values s i (1 5 i S r) of A arc
associated with the eigenvectors x 1 , x2 , ..., x,. Note that since A*Ax I
six, for i = 1, 2, ... , n, the multiplication of (10) by A* on the left yields
,x, if i = 1, 2, ... , r, (11)
A*_ 5
y^ 0 if i = r + 1, . . , m.
`
12.9 Tus BEST APPROXIMATE SOLUTION OF Ax = b 435
The bases constructed in this way are called the singular bases of A. It
turns out that the singular bases allow us to gain deeper insight into the
structu re of Moore-Penrose inverses.
_ a
Proposition 4. Let A e.fm" and let s 1 Z s2 > - • - s,› 0 = s,. +1 = • • • = s„
be the singular values of A. Then si 1, s2 '1 , ... s^ 1 are the nonzero singular
,
values of A + . Moreover, if {xj }i=1 and ()gil l are singular bases of A, then
{y j}lo1 and {x i }7. j are singular bases of A + , that is,
OI 1xI
A+yt = (12)
if i= r , +^ 1, .. r, m,
(A if i= 1, 2, ..., r,
+)*xi = {s^ (13)
0 1yj if i = r+1,...,n.
PROOF. Consider the matrix (A + )*A + and observe that, by Exercise 2(c)
and Proposition 2,
Ker((A + )*A+) = Ker((AA*)+) = Ker AA*,
and each subspace has dimension m — r. Thus A + has m — r zero singular
values and the bases {y in.. 1 and {x1}7"=1 can be used as singular bases of A +
-
Furthermore, applying A + to Eqs. (10) and using the first result of Exercise 3,
we obtain, for xi a Im A*, i = 1, 2, ... , n,
ts1/1+yi r
xj A + Axi — 0
if i = r '+ ' 1, . , n,
which implies (12). Clearly, si 1 (i = 1, 2, ... , r) are the nonzero singular
values of A.
Now rewrite the first relation in Exercise 3 for the matrix A*:
(A*) + A*z = z, (14)
for any z e Im A, and apply (A + )* = (A*) + to (11). Then, since
yl, ... , y, belong to Im A, we have
if i = 1, 2, ..., r,
yj A + *A* yj Jsi(A+)*xI
— ( ) 0 if i=r+1,...,n,
and the result in (13) follows. •
Exercise 4. If A = UDV* as in Theorem 5.7.2, and det D # 0, show that
A + = VD -1 U*. ❑
where A e 9'1 and b e FP", and we wish to solve for x e 9r". We have seen .
that there is a solution of Eq. (1) if and only if b e 1m .A. In general, the
solution set is a "manifold" of vectors obtained by a "shift" of Ker A, as
described in Theorem 3.10.2. Using Exercise 12.8.3, we see that the vector
xo = A+ b is one of the solutions in this manifold:
Axa = AA + b=b,
and it is the only solution of Eq. (1) if and only if A is nonsingular, and then
A+= A -1 .
Also, a vector x e 9z" is a solution of Eq. (1) if and only if llAx — bit = 0
for any vector norm II II in 5m•
Consider the case b #1m A. Now there are no solutions of Eq. (1) but,
keeping in mind the preceding remark, we can define an approximate solution
of Ax = b to be a vector x o for which llAx — bit is minimized:
llAx° — by = min llAx — bit (2)
se f"
in the Euclidean vector norm II Il. (Then xo is also known as a least squares
solution of the inconsistent, or overdetermined, system Ax = b. Note that if x0
is such a vector, so is any vector of the form xo + u where u E KerA.)
Now for any A e Sw"' x" and any b e .i', the best approximate solution of
Eq. (1) is a vector xo e F" of the least euclidean length satisfying Eq. (2).
We are going to show that x o = A + b is such a vector. Thus, if b e Im A, then
A + b will be the vector of least norm in the manifold of solutions of Eq. (I
(or that equation's unique solution), and if b 4 1m A, then A+b is the unique
vector which attains the minimum of (2), and also has minimum norm.
Theorem 1. The vector x0 = A+b is the best approximate solution of
Ax = b.
PROOF. For any x e F" write
A(x —A + b)+(I—AA+x—b),
Ax—b=
and observe that, because I — AA+ is the orthogonal projector onto am A)
(ref. Corollary 12.7.1), the summands on the right are mutually orthogonal
vectors:
A(x — A+b)eImA, (I — AA+x—b)e(ImA) 1 .
It remains only to show that 11x1II z IIxoII for any vector x 1 E SF" minimiz-
ing 1IAx — bu . But then, substituting x 1 instead of x in Eq. (3), 11Ax1 — bU
II Axo — bII yields A(x 1 — xo) = 0, and hence x 1 — xo e Ker A. According to
Proposition 12.8.2, x o = A + b e Im A* and, since Ker A 1 Im A*, it follows
that x0 1(x1 — x0). Applying the Pythagorean theorem again, we finally
obtain
11x111 2 = 11x011 2 + Ilxl — x011 2 z 11x0112 .
The proof of the theorem is complete. II
Note that the uniqueness of the Moore—Penrose inverse yields the unique-
ness of the best approximate solution of Eq. (1).
Exercise 1. Check that the best approximate solution of the equation
Ax = b, where
A= 20,
—1 3
1 1
b =101,1
(A *6, x,)
a1 = = 1, 2, ..., r.
3I2
PROOF. In Eq. (5) and in this proof we use the notation ( , ) for the standard
inner product on Fe', or on F. Thus, (u, v) = v*a. Let y1 , Y2 , ... , y", be
'too 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
Yi = s Ax i , i 1, 2, ..., r,
and therefore
1
Pi s-1
s7 —= 1 (b, Axt) =
i i (A b, xis
si si
for i = 1, 2, ... , r. This completes the proof. •
3. Prove that the Kronecker product of any two square unitary (Hermitian,
positive definite) matrices is a unitary (Hermitian, positive definite)
matrix.
4. Let A e .tscm"m, B e .F""" be fixed and consider the transformations T1 ,
T2 E 2;(3=1"") defined by the rules
for any X e."1 ". Prove that the matrix representations of Ti and T2
with respect to the standard basis [A iry: for .r'" " (see Exercise
3.5.1(c)) are the matrices
BT ® A and (I" ® A) + (BT ®1,„),
respectively.
5. If A e .r"' "m, B e .94 ", and II II denotes either the spectral norm or the
norm of Exercise 10.4.9, prove that
8. Let A E .-m"", B E Fr" (r 5 min(m, n)) be matri ces of full rank. Prove
that
(AB) + = B+ A + .
Find an example of two matrices A and B such that this relation does
not hold.
Hint. Use Eq. (12.8.9) with A = F, B = R*.
9. If A e F"" and A = HU = U 1H1 are the polar decompositions of A,
show that A + = U*H + = Hi UT are those of A + .
Hint. Use Proposition 12.8.3d.
10. Prove that a matrix X is the Moore-Penrose inverse of A if and only if
XAA* = A* and X = BA*
for some matrix B.
440 12 LINEAR MATRIX EQUATIONS AND GENERALIZED INVERSES
Stability Problems
441
442 13 STABILITY PROBLEMS
,
x(t) = [:
(t)]
in which x 1(t) and x2(t) are real-valued functions of t. The level curves for a
positive definite form u (see Exercise 5.10.6) are ellipses in the x 1 , x2 plane
(Fig. 1), and if xo = x(to) we locate a point on the ellipse v(x) = v(x o) cor-
responding to x(to). Now consider the continuous path (trajectory) of the
solution x(t) in a neighbourhood of (x,(to), x2(to)). If w > 0 then t) < 0,
and the path must go inside the ellipse v(x) = v(x 0) as t increases. Further-
more, the path must come within an arbitrarily small distance of the origin
for sufficiently large t.
Returning to a more precise and general discussion, let us now state an
algebricvsonfthlaLypunoverm.Mgalint
theorems are to be proved later in this section, as well as, a mo re general
stability theorem (Theorem 5). ''
13.1 THE LYAPUNOV STABILITY THEORY 443
v (x)—k
A— [ 1 a
have positive real parts. Show that the matrix
4+a2 —2a
H=
[
—2a 4 + a 2
is positive definite, and so is AH + HAT.
Exercise 1. Check that the Matrix
A = P(S — Q),
P
where and Q are positive definite, is stable for any skew-Hermitian matrix S.
Hint. Compute A*P - ' + P - 'A and use Theorem 1.
Exercise 3. Show that if 9te A = (A + A*) is negative definite then A is
stable. Show that the converse does not hold by considering the matrix
A =
of 1s'
P - t AP = J = C
r
J
,o1 (8)
2 J,
Q- 0
rl —Ri 1R21.
I J
Conversely, suppose that 6(A) = Q, so that J has a Jordan form as de-
scribed in Eq. (8). A Hermitian matrix H is to be constructed so that Eqs.
(6) and (7) are valid. To this end, apply Theorem 1, which ensures the exist-
ence of negative definite mat rices H1 e C° x °, H2 a C011-P" ( °) such that
— J1H1 — 111J1 > 0,
J2 H2 + H 2 J2 > 0.
Now it is easily checked by use of Sylvester's law of inertia that the matrix
H—P[0i H 1P*
2
is Hermitian and satisfies condit on (6). It remains to recall that H 1 < 0,
H2 < O, and hence
In H = In(—H1) + In H2 = {p, n — p, 0) = In A. •
13.1 THE LYAPUNOV STABILITY THEORY 447
In contrast to Theorem 2 the next results are concerned with the case when
AH + HA* = W is merely a positive semi -definite matrix. First, let us
agree to write In A 5 In B for any two n x n matrices A and B such that
a(A) 5 n(B), v(A) 5 v(B).
Clearly, in this case 6(A) z 5(B).
Note that if A is stable, then certainly 5(A) = 0 and the proposition shows
that H is semidefinite, a result that has already been seen in Exercise 12.3.1.
In A 5 In H. (12)
PROOF. Denote A, = A + ell' for a real. Then the matrix
A,H + HAB = (AH + HA*) + 2sf
448 13 Sreslunv Pao9LEMs
It follows that x*W = 0T and, as noted before, this implies that x*B = OT.
Hence
x *AH = x*W — x*HA* = OT. (14)
Now we have
x*ABB*A*x < x*AWA*x = x*(A 2HA* + AH(A*) 2)x = 0,
using. Eq. (14) twice. Thus, x*AB = OT. Repeating this procedure, it is found
that x*AkB = OT for k = 0, 1, ... , n — 1. Therefore we have found an x # O
such that
x*[B AB A"" t B) = O T,
and the pair (A, B) fails to be controllable. This shows that, actually, det H # 0
and hence the requirements of Theorem 3 are fulfilled. •
It is clear that if (A, W) is controllable, then we may take B = W" 2 (the
unique positive semi-definite square root of W) and obtain the next conclu-
sion.
Corollary 1 (C. T. Chent; H. K. Wimmert). If AH + HA* = W,
_
H = H*, W > 0, and (A, W) is controllable, then 5(A) = 5(H) = 0 and
In A = In H.
Now a useful generalization of the classical stability theorem is readily
proved.
Theorem 5. Let A e C"'" with (A, B) controllable, and let W be positive
semi-definite, with
W BB* Z O.
Then the conclusions (a) and (b) of Theorem 1 hold.
PROOF. For conclusion (a) follow the proof of Theorem 1 and use Lemma
9.11.2 together with the controllability of (A, B) to establish the definite
property of H.
Conclusion (b) is a special case of Theorem 4. •
We conclude by noting that the inertia we have been using is often referred
to as the inertia of a matrix with respect to the imaginary axis. In contrast,
the triple {n'(A), v'(A), b'(A)}, where 7l(A) (respectively, v'(A) and b'(A))
0 1 0 0
—c„ 0 1
S = 0 —c„_ 1 0 (15)
0 I
0 0 —c2 —c1
satisfies the Lyapunov equation
STH + HS=W,
Exercise S. If c1, c2 , ... , c„ in (15) are nonzero real numbers, show that
the pair of matrices (W, S1) is controllable and hence
n(S) = n — k, v(S) = k, 5(S) = 0,
We have seen that the stability problem for a linear differential equation
with constant coefficients is closely related to the location of the eigenvalues
of a matrix with respect to the imaginary axis. An analogous role for linear
difference equations is played by the location of eigenvalues relative to the
unit circle.
To illustrate these ideas, let A e C"" and a nonzero x o e C" be given, and
consider the simplest difference equation:
PROOF. First suppose that A is stable and let A 1, A2 , ... , R" denote the eigen-
values of A, so that IA,' < 1 for i = 1, 2, ... , n. Then (see Exercise 9.4.3)
A* + I is nonsingular and therefore we may define the matrix
C = (A* + I)' 1(A* — I). (3)
452 13 Sunnily PROBLEMS
Observe that, by Exercise 9.4.4, the matrix C is stable (with respect to the
imaginary axis). A little calculation shows that for any H e C""
CH + HC* _ (A* + I) - '[(A* — 1)H(A + I) + (A* + 1)H(A — I)]
x (A+ 11 1
= 2(A* + I)' I [A*HA — H](A + I) - '.
and hence
C( —II) + ( —H)C* = 2(A* + I) -1 [H — A*HA)(A + I) -1 . (4)
Now apply Theorem 13.1.1(a) to Eq. (4) to see that, if H — A*HA is posi ti ve
definite, so that the matrix on the right of Eq. 4 is positive definite, then H is
unique and is positive definite.
The converse is proved with the line of argument used in proving Theorem
13.1.1(b). Suppose that Eq. (2) holds with H > O. If A is an eigenvalue of A
corresponding to the eigenvector x, then Ax = Ax and x*A* = Ix*, so that
Eq. (2) implies
x* Vx = (1 — ..r)x*Hx. (5)
The positive definiteness of the matrices V and H now clearly implies I M < 1
for each eigenvalue A of A. •
Following the lead set in the previous section, we now seek a generalization
of Theorem 1 of the following kind: retain the hypotheses V > O and
H* = H, det H # 0, for matrice s of Eq. (2) and look for a connection between
the inertia of H (relative to the imaginary axis) and that of A (relative to the
unit circle). Theorem 2 provides such a generalization and is due to Taussky,
Hill, and Wimmer.t
Theorem 2. Let A eV". If H is a Hermitian matrix satisfying the Stein
equation (2) with V > 0, then H is nonsingular and the matrix A has a(H)
(respectively, v(H)) eigenvalues lying inside (respectively, outside) the (open)
unit circle and no eigenvalues of modulus 1.
Conversely, if A has no eigenvalues of modulus 1, then there exists a Hermitian
matrix H such that Eq. (2) holds and its inertia gives the distribution of eigen-
values of A with respect to the unit circle, as described above.
PaooF. If Eq. (2) is va lid, then the relation (5) shows that A has no eigen-
values of modulus 1. Proceeding as in the proof of Theorem 13.1.2, let the
Jordan form of A be
J = (Ôl fl =s- IAs (6)
L
f See, respectively, J. Algebra 1 (1964), 5-10; Linear Alg. Appl. 2 (1969), 131-142; J. Math.
Anal. Appt 14 (1973), 164-169.
13.2 STABILITY WITH RESPECT TO THE UNIT CIRCLE 453
where J1 e CP ", J2 e CO - P1 " (" - P1, and the eigenvalues of J 1 and J2 lie
inside and outside the unit circle, respectively. Preserving the notation of
the proof of Theorem 13.1.2, Eq. (2) yields the equation
R — J*RJ=Vo , Vo> 0,
as required.
Conversely, suppose that I d I # 1 for any eigenvalue J of A. Then A can
be transformed to the Jordan form (6). By Theorem 1 there are posi ti ve
definite matrices H 1 e C" "P, H2 e C0-0 x 1" - P) such that
H1 -
Ji H1J1 = V1, Vl >0,
J2 1
H2 - (4)-1H2 = V2 , V2 > 0.
Hence the positive definite matrix
Theorem 4. If Eq. (7) holds and the pair (A*, V) is controllable, then A has no
eigenvalues of modulus 1, det H # 0, and the assertion of Theorem 3 is valid.
454 13 STABILITY PROBLEMS
The proofs of the stability results given in the following sections depend
on two broad areas of development. The first is the development of inertia
theory for matrices, which we have already presented, and the second is a
more recent development of properties of Bezout mat rices for a pair of
scalar polynomials, which will now be discussed.
Let a(2) = a121 and b(2) = Lt. 0 b 1d1 (a, # 0, b„, # 0, l Z m) be
_o
arbitrary complex polynomials. Since b(2) can be viewed as E j bi,v, where
b,,,,. 3 = • • • = bl = 0, we may assume without loss of generality that the
polynomials are of the form Ei= o a121, a1 # 0, and D.° b121. The expression
a(2)b(p) — b(2)a(k)
—µ
is easily verified to be a polynomial in the two variables A and p:
0 —1 4
B=—1 4 0
4 0 —2
al 0 0
arises very frequently. For a reason that will be clear in a moment (see
Exercises 3 and 4), we call S(a) the symmetrizer of a(d). Note that S(a) does
not depend on ao .
Exercise 3. If Ca denotes the companion matrix of a(..),
0 1 0 0
0 0 1 .
Ca = . , E x,,
0 Ca C,
0 0 1
—ao —a l —a,_ 1
where âj = aj lad , j = 0, 1, ... ,1 — 1, check that
—a0 0 0
O a2 a,
S(a)CC = 0 .
O a, 0 0
and note that this matrix is symmetric. D
Define the polynomial â(2) = 3.'a(.t -1 ) = ad! + • • • + a,_ 1A + a,; its
symmetrizer is
a1_1 01_2 a0
S(â) = al- 2 ao 0
(3)
ao 0 0
Another device will also be useful. Define the permutation matrix P (the
rotation matrix) by
O 0 0 1
O 01 0
P= (4)
01
10 0 0
456 13 STABILITY PROBLEMS
and observe that P2 = I and that S(a)P and PS(a) are Toeplitz mat rices:
a, 02 ail a, 0 0
S(a)P = 021 , PS(a) _ _
:
0 0 a, al 02 a,
Exercise 4. Generalize the result of Exercise 3. Namely, show that, for any
15 k 51-1,
S(a)C; = diag[—PS 1P, S2], P, S1 E Ckxk, S2 e C I- k1x11-k1, (5)
where S i and S2 stand for the symmetrizers of the polynomials al(A) =
ak + ak - 1 A + • . • + ao Ak and a2(A) = ak + ak+ l A + • .. + a,A r-k, respec-
tively.
Also, show that
S(a)Câ = —PS(â)P. 0 (6)
The formula derived in the following proposition will be useful in the
sequel.
Proposition 1. With the previous notation, the Bezoutian B of scalar poly-
nomials a(A) and b(A) is given by
B = S(a)S(6)P — S(b)S(d)P, (7)
PROOF. Rewrite Eq. (1) in the form
I-1
(A — µ) E 7;00 = a(4b(1i) — b(A)a(IA
I =o
where y,(p) = y,t pJ for i = 0, 1, ..., 1 — 1. Comparing the coefficients
of 2' on both sides of the equation we obtain
Yo(!P) — py1(11) = a1b(11) — b1a(11),
Y1(') — PY2(11) = a2b(1+) — b2a(1 1),
a(µ)µ' -1
Now a little calculation will verify that for any polynomial c(µ) with
degree at most 1,
-')i -1 = S(C)P col(µ` -1 )i= i + PS(c) col(µ' -1)i= > µt, (9)
col(c(i)µt
where by definition
Ro
( 10)
col(RitôatR 1
Ri_ I
and, in general, Ro ,..., R,_ 1 may be scalars or matrices of the same size.
I.
Using Eq. (9) for a(2) and b(2), we find from Eq. (8) that
B col(t')j_ 1 = S(aXS(()P col(ie 1 )1_ 1 + PS(b) col(µ' -1 ),='µt)
— S(bXS(11)Pccol(µ'' 1)1.1 + PS(a) col(µ' - 1 )1=, µ').
Note that for any a(A) and b(2),
S(a)PS(b) = S(b)PS(a), S(â)PS(b) = S(b)PS(â). (12)
-Thus, using the first equality in (12), we obtain
B col(µ'' 1 ); , l = S(a)S(6)P col(ie 1)1. 1 — S(b)S(â)P col(14' - 1)i_ ' .
Since the last relation holds for any p e C, and the Vandermonde matrix
for distinct µ l , µ2 , ... , pt is nonsingular (see Exercise 2.3.6), the required
formula (7) is established. •
A dual expression for the Bezoutian B = Bez(a, b) is presented next.
Exercise S. Check that
—B = PS(â)S(b) — PS(b)S(a). (13)
Hint. Rewrite Eq. (9) in terms of rows. ❑
458 13 STABILITY PROBLEMS
It is easily seen that, if the polynomial a is fixed then the matrix Bez(a, b)
is a linear function of the polynomial b. Indeed, if b(2) and c(2) are poly-
nomials with degree not exceeding ! and /I, y e C, we see that in the definition
of Eq. (1),
a(l){Pb(µ) + Yc(µ)} — {$b( 2) + Yc( 2)}a(11)
= I3{a(2)b(µ) — b(2)a(µ)} + y{a(2)c(µ) — c(2)a(µ)}.
This immediately implies that
Bez(a, pb + yc) = $ Bez(a, b) + y Bez(a, c). (14)
In part icular, writing b(2) = ET .° b Jai we have
/11
Bez(a, b) = E bj Bez(a, kt
J =o
).
(15)
j =kt 1
as required. •
13.3 THE BEZOUTIAN AND T E RESULTANT 459
T C°
PT b(C^=P ,
RTI- t
PP °
and combining with the first equation of (17) we have
pr
PROOF. For definiteness, let 1= deg a(2) Z deg b(a). Write the corres-
ponding Bezout matrix B = Bez(a, b) in the first form of (17) and observe
that dot B # 0 if and only if b(C °) is nonsingular. Since the eigenvalues of the
latter matrix are of the form b(,1 1), where a 1, a2 , ... , at are the zeros of
a(a), the result follows. •
13 STABILITY PROBLEMS
Let a(a)= El" a,A',ai#0, and b(A)= Er. o b,..',b.# 0, where Izm.
As in the definition of the Bezout matrix, we may assume without loss of
generality that b(A) is written in the form ri. o bid,', where the coefficients
of the higher powers are, perhaps, zeros. With this assumption, we define
the resultant (or Sylvester) matrix R(a, b) associated with the polynomials
a(2) and b(A) as the 21 x 21 matrix
ao al ai- a,
l
ao a1 aI 1 ail
R(a, b) = -
b1 b^, 0 U
bo b 1 b,, 0 0
Partitioning R(a, b) into 1 x I matri ces and using the notation introduced
above we may also write
R(a, b) _[S(aP
) PS(a)
S(1)P PS(b) •
Using Eqs. (7) and (12) it follows that
r - 0 ^
R(a, b) = r
PS(d)P S(a)1
L S(b) S(a) LBez(a, b) 0 J
1 0 PS(â)P S(a)
(21)
0 Bez(a, oji. I 0
This equation demonstratés an intimate connection between the resultant
of a(A) an d b(A) and their Bezout matrix. In particular, it is clear from equa-
tion (21) that R(a, b) and Bez(a, b) have the same rank. Also, the resultant .
matrix (as well as the Bezoutian) can be used in solving the problem of
whether the polynomials a(1) and b0.) have zeros in common.
Theorem 2. The polynomials a(A) and b(t) defined above have no common
zeros if and only if the resultant matrix R(a, b) is nonsingular.
PROOF. Since S(a) is nonsingular, it follows immediately from Eq. (21) that
R(a, b) is nonsingular if an d only if Bez(a, b) is nonsingular. So the result
follows from Theorem 1. •
There is an interesting and more general result that we will not discuss':
Namely, that the degree of the greatest common divisor of a(d) and b(k
is just 1 rank R(a, b).
—
Note that the use of the resultant matrix may be more convenient than the
use of the Bezoutian in the search for common divisors since, despite its
double size, the elements of the resultant are just the coefficients of the given
polynomials (or zero).
13.4 TIM REMOTE AND THE RouTx—HuRwrr2 TE EoxeMs 461
Exercise 6. Verify that the Bezout matrix associated with the polynomials
a(A) = Fi=0 aj A' and a(A) = a,At is skew-Hermitian. Combine this
with Exercise 1 to show that the elements of Bez(a, a) are all pure imaginary.
Exercise 7. Check that the Bezout matrix B associated with the poly-
nomials a(2) _ E+=o a ; Ai, a, # 0, and b(.) = Em o b,2', I >_ m, satisfies the
equation
C.B = BC., (22)
where C a and Ca denote the first and second companion matrices associated
with a(A), respectively. (The second companion matrix is defined by Ca =
CI, and Ca is defined in Exercise 3.)
Hint. Apply Eq. (17).
Exercise 8. Let Bez(a, b) denote the Bezout matrix associated with the
polynomials a(A) and b(A) of degrees l and m (1 > m), respectively. Show that
Bez(6, 6) = —P Bez(a, b)P,
where P is defined in Eq. (4).
Exercise 9. Let a(1) and b(A) be two polynomials of degrees 1 and in .
(1 > m), respectively. Denote as = aa(2) = a(A + a), b a = ba(A) = b(A + a),
and check that
Bez(a., ba) = VQ' 1 Bez(a, b)( 1/10)T,
r -
i)
where 11,1) = ai- F] anditsumeh = 0 for j < i.
(
\,1 / F. Jm O 1 .
Exercise 10. Prove that, for any polynomials a(A) and b(A) of degree not
exceeding I,
0 ]R(a, B = Bez(a, b).
b) — B 0] ^
❑
RT(— b, a)[ [
with complex coefficients. The inertia In(a) of the polynomial a(A) is defined
as the triple of nonnegative integers (n(a), v(a), b(a)), where n(a) (respectively,
v(a) and b(a)) denotes the number of zeros of a(A), counting multiplicities,
with positive (respectively, negative and zero) real pa rt s. Like matrices, the
inertia of a polynomial a(A) is also called the inertia of a(A) with respect to the
imaginary axis. In particular, a polynomial a(A) with n(a) = b(a) = 0 is said
to be a stable polynomial (with respect to the imaginary axis).
In contrast, there is the triple In'(a) = {n'(a), v'(a), (5'(a)}, where n'(a),
v'(a), b'(a) denote the number of zeros of a(A) lying in the open upper half-
plane, the open lower half-plane, and on the real axis, respectively. This t riple
is referred to as the inertia of a(A) with respect to the real axis. Clearly,
b'(â)} = {n(a), v(a), b(a)),
where d = â(A) = a(-11). If a()) is a polynomial with real coefficients, then its
complex zeros occur in conjugate pairs and hence n'(a) = v'(a).
The main result of this section is due to C. Hermitet and is concerned
with the dist ribution of zeros of a complex polynomial with respect to the real
axis. The formulation in terms of the Bezoutian is due to M. Fujiwara: in
1926. Indeed, it was Fujiwara who first used Bezout mat rices to develop a
systematic approach to stability criteria, including those of Sections 13.4 and
13.5.
Consider a polynomial a(A) = Ei s o a J .ij (a, # 0) and let â(2) = D.. ât Ai.
If B = Bez(a, â) denotes the Bezout matrix associated with a(A) and n(A), then
it is skew-Hermitian (see Exercise 13.3.6) and it is also nonsingular if and only
if a(2) and â(2) have no zeros in common (Theorem 13.3.1). Thus, det B # 0 if
and only if a(A) has no real zeros and no zeros occuring in conjugate pairs.
t C. R. Acad. Sci. Paris 35 (1852), 52-54; 36 (1853), 294-297. See also J. Jar die Reine u.
Angew. Math. 52 (1856), 39 51.
-
= Ca — [0 0 V],
where y = Lao air' — iio i9r ', ..., at- r al ' — al-
Thus
COB — BC„ = [0 0 v]B = W, (1)
and we are now to show that W > O. Using the first factorization of (13.3.17),
—
(—iCa)*1 B +
I
(i B)(—iCa) = W Z 0, (2)
1
In( — iCa ) = In(-7B),
Corollary 1. All the zeros of the polynomial a(A) lie in the open upper half-
plane if and only if (1/i)B > O, where B = Bez(a, a).
464 13 STABILITY PROBLEMS
note that the matrix B = BD is nonsingular if and only if a(A) has no pure
imaginary zeros or zeros symmetric relative to the origin.
Theorem 2. With the notation of the previous paragraph, the matrix B = BD
is Hermitian. 1f B is also nonsingular, then
n(a) = v(B),
v(a) = RA,
ô(a) = b(B) = O.
PROOF. The assertion of the theorem follows from Theorem 1 and the
easily verified relation
F( 13)F* = -B,
where F = diag[i' -1, i' -2, ..., 1, 1 ] and B is the Bezoutian associated with
a1(A) = a(—iA) and ai(A) = n(i)). •
Corollary 1. A polynomial a(A) is stable if and only if the matrix B is positive
definite.
Example 2. Consider the polynomials a(A) = A 2 — 2A + 5 and a( — A) _
A2 + 2A + 5. The corresponding Bezoutian is
L 1 0Lo 5, —
] Li 0 0 5] —
]{ L 0 4] '
and hence the matrix
20 0
B
[ 0 —4
is negative definite. Thus both zeros, A i and A2 , of the polynomial lie in the
open right half-plane, as required, since A l = 1 + 2i and 22 = 1 — 2i. ❑
If a(A) in Theorem 2 is a real polynomial, then those i, jth elements of the
matrix B with i + j odd are zeros. This is illustrated in the following exercise.
Exercise 3. Check that the matrix B defined in Theorem 2 and associated
with the real polynomial a(A) = ao + ad . + a2 A2 + a3 A3 + a4 A4 is
aval 0 ao a3 0
=2 0 ala2 -apa3 0 ala4
aa a3 0 a2 a3 — al a4 0 1 .
0 ala4 0 a3a4
466 13 STABILITY Piwsums ;1
The existen ce of symmetrical patterns of zeros in the matrix allows us to
compute the inertia of the matrix
Bl 0
QTBQ = 0 [ B2
instead of In A Here
Q = CQi Q2] (3)
is a permutation matrix defined by
Q = [e1 e 3 er7, Q2 = Cet e4 e!],
=
where el, e2 , • , , , e, are the unit vectors in C°, k = l - 1 and j = l if l is even,
k land)=l - 1iflis odd.
a l a2 - ao a3 ala4
B1 = , B2 = a3a4S ❑
ao a3 a2 a3aoa3
Laoal - ala4 J [
a 14 a
_
and define the inertia of a(A) with respect to the unit circle to be the triple of
numbers a(a) {a+(a), a_ (a), ao(a)}, where a +(a) (respectively, a_(a) and
ao(a)) denotes the number of zeros of a(A), counted with their multiplicities,
lying inside the open unit circle (respectively, outside and on the unit cir-
cumference).
Theorem 1. Let a(A) _ ^i=v a AJ (a, # 0) and let B stand for the Bezoutian
,
associated with it(A) and â(A) °- »40. -1 ). If P is a rotation matrix then the
13.5 Tue ScHua-Coate THEOREM 467
Corollary 1. A polynomial a(A) is stable with respect to the unit circle if and
only if the matrix , is positive definite.
1 0 0 bTC, 1
where bT = [bo b1 k_ 1]
and A is the 1 x 1 matrix with all elements
equal to 1. Furthermore, — iBFD = (iDF*B)* = DAD and therefore Eq. (5)
implies
(iCa) *B + B(iCa) = W, (6)
where W = 2DAD Z 0.
Since b'(a) = 0, then b(iC a) = 0 and, by Theorem 13.1.3, In(iCa) = In B.
When B is nonsingular, Proposition 13.1.2 can be applied to Eq. (3) to
yield the inequalities in (2).
The additional condition •'(a) = b(iCa) = 0 together with det B # 0
yields (in view of Theorem 13.1.3 and Eq. (3)) equality throughout (2).
This completes the proof. •
Corollary 1. If, in the previous notation, B < 0 or B > 0, then the zeros of the
polynomial a(A) lie in the open upper half-plane or open lower half-plane,
respectively.
Example 1. The polynomial a(A) = A2 — 1 has two real zeros. The perturba-
tion ib(A) = 2i2 satisfies the hypothesis of Theorem 1 and
B = I?
U
]
I Z =21> 0.
^]
For brevity, we refer to h(2) and g(2), as well as to the obvious representa-
tion
a(A) = h(2 2) + 28(22), (1)
as the L-C splitting of a(2).
Theorem 1. Let Eq. (1) be the L-C splitting of the real polynomial a(.1).
Denote
—Az )
if I is even,
al(A) = h(-.1g( AZ if lis odd;
{Ag( A2) if lis even,
a2(2) =
—h(—A2) if lis odd.
If the Bezout matrix B associated with a 1(A) and a 2(A) is nonsingular, then a(1)
has no pure imaginary zeros and
h(
n(a) = n(B), v(a) = v(B).
PROOF. Let I be even. Define the polynomial 141) by 5(1) = a(iA)
22 ) + a9(— 22) = a 1(2) + 1a2(2) and observe that 5(A) has no real
—
(2)
_
zeros. Indeed, if there were a real A o such that a 1(20) + ia2(Ao) = 0 then,
taking complex conjugates, we would obtain a1(20) — îa2(20) = O. Thus,
it would follow by comparison that a 2(%0) = 0 and hence a t(lo) = O. This
conflicts with the assumed relative primeness of a 1(2) and a2(2) (implied
by det B # 0).
Now applying Theorem 13.6.1, we obtain
v(B) = a'(11) = v(a),
n(B) = v'(â) = n(a),
5(B) = 5'(îï) = 5(a),
and the theorem is established for the case of an even I.
If l is odd, then write
-iâ(A) = 111(— A2 ) — ih(—A 2) = a1(2) + ia2(A)
and observe that the zeros of â(2) and — îü(2) coincide. Now apply Theorem
13.6.1 _again. •
Corollary 1. The polynomial a(A) is stable if and only if B = Bez(a 1i a2)
is negative definite.
PROOF. If a(2) is stable, then the polynomials a1(2) and a2(2) defined above
are relatively prime. Indeed, if a1(20) = a2(A0) = 0 then also a 1(.10) =
a2(20) = 0 and S(2) = a(A0) = O. Hence a(2) has complex zeros µ and
— k (p = ila) or a pure imaginary zero i2 0 (Ao = Ao). In both cases we
contradict the assumption that all zeros of a(2) lie in the open left half-plane.
Thus, B is nonsingular and, by the theorem, must be negative definite. •
Example 1. Consider the polynomial a(A) = (A — 1) 2(2 + 2) and the
representation
— id(A) = — ia(iA) = A( — 3 — 22 ) — 2i.
472 13 STABILITY PROBLEMS
Theorem 2. If a(1) is a real polynomial and h(A) and g(A) form the L-C
splitting of a(A), where the Bezoutians B 1 = Bez(h(A), g(A)) and B2 = Bez(h(A),
Ag(2)) are nonsingular, then a(A) has no pure imaginary zeros and
PROOF. We give the proof for the case of 1 even, say 1= 2k.
Let B denote the Bezout matrix associated with the polynomials a1(A)
and a2(A) defined in Theorem 1. Applying Proposition 13.3.2 and Eq.
(13.3.5), it is found that
k
—1)1`ai 0 0
13.7 THE LIÉNARD-CHIPART CRITERION 473
A^1aj-1
— —
0
a2 +
[ (_1a23 (-1)kal
A^y2^ =
s-1—
(-1)kai 0 0
Thus, by use of Eq. (5), it follows from Eq. (4) that
QTBQ = diag[B2, B1], (6)
where, appealing to Eq. (13.3.5) and writing C for the companion matrix of
8( — 4
k
Bi = E (- 1y-1a2l-1 diag nil_ 1 , 42)-
1=1
— a2 a4
= a4 0 + (-1)k-1a,-1Ck -1)
(all —a3 C+
(- 1)kai 0 0
where P = diag[1, —1, ... , (-1)k - 1 ) a Ck. Now Sylvester's law of inertia
applied to Eqs. (6) and (7) yields, in view of Theorem 1, the required result
provided 1 is even. The case of 1 odd is similarly established. •
Example 2. Let a(2) be the polynomial considered in Example 1. We have
h(d) = 2, g(2) _ — 3 + 2, and
6 1
_Bl —2, B2 =
[-2 ]•
Thus In B 1 = {0, 1, 0), In By = {1, 1, 0), and a(2) has two zeros in the open
right half-plane and one zero in the open left half-plane, as required.
Example3. Leta(.l) = (d — 1)3(2 + 2). Then h(2) = —2 — 32 + 22,9(2)
5 — 2, and
r--17 .
—1 ]' B2= [ 1 2 2]
Since In B1 = {1, 1, 0), In B2 = {2, 0, 0), then, by Eq. (3), In(a) = {3, 1, 0),
as required. 0
A simplified version of the criteria of Theorem 2 will be presented in
Section 13.10.
r(2) _ E
.1=0
and construct the corresponding (truncated) Hankel matrix H of the Markov
parameters h i : H = [hi+J-1] ,,.1. It turns out that the results obtained in
Sections 13.3 and 13.4 can be reformulated in terms of the inertia of a Hankel
matrix of Markov parameters. This is possible in view of the relationship
between Hankel and Bezout matrices that is established in the next
proposition.
I 3.8 THE Mnslcov CRITERION 475
bT C! = h; diag —
Cp C1
where hj+1 = Ch1+1 hp-2 hi+ ,] and j = 1, 2, ... , l — 1. Combining
this with Eq. (3) we obtain
= HS(c).
bTc -1 c }
h(A) Jgo
t Zap. Petersburg Akad. Nauk, 1894. See also "The Collected Works of A. A. Markov,"
Moscow, 1948, pp. 78-105. (In Russian.)
13.8 THE MARKOV CRITERION 477
for sufficiently large 121. Therefore by Eq. (1) the matrices By = B(h(2), Ag(2))
and H2 are congruent. This proves the theorem in the even case.
Now consider the case when 1 is odd. Here deg h(A) < deg g(2) and
h(2) _ j h(2) _ E h j
hj+ 1 A - ^
g( A) 1= o Ag(A) j =1 j2-
Thus H 1 (respectively, H2) is congruent to the Bezoutian — B 2 = B(Ag(A), h(A))
(respectively, —B 1 = B(g(2), h(2))), and Eqs. (8) follow from Theorem
13.7.2. •
Corollary 1. A real polynomial a(2) is stable if and only if H 1 > 0 and
H2 < 0, where H 1 and H2 are defined in Eqs. (6).
and the properties H 1 > 0, H2 < 0 yield the stability of the given polynomial.
D
478 13 STABILITY PROBLEMS
called the Hurwitz matrix associated with a(a.). We shall show that the
Hermitian matrix H from Section 13.8 and the non-Hermitian Hurwitz
matrix R have the same leading principal minors. We state a more general
result due to A. Hurwitz.t
Proposition 1. Let b(A) = ri=o b » M and c(.i.) = _o c,)) be two real
polynomials with c, # 0, and Iet h o , h 1 , h2 , ... denote the Markov parameters
of the rational function r(2) = b(A.)/c(A). Then if we define b i = c j = 0 for
j<0,
C, C, -1 C, -2 : +1
b, b,-1 b!- 2:+ 1
hl h2 h,
0 c, c,_ 1 c1 -2:+ 2
h2
^'det hs+ 1 = det 0 b, br-1 b1-2s+2 •
•
h: h:+1 h2:- 1
0 c, c,_a
A br b,-:
for s = 1, 2, ... , 1. (2)
Note that the matrices on the left and right of Eq. (2) have sizes s and 2s,
respectively.
t Math. Ann. 46 (1895), 273-284.
13.9 THE ROUTTH—HURWITZ THEOREM 479
Hence, denoting the unit vectors in Ces by u1 , u2 , ..., u2,, we easily check
that
Cis- 1
T
b2a -1
T
0 C 2s - 2
(4)
0 o C,
bs -
where
ck= [0 0 c, c,_ 1 c,_k], k = 0,1,..., 2s - 1,
and cj = b1 = 0 for j > 1. Observe that the Toeplitz matrix in Eq. (4) coin-
cides with the matrix in Eq. (3), and therefore their determinants are equal.
Computing the determinant on the left side in Eq. (4), we find that it is equal
to
MT
I
UT2
•
[ 0
( —1)sei' det =48 det ^ ,
Ts
h2s— t Vs Ws _l'
•
•
heT .1
480 13 STABILITY PROBLEMS
where
ho h1 h,71 h, ha+ 1 h2a - 1
0 ho h, h 26-2
1/a c Wa =
.h11'
0 0 ho h1 h: -1 h,
hs h2a-1
as required. ■
The famous Routh-Hurwitz stability result is presented in the following
theorem.
Theorem 1. If a(A) is a real polynomial and there are no zeros in the sequence
Al, A2, • • • , Al, (5)
of leading principal minors of the corresponding Hurwitz matrix, then a(a) has
no pure imaginary zeros and
A2 Al )
n(a) = V a1, A1, -A-3-o, ... , ,
a-1
A1 AI
v(a) = 1 —Val , A
1 , A2 , ... , P a1 , A1 , 2 , ..., , (6)
Al AI - 1/ Al A l - 1/
where V (respectively, P) denotes the number of alterations (respectively,
constancies) of sign in the appropriate sequence.
PROOF. Consider the case of I even, say I = 2k. For s = 1, 2, ... , k, denote
by I5" (respectively, on
the leading principal minor of order s of the matrix
H1 (respectively H2) defined in Eqs. (13.8.6). For s = 1,2 ... , 1, let A, stand
for the leading principal minor of order s of the Hurwitz matrix R. Applying
Eq. (2) for the polynomials bÇt) = .ig(A) c(.1) = h(.1), where h(A) and g(A)
,
Now apply Eq. (2) for b(A) = g(1), c(A) = h(.1) to obtain, for s = 1, 2, ... , k,
a, a, - z
0 a,_ 1 al - 3
at al - 2
a,2s8,(1) = det 0 • = alA2s- 1•
(8)
0 0 a_1 1
0 0 a,
Using Theorem 13.8.2 and the Jacobi rule, if A. # 0 for i = 1, 2, 1,. ,
we obtain
n(a) = V(1, o? , ... , 61t)) + P(1, (5(1 ), ... , 5 12)), (9)
where V (respectively, P) denotes the number of alterations (respectively,
constancies) of sign in the appropriate sequence. In view of Eqs. (7) and (8),
we find from Eq. (9) in the case of a positive a t that
n(a) = V(1, A1, A3, ..., A,_1) + P(1, - A2, A4, ..., ( - 1)kAl)
= V(1, A1, 1663, ..., A,-1) + V( 1, A2, A4, ..., Al), (10)
which is equivalent (in the case a, > 0) to the first equality in Eq. (6). The
second equality in Eq. (6), as well as the proof of both of them for al < 0,
is similar. It remains to mention that the case for 1 odd is established in a
similar way. •
Corollary .1. A real polynomial a(2) of degree 1 with a, > 0 is stable if and
only if
A,> 0,A2>0,...,A, >o. (11)
Example 1. If a(A) = (d - 1)3(A + 2) (see Example 13.7.3), then
- 1 5 0 0
- 1 -3 -2 0
R 0 -1 5 0'
0 1 -3 -2
with A l = -1, A2 = -2, A3 = -8, A4 = 16. Hence
x(a) = V(1, - 1, 2, 4, - 2) = 3, v(a) = 1,
as required. O
Note that the Gundelfinger-Frobenius method of Section 8.6 permits us
to generalize the Routh-Hurwitz theorem in the form (10) or in the form
n(a) = V(al, Al, A3, ... 9 A,) + V(1, A2, ... , A , - 1), (12)
482 13 STABILITY PROBLEMS
The notion of the Cauchy index for a rational function clarifies some
properties of the polynomials involved in the quotient and, as a result, plays
an important role in stability problems.
Consider a real rational function r(A) = b(A)/c(2), where b(A) and c(A)
are real polynomials. The difference between the number of points at which
r(A) suffers discontinuity from — co to + co as A increases from a to b
(a, b e 11, a < b) and the number of points at which ,Q) suffers discontinuity
from + co to — oo on the same interval is called the Cauchy index of r(A)
on (a, b) and is written 4b(2)/c(A). Note that a and b may be assigned
the values a = — oo and/or b = + co.
Exercise 1. Verify that if r(A) = pc/(A — 2e), where µo and Ac are real, then
1 if ito > 0,
I+^ =sgnµ c = —1 if po < 0,
0 if µo = 0.
13.10 THE CAUCHY INDEX AND ITS APPLICATIONS 483
PROOF. Let c(2) = c 0(2 — A l)"'(2 — A2)µ2 ... (A — A;J"k, where 2; # Al for
i # j, i, j = 1, 2, ... , k. Suppose that only the first p distinct zeros of c(2)
are real. If p = 0, then c'(2)/c(A) has no real poles and hence the Cauchy
index is zero. If 1 5 p < k, then
—
ci(2) = k p _ t +
' °
c(2) ;61 2-2; ; 2 — A;
where r l(A) has no real poles. It remains now to apply the result of Exercise 2.
•
Our first theorem links the Cauchy index of polynomials b(A) and c(A)
with the inertia properties of the corresponding Bezout matrix. First recall
the notion of the signature of a Hermitian matrix as introduced in Section
5.5.
Theorem 1. Let b(A) and c(2) be two real polynomials such that
I± — sig B. (1)
m c(A)
Note that, in view of Eq. (13.8.1), the Bezout matrix B in Eq. (1) can be replaced
by the Henkel matrix H of the Markov parameters of b(A)/c(A).
484 13 STABILITY PROBLEMS
PROOF. Let us establish Eq. (1) for the case when all the real zeros of the
polynomial c(2) are simple. Then we may write the rational function
r(2) = b(A)/c(2) in the form
r(2) _ E 2 -µ12i+ r 1 (.1),
i s1
(2)
where Al, 22 , ... , 2, are the distinct real zeros of the polynomial c(2) and
r, (A) is a real polynomial without real poles. Since
1
2 — 2i t 2k+ 1 (3)
for sufficiently large A, we may substitute from Eq. (3) into Eq. (2) and find
hro 1
-
Differentiating Eq. (3), the Markov parameters can be evaluated. The details
are left as an exercise. •
The result of Theorem 1 has several applications. We will start with some
simplifications of the Liénard-Chipart and Markov stability criteria, but
we first present an interesting preliminary result.
13.10 Tm CAucnv INDEX AND ris APPLICATIONS 485
where s + 2k = n and the numbers A i , pi , vj are all real. Obviously, the con-
dition that a(A) has only zeros with negative real parts implies that A t > 0,
i = 1, 2, ... , s, and also that the quadratic polynomials A2 + p A + vj ,
j = 1, 2, ... , k have zeros in the open left half-plane. The last requirement
yields pi > 0, vi > 0 for all j. Now it is obvious from Eq. (7) that since the
•
coefficients of a(A) are sums of products of positive numbers, they are
positive.
Note that the condition (6) is not sufficient for the stability of (0), as the
example of the polynomial A2 + 1 shows.
Thus, in the investigation of stability criteria we may first assume that the
coefficients of a(A) are positive. It turns out that in this case the determinantal
inequalities in (13.9.11) are not independent. Namely, the positivity of the
Hurwitz determinants Ai of even order implies that of the Hurwitz deter-
minants of odd order, and vice versa. This fact will follow from Theorem 4.
We first establish simplifications of the Liénard-Chipart and Markov
stability criteria. This is done with the help of the Cauchy index.
PROOF. The necessity of the condition follows easily from Theorem 13.7.2
and Proposition 2. The converse statemer . will be proved for the even case,
= 2k, while the verification of the odd case is left as an exercise.
Let B1 = B(h, g) > 0 and assume the coefficients a o, a2 , ... , at of h(A)
are positive. Then obviously h(A) > 0 for all positive A e Furthermore,
if h(A 1) = 0 for Al > 0, then g(A 1) # 0 since otherwise the Bezoutian B will
be singular (Theorem 13.3.1). Thus, although the fractions g(A)/h(A) and
Ag(A)/h(A) may suffer infinite discontinuity as A passes from 0 to + ao, there
486 13 STABILITY PROBLEMS
I° ^^)
h(A)
— I°- 29(2)
h(A)
—
and therefore
I+a' 8(2) — l o 9(2) + lo e(2) — lo 9^2) _ —10 29(2) (8 )
m
h(A) - b(A) h(A) h(2) h(A)
But the Bezoutian B 1 = Bez(h, g) is assumed to be positive definite, and
therefore, by Theorem 1,
I „ }=k .
Applying Theorem 1 again, we find from Eq. (8) that B2 = Bez(h, Ag) is
negative definite. Theorem 13.7.2 now provides the stability of the polynomial
a(A). •
Note that Theorem 2 can be reformulated in an obvious way in terms of
the Bezoutian B2 = Bez(h, g) and the coefficients of h(A) (or g(A)).
The congruence of the matri ces B1 and H 1 defined in Theorem 13.8.2
leads to the following result.
Theorem 3 (The Markov criterion). A real polynomial a(A) is stable if
and only if the coefficients of h(A) from the L-C splitting of a(A) have the same
sign as the leading coefficient of a(A) and also H 1 > O.
The remark made after Theorem 2 obviously applies to Theorem 3. This
fact, as well as the relations (13.9.7) and (13.9.8), permit us to obtain a
different form of the Liénard-Chipart criterion.
Theorem 4 (The Liénard-Chipart criterion). A real polynomial a(A) =
. 0 a,Ai, a, > 0, is stable if and only if one of the following four conditions
holds:
SOLUTION, To find the Newton sums of a(A), we may use formulas (9) and
(10) or the representation (11). We have s o = 4, s, = 2, si = 0, s3 = 2,
s4 = 4, s, = 2, s6 = 0, and hence the corresponding Hankel matrix H is
4 202
H
_ 2 024
0 2 4 2'
2 420
Since In H = {2,1,1), there is one pair of distinct complex conjugate
zeros (y = 1) and one real zero (n — y = 1). It is easily checked that, actually,
a(A) = (t — 1) 2(A2 ÷ 1), and the results are confirmed. ❑
CHAPTER 14
Matrix Polynomials
It was seen in Sections 4.13 and 9.10 that the solution of a system of
constant-coefficient (or time-inva riant) differential equations, A(t) — Ax(t)
= f(t), where A E C"" and x(0) is prescribed, is completely determined
by the properties of the matrix-valued function M — A. More specifically,
the solution is determined by the zeros of det(AI — A) and by the bases
for the subspaces Ker(AJ — A).1 of C", j = 0, 1, 2, ... , n. These spectral
properties are, in turn, summarized in the Jordan form for A or, as we shall
now say, in the Jordan form for ,i,I — A.
The analysis of this chapters will admit the desc ription of solutions for
initial-value problems for higher-order systems:
L,x(1)(t) xtt-ti(t) + ... + L i xt t)(t) + Lo
x(t) = f(t), ( 1)
+ L`- i
where Lo , L t, ..., L, a C""", det Lt # 0, and the indices denote derivatives
with respect to the independent variable t. The underlying algebraic problem
now concerns the associated matrix polynomial L(A) = Et _ o AiLj of degree
1, as introduced in Section 7.1. We shall confine our attention to the case in
which det L, # 0. Since the coefficient of A l" in the polynomial det L(A) is
just det L,, it follows that L(A) is regular or, in the terminology of Section
7.5, has rank n.
As for the first-order system of differential equations, the formulation
of general solutions for Eq. (1) (even when the coefficients are Hermitian)
requires the development of a "Jordan structure" for L(A), and that will be
done in this chapter. Important first steps in this direction have been taken
t The systematic spectral theory of matrix polynomials presented here is due to I. Gohberg,
P. Lancaster, and L. Rodman. See their 1982 monograph of Appendix 3.
489
490 14 MATBIx POLYNOMIALS
in Chapter 7; we recall particularly the Smith normal form and the elemen-
tary divisors, both of which are invariant under equivalence transformations
of L(A). In Chapter 7, these notions were used to develop the Jordan structure
of 21 — A (or of A). Now they will be used in the analysis of the spectral
properties of more general matrix polynomials.
_
C,,a
r
It has been seen on several occasions in earlier chapters that if a(A)
aril is a monic scalar polynomial then the related 1 x I companion
matrix,
0
0
—a0 —al
1
0
0
1
—a2
0
1
—a, - 1
(1)
is a useful idea. Recall the important part played in the formulation of the
first natural normal form of Chapter 7 and in the formulation of stability
criteria in Chapter 13 (see also Exercises 6.6.3 and 7.5.4). In pa rt icular, note
that a(A) = det(AI — Ce), so that the zeros of a(A) correspond to the eigen-
values of C,.
For a matrix polynomial of size n x n,
L(2) _ E
/= o
2'L! , det L, # 0, (2)
This means that the latent roots of L(A) (as defined in Section 7.7) coincide
with the eigenvalues of CL . Admitting an ambiguity that should cause no
difficulties, the set of latent roots of L(A) is called the spectrum of L(2) and
is written a(L). From the point of view of matrix polynomials, the relation
(4) says that L(A) and Al CL have the same invariant polynomial of highest
—
degree. However, the connection is deeper than this, as the first theorem
shows.
Theorem 1. The In x In matrix polynomials
[L(A) 01
and AI, — CL
0 Itt - 1*
are equivalent.
PROOF. First define In x In matrix polynomials E(A) and F(A) by
I 0 0
—AI I
F(A) = 0 —AI I ,
I 0
0 —AI I
BI- 1 (A) Bi- 2(2)Bo(A)
—1 0 0
E(A) =
0 —I
0 - I 0
where B0(A) = L,, Br+1(A) = AB,(A) + L I - r - 1 for r=0,1,...,1-2, and
I = I. Clearly
det E(A) ±det L,, det F(A) = 1. (5)
Hence F(A) -1 is also a matrix polynomial. It is easily verified that
E(A)(AI — C L)F(A) - 1
L(2) O
[
O iv- On] =
determines the equivalence stated in the theorem. ■
492 14 MATRIX POLYNOMIALS
Theorem 1 (together with Theorem 7.5.1 and its corollary) shows that all
of the nontrivial invariant polynomials (and hence all of the elementary
divisors) of L(A) and AI — C L coincide. Here, "nontrivial" means a poly-
nomial that is not identically equal to 1.
Exercise 1. Establish the relations (4) and (5).
Exercise 2. Show that if A is an eigenvalue of CL then the dimension of
Ker(AI — CL) cannot exceed n. (Note the special case n = 1.)
Exercise 3. Formulate the first companion matrix and find all elementary
divisors of (a) L(A) = All„ and (b) L(A) = All„ — I„. ❑
Observe now that there are other matrices A, as well as CL , for which
Af — A and diag[L(A), 10 _ 1 0 are equivalent. In particular, if A is a matrix
similar to C L , say A = TC L T - 1, then AI — A = T(21 — C L)T' 1 , so that
AI — A and AI — CL are equivalent and, by the transitivity of the equivalence
relation, AI — A and diag[L(A), Iii _ 10] are equivalent. It is useful to make
a formal definition: If L(A) is an n x n matrix polynomial of degree 1 with
nonsingular leading coefficient, then any In x In matrix A for which AI — A
and diag[L(A), It,._ 1) „] are equivalent is called a linearization of L(A).
The next theorem shows that the matrices A that are similar to CL are,
in fact, all of the linearizations of L(A).
Theorem 2. Any matrix A is a linearization of L(A) if and only if A is similar
to the companion matrix C L of L(A).
PROOF. It has been shown that if A is similar to CL then it is a linearization
of -L(A). Conversely, if A is a linearization of L(A) then AI — A — L(A) +-
1(1 _ 1N, (using as an abbreviation for "equivalent to"). But by Theorem
1, L(A) -I- 1 1 ,_ 1)„ . AI — C L so, using transitivity again, AI — A — AI — C L .
Then Theorem 7.6.1 shows that A and CL are similar. ■
This result, combined with Theorem 6.5.1 (or 7.8.2), shows that there is a
matrix in Jordan normal form that is a linearization for L(A) and, therefore,
it includes all the information concerning the latent roots of L(A) and their
algebraic multiplicities (i.e., the information contained in a Segre character-
istic, as described in Section 6.6).
Exercise 4. Use Exercise 6.6.3 to find a matrix in Jordan normal form that
is a linearization for a scalar polynomial (i.e., a 1 x 1 matrix polynomial
L(A)).
Exercise S. Find matrices in Jordan normal form that are linearizations
for the matrix polynomials of Exercise 3. Do the same for
[A2 —A
L(A)= 0 A2
14.2 STANDARD TRIPLES AND PAIRS 493
I
O. I
E(A)
0
L
and this means that the first n columns of E(A) -1 are simply equal to R 1.
Since the first n rows of F(A) are equal to P i, if we equate the leading n x n
submatrices on the left and right of Eq. (3), we obtain Eq. (1). •
Three matrices (U, T, V) are said to be admissible for the n x n matrix
polynomial L(A) of degree 1 (as defined in Eq. (14.1.2)) if they are of sizes
n x in, In x In, and In x n, respectively. Then two admissible triples for
L(A), say (U1 , 71, V1) and (U2, T2, V2), are said to be similar if there is a non-
singular matrix S such that
-1 T2S,
U1= U2S, T1=S V1 =S -1 Vi (4)
Exercise 1. Show that similarity of triples is an equivalence relation on
the set of all admissible triples for L(1). ❑
Now, the triple (P 1i CL , R 1) is admissible for L(A), and any triple similar
to (P 1r CL , R1) is said to be a standard triple for L(2). Of course, with a trans-
forming matrix S = I, this means that (P 1, CL, R1) is itself a standard triple.
Note that, by Theorem 14.1.2, the second member of a standard triple for
L(A) is always a linearization of L(A). Now Theorem 1 is easily generalized.
Theorem 2. If (U, T, V) is a standard triple for L(A) and A it a(L), then
L(2) - ' = U(AI — T)-1V.
14.2 STANDARD TRIPLES AND PAIRS 495
Observe that (Al — C L)" 1 = (S" 1(.Î.I — T)S) -1 = S -1(11 — T)" 1 S, and
substitute into Eq. (1) to obtain the result. ■
Thus we obtain a representation of the resolvent of L().) in terms of the
resolvent of any linearization for L(2). Our next objective is to characterize
standard triples for L(2) in a more constructive and direct way; it is not
generally reasonable to try to find the matrix S of the proof of Theorem 2
explicitly. First we will show how to "recover" the companion matrix CL
from any given standard triple.
Q
1UU T (5)
UT' -1
Then Q is nonsingular and CL = QTQ" 1.
P1
P1CL =— I IN (6)
C'-1
P1 L
"1. •
Substitute for P1 and CL from Eqs. (4) to obtain QS = I,„ . Obviously, Q
is nonsingular and S = V. In particular, CL = QTQ
In fact, this lemma shows that for any standard triple (U, T, V), the primitive
triple (P 1 , CL , R 1) can be recovered from Eqs. (4) on choosing S =
where Q is given by Eq. (5).
(c)
"1
(a) The matrix Q of Eq. (5) is nonsingular;
(b) L,UT'+L,_ 1 UT' +•••+L 1 UT+L o U=O.
V = Q - 1 R 1 , where Q and R 1 are given by Eqs. (5) and (2), respectively.
496 14 MATRIX POLYNOMIALS
PROOF. We first check that conditions (a), (b), and (c) are satisfied by the
standard triple (P1, CL , R1). Condition (a) is proved in the proof of Lemma
1. The relation (6) also gives
= (PICi 1)CL = [0
PICi O I]CL = [— Lo —L,_1].
Consequently,
I.rPICL = [ Lo —L1
— Ll._ 1]. -
(7)
But also,
Pi
f-1 P1CL
E LLPICt _ [Lo L1 1]
4_—1] = [ Lo L1 Lr- I].
=o
•
P 1 C ► -1
L
So, adding the last two relations, we get Ei,.0 UP 1 q = 0, which is just the
relation (b) with (U, T, V) replaced by (P 1, CL , R1). In this case Q = I,
so condition (c) is trivially satisfied.
Now let (U, T, V) be any standard triple and let
U = PI S, T = S - 'CL S, V = S -1R 1 .
Then
U Pi
Q UT
=
UT'-' PICi 1
using (6) so that (a) is satisfied. Then
1
E L, UTT
J=0
= E L,P1C{) S =
i- o
[0 0 L^ C2,
interior. Then integrating Eq. (9) term by term and using the theorem of
residues,
1 f ,11L(4_ i d2—f0LT, if j = 0,1,..., l-2;
2rci L if j = / — 1.
But also (see Theorem 9.9.2),
UT1—'V UT2'' 2 V
U
UT
_ [V TV T' - 'V]. (11)
F UT1: — '
Both matrices in the last factorization are in x in and must be nonsingular,
since the matrix above of triangular form is clearly nonsingular.
Using (10) o more we also have, for j = 0, 1, ..., l — 1,
It has now been shown that U and T satisfy conditions (a) and (b) of
Theorem 3. But condition (c) is also contained in Eqs. (11). Hence (U, T, V)
is a standard triple. •
Corollary 1. If (U, T, V) is a standard triple for L(.1), then (VT, TT, U1) is a
standard triple for L T()) .l'Lj, and (V*, T*, U*) is a standard triple
for L*(2)
PROOF. Apply Theorem 2 to obtain Eq. (8). Then take transposes and apply
Theorem 4 to get the first statement. Take conjugate transposes to get the
second. •
Exercise 6. Let (U, T, V) be a standard triple for L(1.) and show that
U
UT
Q= , R = [V TV T' - 'V], (13)
UT' - '
then
RSL Q = I. (14)
PI L
and, using Eq. (15),
R = S - '[R, C LR, Ci 1 R1] =
Thus, RSLQ = (S -1Si 1)SLS = I, as required. •
By definition, all standard triples for a matrix polynomial L(A) with non-
singular leading coefficient are similar to one another in the sense of Eqs.
(14.2.4). Thus, there are always standard triples for which the second term,
the linearization of L(A), is in Jordan normal form, J. In this case, a standard
triple (X, J, Y) is called a Jordan triple for L(l.). Similarly, the ma tr ices
(X, J) from a Jordan triple are called a Jordan pair for L(A). We will now
show that complete spectral information is explicitly given by a Jordan
triple.
Let us begin by making a simplifying assumption that will subsequently
be dropped. Suppose that L(A) is an n x n matrix polynomial of degree 1
with det L, # 0 and with all linear elementary divisors. This means that any
Jordan matrix J that is a linearization for L(A) is diagonal. This is certainly
the case if all latent roots of L(A) a re distinct. Write J = diag[2„ 22 i ... , A,"].
Let (X, J) be a Jordan pair for L(A) and let xi e C" be the jth column of
X, for j = 1, 2, ... , In. Since a Jordan pair is also a standard pair, it follows
from Theorem 14.2.3 that the In x ln matrix
X
XJ
(1)
XJ` - '
is nonsingular. Also,
L,XJ'+•••+L,XJ+L a X =0. (2)
14.3 THE STRUCTURE OF JORDAN TRIPLES 501
First observe that for each j, the vector x; # 0, otherwise Q has a complete
column of zeros, contradicting its nonsingularity. Then note that each term
in Eq. (2) is an n x In matrix; pick out the jth column of each. Since J is
assumed to be diagonal, it is found that for j = 1,2 ... , In,
L1 21.1 x; +•••+L 12;x; + L o x; =O.
In other words L(2i)x; = 0 and x; # O; or x; E Ker L(2;).
Now, the nonzero vectors in Ker L(2;) are defined to be the (right) latent
vectors of L(A) corresponding to the latent root 2,. Thus, when (X, J) is a
Jordan pair and J is diagonal, every column of X is a latent vector of L(2).
The general situation, in which J may have blocks of any size between 1
and In, is more complicated, but it can be described using the same kind of
procedure. First it is necessary to generalize the idea of a "latent vector" of
a matrix polynomial in such a way that it will include the notion of Jordan
chains introduced in Section 6.3. Let L (4(2) be the matrix polynomial ob-
tained by differentiating L(..) r times with respect to 2. Thus, when L(A) has
degree 1, L(')(2) = O for r > 1. The set of vectors x o , x1 , ... , xk , with x0 # 0,
is a Jordan chain of length k + 1 for L(2) corresponding to the latent root
20 if the following k + 1 relations hold:
L(2 0 )xo = 0;
L(2 o)x i + 0) 0;
—11 L (2o)xo =
(3)
Now we will show that if (X, J) is a Jordan pair of the n x n matrix poly-
nomial 1(2), then the columns of X are made up of Jordan chains for L(2).
Let J = diag[J 1 , ... , J,], where J; is a Jordan block of size n; , j = 1, 2, ... , s.
Form the partition X = [X1 X2 X,], where X; is n x n; for
j = 1, 2, ... , s, and observe that for r = 0, 1, 2, ...
XJ' _ [X 1.1i XzJi ... X,J;].
502 14 MATRIX POLYNOMIALS
Jr.) =
Using the expressions in (4) and examining the columns in the order
1, 2, ... , nj, a chain of the form (3) is obtained with Ao replaced by Ai, and
so on. Thus, for each j, the columns of X j form a Jordan chain of length nj
corresponding to the latent root A i .
The complete set of Jordan chains generated by X 1, X2 , ..., X, is said
to be a canonical set of Jordan chains for L(2). The construction of a canonical
set from a given polynomial L(A) is easy when J is a diagonal matrix: simply
choose any basis of latent vectors in Ker L(2 i) for each distinct Ai. The con-
struction when J has blocks of general size is more delicate and is left for
more intensive studies of the theory of matrix polynomials. However, an
example is analyzed in the first exercise below, which also serves to illustrate
the linear dependencies that can occur among the latent vectors and Jordan
chains.
Exercise 2. Consider the 2 x 2 monic matrix polynomial of degree 2,
L(2)
240
1] 2 -1] + [1 0] _ [A + 1
+ [1 A(A O 1)] ,
and note that det L(A) = A3(A — 1). Thus, L(A) has two distinct latent roots:
Al =O and A2 =1.Since
Now observe that the Jordan chain for L(2) corresponding to the latent
root 22 = 1 has only one member, that is, the chain has length 1. Indeed,
if xo = [0 air and a # 0, the equation for x 1 is
[21 01x1 + [21
L(1)x1 + LIU(1)xo =
O] [a] = [0]
which has no solution.
The corresponding equation at the latent root d, = 0 is
L(0)xl + L" ► 0)xo
(
[O 0]xI
+ [1 — O aJ — [0 ],
] [
which has the general solution x, = [a 131T, where fi Is an arb trary complex
number.
A third member of the chain, x2 , is obtained from the equation
1 _Oil [fil + [01 0e01
L(0)x2 + Ltl)(0)xI + L(2)( 0)xo = [01 0]x2 + [0
1][
U
= [1 0]x2+[2a /1] — [0]'
which gives x2 = [-2a + fi AT for arbitrary y.
For a fourth member of the chain, we obtain
1 OJ [^]
[0
[1 o]x3+ [
2a + M y] [0]
- — - '
02 22
—L(A)—[A •
Verify that
0 0 0 0
_ [0 1 0 01 0 0 1 0
X
— [1 0 1 0j '
J_ 0 0 0 1
0 0 0 0
is a Jordan pair for L(2).
Exercise 5. Find a Jordan pair for each of the scalar polynomials L 1(2) =
(2 — 2e)' and L2(A) = (2 — AIM — 2 2)9, where Al # A2 and p, q are posi-
tive integers. 0
Now suppose that (X, J, Y) is a Jordan triple for L(d). The roles of X and
J are understood, but what about the matrix Y? Its interpretation in terms
of spectral data can be obtained from Corollary 14.2.1, which says that
(YT, JT, XT) is a standard triple for LT(2). Note that this is not a Jordan
triple, because the linearization JT is the transpose of a Jordan form and so
is not, in general, in Jordan form itself. To transform it to Jord an form we
introduce the matrix
P = diag[P1, P 2 , ... , Pa],
where Pi is the ni x ni rotation matrix:
0 0 1
.1 0
P' _ 0 l ' •
10 0J
(Recall that J = diag[J 1, ... , J,], where Ji has size ni .) Note that P 2 = I
(so that 13- ' = P) and P* = P. It is easily verified that for each j,P^ J^ P1 = J^
and hence that PJTP = J. Now the similarity transformation
YTP, J = PJTP, PX T
of (YT, JT, XT) yields a standard triple for LT(2) in which the middle term is
in Jordan form. Thus, ( YTP, J, PXT) is a Jordan triple for LT(2), and our
14.3 THE STRUCTURE OF JORDAN TRIPLES 505
discussion of Jordan pairs can be used to describe Y. Thus, let Ybe partitioned
in the form
ri
Y2
Y= ,
•
Y3
where Y is n; x n forj = 1, 2, ... , s. Then YTP = [YIP; YS P,], and
it follows that, for each j, the transposed rows of Y taken in reverse order
form a Jordan chain of L T(A) of length n ; corresponding to A; .
These Jordan chains are called left Jordan chains of L(a.), and left latent
vectors of L(A) are the nonzero vectors in Ker L T(A;), j = 1, 2, ... , s. Thus,
the leading members of left Jordan chains are left latent vectors.
Exercise 6. (a) Let A e C"'" and assume A has only linear elementary
divisors. Use Exercises 4.14.3 and 4.14.4 to show that right and left eigen-
vectors of A can be defined in such a way that the resolvent of A has the
spectral representation
(AI — A)-1 =
j=1
E AX—iA; .
(b) Let L(A) be an n x n matrix polynomial of degree 1 with det L1 # 0
and only linear elementary divisors. Show that right and left latent vectors
for L(A) can be defined in such a way that
L(A) -1 =E =
xjyT
— .I
1 AA ; •
j
Exercise 7. Suppose that A; e a(L) has just one associated Jordan block
J;, and the size of J; is v. Show that the singular part of the Laurent expansion
for L -1 (A) about A; can be written in the form
v v-k T
r xv-k-Kg
k=1 r : 0a-";)k
for some right and left Jordan chains {x 1, ... , x„} and (y 1 , ... , yy), re-
spectively (see Exercise 9.5.2).
Hint. First observe that
(A — A;) -1 (A — A ;) -2 (A —.l;)` r
0 (A—A;)-1
(AI — J1 ) -1 =
(A —Ay) -1 (A—.l;) -2
0 0 (A — A;)- 1
506 14 MATRIX POLYNOMIALS
j
dt (xo ems`) = 4 xo eA0t,
( '
so that
` d i
4-e-ii
d)(x0e4) =
f.o L j( dt) (xoez0`) = L(0)x0 = 0.
Now suppose that xo , x1 are two leading members of a Jordan chain
of L(A) associated with the latent root 61.0 . We claim that
uo(t) = xo ea0` and u 1(t) = (tx o + x1)e101
are linearly independent solutions of Eq. (2). We have already seen that the
first function is a solution. Also, the first function obviously satisfies
d 1
,lo1 u0 (t)=0. (3)
dt 1
Now observe that, for the second function,
d .Z 10 1tx
( +
0 x )e'°`
1 = x 0 e4`.
dt—
14.4 APPLICATIONS TO DIFFERENTIAL EQUATIONS 507
In other words,
d
201 u l (t) = uo(t),
(dt
so that, using Eq. (3),
2
d (4)
— 201 u1(t) = 0.
\dt
L( dt ) — L(2 o) + 1I Lt 11 (2o)
(5)
Now Eq. (4) implies that
d 1^
/ u1(t) = 0, j = 2,3,....
^ t —2o
Thus
_ (L(A0)xo)te' 0` + (L(Ao)x1 +
1 ! Lc11(2o)xo)e
0,
using the first two relations of Eqs. (14.3.3), which define a Jordan chain.
Thus u 1(t) is also a solution of Eq. (2).
To see that uo(t) and u1 (t) are linearly independent functions of t, observe
the fact that auo(t) + & 1 (t) =0 for all t implies
t(ax0) + (ax 1 + fixo) = 0.
Since this is true for all t we must have a = 0, and the equation reduces to ,
/k I tI
uk-1(t) = 1
j
E ?^ xk-1 -J) I (6)
PROOF. Let functions uo(t), ... , uk_ 2(t) be defined as in Eqs. (6) and, for
convenience, let u1(t) be identically zero if i = —1, —2 .... It is found (see
,
° = L(ât)ak-1(t)
Now the function e'°` can be cancelled from this equation. Equating the
vector coefficients of powers of t to zero in the remaining equation, the
relations (14.3.3) are obtained and since we assume x o # 0 we see that A. is
a latent root of L(%) and xo , x„ ... , x„_ , is a corresponding Jordan chain. e
Exercise 1. Complete the proof of Theorem 1.
Exercise 2. Use Theorem 1 and the results of Exercise 14.3.2 to find four
linearly independent solutions of the system of differential equations
d2x
_0,
de
d2 x2
dtl — d + x,= 0.
SOLUTION Written in the form of Eq. (2), where x = [x, x23T, we have
[O 1x(2)(1) + [0 _O lx(1)(t) + [0 U=
1x(t)
0. (7)
vo(t) _ [01e`.
1
Then there is the latent root A2 = 0 with Jordan chain [0 13T, [1 03T,
[ — 2 01T, and there are three corresponding linearly independent solutions:
roots and a canonical set of Jordan chains. The completion of this argument
is indicated in Exercise 2 of this section. But first we turn our attention to the
question of general solutions of initial-value problems in the spirit of Theo-
rems 9.10.4 and 9.10.5. Our main objective is to show how a general solution
of Eq. (14.4.1) can be expressed in terms of any standard triple for L(A).
Consider first the idea of linearization of Eq. (14.4.1), or of Eq. (14.4.2),
as used in the theory of differential equations. Functions xo(t), ... , x,_ 1 (t)
are defined in terms of a solution function x(t) and its derivatives by
xo(t) = x(t),
lX-1)
xi(t) = x(1)(t), .. ., x,-1(t) = x (t)• ( 1)
Then Eq. (14.4.1) takes the form
In fact, it is easily verified that Eqs. (1), (2), and (3) determine a one-to-one
correspondence between the solutions x(t) of Eq. (14.4.1) and x"(t) of Eq. (4).
Thus, the problem is reduced from order 1 to order 1 in the derivatives, but
at the expense of increasing the size of the vector variable from 1 to In and,
of course, at the expense of a lot of redundancies in the matrix CL .
However, we take advantage of Theorem 9.10.5 and observe that every
solution of Eq. (4) has the form
for some z e C`". Now recall the definitions (14.2.2) of mat rices P1 and R1
1 , CL, R 1 ) for L(2). Noting that P 1x(t) = x(t), it isinthesadrpl(P
found that, multiplying the last equation on the left by P 1 ,
The step fitom the lemma to our main result is a small one.
Theorem 1. Let (U, T, V) be a standard triple for L(A). Then every solution
of Eq. (14.4.1) has the form
for some z e
PROOF. Recall that, from the definition of a standard triple (see the proof of
Theorem 14.2.2), there is a nonsingular S e Cb"'" such that
PI = US, C, = S- 'TS, R, = S - 'V
Also, using Theorem 9.4.2, ecL" = S- l eT"S for any a E R. Consequently,
1 ecL° = Ue T "S, Pi eciAR, = Ue T°i!
Substitution of these expressions into Eq. (5) with a = t concludes the
proof. •
In practice there a re two important candidates for the triple (U, T, V) of
the theorem. The first is the triple (P1, CL, R 1), as used in the lemma that
expresses the general solution in terms of the coefficients of the differential
equation. The second is a Jordan triple (X, J, Y), in which case the solution
is expressed in terms of the spec tral properties of L(J.). The latter choice of
triple is, of course, intimately connected to the ideas of the preceding section.
Exercise 1. Suppose that, in Eq. (14.4.1),
_ fo for t Z 0,
f(t) 0 for t < O.
Let Lo be nonsingular and N4 J, < 0 for every 2 e a(L). Show that the solu-
tion x(t) satisfies
lim x(t) = La ' fo .
t-.m
Exercise 2. Let (X, J) be a Jordan pair for the matrix polynomial L(2)
associated with Eq. (14.1.2). Show that every solution of that equation has
512 14 Merlux POLYNOMIALS
the form x(t) = Xe"z for some z e C'". Hence show that the solution space
has a basis of In functions of the form described in Theorem 14.4.1. ❑
Theorem 1 can be used to give an explicit representation for the solution
of a classical initial-value problem.
Theorem 2. Let (U, T, V) be a standard triple for L(2). Then there is a unique
solution of the differential equation (14.4.1) satisfying initial conditions
xt(0) = xj , j = 0, 1, ... , 1 — 1, (7)
for any given vectors xo, x1 , ... , x,_ 1 e C". This solution is given by Eq. (6)
with
xo
z = [V TV T' -1 V]SL , (8)
xr- 1
where SL is the symmetrizer for L(2).
PROOF. We first verify that the function x(t) given by Eqs. (6) and (8) does,
indeed, satisfy the initial conditions (7). Differentiating Eq. (6), it is found that
x(0) U
xc1 ►(0)
= UT z = Qz, (9)
1. xu-11(0) UT:1-1
in the notation of Eq. (14.2.13). But the result of Theorem 14.2.5 implies that
QRS L = I, and so substitution of z from Eq. (8) into Eq. (9) shows that
x(t) defined in this way satisfies the initial conditions.
Finally, the uniqueness of z follows from Eq. (9) in view of the invertibility
of Q. •
ar) = AL xo
10 1) = `1/ AL 1 x + At x1,
The verification of the linear independence of the vectors a 10), 00), v") , v(2) in
.9'(C2) is left as an exercise.
Since the solution space has dimension 4, every solution has the form
a1 n10) + 012 00) + Œ3v1» + 0,4 v(2)
for some 01 1, 012 , a,, 014 a C. ❑
Consider now the formulation of a general solu tion of the inhomogen-
eous difference equation (1). Observe first that the general solution x e .9'(C")
of (1) can be represented in the following way:
x = x 1°) + xu),
(4)
where x10) is a general solution of the homogeneous equation (2) and x 11)
is a fixed particular solution of Eq. (1). Indeed, it is easily checked that the
sequence in Eq. (4) satisfies Eq. (1) for every solution x 1o) of Eq. (2) and,
conversely, the difference x — x11) is a solution of Eq. (2).
Now we take advantage of the idea of linearization to show how a general
solution of Eq. (1) can be expressed in terms of any standard triple of L(t).
where the matrices P 1 and R I are defined by Eq. (14.2.2) and z e C'" is arbitrary.
14.6 DIFFERENCE EQUATIONS 515
PROOF. First observe that the general solution x 01 = ( 4), 41, ...) of the
homogeneous equation (2) is given by
x11 =P1Ciz, j =0,1,2,... (6)
where z e C'" is arbitrary. This is easily verified by substitution and by using
Theorem 7.10.2 to prove that Eq. (6) gives all the solutions of Eq. (2).
Thus it remains to show that the sequence (0, x1», x(20, ...), with
!-1
x(i1) = P1 E CC k-1R1fk, j. 1, 2, ...,
k0
is a particular solution of Eq. (1). This is left as an exercise. We only note that
the proof relies, in particular, on the use of Theorem 14.2.3 and Exercise
14.2.6 applied to the companion triple (P 1, CL , R 1 ). •
Using the similarity of the standard triples of L(R) and the assertion of
Lemma 1, we easily obtain the following analogue of Theorem 14.5.1.
Theorem 3. If (U, T, V) is a standard triple for L(2), then every solution of
Eq. (2) has the form x o = Uz, and for j = 1, 2, ..
J-1
X = UTjz + U
!
E T1-k-1Vfk,
=0
where z e V".
We conclude this section by giving an explicit solution of the initial-value
problem for a system of difference equations (compare with Theorem 14.5.2).
Theorem 4. Let (U, T, V) be a standard triple for L(A). There is a unique
solution of Eq. (1) satisfying initial conditions
xj = ai, j = 0, 1, . . . , 1 — 1, (7)
for any given vectors ao , a t, ..., a,_ 1 e C". This solution is given explicitly
by the results of Theorem 3 together with the choice
Fao
Tr-111SL ai ,
z=[V TV
a,-1
where S i, is the symmetrizer for L(A).
The proof of this result is similar to that of Theorem 14.5.2.
Exercise 2. Complete the proofs of Theorems 1, 2, and 4.
516 14 MATRIX POLYNOMIALS
It has already been seen, in Theorem 14.2.2, that any standard t riple for a
matrix polynomial L(A) (with det Li # 0) can be used to give an explicit
representation of the (rational) resolvent function L(.1) - '. In this section
we look for a representation of L(A) itself, that is, of the coefficient matrices
Lo , L I , ... , L,, in terms of a standard triple. We will show that, in particular,
the representations given here provide a generalization of the reduction of a
square matrix to Jordan normal form by similarity.
Theorem 1. Let (U, T, Y) be a standard triple for a matrix polynomial
L(A) (as defined in Eq. (14.1.2)). Then the coefficients of L(A) admit the follow-
ing representations:
LI = ( UT' -, V) - h (1 )
U -'
UT (2)
[L0 Li Li- I] = —LIUT'
UTr -1
and
Lo
= —[V TV T' - 'V] - 'T'V L i . (3)
Li _ 1
PROOF. We first show that Eqs. (1) and (2) hold when (U, T, V) is the special
standard triple (P 1, CL , R1 ). First, Eq. (1) is obtained from Eq. (14.2.12).
Then Eqs. (14.2.6) and (14.2.7) give the special case of Eq. (2):
PI 1 -1
L,PI Ci
PC
i L = _[Lo L i ... Li-i]•
PICT-1
14.7 A REPRESENTATION THEOREM 517
PICT = UT 1 .
P,CL 1 UT I - 1
1
and Eqs. (1) and (2) are established.
To obtain Eq. (3), observe that Corollary 14.2.1 shows that (VT, TT, U1 )
is a standard triple for LT(A), so applying the already established Eq. (2),
VT -1
= LTVT(TT)e VT TT
-[Lô Li L4--1]
V T(r)I-1
_
The idea of division of one matrix polynomial by another was introduced
in Section 7.2. Recall that a matrix polynomial L 1(2) with invertible leading
coefficient is said to be a right divisor of a matrix polynomial L(2) if L(t)
L 2(2)L 1 (A) for some other matrix polynomial L 2(A). The reason for requiring
that L 1(A) have invertible leading coefficient is to avoid the multiplicities of
divisors that would arise otherwise. For example, a factorization like
[00
0, _ [b2' 0] [c20 dot"]'
where a, b, c, d e C and k, m, n are nonnegative integers, should be excluded,
not only because there are so many candidates, but mainly because such
factorizations are not very informative. Here we go a little' further and con-
fine our attention to divisors that are monic, where the coefficient of the
highest power of A is I. In fact, it will be convenient to consider only monic
divisors of monic polynomials.
It turns out that, from the point of view of spectral theory, multiples of
matrix polynomials are more easily analyzed than questions concerning
divisors. One reason for this concerns the question of existence of divisors.
Thus, it is not difficult to verify that the monic matrix polynomial
e
(1)
L 122 J0
has no monic divisor of degree 1 in 2. A thorough analysis of the existence
question is beyond the scope of this brief in tr oduction to questions concern-
ing spectral theory, multiples, and divisors.
Standard triples, and hence spectral properties (via Jordan triples), of
products of monic matrix polynomials are relatively easy to describe if
triples of the factors are known. Observe first that the following relation
between monic matrix polynomials, L(2) = L2(2)L 1(2), implies some
obvious statements:
(a) a(L 1) v a(L 2) = a(L);
(b) Latent vectors of L 1(2) and Lz(2) are latent vectors of L(2) and
LT(A), respectively.
Some of the comp li cations of this theory are associated with cases in which
a(L,) n a(L 2) # QJ in condition (a). In any case, it is possible to extend
statement (b) considerably. This will be done in the first corollary to the
following theorem.
t4.8 MULTIPLES AND DIVISORS 519
Theorem 1. Let L(A), L,(A), L 2(A) be monic matrix polynomials such that
L(A) = L 2(A)L l (A), and let (U 1, T1 , V1), (U2 , T2 , V2) be standard triples
for L,(A) and L 2 (A), respectively. Then the admissible triple
where Theorem 14.2.2 has been used. The result now follows from Theorem
14.2.4. •
To see the significance of this result for the spectral properties of L(A),
LA, and L 2(A), replace the standard triples for L I(A) and L2(A) by Jordan
triples (X1, J,, Y,) and (X2, J2, Y2), respectively. Then L(A) has a st andard
triple (not necessarily a Jordan triple)
[X1 0],
`J1
L0 J2 1J
Y2X '
i ^2 1. (3)
Although the second matrix is not in Jordan form, it does give impo rtan t
partial information about the spectrum of L(A) because of its triangular
form. Property (b) of Theorem 14.2.3 implies, for example, (see also Eq.
(14.3.4)) that ES- 0 LJX 1 J1 = 0, where L(A) = r;„ o AJLi. This implies the
following corollary.
Given the standard triple (3) for L(1), it is natural to ask when the two
matrices
J1 Y2 X 1 J1 0
, (4)
[ 0 J2 ] [
0 J2
are similar, in which case a Jordan form for L(1) is just the direct sum of
Jordan forms for the factors. An answer is provided immediately by Theorem
12.5.3, which states that the matrices of (4) are similar if and only if there
exists a solution Z (generally rectangular) for the equation
J1Z — ZJ2 = Y2 X1. (5)
This allows us to draw the following conclusions.
Corollary 2. Let (X 1 , J1 , Y1) and (X2, J2, 172) be Jordan triples for L 1(1)
and L 2(2), respectively. Then diag[J1, J2] is a linearization for L(2) _
L 2(1)L 1(1) if and only if there is a solution Z for Eq. (5); in this case
pi 1 ]. (6)
[0 J] - [0 I][0 Y J2 1][0 I
PROOF. Observe that
[0 I ] 10 1] _1
so that the relation (6) is, in fact, the explicit similarity between the ma tr ices
of (4). To verify Eq. (6), it is necessary only to multiply out the product on
the right and use Eq. (5). •
An important special case of Corollary 2 arises when r(L1) n Q(L2) = 0,
for then, by Theorem 12.3.2, Eq. (5) has a solution and it is unique. Thus,
Q(L 1) n a(L 2) = O is a sufficient condition for diag[J 1, J2] to be a lineariza-
tion of L2(1)L 1 (.i)•
R1 =
L; -
has rank n and its columns form Jordan chains for C L. It follows that Im R 1
L-invariant subspace of dimension n. Development of this line of argu- isaC
ment leads to the following criterion.
(2)
X — [I O], T = 0 [ T2 ,
T2
Defining an In x In matrix Q by
0
X I
S U2
XT *
Q = = S2
-1
XTr S r- 1
S' - 1
Y=
I-1
-É^ Ll S^
^=o
for some z e C". Therefore, comparing the first n coordinates of p and Q,a,
it is found that z = 0 and consequently q ► = O.
By using Theorem 1, we show in the next example that certain monic
matrix polynomials have no solvents. This generalizes the example in Eq.
(14.8.1).
I
XJ
Q •
,„
. =Im °.
In 0
and (X, J,„) is a Jordan pair for L(.1). If X, is the leading n x n block of
X, then
X1
Qom„=Im X1Jn
X i1 ; n
S, _ 1.r00 011 01 0
S2
Si 1 Si2 1 Si - 1
PROOF. The assertion of the theorem immediately follows from the easily
verified relation
CL = Vdiag[S1 , S 2 , ... , Si] , (4)
where V is given by Eq. (3) (see Exercise 7.11.14). •
The converse problem of finding a monic matrix polynomial with a given
set of solvents is readily solved; it forms the contents of the next exercise.
14.9 SOLVENTS OF MONK MATRIX POLYNOMIALS 525
x(t)
+ [ —3
-2 t) +
]x( [0 2Jx(t) = 0, (6)
X2
+ [ -2 —3 JX + [0 2J - O. (7)
Therefore
ess ` — r-101 [0 [
tee] - 2 0 1= [ —e ;tet e]
Thus, the general solution of Eq. (6) is
and hence
x,(t) =at +al t+a3 e`,
x2(t) = a2 — 2a3te' + a4 e`,
for arbitrary a l, a2 , 6C3, a4 E C. ❑
Nonnegative Matrices
527
528 15 NONNEGATIVE MATRICE
P3
Fig. 15.1 The directed graph of a matrix.
15.1 IRREDucielB Menucaa 529
2 0 0
3 1 0
1
Exercise 2. Verify that the directed graph associated with the matrix
0 4 —1
fails to be strongly connected.
Exercise 3. Show that if all the off-diagonal entries of a row A (or a
column of A) are zero, then the graph of A is not strongly connected.
SoLU'rioN. In the case of a row of A, say row i, there are no directed lines
P1P1 starting at node i. So there is no connection from node i to any other
node. So the graph is not strongly connected. ❑
The next two exercises form part of the proof of the next theorem.
Exercise 4. Let A e C""" have the partitioned form
[All A ui
A=
0 AY2 '
where A 11 is k x k and 1 5 k 5 n — 1 (so that A is trivially reducible).
Show that the directed graph of A is not strongly connected.
SOLUTION. Consider any directed path from a node i with i > k The first
segment of the path is determined by the presence of a nonzero element
au in the ith row of A. This row has zeros in the first k positions, so it is
possible to make a connection from node i to node j only if j > k Similarly,
the path can be extended from node j only to another node greater than k.
Continuing in this way, it is found that a directed path from node i with
i > k cannot be connected to a node less than k + 1. Hence the directed
graph is not strongly connected.
Exercise 5. Let A e C"'" and let P be an n x n permutation matrix. Show
that the directed graph of A is strongly connected if and only if the directed
graph of PAPT is strongly connected.
SOLUTION. Observe that the graph of PAPT is obtained from that of A
just by renumbering the nodes, and this operation does not affect the con-
nectedness of the graph. ❑
Theorem 1. A square matrix is irreducible if and only if its directed graph
is strongly connected.
PROOF. Exercise 4 shows that if A is reducible, then the graph of PAPT
is not strongly connected, for some permutation matrix P. Then Exercise 5
shows that this is equivalent to the statement that A itself has a directed
graph that is not strongly connected. Thus, a strongly connected directed
graph implies that the matrix is irreducible. Proof of the converse statement
is postponed to Exercise 15.3.3. IN
15 NONNEGATIVE MATRICES
Let A E C" „ ” with elements aik and let I Al denote the nonnegative matrix
in R"'" with elements la p, I. Clearly, one of the properties shared by A and
IA I is that of reducibility or irreducibility. In addition, many norms of A
and IA I take the same value (see Sections 10.3 and 10.4). However, the
relationship between the spectral radii of A and IA I is more delicate, and
it is frequently of interest. It is plausible that pA 5 11 141 , and we first show
that this is the case. We return in Section 15.4 to the important question of
conditions under which 114 <
Theorem 1. Let A E C" X" and B e R" 5 ". If I A 1 5 B, then pA 5 Pa.
PROOF. Let e bean arbitrary positive number and define the scalar multiples
A,= (p H + e) -t A, B,=(ue+ e) 1 B.
Clearly 141 5 B., and it is easily seen (see Exercise 1) that for k = 1, 2, ...
IAEI`
Now it is also clear that it& < 1, and so (see Exercise 9.8.5) B; - ► 0 ask --6, oo.
Since IA( 5 I A.1k 5 B:, it follows that A; -- 0 as k -► eo and so (by Exercise
9.8.5 again) /4, < 1. But this implies µ A < PB + a, and since a is arbitrary,
It follows immediately from the GerIgorin theorem (Section 10.6) that pn < 1.
Now we have D -1 A = I — B, and it follows from Theorem 2 that D-1 A
is an M-matrix. Consequently, A is also an M-matrix.
With the second set of hypotheses we follow the same line of argument
but use the corollary to Theorem 10.7.2 to ensure that AB < 1. MI
Exercise 3. Show that, under either of the hypotheses of Theorem 3,
In A = {n, 0, 0). If, in addition, A is symmetric then A is positive definite.
Exercise 4. Let A be a lower-triangular matrix, with positive elements on
the main diagonal and nonpositive elements elsewhere. Show that A is an
M-matrix.
Exercise S. Show that if A is an M-matrix and D is a nonnegative diagonal
matrix, then A + D is an M-matrix.
Exercise 6. Show that a symmetric M-matrix is positive definite. (Such a
matrix is called a Stieltjes matrix.)
SOLUTION. Let A be a symmetric M-matrix and A e o(A). If A S 0 then,
by Exercise 5, A — AI is also an M-matrix and must therefore be nonsingular.
This contradicts the assumption that A e o(A). Thus A e o(A) implies A > 0
and hence A is positive definite. ❑
Exercise 2. Let A e I8" " be nonnegative and irreducible, and let ail be
the i, j element of A9. Show that there is a positive integer q such that ail) > 0.
Show also that if 4#(A) is the minimal polynomial of A, then we may choose
q 5 deg(0).
Hint. Observe that A(I + A)" -1 > 0 and use the binomial theorem. For
the second part let r(A) be the remainder on division of A(1 + A)" 1 by '(A)
and use the fact that r(A) > O.
Exercise 3. Let matrix A be as in Exercise 2.
(a) Show that a;f > 0 if and only if there is a sequence of q edges in the
directed graph of A connecting nodes P1 and Pj .
(b) Use the result of Exercise 2 to show that A has a strongly connected
directed graph. D
If A is a nonnegative irreducible matrix, consider,the real-valued function
r defined on the nonzero vectors x z 0 by
(Ax)c
r(x) = , (4)
l slsn xi
x, #o
where (Ax)1 denotes the ith element of the vector Ax. Then r(x) z 0 and
for j = 1, 2, ... , n, r(x)x i 5 (Ax)j , with equality for some j. Thus r(x)x 5 Ax
and, furthermore, r(x) is the largest number p such that px 5 Ax for this x.
Exercise 4. Find r(x) if
A
= r2 and x = fol.
1 3
SOLUTION. Since Ax = [2 11T, the largest number p such that px 5 Axis
2. Thus r(x) = 2.
Exercise 5. If A = [n(11'11..1 .. 1 and x = [1 1 1]T, show that
r(x) = min
l slan ke1
E aa . ❑
Let 2' denote the domain of the function r, that is, the set of all nonzero
nonnegative vectors of order n, and define the number r (not to be confused
with the function r) by
r — sup r(x). (5)
z,Y
From the definition of r(x), we observe that r is invariant if x is replaced
by ax for any a > 0. Thus, in evaluating this supremum, we need only
15.3 THE PERRON-FROBENIUS THEOREM (I) 535
Hence
r = max r(y), (6)
ye,r
r —a_ 1
—ai1 — a52 r — a„
Expanding by the last column and denoting the i, jth element of B(r) =
adj(rl — A) by b jj(r), we obtain
E (r
1=1
— aJ)bpi(r) = O.
The authors are indebted to D. Flockerzi for this part of the proof.
538 IS NONNEGATIVE MATRICES
But we have seen in Exercise 6 that B(r) > 0 and so b„i(r) > 0 for j = 1,
2, ... , n. It follows that either a l = 0'2 = • • • = Q„ or there exist indices k, 1
such that r < rrk and r > at . Hence the result.
Exercise 8. Using Exercise 7, and without examining the characteristic
polynomial, prove that the following matrix has the maximal real eigenvalue
7 and that there is an eigenvalue equal to 4:
3 0 1 0
1 6 1 1
2 0 2 0
1 3 1 4
Exercise 9., If A > 0 and G is the component matrix of A associated with r,
prove that G > O.
Exercise 10. If A is irreducible and nonnegative and C(2.) is the reduced
adjoint of A (see Section 7.9), prove that C(r) > O.
Hint. Observe that b(r) = 4_ 1 0 > 0 (Section 7.9).
Exercise 11. If A is nonnegative and irreducible, show that A cannot have
two linearly independent nonnegative eigenvectors (even associated with
distinct eigenvalues).
Hint. Use Exercise 10.7.2 and Theorem 1(c) for the matrices A and AT to
obtain a contradiction. ❑
D = diag h
lail' ' land
where a = [a, a")T. Then i w l = 1, ID I = I, and it is easily verified that
a*D = Ia* I. Now, a*A = Aa* implies
la*ID lA = Ala*ID - ',
so that
la*ID - 'AD = wrla*I,
and
Ia*{w - ^D - IAD = rla*l = la*IB.
In other words,
la*I(w-D IAD — B) = 0*.
-
(4)
But B = JAI = 11) - 'AD' and, since Ia*1 > 0*, it follows from Eq. (4)
that B = co -1D 'AD, and hence conclusion (1). ■
A particular deduction from Theorem 1 will be important for the proof of
Theorem 2. Suppose that, in the theorem, A is nonnegative and irreducible.
Apply the theorem with B = A, so that I A I 5B and RA = r are trivially
satisfied. The conclusion is as follows:
Corollary 1. Let A be nonnegative and irreducible and let A e a(A) with
I AI = IL A = r, the spectral radius of A. Then A satisfies the condition
0 Al2 0 0
0 0 A13 0
0
0 0 0 Ak- l.k
Ak, 0 0 0
(a) The matrix A has a real eigenvalue, r, equal to the spectral radius of A;
(6) There is a nonnegative (right) eigenvector associated with the eigen-
value r.
PROOF. Define the matrix D(t) a R"'" by its elements du(t) as follows:
aj if au > 0,
d^j(t) =
tt if chi = O.
Then D(t) > 0 for t > 0 and D(0) = A. Let p(t) be the maximal real eigen-
value of D(t) for t > O. Then all the eigenvalues of D(t) are continuous
functions of t (Section 11.3) and, since p(t) is equal to the spectral radius of
D(t) for t > O, lim,.. o+ p(t) = r is an eigenvalue of D(0) = A. Furthermore,
this eigenvalue is equal to the spectral radius of A.
We also know (see Exercise 11.7.6) that there is a right eigenvector x(t)
that can be associated with p(t) and that depends continuously on t for t Z O.
By Theorem 15.3.1, x(t) > 0 for t > 0 and hence x(t) .z 0 at t = O. •
Exercise 1. What happens if we try to extend the results of Theorem 15.4.2
to reducible mat ri ces by means of a continuity argument?
Exercise 2. If A e R" `" is nonnegative and aj =. 1 ajk , prove that
min aj 5 r5 max al p
j J
Exercise 3. Show that for any reducible matrix A " there exists a
permutation matrix P such that
BII B12 Bik
and the invariance under rotations through 2x/l is not consistent with the
definition of k. Hence k = 1 and we have established the following fact.
- Proposition 1. If A e lr x" is irreducible and nonnegative, and c(A) written
in the form (1) is the characteristic polynomial of A, then k, the index of imprimi-
tivity of A, is the greatest common divisor of the differences
n—n 1, n—n2,...,n—n,.
For example, if
c(2)=2 1° +a 127 + a2 2, a 1, a2 # 0,
then k = 3. However, if
c(2)=2 10 + + a2 2 + a3 , ai , 0 2. a3 # 0,
then k = 1.
546 15 NONNEGATIVE MATRICES
In the next theorem we see that a wide class of nonnegative matrices can
be reduced to stochastic matrices by a simple transformation.
Theorem 2. Let A be an n x n nonnegative matrix with maximal real eigen-
value r. If there is a positive (right) eigenvector x = [x, x 2 xn]r
associated with r and we write X = diag[x,, ... , xn], then
A = rXPX -1 ,
where P is a stochastic matrix.
Note that some nonnegative matrices are excluded from Theorem 2 by the
hypothesis that x is positive (see Theorem 15.5.1).
PROOF. Let P = r - 'X - 'AX. We have only to prove that P is stochastic.
Since Ax = rx we have
C^
L. a►JxJ = rx1. i= .., n.
J=1
But for l = 2, 3, ... , we have 1.1. 1 = 1 (.11 # 1) and it follows from Eq. (3)
that the limit on the right does not exist. Thus, we have a contradiction that
can be resolved only if the existence of P"° implies k = 1. I.
It should be noted that the hypothesis that P be irreducible is not necessary.
It can be proved that the eigenvalue 1 of P has only linear elementary
divisors even when P is reducible. Thus, Eq. (2) is still valid and the abové
argument goes through. The proof of the more general result requires a
deeper analysis of reducible nonnegative matrices than we have chosen to
present.
Note also that, using Exercise 9.9.7, we may write
C(1)
Poe — mU)(1) ,
where CO.) and m(.1) are the reduced adjoint and the minimal polynomial
of P, respectively. If B(2) = adj(IA — P) and c(A) is the characteristic poly-
nomial of P, then the polynomial identity
B(A) _ C(A.)
c(2) m(2)
15 NoNNEOA77Y6 MATRICES
— B(1)
d"(1)'
Yet another alternative arises if we identify Z 10 with the constituent
matrix G 1 of Exercise 4.14.3. Thus, in the case in which P is irreducible,
there are positive right and left eigenvectors x, y, respectively, of P associated
with the eigenvalue 1 such that
yTx = 1 and Poe = xyT. (4)
Exercise 1. A nonnegative matrix P e R x " is called doubly stochastic if
both P and PT are stochastic matri ces. Check that the matrices
440 44+
P1 = 0 4 4 and P2 = 4 4 0
4 0 4 0 3 4
are stochastic and doubly stochastic matrices, respectively, and that Pi and
PZ exist. ❑
of being in state s, at time t„ given that the system is in state s i at time 4-1.
The laws of probability then imply
E qtrltr-1) = 1.
t=1
(3)
Finally, it is assumed that the initial state of the system is known. Thus, if
the initial state is s, then we have p a = e,, the vector with 1 in the ith place
and zeros elsewhere. We deduce from Eq. (2) that at time t, we have
Pr = Q(t,, 4-1)Q(4-1, t,-2)•Q(t1. t0)po. (4)
The process we have just described is known as a Markov chain. If the
conditional probabilities g; j(t, I t, _ 1 ) are independent of t or are the same at
each instant to , t1, t2 , ..., then the process is described as a homogeneous
Markov chain. We shall confine our attention to this relatively simple
problem. Observe that we may now write
Q(t,, tr-1)=Q, r= 1, 2,....
Equation (4) becomes
Pr = QrPo,
and Eq. (3) implies that the transition matrix Q associated with a homog-
enous Markov chain is such that QT is stochastic. For convenience, we define
P — QT
In many problems involving Markov chains, we are interested in the
limiting behavior of the sequence po , p 1, P 2 , .... This leads us to examine
the existence of lim,.., n Q', or lim,„ Y. If P (or Q) is irreducible, then
according to Theorem 15.7.3, the limits exist if and only if P is primitive.
Since P = Q T it follows that Q has an eigenvalue 1, and the associated right
JJL 15 NONN®CiAr1 V8 MATRICES
and left eigenvectors x, y of Eq. (15.7.4) are left and right eigenvectors of Q,
respectively. Thus, there are positive right and left eigenvectors C (= y)
and i (= x) associated with the eigenvalue 1 of Q for which qT4 = 1 and
Q ao or.
We then have
Pm = MST Po) = (gT Po)l;
since gTpa is a scalar. So, whatever the starting vector p o , the limiting vector
is proportional to C. However, p, satisfies the condition (1) for every r
and so Po satisfies the same condition. Thus, if we define the positive vector
C = (C1 42 4,3T so that Et=1 i = 1, then pa, = for every possible
choice of po .
The fact that C > O means that it is possible to end up in any of the n
states s l, s2 , ... , s„. We have proved the following result.
Theorem 1. If Q is the transition matrix of a homogeneous Markov chain
and Q is irreducible, then the limiting absolute probabilities p i(t„) exist if
and only if Q is primitive. Furthermore, when Q is primitive, p i(t a,) > 0 for
i = 1, 2, ... , n and these probabilities are the components of the (properly
normalized) right eigenvector of Q associated with the eigenvalue 1.
APPENDIX 1
A Survey of Scalar
Polynomials
The expression
P(2)=Po+Pi'1 + P2 ,12 + ... +Pt 2 , ' ( 1)
with coefficients P o , pi, ... , pi in a field . and pi # 0, is referred to as
a polynomial of degree I over .F. Nonzero scalars from Sr are considered
polynomials of zero degree, while p(â) = 0 is written for polynomials with
all zero coefficients.
In this appendix we give a brief survey of some of the fundamental pro-
perties of such polynomials. An accessible and complete account can be
found in Chapter 6 of "The Theory of Matrices" by Sam Perlis (Addison
Wesley, 1958), for example.
Theorem 1. For any two polynomials p(λ) and q(λ) over F (with q(λ) ≠ 0),
there exist polynomials d(λ) and r(λ) over F such that

    p(λ) = q(λ)d(λ) + r(λ),    (2)

where r(λ) = 0 or the degree of r(λ) is less than the degree of q(λ).
The polynomial d(λ) in Eq. (2) is called the quotient on division of p(λ) by
q(λ), and r(λ) is the remainder of this division. If the remainder r(λ) in Eq. (2)
is the zero polynomial, then d(λ) (or q(λ)) is referred to as a divisor of p(λ).
In this event, we also say that d(λ) (or q(λ)) divides p(λ), or that p(λ) is divisible
by d(λ). A divisor of degree 1 is called linear.
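As a small illustration of Theorem 1, here is a Python sketch (using numpy; the particular p and q are arbitrary examples). numpy's polynomial division returns the quotient d(λ) and remainder r(λ), with coefficients listed from the constant term upward.

    from numpy.polynomial import polynomial as P

    # p(x) = 5 + 2x + x^3 and q(x) = 1 + x, coefficients in increasing powers.
    p = [5.0, 2.0, 0.0, 1.0]
    q = [1.0, 1.0]

    d, r = P.polydiv(p, q)                 # quotient and remainder: p = q*d + r
    print(d, r)
    print(P.polyadd(P.polymul(q, d), r))   # reproduces the coefficients of p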
Given polynomials p_1(λ), p_2(λ), ..., p_k(λ), q(λ) over F, the polynomial
q(λ) is called a common divisor of the p_i(λ) (i = 1, 2, ..., k) if it divides each of
the polynomials p_i(λ). A greatest common divisor d(λ) of the p_i(λ) (i = 1, 2, ..., k),
abbreviated g.c.d.{p_i(λ)}, is defined to be a common divisor of these poly-
nomials that is divisible by any other common divisor. To avoid unnecessary
complications, we shall assume in the remainder of this section (if not
indicated otherwise) that the leading coefficient, that is, the coefficient of the
highest power of the polynomial being considered, is equal to 1. Such
polynomials are said to be monic.
Theorem 2. For any set of nonzero polynomials over F, there exists a
unique greatest common divisor.
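The greatest common divisor of Theorem 2 can be computed with the Euclidean algorithm, repeating the division with remainder supplied by Theorem 1. A minimal Python sketch (using numpy; the example polynomials are arbitrary):

    import numpy as np
    from numpy.polynomial import polynomial as P

    def poly_gcd(a, b, tol=1e-10):
        """Monic greatest common divisor of two polynomials.

        Coefficients are listed in increasing powers; division with remainder
        (Theorem 1) is repeated until the remainder vanishes.
        """
        a = np.trim_zeros(np.asarray(a, dtype=float), 'b')
        b = np.trim_zeros(np.asarray(b, dtype=float), 'b')
        while b.size and np.max(np.abs(b)) > tol:
            _, r = P.polydiv(a, b)
            a, b = b, np.trim_zeros(np.atleast_1d(r), 'b')
        return a / a[-1]                  # normalize to a monic polynomial

    # p1 = (x - 1)^2 (x + 2) and p2 = (x - 1)(x + 3); the g.c.d. should be x - 1.
    p1 = P.polymul(P.polymul([-1, 1], [-1, 1]), [2, 1])
    p2 = P.polymul([-1, 1], [3, 1])
    print(poly_gcd(p1, p2))               # approximately [-1.  1.]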
Theorem 4. Any polynomial over C of nonzero degree has at least one zero
in C.
Dividing p(λ) by (λ − λ_1) with the use of Theorem 1, the next exercise is
easily established.
Exercise 5. Show that λ_1 is a zero of p(λ) if and only if λ − λ_1 is a (linear)
divisor of p(λ); in other words, for some polynomial p_1(λ) over F,

    p(λ) = p_1(λ)(λ − λ_1).    (4)  ❑
Let λ_1 be a zero of p(λ). It may happen that, along with λ − λ_1, higher
powers (λ − λ_1)^2, (λ − λ_1)^3, ... are also divisors of p(λ). The scalar λ_1 ∈ F
is referred to as a zero of p(λ) of multiplicity k if (λ − λ_1)^k divides p(λ) but
(λ − λ_1)^{k+1} does not. Zeros of multiplicity 1 are said to be simple. Other
zeros are called multiple zeros of the polynomial.
Exercise 6. Check that λ_1 is a zero of multiplicity k of p(λ) if and only if

    p(λ_1) = p′(λ_1) = ⋯ = p^{(k−1)}(λ_1) = 0  but  p^{(k)}(λ_1) ≠ 0.  ❑
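A quick numerical check of the criterion in Exercise 6, in Python with numpy; the polynomial is an arbitrary example with a zero of multiplicity 3 at λ = 2.

    from numpy.polynomial import Polynomial

    # p(λ) = (λ - 2)^3 (λ + 1), so λ = 2 is a zero of multiplicity k = 3.
    p = Polynomial.fromroots([2, 2, 2, -1])

    # The derivatives of order 0, 1, ..., k-1 vanish at λ = 2, but the kth does not.
    values = [p(2.0)] + [p.deriv(m)(2.0) for m in range(1, 5)]
    print(values)   # zero for orders 0, 1, 2; nonzero (18 and 24) for orders 3 and 4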
Consider again the equality (4) and suppose p_1(λ) has nonzero degree.
By virtue of Theorem 4 the polynomial p_1(λ) has a zero λ_2. Using Exercise 5,
we obtain p(λ) = p_2(λ)(λ − λ_2)(λ − λ_1). Proceeding to factorize p_2(λ) and
continuing this process, the following theorem is obtained.
Theorem 5. Any polynomial over C of degree l (l ≥ 1) has exactly l zeros
λ_1, λ_2, ..., λ_l in C (which may include repetitions). Furthermore, there is a
representation

    p(λ) = p_l(λ − λ_l)(λ − λ_{l−1}) ⋯ (λ − λ_1),    (5)

where p_l = 1 in the case of a monic polynomial p(λ).
Collecting the equal zeros in Eq. (5) and renumbering them, we obtain

    p(λ) = p_l(λ − λ_1)^{k_1}(λ − λ_2)^{k_2} ⋯ (λ − λ_s)^{k_s},    (6)

where λ_i ≠ λ_j (i ≠ j) and k_1 + k_2 + ⋯ + k_s = l.
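The factorization of Theorem 5 is easy to examine numerically. A short Python sketch (using numpy; the polynomial is an arbitrary monic example with a repeated zero):

    import numpy as np
    from numpy.polynomial import Polynomial

    # p(λ) = λ^3 - 4λ^2 + 5λ - 2 = (λ - 1)^2 (λ - 2), with zeros 1, 1, 2.
    p = Polynomial([-2.0, 5.0, -4.0, 1.0])

    zeros = p.roots()                        # the l zeros of Theorem 5
    rebuilt = Polynomial.fromroots(zeros)    # monic product of the linear factors in Eq. (5)

    print(zeros)
    print(np.allclose(rebuilt.coef, p.coef)) # expect True, since p is monic (p_l = 1)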
Given p_1(λ), p_2(λ), ..., p_k(λ), the polynomial p(λ) is said to be a common
multiple of the given polynomials if each of them divides p(λ):

    p(λ) = q_i(λ)p_i(λ),    1 ≤ i ≤ k,    (7)

for some polynomials q_i(λ) over the same field. A common multiple that is a
divisor of any other common multiple is called a least common multiple of
the polynomials.
Theorem 6. For any set of monic polynomials, there exists a unique monic
least common multiple.
Exercise 7. Show that the product ∏_{i=1}^{k} p_i(λ) is a least common multiple
of the polynomials p_1(λ), p_2(λ), ..., p_k(λ) if and only if any two of the given
polynomials are relatively prime. ❑
Exercise 7 can now be used to complete the next exercise.
Exercise 8. Check that if p(λ) is the least common multiple of p_1(λ),
p_2(λ), ..., p_k(λ), then the polynomials q_i(λ) (i = 1, 2, ..., k) in Eq. (7) are
relatively prime. ❑
Note that the condition in Exercise 7 is stronger than requiring that the
set of polynomials be relatively prime.
Exercise 9. Prove that if p(λ) is the least common multiple of p_1(λ),
p_2(λ), ..., p_{k−1}(λ), then the least common multiple of p_1(λ), p_2(λ), ..., p_k(λ)
coincides with the least common multiple of p(λ) and p_k(λ). ❑
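Exercise 9 suggests computing least common multiples iteratively, two polynomials at a time. A Python sketch along those lines (it reuses the poly_gcd routine sketched after Theorem 2, so that the l.c.m. of two polynomials is their product divided by their g.c.d.; the example polynomials are arbitrary):

    from functools import reduce
    from numpy.polynomial import polynomial as P

    def poly_lcm(a, b):
        """Monic least common multiple of two polynomials: a*b divided by their g.c.d."""
        g = poly_gcd(a, b)                    # poly_gcd as sketched after Theorem 2
        q, _ = P.polydiv(P.polymul(a, b), g)
        return q / q[-1]

    # l.c.m. of (x - 1)(x + 2), (x - 1)(x - 3), and (x + 2)(x - 3), taken pairwise.
    polys = [P.polymul([-1, 1], [2, 1]),
             P.polymul([-1, 1], [-3, 1]),
             P.polymul([2, 1], [-3, 1])]
    print(reduce(poly_lcm, polys))            # coefficients of (x - 1)(x + 2)(x - 3)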
A field F that contains the zeros of every polynomial over F is said to be
algebraically closed. Thus Theorem 4 asserts, in particular, that the field C
of complex numbers is algebraically closed. Note that the polynomial
p(λ) = λ^2 + 1, viewed as a polynomial over the field R of real numbers,
shows that R fails to be algebraically closed.
If p(λ) is a polynomial over F of degree l, and 1 is the only monic poly-
nomial over F of degree less than l that divides p(λ), then p(λ) is said to be
irreducible over F. Otherwise, p(λ) is reducible over F. Obviously, in view
of Exercise 5, every polynomial of degree greater than 1 over an algebraically
closed field is reducible. In fact, if p(λ) is a monic polynomial over an alge-
braically closed field F and p(λ) has degree l, then p(λ) has a factorization
p(λ) = ∏_{i=1}^{l} (λ − λ_i) into linear factors, where λ_1, λ_2, ..., λ_l ∈ F and are
not necessarily distinct. If the field fails to be algebraically closed, then
polynomials of degree higher than 1 may be irreducible.
Exercise 10. Check that the polynomial p(λ) = λ^2 + 1 is irreducible over
R and over the field of rational numbers but is reducible over C. ❑
Note that all real polynomials of degree greater than 2 are reducible over
R, while for any positive integer l there exists a polynomial of degree l with
rational coefficients that is irreducible over the field of rational numbers.
APPENDIX 2

Some Theorems and Notions from Analysis
Theorem 1. Let X be a closed and bounded set of real numbers and let M
and m be the least upper bound and greatest lower bound, respectively, of the
numbers x in X. Then M, m ∈ X.
Exercise 1. (a) Let f(x) = x, and let S be the open interval (0, 1).
Here S is bounded but not closed, and f is continuous on S. Clearly M = 1,
m = 0, and f does not attain these values on S.
(b) With f(x) = 1/x and S the set of all real numbers not less than 1,
we have f continuous on S and S closed but unbounded. In this case f(S)
is the half-open interval (0, 1] and inf_{x∈S} f(x) = 0 is not attained.
(c) With

    f(x) = x for 0 ≤ x < 1,    f(1) = 0,

we have S = [0, 1], which is closed and bounded, but f is not continuous
on S. Now f(S) = [0, 1) and sup_{x∈S} f(x) = 1 is not attained. ❑
A set S in the complex plane C is said to be convex if for any α_1, α_2 ∈ S
and any θ ∈ [0, 1],

    (1 − θ)α_1 + θα_2 ∈ S.

The convex set that contains the points α_1, α_2, ..., α_k and is included in
any other convex set containing them is called the convex hull of α_1, α_2,
..., α_k. In other words, the convex hull is the minimal convex set containing
the given points.
Exercise 2. Prove that the convex hull of α_1, α_2, ..., α_k in C is the set of
all linear combinations of the form

    θ_1 α_1 + θ_2 α_2 + ⋯ + θ_k α_k,    θ_i ≥ 0 and ∑_{i=1}^{k} θ_i = 1.    (1)
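Exercise 2 also yields a practical membership test: a point z lies in the convex hull of α_1, ..., α_k exactly when coefficients θ_i ≥ 0 with ∑θ_i = 1 reproducing z exist. A minimal Python sketch (using scipy's linear-programming routine purely as a feasibility solver; the points are arbitrary examples):

    import numpy as np
    from scipy.optimize import linprog

    def in_convex_hull(z, alphas, tol=1e-9):
        """Is the complex number z a convex combination (1) of the points alphas?"""
        alphas = np.asarray(alphas, dtype=complex)
        k = alphas.size
        # Constraints: sum(theta) = 1 and sum(theta_i * alpha_i) = z (real and
        # imaginary parts separately), with theta_i >= 0; no objective is needed.
        A_eq = np.vstack([np.ones(k), alphas.real, alphas.imag])
        b_eq = np.array([1.0, z.real, z.imag])
        res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * k, method="highs")
        return bool(res.success) and np.allclose(A_eq @ res.x, b_eq, atol=tol)

    corners = [0 + 0j, 1 + 0j, 0 + 1j]           # vertices of a triangle
    print(in_convex_hull(0.2 + 0.3j, corners))   # True: inside the triangle
    print(in_convex_hull(1.0 + 1.0j, corners))   # False: outside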
Suggestions for Further Reading

Barnett, S. Polynomials and Linear Control Systems. New York: Marcel Dekker, 1983. [13]
Barnett, S., and Storey, C. Matrix Methods in Stability Theory. London: Nelson, 1970. [13]
Baumgärtel, H. Endlichdimensionale analytische Störungstheorie. Berlin: Akademie-Verlag,
1972. [11] To appear in English translation by Birkhäuser Verlag.
Bellman, R. Introduction to Matrix Analysis. New York: McGraw-Hill, 1960. [1-15]
Ben-Israel, A., and Greville, T. N. E. Generalized Inverses: Theory and Applications. New
York: Wiley, 1974. [12]
Berman, A., and Plemmons, R. J. Nonnegative Matrices in the Mathematical Sciences. New
York: Academic Press, 1979. [15]
Boullion, T., and Odell, P. Generalized Inverse Matrices. New York: Wiley (Interscience),
1971. [12]
Campbell, S. L., and Meyer, C. D. Jr., Generalized Inverses of Linear Transformations. London:
Pitman, 1979. [12]
Daleckii, J. L., and Krein, M. G. Stability of Solutions of Differential Equations in Banach
Space. Providence, R.I.: Amer. Math. Soc. (Transl. Math. Monographs, vol. 43), 1974. [12, 14]
Dunford, N., and Schwartz, J. T. Linear Operators, Part I: General Theory. New York: Wiley
(Interscience), 1958. [5, 9, 10]
Gantmacher, F. R. The Theory of Matrices, vols. 1 and 2. New York: Chelsea, 1959. [1-15]
Glazman, I. M., and Ljubich, Ju. I. Finite-Dimensional Linear Analysis. Cambridge, Mass.:
M.I.T. Press, 1974. [3-10]
Gohberg, I., Lancaster, P., and Rodman, L. Matrix Polynomials. New York: Academic Press,
1982. [14]
Golub, G. H., and Van Loan, C. F. Matrix Computations. Baltimore, Md.: Johns Hopkins
Univ. Press, 1983. [3, 4, 10].
Halmos, P. R. Finite-Dimensional Vector Spaces. New York: Van Nostrand, 1958. [3-6]
Hamburger, H. L., and Grimshaw, M. E. Linear Transformations in n-Dimensional Vector
Spaces. London: Cambridge Univ., 1956. [3-6]
Heinig, G., and Rost, K. Algebraic Methods for Toeplitz-like Matrices and Operators. Berlin:
Akademie-Verlag, 1984. [13]
Householder, A. S. The Theory of Matrices in Numerical Analysis. Boston: Ginn (Blaisdell),
1964. [10, 15]
Kato, T. Perturbation Theory for Linear Operators. Berlin: Springer-Verlag, 1966 (2nd ed.,
1976). [11]
Krein, M. G., and Naimark, M. A. "The Method of Symmetric and Hermitian Forms in the
Theory of the Separation of the Roots of Algebraic Equations." Lin. and Multilin. Alg. 10
(1981): 265-308. [13]
Lancaster, P. Lambda Matrices and Vibrating Systems. Oxford: Pergamon Press, 1966. [14]
Lerer, L., and Tismenetsky, M. "The Bezoutian and the Eigenvalue-Separation Problem for
Matrix Polynomials." Integral Eq. and Operator Theory 5 (1982): 386-445. [13]
MacDuffee, C. C. The Theory of Matrices. New York: Chelsea, 1946. [12]
Marcus, M., and Minc, H. A Survey of Matrix Theory and Matrix Inequalities. Boston: Allyn
and Bacon, 1964. [1-10, 12, 15]
Mirsky, L. An Introduction to Linear Algebra. Oxford: Oxford Univ. (Clarendon), 1955. [1-7]
Ortega, J. M., and Rheinboldt, W. C. Iterative Solution of Nonlinear Equations in Several
Variables. New York: Academic Press, 1970. [15]
Rellich, F. Perturbation Theory of Eigenvalue Problems. New York: Gordon and Breach,
1969. [11]
Russell, D. L. Mathematics of Finite-Dimensional Control Systems. New York: Marcel Dekker,
1979. [9]
Varga, R. S. Matrix Iterative Analysis. Englewood Cliffs, N.J.: Prentice-Hall, 1962. [15]
Wilkinson, J. H. The Algebraic Eigenvalue Problem. Oxford: Oxford Univ. (Clarendon), 1965.
[10]
Wonham, W. M. Linear Multivariable Control: A Geometric Approach. Berlin: Springer-
Verlag, 1979. [9]
Index