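(Real symmetric matrices, $A \in \mathbb{R}^{n\times n}$ with $A^T = A$, form an important subclass.) Section 1.5 described basic spectral properties that will prove of central importance here, so we briefly summarize.

All eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are real; here, they shall always be labeled such that
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n. \tag{2.1}$$

With the eigenvalues $\lambda_1, \ldots, \lambda_n$ are associated orthonormal eigenvectors $u_1, \ldots, u_n$. Thus all Hermitian matrices are diagonalizable.

The matrix $A$ can be written in the form
$$A = U\Lambda U^* = \sum_{j=1}^n \lambda_j u_j u_j^*,$$
where
$$U = [\, u_1 \ \cdots \ u_n \,] \in \mathbb{C}^{n\times n}, \qquad \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \in \mathbb{C}^{n\times n}.$$
The matrix $U$ is unitary, $U^*U = I$, and each $u_j u_j^* \in \mathbb{C}^{n\times n}$ is an orthogonal projector.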
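For a nonzero vector $v \in \mathbb{C}^n$, the Rayleigh quotient of $A$ with respect to $v$ is
$$\frac{v^*Av}{v^*v}. \tag{2.2}$$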
Rayleigh quotients are named after the English gentleman-scientist Lord Rayleigh (a.k.a. John William Strutt, 1842–1919, winner of the 1904 Nobel Prize in Physics), who made fundamental contributions to spectral theory as applied to problems in vibration [Ray78]. (The quantity $v^*Av$ is also called a quadratic form, because it is a combination of terms all having degree two in the entries of $v$, i.e., terms such as $v_j^2$ and $v_j v_k$.)
If $(\lambda, u)$ is an eigenpair for $A$, then notice that
$$\frac{u^*Au}{u^*u} = \frac{u^*(\lambda u)}{u^*u} = \lambda,$$
so Rayleigh quotients generalize eigenvalues. For Hermitian $A$, these quantities demonstrate a rich pattern of behavior that will occupy our attention throughout much of this chapter. (Most of these properties disappear when $A$ is non-Hermitian; indeed, the study of Rayleigh quotients for such matrices remains an active and important area of research; see, e.g., Section 5.4.)
For Hermitian $A \in \mathbb{C}^{n\times n}$, the Rayleigh quotient for a given $v \in \mathbb{C}^n$ can be quickly analyzed when $v$ is expressed in an orthonormal basis of eigenvectors. Writing
$$v = \sum_{j=1}^n c_j u_j = Uc,$$
then
$$\frac{v^*Av}{v^*v} = \frac{c^*U^*AUc}{c^*U^*Uc} = \frac{c^*\Lambda c}{c^*c},$$
where the last step employs the diagonalization $A = U\Lambda U^*$. The diagonal structure of $\Lambda$ allows for an illuminating refinement,
$$\frac{v^*Av}{v^*v} = \frac{\lambda_1|c_1|^2 + \cdots + \lambda_n|c_n|^2}{|c_1|^2 + \cdots + |c_n|^2}. \tag{2.3}$$
As the numerator and denominator are both real, notice that the Rayleigh quotient for a Hermitian matrix is always real. We can say more: since the eigenvalues are ordered, $\lambda_1 \le \cdots \le \lambda_n$,
$$\frac{\lambda_1|c_1|^2 + \cdots + \lambda_n|c_n|^2}{|c_1|^2 + \cdots + |c_n|^2} \ge \frac{\lambda_1\,(|c_1|^2 + \cdots + |c_n|^2)}{|c_1|^2 + \cdots + |c_n|^2} = \lambda_1,$$
and similarly,
$$\frac{\lambda_1|c_1|^2 + \cdots + \lambda_n|c_n|^2}{|c_1|^2 + \cdots + |c_n|^2} \le \frac{\lambda_n\,(|c_1|^2 + \cdots + |c_n|^2)}{|c_1|^2 + \cdots + |c_n|^2} = \lambda_n.$$

Embree draft 23 February 2012
Theorem 2.1. For a Hermitian matrix $A \in \mathbb{C}^{n\times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, the Rayleigh quotient for nonzero $v \in \mathbb{C}^n$ satisfies
$$\frac{v^*Av}{v^*v} \in [\lambda_1, \lambda_n].$$
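The bound in Theorem 2.1 is easy to probe numerically. The following is a minimal sketch, not part of the original notes: it assumes a small tridiagonal test matrix (2 on the diagonal, $-1$ beside it) whose eigenvalues have the known closed form $2 - 2\cos(k\pi/(n+1))$, and the helper names `second_difference` and `rayleigh_quotient` are hypothetical. It samples random complex vectors and records the extreme Rayleigh quotients observed.

```python
import math
import random

def second_difference(n):
    """Real symmetric (hence Hermitian) tridiagonal test matrix:
    2 on the diagonal, -1 on the first off-diagonals."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def rayleigh_quotient(A, v):
    """Return v*Av / v*v for a nonzero vector v given as a list of numbers."""
    n = len(v)
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    numerator = sum(v[i].conjugate() * Av[i] for i in range(n))
    denominator = sum(abs(x) ** 2 for x in v)
    return numerator / denominator

n = 6
A = second_difference(n)
# Closed-form eigenvalues of this particular test matrix, in increasing order.
lam = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]

random.seed(0)
quotients = []
for _ in range(200):
    v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    quotients.append(rayleigh_quotient(A, v))

max_imag = max(abs(q.imag) for q in quotients)   # should vanish up to rounding
lowest = min(q.real for q in quotients)
highest = max(q.real for q in quotients)

# The eigenvector for lambda_1 of this test matrix has entries sin(j*pi/(n+1)).
u1 = [math.sin(j * math.pi / (n + 1)) for j in range(1, n + 1)]
attained = rayleigh_quotient(A, u1)

print("lambda_1, lambda_n:", lam[0], lam[-1])
print("sampled range     :", lowest, highest)
print("quotient at u_1   :", attained)
```

Random sampling only ever lands strictly inside the interval; the endpoints are reached at eigenvectors, which is why the sketch also evaluates the quotient at $u_1$.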
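Further insights follow from the simple equation (2.3). The bounds in Theorem 2.1 are attained by eigenvectors, since
$$\frac{u_1^*Au_1}{u_1^*u_1} = \lambda_1, \qquad \frac{u_n^*Au_n}{u_n^*u_n} = \lambda_n.$$
Combined with Theorem 2.1, these calculations characterize the extreme eigenvalues of $A$ as solutions to optimization problems (here and below, the optimization is over nonzero vectors):
$$\lambda_1 = \min_{v\in\mathbb{C}^n} \frac{v^*Av}{v^*v}, \qquad \lambda_n = \max_{v\in\mathbb{C}^n} \frac{v^*Av}{v^*v}.$$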
Can interior eigenvalues also be characterized via optimization problems? If $v$ is orthogonal to $u_1$, then $c_1 = 0$, and one can write
$$v = c_2u_2 + \cdots + c_nu_n.$$
In this case (2.3) becomes
$$\frac{v^*Av}{v^*v} = \frac{\lambda_2|c_2|^2 + \cdots + \lambda_n|c_n|^2}{|c_2|^2 + \cdots + |c_n|^2} \ge \lambda_2,$$
with equality when $v = u_2$. This implies that $\lambda_2$ also solves a minimization problem, one posed over a restricted subspace:
$$\lambda_2 = \min_{v\in\mathbb{C}^n,\ v\perp u_1} \frac{v^*Av}{v^*v}.$$
Similarly,
$$\lambda_{n-1} = \max_{v\in\mathbb{C}^n,\ v\perp u_n} \frac{v^*Av}{v^*v}.$$
All eigenvalues can be characterized in this manner.
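Theorem 2.2. For a Hermitian matrix $A \in \mathbb{C}^{n\times n}$,
$$\lambda_k = \min_{v\,\perp\,\mathrm{span}\{u_1,\ldots,u_{k-1}\}} \frac{v^*Av}{v^*v} = \min_{v\,\in\,\mathrm{span}\{u_k,\ldots,u_n\}} \frac{v^*Av}{v^*v}$$
$$\phantom{\lambda_k} = \max_{v\,\perp\,\mathrm{span}\{u_{k+1},\ldots,u_n\}} \frac{v^*Av}{v^*v} = \max_{v\,\in\,\mathrm{span}\{u_1,\ldots,u_k\}} \frac{v^*Av}{v^*v}.$$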
This result is quite appealing, except for one aspect: to characterize the $k$th eigenvalue, one must know all the preceding eigenvectors (for the minimization) or all the following eigenvectors (for the maximization). Section 2.2 will describe a more flexible approach, one that hinges on the eigenvalue approximation result we shall next describe.
2.1 Cauchy Interlacing Theorem
We have already made the elementary observation that when $v$ is an eigenvector of $A \in \mathbb{C}^{n\times n}$ corresponding to the eigenvalue $\lambda$, then
$$\frac{v^*Av}{v^*v} = \lambda.$$
How well does this Rayleigh quotient approximate $\lambda$ when $v$ is only an approximation of the corresponding eigenvector? This question, investigated in detail in Problem 1, motivates a refinement. What if one has a series of orthonormal vectors $q_1, \ldots, q_m$, whose collective span approximates some $m$-dimensional eigenspace of $A$ (possibly associated with several different eigenvalues), even though the individual vectors $q_k$ might not approximate any individual eigenvector?
This set-up suggests a matrix version of the Rayleigh quotient. Build the matrix
$$Q_m = [\, q_1 \ \cdots \ q_m \,] \in \mathbb{C}^{n\times m},$$
which is subunitary due to the orthonormality of the columns, $Q_m^*Q_m = I$. How well do the $m$ eigenvalues of the compression of $A$ to $\mathrm{span}\{q_1, \ldots, q_m\}$,
$$Q_m^*AQ_m,$$
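approximate (some of) the $n$ eigenvalues of $A$? Extend $Q_m$ to a unitary matrix $Q = [\, Q_m \ \ \widehat{Q}_m \,] \in \mathbb{C}^{n\times n}$, so that
$$Q^*AQ = \begin{bmatrix} Q_m^*AQ_m & Q_m^*A\widehat{Q}_m \\ \widehat{Q}_m^*AQ_m & \widehat{Q}_m^*A\widehat{Q}_m \end{bmatrix}.$$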
This matrix has the same eigenvalues as $A$, since if $Au = \lambda u$, then
$$Q^*AQ\,(Q^*u) = \lambda\,(Q^*u).$$
Thus the question of how well the eigenvalues of $Q_m^*AQ_m \in \mathbb{C}^{m\times m}$ approximate those of $A \in \mathbb{C}^{n\times n}$ can be reduced to the question of how well the eigenvalues of the leading $m \times m$ upper left block (or leading principal submatrix) approximate those of the entire matrix.
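Cauchy's Interlacing Theorem
Theorem 2.3. Let the Hermitian matrix $A \in \mathbb{C}^{n\times n}$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$ be partitioned as
$$A = \begin{bmatrix} H & B^* \\ B & R \end{bmatrix},$$
where $H \in \mathbb{C}^{m\times m}$, $B \in \mathbb{C}^{(n-m)\times m}$, and $R \in \mathbb{C}^{(n-m)\times(n-m)}$. Then the eigenvalues $\theta_1 \le \cdots \le \theta_m$ of $H$ satisfy
$$\lambda_k \le \theta_k \le \lambda_{k+n-m}. \tag{2.4}$$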
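Before proving the Interlacing Theorem, we offer a graphical illustration. Consider the matrix
$$A = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} \in \mathbb{C}^{n\times n}, \tag{2.5}$$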
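As $m$ increases, the extreme eigenvalues $\theta_1$ and $\theta_m$ of $H$ tend toward the extreme eigenvalues $\lambda_1$ and $\lambda_n$ of $A$. Notice that for any fixed $m$, at most one eigenvalue of $H$ falls in the interval $[\lambda_1, \lambda_2)$, as guaranteed by the Interlacing Theorem:
$$\lambda_2 \le \theta_2.$$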
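The interlacing inequalities (2.4) can be checked directly for this example. The sketch below is an illustration rather than the computation behind the original figure: it assumes that (2.5) is the tridiagonal matrix with 2 on the diagonal and $-1$ on the off-diagonals (the spectrum is the same if the off-diagonal entries are $+1$), uses the standard closed form $2 - 2\cos(j\pi/(p+1))$ for the eigenvalues of the $p \times p$ matrix of this type, and takes $n = 16$ as an arbitrary size. The function names are hypothetical.

```python
import math

def eigs_second_difference(p):
    """Eigenvalues, in increasing order, of the p-by-p tridiagonal matrix with
    2 on the diagonal and -1 on the off-diagonals (standard closed form)."""
    return [2.0 - 2.0 * math.cos(j * math.pi / (p + 1)) for j in range(1, p + 1)]

def interlaces(lam, theta, tol=1e-12):
    """Check lambda_k <= theta_k <= lambda_{k+n-m} for k = 1, ..., m
    (indices are shifted to 0-based inside the function)."""
    n, m = len(lam), len(theta)
    return all(lam[k] - tol <= theta[k] <= lam[k + n - m] + tol for k in range(m))

n = 16                     # assumed size, chosen only for illustration
lam = eigs_second_difference(n)

results = {}               # m -> does (2.4) hold for the leading m-by-m block?
counts = {}                # m -> number of eigenvalues of H in [lambda_1, lambda_2)
for m in range(1, n + 1):
    # The leading m-by-m block H has the same tridiagonal structure as A.
    theta = eigs_second_difference(m)
    results[m] = interlaces(lam, theta)
    counts[m] = sum(1 for t in theta if lam[0] <= t < lam[1])

for m in (1, 4, 8, 16):
    print(m, results[m], counts[m])
```

Because each leading block of this matrix is again a matrix of the same form, no eigenvalue solver is needed; for a general Hermitian matrix one would compute both spectra numerically.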
The proof of the Cauchy Interlacing Theorem will utilize a fundamental
result whose proof is a basic exercise in dimension counting.
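Lemma 2.4. Let $\mathcal{U}$ and $\mathcal{V}$ be subspaces of $\mathbb{C}^n$ such that
$$\dim(\mathcal{U}) + \dim(\mathcal{V}) > n.$$
Then the intersection $\mathcal{U} \cap \mathcal{V}$ is nontrivial, i.e., there exists a nonzero vector $x \in \mathcal{U} \cap \mathcal{V}$.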
Proof of Cauchy's Interlacing Theorem. Let $u_1, \ldots, u_n$ and $z_1, \ldots, z_m$ denote the eigenvectors of $A$ and $H$ associated with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$ and $\theta_1 \le \cdots \le \theta_m$. Define the spaces
$$\widehat{\mathcal{U}}_k = \mathrm{span}\{u_k, \ldots, u_n\}, \qquad \mathcal{Z}_k = \mathrm{span}\{z_1, \ldots, z_k\}.$$
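To compare length-$m$ vectors associated with $H$ to length-$n$ vectors associated with $A$, consider
$$\mathcal{Y}_k = \left\{ \begin{bmatrix} z \\ 0 \end{bmatrix} \in \mathbb{C}^n : z \in \mathcal{Z}_k \right\}.$$
Since $\dim(\widehat{\mathcal{U}}_k) = n-k+1$ and $\dim(\mathcal{Y}_k) = \dim(\mathcal{Z}_k) = k$, the preceding lemma ensures the existence of some nonzero
$$w \in \widehat{\mathcal{U}}_k \cap \mathcal{Y}_k.$$
Since the nonzero vector $w \in \mathcal{Y}_k$, it must be of the form
$$w = \begin{bmatrix} z \\ 0 \end{bmatrix}$$
for nonzero $z \in \mathcal{Z}_k$. Thus
$$w^*Aw = [\, z^* \ \ 0 \,] \begin{bmatrix} H & B^* \\ B & R \end{bmatrix} \begin{bmatrix} z \\ 0 \end{bmatrix} = z^*Hz, \qquad z \in \mathcal{Z}_k,$$
and likewise $w^*w = z^*z$. The proof now readily follows from the optimization characterizations described in Theorem 2.2:
$$\lambda_k = \min_{v\in\widehat{\mathcal{U}}_k} \frac{v^*Av}{v^*v} \le \frac{w^*Aw}{w^*w} = \frac{z^*Hz}{z^*z} \le \max_{x\in\mathcal{Z}_k} \frac{x^*Hx}{x^*x} = \theta_k.$$
The proof of the second inequality in (2.4) follows by applying the first inequality to $-A$. (Proof from [Par98].)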
For convenience we state a version of the interlacing theorem when $H$ is the compression of $A$ to some general subspace $\mathcal{R}(Q_m) = \mathrm{span}\{q_1, \ldots, q_m\}$, as motivated earlier in this section.
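Cauchy's Interlacing Theorem for Compressions
Corollary 2.5. Given any Hermitian matrix $A \in \mathbb{C}^{n\times n}$ and subunitary $Q_m \in \mathbb{C}^{n\times m}$, label the eigenvalues of $A$ as $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and the eigenvalues of $Q_m^*AQ_m$ as $\theta_1 \le \theta_2 \le \cdots \le \theta_m$. Then
$$\lambda_k \le \theta_k \le \lambda_{k+n-m}. \tag{2.6}$$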
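For any subunitary $Q_k \in \mathbb{C}^{n\times k}$, Corollary 2.5 (with $m = k$) bounds the largest eigenvalue $\theta_k$ of $Q_k^*AQ_k$:
$$\lambda_k \le \theta_k = \max_{v\in\mathbb{C}^k} \frac{v^*(Q_k^*AQ_k)v}{v^*v},$$
where the maximization follows from applying Theorem 2.2 to $Q_k^*AQ_k$. We can write this maximization as
$$\theta_k = \max_{v\in\mathbb{C}^k} \frac{v^*(Q_k^*AQ_k)v}{v^*v} = \max_{v\in\mathbb{C}^k} \frac{(Q_kv)^*A(Q_kv)}{(Q_kv)^*(Q_kv)} = \max_{x\in\mathrm{span}\{q_1,\ldots,q_k\}} \frac{x^*Ax}{x^*x}.$$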
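Hence the maximum Rayleigh quotient over any $k$-dimensional subspace is at least $\lambda_k$. On the other hand, Theorem 2.2 shows that
$$\lambda_k = \max_{v\in\mathrm{span}\{u_1,\ldots,u_k\}} \frac{v^*Av}{v^*v}. \tag{2.7}$$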
Thus, there exists a distinguished $k$-dimensional subspace such that the maximum Rayleigh quotient over that subspace is $\theta_k = \lambda_k$. From this it follows that
$$\lambda_k = \min_{\dim(\mathcal{U})=k}\ \max_{v\in\mathcal{U}} \frac{v^*Av}{v^*v},$$
with the minimum attained when $\mathcal{U} = \mathrm{span}\{u_1, \ldots, u_k\}$. Likewise, we can make an analogous statement involving maximizing a minimum Rayleigh quotient over $(n-k+1)$-dimensional subspaces. These are known as the Courant–Fischer minimax characterizations of eigenvalues.
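Courant–Fischer Characterization of Eigenvalues
Theorem 2.6. For a Hermitian matrix $A \in \mathbb{C}^{n\times n}$,
$$\lambda_k = \min_{\dim(\mathcal{U})=k}\ \max_{v\in\mathcal{U}} \frac{v^*Av}{v^*v} = \max_{\dim(\mathcal{U})=n-k+1}\ \min_{v\in\mathcal{U}} \frac{v^*Av}{v^*v}. \tag{2.8}$$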
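A small experiment can make the min-max statement concrete. The sketch below is an illustration under simplifying assumptions: it uses a $3 \times 3$ diagonal matrix with eigenvalues 1, 2, 5 (so $u_j = e_j$), takes $k = 2$, and samples only real two-dimensional subspaces. The maximum Rayleigh quotient over a subspace is computed as the largest eigenvalue of the $2 \times 2$ compression, using the closed-form eigenvalues of a real symmetric $2 \times 2$ matrix. All function names are hypothetical.

```python
import math
import random

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    """Real inner product."""
    return sum(a * b for a, b in zip(x, y))

def orthonormal_pair(x, y):
    """Gram-Schmidt on two linearly independent real vectors."""
    nx = math.sqrt(dot(x, x))
    q1 = [a / nx for a in x]
    r = dot(q1, y)
    w = [b - r * a for a, b in zip(q1, y)]
    nw = math.sqrt(dot(w, w))
    q2 = [a / nw for a in w]
    return q1, q2

def max_rayleigh_on_span(A, x, y):
    """Largest Rayleigh quotient of A over span{x, y}: the top eigenvalue of
    the 2-by-2 compression with entries q_i^T A q_j."""
    q1, q2 = orthonormal_pair(x, y)
    Aq1, Aq2 = matvec(A, q1), matvec(A, q2)
    a, b, c = dot(q1, Aq1), dot(q1, Aq2), dot(q2, Aq2)
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)

# Diagonal test matrix: eigenvalues 1, 2, 5 with eigenvectors e_1, e_2, e_3.
A = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]
lam2 = 2.0

random.seed(1)
sampled = []
for _ in range(500):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = [random.gauss(0, 1) for _ in range(3)]
    sampled.append(max_rayleigh_on_span(A, x, y))

# The distinguished subspace span{u_1, u_2} attains the minimum.
best = max_rayleigh_on_span(A, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print("smallest sampled maximum:", min(sampled))
print("maximum over span{u1,u2}:", best)
```

Every sampled subspace gives a maximum of at least $\lambda_2$, while the subspace spanned by the first two eigenvectors gives exactly $\lambda_2$, in line with the first equality in (2.8).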
2.3 Positive Definite Matrices
A distinguished class of Hermitian matrices have Rayleigh quotients that
are always positive. Matrices of this sort are so useful in both theory and
applications that they have their own nomenclature.
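Positive Definite Matrices and Kin
Let $A$ be Hermitian. Then $A$ is positive definite if $v^*Av > 0$ for all nonzero $v$, and positive semidefinite if $v^*Av \ge 0$ for all $v$.

Suppose $(\lambda, u)$ is an eigenpair of $A$ with $\|u\| = 1$, so that
$$u^*Au = \lambda u^*u = \lambda.$$
If $A$ is positive definite, then $\lambda = u^*Au > 0$. Conversely, suppose all eigenvalues of $A$ are positive, and expand an arbitrary vector $v$ in the orthonormal eigenvectors,
$$v = \sum_{j=1}^n \gamma_j u_j.$$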
As seen throughout this chapter,
$$v^*Av = \sum_{j=1}^n \lambda_j|\gamma_j|^2 \ge \lambda_1 \sum_{j=1}^n |\gamma_j|^2.$$
If $v \ne 0$, then $0 \ne \|v\|^2 = \sum_{j=1}^n |\gamma_j|^2$, and since all the eigenvalues are positive, we must have
$$v^*Av > 0.$$
We have just proved a simple but fundamental fact.
Theorem 2.7. A Hermitian matrix is positive definite if and only if all its eigenvalues are positive.
This result, an immediate consequence of the definition of positive definiteness, provides one convenient way to characterize positive definite matrices; it also implies that all positive definite matrices are invertible. (Positive semidefinite matrices only have nonnegative eigenvalues, and hence they can be singular.)
Taking $v$ to be the $k$th column of the identity matrix, $v = e_k$, we also see that positive definite matrices must have positive entries on their main diagonal:
$$0 < v^*Av = e_k^*Ae_k = a_{k,k}.$$
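A positive diagonal is necessary for positive definiteness but not sufficient, which a $2 \times 2$ example makes plain. The sketch below is an illustration with two test matrices chosen for this purpose (they do not appear in the original notes); it relies on the closed-form eigenvalues of a real symmetric $2 \times 2$ matrix, and the function names are hypothetical.

```python
import math

def sym2_eigenvalues(a, b, c):
    """Eigenvalues (ascending) of the real symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius, mean + radius

def quadratic_form(a, b, c, v):
    """v^T A v for A = [[a, b], [b, c]] and a real vector v = (v1, v2)."""
    v1, v2 = v
    return a * v1 * v1 + 2.0 * b * v1 * v2 + c * v2 * v2

pd = (2.0, -1.0, 2.0)      # A = [[2, -1], [-1, 2]]: eigenvalues 1 and 3
indef = (1.0, 2.0, 1.0)    # A = [[1, 2], [2, 1]]: eigenvalues -1 and 3

pd_eigs = sym2_eigenvalues(*pd)
indef_eigs = sym2_eigenvalues(*indef)

# Evaluate the quadratic form of the positive definite matrix on unit vectors.
angles = [2.0 * math.pi * t / 360 for t in range(360)]
pd_forms = [quadratic_form(*pd, (math.cos(t), math.sin(t))) for t in angles]

# The second matrix has a positive diagonal, yet v = (1, -1) gives a negative value.
witness = quadratic_form(*indef, (1.0, -1.0))

print("eigenvalues:", pd_eigs, indef_eigs)
print("smallest form value on unit vectors (first matrix):", min(pd_forms))
print("form value at (1, -1) (second matrix):", witness)
```

The first matrix has positive eigenvalues and a quadratic form bounded below by $\lambda_1$ on unit vectors, as Theorem 2.7 requires; the second has positive diagonal entries but a negative eigenvalue, so it fails the definition.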
Similarly, Q
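
For example, the $2 \times 2$ identity matrix has several square roots, among them
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.$$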
Even the zero matrix has a few square roots, some not even Hermitian:
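$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$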
Yet in each of these cases, you know there is one right square root: the first ones listed, that is, the positive semidefinite square roots of these positive semidefinite matrices $I$ and $0$. The others are just monsters [Lak76].
$k$th Root of a Positive Definite Matrix
Theorem 2.8. Let $k > 1$ be an integer. For each Hermitian positive semidefinite matrix $A \in \mathbb{C}^{n\times n}$, there exists a unique Hermitian positive semidefinite matrix $B \in \mathbb{C}^{n\times n}$ such that $B^k = A$.
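Proof. (See, e.g., [HJ85].) The existence of the $k$th root is straightforward. Unitarily diagonalize $A$ to obtain $A = U\Lambda U^*$, where
$$\Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}.$$
Now define
$$D := \begin{bmatrix} \lambda_1^{1/k} & & \\ & \ddots & \\ & & \lambda_n^{1/k} \end{bmatrix},$$
where here we are taking the nonnegative $k$th root of each eigenvalue. Then define the Hermitian positive semidefinite matrix $B = UDU^*$, so that
$$B^k = UD^kU^* = U\Lambda U^* = A.$$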
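The existence construction translates directly into a computation. The following minimal sketch assumes a hand-picked real symmetric example, $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, whose eigenvalues (1 and 3) and orthonormal eigenvectors are known exactly, so that no eigenvalue solver is needed; the helper names are hypothetical, and $U^*$ reduces to the transpose because $U$ is real.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matpow(X, k):
    """X raised to the positive integer power k by repeated multiplication."""
    P = X
    for _ in range(k - 1):
        P = matmul(P, X)
    return P

def psd_kth_root(U, eigenvalues, k):
    """Form B = U D U^T with D = diag(lambda_j^(1/k)), for a real orthogonal U
    and nonnegative eigenvalues (the construction in the existence proof)."""
    n = len(eigenvalues)
    D = [[eigenvalues[i] ** (1.0 / k) if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    return matmul(matmul(U, D), transpose(U))

s = 1.0 / math.sqrt(2.0)
A = [[2.0, 1.0],
     [1.0, 2.0]]
U = [[s, s],
     [-s, s]]              # columns: eigenvectors for eigenvalues 1 and 3
lam = [1.0, 3.0]

k = 3
B = psd_kth_root(U, lam, k)
Bk = matpow(B, k)
err = max(abs(Bk[i][j] - A[i][j]) for i in range(2) for j in range(2))

print("B =", B)
print("max |B^k - A| =", err)
```

The computed $B$ is symmetric with positive trace and determinant, hence positive definite, and $B^3$ reproduces $A$ up to rounding error.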
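To establish uniqueness, suppose $C$ is a Hermitian positive semidefinite matrix with $C^k = A$. Let $\phi$ be a polynomial that interpolates the $k$th root at the eigenvalues of $A$, $\phi(\lambda_j) = \lambda_j^{1/k}$ for $j = 1, \ldots, n$; written here for distinct eigenvalues, the Lagrange form is
$$\phi(z) = \sum_{j=1}^n \lambda_j^{1/k} \prod_{\ell=1,\ \ell\ne j}^n \frac{z - \lambda_\ell}{\lambda_j - \lambda_\ell};$$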
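see, e.g., [SM03, §6.2]. Now evaluate $\phi$ at $A$ to obtain
$$\phi(A) = \phi(U\Lambda U^*) = U\phi(\Lambda)U^* = U \begin{bmatrix} \phi(\lambda_1) & & \\ & \ddots & \\ & & \phi(\lambda_n) \end{bmatrix} U^* = U \begin{bmatrix} \lambda_1^{1/k} & & \\ & \ddots & \\ & & \lambda_n^{1/k} \end{bmatrix} U^* = B,$$
i.e., $\phi(A) = B$. We shall use this fact to show that $B$ and $C$ commute:
$$BC = \phi(A)C = \phi(C^k)C = C\phi(C^k) = C\phi(A) = CB,$$
where we have used the fact that $C$ commutes with $\phi(C^k)$, since $\phi(C^k)$ is composed of powers of $C$.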
Invoking Theorem 1.13 for the Hermitian (hence diagonalizable) matrices $B$ and $C$, we can find some $V$ for which $VBV^{-1}$ and $VCV^{-1}$ are both diagonal. The entries on these diagonals must be the eigenvalues of $B$ and $C$. Without loss of generality, assume that $V$ produces the eigenvalues of $B$ in the order
$$VBV^{-1} = \begin{bmatrix} \lambda_1^{1/k} & & \\ & \ddots & \\ & & \lambda_n^{1/k} \end{bmatrix}.$$
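Label the eigenvalues of $C$ as $\mu_1, \ldots, \mu_n$, in the order produced by $V$:
$$VCV^{-1} = \begin{bmatrix} \mu_1 & & \\ & \ddots & \\ & & \mu_n \end{bmatrix}.$$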
Since $A = B^k = C^k$, we have $VB^kV^{-1} = VC^kV^{-1}$, so
$$VB^kV^{-1} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = \begin{bmatrix} \mu_1^k & & \\ & \ddots & \\ & & \mu_n^k \end{bmatrix} = VC^kV^{-1}.$$
Since $C$ is positive semidefinite, the eigenvalues of $C$ are nonnegative, hence we must conclude that $\mu_j = \lambda_j^{1/k}$ for $j = 1, \ldots, n$. Since $B$ and $C$ have the same eigenvalues and eigenvectors, they are the same matrix: $B = C$. It follows that the Hermitian positive semidefinite $k$th root of $A$ is unique.
2.3.2 Positive definiteness in optimization
Positive definite matrices arise in many applications. For example, Taylor's expansion of a sufficiently smooth function $f : \mathbb{R}^n \to \mathbb{R}$ about a point $x_0 \in \mathbb{R}^n$ takes the form
$$f(x_0 + c) = f(x_0) + c^*\nabla f(x_0) + \tfrac{1}{2}\,c^*H(x_0)\,c + O(\|c\|^3), \tag{2.9}$$
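where $\nabla f(x_0) \in \mathbb{R}^n$ is the gradient of $f$ evaluated at $x_0$, and $H(x_0) \in \mathbb{R}^{n\times n}$ is the Hessian of $f$,
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.$$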
Note that $H(x_0)$ is Hermitian provided the mixed partial derivatives are equal. We say $x_0$ is a stationary point when $\nabla f(x_0) = 0$. In the immediate vicinity of such a point, equation (2.9) shows that $f$ behaves like
$$f(x_0 + c) = f(x_0) + \tfrac{1}{2}\,c^*H(x_0)\,c + O(\|c\|^3),$$
and so $x_0$ is a local minimum if all local changes $c$ cause $f$ to increase, i.e., $c^*H(x_0)\,c > 0$ for all $c \ne 0$. Hence $x_0$ is a local minimum provided