
M.Sc.

Simulation Sciences, Summer Semester 2019

Fast Iterative Solvers


Prof. Georg May

Linear Algebra Review

We review a few basic facts from linear algebra. This collection is merely a reference, and
some theorems are cited without proof. The intent is to give an overview of the linear algebra
prerequisites for the class "Fast Iterative Solvers". Everything that appears here will be used in
class.

1 Vectors
We denote as Rn the set of column vectors, or n-vectors,
 
x := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad x_j \in \mathbb{R}, \quad j = 1, \ldots, n.

A row vector is obtained by the transpose of a column vector

xT := (x1 , x2 , . . . , xn ) .

The set of complex vectors Cn is introduced in an entirely analogous manner. In the complex case,
however, it shall prove useful to introduce the Hermitian transpose¹

x^H := (\bar{x}_1, \ldots, \bar{x}_n).

As we consider both real and complex vectors, we frequently use the notation x ∈ Kn , where either
K = R or K = C.
The notion of a set is elementary, but not quite sufficient for our purposes. In order to carry
out meaningful analysis, we need a little more structure. A convenient setting is that of a linear
space.

Definition 1.1 Let S be a set, and let K be a field of scalars. (We take either K = R or
K = C.) Assume that for any u, v ∈ S, an operation u + v is defined, such that u + v ∈ S.
Furthermore, assume that for any u ∈ S, and any α ∈ K, an operation αu is defined, such that
αu ∈ S. We call these operations addition, and scalar multiplication, respectively. We call S a
linear space, provided that for any u, v, w ∈ S, and α, β ∈ K, the following properties hold:
• Associativity of addition: u + (v + w) = (u + v) + w

¹ Recall that the conjugate of a complex number z = x + ı̂y is defined as z̄ = x − ı̂y. Here ı̂ is the imaginary unit with ı̂² = −1.

• Commutativity of addition: u + v = v + u
• Identity element of addition: There exists an element 0 ∈ S, such that u + 0 = u.
• Inverse elements of addition: For each u ∈ S, there exists an element −u ∈ S, such that
u + (−u) = 0

• Distributivity of scalar multiplication with respect to vector addition: α(u + v) = αu + αv


• Distributivity of scalar multiplication with respect to field addition: (α + β)u = αu + βu
• Compatibility of scalar multiplication with field multiplication: α(βu) = (αβ)u

• Identity element of scalar multiplication: There exists an element 1 ∈ K such that 1v = v.

If, for x, y ∈ Kn , and α ∈ K, we define addition and multiplication with a scalar as


   
x + y := \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, \qquad \alpha x := \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}, \qquad (1.1)
the set Kn forms a linear space over the field K. Loosely speaking, this "componentwise" definition
of addition and multiplication ensures that all the properties of Definition 1.1 are inherited from
the field K, over which Kn is defined.

1.1 Scalar Product and Vector Norms


A scalar product (or: inner product) is a mapping of two vectors to the field K, endowed with
certain properties:

Definition 1.2 We call a mapping ⟨·, ·⟩ : Kn × Kn → K a scalar product (or: inner product)
if, for x, y, z ∈ Kn , and α, β ∈ K, the following properties hold:

Linearity : ⟨αx + βz, y⟩ = α ⟨x, y⟩ + β ⟨z, y⟩   (1.2)
Conjugate Symmetry : ⟨x, y⟩ = \overline{⟨y, x⟩}   (1.3)
Definiteness : ⟨x, x⟩ > 0 for x ≠ 0.   (1.4)

Note that the first two properties together imply anti-linearity in the second argument:
⟨x, αy + βz⟩ = \bar{α} ⟨x, y⟩ + \bar{β} ⟨x, z⟩. For K = R, conjugate symmetry (1.3) reduces to "normal"
symmetry, ⟨x, y⟩ = ⟨y, x⟩, and the scalar product is bilinear (i.e. linear in both arguments).
Also note that conjugate symmetry implies that ⟨x, x⟩ ∈ R, even if K = C. This means that
Property (1.4) is well-defined². Note that Property (1.4) by itself means that ⟨x, x⟩ = 0 ⇒ x = 0.
However, the reverse implication also holds by the first two properties. For a proper scalar product,
we thus have

⟨x, x⟩ = 0 ⇔ x = 0.   (1.5)
2 We use the relation > in the standard way as applied to real numbers.

From Definition 1.2 you may guess that there is more than one way to define a scalar product.
In fact, we will define and use several different scalar products. However, you may already be
familiar with the following best-known example of a scalar product:

Example 1.1 Let x, y ∈ Kn , and define

y^H x := \sum_{i=1}^{n} x_i \bar{y}_i .   (1.6)

If K = R, this reduces to

y^T x := \sum_{i=1}^{n} x_i y_i .   (1.7)

It is easy to verify that this fulfills all the properties stated in Definition 1.2. □

The scalar product defined in (1.6) is sometimes called the standard scalar product in Kn . It
is important enough to reserve a special notation for it. We shall use round brackets to denote the
standard scalar product: (x, y) := yH x.
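For concreteness, the standard scalar product is easy to evaluate numerically. The following short Python/NumPy sketch (NumPy is used here purely as an illustrative tool, not as part of the notes) checks the sum in (1.6) against np.vdot, which conjugates its first argument and therefore computes yH x.

import numpy as np

# Two complex 3-vectors.
x = np.array([1.0 + 1.0j, 2.0, -1.0j])
y = np.array([0.5, 1.0j, 3.0])

# Explicit evaluation of (1.6): sum_i x_i * conj(y_i).
explicit = np.sum(x * np.conj(y))

# np.vdot conjugates its first argument, so np.vdot(y, x) = y^H x.
print(np.isclose(explicit, np.vdot(y, x)))   # True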
Analysis in Kn usually makes extensive use of scalar products. For example, the notion of
orthogonality is based on it:

Definition 1.3 Two vectors x, y ∈ Kn are said to be orthogonal with respect to the scalar
product ⟨·, ·⟩, if
⟨x, y⟩ = 0.

Scalar products also play a role in defining vector norms. Let us first, however, give a general
definition of a norm:

Definition 1.4 We say that the mapping ||·|| : Kn → R is a norm if, for all x, y ∈ Kn , and
α ∈ K, there holds

Definiteness : ||x|| > 0 for x ≠ 0.   (1.8)
Positive homogeneity : ||αx|| = |α| ||x||   (1.9)
Triangle inequality : ||x + y|| ≤ ||x|| + ||y||   (1.10)

The first two properties also imply the equivalence

||x|| = 0 ⇔ x = 0. (1.11)

A linear space equipped with such a mapping is called a normed space.

Example 1.2 Let x ∈ Kn . The p-norm is defined for 1 ≤ p < ∞ as
||x||_p := \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} .

It may be shown that this definition satisfies all the properties (1.8) through (1.10). In the limit
p → ∞ one obtains ||x||∞ = maxi |xi |. This latter norm is also sometimes called maximum norm
for obvious reasons. 2
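As a quick numerical illustration (again using NumPy only as a convenient tool, not as part of the notes), the p-norms of Example 1.2 can be evaluated directly from the definition and compared with the built-in vector norms:

import numpy as np

x = np.array([3.0, -4.0, 1.0])

def p_norm(x, p):
    # ||x||_p = (sum_i |x_i|^p)^(1/p), for 1 <= p < infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(np.isclose(p_norm(x, 1), np.linalg.norm(x, 1)))            # 1-norm
print(np.isclose(p_norm(x, 2), np.linalg.norm(x, 2)))            # Euclidean norm
print(np.isclose(np.max(np.abs(x)), np.linalg.norm(x, np.inf)))  # maximum norm (limit p -> infinity)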

A particularly interesting special case of the p-norm is given for p = 2, namely,


||x||_2 := \sqrt{ \sum_{i=1}^{n} |x_i|^2 } = \sqrt{(x, x)}.   (1.12)

The norm ||x||2 is apparently related to the scalar product (·, ·). We say the norm is induced by
the scalar product (·, ·). A relevant question is whether any scalar product can be used to define a
norm in a similar manner. The answer is ’yes’. But before we can prove this, we need the following
important inequality:

Theorem 1.1 (Cauchy-Schwarz Inequality) Let ⟨·, ·⟩ be a scalar product. Then, for all
x, y ∈ Kn there holds

| ⟨x, y⟩ | ≤ \sqrt{⟨x, x⟩} \, \sqrt{⟨y, y⟩}   (1.13)

Proof For y = 0 there is nothing to prove. So assume y ≠ 0, and consider the vector

z := x − \frac{⟨x, y⟩}{⟨y, y⟩} y   (1.14)

By construction there holds ⟨z, y⟩ = 0. Consequently we have from (1.14):

⟨x, x⟩ = \left\langle z + \frac{⟨x, y⟩}{⟨y, y⟩} y , \; z + \frac{⟨x, y⟩}{⟨y, y⟩} y \right\rangle
       = ⟨z, z⟩ + \frac{|⟨x, y⟩|^2}{⟨y, y⟩}
       ≥ \frac{|⟨x, y⟩|^2}{⟨y, y⟩} .

This implies (1.13). □

Now we are ready to prove that any norm induced by a scalar product is a proper norm:

Theorem 1.2 Let ⟨·, ·⟩ be a scalar product. Then the mapping ||·|| : Kn → R,

||x|| := \sqrt{⟨x, x⟩}   (1.15)

defines a proper norm.

Proof We need to show the properties (1.8) through (1.10). Definiteness follows from the definition
of a scalar product. Furthermore, by linearity of the scalar product, we have ||αx|| = \sqrt{|α|^2 ⟨x, x⟩} =
|α| ||x|| for all α ∈ K. Consider now for x, y ∈ Kn arbitrary,

||x + y||^2 = ⟨x + y, x + y⟩ = ||x||^2 + ⟨x, y⟩ + ⟨y, x⟩ + ||y||^2
            = ||x||^2 + ⟨x, y⟩ + \overline{⟨x, y⟩} + ||y||^2
            = ||x||^2 + 2 Re{⟨x, y⟩} + ||y||^2
            ≤ ||x||^2 + 2 |⟨x, y⟩| + ||y||^2
            ≤ ||x||^2 + 2 ||x|| ||y|| + ||y||^2
            = ( ||x|| + ||y|| )^2 ,

where we have used the Cauchy-Schwarz inequality. So the triangle inequality (1.10) holds, and
the proof is complete. □

Given the definition of norms, and in particular in view of Theorem 1.2, we may also expand
our notion of orthogonality a little bit.

Definition 1.5 Two vectors x, y ∈ Kn are said to be orthonormal with respect to the scalar
product ⟨·, ·⟩, if ⟨x, y⟩ = 0, and furthermore both vectors have unit magnitude in the norm
induced by the scalar product ⟨·, ·⟩.

In class we use norms frequently to measure the "size" of vectors. One may wonder what norm
one should use. The notion of equivalent norms facilitates this choice:

Definition 1.6 Two norms ||·||α , ||·||β are said to be equivalent if there exist positive real num-
bers C, D, such that, for all x ∈ Kn , there holds

C ||x||α ≤ ||x||β ≤ D ||x||α .

The following statement makes things even simpler:

Theorem 1.3 In Kn all norms are equivalent.

In fact, Theorem 1.3 is true for all finite-dimensional spaces. But since we do not have a formal
definition of dimensionality yet (one will be given in Section 2), we simply note Theorem 1.3 as a
fact, and omit the proof.

2 Bases, Dimensionality, and Subspaces
2.1 Linear (In)dependence and Basis Vectors

Definition 2.1 We say the vectors v1 , . . . , vr ∈ Kn , are linearly dependent if there are coeffi-
cients α1 , . . . , αr ∈ K, not all zero, such that
\sum_{j=1}^{r} α_j v_j = 0.

From the definition of linear dependence it follows immediately that adding another vector to a set
of linearly dependent vectors gives again a set of linearly dependent vectors. If a non-trivial linear
combination as in Definition 2.1 is not possible, then the collection of vectors is said to be linearly
independent:

Definition 2.2 The vectors v1 , . . . , vr ∈ Kn , are said to be linearly independent if the only
way of obtaining

0 = \sum_{j=1}^{r} α_j v_j , \qquad α_j ∈ K,

is α1 = · · · = αr = 0.

It is easy to see that any subset of m < r linearly independent vectors is again linearly in-
dependent. We shall soon understand the significance of linear independence. Apparently linear
combination of vectors plays a major role. Before going further, we define the set of all possible
linear combinations of a given number of vectors:

Definition 2.3 We define the span of v1 , . . . , vr ∈ Kn as the set:


span(v1 , . . . , vr ) := \left\{ w ∈ Kn : w = \sum_{k=1}^{r} α_k v_k , \; α_k ∈ K \right\}

It is easy to see that span(v1 , . . . , vr ) is a linear space. The question arises if there exists a
finite number of vectors, such that we can write any other vector in Kn as a linear combination of
these vectors. That this is indeed possible is easy to see by giving a simple example:

Example 2.1 Define the vectors

e1 = (1, 0, . . . , 0),
e2 = (0, 1, . . . , 0),
..
.
en = (0, 0, . . . , 1).

Then we can write any given vector v ∈ Kn as a linear combination


v = \sum_{k=1}^{n} c_k e_k   (2.1)

if we take ck = vk . Noting that vk = (v, ek ), we can also write


w = \sum_{k=1}^{n} (w, e_k) e_k .   (2.2)

Of course, we could add an arbitrary vector x ∈ Kn to the set introduced in Example 2.1 and
we would (trivially) still be able to write any vector in Kn as a linear combination of this new set,
i.e. we could write for arbitrary v ∈ Kn ,

v = \sum_{k=1}^{n} c_k e_k + d x,   (2.3)

by simply setting d = 0, and choosing the remaining coefficients as in Example 2.1. We can remove
such redundancies by requiring linear independence:

Definition 2.4 We say the vectors v1 , . . . , vr ∈ Kn form a basis for Kn , if they are linearly
independent and span(v1 , . . . , vr ) = Kn .

Having a set of basis vectors thus allows us to write any vector in Kn as a linear combination of
the basis elements (because they span Kn ), and we shall see momentarily that linear independence
implies that this representation is unique. However, first note carefully that the definition of basis
does not seem to specify the number of vectors one needs to form a basis. It is not even a priori
clear whether this number is unique. However, note the following theorem:

Theorem 2.1 Exactly n linearly independent (but otherwise arbitrary) vectors v1 , . . . , vn ∈ Kn


form a basis for Kn .

We defer the proof, which requires a few preparatory steps, to Appendix A.


The number of vectors needed to form a basis is defined as the dimension dim(Kn ),
and we can conclude:

Theorem 2.2 The linear space Kn is n-dimensional.

The choice of a basis is not unique. There are many different ways in which one may choose the
individual basis vectors. However, once a set of basis vectors has been chosen, the representation
of any element of Kn in this basis is unique:

Theorem 2.3 For a given basis of Kn , v1 , . . . , vn , the linear combination


w = \sum_{j=1}^{n} α_j v_j   (2.4)

is unique for any w ∈ Kn , meaning the coefficients α1 , . . . , αn ∈ K are unique.

Proof Let us assume there is another set of coefficients β1 , . . . , βn ∈ K for which (2.4) holds.
Then, clearly,
0 = w − w = \sum_{j=1}^{n} (α_j − β_j) v_j .   (2.5)

Since the basis vectors are linearly independent we must have αj = βj for all j = 1, . . . , n. 

Consider again the basis given in Example 2.1. We could find the coefficients in expansion (2.1)
very easily by inspection. It is easy to see, however, that the representation formula (2.2) holds
for any orthonormal basis.3

Theorem 2.4 Let v1 , . . . , vn ∈ Kn be orthonormal with respect to a certain scalar product
⟨·, ·⟩. Then we can write any vector w ∈ Kn as

w = \sum_{i=1}^{n} ⟨w, v_i⟩ v_i   (2.6)

Proof Orthonormality implies linear independence, for if we write

0 = \sum_{k=1}^{n} α_k v_k ,

where the vk are mutually orthonormal, and take the scalar product with vj for 1 ≤ j ≤ n,
we obtain α1 = · · · = αn = 0. This means that a set of n orthonormal vectors forms a basis.
Consequently, for any vector w ∈ Kn we can find coefficients α1 , . . . , αn ∈ K, such that

w = \sum_{k=1}^{n} α_k v_k .   (2.7)
3 Orthonormality was defined in Definition 1.5 for two vectors, but is easily extended to a set of vectors by

requiring orthonormality for any two vectors of the set.

Take the scalar product with vj for j = 1, . . . , n on both sides to obtain

⟨w, v_j⟩ = \sum_{k=1}^{n} α_k ⟨v_k , v_j⟩ , \qquad j = 1, . . . , n.

If the vj are orthonormal, we have ⟨vk , vj⟩ = δkj , where δkj is the Kronecker delta⁴, and thus

α_k = ⟨w, v_k⟩ .   (2.8)

2.2 Subspaces of Kn
Let v1 , . . . , vm , be a collection of m ≤ n linearly independent vectors in Kn . Consider the set
Tm := span(v1 , . . . , vm ). It is clear that in the special case m = n one has Tm = Kn . What
happens for m < n? As mentioned before, it is easy to see that Tm is a linear space. Indeed, for
any x, y ∈ Tm we have, for some coefficients ck , dk ∈ K,
x + y = \sum_{k=1}^{m} c_k v_k + \sum_{k=1}^{m} d_k v_k = \sum_{k=1}^{m} (c_k + d_k) v_k ∈ Tm .

Thus the sum of any two elements of Tm again belongs to Tm . Furthermore, for arbitrary α ∈ K,
we can write
αx = α \sum_{k=1}^{m} c_k v_k = \sum_{k=1}^{m} (α c_k) v_k ∈ Tm .

Thus, Tm is a linear space in its own right. The vk , (k = 1, . . . , m) form a basis for that space, and
thus dim(Tm ) = m. Clearly any w ∈ Tm also belongs to Kn , but for m < n the converse is not
true. We say that Tm is a proper subspace of Kn . (When m < n, we write Tm ⊂ Kn . For clarity
we write Tm ⊆ Kn for the case m ≤ n, to denote possible equality of Tm and Kn .)

Example 2.2 Consider the space R2 . Any fixed vector w ∈ R2 spans a one-dimensional
subspace T . For any other vector v ∈ R2 , we have v ∈ T if v can be written v = cw, for some
c ∈ R. The subspace T can thus be characterized as a straight line going through the origin. 2

Example 2.3 Consider the space R3 . Two linearly independent vectors, v1 , v2 , span a two-
dimensional subspace T . Any vector w ∈ R3 that can be written w = c1 v1 + c2 v2 for some
c1 , c2 ∈ R, lies in T , i.e. w ∈ T ⊂ R3 . The subspace T can be characterized as a plane including
the origin. 2

4 The Kronecker delta is defined as δkj = 1, if k = j, and 0 otherwise.

2.3 Orthogonalization and Orthogonal Projections
We have seen in Section 2.1 that it is very convenient to work with an orthonormal basis. In this
section we will show that any subspace V ⊆ Kn admits such a basis. We show this in a constructive manner.
Given linearly independent vectors, w1 , . . . , wm , the Gram-Schmidt procedure (Algorithm 2.1)
outputs a set of orthonormal vectors v1 , . . . , vm which span the same space.

Algorithm 2.1 Gram-Schmidt Orthogonalization


1: v1 := w1 / ||w1 ||
2: for j = 1, . . . , m − 1 do
3:     ṽj+1 := wj+1 − \sum_{i=1}^{j} ⟨wj+1 , vi ⟩ vi
4:     vj+1 := ṽj+1 / ||ṽj+1 ||
5: end for

Note that ⟨·, ·⟩ can be any scalar product. The norm ||·|| is understood to be the norm induced
by the scalar product. By induction it is easy to see that all vk are obtained by a linear combination
of the w1 , . . . , wm . Because of the linear independence of this set, none of the newly generated
vectors are the zero vector, and the algorithm is well defined. (That is to say, the normalization
steps 1 and 4 are well-defined.) Furthermore, we have the following theorem:

Theorem 2.5 Let m ≤ n, and let w1 , . . . , wm ∈ Kn be linearly independent. The
vectors v1 , . . . , vm generated by Algorithm 2.1 are mutually orthonormal. Furthermore,
span(w1 , . . . , wm ) = span(v1 , . . . , vm ).

Proof First, we note that, since all vk are obtained by a linear combination of the w1 , . . . , wm ,
span{v1 , . . . , vm } ⊆ span{w1 , . . . , wm }. If indeed the vectors v1 , . . . , vm are orthonormal (hence
linearly independent), one has dim(span{v1 , . . . , vm }) = dim(span{w1 , . . . , wm }) = m, which
allows us to conclude span{v1 , . . . , vm } = span{w1 , . . . , wm }.
Consequently, showing that ⟨vi , vk ⟩ = δik for i, k = 1, . . . , m proves the theorem. We use
induction. First, it is easy to see that ⟨v2 , v1 ⟩ = 0. Indeed, by step 3 we have

⟨ṽ2 , v1 ⟩ = ||ṽ2 || ⟨v2 , v1 ⟩ = ⟨w2 , v1 ⟩ − ⟨w2 , v1 ⟩ \underbrace{⟨v1 , v1 ⟩}_{=1} = 0.

Since ṽ2 ≠ 0, and since v1 , v2 are normalized (cf. steps 1 and 4), ⟨vi , vk ⟩ = δik for i, k = 1, 2.
This is the base of the induction. The induction hypothesis is then that for some j there holds
⟨vi , vk ⟩ = δik , for i, k = 1, . . . , j. Then we have from step 3, for any k = 1, . . . , j,

||ṽj+1 || ⟨vj+1 , vk ⟩ = ⟨wj+1 , vk ⟩ − \sum_{i=1}^{j} ⟨wj+1 , vi ⟩ \underbrace{⟨vi , vk ⟩}_{=δ_{ik}}
                     = ⟨wj+1 , vk ⟩ − ⟨wj+1 , vk ⟩
                     = 0.

And, since vj+1 is normalized, we have ⟨vi , vk ⟩ = δik for i, k = 1, . . . , j + 1. This completes the
induction step to show that the vectors v1 , . . . , vm are orthonormal. □

The Gram-Schmidt process is usually realized by implementing step 3 using repeated overwrites.
(The modified Gram-Schmidt procedure, cf. Algorithm 2.2.) The ← denotes an overwrite, i.e. the
value previously stored for a variable, is overwritten by the expression on the right-hand side. It
is important to note, however, that in exact arithmetic, the two algorithms are equivalent.

Algorithm 2.2 modified Gram-Schmidt Orthogonalization


1: v1 := w1 / ||w1 ||
2: for j = 1, . . . , m − 1 do
3:     vj+1 := wj+1
4:     for i = 1, . . . , j do
5:         vj+1 ← vj+1 − ⟨vj+1 , vi ⟩ vi
6:     end for
7:     vj+1 ← vj+1 / ||vj+1 ||
8: end for
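A minimal Python/NumPy sketch of Algorithm 2.2 for the real case with the standard scalar product might look as follows (the function name and the use of NumPy are illustrative assumptions, not part of the notes):

import numpy as np

def modified_gram_schmidt(W):
    # W: n-by-m real array whose columns w_1, ..., w_m are linearly independent.
    # Returns V whose columns are orthonormal and span the same space (Algorithm 2.2).
    n, m = W.shape
    V = np.array(W, dtype=float)
    for j in range(m):
        for i in range(j):
            # Subtract the projection onto the already orthonormalized v_i.
            V[:, j] -= np.dot(V[:, i], V[:, j]) * V[:, i]
        V[:, j] /= np.linalg.norm(V[:, j])
    return V

W = np.random.rand(5, 3)
V = modified_gram_schmidt(W)
print(np.allclose(V.T @ V, np.eye(3)))   # the columns of V are orthonormal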

At the heart of the Gram-Schmidt orthogonalization is the concept of orthogonal projections.

Theorem 2.6 Let S be a linear subspace of Kn , and let ⟨·, ·⟩ be a scalar product with induced
norm ||·||. Then for each v ∈ Kn there exists a unique element w ∈ S such that

||v − w|| ≤ ||v − x|| , ∀x ∈ S   (2.9)

Furthermore, (2.9) is equivalent to

⟨v − w, x⟩ = 0, ∀x ∈ S   (2.10)

Proof We first show that for each v ∈ Kn there exists a unique w ∈ S such that (2.10) holds.
Let the dimension of S be m ≤ n. Consider an orthonormal basis v1 , . . . , vm for S and define for
arbitrary v ∈ Kn ,

w := \sum_{k=1}^{m} ⟨v, v_k⟩ v_k .   (2.11)

We show that v − w is orthogonal to all elements in S. Any vector x ∈ S can be written as

x = \sum_{k=1}^{m} c_k v_k

for some coefficients c1 , . . . , cm ∈ K. This means

⟨v − w, x⟩ = \left\langle v − \sum_{k=1}^{m} ⟨v, v_k⟩ v_k , \; \sum_{l=1}^{m} c_l v_l \right\rangle
           = \sum_{l=1}^{m} \bar{c}_l ⟨v, v_l⟩ − \sum_{k=1}^{m} \sum_{l=1}^{m} \bar{c}_l ⟨v, v_k⟩ \underbrace{⟨v_k , v_l⟩}_{=δ_{kl}}
           = \sum_{k=1}^{m} \bar{c}_k ⟨v, v_k⟩ − \sum_{k=1}^{m} \bar{c}_k ⟨v, v_k⟩
           = 0.

To show uniqueness, assume first that there are two elements w1 , w2 ∈ S for which (2.10)
holds. Then we have for all x ∈ S:

⟨v − w2 , x⟩ − ⟨v − w1 , x⟩ = 0 \;\Leftrightarrow\; ⟨w1 − w2 , x⟩ = 0,

since both terms on the left vanish by (2.10). This must hold in particular for x = w1 − w2 , and
so ||w1 − w2 ||^2 = 0. This implies w1 = w2 .
Finally, we show that (2.9) and (2.10) are equivalent. Assume first that (2.9) holds, and set
e := v − w. Every vector is orthogonal to the zero vector, so for x = 0, (2.10) holds. Now consider
arbitrary x ∈ S\{0}, and define

α := \frac{⟨e, x⟩}{||x||^2} .

By assumption we must have

||e − αx|| ≥ ||e|| , ∀x ∈ S.

So consider

||e − αx||^2 = ||e||^2 − \bar{α} ⟨e, x⟩ − α ⟨x, e⟩ + |α|^2 ||x||^2
            = ||e||^2 − 2 \frac{|⟨e, x⟩|^2}{||x||^2} + \frac{|⟨e, x⟩|^2}{||x||^4} ||x||^2
            = ||e||^2 − \frac{|⟨e, x⟩|^2}{||x||^2}
            ≤ ||e||^2 .

We thus get a contradiction, unless

⟨e, x⟩ = ⟨v − w, x⟩ = 0, ∀x ∈ S.

Now assume that for some given v ∈ Kn , and some w ∈ S eq. (2.10) holds. Consider arbitrary
z ∈ S. One has
||v − z||^2 = ||(v − w) + (w − z)||^2 = ||v − w||^2 + ||w − z||^2 ≥ ||v − w||^2 ,

where the second equality holds, because w − z ∈ S, and by assumption v − w is orthogonal to
all vectors in S. □


The vector w defined in eq. (2.11) is called the orthogonal projection of v onto the subspace
S. This helps us understand how the Gram-Schmidt procedure works: For each new basis vector,
one subtracts out the orthogonal projection onto the space spanned by the previous basis vectors.
The resulting vector is necessarily orthogonal to all vectors in that space.
In general, a projector (not necessarily orthogonal) can be defined as follows:

Definition 2.5 A mapping P : Kn → Kn is called a projector, if

P (P v) = P v for all v ∈ Kn .   (2.12)

An alternative way of stating this is P^2 = P .

Example 2.4 Let v, w ∈ Kn with ||w|| = 1, and let S := span{w}. Consider PS v := ⟨v, w⟩ w.
If we denote c := ⟨v, w⟩, we have

PS (PS v) = PS (cw) = ⟨cw, w⟩ w = c ⟨w, w⟩ w = cw = PS v.

A projector thus produces a vector that lies in the space S ⊆ Kn , but projecting the resulting
vector onto S again (or, for that matter, applying the projector to any element x ∈ S), will have
no effect. We say, the space S is invariant under the action of the projector P .
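The rank-one projector of Example 2.4 can be checked numerically; the following sketch (illustrative only, real case with the standard scalar product) verifies that applying the projector twice has no further effect:

import numpy as np

w = np.array([1.0, 2.0, 2.0])
w = w / np.linalg.norm(w)        # normalize so that ||w|| = 1
P = np.outer(w, w)               # matrix representation of P_S v = <v, w> w

v = np.array([3.0, -1.0, 0.5])
Pv = P @ v

print(np.allclose(P @ Pv, Pv))   # P(Pv) = Pv: span{w} is invariant under P
print(np.allclose(P @ P, P))     # equivalently, P^2 = P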

3 Matrices and Linear Systems


We define Km×n as the set of matrices
 
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad a_{ij} ∈ K, \quad i = 1, . . . , m, \; j = 1, . . . , n.

Often we use shorthand notation to characterize a matrix in terms of its entries,

A = (a_{ij})_{i=1,...,m, \; j=1,...,n} .

Often the index range is omitted. Alternatively, we sometimes use the notation A = (a1 | . . . |an ) ∈
Km×n to denote a matrix composed of columns ak ∈ Km .
It is easy to show that the set of matrices is a (finite-dimensional) linear space, if we define
matrix addition by

A + B := \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{pmatrix}, \qquad (3.1)

and scalar multiplication by


 
αA = \begin{pmatrix} α a_{11} & α a_{12} & \cdots & α a_{1n} \\ α a_{21} & α a_{22} & \cdots & α a_{2n} \\ \vdots & & & \vdots \\ α a_{m1} & α a_{m2} & \cdots & α a_{mn} \end{pmatrix}. \qquad (3.2)

3.1 Matrices as Operators
Matrices are often used to represent linear operators in Euclidean vector space. Indeed, any matrix
A = (a1 | . . . |an ) ∈ Km×n defines a linear operator A : Kn → Km , x 7→ Ax = y, if we define the
matrix-vector product as
Ax := \sum_{i=1}^{n} x_i a_i ≡ y ∈ Km .   (3.3)

A matrix-vector product is thus seen to be a linear combination of the columns of the matrix A.
The components of y can equivalently be written
y_i = \sum_{j=1}^{n} a_{ij} x_j , \qquad i = 1, . . . , m.   (3.4)

While (3.4) is perhaps the more conventional way to write the result of a matrix-vector product,
the form (3.3) is often more useful from an analytical point of view.
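The column-wise view (3.3) of the matrix-vector product is easy to verify numerically; a small sketch (NumPy used only for illustration):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # A in K^{3x2} with columns a_1, a_2
x = np.array([2.0, -1.0])

# (3.3): Ax is the linear combination x_1 a_1 + x_2 a_2 of the columns of A.
column_combination = x[0] * A[:, 0] + x[1] * A[:, 1]

print(np.allclose(A @ x, column_combination))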
However, for a given A, not all vectors y ∈ Km can necessarily be obtained as a result of such
an operation. This leads us to the following definition:

Definition 3.1 We define the range of a matrix A ∈ Km×n as the set

range(A) := {y ∈ Km : y = Ax for some x ∈ Kn }

It is clear that range(A) ⊆ Km . In view of eq. (3.3) it is clear that range(A) = Km if A has
m linearly independent columns. Note that this is only possible if n ≥ m. If A does not have
sufficiently many linearly independent columns, then there must be vectors in Km that cannot be
obtained from Ax for any x ∈ Kn .

Definition 3.2 The number of linearly independent columns of a matrix A is called the column
rank of the matrix.

So clearly the dimension of range(A) is equal to the column rank of A. We may also define the row rank of a
matrix as the number of linearly independent rows. It turns out that in practice we do not need
to distinguish between row rank and column rank:

Theorem 3.1 The row rank of a matrix is equal to the column rank.

Proof The proof proceeds in several steps. Let us say a column (row) of a matrix A ∈ Km×n is
redundant if it can be expressed as a linear combination of the other columns (rows). We show
that, (1) removing any redundant columns or rows does not change either the row rank or the column
rank of a matrix. Then we show that, (2) the matrix A′ ∈ Km′×n′ with m′ ≤ m and n′ ≤ n,
produced by removing all redundant rows and columns, is necessarily square. This allows us to
conclude that the row rank is equal to the column rank.

1. We only show that removing a redundant column does not change the row rank of a matrix.
(It certainly does not change the column rank, by definition!) The corresponding statement
about removing redundant rows can be proved in an entirely similar manner.
Let us assume that the j th column of A is redundant. Write ak := (ak1 , . . . , akn ) to denote
the k th row of A. Similarly, we denote the k th row with the j th element removed as a′k . To
show that removing a redundant column does not change the row rank, it is enough to show
that a linear combination of the ak sums to zero if and only if the same linear combination of the a′k sums
to zero, i.e. for any 1 ≤ r ≤ m,

\sum_{k=1}^{r} c_k a_k = 0 \;\Longleftrightarrow\; \sum_{k=1}^{r} c_k a'_k = 0.   (3.5)

The direction =⇒ is trivial. Assume now that the right equality in (3.5) holds. Then the
left-hand side holds as well, provided the elements of the j th column sum to zero, i.e.

\sum_{k=1}^{r} c_k a_{kj} = 0.

But since the j th column is redundant, we can write for some coefficients di , not all zero,

\sum_{k=1}^{r} c_k a_{kj} = \sum_{k=1}^{r} c_k \sum_{\substack{i=1 \\ i ≠ j}}^{n} d_i a_{ki} = \sum_{\substack{i=1 \\ i ≠ j}}^{n} d_i \left( \sum_{k=1}^{r} c_k a_{ki} \right) = 0.

The term in brackets vanishes by the right-hand side of (3.5).


2. By definition, all columns and all rows of A′ are linearly independent. The column rank
of A′ (and hence of A) is thus equal to the number of remaining columns n′ , and the row
rank is equal to the number of remaining rows m′ . To see that this implies that the matrix
is square, assume m′ ≠ n′ , say m′ > n′ . This leads to a contradiction, as no more than n′
vectors can be linearly independent in Kn′ .

Instead of row rank and column rank, we simply speak of the rank of a matrix A ∈ Km×n ,
written rank(A). We can easily deduce the following corollary

Corollary 3.1 The maximum rank a matrix A ∈ Km×n can have is rank(A) = min(n, m).

If the maximum rank is achieved we say the matrix has full rank. An important example of
n × n matrices which do not have full rank, and hence whose range is a proper subset of Kn , is the
matrix-representation of projectors, which we have introduced in section 2.3:

Example 3.1 Let x = (x1 , x2 )T ∈ R2 . We define the matrix

P := x x^T = \begin{pmatrix} x_1 x_1 & x_1 x_2 \\ x_2 x_1 & x_2 x_2 \end{pmatrix}.

For arbitrary y ∈ R2 , the result of the operation P y is always a vector in the direction of x:

P y = \begin{pmatrix} x_1 x_1 & x_1 x_2 \\ x_2 x_1 & x_2 x_2 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (y, x) x .

This shows that P is nothing but the matrix representation of the orthogonal projector that was
defined in Example 2.4 (with w = x, provided ||x||2 = 1). Thus, range(P ) = {αx : α ∈ R} ⊂ R2 is
a one-dimensional subspace of R2 . □

Definition 3.3 We define the nullspace of a matrix A ∈ Km×n as the set

null(A) := {x ∈ Kn : Ax = 0}

Definition 3.4 If the null space of a matrix A ∈ Km×n contains only the zero vector, we say
the matrix is non-singular. Otherwise the matrix is called singular.

(Note that for m < n the matrix A ∈ Km×n is always singular.) From the definition of the
matrix-vector product, eq. (3.3), it is again easy to see that a matrix is non-singular if and only if
its columns are linearly independent.
Consider the successive application of two matrix-vector products. It is clear that for x ∈ Kn
the composition B(Ax) is well defined for matrices A ∈ Kk×n and B ∈ Km×k , where m, k ∈ N
are arbitrary. The result is a vector y ∈ Km . This motivates the definition of the matrix-matrix
product:

Definition 3.5 Let A ∈ Kk×n and B ∈ Km×k . Then BA := C ∈ Km×n , where


c_{ij} = \sum_{l=1}^{k} b_{il} a_{lj} .

With this definition we have Cx = B(Ax). Note that the number of rows of A must be the
same as the number of columns of B for BA to be well-defined. In general AB ≠ BA. For square
matrices A ∈ Kn×n , we define the identity matrix I = (e1 | . . . |en ), i.e. the matrix having the basis

vectors defined in Example 2.1 as columns:
 
I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \qquad (3.6)

From the definition of the matrix-matrix product, it is clear that AI = IA = A.

3.2 Matrix Norms


It turns out that we can equip matrices with a proper norm. At first, this will be done in such a
way as to do justice to the characterization of matrices as linear operators. For convenience we
restrict ourselves to square matrices A ∈ Kn×n .

Definition 3.6 Let 1 ≤ p ≤ ∞, and define for A ∈ Kn×n

||A||_p := \max_{\substack{x ∈ Kn \\ x ≠ 0}} \frac{||Ax||_p}{||x||_p} , \qquad (3.7)

where ||·||p is the vector p-norm introduced in Example 1.2.

This norm measures, in the ||·||p norm, the maximum change of length that the matrix operator
can cause over all possible (non-zero) input vectors. (We say the matrix norm is induced by the
norm ||·||p .) Norms of the type (3.7) are also called operator norms. It is relatively easy to verify
that they satisfy the axioms put forth in Definition 1.4. Definiteness and positive homogeneity
follow in a relatively straightforward manner from the definition. To show the triangle inequality,
we first note that by homogeneity, (3.7) is equivalent to

||A||_p = \max_{||x||_p = 1} ||Ax||_p .

It is often more convenient to use this form. (It saves some writing, if nothing else.) Now consider

||A + B||_p = \max_{||x||_p = 1} ||(A + B)x||_p ≤ \max_{||x||_p = 1} \left( ||Ax||_p + ||Bx||_p \right)
            ≤ \max_{||x||_p = 1} ||Ax||_p + \max_{||x||_p = 1} ||Bx||_p = ||A||_p + ||B||_p .

Operator norms of the type (3.7) are thus proper norms in the sense of Definition 1.4, operating
on the elements of Kn×n instead of Kn . However, there is more:

Theorem 3.2 The norm (3.7) satisfies

||AB||p ≤ ||A||p ||B||p (3.8)

Proof First note that

||AB||_p = \max_{\substack{x ∈ Kn \\ x ≠ 0}} \frac{||ABx||_p}{||x||_p} = \max_{\substack{x ∈ Kn \\ x ≠ 0}} \frac{||ABx||_p}{||Bx||_p} \cdot \frac{||Bx||_p}{||x||_p} .

(We have implicitly restricted ourselves to taking the maximum over such x that are not in the
null space of B. Clearly this is not a restriction, as ABx = 0 otherwise.) Setting y := Bx, we can
write

||AB||_p ≤ \max_{\substack{y ∈ Kn \\ y ≠ 0}} \frac{||Ay||_p}{||y||_p} \; \max_{\substack{x ∈ Kn \\ x ≠ 0}} \frac{||Bx||_p}{||x||_p} = ||A||_p ||B||_p . \qquad □
Property (3.8) is sometimes called sub-multiplicativity. It is often put forth as a separate axiom
when one wishes to define general norms for matrices.

Definition 3.7 Any mapping ||·|| : Kn×n → R that satisfies the axioms of Definition 1.4, and
additionally satisfies (3.8) is called a matrix norm.

We have shown that the operator norm (3.7) is a proper matrix norm. There are other ways,
however, to define proper matrix norms. If we think of all the elements of the matrix as being
elements of a large vector of size n2 , and apply the usual 2−norm, i.e.
  12
n
X
||A||F :=  |aij |2  ,
i,j=1

we obtain another matrix norm. The usual norm properties are inherited from the definition of the
2−vector norm, and one can show that sub-multiplicativity also holds. (When applied to matrices
this norm is usually called the Frobenius-norm. Hence the subscript F .) We have noted before, in
Theorem 1.3, and it is worth pointing out again, that on finite dimensional spaces all norms are
equivalent, so in particular all matrix norms are equivalent.
Definition 3.6 seems a bit unwieldy for practical purposes. Does one have to compute the
maximum in (3.7) each time one wishes to evaluate the norm of a matrix? Luckily, for important
special cases there are explicit formulae that make the computation of a matrix norm quite easy.
As an example, we note:

Theorem 3.3 Let A ∈ Kn×n . The norm ||A||∞ is given by

||A||_∞ = \max_{i=1,...,n} \sum_{j=1}^{n} |a_{ij}| , \qquad (3.9)

i.e. the maximum absolute row sum.

Proof Using the definition of the norm ||·||∞ for vectors, one has

||Ax||_∞ = \max_{1 ≤ i ≤ n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| ≤ \max_{1 ≤ i ≤ n} \sum_{j=1}^{n} |a_{ij} x_j| ≤ ||x||_∞ \max_{1 ≤ i ≤ n} \sum_{j=1}^{n} |a_{ij}| .

We have shown

||A||_∞ ≤ \max_{1 ≤ i ≤ n} \sum_{j=1}^{n} |a_{ij}| .

We show that the reverse inequality also holds, which completes the proof.
We may assume that A contains non-zero entries. (Otherwise (3.9) is trivial.) Let k be the
index of a non-zero row, and let x be such that xj = 1 whenever akj = 0, and xj = ākj /|akj |
otherwise. This construction yields ||x||∞ = 1, so that

||A||_∞ ≥ ||Ax||_∞ = \max_{1 ≤ i ≤ n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| ≥ \left| \sum_{j=1}^{n} a_{kj} x_j \right| = \sum_{j=1}^{n} |a_{kj}| .

We can do such a construction for any non-zero row k. For a zero row the same inequality holds
trivially, so we can conclude

||A||_∞ ≥ \max_{1 ≤ k ≤ n} \sum_{j=1}^{n} |a_{kj}| . \qquad □

In a similar manner one can prove

Theorem 3.4 The norm ||A||1 is given by

||A||_1 = \max_{j=1,...,n} \sum_{i=1}^{n} |a_{ij}| ,

i.e. the maximum absolute column sum.

We give more examples below.
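The explicit formulae of Theorems 3.3 and 3.4, as well as the Frobenius norm, are easy to evaluate; the following sketch compares them with NumPy's built-in matrix norms (illustration only):

import numpy as np

A = np.array([[1.0, -7.0,  2.0],
              [0.5,  3.0, -1.0],
              [4.0,  0.0,  2.5]])

row_sum_norm = np.max(np.sum(np.abs(A), axis=1))   # (3.9): maximum absolute row sum
col_sum_norm = np.max(np.sum(np.abs(A), axis=0))   # Theorem 3.4: maximum absolute column sum
frobenius    = np.sqrt(np.sum(np.abs(A) ** 2))     # Frobenius norm

print(np.isclose(row_sum_norm, np.linalg.norm(A, np.inf)))
print(np.isclose(col_sum_norm, np.linalg.norm(A, 1)))
print(np.isclose(frobenius,    np.linalg.norm(A, 'fro')))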

3.3 Determinants
The determinant of a matrix is defined only for square matrices. There are several ways in which
a determinant can be defined. (All, of course, leading to the same result.) Here, we choose a
recursive definition:

Definition 3.8 Let A ∈ Kn×n . The determinant of A is defined by setting det A = a for n = 1,
where a is the lone element of the matrix, and for n > 1,
det A = \sum_{i=1}^{n} (−1)^{i+j} a_{ij} \det A_{ij} \qquad (3.10)
where 1 ≤ j ≤ n is arbitrary, and Aij ∈ K(n−1)×(n−1) is the matrix that is obtained by deleting
the ith row and the j th column of A. The expressions (−1)i+j det Aij are called the cofactors
of A.

The recursive Definition 3.8 is called the Laplace expansion. It is independent of whether one
sums over the rows or columns in (3.10), i.e.
det A = \sum_{i=1}^{n} (−1)^{i+j} a_{ij} \det(A_{ij}), \qquad 1 ≤ j ≤ n arbitrary
      = \sum_{j=1}^{n} (−1)^{i+j} a_{ij} \det(A_{ij}), \qquad 1 ≤ i ≤ n arbitrary.

Example 3.2 The determinants of a 1 × 1 matrix, a 2 × 2 matrix, and a 3 × 3 matrix are given
by, respectively,

det(a_{11}) = a_{11} ,

det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11} a_{22} − a_{12} a_{21} ,

det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11}(a_{22} a_{33} − a_{32} a_{23}) + a_{12}(a_{31} a_{23} − a_{21} a_{33}) + a_{13}(a_{21} a_{32} − a_{31} a_{22}). \qquad □
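As an illustration of the recursive Definition 3.8, here is a short (and computationally very expensive) cofactor expansion along the first column, checked against NumPy's determinant; the function name is a made-up example:

import numpy as np

def laplace_det(A):
    # Cofactor expansion along the first column, cf. (3.10) with j = 1.
    # Exponential cost; intended only to illustrate the definition.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    det = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)  # delete row i and the first column
        det += (-1) ** i * A[i, 0] * laplace_det(minor)
    return det

A = np.random.rand(4, 4)
print(np.isclose(laplace_det(A), np.linalg.det(A)))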

Determinants satisfy the following property

Theorem 3.5 Let A, B ∈ Kn×n . Then

det(AB) = det A det B (3.11)

3.4 Solution of a Linear System


Consider a square matrix A ∈ Kn×n . A matrix-vector product is understood as a "forward"
problem: We are given a vector x ∈ Kn as input, and compute y = Ax. This is a straightforward
computation. Of particular interest in this class, however, is the "inverse" problem of finding, for
given b ∈ Kn , a vector x ∈ Kn , such that

Ax = b. (3.12)

This is called a linear system of equations. It is much less trivial to compute this vector x. In fact,
it is not guaranteed that such a vector exists. We have already seen in section 3.1 that a necessary
condition is that the matrix have full rank. If it does not have full rank, the system can only be

solved if it so happens that the right-hand side lies in the space spanned by the columns of A (the
”column space”). We note a few equivalent conditions for (3.12) to possess a unique solution:

Theorem 3.6 Let A ∈ Kn×n . The following statements are equivalent

1. For each b ∈ Kn the system (3.12) has a unique solution x ∈ Kn .


2. A in (3.12) has full rank.
3. The columns (and rows) of A are linearly independent.
4. We have det(A) ≠ 0.

5. The system (3.12) with b = 0 has the unique solution x = 0.


6. There exists a unique matrix A−1 such that I = A−1 A = AA−1 .

The matrix A−1 is called the inverse of A.


Of course the question arises how to compute the solution x of the system (3.12). An important
role is played by the LU decomposition of the matrix A. We first define permutation matrices. We
denote a permutation of the index tuple (1, 2, . . . , n) by σ. Consider for example the index tuple
(1, 2, 3). Then a permutation is written σ = (σ1 , σ2 , σ3 ) = (2, 3, 1).

Definition 3.9 Let σ be a permutation of the index tuple (1, 2, . . . , n), and let ei be the canonical
basis vectors for Kn , defined in Example 2.1. Then a permutation matrix is defined as

Pσ = (eσ1 | eσ2 | . . . | eσn ).

This simply means that a permutation matrix can be obtained from the identity by exchanging
columns. Multiplying a matrix A from the left by a permutation matrix, i.e. computing Ã = P A,
has the effect of exchanging the rows of A according to the permutation σ that defines P . (Similarly,
AP exchanges columns.) Furthermore, we define

Definition 3.10 We say a matrix A ∈ Kn×n is lower (respectively upper) triangular if aij = 0
for all i < j (respectively i > j). We say a matrix is unit triangular (or normalized triangular)
if aii = 1 for all 1 ≤ i ≤ n.

Theorem 3.7 For every non-singular matrix A ∈ Rn×n there exists a permutation matrix P ,
and a corresponding unique normalized lower triangular matrix L, and a unique upper triangular
matrix U , such that
P A = LU

The theorem stipulates normalization of L. This is merely a convention. Alternatively U may
be normalized, and a unique LU factorization exists with non-normalized L. The factorization is
unique, once either L or U is normalized.
Given a factorization like this, the linear system (3.12) can be rewritten as

Ly = P b,
U x = y.

We have thus converted one linear system of equations into two such systems. Nevertheless, solving
this is now easy since L and U are triangular matrices. (You should convince yourself that solving
for y and x with these triangular matrices is indeed easy!) This means that first one solves a linear
system for y, and subsequently a second linear system is solved for x using the previous result y.
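A minimal sketch of this two-stage solve, using SciPy's LU routines (the library calls are an illustration; note that scipy.linalg.lu uses the convention A = P L U, i.e. its P is the transpose of the permutation matrix in Theorem 3.7):

import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

P, L, U = lu(A)                                 # A = P @ L @ U, i.e. P^T A = L U
y = solve_triangular(L, P.T @ b, lower=True)    # forward substitution: L y = P^T b
x = solve_triangular(U, y, lower=False)         # back substitution:    U x = y

print(np.allclose(A @ x, b))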

3.5 Properties of Matrices


Matrices may possess certain special properties that play an important role in solving linear sys-
tems. We give an overview of some of them in this section.

Definition 3.11 Let A ∈ Km×n and B ∈ Kn×m . We say B is the Hermitian transpose of A
if bji = āij . We usually write AH := B for the Hermitian transpose. (For K = R, we simply
have bij = aji . In this case we speak of a transposed matrix, denoted AT .)

One may verify a few simple rules:

(A + B)H = AH + B H
(AB)H = B H AH

Definition 3.12 A square matrix A ∈ Kn×n is called Hermitian if AH = A. If AH = −A it


is called skew-hermitian. (A real hermitian matrix is a symmetric matrix, i.e. AT = A. A real
skew-hermitian matrix, i.e. AT = −A, is called skew-symmetric.)

It is trivial to split any square matrix A ∈ Kn×n into a hermitian and a skew-hermitian part:
A = \underbrace{\tfrac{1}{2}(A + A^H)}_{\text{Hermitian}} + \underbrace{\tfrac{1}{2}(A − A^H)}_{\text{skew-Hermitian}} .

Definition 3.13 A symmetric matrix A ∈ Rn×n which satisfies xT Ax ≥ 0 for all x ∈ Rn is


called symmetric positive semi-definite. If strict inequality holds for all x ∈ Rn \{0}, it is called
symmetric positive definite (s.p.d.).

Symmetric positive definite matrices play an important role in discretization of PDE, since such
systems arise quite naturally for many problems. Factorizations of such matrices are simpler than
for general non-singular matrices, in a sense specified by the following theorems

Theorem 3.8 Let A ∈ Rn×n be symmetric positive definite. Then there exists a unique fac-
torization
A = LU
where L is a normalized lower triangular matrix, and U is an upper triangular matrix.

This means the LU factorization may thus be computed without permutation. However, for
s.p.d. matrices there is an even better way to factorize. First note the following definition

Definition 3.14 A matrix D ∈ Kn×n with the property dij = 0 for i ≠ j is called a di-
agonal matrix. We often characterize a diagonal matrix in terms of its entries by writing
D = diag(d11 , . . . , dnn ).

Theorem 3.9 Let A ∈ Rn×n be symmetric positive definite. Then there exists a unique fac-
torization
A = LDLT
where L is a normalized lower triangular matrix, and D is a diagonal matrix with di,i > 0 for
1 ≤ i ≤ n.

This factorization is called the Cholesky factorization. Alternatively, the Cholesky factorization
may be written as A = L̃ L̃^T , where L̃ is a non-normalized lower triangular matrix, which is related
to L by L̃ = L D^{1/2} , where D^{1/2} = diag(√d1,1 , . . . , √dn,n ), and the di,i are the entries of the diagonal
matrix defined in Theorem 3.9.
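A short numerical check (illustrative only) of the Cholesky factorization and its relation to the LDL^T form of Theorem 3.9:

import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric positive definite

L_tilde = np.linalg.cholesky(A)          # non-normalized lower triangular factor
print(np.allclose(L_tilde @ L_tilde.T, A))

# Recover the LDL^T form: d_ii = (L~_ii)^2, and L = L~ D^{-1/2} is unit lower triangular.
d = np.diag(L_tilde) ** 2
L = L_tilde / np.sqrt(d)                 # divides column j by sqrt(d_j)
print(np.allclose(L @ np.diag(d) @ L.T, A))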

Definition 3.15 Let A ∈ Kn×n . We say the matrix is diagonally dominant if

|a_{ii}| ≥ \sum_{\substack{j=1 \\ j ≠ i}}^{n} |a_{ij}|, \qquad i = 1, . . . , n.

If strict inequality holds for all i, we say the matrix is strictly diagonally dominant. If strict inequality
holds for at least one row, then the matrix is called irreducibly diagonally dominant.

Definition 3.16 A matrix U ∈ Kn×n is called unitary if U H U = I, where I is the identity


matrix.

Note that this implies that the columns (and rows) of U are mutually orthonormal in the
standard scalar product. This definition also implies U −1 = U H . For the real case, K = R, a

matrix Q ∈ Rn×n satisfying QT Q = I is called orthogonal. An important property of a unitary
matrix is that it does not change the 2-norm of a vector:

Theorem 3.10 Let x ∈ Kn and U ∈ Kn×n be a unitary matrix. Then

||U x||2 = ||x||2 .

Proof Let y := U x, and write U = (u1 | . . . |un ), where the ui ∈ Kn are mutually orthonormal.
Then

||y||_2^2 = (y, y) = \left( \sum_{i=1}^{n} x_i u_i , \; \sum_{i=1}^{n} x_i u_i \right) = \sum_{i=1}^{n} |x_i|^2 = ||x||_2^2 . \qquad □
Theorem 3.11 let U1 , U2 ∈ Kn×n be unitary. Furthermore, let

U := U1 U2 .

Then U is a unitary matrix.

Proof A straightforward computation yields

U^H U = (U_1 U_2)^H U_1 U_2 = U_2^H \underbrace{U_1^H U_1}_{=I} U_2 = U_2^H U_2 = I. \qquad □

Inductively one can prove that the product of any number of unitary matrices is again a unitary
matrix.
For some iterative solvers for linear systems Hessenberg matrices play an important role:

Definition 3.17 We call a matrix A ∈ Km×n upper Hessenberg, respectively lower Hessenberg,
if aij = 0 for i > j + 1, respectively i < j − 1.

This simply means that an upper (lower) Hessenberg matrix has only zero-entries below the
first subdiagonal (above the first superdiagonal).

3.6 Matrices and Bases


In class we often use convenient matrix notation. The representation of vectors in an orthonormal
basis wj is a good example. Let us restrict ourselves for the time being to the standard scalar
product ⟨·, ·⟩ ≡ (·, ·), and consider first (2.8), the formula for the coefficients αj that multiply the
basis vectors in (2.7). Those coefficients, when collected in a vector a = (α1 , . . . , αn )T , can be
obtained by a matrix-vector product
a = W T v, (3.13)
where W ∈ Rn×n is an orthogonal matrix having the basis vectors wj as columns. Since the
coefficients for the representation in a certain basis are unique for a given vector, these coeffi-
cients themselves can be viewed as an alternative, but equivalent representation of the vector v.
Conversely, given the coefficients a, the assembly of the vector v in the basis W can be written

v = W a, (3.14)

which is a trivial consequence of (3.13), and the fact that W is an orthogonal matrix. However, one
should also appreciate the equivalence between (3.14) and the summation in (2.7): A matrix-vector
product is nothing but a linear combination of the columns of the matrix!5
W a = \sum_{i=1}^{n} α_i w_i .

Thinking of a matrix-vector product in this way is the key to understanding basic concepts in this
class.
Now consider a matrix V ∈ Rn×m , which is not square, but has (much) fewer columns than
rows. These columns span a subspace T ⊂ Rn . If the columns are orthonormal, we can view a
matrix vector product
V T x =: y ∈ Rm
as the representation of the orthogonal projection of the vector x ∈ Rn onto T ⊂ Rn (cf. Sec-
tion 2.3). That is to say, the vector y ∈ Rm contains those coefficients, which correspond to the
representation of the projected vector in the basis given by the columns of V . Given the coefficients
y we can assemble the projected vector x̃ as

x̃ = V y = \sum_{i=1}^{m} y_i v_i .
Even though it belongs to a lower-dimensional subspace T ⊂ Rn , the vector x̃ is an n-vector (i.e. it
contains n entries). However, using the representation by coefficients, y ∈ Rm , an m-dimensional
coefficient vector is enough to represent x̃. (Of course the corresponding basis must be fixed!)
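A small sketch of this coefficient representation (illustrative; the orthonormal columns are generated here via a QR factorization, which is just a convenient way to obtain them):

import numpy as np

n, m = 6, 2
V, _ = np.linalg.qr(np.random.rand(n, m))   # V has m orthonormal columns spanning T

x = np.random.rand(n)
y = V.T @ x              # m coefficients representing the projection of x in the basis V
x_tilde = V @ y          # assemble the projected n-vector from the coefficients

# The residual x - x_tilde is orthogonal to T, so x_tilde is the orthogonal projection.
print(np.allclose(V.T @ (x - x_tilde), np.zeros(m)))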

4 Eigenvalues and Eigenvectors


The concepts outlined in this section do not apply to rectangular matrices A ∈ Km×n with m 6= n.
We shall therefore consider only square matrices here.

Definition 4.1 Let A ∈ Kn×n . If for nonzero v ∈ Cn there holds

Av = λv

for some λ ∈ C, we call λ an eigenvalue of A. The vector v is called (right) eigenvector.

5 Recall: A matrix-vector product Ax = b produces a vector in the column space of A.

Note carefully that we explicitly define eigenvectors and eigenvalues over the field of complex
numbers C. That is to say even if the matrix is real, i.e. we have K = R, we can have complex
eigenvalues and eigenvectors. This will become clearer shortly.
From Definition 4.1 it is immediately obvious that eigenvectors are defined only up to a scalar
multiplicative factor. That is to say, if v is an eigenvector, say of a matrix A ∈ Rn×n , then αv is
also an eigenvector of the same matrix, for any α ∈ R\{0}. In particular, we can always assume
that an eigenvector is normalized, i.e. ||v|| = 1 in any convenient norm.

Definition 4.2 We define the spectrum of a square matrix A as

σ(A) = {λ ∈ C : λ is an eigenvalue of A}.

How does one determine eigenvalues? How many eigenvalue/ eigenvector pairs does a matrix
have? Let us first go back to Definition 4.1, and deduce that the matrix A − λI is singular, if λ is
an eigenvalue of A. Hence we have

Theorem 4.1 Let A be a square matrix, and λ ∈ σ(A). Then λ satisfies the characteristic
equation
det(A − λI) = 0.

Note that det(A − λI) is a polynomial of degree n in λ. Thus it is clear that an n × n matrix
always has exactly n eigenvalues. This follows from the fundamental theorem of algebra, which
states that a polynomial of degree n always has exactly n roots. Note, however, that some of these
may be multiple roots, so that we cannot deduce that a matrix always has n distinct eigenvalues.
Also it is clear that eigenvalues may be complex, even if the matrix is real, since polynomials with
real-valued coefficients may well have complex roots. (Example: f (x) = x2 + 1.) For real matrices
then, complex eigenvalues come as complex conjugate pairs.
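A quick illustration (NumPy used only as a tool): the real 2 × 2 rotation matrix below has no real eigenvalues; its eigenvalues form a complex conjugate pair and satisfy the characteristic equation of Theorem 4.1.

import numpy as np

# A real matrix with no real eigenvalues: rotation by 90 degrees in R^2.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                     # approximately [ 1j, -1j ]: a complex conjugate pair

# Each eigenvalue satisfies the characteristic equation det(A - lambda*I) = 0.
for lam in eigvals:
    print(np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0))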

4.1 Properties of Eigenvalues

Theorem 4.2 Let A ∈ Kn×n be singular. Then A has at least one eigenvalue λ = 0. Con-
versely, a non-singular matrix cannot have a zero eigenvalue.

Proof If A is singular there exists x ≠ 0 such that Ax = 0, which means that x is an eigenvector
with eigenvalue λ = 0. By the same rationale, a non-singular matrix cannot have a zero
eigenvalue, since otherwise it is implied that Ax = 0 for some x ≠ 0, which contradicts the
assumption. □

Theorem 4.3 Let A be Hermitian. Then all eigenvalues of A are real.

Proof Let λ be an eigenvalue of A, and let v be a corresponding eigenvector. Assume for
convenience that ||v||2 = 1. Then

Av = λv ⇒ vH Av = λ. (4.1)

By taking the Hermitian transpose, and using AH = A, we also have

v^H A = λ̄ v^H ⇒ v^H A v = λ̄ . \qquad (4.2)

Comparing (4.1) with (4.2), we see that λ = λ̄, and hence the eigenvalues are real. □

Note that for the case A ∈ Rn×n we deduce that symmetric matrices have real eigenvalues.

Theorem 4.4 Let A ∈ Rn×n be symmetric positive definite. Then all eigenvalues of A are real
and positive.

Proof By Theorem 4.3 we know that symmetric matrices have real eigenvalues. By definition of
positive definiteness, for any non-zero vector x, we have xT Ax > 0. Take x to be an eigenvector
of A, and write
0 < x^T A x = λ x^T x . \qquad (4.3)

Since x^T x = ||x||_2^2 > 0 we must have λ > 0. □

The converse implication also holds, but we defer the proof to a later section. Evidently the
number zero cannot be an eigenvalue of a symmetric positive definite matrix. Therefore we have

Corollary 4.1 Symmetric positive definite matrices are non-singular.

Theorem 4.5 Let A be an n × n matrix, and let λ ∈ σ(A). For any natural number k, there
holds
λ^k ∈ σ(A^k).

Proof The proof is a simple calculation of the eigenvalue of Ak , given an eigenvalue λ of A with

corresponding eigenvector v:

A^k v = \underbrace{A \cdots A}_{k} \, v = \underbrace{A \cdots A}_{k−1} \, λ v = \cdots = \underbrace{A \cdots A}_{k−l} \, λ^l v = \cdots = λ^k v . \qquad □
Theorem 4.5 actually also holds for negative integers. In particular, we have the following
theorem:

Theorem 4.6 Let A ∈ Kn×n be non-singular, and λ ∈ σ(A) with corresponding eigenvector v.
Then λ−1 ∈ σ(A−1 ) with the same eigenvector v.

Proof For λ ∈ σ(A) one has

A v = λ v \;\Leftrightarrow\; A^{−1} A v = λ A^{−1} v \;\Leftrightarrow\; \frac{1}{λ} v = A^{−1} v . \qquad □
Theorem 4.7 Let A ∈ Kn×n , and λ ∈ σ(A) with corresponding eigenvector v. Then, for any
µ ∈ C, we have λ − µ ∈ σ(A − µI) with the same eigenvector v.

Proof A straightforward computation yields

(A − µI)v = Av − µv = (λ − µ)v

By definition then, λ − µ is an eigenvalue of A − µI, and v is the corresponding eigenvector. 

What about linear combinations of matrices? Can we say anything about the eigenvalues of
αA, or A + B if we know the eigenvalues of A and B? Unfortunately, there is a simple relation
only for the former case. But at least we can state a simple fact about the eigenvalues of sums of
matrices, for such eigenvalues that correspond to the same eigenvector.

Theorem 4.8 Let A, B ∈ Kn×n , and let α ∈ K. Assume λ ∈ σ(A) with corresponding
eigenvector v, and µ ∈ σ(B), again with corresponding eigenvector v. Then the following
properties hold:

αλ ∈ σ(αA)
λ + µ ∈ σ(A + B)

Proof Both statements can be shown by a direct computation. Indeed, we have

(αA)v = α(Av) = α(λv) = (αλ)v,

and
(A + B)v = Av + Bv = λv + µv = (λ + µ)v.


The assumption that eigenvalues of two different matrices share the same eigenvector, is very
restrictive. There are a few applications where this becomes relevant. In general, however, we
cannot assume that eigenvectors are shared in this manner.

Theorem 4.9 Let A ∈ Kn×n be a matrix with eigenvalues λ1 , . . . , λn . Then there holds

trace(A) = \sum_{i=1}^{n} λ_i \qquad (4.4)

det(A) = \prod_{i=1}^{n} λ_i \qquad (4.5)
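Relations (4.4) and (4.5) are easily checked numerically (illustration only):

import numpy as np

A = np.random.rand(4, 4)
lam = np.linalg.eigvals(A)

print(np.isclose(np.sum(lam),  np.trace(A)))       # (4.4): trace = sum of eigenvalues
print(np.isclose(np.prod(lam), np.linalg.det(A)))  # (4.5): determinant = product of eigenvalues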

4.2 Eigenvectors and Transformations


Counting multiplicity, a matrix of dimension n × n has exactly n eigenvalues. What about the
corresponding eigenvectors? It is often of great importance that all the eigenvectors of a matrix
be linearly independent, or better yet: orthogonal.

Definition 4.3 We say a matrix A ∈ Cn×n is defective if it does not possess a full set of n
linearly independent eigenvectors. Otherwise it is called non-defective.

We should start by motivating why non-defective matrices play an important role. We begin by
defining similar matrices.

Definition 4.4 Two matrices A, B ∈ Kn×n are called similar if there exists a non-singular
matrix X, such that
B = X −1 AX (4.6)
The mapping (4.6) is called a similarity transformation. We say A is similar to B.

For the special case that X is unitary, we can write B = X H AX, and we speak of a unitary
transformation.

Theorem 4.10 Similar matrices have the same eigenvalues, including multiplicity.

Proof Let X be a non-singular square matrix, and let A be a square matrix of the same size.
We show that A and B = X −1 AX have the same characteristic polynomial:

det(B − λI) = det(X −1 AX − λI)


= det(X −1 (A − λI)X)
= det(X −1 ) det(X) det(A − λI)

We obtain immediately from Theorem 3.5 that det(X −1 ) det(X) = det(I) = 1. 

The most important thing about non-defective matrices is that they can be diagonalized :

Theorem 4.11 Let A ∈ Kn×n be non-defective. Then there exists a non-singular matrix V ∈
Cn×n , whose columns are the eigenvectors of A, such that

V −1 AV = Λ (4.7)

where Λ = diag(λ1 , . . . , λn ) is a diagonal matrix, whose entries are the n eigenvalues of A.

Proof If we arrange the eigenvectors of the matrix A as columns of the matrix V , we obtain
directly from Definition 4.1
AV = V Λ
By assumption V is non-singular, and (4.7) follows. 

Theorem 4.11 tells us that if a matrix possesses a full set of linearly independent eigenvectors, it
is similar to a diagonal matrix. We would like to know for which types of matrices diagonalization
is possible. Perhaps surprisingly, the answer is not so straightforward. For general matrices one
cannot easily tell whether they are defective. At least for eigenvectors corresponding to distinct
eigenvalues, we can state the following:

Theorem 4.12 Let A be a square matrix, and λa , λb ∈ σ(A) with λa ≠ λb . Then the corre-
sponding eigenvectors are linearly independent. If A is Hermitian, the corresponding eigenvec-
tors are orthogonal.

Proof Assume first that λa ≠ λb , and that the corresponding eigenvectors va and vb are linearly
dependent. This means that va = αvb for some constant α. Since eigenvectors are specified only
up to a constant, we may set va = vb . Then we have

0 = A va − A vb = λa va − λb vb = (λa − λb ) va .

But this is in contradiction to the assumption λa ≠ λb .

Now let A be Hermitian, i.e. A = AH . Using the Hermitian transpose, we compute

A vb = λb vb \;\Longleftrightarrow\; vb^H A = λb vb^H .

(Recall that eigenvalues of Hermitian matrices are real.) Taking the dot product with va we have

vb^H A va = λb vb^H va \;\Longleftrightarrow\; 0 = (λa − λb ) vb^H va .

If λa ≠ λb the eigenvectors must be orthogonal. □

Thus, if a matrix A ∈ Kn×n possesses n distinct eigenvalues, it is not defective. However, this
condition, while being sufficient, is far too restrictive. Having distinct eigenvalues is not necessary
for a matrix to have a full set of linearly independent eigenvectors. In fact we show below that some
important classes of matrices always possess a full set of linearly independent (or even orthogonal)
eigenvectors, even though the eigenvalues are not necessarily distinct. To prepare the necessary
theorems, we first note the following result

Theorem 4.13 Let A ∈ Kn×n be partitioned as

A = \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{pmatrix},

where a11 ∈ Kp×p and a22 ∈ Kq×q , such that p + q = n. Then the eigenvalues of A are the
union of the eigenvalues of a11 and those of a22 , counting multiplicities.

Proof Let λ ∈ C be any eigenvalue of A. Applying the above partition to the corresponding
eigenvector v, i.e.

v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \qquad v_1 ∈ Cp , \; v_2 ∈ Cq ,

one has

\begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = λ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix},

or

a_{11} v_1 + a_{12} v_2 = λ v_1 ,
a_{22} v_2 = λ v_2 .
Now there are two possibilities:

1. v2 ≠ 0: In this case λ is an eigenvalue of a22 with corresponding eigenvector v2 .

2. v2 = 0: Then v2 is not an eigenvector of a22 , but instead one has a11 v1 = λv1 , and λ is an
eigenvalue of a11 with corresponding eigenvector v1 . (Note: if v2 = 0, then v1 ≠ 0, since
by assumption v ≠ 0 is an eigenvector of A.)

So any eigenvalue of A is an eigenvalue of either a11 or a22 , i.e. σ(A) ⊆ σ(a11 ) ∪ σ(a22 ). But the
set σ(a11 ) ∪ σ(a22 ) cannot be larger than σ(A), since a11 has exactly p eigenvalues, and a22 has
exactly q eigenvalues, counting multiplicities, and p + q = n. So σ(A) = σ(a11 ) ∪ σ(a22 ). □

By induction one proves that Theorem 4.13 holds for more and smaller blocks on the diagonal.
In particular, for triangular matrices the eigenvalues are simply the entries on the diagonals. (This
can also be inferred directly from the characteristic polynomial: if T is a triangular matrix, then
det(T − λI) = (a11 − λ) . . . (ann − λ).) Thus it is clear from Theorem 4.10, that if a similarity
transformation exists, which transforms a general matrix A to a triangular matrix T , the eigenval-
ues of A must be on the diagonal of T . We now show that such a transformation (even a unitary
one) exists for any square matrix.

Theorem 4.14 Let A ∈ Kn×n . Then there exists a unitary U ∈ Cn×n , such that

U H AU = T (4.8)

where T ∈ Cn×n is upper triangular.

Proof We give a constructive proof. Let λ1 , . . . , λn be the eigenvalues of A, arranged in any
order. Let v1 be the normalized eigenvector corresponding to λ1 . We can (possibly after applying
Gram-Schmidt orthogonalization) add vectors z2 , . . . , zn to extend the vector v1 to an orthonormal
basis of Cn . Let U1 := (v1 | z2 | . . . | zn ) be the unitary matrix containing these vectors as columns.
Then

U_1^H A U_1 = \begin{pmatrix} λ_1 & * \\ 0 & A_1 \end{pmatrix},

where * are unspecified values, and A1 is an (n − 1) × (n − 1) matrix. The values of A1 remain
unspecified, but since the eigenvalues remain unchanged by the unitary transformation (including
multiplicities), we can infer from Theorem 4.13 that the matrix A1 must have eigenvalues
λ2 , . . . , λn . Now take the eigenvector of A1 corresponding to the eigenvalue λ2 . We complete an
orthonormal basis for Cn−1 , form a unitary matrix Ũ2 ∈ C(n−1)×(n−1) having the basis vectors as
columns (in particular the eigenvector as the first column) and write

Ũ_2^H A_1 Ũ_2 = \begin{pmatrix} λ_2 & * \\ 0 & A_2 \end{pmatrix}.

If we set

U_2 := \begin{pmatrix} 1 & 0 \\ 0 & Ũ_2 \end{pmatrix},

we have

U_2^H U_1^H A U_1 U_2 = \begin{pmatrix} λ_1 & * & * \\ 0 & λ_2 & * \\ 0 & 0 & A_2 \end{pmatrix},
where A2 is an (n − 2) × (n − 2) matrix. We can continue this construction in a similar fashion
until we have transformed A to triangular form. Since the product of unitary matrices is unitary,
the theorem is proved. 

The decomposition (4.8) is called the Schur decomposition. It states that every square matrix
is unitarily similar to a triangular matrix. Note that the Schur decomposition is not unique.
Furthermore, a real matrix may have complex eigenvalues. So, in general, one must expect that it is
indeed a unitary matrix with complex entries that reduces A to triangular form, even
if A is real. We may want to avoid complex arithmetic. If we use only real matrices to transform
A (recall that a real unitary matrix is an orthogonal matrix), we have the following theorem:

Theorem 4.15 Let A ∈ Rn×n . Then there exists orthogonal Q ∈ Rn×n such that

    QT AQ = ( R11  R12  . . .  R1m )
            (  0   R22  . . .  R2m )
            (  ⋮    ⋮     ⋱     ⋮  )   =: R                           (4.9)
            (  0    0   . . .  Rmm )

and each block Rii is either a scalar or a 2-by-2 matrix. The number of 2-by-2 blocks in (4.9) is
equal to the number of complex conjugate eigenvalue pairs, which appear as the eigenvalues of these
small submatrices; this determines m. We say the matrix R is quasi-triangular.

The Schur decomposition is not only important from an analytical point of view, but also from
a practical perspective. Many computational methods for computing eigenvalues implement a pro-
cedure for computing a Schur decomposition by successive unitary (or orthogonal) transformations.
This means that if we succeed in transforming a matrix as in (4.9), we can solve for the eigenvalues
of A by considering much smaller submatrices, i.e. at most 2 × 2 problems. It can be shown that
if the matrix A has only real eigenvalues, then R in (4.9) is truly triangular (i.e. there are no
2 × 2 blocks). This is the case for symmetric matrices.
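The real, quasi-triangular form of Theorem 4.15 can be computed with the same routine using
output='real'. A minimal sketch (the test matrix is an arbitrary example with eigenvalues 2 and 1 ± ı̂,
so exactly one complex conjugate pair appears as a 2-by-2 diagonal block):

    import numpy as np
    from scipy.linalg import schur

    # real 3x3 matrix with eigenvalues 2 and 1 +/- i (one complex conjugate pair)
    A = np.array([[1.0, -1.0, 0.0],
                  [1.0,  1.0, 0.0],
                  [0.0,  0.0, 2.0]])

    R, Q = schur(A, output='real')    # A = Q R Q^T with Q orthogonal, R quasi-triangular

    print(np.allclose(Q @ R @ Q.T, A))
    print(np.round(R, 3))             # one 2x2 block (for 1 +/- i) and one 1x1 block (2)
    print(np.linalg.eigvals(A))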
Using the Schur form, we can also obtain some general theorems as relatively simple corollaries.
We first introduce normal matrices

Definition 4.5 A square matrix A ∈ Kn×n is called normal if

AH A = AAH (4.10)

If A ∈ Rn×n , a normal matrix satisfies AT A = AAT . The importance of normality can hardly
be overestimated. This is because of the following theorem:

Theorem 4.16 A matrix is normal if and only if there exists a unitary matrix U , such that

U H AU = Λ

where Λ = diag(λ1 , . . . , λn ) is a diagonal matrix whose entries are the n eigenvalues of A.

Proof It is easy to show that a unitary transform of a matrix A is normal if and only if A is
normal. The matrix T in Theorem 4.14 is thus normal if and only if A is. A triangular matrix,
however, is normal if and only if it is diagonal. 

So not only can normal matrices be diagonalized, they also admit a complete set of orthonormal
eigenvectors, namely the columns of U . (We often call such a set an eigensystem.)

Example 4.1 Symmetric real matrices are normal. In the complex case, this is true for Her-
mitian matrices. Furthermore, for complex (real) matrices, one has that unitary (orthogonal)
matrices, as well as skew-Hermitian (skew-symmetric) matrices are always normal. 2

While there are normal matrices that have none of the above special properties, in practice,
the absence of such properties almost always means that a matrix is non-normal, and will not
admit an orthogonal eigensystem. For example, if we are dealing with a general non-symmetric
real matrix, we have to assume that it is not normal. (In principle, one can check whether or not
a matrix satisfies the defining characteristic of normality by a simple, although expensive, test, i.e.
by carrying out the two matrix-matrix multiplications in (4.10).)
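This test is straightforward to carry out numerically. The following minimal sketch (the helper name
is_normal and the two test matrices are ad hoc choices) checks the defining property (4.10) directly:

    import numpy as np

    def is_normal(A, tol=1e-12):
        # check the defining property A^H A = A A^H of (4.10)
        AH = A.conj().T
        return np.allclose(AH @ A, A @ AH, atol=tol)

    S = np.array([[2.0, 1.0], [1.0, 3.0]])    # symmetric, hence normal
    N = np.array([[1.0, 5.0], [0.0, 2.0]])    # non-symmetric triangular, not normal
    print(is_normal(S), is_normal(N))         # expected output: True False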
For real symmetric matrices we infer from Theorem 4.15:

Theorem 4.17 Let A ∈ Rn×n be symmetric. Then A is non-defective. Furthermore, there


exists an orthogonal (real) matrix Q such that

QT AQ = Λ

where Λ is diagonal, and contains the eigenvalues of A.

To know that symmetric matrices always possess a full set of orthonormal eigenvectors is incred-
ibly useful. In particular, the eigenvectors of a symmetric matrix A ∈ Rn×n form an orthonormal
basis for Rn . We shall use this property again and again in this course. A first such application is
to prove the converse of Theorem 4.4:

Theorem 4.18 A symmetric matrix A ∈ Rn×n is positive definite if and only if all eigenvalues
are positive.

Proof In Theorem 4.4 we have already shown that all eigenvalues of an s.p.d. matrix are positive.
To see the converse, note that by Theorem 4.17 we can write any vector x ∈ Rn as x = Σ_{i=1}^n ci vi ,
where the vi are mutually orthonormal eigenvectors of A, and the ci are real coefficients. Assume

now that all eigenvalues of A are positive, and consider

    xT Ax = ( Σ_{i=1}^n ci vi )T A ( Σ_{i=1}^n ci vi )
          = Σ_{i=1}^n Σ_{j=1}^n ci cj viT Avj
          = Σ_{i=1}^n Σ_{j=1}^n ci cj λj viT vj
          = Σ_{i=1}^n ci^2 λi ≥ 0

with equality if and only if all coefficients vanish, i.e. x = 0. (We have used orthonormality of
the eigenvectors, i.e. viT vj = δij .) 
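A minimal numerical sketch of Theorem 4.18 (the two test matrices are arbitrary examples):
numpy.linalg.eigvalsh returns the eigenvalues of a symmetric matrix, and checking their signs decides
positive definiteness.

    import numpy as np

    def is_spd(A):
        # symmetric A is positive definite iff all eigenvalues are positive (Theorem 4.18)
        return bool(np.all(np.linalg.eigvalsh(A) > 0))

    A = np.array([[4.0, 1.0], [1.0, 3.0]])    # eigenvalues are positive
    B = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues are 3 and -1
    print(is_spd(A), is_spd(B))               # expected output: True False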

It is easy to see that the inverse of a symmetric matrix is symmetric. Thus, if we apply the
previous theorem to symmetric positive definite matrices, we again obtain a nice corollary:

Corollary 4.2 The inverse of a symmetric positive definite matrix is a symmetric positive
definite matrix.

4.3 The Spectral Radius

Definition 4.6 Let A ∈ Kn×n . We define

ρ(A) := max{|λ| : λ ∈ σ(A)},

where ρ(A) is called the spectral radius of A.

In the context of iterative solvers for linear systems, many important results use the spectral radius
of a matrix. We first note the following facts

Theorem 4.19 Let A ∈ Kn×n . For any ε > 0, there is a matrix norm ||·||, such that

    ρ(A) ≤ ||A|| ≤ ρ(A) + ε.

Furthermore, the first inequality holds for any matrix norm.

Proof We first show that the left inequality holds for any matrix norm. To prove the second
inequality, we explicitly construct a norm that satisfies it.
1. Let λ be the eigenvalue of A for which |λ| = ρ(A), and let v be the corresponding eigenvector.
Define V = (v| . . . |v), and consider

|λ| ||V || = ||λV || = ||AV || ≤ ||A|| ||V || ,

where ||·|| is any matrix norm. Upon division by ||V ||, we obtain the first inequality.
2. To prove the second inequality, first note that according to Theorem 4.14, there is a unitary
matrix U , such that T := U H AU is upper triangular. Now define Dt := diag(t, t2 , . . . , tn ). A
direct computation reveals (entries below the diagonal are zero, and uij denote the entries of T )

    Dt T Dt−1 = ( λ1   t−1 u12   t−2 u13   . . .   t−(n−1) u1n )
                (      λ2        t−1 u23   . . .   t−(n−2) u2n )
                (                λ3        . . .   t−(n−3) u3n )
                (                            ⋱          ⋮      )
                (                                λn−1   t−1 un−1,n )
                (                                        λn    )

Using Theorem 3.4, we note that we can choose t sufficiently large, such that

    ||Dt T Dt−1 || = ||Dt U H AU Dt−1 || ≤ ρ(A) + ε.

It is easy to verify that ||A||B := ||BAB −1 || is a proper matrix norm for any non-
singular matrix B , and any matrix norm ||·||. Setting B = Dt U H , and noting that
B −1 = (Dt U H )−1 = U Dt−1 , the proof is complete.
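The construction in the second part of the proof can be reproduced numerically. In the following
minimal sketch (random test matrix, seed, and the values of t are arbitrary choices), the weighted norm
||Dt T Dt−1 ||∞ = ||B A B−1 ||∞ with B = Dt U H is seen to approach ρ(A) as t grows:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(3)
    n = 5
    A = rng.standard_normal((n, n))

    T, U = schur(A, output='complex')      # A = U T U^H with T upper triangular
    rho = np.max(np.abs(np.diag(T)))       # spectral radius of A

    for t in [1.0, 10.0, 100.0, 1000.0]:
        d = t ** np.arange(1, n + 1)       # diagonal of Dt = diag(t, t^2, ..., t^n)
        M = np.diag(d) @ T @ np.diag(1.0 / d)    # = Dt T Dt^{-1} = B A B^{-1}
        print(f"t = {t:7.1f}:  ||A||_B = {np.linalg.norm(M, np.inf):.6f},  rho(A) = {rho:.6f}")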

The next two theorems play an important role in the convergence analysis of many iterative
solvers for linear systems:

Theorem 4.20 Let A ∈ Kn×n . Then limk→∞ Ak = 0 if and only if ρ(A) < 1

Proof First, assume that limk→∞ Ak = 0. Let v be any eigenvector of A, and let λ be the
corresponding eigenvalue. By assumption, and using Theorem 4.5, λk v = Ak v → 0, so we must
have |λ| < 1 for any eigenvalue of A. Conversely, if ρ(A) < 1, by Theorem 4.19, there is a matrix
norm ||·||, such that ||A|| < 1. This implies ||Ak || ≤ ||A||k → 0, as k → ∞. Of course, the norm
can be zero if and only if the matrix is the zero matrix, and the Theorem is proved. 

Theorem 4.21 Let A ∈ Kn×n , and let ||·|| be any matrix norm. Then limk→∞ ||Ak ||^{1/k} = ρ(A).

Proof Using Theorem 4.5, we have ρ(Ak ) = ρ(A)k , and by Theorem 4.19,

    ρ(A) ≤ ||Ak ||^{1/k}    ∀k ∈ N.                                   (4.11)

Now fix ε > 0. By Theorem 4.8 we know that the matrix Aε := (ρ(A) + ε)−1 A has spectral radius
ρ(Aε ) < 1. So, by Theorem 4.20, Aε^k → 0 for k → ∞. But this means that there exists an integer
Kε , such that ||Aε^k || < 1 for all k > Kε , and consequently, ||Ak || ≤ (ρ(A) + ε)k for all such k. But
this is the same as

    ||Ak ||^{1/k} ≤ ρ(A) + ε,    ∀k > Kε .

Since this is true for any ε > 0, we have limk→∞ ||Ak ||^{1/k} ≤ ρ(A), and together with (4.11) this
proves the Theorem. 
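A minimal numerical sketch of Theorems 4.20 and 4.21 (the test matrix is an arbitrary triangular example
with ρ(A) = 0.8): the powers Ak decay to zero, and ||Ak ||^{1/k} approaches ρ(A) as k grows.

    import numpy as np

    A = np.array([[0.5, 2.0],
                  [0.0, 0.8]])    # upper triangular, so rho(A) = 0.8 < 1

    rho = np.max(np.abs(np.linalg.eigvals(A)))
    for k in [1, 5, 10, 50, 100, 500]:
        nrm = np.linalg.norm(np.linalg.matrix_power(A, k), 2)
        print(f"k = {k:4d}:  ||A^k||_2 = {nrm:.3e},  ||A^k||_2^(1/k) = {nrm ** (1.0 / k):.4f}")
    print("rho(A) =", rho)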

Using the spectral radius, we can give a convenient characterization of the matrix norm induced
by the Euclidean norm:

Theorem 4.22 Let A ∈ Kn×n . There holds

    ||A||2^2 = ρ(AH A)

where ρ denotes the spectral radius. In particular, if A is real symmetric, we have ||A||2 = ρ(A).

Proof First note that the matrix AH A is Hermitian. Consequently, there exists a unitary ma-
trix U such that U H AH AU = diag(λ1 , . . . , λn ). The eigenvalues λi , i = 1, . . . , n, are real and
non-negative. (To see the non-negativity, let u, λ be an eigenpair with ||u||2 = 1, and notice
0 ≤ ||Au||2^2 = (u, AH Au) = λ.) We can write any x ∈ Kn as

    x = Σ_{i=1}^n αi ui ,    αi ∈ K,

where u1 , . . . , un are the columns of U , i.e. the eigenvectors of AH A. Thus

    ||Ax||2^2 = (Ax, Ax) = (x, AH Ax)
              = ( Σ_{i=1}^n αi ui , Σ_{i=1}^n λi αi ui )
              = Σ_{i=1}^n λi |αi |2
              ≤ ρ(AH A) ||x||2^2

This proves that ||A||2^2 ≤ ρ(AH A). Equality is achieved if we take x = u1 , where u1 is an
eigenvector corresponding to the largest eigenvalue.
If A is real symmetric, i.e. AH = AT = A, Theorem 4.5 yields immediately ||A||2 = ρ(A). 
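A quick numerical check of Theorem 4.22 (the complex random test matrix and the seed are arbitrary
choices):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    lhs = np.linalg.norm(A, 2) ** 2                          # ||A||_2^2 (largest singular value squared)
    rhs = np.max(np.abs(np.linalg.eigvals(A.conj().T @ A)))  # rho(A^H A)
    print(np.isclose(lhs, rhs))                              # expected output: True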

4.4 The Field of Values
We shall see that, for large matrices, eigenvalues are extremely expensive to compute (at least if
one would like to know all eigenvalues). This is true even for sparse matrices. Thus, one often has
to make do with an estimate of the spectrum. A useful definition in this context is given by the
field of values

Definition 4.7 Let A be a square matrix. The field of values is defined as

    F(A) := { (Ax, x)/(x, x)  :  x ∈ Cn , x ≠ 0 }

Theorem 4.23 The field of values of an arbitrary square matrix A is a convex set that contains
the spectrum of A. If A is a normal matrix, then the field of values is the convex hull of the
spectrum.

We do not prove Theorem 4.23, but only a special case: For Hermitian (symmetric) matrices,
recall that all the eigenvalues are real. The field of values is an interval on the real line, and we
have

Theorem 4.24 (Rayleigh-Ritz) Let A be a Hermitian matrix, and let λmax and λmin be the
maximum and minimum eigenvalue of A, respectively. Then

    λmax = max_{x≠0} (Ax, x)/(x, x)                                   (4.12)

and

    λmin = min_{x≠0} (Ax, x)/(x, x)                                   (4.13)

Proof We only show (4.12). The proof of (4.13) is similar. Because A is Hermitian there exists
a unitary matrix U = (u1 , . . . , un ), where the ui are eigenvectors of A. Let x ∈ Kn be any
vector and consider a representation in the basis composed of eigenvectors. We can write x = Σ_{i=1}^n αi ui
for some coefficients αi . Noting (ui , uj ) = δij for 1 ≤ i, j ≤ n, a straightforward computation
yields

    (Ax, x)/(x, x) = ( Σ_{i=1}^n λi |αi |2 ) / ( Σ_{i=1}^n |αi |2 ) ≤ λmax ( Σ_{i=1}^n |αi |2 ) / ( Σ_{i=1}^n |αi |2 ) = λmax

Now choosing x = umax , where umax is the eigenvector corresponding to the eigenvalue λmax , we
see immediately

    (Ax, x)/(x, x) = λmax


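The Rayleigh-Ritz characterization is easy to observe numerically. In the following minimal sketch
(arbitrary symmetric test matrix, arbitrary random trial vectors), every Rayleigh quotient lies between
λmin and λmax :

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((6, 6))
    A = 0.5 * (B + B.T)                  # a symmetric (hence Hermitian) test matrix

    lam = np.linalg.eigvalsh(A)          # eigenvalues, sorted in ascending order
    lam_min, lam_max = lam[0], lam[-1]

    for _ in range(5):
        x = rng.standard_normal(6)
        rq = x @ A @ x / (x @ x)         # Rayleigh quotient (Ax, x)/(x, x)
        print(lam_min <= rq <= lam_max, round(rq, 4))
    print("lam_min =", round(lam_min, 4), " lam_max =", round(lam_max, 4))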
5 Condition Number

Definition 5.1 Let A ∈ Kn×n , and let ||·|| be any matrix norm. We call

    κ||·|| (A) := ||A|| ||A−1 ||  if A is non-singular,   and   κ||·|| (A) := ∞  otherwise,

the condition number of A with respect to the norm ||·||.

Usually we restrict ourselves to condition numbers with respect to induced matrix norms ||·||p , in
which case we denote the condition number as κp . The following bound holds independently of
the norm on which the condition number is based:

Theorem 5.1 Let A ∈ Kn×n be a non-singular matrix. One has

1 ≤ κp (A) < ∞ (5.1)

Proof The matrix norms of A and A−1 are finite, so it is clear that κp (A) < ∞. Furthermore
we have

    1 = ||I||p = ||AA−1 ||p ≤ ||A||p ||A−1 ||p = κp (A).

It is worth pointing out that, if the matrix A is invertible, by definition κ||·|| (A) = κ||·|| (A−1 ).
Condition numbers play an important role in numerical linear algebra. Let us illustrate this by
the following theorem

Theorem 5.2 Let A ∈ Kn×n be a non-singular matrix, and let x, b ∈ Kn be vectors such that
Ax = b. Further, let ∆b ∈ Kn be a perturbation of the right-hand side, and let ∆x be such that
A(x + ∆x) = b + ∆b. Then

    ||∆x||p / ||x||p ≤ κp (A) ||∆b||p / ||b||p                        (5.2)

Proof By linearity, we have A∆x = ∆b. Furthermore,

    ||∆x||p = ||A−1 ∆b||p ≤ ||A−1 ||p ||∆b||p ,                       (5.3)

and

    ||b||p = ||Ax||p ≤ ||A||p ||x||p .                                (5.4)

Together, (5.3) and (5.4) imply (5.2). 
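A minimal numerical illustration of the bound (5.2), using the 2-norm; the test system and the
perturbation of the right-hand side are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)    # a reasonably well conditioned test matrix
    b = rng.standard_normal(5)
    db = 1e-6 * rng.standard_normal(5)                   # perturbation of the right-hand side

    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, db)                          # A(x + dx) = b + db  =>  A dx = db

    lhs = np.linalg.norm(dx) / np.linalg.norm(x)
    rhs = np.linalg.cond(A, 2) * np.linalg.norm(db) / np.linalg.norm(b)
    print(lhs <= rhs, lhs, rhs)                          # the bound (5.2) holds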

This allows us to use the condition number of a matrix to estimate how a perturbation in
the right-hand side of a linear system affects the solution. This also gives an idea of the
amplification of round-off errors during the numerical solution of such a linear system. Another
application (which can be proved similarly) concerns the relative error in the solution of a linear
system:

Theorem 5.3 Let x̃ ∈ Kn be an approximation to x = A−1 b, where A ∈ Kn×n is any non-
singular matrix. Upon defining the residual as r := b − Ax̃, the following relation holds

    ||x̃ − x||p / ||x||p ≤ κp (A) ||r||p / ||b||p                      (5.5)

Such a relationship is useful if one wishes to assess the quality of an approximation to a solution
of a linear system. Computing the error directly is impossible if the exact solution is not known.
(This is normally the case, for otherwise there would be little incentive to consider an approximate
solution.) The residual, on the other hand, is computable. The condition number thus tells us
whether a small residual is indicative of a small error. However, the condition number may itself
be difficult to compute. (Note that the inverse of A appears in the definition.) Matrices with
special properties often have condition numbers that are easier to characterize, which facilitates
the computation of κ(A):

Theorem 5.4 Let A ∈ Rn×n be symmetric positive definite. Then

    κ2 (A) = λmax / λmin                                              (5.6)

where λmax and λmin are the maximum and minimum eigenvalues of the matrix A, respectively.

Proof Recall from Corollary 4.2 that A−1 is symmetric positive definite, and hence all its eigen-
values are positive. Using Theorems 4.6 and 4.22, eq. (5.6) follows immediately. 
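A quick numerical check of Theorem 5.4 (the symmetric positive definite test matrix is an arbitrary
choice):

    import numpy as np

    rng = np.random.default_rng(7)
    B = rng.standard_normal((6, 6))
    A = B @ B.T + np.eye(6)              # symmetric positive definite by construction

    lam = np.linalg.eigvalsh(A)          # eigenvalues, sorted in ascending order
    print(np.isclose(np.linalg.cond(A, 2), lam[-1] / lam[0]))    # expected output: True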

Theorem 5.5 Let U ∈ Kn×n be a unitary matrix. Then

κ2 (U ) = 1 (5.7)

We have previously established Theorem 4.20, which states that powers of a matrix decay to
the zero matrix if and only if the spectral radius of that matrix is smaller than unity. This is an
important result in the context of iterative solution methods for linear systems. For such methods
the error decay is often related to powers of matrices, so that, if the condition ρ(A) < 1 of Theorem 4.20
holds, the error tends to zero. However, this says nothing about the transient behavior of such an
iteration. Perhaps surprisingly, the condition of the eigensystem often plays an equally important
role in the analysis of matrix iterations. Consider, for example, the following theorem:

Theorem 5.6 Let A be a non-defective square matrix. For every k ∈ N there holds

    ρ(A)k ≤ ||Ak ||2 ≤ κ2 (V ) ρ(A)k                                  (5.8)

where V is the matrix containing the eigenvectors of A as its columns.

If A is not normal, the condition number of the eigensystem can become very large. (This
actually happens in many applications.) Since the factor κ2 (V ) is independent of k, the results of
Theorem 4.20 still hold, but there might be considerable amplification of the sequence ||Ak || for finite k,
before, finally, Ak → 0. On a computer, where one can only represent finitely many floating point
numbers with finite precision, this can have disastrous effects.
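The transient amplification described above can already be observed for a 2 × 2 example. In the
following minimal sketch the entries are chosen arbitrarily such that ρ(A) < 1 but the matrix is far from
normal; the norms ||Ak ||2 first grow considerably before they decay to zero:

    import numpy as np

    A = np.array([[0.90, 100.0],
                  [0.00,   0.95]])    # rho(A) = 0.95 < 1, strongly non-normal

    w, V = np.linalg.eig(A)           # columns of V are the (nearly parallel) eigenvectors
    print("kappa_2(V) =", np.linalg.cond(V, 2))

    for k in [1, 10, 50, 100, 500, 1000]:
        nrm = np.linalg.norm(np.linalg.matrix_power(A, k), 2)
        print(f"k = {k:5d}:  ||A^k||_2 = {nrm:.3e}")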
To assess the accuracy of an approximation to an eigenvalue/eigenvector pair, one can a posteriori
compute the residual Aṽ − λ̃ṽ. Does a small residual always guarantee a small error in
the approximation of an eigenvalue? The following Theorem gives the answer:

Theorem 5.7 Let A ∈ Kn×n be non-defective, and let λ̃ and ṽ be such that Aṽ − λ̃ṽ = r ≠ 0.
Then there exists an eigenvalue λ of A such that

    |λ − λ̃| ≤ κ2 (V ) ||r||2                                          (5.9)

where V = (v1 | . . . |vn ), and the vk are the eigenvectors of A.

Proof From the fact that V diagonalizes A, i.e. A = V ΛV −1 , we obtain

    Aṽ − λ̃ṽ = r   ⇐⇒   (V ΛV −1 − λ̃V V −1 )ṽ = r.

If λ̃ is an eigenvalue of A, the bound (5.9) holds trivially. Otherwise the matrix Λ − λ̃I is
non-singular, and we can invert the matrix on the left-hand side. We may assume that ||ṽ||2 = 1,
and so

    1 = ||ṽ||2 = ||(V (Λ − λ̃I)V −1 )−1 r||2
               = ||V (Λ − λ̃I)−1 V −1 r||2
               ≤ ||V ||2 ||V −1 ||2 ||(Λ − λ̃I)−1 ||2 ||r||2

Using Definition 5.1, and the fact that the 2-norm of a diagonal matrix is the maximum modulus
of the entries on the diagonal, we obtain

    1 ≤ κ2 (V ) ( max_{1≤i≤n} |λi − λ̃|−1 ) ||r||2 .                   (5.10)

Noting that

    max_{1≤i≤n} |λi − λ̃|−1 = 1 / min_{1≤i≤n} |λi − λ̃| ,              (5.11)

the result follows. 

For normal matrices (for which the condition number of the eigensystem is unity) we have the nice result
that a small residual automatically bounds the error in the eigenvalue. Conversely, for non-normal
matrices, small perturbations in the residual (regardless of where they come from) can have an
extremely large effect on the approximation of the eigenvalues. It is interesting to note that the
effect of perturbations in the data is thus primarily governed by the condition of the eigensystem
V , which can be large, even if A itself is well conditioned!
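A minimal numerical sketch of the situation described by Theorem 5.7 (the test matrix and the
approximate eigenpair are arbitrary choices): the residual is tiny, the error in the eigenvalue is several
orders of magnitude larger, and the bound (5.9) with the factor κ2 (V ) accounts for the difference.

    import numpy as np

    A = np.array([[1.0, 1e4],
                  [0.0, 2.0]])              # non-normal, eigenvalues 1 and 2

    w, V = np.linalg.eig(A)                 # columns of V are the eigenvectors
    kappa_V = np.linalg.cond(V, 2)

    # an approximate eigenpair (a perturbation of the exact pair for lambda = 1)
    lam_tilde = 1.0 + 1e-3
    v_tilde = np.array([1.0, 1e-7])
    v_tilde /= np.linalg.norm(v_tilde)      # normalize, as in the proof

    r = A @ v_tilde - lam_tilde * v_tilde
    err = np.min(np.abs(w - lam_tilde))     # distance to the nearest eigenvalue of A

    print("||r||_2            :", np.linalg.norm(r))
    print("eigenvalue error   :", err)
    print("kappa_2(V) ||r||_2 :", kappa_V * np.linalg.norm(r))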

A Proof of Theorem 2.1
To prove the theorem we need a few preparatory steps. We first show that from any collection
of r vectors, which span Kn , we can extract a basis. That is to say, every such collection must
necessarily contain a basis, perhaps along with some redundant, linearly dependent elements. (An
example is eq (2.3), where we have a basis along with one redundant vector.)

Theorem A.1 Let v1 , . . . , vs ∈ Kn be linearly independent, and let w1 , . . . , wr ∈ Kn be such


that span(v1 , . . . , vs , w1 , . . . , wr ) = Kn . Then there is 0 ≤ m ≤ r, such that the vectors
v1 , . . . , vs , together with m vectors appropriately chosen from w1 , . . . , wr form a basis.

Proof We prove this by induction.


1. If r = 0, then the vectors v1 , . . . , vs already form a basis, and there is nothing to show. This
forms the base of the induction.
2. We must show that if the theorem holds for r = t (induction hypothesis), then it holds also for
r = t + 1. So consider the vectors v1 , . . . , vs , w1 , . . . , wr for r = t + 1. If the vectors v1 , . . . , vs
already form a basis, the theorem holds. If span(v1 , . . . , vs ) ≠ Kn , then there must be at least one
k (1 ≤ k ≤ r), such that we cannot write wk as a linear combination of the vectors v1 , . . . , vs 6 .
Hence the vectors v1 , . . . , vs , wk are linearly independent. If this augmented set is not a basis, by
the induction hypothesis, we can select from the remaining t vectors w1 . . . wk−1 , wk+1 , . . . , wr
some elements to complete the basis. 

Now we return to the question of how many elements we may need to form a basis, and whether
there is a unique number of linearly independent elements that form a basis. The key to answering
these questions is the next theorem.

Theorem A.2 Let both v1 , . . . , vr ∈ Kn , and w1 , . . . , wm ∈ Kn form a basis of Kn . Then, for


each vj we can find a wk such that the vectors v1 , . . . , vj−1 , vj+1 , . . . , vr , wk form a basis as
well.

Proof If we remove one vector, say vj from the first basis, then clearly

span(v1 , . . . , vj−1 , vj+1 , . . . , vr ) ≠ Kn

(At least the vector which we have removed, vj , cannot be written as a linear combination
of the remaining vectors.) Consequently, there must be at least one vector wk of the second
basis which cannot be written as a linear combination of the vectors v1 , . . . , vj−1 , vj+1 , . . . , vr 7 .
Consequently, the vectors v1 , . . . , vj−1 , vj+1 , . . . , vr , wk are linearly independent. To see that
these vectors form a basis we must show that

span(v1 , . . . , vj−1 , vj+1 , . . . , vr , wk ) = Kn . (A.1)

We know that, if we add again the vector vj ,

span(v1 , . . . , vj−1 , vj , vj+1 , . . . , vr , wk ) = Kn (A.2)


6 For otherwise span(v1 , . . . , vs ) = span(v1 , . . . , vs , w1 , . . . , wr ) = Kn .

(since all the vectors of the first basis are present.) Suppose that (A.1) is not true. Then, according
to Theorem A.1, the vectors v1 , . . . , vj−1 , vj , vj+1 , . . . , vr , wk in (A.2) form a basis. But this is
impossible, as wk can be written as a linear combination of the v1 , . . . , vr . Thus (A.1) is true. 

Now we are ready to conclude:

Proof (of theorem 2.1) First we show that any basis must have the same number of vectors.
To that end, we assume that there exist two bases, one containing more vectors than the other.
Let v1 , . . . , vr ∈ Kn be a basis, and let w1 , . . . , wm ∈ Kn be another basis with m > r. By
Theorem A.2, we can insert all vectors of the first basis into the second. This means that for some
l = m − r vectors wk1 , . . . , wkl of the second basis,

v1 , . . . , vr , wk1 , . . . wkl (A.3)

is again a basis. This is clearly a contradiction, as the vectors in (A.3) cannot be linearly inde-
pendent. (The first r vectors already form a basis.)
Now that we know that any basis must have the same number of vectors, we merely need to
find one specific example to find out how many vectors we need. But this we have already done
in Example 2.1, and the theorem is proved. 

7 For otherwise span(v1 , . . . , vj−1 , vj+1 , . . . , vr ) = span(w1 , . . . , wm ) = Kn .
