
Linear Algebra: Foundations to Frontiers

A Collection of Notes
on Numerical Linear Algebra
Robert A. van de Geijn
Release Date June 23, 2016
Kindly do not share this PDF
Point others to http://www.ulaff.net instead
This is a work in progress

Copyright 2014, 2015, 2016 by Robert A. van de Geijn.


10 9 8 7 6 5 4 3 2 1
All rights reserved. No part of this book may be reproduced, stored, or transmitted in
any manner without the written permission of the publisher. For information, contact
any of the authors.

No warranties, express or implied, are made by the publisher, authors, and their
employers that the programs contained in this volume are free of error. They should
not be relied on as the sole basis to solve a problem whose incorrect solution could
result in injury to person or property. If the programs are employed in such a manner,
it is at the user's own risk, and the publisher, authors, and their employers disclaim
all liability for such misuse.

Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of
trademark is intended.

Library of Congress Cataloging-in-Publication Data not yet available

Draft Edition, November 2014, 2015, 2016

This Draft Edition allows this material to be used while we sort out the mechanism
through which we will publish the book.

Contents

Preface

0 Notes on Setting Up
   0.1 Opening Remarks
      0.1.1 Launch
      0.1.2 Outline
      0.1.3 What You Will Learn
   0.2 Setting Up to Learn
      0.2.1 How to Navigate These Materials
      0.2.2 Setting Up Your Computer
   0.3 MATLAB
      0.3.1 Why MATLAB
      0.3.2 Installing MATLAB
      0.3.3 MATLAB Basics
   0.4 Wrapup
      0.4.1 Additional Homework
      0.4.2 Summary
1 Notes on Simple Vector and Matrix Operations
   1.1 Opening Remarks
      1.1.1 Launch
      1.1.2 Outline
      1.1.3 What You Will Learn
   1.2 Notation
   1.3 (Hermitian) Transposition
      1.3.1 Conjugating a complex scalar
      1.3.2 Conjugate of a vector
      1.3.3 Conjugate of a matrix
      1.3.4 Transpose of a vector
      1.3.5 Hermitian transpose of a vector
      1.3.6 Transpose of a matrix
      1.3.7 Hermitian transpose (adjoint) of a matrix
      1.3.8 Exercises
   1.4 Vector-vector Operations
      1.4.1 Scaling a vector (scal)
      1.4.2 Scaled vector addition (axpy)
      1.4.3 Dot (inner) product (dot)
   1.5 Matrix-vector Operations
      1.5.1 Matrix-vector multiplication (product)
      1.5.2 Rank-1 update
   1.6 Matrix-matrix Multiplication (product)
      1.6.1 Element-by-element computation
      1.6.2 Via matrix-vector multiplications
      1.6.3 Via row-vector times matrix multiplications
      1.6.4 Via rank-1 updates
   1.7 Enrichments
      1.7.1 The Basic Linear Algebra Subprograms (BLAS)
   1.8 Wrapup
      1.8.1 Additional exercises
      1.8.2 Summary
2 Notes on Vector and Matrix Norms
   2.1 Opening Remarks
      2.1.1 Launch
      2.1.2 Outline
      2.1.3 What You Will Learn
   2.2 Absolute Value
   2.3 Vector Norms
      2.3.1 Vector 2-norm (Euclidean length)
      2.3.2 Vector 1-norm
      2.3.3 Vector ∞-norm (infinity norm)
      2.3.4 Vector p-norm
   2.4 Matrix Norms
      2.4.1 Frobenius norm
      2.4.2 Induced matrix norms
      2.4.3 Special cases used in practice
      2.4.4 Discussion
      2.4.5 Submultiplicative norms
   2.5 An Application to Conditioning of Linear Systems
   2.6 Equivalence of Norms
   2.7 Enrichments
      2.7.1 Practical computation of the vector 2-norm
   2.8 Wrapup
      2.8.1 Additional exercises
      2.8.2 Summary
3 Notes on Orthogonality and the Singular Value Decomposition
   Video
   3.1 Opening Remarks
      3.1.1 Launch: Orthogonal projection and its application
      3.1.2 Outline
      3.1.3 What you will learn
   3.2 Orthogonality and Unitary Matrices
   3.3 Toward the SVD
   3.4 The Theorem
   3.5 Geometric Interpretation
   3.6 Consequences of the SVD Theorem
   3.7 Projection onto the Column Space
   3.8 Low-rank Approximation of a Matrix
   3.9 An Application
   3.10 SVD and the Condition Number of a Matrix
   3.11 An Algorithm for Computing the SVD?
   3.12 Wrapup
      3.12.1 Additional exercises
      3.12.2 Summary

4 Notes on Gram-Schmidt QR Factorization
   4.1 Opening Remarks
      4.1.1 Launch
      4.1.2 Outline
      4.1.3 What you will learn
   4.2 Classical Gram-Schmidt (CGS) Process
   4.3 Modified Gram-Schmidt (MGS) Process
   4.4 In Practice, MGS is More Accurate
   4.5 Cost
      4.5.1 Cost of CGS
      4.5.2 Cost of MGS
   4.6 Wrapup
      4.6.1 Additional exercises
      4.6.2 Summary
5 Notes on the FLAME APIs
   Video
      5.0.1 Outline
   5.1 Motivation
   5.2 Install FLAME@lab
   5.3 An Example: Gram-Schmidt Orthogonalization
      5.3.1 The Spark Webpage
      5.3.2 Implementing CGS with FLAME@lab
      5.3.3 Editing the code skeleton
      5.3.4 Testing
   5.4 Implementing the Other Algorithms
6 Notes on Householder QR Factorization
   6.1 Opening Remarks
      6.1.1 Launch
      6.1.2 Outline
      6.1.3 What you will learn
   6.2 Householder Transformations (Reflectors)
      6.2.1 The general case
      6.2.2 As implemented for the Householder QR factorization (real case)
      6.2.3 The complex case (optional)
      6.2.4 A routine for computing the Householder vector
   6.3 Householder QR Factorization
   6.4 Forming Q
   6.5 Applying Q^H
   6.6 Blocked Householder QR Factorization
      6.6.1 The UT transform: Accumulating Householder transformations
      6.6.2 The WY transform
      6.6.3 A blocked algorithm
      6.6.4 Variations on a theme
   6.7 Enrichments
   6.8 Wrapup
      6.8.1 Additional exercises
      6.8.2 Summary
7 Notes on Rank Revealing Householder QR Factorization
   7.1 Opening Remarks
      7.1.1 Launch
      7.1.2 Outline
      7.1.3 What you will learn
   7.2 Modifying MGS to Compute QR Factorization with Column Pivoting
   7.3 Unblocked Householder QR Factorization with Column Pivoting
      7.3.1 Basic algorithm
      7.3.2 Alternative unblocked Householder QR factorization with column pivoting
   7.4 Blocked HQRP
   7.5 Computing Q
   7.6 Enrichments
      7.6.1 QR factorization with randomization for column pivoting
   7.7 Wrapup
      7.7.1 Additional exercises
      7.7.2 Summary

8 Notes on Solving Linear Least-Squares Problems
      8.0.1 Launch
      8.0.2 Outline
      8.0.3 What you will learn
   8.1 The Linear Least-Squares Problem
   8.2 Method of Normal Equations
   8.3 Solving the LLS Problem Via the QR Factorization
      8.3.1 Simple derivation of the solution
      8.3.2 Alternative derivation of the solution
   8.4 Via Householder QR Factorization
   8.5 Via the Singular Value Decomposition
      8.5.1 Simple derivation of the solution
      8.5.2 Alternative derivation of the solution
   8.6 What If A Does Not Have Linearly Independent Columns?
   8.7 Exercise: Using the LQ factorization to solve underdetermined systems
   8.8 Wrapup
      8.8.1 Additional exercises
      8.8.2 Summary
9 Notes on the Condition of a Problem
   9.1 Opening Remarks
      9.1.1 Launch
      Video
      9.1.2 Outline
      9.1.3 What you will learn
   9.2 Notation
   9.3 The Prototypical Example: Solving a Linear System
   9.4 Condition Number of a Rectangular Matrix
   9.5 Why Using the Method of Normal Equations Could be Bad
   9.6 Why Multiplication with Unitary Matrices is a Good Thing
   9.7 Balancing a Matrix
   9.8 Wrapup
      9.8.1 Additional exercises
      9.8.2 Summary
10 Notes on the Stability of an Algorithm
      10.0.1 Launch
      10.0.2 Outline
      10.0.3 What you will learn
   10.1 Motivation
   10.2 Floating Point Numbers
   10.3 Notation
   10.4 Floating Point Computation
      10.4.1 Model of floating point computation
      10.4.2 Stability of a numerical algorithm
      10.4.3 Absolute value of vectors and matrices
   10.5 Stability of the Dot Product Operation
      10.5.1 An algorithm for computing DOT
      10.5.2 A simple start
      10.5.3 Preparation
      10.5.4 Target result
      10.5.5 A proof in traditional format
      10.5.6 A weapon of math induction for the war on (backward) error (optional)
      10.5.7 Results
   10.6 Stability of a Matrix-Vector Multiplication Algorithm
      10.6.1 An algorithm for computing GEMV
      10.6.2 Analysis
   10.7 Stability of a Matrix-Matrix Multiplication Algorithm
      10.7.1 An algorithm for computing GEMM
      10.7.2 Analysis
      10.7.3 An application
   10.8 Wrapup
      10.8.1 Additional exercises
      10.8.2 Summary

11 Notes on Performance
12 Notes on Gaussian Elimination and LU Factorization
   12.1 Opening Remarks
      12.1.1 Launch
      12.1.2 Outline
      12.1.3 What you will learn
   12.2 Definition and Existence
   12.3 LU Factorization
      12.3.1 First derivation
      12.3.2 Gauss transforms
      12.3.3 Cost of LU factorization
   12.4 LU Factorization with Partial Pivoting
      12.4.1 Permutation matrices
      12.4.2 The algorithm
   12.5 Proof of Theorem 12.3
   12.6 LU with Complete Pivoting
   12.7 Solving Ax = y Via the LU Factorization with Pivoting
   12.8 Solving Triangular Systems of Equations
      12.8.1 Lz = y
      12.8.2 Ux = z
   12.9 Other LU Factorization Algorithms
      12.9.1 Variant 1: Bordered algorithm
      12.9.2 Variant 2: Left-looking algorithm
      12.9.3 Variant 3: Up-looking variant
      12.9.4 Variant 4: Crout variant
      12.9.5 Variant 5: Classical LU factorization
      12.9.6 All algorithms
      12.9.7 Formal derivation of algorithms
   12.10 Numerical Stability Results
   12.11 Is LU with Partial Pivoting Stable?
   12.12 Blocked Algorithms
      12.12.1 Blocked classical LU factorization (Variant 5)
      12.12.2 Blocked classical LU factorization with pivoting (Variant 5)
   12.13 Variations on a Triple-Nested Loop
   12.14 Inverting a Matrix
      12.14.1 Basic observations
      12.14.2 Via the LU factorization with pivoting
      12.14.3 Gauss-Jordan inversion
      12.14.4 (Almost) never, ever invert a matrix
   12.15 Efficient Condition Number Estimation
      12.15.1 The problem
      12.15.2 Insights
      12.15.3 A simple approach
      12.15.4 Discussion
   12.16 Wrapup
      12.16.1 Additional exercises
      12.16.2 Summary
13 Notes on Cholesky Factorization
   13.1 Opening Remarks
      13.1.1 Launch
      13.1.2 Outline
      13.1.3 What you will learn
   13.2 Definition and Existence
   13.3 Application
   13.4 An Algorithm
   13.5 Proof of the Cholesky Factorization Theorem
   13.6 Blocked Algorithm
   13.7 Alternative Representation
   13.8 Cost
   13.9 Solving the Linear Least-Squares Problem via the Cholesky Factorization
   13.10 Other Cholesky Factorization Algorithms
   13.11 Implementing the Cholesky Factorization with the (Traditional) BLAS
      13.11.1 What are the BLAS?
      13.11.2 A simple implementation in Fortran
      13.11.3 Implementation with calls to level-1 BLAS
      13.11.4 Matrix-vector operations (level-2 BLAS)
      13.11.5 Matrix-matrix operations (level-3 BLAS)
      13.11.6 Impact on performance
   13.12 Alternatives to the BLAS
      13.12.1 The FLAME/C API
      13.12.2 BLIS
   13.13 Wrapup
      13.13.1 Additional exercises
      13.13.2 Summary
14 Notes on Eigenvalues and Eigenvectors
   Video
      14.0.1 Outline
   14.1 Definition
   14.2 The Schur and Spectral Factorizations
   14.3 Relation Between the SVD and the Spectral Decomposition

15 Notes on the Power Method and Related Methods
   Video
      15.0.1 Outline
   15.1 The Power Method
      15.1.1 First attempt
      15.1.2 Second attempt
      15.1.3 Convergence
      15.1.4 Practical Power Method
      15.1.5 The Rayleigh quotient
      15.1.6 What if |λ0| ≈ |λ1|?
   15.2 The Inverse Power Method
   15.3 Rayleigh-quotient Iteration
16 Notes on the QR Algorithm and other Dense Eigensolvers
    Video
    16.0.1 Outline
    16.1 Preliminaries
    16.2 Subspace Iteration
    16.3 The QR Algorithm
        16.3.1 A basic (unshifted) QR algorithm
        16.3.2 A basic shifted QR algorithm
    16.4 Reduction to Tridiagonal Form
        16.4.1 Householder transformations (reflectors)
        16.4.2 Algorithm
    16.5 The QR algorithm with a Tridiagonal Matrix
        16.5.1 Givens' rotations
    16.6 QR Factorization of a Tridiagonal Matrix
    16.7 The Implicitly Shifted QR Algorithm
        16.7.1 Upper Hessenberg and tridiagonal matrices
        16.7.2 The Implicit Q Theorem
        16.7.3 The Francis QR Step
        16.7.4 A complete algorithm
    16.8 Further Reading
        16.8.1 More on reduction to tridiagonal form
        16.8.2 Optimizing the tridiagonal QR algorithm
    16.9 Other Algorithms
        16.9.1 Jacobi's method for the symmetric eigenvalue problem
        16.9.2 Cuppen's Algorithm
        16.9.3 The Method of Multiple Relatively Robust Representations (MRRR)
    16.10 The Nonsymmetric QR Algorithm
        16.10.1 A variant of the Schur decomposition
        16.10.2 Reduction to upper Hessenberg form
        16.10.3 The implicitly double-shifted QR algorithm
17 Notes on the Method of Relatively Robust Representations (MRRR)
    17.0.1 Outline
    17.1 MRRR, from 35,000 Feet
    17.2 Cholesky Factorization, Again
    17.3 The $LDL^T$ Factorization
    17.4 The $UDU^T$ Factorization
    17.5 The $UDU^T$ Factorization
    17.6 The Twisted Factorization
    17.7 Computing an Eigenvector from the Twisted Factorization
18 Notes on Computing the SVD of a Matrix
    18.0.1 Outline
    18.1 Background
    18.2 Reduction to Bidiagonal Form
    18.3 The QR Algorithm with a Bidiagonal Matrix
    18.4 Putting it all together
Answers
    1. Notes on Simple Vector and Matrix Operations
    2. Notes on Vector and Matrix Norms
    3. Notes on Orthogonality and the SVD
    4. Notes on Gram-Schmidt QR Factorization
    6. Notes on Householder QR Factorization
    8. Notes on Solving Linear Least-squares Problems (Answers)
    8. Notes on the Condition of a Problem
    9. Notes on the Stability of an Algorithm
    10. Notes on Performance
    11. Notes on Gaussian Elimination and LU Factorization
    12. Notes on Cholesky Factorization
    13. Notes on Eigenvalues and Eigenvectors
    14. Notes on the Power Method and Related Methods
    16. Notes on the Symmetric QR Algorithm
    17. Notes on the Method of Relatively Robust Representations
    18. Notes on Computing the SVD
A How to Download

B LAFF Routines (FLAME@lab)
Preface

This document was created over the course of many years, as I periodically taught an introductory course
titled "Numerical Analysis: Linear Algebra," cross-listed in the departments of Computer Science, Math,
and Statistics and Data Sciences (SDS), as well as the Computational Science, Engineering, and Mathematics
(CSEM) graduate program.
Over the years, my colleagues and I have used many different books for this course: Matrix Computations
by Golub and Van Loan [22], Fundamentals of Matrix Computation by Watkins [47], Numerical Linear
Algebra by Trefethen and Bau [38], and Applied Numerical Linear Algebra by Demmel [12]. All are books
with tremendous strengths and depth. Nonetheless, I found myself writing materials for my students that
add insights that are often drawn from our own research experiences in the field. These became a series of
notes that are meant to supplement rather than replace any of the mentioned books.
Fundamental to our exposition is the FLAME notation [25, 40], which we use to present algorithms hand-in-hand with theory. For example, in Figure 1 (left), we present a commonly encountered LU factorization
algorithm, which we will see performs exactly the same computations as does Gaussian elimination. By
abstracting away from the detailed indexing that is required to implement an algorithm in, for example,
MATLAB's M-script language, the reader can focus on the mathematics that justifies the algorithm rather
than the indices used to express the algorithm. The algorithms can be easily translated into code with
the help of a FLAME Application Programming Interface (API). Such interfaces have been created for
C, M-script, and Python, to name a few [25, 6, 30]. In Figure 1 (right), we show the LU factorization
algorithm implemented with the FLAME@lab API for M-script. The C API is used extensively in our
implementation of the libflame dense linear algebra library [41, 42] for sequential and shared-memory
architectures and the notation also inspired the API used to implement the Elemental dense linear algebra
library [31] that targets distributed memory architectures. Thus, the reader is exposed to the abstractions
that experts use to translate algorithms to high-performance software.
These notes may at times appear to be a vanity project, since I often point the reader to our research papers
for a glimpse at the cutting edge of the field. The fact is that over the last two decades, we have helped
further the understanding of many of the topics discussed in a typical introductory course on numerical
linear algebra. Since we use the FLAME notation in many of our papers, they should be relatively easy to
read once one familiarizes oneself with these notes. Let's be blunt: these notes do not do the field justice
when it comes to also giving a historic perspective. For that, we recommend any of the above-mentioned
texts, or the wonderful books by G.W. Stewart [36, 37]. This is yet another reason why they should be
used to supplement other texts.

Algorithm: A := L\U = LU(A)

Partition  A -> ( ATL  ATR )
                ( ABL  ABR )      where ATL is 0 x 0
while n(ATL) < n(A) do
   Repartition
      ( ATL  ATR )      ( A00    a01      A02   )
      (            ) -> ( a10^T  alpha11  a12^T )
      ( ABL  ABR )      ( A20    a21      A22   )

      a21 := a21 / alpha11
      A22 := A22 - a21 a12^T

   Continue with
      ( ATL  ATR )      ( A00    a01      A02   )
      (            ) <- ( a10^T  alpha11  a12^T )
      ( ABL  ABR )      ( A20    a21      A22   )
endwhile

function [ A_out ] = LU_unb_var4( A )

  [ ATL, ATR, ...
    ABL, ABR ] = FLA_Part_2x2( A, ...
                               0, 0, FLA_TL );

  while ( size( ATL, 1 ) < size( A, 1 ) )
    [ A00,  a01,     A02, ...
      a10t, alpha11, a12t, ...
      A20,  a21,     A22 ] = ...
        FLA_Repart_2x2_to_3x3( ...
          ATL, ATR, ...
          ABL, ABR, 1, 1, FLA_BR );
    %------------------------------------------%
    a21 = a21 / alpha11;
    A22 = A22 - a21 * a12t;
    %------------------------------------------%
    [ ATL, ATR, ...
      ABL, ABR ] = ...
        FLA_Cont_with_3x3_to_2x2( ...
          A00,  a01,     A02, ...
          a10t, alpha11, a12t, ...
          A20,  a21,     A22, FLA_TL );
  end

  A_out = [ ATL, ATR
            ABL, ABR ];
return

Figure 1: LU factorization represented with the FLAME notation and the FLAME@lab API.
The order of the chapters should not be taken too seriously. They were initially written as separate,
relatively self-contained notes. The sequence on which I settled roughly follows the order that the topics
are encountered in Numerical Linear Algebra by Trefethen and Bau [38]. The reason is the same as the
one they give: It introduces orthogonality and the Singular Value Decomposition early on, leading with
important material that many students will not yet have seen in the undergraduate linear algebra classes
they took. However, one could just as easily rearrange the chapters so that one starts with a more traditional
topic: solving dense linear systems.
The notes frequently refer the reader to another resource of ours titled Linear Algebra: Foundations to
Frontiers - Notes to LAFF With (LAFF Notes) [30]. This is a 900+ page document with more than 270
videos that was created for the Massive Open Online Course (MOOC) Linear Algebra: Foundations
to Frontiers (LAFF), offered by the edX platform. That course provides an appropriate undergraduate
background for these notes.
I (tried to) video tape my lectures during Fall 2014. Unlike the many short videos that we created for the
Massive Open Online Course (MOOC) titled Linear Algebra: Foundations to Frontiers that are now part
of the notes for that course, I simply set up a camera, taped the entire lecture, spent minimal time editing,
and uploaded the result for the world to see. Worse, I did not prepare particularly well for the lectures,
other than feverishly writing these notes in the days prior to the presentation. Sometimes, I forgot to turn
on the microphone and/or the camera. Sometimes the memory of the camcorder was full. Sometimes I

forgot to shave. Often I forgot to comb my hair. You are welcome to watch, but don't expect too much!
One should consider this a living document. As time goes on, I will be adding more material. Ideally,
people with more expertise than I have on, for example, solving sparse linear systems will contribute
notes of their own.

Acknowledgments

These notes use notation and tools developed by the FLAME project at The University of Texas at Austin
(USA), Universidad Jaume I (Spain), and RWTH Aachen University (Germany). This project involves
a large and ever expanding number of very talented people, to whom I am indebted. Over the years, it
has been supported by a number of grants from the National Science Foundation and industry. The most
recent and most relevant funding came from NSF Award ACI-1148125 titled "SI2-SSI: A Linear Algebra
Software Infrastructure for Sustained Innovation in Computational Chemistry and other Sciences."
In Texas, behind every successful man there is a woman who really pulls the strings. For more than thirty
years, my research, teaching, and other pedagogical activities have been greatly influenced by my wife,
Dr. Maggie Myers. For parts of these notes that are particularly successful, the credit goes to her. Where
they fall short, the blame is all mine!

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation (NSF).


Chapter 0
Notes on Setting Up

0.1 Opening Remarks

0.1.1 Launch


0.1.2 Outline

0.1 Opening Remarks
    0.1.1 Launch
    0.1.2 Outline
    0.1.3 What You Will Learn
0.2 Setting Up to Learn
    0.2.1 How to Navigate These Materials
    0.2.2 Setting Up Your Computer
0.3 MATLAB
    0.3.1 Why MATLAB
    0.3.2 Installing MATLAB
    0.3.3 MATLAB Basics
0.4 Wrapup
    0.4.1 Additional Homework
    0.4.2 Summary

0.1.3 What You Will Learn


0.2 Setting Up to Learn

0.2.1 How to Navigate These Materials

To be filled in.

0.2.2 Setting Up Your Computer

It helps if we all set up our environment in a consistent fashion. To achieve this, take the following
steps:
Download the file LAFF-NLA.zip to a directory on your computer. (I chose to place it in my home
directory, rvdg.)
Unzip this file. This will create the directory LAFF-NLA.
The above steps create a directory structure with various files as illustrated in Figure 1.
You will want to put this PDF in that directory in the indicated place! Opening it with * Acrobat
Reader will ensure that hyperlinks work properly.
This is a work in progress. When instructed, you will want to load new versions of this PDF,
replacing the existing one. There will also be other files you will be asked to place in various
subdirectories.

0.3 MATLAB

0.3.1 Why MATLAB

We use MATLAB as a tool because it was invented to support learning about matrix computations. You
will find that the syntax of the language used by MATLAB, called M-script, very closely resembles the
mathematical expressions in linear algebra.
Those not willing to invest in MATLAB will want to consider * GNU Octave instead.

0.3.2 Installing MATLAB

All students at UT-Austin can get a free MATLAB license. Let's discuss on Piazza where to find it.
Everyone else: you can find instructions on how to purchase and install MATLAB from MathWorks.

0.3.3 MATLAB Basics

Below you find a few short videos that introduce you to MATLAB. For a more comprehensive tutorial,
you may want to visit * MATLAB Tutorials at MathWorks and click "Launch Tutorial".
HOWEVER, you need very little familiarity with MATLAB in order to learn what we want you to
learn about how abstraction in mathematics is linked to abstraction in algorithms. So, you could just skip


Users
  rvdg
    LAFF-NLA
      LAFF-NLA.pdf ............ The PDF for the document you are reading should be placed here.
      Programming
        laff .................. Subdirectory in which a small library that we will use resides.
          matvec
          matmat
          util
          vecvec
        chapter01 ............. Subdirectory for the coding assignments for Chapter 1.
        chapter01 answers ..... Subdirectory with answers to coding assignments for Chapter 1.
        ...
        chapter20 ............. Subdirectory for the coding assignments for Chapter 20.
        chapter20 answers ..... Subdirectory with answers for the coding assignments for Chapter 20.
      Videos
        Videos-Chapter00.zip .. Place this file here before unzipping (if you choose to download videos).
        Videos-Chapter00 ...... Subdirectory created when you unzip Videos-Chapter00.zip.
        ...
        Videos-Chapter20.zip .. Place this file here before unzipping (if you choose to download videos).
        Videos-Chapter20 ...... Subdirectory created when you unzip Videos-Chapter20.zip.
      PictureFLAME ............ Web tool that allows you to visualize an implementation as it executes.
      Spark ................... Web tool with which you will translate algorithms into code using the
                                FLAME@lab API.

Figure 1: Directory structure for the LAFF-NLA materials.


these tutorials altogether, and come back to them if you find you want to know more about MATLAB and
its programming language (M-script).
What is MATLAB?
* YouTube  * Downloaded Video

The MATLAB Environment
* YouTube  * Downloaded Video

MATLAB Variables
* YouTube  * Downloaded Video

MATLAB as a Calculator
* YouTube  * Downloaded Video

The Origins of MATLAB
* Video at MathWorks.com

0.4 Wrapup

0.4.1 Additional Homework

For a typical week, additional assignments may be given in this unit.

0.4.2 Summary

You will see that we develop a lot of the theory behind the various topics in linear algebra via a sequence
of homework exercises. At the end of each week, we summarize theorems and insights for easy reference.


Chapter 1
Notes on Simple Vector and Matrix Operations

1.1 Opening Remarks

1.1.1 Launch

We assume that the reader is quite familiar with vectors, linear transformations, and matrices. If not, we
suggest that the reader review the first five weeks of
Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].
Since undergraduate courses tend to focus on real valued matrices and vectors, we mostly focus on the
case where they are complex valued as we review.
Read the disclaimer regarding the videos in the preface!
The first video below is actually for the first lecture of the semester.
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)
(In the downloadable version, the lecture starts about 27 seconds into the video. I was still learning
how to do the editing...) The video for this particular note didn't turn out, so you will want to read instead.
As I pointed out in the preface: these videos aren't refined by any measure!


1.1.2 Outline

1.1 Opening Remarks
    1.1.1 Launch
    1.1.2 Outline
    1.1.3 What You Will Learn
1.2 Notation
1.3 (Hermitian) Transposition
    1.3.1 Conjugating a complex scalar
    1.3.2 Conjugate of a vector
    1.3.3 Conjugate of a matrix
    1.3.4 Transpose of a vector
    1.3.5 Hermitian transpose of a vector
    1.3.6 Transpose of a matrix
    1.3.7 Hermitian transpose (adjoint) of a matrix
    1.3.8 Exercises
1.4 Vector-vector Operations
    1.4.1 Scaling a vector (scal)
    1.4.2 Scaled vector addition (axpy)
    1.4.3 Dot (inner) product (dot)
1.5 Matrix-vector Operations
    1.5.1 Matrix-vector multiplication (product)
    1.5.2 Rank-1 update
1.6 Matrix-matrix multiplication (product)
    1.6.1 Element-by-element computation
    1.6.2 Via matrix-vector multiplications
    1.6.3 Via row-vector times matrix multiplications
    1.6.4 Via rank-1 updates
1.7 Enrichments
    1.7.1 The Basic Linear Algebra Subprograms (BLAS)
1.8 Wrapup
    1.8.1 Additional exercises
    1.8.2 Summary


1.1.3 What You Will Learn


1.2 Notation

Throughout our notes we will adopt notation popularized by Alston Householder. As a rule, we will use
lower case Greek letters ($\alpha$, $\beta$, etc.) to denote scalars. For vectors, we will use lower case Roman letters
($a$, $b$, etc.). Matrices are denoted by upper case Roman letters ($A$, $B$, etc.).
If $x \in \mathbb{C}^n$ and $A \in \mathbb{C}^{m \times n}$, then we expose the elements of $x$ and $A$ as
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
\quad \mbox{and} \quad
A = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix}.
\]
If vector $x$ is partitioned into $N$ subvectors, we may denote this by
\[
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}.
\]
It is possible for a subvector to be of size zero (no elements) or one (a scalar). If we want to emphasize
that a specific subvector is a scalar, then we may choose to use a lower case Greek letter for that scalar, as
in
\[
x = \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}.
\]
We will see frequent examples where a matrix, $A \in \mathbb{C}^{m \times n}$, is partitioned into columns or rows:
\[
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
\]
Here we add the $\widehat{\phantom{a}}$ to be able to distinguish between the identifiers for the columns and rows. When from
context it is obvious whether we refer to a row or column, we may choose to skip the $\widehat{\phantom{a}}$. Sometimes, we
partition matrices into submatrices:
\[
A = \begin{pmatrix} A_0 & A_1 & \cdots & A_{N-1} \end{pmatrix}
  = \begin{pmatrix} \widehat A_0 \\ \widehat A_1 \\ \vdots \\ \widehat A_{M-1} \end{pmatrix}
  = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix}.
\]


1.3 (Hermitian) Transposition

1.3.1 Conjugating a complex scalar

Recall that if $\alpha = \alpha_r + i \alpha_c$, then its (complex) conjugate is given by
\[
\bar\alpha = \alpha_r - i \alpha_c
\]
and its length (absolute value) by
\[
|\alpha| = |\alpha_r + i \alpha_c| = \sqrt{\alpha_r^2 + \alpha_c^2}
= \sqrt{(\alpha_r + i \alpha_c)(\alpha_r - i \alpha_c)}
= \sqrt{\bar\alpha \alpha} = \sqrt{\alpha \bar\alpha} = |\bar\alpha|.
\]

1.3.2 Conjugate of a vector

The (complex) conjugate of $x$ is given by
\[
\bar x = \begin{pmatrix} \bar\chi_0 \\ \bar\chi_1 \\ \vdots \\ \bar\chi_{n-1} \end{pmatrix}.
\]

1.3.3 Conjugate of a matrix

The (complex) conjugate of $A$ is given by
\[
\bar A = \begin{pmatrix}
\bar\alpha_{0,0} & \bar\alpha_{0,1} & \cdots & \bar\alpha_{0,n-1} \\
\bar\alpha_{1,0} & \bar\alpha_{1,1} & \cdots & \bar\alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\bar\alpha_{m-1,0} & \bar\alpha_{m-1,1} & \cdots & \bar\alpha_{m-1,n-1}
\end{pmatrix}.
\]

1.3.4 Transpose of a vector

The transpose of $x$ is given by
\[
x^T = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}^T
    = \begin{pmatrix} \chi_0 & \chi_1 & \cdots & \chi_{n-1} \end{pmatrix}.
\]
Notice that transposing a (column) vector rearranges its elements to make a row vector.

1.3.5 Hermitian transpose of a vector

The Hermitian transpose of $x$ is given by
\[
x^H (= x^c) = (\bar x)^T
= \begin{pmatrix} \bar\chi_0 \\ \bar\chi_1 \\ \vdots \\ \bar\chi_{n-1} \end{pmatrix}^T
= \begin{pmatrix} \bar\chi_0 & \bar\chi_1 & \cdots & \bar\chi_{n-1} \end{pmatrix}.
\]
In other words, taking the Hermitian transpose of a vector is equivalent to taking the conjugate of each of
its elements and rearranging them into a row vector.

1.3.6 Transpose of a matrix

The transpose of $A$ is given by
\[
A^T = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix}^T
= \begin{pmatrix}
\alpha_{0,0} & \alpha_{1,0} & \cdots & \alpha_{m-1,0} \\
\alpha_{0,1} & \alpha_{1,1} & \cdots & \alpha_{m-1,1} \\
\vdots & \vdots & & \vdots \\
\alpha_{0,n-1} & \alpha_{1,n-1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix}.
\]
In other words, transposing a matrix swaps its rows and columns: element $(i,j)$ of $A^T$ equals element $(j,i)$ of $A$.

1.3.7 Hermitian transpose (adjoint) of a matrix

The Hermitian transpose of $A$ is given by
\[
A^H = \bar A^T = \begin{pmatrix}
\bar\alpha_{0,0} & \bar\alpha_{0,1} & \cdots & \bar\alpha_{0,n-1} \\
\bar\alpha_{1,0} & \bar\alpha_{1,1} & \cdots & \bar\alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\bar\alpha_{m-1,0} & \bar\alpha_{m-1,1} & \cdots & \bar\alpha_{m-1,n-1}
\end{pmatrix}^T
= \begin{pmatrix}
\bar\alpha_{0,0} & \bar\alpha_{1,0} & \cdots & \bar\alpha_{m-1,0} \\
\bar\alpha_{0,1} & \bar\alpha_{1,1} & \cdots & \bar\alpha_{m-1,1} \\
\vdots & \vdots & & \vdots \\
\bar\alpha_{0,n-1} & \bar\alpha_{1,n-1} & \cdots & \bar\alpha_{m-1,n-1}
\end{pmatrix}.
\]
In various texts you may see $A^H$ denoted by $A^c$ or $A^*$ instead.
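To make these operations concrete, here is a small sketch of ours (not part of the original notes, and in Python rather than M-script so it runs anywhere): a matrix is stored as a list of rows of complex numbers, and the conjugate, transpose, and Hermitian transpose are built exactly as defined above.

```python
# Sketch (ours, not from the notes): conjugate, transpose, and Hermitian
# transpose of a matrix stored as a list of rows of Python complex numbers.

def conjugate(A):
    # Element (i,j) of conj(A) is the conjugate of alpha_{i,j}.
    return [[alpha.conjugate() for alpha in row] for row in A]

def transpose(A):
    # Element (i,j) of A^T is element (j,i) of A.
    return [list(col) for col in zip(*A)]

def hermitian_transpose(A):
    # A^H = (conj(A))^T.
    return transpose(conjugate(A))

A = [[1 + 2j, 3 - 1j],
     [0 + 1j, 4 + 0j]]
print(hermitian_transpose(A))
```

A quick sanity check with this sketch: applying the Hermitian transpose twice gives back the original matrix.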

1.3.8 Exercises

Homework 1.1 Partition $A$
\[
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
\]
Convince yourself that the following hold:
\[
\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}^T
= \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{n-1}^T \end{pmatrix}.
\]
\[
\begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^T
= \begin{pmatrix} \widehat a_0 & \widehat a_1 & \cdots & \widehat a_{m-1} \end{pmatrix}.
\]
\[
\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}^H
= \begin{pmatrix} a_0^H \\ a_1^H \\ \vdots \\ a_{n-1}^H \end{pmatrix}.
\]
\[
\begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^H
= \begin{pmatrix} \overline{\widehat a_0} & \overline{\widehat a_1} & \cdots & \overline{\widehat a_{m-1}} \end{pmatrix}.
\]
* SEE ANSWER

Homework 1.2 Partition $x$ into subvectors:
\[
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}.
\]
Convince yourself that the following hold:
\[
\bar x = \begin{pmatrix} \bar x_0 \\ \bar x_1 \\ \vdots \\ \bar x_{N-1} \end{pmatrix}.
\]
\[
x^T = \begin{pmatrix} x_0^T & x_1^T & \cdots & x_{N-1}^T \end{pmatrix}.
\]
\[
x^H = \begin{pmatrix} x_0^H & x_1^H & \cdots & x_{N-1}^H \end{pmatrix}.
\]
* SEE ANSWER

Homework 1.3 Partition $A$
\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix},
\]
where $A_{i,j} \in \mathbb{C}^{m_i \times n_j}$. Here $\sum_{i=0}^{M-1} m_i = m$ and $\sum_{j=0}^{N-1} n_j = n$.
Convince yourself that the following hold:
\[
A^T = \begin{pmatrix}
A_{0,0}^T & A_{1,0}^T & \cdots & A_{M-1,0}^T \\
A_{0,1}^T & A_{1,1}^T & \cdots & A_{M-1,1}^T \\
\vdots & \vdots & & \vdots \\
A_{0,N-1}^T & A_{1,N-1}^T & \cdots & A_{M-1,N-1}^T
\end{pmatrix}.
\]
\[
\bar A = \begin{pmatrix}
\bar A_{0,0} & \bar A_{0,1} & \cdots & \bar A_{0,N-1} \\
\bar A_{1,0} & \bar A_{1,1} & \cdots & \bar A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
\bar A_{M-1,0} & \bar A_{M-1,1} & \cdots & \bar A_{M-1,N-1}
\end{pmatrix}.
\]
\[
A^H = \begin{pmatrix}
A_{0,0}^H & A_{1,0}^H & \cdots & A_{M-1,0}^H \\
A_{0,1}^H & A_{1,1}^H & \cdots & A_{M-1,1}^H \\
\vdots & \vdots & & \vdots \\
A_{0,N-1}^H & A_{1,N-1}^H & \cdots & A_{M-1,N-1}^H
\end{pmatrix}.
\]
* SEE ANSWER

1.4 Vector-vector Operations

1.4.1 Scaling a vector (scal)

Let $x \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$, with
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}.
\]
Then $\alpha x$ equals the vector $x$ stretched by a factor $\alpha$:
\[
\alpha x = \alpha \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
= \begin{pmatrix} \alpha \chi_0 \\ \alpha \chi_1 \\ \vdots \\ \alpha \chi_{n-1} \end{pmatrix}.
\]
If $y := \alpha x$ with
\[
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix},
\]
then the following loop computes $y$:

    for i := 0, . . . , n - 1
        psi_i := alpha * chi_i
    endfor

Homework 1.4 Convince yourself of the following:
\[
\alpha x = \begin{pmatrix} \alpha\chi_0 & \alpha\chi_1 & \cdots & \alpha\chi_{n-1} \end{pmatrix}^T.
\]
\[
(\alpha x)^T = \alpha x^T.
\]
\[
(\alpha x)^H = \bar\alpha x^H.
\]
\[
\alpha \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
= \begin{pmatrix} \alpha x_0 \\ \alpha x_1 \\ \vdots \\ \alpha x_{N-1} \end{pmatrix}.
\]
* SEE ANSWER

Cost

Scaling a vector of size $n$ requires, approximately, $n$ multiplications. Each of these becomes a floating
point operation (flop) when the computation is performed as part of an algorithm executed on a computer
that performs floating point computation. We will thus say that scaling a vector costs $n$ flops.
It should be noted that arithmetic with complex numbers is roughly 4 times as expensive as arithmetic
with real numbers. In the chapter "Notes on Performance" (page 209) we also discuss that the cost of
moving data impacts the cost of a flop. Thus, not all flops are created equal!
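The loop above translates directly into code. The following is a minimal sketch of ours (not from the notes; in Python rather than M-script). Because it works element by element, it applies to real or complex $\alpha$ and $x$ alike.

```python
# Sketch (ours, not from the notes): y := alpha * x, element by element,
# mirroring the loop  "for i := 0, ..., n-1:  psi_i := alpha * chi_i".

def scal(alpha, x):
    return [alpha * chi for chi in x]

print(scal(2.0, [1.0, -2.0, 3.0]))
```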

1.4.2 Scaled vector addition (axpy)

Let $x, y \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$, with
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
\quad \mbox{and} \quad
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}.
\]
Then $\alpha x + y$ equals the vector
\[
\alpha x + y
= \alpha \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
+ \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}
= \begin{pmatrix} \alpha\chi_0 + \psi_0 \\ \alpha\chi_1 + \psi_1 \\ \vdots \\ \alpha\chi_{n-1} + \psi_{n-1} \end{pmatrix}.
\]
This operation is known as the axpy operation: alpha times x plus y. Typically, the vector $y$ is overwritten
with the result:
\[
y := \alpha x + y
= \begin{pmatrix} \alpha\chi_0 + \psi_0 \\ \alpha\chi_1 + \psi_1 \\ \vdots \\ \alpha\chi_{n-1} + \psi_{n-1} \end{pmatrix}
\]
so that the following loop updates $y$:

    for i := 0, . . . , n - 1
        psi_i := alpha * chi_i + psi_i
    endfor

Homework 1.5 Convince yourself of the following:
\[
\alpha \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
+ \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
= \begin{pmatrix} \alpha x_0 + y_0 \\ \alpha x_1 + y_1 \\ \vdots \\ \alpha x_{N-1} + y_{N-1} \end{pmatrix}.
\quad \mbox{(Provided $x_i, y_i \in \mathbb{C}^{n_i}$ and $\sum_{i=0}^{N-1} n_i = n$.)}
\]
* SEE ANSWER

Cost

The axpy operation with two vectors of size $n$ requires, approximately, $n$ multiplies and $n$ additions, or $2n$ flops.
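As with scal, the axpy loop is a one-liner in code. A sketch of ours (not from the notes; Python rather than M-script); one multiply and one add per element accounts for the $2n$ flops.

```python
# Sketch (ours, not from the notes): y := alpha * x + y, mirroring the loop
#   "for i := 0, ..., n-1:  psi_i := alpha * chi_i + psi_i".

def axpy(alpha, x, y):
    assert len(x) == len(y)
    return [alpha * chi + psi for chi, psi in zip(x, y)]

print(axpy(2.0, [1.0, 2.0], [10.0, 10.0]))
```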

1.4.3 Dot (inner) product (dot)

Let $x, y \in \mathbb{C}^n$ with
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
\quad \mbox{and} \quad
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}.
\]
Then the dot product of $x$ and $y$ is defined by
\[
x^H y = \begin{pmatrix} \bar\chi_0 & \bar\chi_1 & \cdots & \bar\chi_{n-1} \end{pmatrix}
\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}
= \bar\chi_0 \psi_0 + \bar\chi_1 \psi_1 + \cdots + \bar\chi_{n-1} \psi_{n-1}
= \sum_{i=0}^{n-1} \bar\chi_i \psi_i.
\]
The following loop computes $\alpha := x^H y$:

    alpha := 0
    for i := 0, . . . , n - 1
        alpha := conj(chi_i) * psi_i + alpha
    endfor

Homework 1.6 Convince yourself of the following:
\[
\begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}^H
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
= \sum_{i=0}^{N-1} x_i^H y_i.
\quad \mbox{(Provided $x_i, y_i \in \mathbb{C}^{n_i}$ and $\sum_{i=0}^{N-1} n_i = n$.)}
\]
* SEE ANSWER

Homework 1.7 Prove that $x^H y = \overline{y^H x}$.
* SEE ANSWER

As we discuss matrix-vector multiplication and matrix-matrix multiplication, the closely related operation $x^T y$ is also useful:
\[
x^T y = \begin{pmatrix} \chi_0 & \chi_1 & \cdots & \chi_{n-1} \end{pmatrix}
\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}
= \chi_0 \psi_0 + \chi_1 \psi_1 + \cdots + \chi_{n-1} \psi_{n-1}
= \sum_{i=0}^{n-1} \chi_i \psi_i.
\]
We will sometimes refer to this operation as a dot product since it is like the dot product $x^H y$, but without
conjugation. In the case of real valued vectors, it is the dot product.
Cost

The dot product of two vectors of size $n$ requires, approximately, $n$ multiplies and $n$ additions. Thus, a dot
product costs, approximately, $2n$ flops.
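Both flavors of dot product are easy to sketch in code. The following is ours (not from the notes; Python rather than M-script). For real inputs the two functions agree; for complex inputs only the first conjugates the elements of $x$.

```python
# Sketch (ours, not from the notes): x^H y (with conjugation) and
# x^T y (without), accumulated element by element as in the loop above.

def dot(x, y):
    # Accumulate conj(chi_i) * psi_i.  Python floats and ints also have
    # a .conjugate() method, so this works for real inputs too.
    alpha = 0
    for chi, psi in zip(x, y):
        alpha = chi.conjugate() * psi + alpha
    return alpha

def dott(x, y):
    # The "dot" without conjugation, x^T y.
    return sum(chi * psi for chi, psi in zip(x, y))

print(dot([1.0, 2.0], [3.0, 4.0]))
```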

1.5 Matrix-vector Operations

1.5.1 Matrix-vector multiplication (product)

Be sure to understand the relation between linear transformations and matrix-vector multiplication by
reading Week 2 of
Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].
Let $y \in \mathbb{C}^m$, $A \in \mathbb{C}^{m \times n}$, and $x \in \mathbb{C}^n$ with
\[
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}, \quad
A = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix},
\quad \mbox{and} \quad
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}.
\]
Then $y = Ax$ means that
\[
\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}
= \begin{pmatrix}
\alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\
\alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\
\vdots \\
\alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1}
\end{pmatrix}.
\]
This is the definition of the matrix-vector product $Ax$, sometimes referred to as a general matrix-vector
multiplication (gemv) when no special structure or properties of $A$ are assumed.
Now, partition
\[
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
\]
Focusing on how $A$ can be partitioned by columns, we find that
\[
y = Ax = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
\begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
= a_0 \chi_0 + a_1 \chi_1 + \cdots + a_{n-1} \chi_{n-1}
= \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}
= \chi_{n-1} a_{n-1} + ( \cdots + (\chi_1 a_1 + (\chi_0 a_0 + 0)) \cdots ),
\]
where $0$ denotes the zero vector of size $m$. This suggests the following loop for computing $y := Ax$:

    y := 0
    for j := 0, . . . , n - 1
        y := chi_j * a_j + y        (axpy)
    endfor

In Figure 1.1 (left), we present this algorithm using the FLAME notation (LAFF Notes Week 3 [30]).
Focusing on how $A$ can be partitioned by rows, we find that
\[
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}
= Ax
= \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix} x
= \begin{pmatrix} \widehat a_0^T x \\ \widehat a_1^T x \\ \vdots \\ \widehat a_{m-1}^T x \end{pmatrix}.
\]
This suggests the following loop for computing $y := Ax$:

    y := 0
    for i := 0, . . . , m - 1
        psi_i := a_i^T x + psi_i    (dot)
    endfor

Here we use the term dot because for complex valued matrices it is not really a dot product. In Figure 1.1
(right), we present this algorithm using the FLAME notation (LAFF Notes Week 3 [30]).
It is important to notice that this first matrix-vector operation (matrix-vector multiplication) can be
layered upon vector-vector operations (axpy or dot).

Cost

Matrix-vector multiplication with an $m \times n$ matrix costs, approximately, $2mn$ flops. This can be easily
argued in three different ways:
- The computation requires a multiply and an add with each element of the matrix. There are $mn$ such
elements.
- The operation can be computed via $n$ axpy operations, each of which requires $2m$ flops.
- The operation can be computed via $m$ dot operations, each of which requires $2n$ flops.

Algorithm: [y] := MVMULT_UNB_VAR1(A, x, y)

    Partition  A -> ( AL | AR ),  x -> ( xT )
                                       ( xB )
      where AL has 0 columns, xT has 0 elements
    while n(AL) < n(A) do
      Repartition
        ( AL | AR ) -> ( A0 | a1 | A2 ),  ( xT ) -> ( x0   )
                                          ( xB )    ( chi1 )
                                                    ( x2   )
        y := chi1 * a1 + y
      Continue with
        ( AL | AR ) <- ( A0 a1 | A2 ),  ( xT ) <- ( x0 chi1 )
                                        ( xB )    ( x2      )
    endwhile

Algorithm: [y] := MVMULT_UNB_VAR2(A, x, y)

    Partition  A -> ( AT ),  y -> ( yT )
                    ( AB )        ( yB )
      where AT has 0 rows, yT has 0 elements
    while m(AT) < m(A) do
      Repartition
        ( AT ) -> ( A0   ),  ( yT ) -> ( y0   )
        ( AB )    ( a1^T )   ( yB )    ( psi1 )
                  ( A2   )             ( y2   )
        psi1 := a1^T x + psi1
      Continue with
        ( AT ) <- ( A0 a1^T ),  ( yT ) <- ( y0 psi1 )
        ( AB )    ( A2      )   ( yB )    ( y2      )
    endwhile

Figure 1.1: Matrix-vector multiplication algorithms for y := Ax + y. Left: via axpy operations (by
columns). Right: via dot products (by rows).

Homework 1.8 Follow the instructions in the video below to implement the two algorithms for computing
y := Ax + y, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B.
You can visualize the algorithm with PictureFLAME.

* YouTube
* Downloaded
Video
* SEE ANSWER

1.5.2    Rank-1 update
Let y ∈ C^m, A ∈ C^{m×n}, and x ∈ C^n with
$$
y = \left(\begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{array}\right),\quad
A = \left(\begin{array}{cccc}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1}\\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{array}\right),\quad\text{and}\quad
x = \left(\begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{array}\right).
$$

The outer product of y and x is given by

yxT




1
1
1

= . . = . 0 1 n1
.. ..
..

m1
n1
m1

0 0
0 1 0 n1

1 0
1 1
1 n1

=
.
..
..
..

.
.
.

m1 0 m1 1 m1 n1

Also,
$$
yx^T = y \left(\begin{array}{cccc} \chi_0 & \chi_1 & \cdots & \chi_{n-1} \end{array}\right)
= \left(\begin{array}{cccc} \chi_0 y & \chi_1 y & \cdots & \chi_{n-1} y \end{array}\right).
$$
This shows that all columns are multiples of the vector y. Finally,

yxT = .
..

m1

0 xT

T 1 x
x =
..

m1 xT

which shows that all columns are a multiple of row vector xT . This motivates the observation that the
matrix yxT has rank at most equal to one (LAFF Notes Week 10 [30]).
The operation A := yx^T + A is called a rank-1 update to matrix A and is often referred to as a general rank-1 update (ger):

$$
\left(\begin{array}{cccc}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1}\\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{array}\right)
:=
\left(\begin{array}{cccc}
\psi_0\chi_0 + \alpha_{0,0} & \psi_0\chi_1 + \alpha_{0,1} & \cdots & \psi_0\chi_{n-1} + \alpha_{0,n-1}\\
\psi_1\chi_0 + \alpha_{1,0} & \psi_1\chi_1 + \alpha_{1,1} & \cdots & \psi_1\chi_{n-1} + \alpha_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\psi_{m-1}\chi_0 + \alpha_{m-1,0} & \psi_{m-1}\chi_1 + \alpha_{m-1,1} & \cdots & \psi_{m-1}\chi_{n-1} + \alpha_{m-1,n-1}
\end{array}\right).
$$


Now, partition
$$
A = \left(\begin{array}{cccc} a_0 & a_1 & \cdots & a_{n-1} \end{array}\right)
  = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right).
$$

Focusing on how A can be partitioned by columns, we find that
$$
yx^T + A = \left(\begin{array}{cccc} \chi_0 y & \chi_1 y & \cdots & \chi_{n-1} y \end{array}\right)
+ \left(\begin{array}{cccc} a_0 & a_1 & \cdots & a_{n-1} \end{array}\right)
= \left(\begin{array}{cccc} \chi_0 y + a_0 & \chi_1 y + a_1 & \cdots & \chi_{n-1} y + a_{n-1} \end{array}\right).
$$
Notice that each column is updated with an axpy operation. This suggests the following loop for computing A := yxT + A:
for j := 0, …, n−1
    a_j := χ_j y + a_j        (axpy)
endfor

In Figure 1.2 (left), we present this algorithm using the FLAME notation (LAFF Notes Week 3).
Focusing on how A can be partitioned by rows, we find that
$$
yx^T + A = \left(\begin{array}{c} \psi_0 x^T \\ \psi_1 x^T \\ \vdots \\ \psi_{m-1} x^T \end{array}\right)
+ \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right)
= \left(\begin{array}{c} \psi_0 x^T + \widehat a_0^T \\ \psi_1 x^T + \widehat a_1^T \\ \vdots \\ \psi_{m-1} x^T + \widehat a_{m-1}^T \end{array}\right).
$$
Notice that each row is updated with an axpy operation. This suggests the following loop for computing
A := yx^T + A:
for i := 0, …, m−1
    â_i^T := ψ_i x^T + â_i^T        (axpy)
endfor
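As before, the notes implement this in MATLAB with the laff routines; here is a plain-Python sketch of the two ger loops (our naming, lists of lists standing in for matrices):

```python
def ger_by_columns(y, x, A):
    """A := y x^T + A, one column at a time: a_j := chi_j * y + a_j."""
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m):            # axpy on column j
            A[i][j] += x[j] * y[i]
    return A

def ger_by_rows(y, x, A):
    """A := y x^T + A, one row at a time: a_i^T := psi_i * x^T + a_i^T."""
    for i, a_i in enumerate(A):
        for j in range(len(a_i)):     # axpy on row i
            a_i[j] += y[i] * x[j]
    return A
```

Either ordering performs one multiply-add per element of A, again 2mn flops.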


Algorithm: [A] := RANK1_UNB_VAR1(y, x, A)            (left: by columns)
  Partition x → ( xT / xB ), A → ( AL | AR )
    where xT has 0 elements, AL has 0 columns
  while m(xT) < m(x) do
    Repartition
      ( xT / xB ) → ( x0 / χ1 / x2 ),  ( AL | AR ) → ( A0 | a1 | A2 )
    a1 := χ1 y + a1        (axpy)
    Continue with
      ( xT / xB ) ← ( x0 χ1 / x2 ),  ( AL | AR ) ← ( A0 a1 | A2 )
  endwhile

Algorithm: [A] := RANK1_UNB_VAR2(y, x, A)            (right: by rows)
  Partition y → ( yT / yB ), A → ( AT / AB )
    where yT has 0 elements, AT has 0 rows
  while m(yT) < m(y) do
    Repartition
      ( yT / yB ) → ( y0 / ψ1 / y2 ),  ( AT / AB ) → ( A0 / a1^T / A2 )
    a1^T := ψ1 x^T + a1^T        (axpy)
    Continue with
      ( yT / yB ) ← ( y0 ψ1 / y2 ),  ( AT / AB ) ← ( A0 a1^T / A2 )
  endwhile

Figure 1.2: Rank-1 update algorithms for computing A := yxT + A. Left: by columns. Right: by rows.
In Figure 1.2 (right), we present this algorithm using the FLAME notation (LAFF Notes Week 3).
Again, it is important to notice that this matrix-vector operation (rank-1 update) can be layered
upon the axpy vector-vector operation.
Cost

A rank-1 update of an m × n matrix costs, approximately, 2mn flops. This can be easily argued in three
different ways:
The computation requires a multiply and an add with each element of the matrix. There are mn such
elements.
The operation can be computed one column at a time via n axpy operations, each of which requires
2m flops.
The operation can be computed one row at a time via m axpy operations, each of which requires 2n
flops.
Homework 1.9 Implement the two algorithms for computing A := yxT + A, using MATLAB and Spark.


You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
* SEE ANSWER
Homework 1.10 Prove that the matrix xyT where x and y are vectors has rank at most one, thus explaining
the name rank-1 update.
* SEE ANSWER

1.6    Matrix-matrix multiplication (product)

Be sure to understand the relation between linear transformations and matrix-matrix multiplication (LAFF
Notes Weeks 3 and 4 [30]).
We will now discuss the computation of C := AB + C, where C ∈ C^{m×n}, A ∈ C^{m×k}, and B ∈ C^{k×n}.
(If one wishes to compute C := AB, one can always start by setting C := 0, the zero matrix.) This is the
definition of the matrix-matrix product AB, sometimes referred to as a general matrix-matrix multiplication
(gemm) when no special structure or properties of A and B are assumed.

1.6.1    Element-by-element computation

Let
$$
C = \left(\begin{array}{cccc}
\gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1}\\
\gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1}
\end{array}\right),\quad
A = \left(\begin{array}{cccc}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,k-1}\\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,k-1}\\
\vdots & \vdots & & \vdots\\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,k-1}
\end{array}\right),
$$
$$
B = \left(\begin{array}{cccc}
\beta_{0,0} & \beta_{0,1} & \cdots & \beta_{0,n-1}\\
\beta_{1,0} & \beta_{1,1} & \cdots & \beta_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\beta_{k-1,0} & \beta_{k-1,1} & \cdots & \beta_{k-1,n-1}
\end{array}\right).
$$
Then
$$
C := AB + C = \left(\begin{array}{cccc}
\sum_{p=0}^{k-1}\alpha_{0,p}\beta_{p,0} + \gamma_{0,0} & \sum_{p=0}^{k-1}\alpha_{0,p}\beta_{p,1} + \gamma_{0,1} & \cdots & \sum_{p=0}^{k-1}\alpha_{0,p}\beta_{p,n-1} + \gamma_{0,n-1}\\
\sum_{p=0}^{k-1}\alpha_{1,p}\beta_{p,0} + \gamma_{1,0} & \sum_{p=0}^{k-1}\alpha_{1,p}\beta_{p,1} + \gamma_{1,1} & \cdots & \sum_{p=0}^{k-1}\alpha_{1,p}\beta_{p,n-1} + \gamma_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\sum_{p=0}^{k-1}\alpha_{m-1,p}\beta_{p,0} + \gamma_{m-1,0} & \sum_{p=0}^{k-1}\alpha_{m-1,p}\beta_{p,1} + \gamma_{m-1,1} & \cdots & \sum_{p=0}^{k-1}\alpha_{m-1,p}\beta_{p,n-1} + \gamma_{m-1,n-1}
\end{array}\right).
$$
This motivates the algorithm


for i := 0, …, m−1
    for j := 0, …, n−1
        for p := 0, …, k−1
            γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}        (dot)
        endfor
    endfor
endfor

Notice that the for loops can be ordered in 3! = 6 different ways.
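As an illustration (ours, not from the notes), the triple loop in the i, j, p order can be written in plain Python; permuting the three `for` lines yields the other five orderings without changing the result:

```python
def gemm_ijp(A, B, C):
    """C := A B + C via the elementwise triple loop (i, j, p ordering)."""
    m, n, k = len(C), len(C[0]), len(B)
    for i in range(m):
        for j in range(n):
            for p in range(k):        # accumulate the dot product of length k
                C[i][j] += A[i][p] * B[p][j]
    return C
```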


How to update each element of C via dot products can be more elegantly expressed by partitioning
$$
A = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right)
\quad\text{and}\quad
B = \left(\begin{array}{cccc} b_0 & b_1 & \cdots & b_{n-1} \end{array}\right).
$$
Then
$$
C := AB + C = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right)
\left(\begin{array}{cccc} b_0 & b_1 & \cdots & b_{n-1} \end{array}\right)
+ \left(\begin{array}{cccc}
\gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1}\\
\gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1}
\end{array}\right)
$$
$$
= \left(\begin{array}{cccc}
\widehat a_0^T b_0 + \gamma_{0,0} & \widehat a_0^T b_1 + \gamma_{0,1} & \cdots & \widehat a_0^T b_{n-1} + \gamma_{0,n-1}\\
\widehat a_1^T b_0 + \gamma_{1,0} & \widehat a_1^T b_1 + \gamma_{1,1} & \cdots & \widehat a_1^T b_{n-1} + \gamma_{1,n-1}\\
\vdots & \vdots & & \vdots\\
\widehat a_{m-1}^T b_0 + \gamma_{m-1,0} & \widehat a_{m-1}^T b_1 + \gamma_{m-1,1} & \cdots & \widehat a_{m-1}^T b_{n-1} + \gamma_{m-1,n-1}
\end{array}\right).
$$

This suggests the following algorithms for computing C := AB + C:

for i := 0, …, m−1
    for j := 0, …, n−1
        γ_{i,j} := â_i^T b_j + γ_{i,j}        (dot)
    endfor
endfor

or

for j := 0, …, n−1
    for i := 0, …, m−1
        γ_{i,j} := â_i^T b_j + γ_{i,j}        (dot)
    endfor
endfor

The cost of any of these algorithms is 2mnk flops, which can be explained by the fact that mn elements
of C must be updated with a dot product of length k.

1.6.2    Via matrix-vector multiplications

Partition
$$
C = \left(\begin{array}{cccc} c_0 & c_1 & \cdots & c_{n-1} \end{array}\right)
\quad\text{and}\quad
B = \left(\begin{array}{cccc} b_0 & b_1 & \cdots & b_{n-1} \end{array}\right).
$$


Algorithm: C := GEMM_UNB_VAR1(A, B, C)
  Partition B → ( BL | BR ), C → ( CL | CR )
    where BL has 0 columns, CL has 0 columns
  while n(BL) < n(B) do
    Repartition
      ( BL | BR ) → ( B0 | b1 | B2 ),  ( CL | CR ) → ( C0 | c1 | C2 )
    c1 := A b1 + c1
    Continue with
      ( BL | BR ) ← ( B0 b1 | B2 ),  ( CL | CR ) ← ( C0 c1 | C2 )
  endwhile
Figure 1.3: Algorithm for computing C := AB +C one column at a time.
Then
$$
C := AB + C = A \left(\begin{array}{cccc} b_0 & b_1 & \cdots & b_{n-1} \end{array}\right)
+ \left(\begin{array}{cccc} c_0 & c_1 & \cdots & c_{n-1} \end{array}\right)
= \left(\begin{array}{cccc} Ab_0 + c_0 & Ab_1 + c_1 & \cdots & Ab_{n-1} + c_{n-1} \end{array}\right),
$$
which shows that each column of C is updated with a matrix-vector multiplication, c_j := A b_j + c_j:

for j := 0, …, n−1
    c_j := A b_j + c_j        (matrix-vector multiplication)
endfor

In Figure 1.3, we present this algorithm using the FLAME notation (LAFF Notes, Weeks 3 and 4).
The given matrix-matrix multiplication algorithm with m × n matrix C, m × k matrix A, and k × n matrix
B costs, approximately, 2mnk flops. This can be easily argued by noting that each update of a column of
matrix C costs, approximately, 2mk flops. There are n such columns to be computed.
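The column-at-a-time variant can be sketched in plain Python (our naming) by reusing a gemv update for each column of C:

```python
def gemm_via_gemv(A, B, C):
    """C := A B + C, one column of C at a time: c_j := A b_j + c_j."""
    m, n, k = len(C), len(C[0]), len(B)
    for j in range(n):
        for i in range(m):            # gemv: c_j := A b_j + c_j
            C[i][j] += sum(A[i][p] * B[p][j] for p in range(k))
    return C
```

Each of the n columns costs about 2mk flops, for 2mnk in total.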

Homework 1.11 Implement C := AB + C via matrix-vector multiplications, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
* SEE ANSWER


Algorithm: C := GEMM_UNB_VAR2(A, B, C)
  Partition A → ( AT / AB ), C → ( CT / CB )
    where AT has 0 rows, CT has 0 rows
  while m(AT) < m(A) do
    Repartition
      ( AT / AB ) → ( A0 / a1^T / A2 ),  ( CT / CB ) → ( C0 / c1^T / C2 )
    c1^T := a1^T B + c1^T
    Continue with
      ( AT / AB ) ← ( A0 a1^T / A2 ),  ( CT / CB ) ← ( C0 c1^T / C2 )
  endwhile
Figure 1.4: Algorithm for computing C := AB +C one row at a time.

1.6.3    Via row-vector times matrix multiplications

Partition
$$
C = \left(\begin{array}{c} \widehat c_0^T \\ \widehat c_1^T \\ \vdots \\ \widehat c_{m-1}^T \end{array}\right)
\quad\text{and}\quad
A = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right).
$$
Then
$$
C := AB + C = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right) B
+ \left(\begin{array}{c} \widehat c_0^T \\ \widehat c_1^T \\ \vdots \\ \widehat c_{m-1}^T \end{array}\right)
= \left(\begin{array}{c}
\widehat a_0^T B + \widehat c_0^T \\
\widehat a_1^T B + \widehat c_1^T \\
\vdots \\
\widehat a_{m-1}^T B + \widehat c_{m-1}^T
\end{array}\right),
$$
which shows that each row of C is updated with a row-vector times matrix multiplication, ĉ_i^T := â_i^T B + ĉ_i^T:

for i := 0, …, m−1
    ĉ_i^T := â_i^T B + ĉ_i^T        (row-vector times matrix multiplication)
endfor

In Figure 1.4, we present this algorithm using the FLAME notation (LAFF Notes, Week 3 and 4).
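Here is a plain-Python sketch of the row-at-a-time variant (our naming), in which each product â_i^T B is accumulated as a linear combination of the rows of B:

```python
def gemm_by_rows(A, B, C):
    """C := A B + C, one row at a time: c_i^T := a_i^T B + c_i^T.

    The row product a_i^T B is formed as sum_p alpha_{i,p} * b_p^T,
    i.e., as axpy operations on the rows of B.
    """
    for i in range(len(C)):
        for p, b_p in enumerate(B):
            for j in range(len(b_p)):
                C[i][j] += A[i][p] * b_p[j]
    return C
```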
Homework 1.12 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k
matrix A, and k × n matrix B costs, approximately, 2mnk flops.
* SEE ANSWER
Homework 1.13 Implement C := AB + C via row vector-matrix multiplications, using MATLAB and
Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm
with PictureFLAME. Hint: yT := xT A + yT is not supported by a laff routine. You can use laff gemv
instead.
An implementation can be found in
Programming/chapter01 answers/Gemm unb var2.m (see file only) (view in MATLAB)
* SEE ANSWER

1.6.4    Via rank-1 updates

Partition
$$
A = \left(\begin{array}{cccc} a_0 & a_1 & \cdots & a_{k-1} \end{array}\right)
\quad\text{and}\quad
B = \left(\begin{array}{c} \widehat b_0^T \\ \widehat b_1^T \\ \vdots \\ \widehat b_{k-1}^T \end{array}\right).
$$

Then
$$
C := AB + C = \left(\begin{array}{cccc} a_0 & a_1 & \cdots & a_{k-1} \end{array}\right)
\left(\begin{array}{c} \widehat b_0^T \\ \widehat b_1^T \\ \vdots \\ \widehat b_{k-1}^T \end{array}\right) + C
= a_0\widehat b_0^T + a_1\widehat b_1^T + \cdots + a_{k-1}\widehat b_{k-1}^T + C
$$
$$
= a_{k-1}\widehat b_{k-1}^T + (\cdots + (a_1\widehat b_1^T + (a_0\widehat b_0^T + C))\cdots),
$$
which shows that C can be updated with a sequence of rank-1 updates, suggesting the loop
for p := 0, …, k−1
    C := a_p b̂_p^T + C        (rank-1 update)
endfor

In Figure 1.5, we present this algorithm using the FLAME notation (LAFF Notes, Week 3 and 4).
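The rank-1-update variant can likewise be sketched in plain Python (our naming), accumulating one outer product a_p b̂_p^T into C per iteration:

```python
def gemm_via_rank1(A, B, C):
    """C := A B + C as a sequence of rank-1 updates: C := a_p b_p^T + C."""
    for p in range(len(B)):           # p-th column of A, p-th row of B
        for i in range(len(C)):
            for j in range(len(C[0])):
                C[i][j] += A[i][p] * B[p][j]
    return C
```

Each of the k rank-1 updates costs about 2mn flops, again 2mnk in total.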


Algorithm: [C] := GEMM_UNB_VAR3(A, B, C)
  Partition A → ( AL | AR ), B → ( BT / BB )
    where AL has 0 columns, BT has 0 rows
  while n(AL) < n(A) do
    Repartition
      ( AL | AR ) → ( A0 | a1 | A2 ),  ( BT / BB ) → ( B0 / b1^T / B2 )
    C := a1 b1^T + C
    Continue with
      ( AL | AR ) ← ( A0 a1 | A2 ),  ( BT / BB ) ← ( B0 b1^T / B2 )
  endwhile
Figure 1.5: Algorithm for computing C := AB +C via rank-1 updates.
Homework 1.14 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k
matrix A, and k × n matrix B costs, approximately, 2mnk flops.
* SEE ANSWER
Homework 1.15 Implement C := AB + C via rank-1 updates, using MATLAB and Spark. You will want
to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.
* SEE ANSWER

1.7    Enrichments

1.7.1    The Basic Linear Algebra Subprograms (BLAS)

Any user or practitioner of numerical linear algebra must be familiar with the Basic Linear Algebra Subprograms (BLAS), a standardized interface for commonly encountered linear algebra operations. A recommended reading is
Robert van de Geijn and Kazushige Goto
* BLAS (Basic Linear Algebra Subprograms)
Encyclopedia of Parallel Computing, Part 2, pages 157–164, 2011.


For my graduate class I have posted this article on * Canvas. Others may have access to this article via
their employer or university.

1.8    Wrapup

1.8.1    Additional exercises

Homework 1.16 Implement GS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
* SEE ANSWER
Homework 1.17 Implement MGS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
* SEE ANSWER

1.8.2    Summary

It is important to realize that almost all operations that we discussed in this chapter are special cases of
matrix-matrix multiplication. This is summarized in Figure 1.6. A few notes:
• Row-vector times matrix is matrix-vector multiplication in disguise: if y^T := x^T A then y := A^T x.
• For a similar reason, y^T := αx^T + y^T is an axpy in disguise: y := αx + y.
• The operation y := xα + y is the same as y := αx + y since multiplication of a vector by a scalar
commutes.
• The operation γ := αβ + γ is known as a multiply-accumulate (MAC) operation. Often floating
point hardware implements this as an integrated operation.
Observations made about operations with partitioned matrices and vectors can be summarized by partitioning matrices into blocks: Let
$$
C = \left(\begin{array}{cccc}
C_{0,0} & C_{0,1} & \cdots & C_{0,N-1}\\
C_{1,0} & C_{1,1} & \cdots & C_{1,N-1}\\
\vdots & \vdots & & \vdots\\
C_{M-1,0} & C_{M-1,1} & \cdots & C_{M-1,N-1}
\end{array}\right),\quad
A = \left(\begin{array}{cccc}
A_{0,0} & A_{0,1} & \cdots & A_{0,K-1}\\
A_{1,0} & A_{1,1} & \cdots & A_{1,K-1}\\
\vdots & \vdots & & \vdots\\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,K-1}
\end{array}\right),
$$
$$
B = \left(\begin{array}{cccc}
B_{0,0} & B_{0,1} & \cdots & B_{0,N-1}\\
B_{1,0} & B_{1,1} & \cdots & B_{1,N-1}\\
\vdots & \vdots & & \vdots\\
B_{K-1,0} & B_{K-1,1} & \cdots & B_{K-1,N-1}
\end{array}\right).
$$


m      n      k      label   operation
large  large  large  gemm    general matrix-matrix multiplication
large  1      large  gemv    general matrix-vector multiplication
1      large  large  gemv    row vector times matrix (gemv in disguise)
large  large  1      ger     general rank-1 update (outer product if initially C = 0)
large  1      1      axpy    scaled vector addition
1      large  1      axpy    scaled row vector addition
1      1      large  dot     dot product
1      1      1      MAC     multiply-accumulate
Figure 1.6: Special shapes of gemm C := AB +C. Here C, A, and B are m n, m k, and k n matrices,
respectively.

Then
$$
C := AB + C = \left(\begin{array}{ccc}
\sum_{p=0}^{K-1} A_{0,p}B_{p,0} + C_{0,0} & \cdots & \sum_{p=0}^{K-1} A_{0,p}B_{p,N-1} + C_{0,N-1}\\
\sum_{p=0}^{K-1} A_{1,p}B_{p,0} + C_{1,0} & \cdots & \sum_{p=0}^{K-1} A_{1,p}B_{p,N-1} + C_{1,N-1}\\
\vdots & & \vdots\\
\sum_{p=0}^{K-1} A_{M-1,p}B_{p,0} + C_{M-1,0} & \cdots & \sum_{p=0}^{K-1} A_{M-1,p}B_{p,N-1} + C_{M-1,N-1}
\end{array}\right)
$$
(provided the partitionings of C, A, and B are conformal).

Notice that multiplication with partitioned matrices is exactly like regular matrix-matrix multiplication
with scalar elements, except that multiplication of two blocks does not necessarily commute.

Chapter 2

Notes on Vector and Matrix Norms

2.1    Opening Remarks

Video from Fall 2014


Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

2.1.1    Launch

* YouTube
* Downloaded
Video

2.1.2    Outline
2.1    Opening Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . .  35
       2.1.1  Launch . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  35
       2.1.2  Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
       2.1.3  What You Will Learn . . . . . . . . . . . . . . . . . . . . . . . 37
2.2    Absolute Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3    Vector Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
       2.3.1  Vector 2-norm (Euclidean length) . . . . . . . . . . . . . . . .  38
       2.3.2  Vector 1-norm . . . . . . . . . . . . . . . . . . . . . . . . . . 40
       2.3.3  Vector ∞-norm (infinity norm) . . . . . . . . . . . . . . . . . . 40
       2.3.4  Vector p-norm . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4    Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
       2.4.1  Frobenius norm . . . . . . . . . . . . . . . . . . . . . . . . .  41
       2.4.2  Induced matrix norms . . . . . . . . . . . . . . . . . . . . . .  42
       2.4.3  Special cases used in practice . . . . . . . . . . . . . . . . .  43
       2.4.4  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . .  45
       2.4.5  Submultiplicative norms . . . . . . . . . . . . . . . . . . . . . 45
2.5    An Application to Conditioning of Linear Systems . . . . . . . . . . . . 46
2.6    Equivalence of Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7    Enrichments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  48
       2.7.1  Practical computation of the vector 2-norm . . . . . . . . . . .  48
2.8    Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
       2.8.1  Additional exercises . . . . . . . . . . . . . . . . . . . . . .  48
       2.8.2  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

2.1.3    What You Will Learn

2.2    Absolute Value

Recall that if α ∈ C, then |α| equals its absolute value. In other words, if α = α_r + iα_c, then
$$
|\alpha| = \sqrt{\alpha_r^2 + \alpha_c^2} = \sqrt{(\alpha_r - i\alpha_c)(\alpha_r + i\alpha_c)} = \sqrt{\bar\alpha\alpha}.
$$
This absolute value function has the following properties:
• α ≠ 0 ⇒ |α| > 0 (| · | is positive definite),
• |αβ| = |α||β| (| · | is homogeneous), and
• |α + β| ≤ |α| + |β| (| · | obeys the triangle inequality).

2.3    Vector Norms

* YouTube
* Downloaded
Video
A (vector) norm extends the notion of an absolute value (length) to vectors:
Definition 2.1 Let ν : C^n → R. Then ν is a (vector) norm if for all x, y ∈ C^n and all α ∈ C
• x ≠ 0 ⇒ ν(x) > 0 (ν is positive definite),
• ν(αx) = |α|ν(x) (ν is homogeneous), and
• ν(x + y) ≤ ν(x) + ν(y) (ν obeys the triangle inequality).
Homework 2.2 Prove that if ν : C^n → R is a norm, then ν(0) = 0 (where the first 0 denotes the zero vector
in C^n).
* SEE ANSWER
Note: often we will use ‖ · ‖ to denote a vector norm.

2.3.1    Vector 2-norm (Euclidean length)

The length of a vector is most commonly measured by the square root of the sum of the squares of the
elements. For reasons that will become clear later in this section, we will call this the vector 2-norm.
Definition 2.3 The vector 2-norm ‖ · ‖_2 : C^n → R is defined for x ∈ C^n by
$$
\|x\|_2 = \sqrt{x^H x} = \sqrt{\bar\chi_0\chi_0 + \cdots + \bar\chi_{n-1}\chi_{n-1}} = \sqrt{|\chi_0|^2 + \cdots + |\chi_{n-1}|^2}.
$$


In an undergraduate course, you should have learned that, given a real-valued vector x and a unit-length
real-valued vector y, the component of x in the direction of y is given by x^T y y.

[picture omitted: x, a unit-length y, and the component x^T y y of x in the direction of y]

The length of x is ‖x‖_2 and the length of x^T y y is |x^T y|. Thus |x^T y| ≤ ‖x‖_2. Now, if y is not of unit length,
then |x^T (y/‖y‖_2)| ≤ ‖x‖_2 or, equivalently, |x^T y| ≤ ‖x‖_2 ‖y‖_2. This is known as the Cauchy-Schwartz
inequality for real-valued vectors. Here is its generalization for complex-valued vectors.
To show that the vector 2-norm is a norm, we will need the following theorem:
Theorem 2.4 (Cauchy-Schwartz inequality) Let x, y ∈ C^n. Then |x^H y| ≤ ‖x‖_2 ‖y‖_2.
Proof: Assume that x ≠ 0 and y ≠ 0, since otherwise the inequality is trivially true. We can then choose
x̂ = x/‖x‖_2 and ŷ = y/‖y‖_2. This leaves us to prove that |x̂^H ŷ| ≤ 1, since ‖x̂‖_2 = ‖ŷ‖_2 = 1.
Pick α ∈ C with |α| = 1 so that αx̂^H ŷ is real and nonnegative: αx̂^H ŷ = |x̂^H ŷ|. Note that since it is
real we also know that αx̂^H ŷ = \overline{αx̂^H ŷ} = ᾱŷ^H x̂.
Now,
$$
\begin{array}{rcll}
0 &\le& \|\hat x - \alpha\hat y\|_2^2 & \\
  &=& (\hat x - \alpha\hat y)^H(\hat x - \alpha\hat y) & (\|z\|_2^2 = z^Hz)\\
  &=& \hat x^H\hat x - \bar\alpha\hat y^H\hat x - \alpha\hat x^H\hat y + |\alpha|^2\hat y^H\hat y & \text{(multiplying out)}\\
  &=& 1 - 2\alpha\hat x^H\hat y + |\alpha|^2 & (\|\hat x\|_2 = \|\hat y\|_2 = 1 \text{ and } \bar\alpha\hat y^H\hat x = \alpha\hat x^H\hat y)\\
  &=& 2 - 2\alpha\hat x^H\hat y & (|\alpha| = 1)\\
  &=& 2 - 2|\hat x^H\hat y| & (\alpha\hat x^H\hat y = |\hat x^H\hat y|).
\end{array}
$$
Thus |x̂^H ŷ| ≤ 1 and therefore |x^H y| ≤ ‖x‖_2 ‖y‖_2.
QED

Theorem 2.5 The vector 2-norm is a norm.


Proof: To prove this, we merely check whether the three conditions are met:
Let x, y ∈ C^n and α ∈ C be arbitrarily chosen. Then
• x ≠ 0 ⇒ ‖x‖_2 > 0 (‖ · ‖_2 is positive definite):
Notice that x ≠ 0 means that at least one of its components is nonzero. Let's assume that χ_j ≠ 0. Then
$$
\|x\|_2 = \sqrt{|\chi_0|^2 + \cdots + |\chi_{n-1}|^2} \ge \sqrt{|\chi_j|^2} = |\chi_j| > 0.
$$
• ‖αx‖_2 = |α| ‖x‖_2 (‖ · ‖_2 is homogeneous):
$$
\|\alpha x\|_2 = \sqrt{|\alpha\chi_0|^2 + \cdots + |\alpha\chi_{n-1}|^2}
= \sqrt{|\alpha|^2|\chi_0|^2 + \cdots + |\alpha|^2|\chi_{n-1}|^2}
= \sqrt{|\alpha|^2(|\chi_0|^2 + \cdots + |\chi_{n-1}|^2)}
= |\alpha|\sqrt{|\chi_0|^2 + \cdots + |\chi_{n-1}|^2}
= |\alpha| \|x\|_2.
$$
• ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2 (‖ · ‖_2 obeys the triangle inequality):
$$
\|x+y\|_2^2 = (x+y)^H(x+y) = x^Hx + y^Hx + x^Hy + y^Hy
\le \|x\|_2^2 + 2\|x\|_2\|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2.
$$
Taking the square root of both sides yields the desired result.
QED

2.3.2    Vector 1-norm

Definition 2.6 The vector 1-norm ‖ · ‖_1 : C^n → R is defined for x ∈ C^n by
$$
\|x\|_1 = |\chi_0| + |\chi_1| + \cdots + |\chi_{n-1}|.
$$
Homework 2.7 The vector 1-norm is a norm.
* SEE ANSWER
The vector 1-norm is sometimes referred to as the taxi-cab norm. It is the distance that a taxi travels
along the streets of a city that has square city blocks.

2.3.3    Vector ∞-norm (infinity norm)

Definition 2.8 The vector ∞-norm ‖ · ‖_∞ : C^n → R is defined for x ∈ C^n by ‖x‖_∞ = max_i |χ_i|.


Homework 2.9 The vector ∞-norm is a norm.
* SEE ANSWER

2.3.4    Vector p-norm

Definition 2.10 The vector p-norm ‖ · ‖_p : C^n → R is defined for x ∈ C^n by
$$
\|x\|_p = \sqrt[p]{|\chi_0|^p + |\chi_1|^p + \cdots + |\chi_{n-1}|^p}.
$$

Proving that the p-norm is a norm is a little tricky and not particularly relevant to this course. To prove
the triangle inequality requires the following classical result:
Theorem 2.11 (Hölder inequality) Let x, y ∈ C^n and 1/p + 1/q = 1 with 1 ≤ p, q ≤ ∞. Then |x^H y| ≤ ‖x‖_p ‖y‖_q.
Clearly, the 1-norm and 2-norm are special cases of the p-norm. Also, ‖x‖_∞ = lim_{p→∞} ‖x‖_p.
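As a small numeric illustration (ours, not from the notes), the p-norm and ∞-norm can be computed directly from their definitions, and one can watch ‖x‖_p approach ‖x‖_∞ as p grows:

```python
def norm_p(x, p):
    """Vector p-norm of x (p >= 1): (sum |chi_i|^p)^(1/p)."""
    return sum(abs(chi) ** p for chi in x) ** (1.0 / p)

def norm_inf(x):
    """Vector infinity-norm: max_i |chi_i|."""
    return max(abs(chi) for chi in x)
```

For x = (3, −4), the 1-, 2-, and ∞-norms are 7, 5, and 4, and norm_p(x, p) for large p is close to 4.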

2.4    Matrix Norms

It is not hard to see that vector norms are all measures of how big the vectors are. Similarly, we want
to have measures for how big matrices are. We will start with one that is somewhat artificial and then
move on to the important class of induced matrix norms.

* YouTube
* Downloaded
Video

2.4.1    Frobenius norm

Definition 2.12 The Frobenius norm ‖ · ‖_F : C^{m×n} → R is defined for A ∈ C^{m×n} by
$$
\|A\|_F = \sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} |\alpha_{i,j}|^2}.
$$

Notice that one can think of the Frobenius norm as taking the columns of the matrix, stacking them on
top of each other to create a vector of size mn, and then taking the vector 2-norm of the result.
Homework 2.13 Show that the Frobenius norm is a norm.
* SEE ANSWER
Similarly, other matrix norms can be created from vector norms by viewing the matrix as a vector. It
turns out that, other than the Frobenius norm, these aren't particularly interesting in practice.

2.4.2    Induced matrix norms

Definition 2.14 Let ‖ · ‖_μ : C^m → R and ‖ · ‖_ν : C^n → R be vector norms. Define ‖ · ‖_{μ,ν} : C^{m×n} → R by
$$
\|A\|_{\mu,\nu} = \sup_{\substack{x \in \mathbb{C}^n\\ x \ne 0}} \frac{\|Ax\|_\mu}{\|x\|_\nu}.
$$
Let us start by interpreting this. How big A is, as measured by ‖A‖_{μ,ν}, is defined as the most that A
magnifies the length of nonzero vectors, where the length of a vector (x) is measured with norm ‖ · ‖_ν and
the length of a transformed vector (Ax) is measured with norm ‖ · ‖_μ.
Two comments are in order. First,
$$
\sup_{\substack{x \in \mathbb{C}^n\\ x \ne 0}} \frac{\|Ax\|_\mu}{\|x\|_\nu} = \sup_{\|x\|_\nu = 1} \|Ax\|_\mu.
$$

This follows immediately from the following sequence of equivalences:
$$
\sup_{x \ne 0} \frac{\|Ax\|_\mu}{\|x\|_\nu}
= \sup_{x \ne 0} \left\| \frac{Ax}{\|x\|_\nu} \right\|_\mu
= \sup_{x \ne 0} \left\| A \frac{x}{\|x\|_\nu} \right\|_\mu
= \sup_{\substack{y = x/\|x\|_\nu\\ x \ne 0}} \|Ay\|_\mu
= \sup_{\|y\|_\nu = 1} \|Ay\|_\mu
= \sup_{\|x\|_\nu = 1} \|Ax\|_\mu.
$$
Also, the sup (which stands for supremum) is used because we can't claim yet that there is a vector x
with ‖x‖_ν = 1 for which
$$
\|A\|_{\mu,\nu} = \|Ax\|_\mu.
$$
In other words, it is not immediately obvious that there is a vector for which the supremum is attained. The
fact is that there is always such a vector x. The proof depends on a result from real analysis (sometimes
called advanced calculus) that states that sup_{x∈S} f(x) is attained for some vector x ∈ S as long as f is
continuous and S is a compact set. Since real analysis is not a prerequisite for this course, the reader may
have to take this on faith! From real analysis we also learn that if the supremum is attained by an element
in S, then sup_{x∈S} f(x) = max_{x∈S} f(x). Thus, we replace sup by max from here on in our discussion.
We conclude that the following two definitions are equivalent to the one we already gave:
Definition 2.15 Let ‖ · ‖_μ : C^m → R and ‖ · ‖_ν : C^n → R be vector norms. Define ‖ · ‖_{μ,ν} : C^{m×n} → R by
$$
\|A\|_{\mu,\nu} = \max_{\substack{x \in \mathbb{C}^n\\ x \ne 0}} \frac{\|Ax\|_\mu}{\|x\|_\nu},
$$
and
Definition 2.16 Let ‖ · ‖_μ : C^m → R and ‖ · ‖_ν : C^n → R be vector norms. Define ‖ · ‖_{μ,ν} : C^{m×n} → R by
$$
\|A\|_{\mu,\nu} = \max_{\|x\|_\nu = 1} \|Ax\|_\mu.
$$
In this course, we will often encounter proofs involving norms. Such proofs are often much cleaner if one
starts by strategically picking the most convenient of these two definitions.


Theorem 2.17 ‖ · ‖_{μ,ν} : C^{m×n} → R is a norm.

Proof: To prove this, we merely check whether the three conditions are met:
Let A, B ∈ C^{m×n} and α ∈ C be arbitrarily chosen. Then
• A ≠ 0 ⇒ ‖A‖_{μ,ν} > 0 (‖ · ‖_{μ,ν} is positive definite):
Notice that A ≠ 0 means that at least one of its columns is not a zero vector (since at least
one element is nonzero). Let us assume it is the jth column, a_j, that is nonzero. Then
$$
\|A\|_{\mu,\nu} = \max_{\substack{x \in \mathbb{C}^n\\ x \ne 0}} \frac{\|Ax\|_\mu}{\|x\|_\nu}
\ge \frac{\|Ae_j\|_\mu}{\|e_j\|_\nu} = \frac{\|a_j\|_\mu}{\|e_j\|_\nu} > 0.
$$
• ‖αA‖_{μ,ν} = |α| ‖A‖_{μ,ν} (‖ · ‖_{μ,ν} is homogeneous):
$$
\|\alpha A\|_{\mu,\nu} = \max_{x \ne 0} \frac{\|\alpha Ax\|_\mu}{\|x\|_\nu}
= \max_{x \ne 0} |\alpha| \frac{\|Ax\|_\mu}{\|x\|_\nu}
= |\alpha| \max_{x \ne 0} \frac{\|Ax\|_\mu}{\|x\|_\nu}
= |\alpha| \|A\|_{\mu,\nu}.
$$
• ‖A + B‖_{μ,ν} ≤ ‖A‖_{μ,ν} + ‖B‖_{μ,ν} (‖ · ‖_{μ,ν} obeys the triangle inequality):
$$
\|A+B\|_{\mu,\nu} = \max_{x \ne 0} \frac{\|(A+B)x\|_\mu}{\|x\|_\nu}
= \max_{x \ne 0} \frac{\|Ax + Bx\|_\mu}{\|x\|_\nu}
\le \max_{x \ne 0} \frac{\|Ax\|_\mu + \|Bx\|_\mu}{\|x\|_\nu}
$$
$$
\le \max_{x \ne 0} \left( \frac{\|Ax\|_\mu}{\|x\|_\nu} + \frac{\|Bx\|_\mu}{\|x\|_\nu} \right)
\le \max_{x \ne 0} \frac{\|Ax\|_\mu}{\|x\|_\nu} + \max_{x \ne 0} \frac{\|Bx\|_\mu}{\|x\|_\nu}
= \|A\|_{\mu,\nu} + \|B\|_{\mu,\nu}.
$$
QED

2.4.3    Special cases used in practice

The most important case of ‖ · ‖_{μ,ν} : C^{m×n} → R uses the same norm for ‖ · ‖_μ and ‖ · ‖_ν (except that m may
not equal n).
Definition 2.18 Define ‖ · ‖_p : C^{m×n} → R by
$$
\|A\|_p = \max_{\substack{x \in \mathbb{C}^n\\ x \ne 0}} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p.
$$
The matrix p-norms with p ∈ {1, 2, ∞} will play an important role in our course, as will the Frobenius
norm. As the course unfolds, we will realize that the matrix 2-norm is difficult to compute in practice
while the 1-norm, ∞-norm, and Frobenius norms are straightforward and relatively cheap to compute.
The following theorem shows how to practically compute the matrix 1-norm:

Theorem 2.19 Let A ∈ C^{m×n} and partition A = ( a_0  a_1  ⋯  a_{n-1} ). Then
$$
\|A\|_1 = \max_{0 \le j < n} \|a_j\|_1.
$$

Proof: Let J be chosen so that max_{0≤j<n} ‖a_j‖_1 = ‖a_J‖_1. Then
$$
\max_{\|x\|_1=1} \|Ax\|_1
= \max_{\|x\|_1=1} \left\| \left(\begin{array}{cccc} a_0 & a_1 & \cdots & a_{n-1} \end{array}\right)
\left(\begin{array}{c} \chi_0 \\ \vdots \\ \chi_{n-1} \end{array}\right) \right\|_1
= \max_{\|x\|_1=1} \|\chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}\|_1
$$
$$
\le \max_{\|x\|_1=1} \left( \|\chi_0 a_0\|_1 + \|\chi_1 a_1\|_1 + \cdots + \|\chi_{n-1} a_{n-1}\|_1 \right)
= \max_{\|x\|_1=1} \left( |\chi_0|\|a_0\|_1 + |\chi_1|\|a_1\|_1 + \cdots + |\chi_{n-1}|\|a_{n-1}\|_1 \right)
$$
$$
\le \max_{\|x\|_1=1} \left( |\chi_0|\|a_J\|_1 + |\chi_1|\|a_J\|_1 + \cdots + |\chi_{n-1}|\|a_J\|_1 \right)
= \max_{\|x\|_1=1} \left( |\chi_0| + |\chi_1| + \cdots + |\chi_{n-1}| \right) \|a_J\|_1
= \|a_J\|_1.
$$
Also,
$$
\|a_J\|_1 = \|Ae_J\|_1 \le \max_{\|x\|_1=1} \|Ax\|_1.
$$
Hence
$$
\|a_J\|_1 \le \max_{\|x\|_1=1} \|Ax\|_1 \le \|a_J\|_1,
$$
which implies that
$$
\max_{\|x\|_1=1} \|Ax\|_1 = \|a_J\|_1 = \max_{0 \le j < n} \|a_j\|_1.
$$
QED
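These characterizations — maximum absolute column sum for the 1-norm and maximum absolute row sum for the ∞-norm — make both norms cheap to compute. A plain-Python sketch (our naming):

```python
def matrix_norm_1(A):
    """||A||_1 = maximum absolute column sum."""
    m, n = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))

def matrix_norm_inf(A):
    """||A||_inf = maximum absolute row sum."""
    return max(sum(abs(alpha) for alpha in row) for row in A)
```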
Similarly, the following exercise shows how to practically compute the matrix ∞-norm:
Homework 2.20 Let A ∈ C^{m×n} and partition
$$
A = \left(\begin{array}{c} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{array}\right).
$$
Show that
$$
\|A\|_\infty = \max_{0 \le i < m} \|\widehat a_i\|_1 = \max_{0 \le i < m} \left( |\alpha_{i,0}| + |\alpha_{i,1}| + \cdots + |\alpha_{i,n-1}| \right).
$$
* SEE ANSWER
Notice that in the above exercise â_i is really (â_i^T)^T, since â_i^T is the label for the ith row of matrix A.
Homework 2.21 Let y ∈ C^m and x ∈ C^n. Show that ‖yx^H‖_2 = ‖y‖_2 ‖x‖_2.
* SEE ANSWER

2.4.4    Discussion

While ‖ · ‖_2 is a very important matrix norm, it is in practice often difficult to compute. The matrix norms
‖ · ‖_F, ‖ · ‖_1, and ‖ · ‖_∞ are more easily computed and hence more practical in many instances.

2.4.5    Submultiplicative norms

Definition 2.22 A matrix norm ‖ · ‖ : C^{m×n} → R is said to be submultiplicative (consistent) if it also
satisfies
$$
\|AB\| \le \|A\| \, \|B\|.
$$
Theorem 2.23 Let ‖ · ‖ : C^n → R be a vector norm and given any matrix C ∈ C^{m×n} define the corresponding induced matrix norm as
$$
\|C\| = \max_{x \ne 0} \frac{\|Cx\|}{\|x\|} = \max_{\|x\| = 1} \|Cx\|.
$$
Then for any A ∈ C^{m×k} and B ∈ C^{k×n} the inequality ‖AB‖ ≤ ‖A‖ ‖B‖ holds.
In other words, induced matrix norms are submultiplicative. To prove this theorem, it helps to first prove
a simpler result:
Lemma 2.24 Let ‖ · ‖ : C^n → R be a vector norm and given any matrix C ∈ C^{m×n} define the corresponding induced matrix norm as
$$
\|C\| = \max_{x \ne 0} \frac{\|Cx\|}{\|x\|} = \max_{\|x\| = 1} \|Cx\|.
$$
Then for any A ∈ C^{m×n} and y ∈ C^n the inequality ‖Ay‖ ≤ ‖A‖ ‖y‖ holds.
Proof: If y = 0, the result obviously holds since then ‖Ay‖ = 0 and ‖y‖ = 0. Let y ≠ 0. Then
$$
\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} \ge \frac{\|Ay\|}{\|y\|}.
$$
Rearranging this yields ‖Ay‖ ≤ ‖A‖ ‖y‖.
QED


We can now prove the theorem:
Proof:
$$
\|AB\| = \max_{\|x\|=1} \|ABx\| = \max_{\|x\|=1} \|A(Bx)\| \le \max_{\|x\|=1} \|A\| \, \|Bx\| \le \max_{\|x\|=1} \|A\| \, \|B\| \, \|x\| = \|A\| \, \|B\|.
$$
QED
Homework 2.25 Show that ‖Ax‖_μ ≤ ‖A‖_{μ,ν} ‖x‖_ν.
* SEE ANSWER
Homework 2.26 Show that ‖AB‖_μ ≤ ‖A‖_{μ,ν} ‖B‖_ν.
* SEE ANSWER
Homework 2.27 Show that the Frobenius norm, ‖ · ‖_F, is submultiplicative.
* SEE ANSWER

2.5    An Application to Conditioning of Linear Systems

A question we will run into later in the course asks how accurate we can expect the solution of a linear
system to be if the right-hand side of the system has error in it.
Formally, this can be stated as follows: We wish to solve Ax = b, where A ∈ C^{m×m}, but the right-hand
side has been perturbed by a small vector so that it becomes b + δb. (Notice how the δ touches the b. This
is meant to convey that δb is a symbol that represents a vector rather than the vector b multiplied
by a scalar δ.) The question now is how a relative error in b propagates into a potential error in the solution
x.
This is summarized as follows:
    Ax = b                    Exact equation
    A(x + δx) = b + δb        Perturbed equation
We would like to determine a formula, κ(A, b, δb), that tells us how much a relative error in b is potentially
amplified into a relative error in the solution x:
$$
\frac{\|\delta x\|}{\|x\|} \le \kappa(A, b, \delta b) \frac{\|\delta b\|}{\|b\|}.
$$
We will assume that A has an inverse. To find an expression for κ(A, b, δb), we notice that subtracting
    Ax = b
from
    Ax + Aδx = b + δb
yields
    Aδx = δb,
so that
    Ax = b  and  δx = A⁻¹δb.
If we now use a vector norm ‖ · ‖ and its induced matrix norm ‖ · ‖, then
$$
\|b\| = \|Ax\| \le \|A\| \, \|x\|
\quad\text{and}\quad
\|\delta x\| = \|A^{-1}\delta b\| \le \|A^{-1}\| \, \|\delta b\|.
$$
From this we conclude that
$$
\frac{1}{\|x\|} \le \|A\| \frac{1}{\|b\|}
\quad\text{and}\quad
\|\delta x\| \le \|A^{-1}\| \, \|\delta b\|,
$$
so that
$$
\frac{\|\delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \frac{\|\delta b\|}{\|b\|}.
$$
Thus, the desired expression κ(A, b, δb) doesn't depend on anything but the matrix A:
$$
\frac{\|\delta x\|}{\|x\|} \le \underbrace{\|A\| \, \|A^{-1}\|}_{\kappa(A)} \frac{\|\delta b\|}{\|b\|}.
$$


κ(A) = ‖A‖ ‖A⁻¹‖ is called the condition number of matrix A.


A question becomes whether this is a pessimistic result or whether there are examples of b and δb for
which the relative error in b is amplified by exactly κ(A). The answer is, unfortunately, yes!, as we will
show next.
Notice that
• There is an x̂ for which
$$
\|A\| = \max_{\|x\|=1} \|Ax\| = \|A\hat x\|,
$$
namely the x for which the maximum is attained. Pick b̂ = Ax̂.
• There is a δb̂ for which
$$
\|A^{-1}\| = \max_{x \ne 0} \frac{\|A^{-1}x\|}{\|x\|} = \frac{\|A^{-1}\delta\hat b\|}{\|\delta\hat b\|},
$$
again, the x for which the maximum is attained.
It is when solving the perturbed system
$$
A(x + \delta x) = \hat b + \delta\hat b
$$
that the maximal magnification by κ(A) is attained.
Homework 2.28 Let ‖ · ‖ be a matrix norm induced by the ‖ · ‖ vector norm. Show that κ(A) = ‖A‖ ‖A⁻¹‖ ≥ 1.
* SEE ANSWER
This last exercise shows that there will always be choices for b and δb for which the relative error is at
best directly translated into an equal relative error in the solution (if κ(A) = 1).
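As a small numeric illustration (ours, not from the notes): for a diagonal matrix the inverse is explicit, since the inverse of diag(δ_0, …, δ_{n-1}) is diag(1/δ_0, …, 1/δ_{n-1}), so the ∞-norm condition number can be computed directly:

```python
def kappa_inf_diag(d):
    """kappa_inf(A) = ||A||_inf * ||A^{-1}||_inf for A = diag(d), d nonzero."""
    norm_a = max(abs(delta) for delta in d)          # max absolute row sum
    norm_a_inv = max(1.0 / abs(delta) for delta in d)
    return norm_a * norm_a_inv
```

A well-conditioned matrix has κ(A) near 1; a nearly singular one has a large κ(A).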

2.6    Equivalence of Norms

Many results we encounter show that the norm of a particular vector or matrix is small. Obviously, it
would be unfortunate if a vector or matrix is large in one norm and small in another norm. The following
result shows that, modulo a constant, all norms are equivalent. Thus, if a vector is small in one norm, it
is small in other norms as well.
Theorem 2.29 Let ‖ · ‖_μ : C^n → R and ‖ · ‖_ν : C^n → R be vector norms. Then there exist constants α_{μ,ν}
and β_{μ,ν} such that for all x ∈ C^n
$$
\alpha_{\mu,\nu} \|x\|_\nu \le \|x\|_\mu \le \beta_{\mu,\nu} \|x\|_\nu.
$$
The proof of this result again uses the fact that the supremum of a continuous function on a compact set is attained.
A similar result holds for matrix norms:
Theorem 2.30 Let ‖ · ‖_μ : C^{m×n} → R and ‖ · ‖_ν : C^{m×n} → R be matrix norms. Then there exist constants
α_{μ,ν} and β_{μ,ν} such that for all A ∈ C^{m×n}
$$
\alpha_{\mu,\nu} \|A\|_\nu \le \|A\|_\mu \le \beta_{\mu,\nu} \|A\|_\nu.
$$

2.7    Enrichments

2.7.1    Practical computation of the vector 2-norm

p
Consider the computation = 2 + 2 where , R. When computing this with floating point numbers, a fundamental problem is that the intermediate values 2 and 2 may overflow (become larger than
the largest number that can be stored) or underflow (become smaller than the smallest positive number
that can be stored), even if the resulting itself does not overflow or underflow.
The solution is to first determine the largest value, = max(||, ||) and then compute
s
 2  2

=
+

instead. A careful analysis shows that if does not overflow, neither do any of the intermediate values
 2  2
encountered during it computation. While one of the terms or may underflow, the other one
equals one and hence the overall result does not underflow. A complete discussion of all the intracacies go
beyond this note.
This insight generalizes to the computation of kxk2 where x Cn . Rather than computing it as
q
kxk2 = |0 |2 + |1 |2 + + |n1 |2
and risk overflow or underflow, instead the following computation is used:
= kxk
s

 



|1 | 2
|n1 | 2
|0 | 2
kxk2 =
+
++
.
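As a sketch, the scaled computation can be coded as follows (Python/NumPy used for illustration; the function name `norm2_scaled` is ours, not a routine from these notes):

```python
import numpy as np

def norm2_scaled(x):
    """Compute ||x||_2 using the scaling trick to avoid overflow/underflow.

    mu = ||x||_inf is factored out so that every squared term is at most 1.
    """
    x = np.asarray(x, dtype=float)
    mu = np.max(np.abs(x)) if x.size > 0 else 0.0
    if mu == 0.0:
        return 0.0
    return mu * np.sqrt(np.sum(np.abs(x / mu) ** 2))

# Naive squaring would overflow for entries near 1e200 (since 1e400 is not
# representable in double precision), but the scaled version is fine:
big = np.array([1e200, 1e200])
print(norm2_scaled(big))   # approximately sqrt(2) * 1e200
```

High-quality BLAS implementations of nrm2 use a refinement of this idea, accumulating separate sums for tiny, medium, and huge entries.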

2.8
2.8.1

Wrapup
Additional exercises

Homework 2.31 A vector x ∈ R² can be represented by the point to which it points when rooted at the origin. For example, the vector x = (2, 1)ᵀ can be represented by the point (2, 1). With this in mind, plot

1. The points corresponding to the set {x | ‖x‖₂ = 1}.

2. The points corresponding to the set {x | ‖x‖₁ = 1}.

3. The points corresponding to the set {x | ‖x‖_∞ = 1}.

* SEE ANSWER
Homework 2.32 Consider

    A = ( 1 2 1
          1 1 )

Compute

1. ‖A‖₁ =

2. ‖A‖_∞ =

3. ‖A‖_F =
* SEE ANSWER
Homework 2.33 Show that for all x ∈ Cⁿ

1. ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂.

2. ‖x‖_∞ ≤ ‖x‖₁ ≤ n ‖x‖_∞.

3. (1/√n) ‖x‖₂ ≤ ‖x‖_∞ ≤ ‖x‖₂.
* SEE ANSWER
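These inequalities are easy to sanity-check numerically before proving them. The following sketch (Python/NumPy, not part of the original notes) verifies all three pairs of bounds for many random complex vectors; the small additive tolerances guard against floating point round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
ok = True
for _ in range(1000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    ok &= n2 <= n1 <= np.sqrt(n) * n2 + 1e-12          # item 1
    ok &= ninf <= n1 <= n * ninf + 1e-12               # item 2
    ok &= n2 / np.sqrt(n) <= ninf + 1e-12 and ninf <= n2 + 1e-12  # item 3
print(ok)   # True: every sampled vector satisfies all three pairs of bounds
```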
Homework 2.34 (I need to double check that this is true!)
Prove that if for all x, ‖x‖_μ ≤ ‖x‖_ν, then ‖A‖_μ ≤ ‖A‖_ν.
* SEE ANSWER
Homework 2.35 Partition A by columns and by rows (rows separated by semicolons):

    A = ( a₀ a₁ ⋯ a_{n−1} ) = ( â₀ᵀ ; â₁ᵀ ; … ; â_{m−1}ᵀ ).

Prove that

1. ‖A‖_F = √( ‖a₀‖₂² + ‖a₁‖₂² + ⋯ + ‖a_{n−1}‖₂² ).

2. ‖A‖_F = ‖Aᵀ‖_F.

3. ‖A‖_F = √( ‖â₀‖₂² + ‖â₁‖₂² + ⋯ + ‖â_{m−1}‖₂² ).

(Note: âᵢ = (âᵢᵀ)ᵀ, the column vector whose transpose is the ith row of A.)
* SEE ANSWER

Homework 2.36 For e_j ∈ Rⁿ (a standard basis vector), compute

    ‖e_j‖₂ =
    ‖e_j‖₁ =
    ‖e_j‖_∞ =
    ‖e_j‖_p =
* SEE ANSWER
Homework 2.37 For I ∈ R^{n×n} (the identity matrix), compute

    ‖I‖_F =
    ‖I‖₁ =
    ‖I‖_∞ =
* SEE ANSWER
Homework 2.38 Let ‖·‖ be a vector norm defined for vectors of any size (arbitrary n), and let ‖·‖ also denote the induced matrix norm. Prove that ‖I‖ = 1 (where I equals the identity matrix).
Conclude that ‖I‖_p = 1 for any p-norm.
* SEE ANSWER

Homework 2.39 Let D = diag(δ₀, δ₁, …, δ_{n−1}) (a diagonal matrix). Compute

    ‖D‖₁ =
    ‖D‖_∞ =
* SEE ANSWER

2.8. Wrapup

51

Homework 2.40 Let D = diag(δ₀, δ₁, …, δ_{n−1}) (a diagonal matrix). Then

    ‖D‖_p =

Prove your answer.
* SEE ANSWER
Homework 2.41 Let y ∈ Cⁿ. Show that ‖yᴴ‖₂ = ‖y‖₂.
* SEE ANSWER

2.8.2 Summary

Chapter 3

Notes on Orthogonality and the Singular Value Decomposition
If you need to review the basics of orthogonal vectors, orthogonal spaces, and related topics, you may want to consult Weeks 9-11 of
Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].

Video
The following videos were recorded for the Fall 2014 offering of Numerical Analysis: Linear Algebra. Read the disclaimer regarding the videos in the preface!

* YouTube Part 1                          * YouTube Part 2
* Download Part 1 from UT Box             * Download Part 2 from UT Box
* View Part 1 After Local Download        * View Part 2 After Local Download

(For help on viewing, see Appendix A.)

3.1 Opening Remarks

3.1.1 Launch: Orthogonal projection and its application

* YouTube    * Downloaded Video

A quick review of orthogonal projection

Consider the following picture:

    [Figure: vector b decomposed as b = z + w, with z = Ax lying in the column space C(A) and w orthogonal to C(A).]

Here we consider

• A, a matrix in R^{m×n}.

• C(A), the space spanned by the columns of A (the column space of A).

• b, a vector in Rᵐ.

• z, the component of b in C(A), which is also the vector in C(A) closest to b. Since this vector is in the column space of A, it equals z = Ax for some vector x ∈ Rⁿ.

• w, the component of b orthogonal to C(A).

The vectors b, z, and w all exist in the same planar subspace, since b = z + w; this plane is the page on which these vectors are drawn in the above picture.

Thus,

    b = z + w,

where

• z = Ax with x ∈ Rⁿ; and

• Aᵀw = 0, since w is orthogonal to the column space of A and hence lies in N(Aᵀ) (the left null space of A).

Noting that w = b − z, we find that

    0 = Aᵀw = Aᵀ(b − z) = Aᵀ(b − Ax)

or, equivalently,

    AᵀAx = Aᵀb.

This is known as the normal equation for finding the vector x that best solves Ax = b in the linear least-squares sense. More on this later in our course.

Then, provided (AᵀA)⁻¹ exists (which happens when A has linearly independent columns),

    x = (AᵀA)⁻¹Aᵀb.

Thus, the component of b in C(A) is given by

    z = Ax = A(AᵀA)⁻¹Aᵀb

while the component of b orthogonal (perpendicular) to C(A) is given by

    w = b − z = b − A(AᵀA)⁻¹Aᵀb = Ib − A(AᵀA)⁻¹Aᵀb = (I − A(AᵀA)⁻¹Aᵀ)b.

Summarizing:

    z = A(AᵀA)⁻¹Aᵀb
    w = (I − A(AᵀA)⁻¹Aᵀ)b.

Now, we say that, given matrix A with linearly independent columns, the matrix that projects (orthogonally) a given vector b onto the column space of A is given by

    A(AᵀA)⁻¹Aᵀ

since A(AᵀA)⁻¹Aᵀb is the component of b in C(A). Similarly, given matrix A with linearly independent columns, the matrix that projects a given vector b onto the space orthogonal to the column space of A (which, recall, is the left null space of A) is given by

    I − A(AᵀA)⁻¹Aᵀ

since (I − A(AᵀA)⁻¹Aᵀ)b is the component of b in C(A)^⊥ = N(Aᵀ).
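These formulas can be tried out directly. A minimal sketch (Python/NumPy; the function name is ours, and solving the normal equations directly is used only for illustration, not as the recommended way to solve least-squares problems):

```python
import numpy as np

def project_onto_columns(A, b):
    """Return z = A (A^T A)^{-1} A^T b and w = b - z."""
    x = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: A^T A x = A^T b
    z = A @ x                               # component of b in C(A)
    w = b - z                               # component of b orthogonal to C(A)
    return z, w

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                  # linearly independent columns
b = np.array([1.0, 0.0, 2.0])
z, w = project_onto_columns(A, b)

print(np.allclose(A.T @ w, 0.0))   # w lies in N(A^T): A^T w = 0
print(np.allclose(z + w, b))       # b = z + w
```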
An application to data compression

Consider the picture:

    [Picture: a grayscale photograph.]

This picture can be thought of as a matrix B ∈ R^{m×n} where each element in the matrix encodes a pixel in the picture. The jth column of B then encodes the jth column of pixels in the picture.

Now, let's focus on the first few columns. Notice that there is a lot of similarity in those columns. This can be illustrated by plotting the values in a column as a function of the index of the element in the column:

    [Plot: β_{i,j}, the value of the (i, j) pixel, as a function of i, for j = 0, 1, 2, 3 in different colors.]

We plot β_{i,j}, the value of the (i, j) pixel, for j = 0, 1, 2, 3 in different colors. The green line corresponds to j = 3, and you notice that it starts to deviate some for i near 250.

If we now instead look at columns j = 0, 1, 2, 100, where the green line corresponds to j = 100, we see that the curve corresponding to that column is dramatically different:

    [Plot: β_{i,j} as a function of i, for j = 0, 1, 2, 100.]

Changing this to plotting j = 100, 101, 102, 103, we notice a lot of similarity again:

    [Plot: β_{i,j} as a function of i, for j = 100, 101, 102, 103.]

Approximating the picture with one column

Now, let's think about this from the point of view of taking one vector, say the first column of B, and projecting the other columns onto the span of that vector. The hope is that the vector is representative and that therefore the projections of the columns onto the span of that vector are a good approximation for each of the columns. What does this mean?

• Partition B into columns: B = ( b₀ b₁ ⋯ b_{n−1} ).

• Pick a = b₀ to be the representative vector for all columns of B.

• Focus on projecting b₀ onto Span({a}). Another way of thinking of this is that we take A = ( a ) and project onto C(A):

    A(AᵀA)⁻¹Aᵀb₀ = a(aᵀa)⁻¹aᵀb₀ = a(aᵀa)⁻¹aᵀa = a,   since b₀ = a.

• Next, focus on projecting b₁ onto Span({a}):

    a(aᵀa)⁻¹aᵀb₁ ≈ a

  since b₁ is very close to b₀.

• Do this for all columns, and create a picture with all of the projected vectors by viewing the result again as a matrix:

    ( a(aᵀa)⁻¹aᵀb₀  a(aᵀa)⁻¹aᵀb₁  a(aᵀa)⁻¹aᵀb₂  ⋯ ).

• Now, remember that if T is some matrix, then

    TB = ( Tb₀ Tb₁ Tb₂ ⋯ ).

  If we let T = a(aᵀa)⁻¹aᵀ (the matrix that projects onto Span({a})), then

    ( a(aᵀa)⁻¹aᵀb₀  a(aᵀa)⁻¹aᵀb₁  ⋯ ) = a(aᵀa)⁻¹aᵀ ( b₀ b₁ ⋯ ) = a(aᵀa)⁻¹aᵀB.

• We can manipulate this further:

    a(aᵀa)⁻¹aᵀB = a ( (aᵀa)⁻¹Bᵀa )ᵀ = awᵀ,   where w = (aᵀa)⁻¹Bᵀa.

• Finally, we recognize awᵀ as an outer product (a column vector times a row vector) and hence a matrix of rank one.

• If we do this for our picture, we get the approximation on the left:

    [Pictures: the rank-1 approximation (left) and the original picture (right).]

  Notice how it seems like each column is the same, except with some constant change in the grayscale. The same is true for rows. Why is this? If you focus on the left-most columns in the picture, they almost look correct (comparing to the left-most columns in the picture on the right). Why is this?

• The benefit of the approximation on the left is that it can be described with two vectors: a and w (n + m floating point numbers), while the original picture on the right requires an entire matrix (m × n floating point numbers).

• The disadvantage of the approximation on the left is that it is hard to recognize the original picture...
What if we take two vectors instead, say columns j = 0 and j = n/2, and project each of the columns onto the subspace spanned by those two vectors?

• Partition B into columns: B = ( b₀ b₁ ⋯ b_{n−1} ).

• Pick A = ( a₀ a₁ ) = ( b₀ b_{n/2} ).

• Focus on projecting b₀ onto Span({a₀, a₁}) = C(A):

    A(AᵀA)⁻¹Aᵀb₀ = a₀

  because a₀ is in C(A) and is therefore its own best approximation in C(A).

• Next, focus on projecting b₁ onto C(A):

    A(AᵀA)⁻¹Aᵀb₁ ≈ a₀

  since b₁ is very close to b₀.

• Do this for all columns:

    ( A(AᵀA)⁻¹Aᵀb₀  A(AᵀA)⁻¹Aᵀb₁  A(AᵀA)⁻¹Aᵀb₂  ⋯ ) = A(AᵀA)⁻¹Aᵀ ( b₀ b₁ b₂ ⋯ )
                                                      = A (AᵀA)⁻¹AᵀB = AWᵀ,

  where Wᵀ = (AᵀA)⁻¹AᵀB.

• Notice that A and W each have two columns and AWᵀ is the sum of two outer products:

    AWᵀ = ( a₀ a₁ ) ( w₀ w₁ )ᵀ = ( a₀ a₁ ) ( w₀ᵀ ; w₁ᵀ ) = a₀w₀ᵀ + a₁w₁ᵀ.

  It can be easily shown that this matrix has rank at most two, which is why AWᵀ is called a rank-2 approximation of B.

• We can visualize this with the following picture on the left:

    [Pictures: the rank-2 approximation (left) and the original picture (right).]

  We are starting to see some more detail.

• We now have to store only the m × 2 and n × 2 matrices A and W.
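The approximation by projection onto chosen columns can be sketched as follows (Python/NumPy; the synthetic matrix B, whose columns vary slowly, is a small stand-in for the picture, and the function name is ours):

```python
import numpy as np

def project_columns(B, cols):
    """Approximate B by projecting every column of B onto the span of the
    columns of B indexed by cols; returns the A W^T approximation."""
    A = B[:, cols]                          # the chosen representative columns
    Wt = np.linalg.solve(A.T @ A, A.T @ B)  # W^T = (A^T A)^{-1} A^T B
    return A @ Wt                           # rank at most len(cols)

# A small synthetic stand-in for the picture: columns that vary slowly.
m, n = 50, 40
B = np.outer(np.linspace(0.0, 1.0, m), np.ones(n)) \
    + 0.1 * np.outer(np.sin(np.linspace(0.0, 3.0, m)), np.linspace(0.0, 1.0, n))

Bk = project_columns(B, [0, n // 2])        # the rank-2 approximation A W^T

print(np.linalg.matrix_rank(Bk) <= 2)       # rank at most two
print(np.allclose(Bk[:, 0], B[:, 0]))       # the chosen columns are reproduced exactly
```

Because this synthetic B itself has rank two, the approximation here happens to be essentially exact; for a real picture it is only approximate.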
Rank-k approximations

We can continue the above by picking progressively more columns for A. The progression of pictures in Figure 3.1 shows the improvement as more and more columns are used, where k indicates the number of columns.

    Figure 3.1: Progression of approximations for k = 1, 2, 10, 25, 50, and the original picture.

The Singular Value Decomposition

Picking vectors for matrix A from the columns of the original picture does not usually yield an optimal approximation. How then does one compute the best rank-k approximation of a matrix so that the amount of data that must be stored best captures the matrix? This is where the Singular Value Decomposition (SVD) comes in.

The SVD is probably the most important result in linear algebra.

3.1.2 Outline

    Video  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  53
    3.1   Opening Remarks . . . . . . . . . . . . . . . . . . . . . . . . .  53
          3.1.1  Launch: Orthogonal projection and its application  . . . .  53
          3.1.2  Outline  . . . . . . . . . . . . . . . . . . . . . . . . .  62
          3.1.3  What you will learn . . . . . . . . . . . . . . . . . . .  63
    3.2   Orthogonality and Unitary Matrices  . . . . . . . . . . . . . . .  64
    3.3   Toward the SVD  . . . . . . . . . . . . . . . . . . . . . . . . .  67
    3.4   The Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . .  69
    3.5   Geometric Interpretation  . . . . . . . . . . . . . . . . . . . .  69
    3.6   Consequences of the SVD Theorem . . . . . . . . . . . . . . . . .  73
    3.7   Projection onto the Column Space  . . . . . . . . . . . . . . . .  77
    3.8   Low-rank Approximation of a Matrix  . . . . . . . . . . . . . . .  78
    3.9   An Application  . . . . . . . . . . . . . . . . . . . . . . . . .  80
    3.10  SVD and the Condition Number of a Matrix  . . . . . . . . . . . .  82
    3.11  An Algorithm for Computing the SVD? . . . . . . . . . . . . . . .  83
    3.12  Wrapup  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  84
          3.12.1  Additional exercises . . . . . . . . . . . . . . . . . .  84
          3.12.2  Summary  . . . . . . . . . . . . . . . . . . . . . . . .  84

3.1.3 What you will learn

3.2 Orthogonality and Unitary Matrices

In this section, we extend what you learned in an undergraduate course about orthogonality to complex-valued vectors and matrices.

Definition 3.1 Let u, v ∈ Cᵐ. These vectors are said to be orthogonal (perpendicular) if uᴴv = 0.

Definition 3.2 Let q₀, q₁, …, q_{m−1} be a subset of n vectors in Cᵐ. These vectors are said to be mutually orthonormal if for all 0 ≤ i, j < n

    qᵢᴴqⱼ = 1 if i = j, and qᵢᴴqⱼ = 0 otherwise.

The definition implies that ‖qᵢ‖₂ = √(qᵢᴴqᵢ) = 1 and hence each of the vectors is of unit length in addition to being orthogonal to the others.

For n vectors of length m to be mutually orthonormal, n must be less than or equal to m. This is because n mutually orthonormal vectors are linearly independent and there can be at most m linearly independent vectors of length m.

A very concise way of indicating that a set of vectors is mutually orthonormal is to view them as the columns of a matrix, which then has a very special property:

Definition 3.3 Let Q ∈ C^{m×n} (with n ≤ m). Then Q is said to be an orthonormal matrix if QᴴQ = I.

The subsequent exercise makes the connection between mutually orthonormal vectors and an orthonormal matrix.

Homework 3.4 Let Q ∈ C^{m×n} (with n ≤ m). Partition Q = ( q₀ q₁ ⋯ q_{n−1} ). Show that Q is an orthonormal matrix if and only if q₀, q₁, …, q_{n−1} are mutually orthonormal.
* SEE ANSWER

If an orthonormal matrix is square, then it is called a unitary matrix.

Definition 3.5 Let Q ∈ C^{m×m}. Then Q is said to be a unitary matrix if QᴴQ = I (the identity).

Unitary matrices are always square, and only square matrices can be unitary. Sometimes the term orthogonal matrix is used instead of unitary matrix, especially if the matrix is real valued.

Unitary matrices have some very nice properties, as captured by the following exercises.

Homework 3.6 Let Q ∈ C^{m×m}. Show that if Q is unitary then Q⁻¹ = Qᴴ and QQᴴ = I.
* SEE ANSWER

Homework 3.7 Let Q₀, Q₁ ∈ C^{m×m} both be unitary. Show that their product, Q₀Q₁, is unitary.
* SEE ANSWER

Homework 3.8 Let Q₀, Q₁, …, Q_{k−1} ∈ C^{m×m} all be unitary. Show that their product, Q₀Q₁⋯Q_{k−1}, is unitary.
* SEE ANSWER
The following is a very important observation: Let Q be a unitary matrix with Q = ( q₀ q₁ ⋯ q_{m−1} ), and let x ∈ Cᵐ. Then

    x = QQᴴx = ( q₀ q₁ ⋯ q_{m−1} ) ( q₀ᴴ ; q₁ᴴ ; … ; q_{m−1}ᴴ ) x
             = ( q₀ q₁ ⋯ q_{m−1} ) ( q₀ᴴx ; q₁ᴴx ; … ; q_{m−1}ᴴx )
             = (q₀ᴴx)q₀ + (q₁ᴴx)q₁ + ⋯ + (q_{m−1}ᴴx)q_{m−1}.

What does this mean?

• The vector x = ( χ₀ ; χ₁ ; … ; χ_{m−1} ) gives the coefficients when the vector x is written as a linear combination of the unit basis vectors:

    x = χ₀e₀ + χ₁e₁ + ⋯ + χ_{m−1}e_{m−1}.

• The vector

    Qᴴx = ( q₀ᴴx ; q₁ᴴx ; … ; q_{m−1}ᴴx )

  gives the coefficients when the vector x is written as a linear combination of the orthonormal vectors q₀, q₁, …, q_{m−1}:

    x = (q₀ᴴx)q₀ + (q₁ᴴx)q₁ + ⋯ + (q_{m−1}ᴴx)q_{m−1}.

• The vector (qᵢᴴx)qᵢ equals the component of x in the direction of vector qᵢ.
Another way of looking at this is that if q₀, q₁, …, q_{m−1} is an orthonormal basis for Cᵐ, then any x ∈ Cᵐ can be written as a linear combination of these vectors:

    x = α₀q₀ + α₁q₁ + ⋯ + α_{m−1}q_{m−1}.

Now,

    qᵢᴴx = qᵢᴴ( α₀q₀ + α₁q₁ + ⋯ + α_{i−1}q_{i−1} + αᵢqᵢ + α_{i+1}q_{i+1} + ⋯ + α_{m−1}q_{m−1} )
         = α₀qᵢᴴq₀ + α₁qᵢᴴq₁ + ⋯ + α_{i−1}qᵢᴴq_{i−1} + αᵢqᵢᴴqᵢ + α_{i+1}qᵢᴴq_{i+1} + ⋯ + α_{m−1}qᵢᴴq_{m−1}
         = αᵢ,

since qᵢᴴqⱼ = 0 for j ≠ i while qᵢᴴqᵢ = 1. Thus qᵢᴴx = αᵢ, the coefficient that multiplies qᵢ.

The point is that, given a vector x and unitary matrix Q, Qᴴx computes the coefficients for the orthonormal basis consisting of the columns of matrix Q. Unitary matrices allow one to elegantly change between orthonormal bases.

Multiplication by a unitary matrix preserves length (since changing the basis for a vector does not change its length when the basis is orthonormal).
Homework 3.9 Let U ∈ C^{m×m} be unitary and x ∈ Cᵐ. Show that ‖Ux‖₂ = ‖x‖₂.
* SEE ANSWER

Homework 3.10 Let U ∈ C^{m×m} and V ∈ C^{n×n} be unitary matrices and A ∈ C^{m×n}. Show that

    ‖UA‖₂ = ‖AV‖₂ = ‖A‖₂.
* SEE ANSWER

Homework 3.11 Let U ∈ C^{m×m} and V ∈ C^{n×n} be unitary matrices and A ∈ C^{m×n}. Show that

    ‖UA‖_F = ‖AV‖_F = ‖A‖_F.
* SEE ANSWER
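The properties in these exercises can be checked numerically before proving them. A sketch (Python/NumPy, not part of the original notes; generating unitary matrices via the QR factorization of a random complex matrix is our choice of construction):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
# The Q factor of the QR factorization of a random complex matrix is unitary.
U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)

print(np.allclose(U.conj().T @ U, np.eye(m)))                  # U^H U = I (Homework 3.6)
print(np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # ||Ux||_2 = ||x||_2 (3.9)
print(np.isclose(np.linalg.norm(U @ A, 2), np.linalg.norm(A, 2)))          # 2-norm (3.10)
print(np.isclose(np.linalg.norm(A @ V, 'fro'), np.linalg.norm(A, 'fro')))  # F-norm (3.11)
```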

3.3 Toward the SVD

* YouTube    * Downloaded Video

In this section, we lay the foundation for the Singular Value Decomposition. We come very close: the following lemma differs from the Singular Value Decomposition Theorem only in that it doesn't yet guarantee that the diagonal elements of D are ordered from largest to smallest.

Lemma 3.12 Given A ∈ C^{m×n} there exist unitary U ∈ C^{m×m}, unitary V ∈ C^{n×n}, and diagonal D ∈ R^{m×n} such that A = UDVᴴ, where

    D = ( D_TL 0 ; 0 0 )

(blocks written row by row, with block rows separated by semicolons), with D_TL = diag(δ₀, …, δ_{r−1}) and δᵢ > 0 for 0 ≤ i < r.

Proof: First, let us observe that if A = 0 (the zero matrix) then the theorem trivially holds: A = UDVᴴ with U = I_{m×m}, V = I_{n×n}, and D = 0, so that D_TL is 0 × 0. Thus, w.l.o.g. assume that A ≠ 0.

We will prove this for m ≥ n, leaving the case where m ≤ n as an exercise. The proof employs induction on n.

• Base case: n = 1. In this case A = ( a₀ ), where a₀ ∈ Cᵐ is its only column. By assumption, a₀ ≠ 0. Then

    A = ( a₀ ) = ( u₀ ) ( ‖a₀‖₂ ) ( 1 )ᴴ,

  where u₀ = a₀/‖a₀‖₂. Choose U₁ ∈ C^{m×(m−1)} so that U = ( u₀ U₁ ) is unitary. Then

    A = ( a₀ ) = ( u₀ ) ( ‖a₀‖₂ ) ( 1 )ᴴ = ( u₀ U₁ ) ( ‖a₀‖₂ ; 0 ) ( 1 )ᴴ = UDVᴴ,

  where D_TL = ( ‖a₀‖₂ ) and V = ( 1 ).

• Inductive step: Assume the result is true for all matrices with 1 ≤ k < n columns. Show that it is true for matrices with n columns.

  Let A ∈ C^{m×n} with n ≥ 2. W.l.o.g., A ≠ 0 so that ‖A‖₂ ≠ 0. Let σ₀ and v₀ ∈ Cⁿ have the property that ‖v₀‖₂ = 1 and σ₀ = ‖Av₀‖₂ = ‖A‖₂. (In other words, v₀ is the vector that maximizes max_{‖x‖₂=1} ‖Ax‖₂.) Let u₀ = Av₀/σ₀. Note that ‖u₀‖₂ = 1. Choose U₁ ∈ C^{m×(m−1)} and V₁ ∈ C^{n×(n−1)} so that U = ( u₀ U₁ ) and V = ( v₀ V₁ ) are unitary. Then

    UᴴAV = ( u₀ U₁ )ᴴ A ( v₀ V₁ ) = ( u₀ᴴAv₀ u₀ᴴAV₁ ; U₁ᴴAv₀ U₁ᴴAV₁ )
         = ( σ₀u₀ᴴu₀ u₀ᴴAV₁ ; σ₀U₁ᴴu₀ U₁ᴴAV₁ ) = ( σ₀ wᴴ ; 0 B ),

  where w = V₁ᴴAᴴu₀ and B = U₁ᴴAV₁. Now, we will argue that w = 0, the zero vector of appropriate size:

    σ₀² = ‖A‖₂² = ‖UᴴAV‖₂² = max_{x≠0} ‖( σ₀ wᴴ ; 0 B ) x‖₂² / ‖x‖₂²

        ≥ ‖( σ₀ wᴴ ; 0 B )( σ₀ ; w )‖₂² / ‖( σ₀ ; w )‖₂²

        = ‖( σ₀² + wᴴw ; Bw )‖₂² / (σ₀² + wᴴw)

        ≥ (σ₀² + wᴴw)² / (σ₀² + wᴴw)

        = σ₀² + wᴴw.

  Thus σ₀² ≥ σ₀² + wᴴw, which means that w = 0 and

    UᴴAV = ( σ₀ 0 ; 0 B ).

  By the induction hypothesis, there exist unitary Ǔ ∈ C^{(m−1)×(m−1)}, unitary V̌ ∈ C^{(n−1)×(n−1)}, and Ď ∈ R^{(m−1)×(n−1)} such that B = ǓĎV̌ᴴ, where Ď = ( Ď_TL 0 ; 0 0 ) with Ď_TL = diag(δ₁, …, δ_{r−1}).

  Now, let

    U := U ( 1 0 ; 0 Ǔ ),   V := V ( 1 0 ; 0 V̌ ),   and   D := ( σ₀ 0 ; 0 Ď ).

  (There are some really tough to see checks in the definition of U, V, and D!!) Then A = UDVᴴ, where U, V, and D have the desired properties.

• By the Principle of Mathematical Induction the result holds for all matrices A ∈ C^{m×n} with m ≥ n.

Homework 3.13 Let D = diag(δ₀, …, δ_{n−1}). Show that ‖D‖₂ = max_{0 ≤ i < n} |δᵢ|.
* SEE ANSWER
Homework 3.14 Assume that U ∈ C^{m×m} and V ∈ C^{n×n} are unitary matrices. Let A, B ∈ C^{m×n} with B = UAVᴴ. Show that the singular values of A equal the singular values of B.
* SEE ANSWER

Homework 3.15 Let A ∈ C^{m×n} with A = ( σ₀ 0 ; 0 B ) and assume that ‖A‖₂ = σ₀. Show that ‖B‖₂ ≤ ‖A‖₂. (Hint: Use the SVD of B.)
* SEE ANSWER

Homework 3.16 Prove Lemma 3.12 for m ≤ n.
* SEE ANSWER

3.4 The Theorem

Theorem 3.17 (Singular Value Decomposition) Given A ∈ C^{m×n} there exist unitary U ∈ C^{m×m}, unitary V ∈ C^{n×n}, and Σ ∈ R^{m×n} such that A = UΣVᴴ, where

    Σ = ( Σ_TL 0 ; 0 0 )

with Σ_TL = diag(σ₀, …, σ_{r−1}) and σ₀ ≥ σ₁ ≥ ⋯ ≥ σ_{r−1} > 0. The σ₀, …, σ_{r−1} are known as the singular values of A.

Proof: Notice that the proof of the above theorem is identical to that of Lemma 3.12. However, thanks to the above exercises, we can conclude that ‖B‖₂ ≤ σ₀ in the proof, which then can be used to show that the singular values are found in order.

Proof: (Alternative) An alternative proof uses Lemma 3.12 to conclude that A = UDVᴴ. If the entries on the diagonal of D are not ordered from largest to smallest, then this can be fixed by permuting the rows and columns of D, and correspondingly permuting the columns of U and V.
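The theorem can be observed in action with a library SVD. A sketch (Python/NumPy, not part of the original notes; numpy.linalg.svd returns the singular values already ordered from largest to smallest):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A)        # full SVD: U is m x m, Vh = V^H is n x n
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)         # embed the singular values in an m x n Sigma

print(np.allclose(U @ Sigma @ Vh, A))          # A = U Sigma V^H
print(np.all(s[:-1] >= s[1:]))                 # sigma_0 >= sigma_1 >= ...
print(np.isclose(s[0], np.linalg.norm(A, 2)))  # sigma_0 = ||A||_2
```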

3.5 Geometric Interpretation

* YouTube    * Downloaded Video

We will now quickly illustrate what the SVD Theorem tells us about matrix-vector multiplication (linear transformations) by examining the case where A ∈ R^{2×2}. Let A = UΣVᵀ be its SVD. (Notice that all matrices are now real valued, and hence Vᴴ = Vᵀ.) Partition

    A = ( u₀ u₁ ) diag(σ₀, σ₁) ( v₀ v₁ )ᵀ.

Since U and V are unitary matrices, {u₀, u₁} and {v₀, v₁} form orthonormal bases for the range and domain of A, respectively:

    [Figure: R², the domain of A, with basis {v₀, v₁}; R², the range (codomain) of A, with basis {u₀, u₁}.]

Let us manipulate the decomposition a little:

    A = ( u₀ u₁ ) diag(σ₀, σ₁) ( v₀ v₁ )ᵀ = ( σ₀u₀ σ₁u₁ ) ( v₀ v₁ )ᵀ.

Now let us look at how A transforms v₀ and v₁:

    Av₀ = ( σ₀u₀ σ₁u₁ ) ( v₀ v₁ )ᵀ v₀ = ( σ₀u₀ σ₁u₁ ) ( 1 ; 0 ) = σ₀u₀,

and similarly Av₁ = σ₁u₁. This motivates the pictures

    [Figure: v₀ and v₁ in the domain of A are mapped to σ₀u₀ and σ₁u₁ in the range (codomain) of A.]

Now let us look at how A transforms any vector with (Euclidean) unit length. Notice that x = ( χ₀ ; χ₁ ) means that

    x = χ₀e₀ + χ₁e₁,

where e₀ and e₁ are the unit basis vectors. Thus, χ₀ and χ₁ are the coefficients when x is expressed using e₀ and e₁ as the basis. However, we can also express x in the basis given by v₀ and v₁:

    x = VVᵀx = ( v₀ v₁ ) ( v₀ᵀx ; v₁ᵀx ) = (v₀ᵀx)v₀ + (v₁ᵀx)v₁
      = α₀v₀ + α₁v₁ = ( v₀ v₁ ) ( α₀ ; α₁ ).

Thus, in the basis formed by v₀ and v₁, the coefficients of x are α₀ and α₁. Now,

    Ax = ( σ₀u₀ σ₁u₁ ) ( v₀ v₁ )ᵀ x = ( σ₀u₀ σ₁u₁ ) ( v₀ v₁ )ᵀ ( v₀ v₁ ) ( α₀ ; α₁ ) = α₀σ₀u₀ + α₁σ₁u₁.

This is illustrated by the following picture, which also captures the fact that the unit ball is mapped to an ellipse¹ with major axis equal to σ₀ = ‖A‖₂ and minor axis equal to σ₁:

    [Figure: the unit circle in R², the domain of A, is mapped by A to an ellipse in R², the range (codomain) of A.]

¹ It is not clear that it is actually an ellipse, and this is not important to our observations.

Finally, we show the same insights for a general vector x (not necessarily of unit length).

    [Figure: a general x in R², the domain of A, and its image Ax in R², the range (codomain) of A.]

Another observation is that if one picks the right bases for the domain and codomain, then the computation Ax simplifies to multiplication by a diagonal matrix. Let us again illustrate this for nonsingular A ∈ R^{2×2} with

    A = ( u₀ u₁ ) diag(σ₀, σ₁) ( v₀ v₁ )ᵀ = U Σ Vᵀ.

Now, if we choose to express y using u₀ and u₁ as the basis and to express x using v₀ and v₁ as the basis, then

    y = UUᵀy = (u₀ᵀy)u₀ + (u₁ᵀy)u₁,   with coefficient vector ŷ = ( ψ̂₀ ; ψ̂₁ ) = Uᵀy,
    x = VVᵀx = (v₀ᵀx)v₀ + (v₁ᵀx)v₁,   with coefficient vector x̂ = ( χ̂₀ ; χ̂₁ ) = Vᵀx.

If y = Ax, then

    Uŷ = UUᵀy = y = Ax = UΣVᵀx = UΣx̂,

so that ŷ = Σx̂ and

    ( ψ̂₀ ; ψ̂₁ ) = ( σ₀χ̂₀ ; σ₁χ̂₁ ).

These observations generalize to A ∈ C^{m×m}.

3.6 Consequences of the SVD Theorem

Throughout this section we will assume that

• A = UΣVᴴ is the SVD of A ∈ C^{m×n}, with U and V unitary and Σ diagonal.

• Σ = ( Σ_TL 0 ; 0 0 ), where Σ_TL = diag(σ₀, …, σ_{r−1}) with σ₀ ≥ σ₁ ≥ ⋯ ≥ σ_{r−1} > 0.

• U = ( U_L U_R ) with U_L ∈ C^{m×r}.

• V = ( V_L V_R ) with V_L ∈ C^{n×r}.

We first generalize the observations we made for A ∈ R^{2×2}. Let us track the effect of Ax = UΣVᴴx on a vector x. We assume that m ≥ n.

• Let U = ( u₀ ⋯ u_{m−1} ) and V = ( v₀ ⋯ v_{n−1} ).

• Let

    x = VVᴴx = ( v₀ ⋯ v_{n−1} ) ( v₀ᴴx ; … ; v_{n−1}ᴴx ) = (v₀ᴴx)v₀ + ⋯ + (v_{n−1}ᴴx)v_{n−1}.

  This can be interpreted as follows: vector x can be written in terms of the usual basis of Cⁿ as χ₀e₀ + ⋯ + χ_{n−1}e_{n−1}, or in the orthonormal basis formed by the columns of V as (v₀ᴴx)v₀ + ⋯ + (v_{n−1}ᴴx)v_{n−1}.

• Notice that Ax = A( (v₀ᴴx)v₀ + ⋯ + (v_{n−1}ᴴx)v_{n−1} ) = (v₀ᴴx)Av₀ + ⋯ + (v_{n−1}ᴴx)Av_{n−1}, so we next look at how A transforms each vᵢ:

    Avᵢ = UΣVᴴvᵢ = UΣeᵢ = σᵢUeᵢ = σᵢuᵢ.

• Thus, another way of looking at Ax is

    Ax = (v₀ᴴx)Av₀ + ⋯ + (v_{n−1}ᴴx)Av_{n−1}
       = (v₀ᴴx)σ₀u₀ + ⋯ + (v_{n−1}ᴴx)σ_{n−1}u_{n−1}
       = σ₀u₀(v₀ᴴx) + ⋯ + σ_{n−1}u_{n−1}(v_{n−1}ᴴx)
       = ( σ₀u₀v₀ᴴ + ⋯ + σ_{n−1}u_{n−1}v_{n−1}ᴴ ) x.

Corollary 3.18 A = U_L Σ_TL V_Lᴴ. This is called the reduced SVD of A.

Proof:

    A = UΣVᴴ = ( U_L U_R ) ( Σ_TL 0 ; 0 0 ) ( V_L V_R )ᴴ = U_L Σ_TL V_Lᴴ.

Homework 3.19 Let A = ( A_T ; 0 ). Use the SVD of A_T to show that ‖A‖₂ = ‖A_T‖₂.


* SEE ANSWER

Any matrix with rank r can be written as the sum of r rank-one matrices:

Corollary 3.20 Let A = U_L Σ_TL V_Lᴴ be the reduced SVD with

    U_L = ( u₀ ⋯ u_{r−1} ),  Σ_TL = diag(σ₀, …, σ_{r−1}),  and  V_L = ( v₀ ⋯ v_{r−1} ).

Then

    A = σ₀u₀v₀ᴴ + σ₁u₁v₁ᴴ + ⋯ + σ_{r−1}u_{r−1}v_{r−1}ᴴ.

(Each term is a nonzero matrix and an outer product, and hence a rank-1 matrix.)

Proof: We leave the proof as an exercise.
Corollary 3.21 C(A) = C(U_L).

Proof:

• Let y ∈ C(A). Then there exists x ∈ Cⁿ such that y = Ax (by the definition of C(A)). But then

    y = Ax = U_L ( Σ_TL V_Lᴴ x ) = U_L z,

  i.e., there exists z ∈ Cʳ such that y = U_L z. This means y ∈ C(U_L).

• Let y ∈ C(U_L). Then there exists z ∈ Cʳ such that y = U_L z. But then

    y = U_L z = U_L Σ_TL Σ_TL⁻¹ z = U_L Σ_TL ( V_Lᴴ V_L ) Σ_TL⁻¹ z = A ( V_L Σ_TL⁻¹ z ) = Ax,

  so that there exists x ∈ Cⁿ (namely x = V_L Σ_TL⁻¹ z) such that y = Ax, i.e., y ∈ C(A).

Corollary 3.22 Let A = U_L Σ_TL V_Lᴴ be the reduced SVD of A, where U_L and V_L have r columns. Then the rank of A is r.

Proof: The rank of A equals the dimension of C(A) = C(U_L). But the dimension of C(U_L) is clearly r.
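Corollary 3.22 is also the basis for determining rank numerically: count the singular values that lie above a tolerance. A sketch (Python/NumPy, not part of the original notes; the threshold choice below is a heuristic, not taken from these notes):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 7, 5, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank 2 by construction

_, s, _ = np.linalg.svd(A)
tol = 1e-10 * s[0]                 # any threshold well above roundoff works here
rank = int(np.sum(s > tol))        # number of (numerically) nonzero singular values

print(rank == 2)
print(rank == np.linalg.matrix_rank(A))   # agrees with NumPy's rank computation
```

In floating point arithmetic the trailing singular values are tiny but rarely exactly zero, which is why a tolerance relative to σ₀ is needed.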
Corollary 3.23 N(A) = C(V_R).

Proof:

• Let x ∈ N(A). Then

    x = VVᴴx = ( V_L V_R ) ( V_Lᴴx ; V_Rᴴx ) = V_L V_Lᴴ x + V_R V_Rᴴ x.

  If we can show that V_Lᴴx = 0, then x = V_R z where z = V_Rᴴx, so that x ∈ C(V_R). Assume that V_Lᴴx ≠ 0. Then Σ_TL (V_Lᴴx) ≠ 0 (since Σ_TL is nonsingular) and U_L (Σ_TL (V_Lᴴx)) ≠ 0 (since U_L has linearly independent columns). But that contradicts the fact that Ax = U_L Σ_TL V_Lᴴ x = 0.

• Let x ∈ C(V_R). Then x = V_R z for some z ∈ C^{n−r}, and Ax = U_L Σ_TL ( V_Lᴴ V_R ) z = 0, since V_Lᴴ V_R = 0.
Corollary 3.24 For all x ∈ Cⁿ there exists z ∈ C(V_L) such that Ax = Az.

Proof:

    Ax = A( VVᴴx ) = A( V_L V_Lᴴ x + V_R V_Rᴴ x ) = A V_L V_Lᴴ x + A V_R V_Rᴴ x
       = A V_L V_Lᴴ x + U_L Σ_TL ( V_Lᴴ V_R ) V_Rᴴ x = A ( V_L V_Lᴴ x ),

since V_Lᴴ V_R = 0; take z = V_L V_Lᴴ x.

Alternative proof (which uses the last corollary):

    Ax = A( V_L V_Lᴴ x + V_R V_Rᴴ x ) = A V_L V_Lᴴ x + A ( V_R V_Rᴴ x ) = A ( V_L V_Lᴴ x ),

since V_R V_Rᴴ x ∈ N(A).

The proof of the last corollary also shows that

Corollary 3.25 Any vector x ∈ Cⁿ can be written as x = z + x_n, where z ∈ C(V_L) and x_n ∈ N(A) = C(V_R).

Corollary 3.26 Aᴴ = V_L Σ_TL U_Lᴴ, so that C(Aᴴ) = C(V_L) and N(Aᴴ) = C(U_R).

The above corollaries are summarized in Figure 3.2.
    Figure 3.2: A pictorial description of how x = z + x_n is transformed by A ∈ C^{m×n} into y = Ax = A(z + x_n). We see that the spaces C(V_L) and C(V_R) are orthogonal complements of each other within Cⁿ (of dimensions r and n − r). Similarly, the spaces C(U_L) and C(U_R) are orthogonal complements of each other within Cᵐ (of dimensions r and m − r). Any vector x can be written as the sum of a vector z ∈ C(V_L) = C(Aᴴ) and a vector x_n ∈ C(V_R) = N(A).

Theorem 3.27 Let A ∈ C^{n×n} be nonsingular. Let A = UΣVᴴ be its SVD. Then

1. The SVD is the reduced SVD.

2. σ_{n−1} ≠ 0.

3. If U = ( u₀ ⋯ u_{n−1} ), Σ = diag(σ₀, …, σ_{n−1}), and V = ( v₀ ⋯ v_{n−1} ), then

    A⁻¹ = (V Pᵀ)(P Σ⁻¹ Pᵀ)(U Pᵀ)ᴴ = ( v_{n−1} ⋯ v₀ ) diag(1/σ_{n−1}, …, 1/σ₀) ( u_{n−1} ⋯ u₀ )ᴴ,

   where P is the permutation matrix with ones on its antidiagonal, so that Px reverses the order of the entries in x. (Note: for this permutation matrix, Pᵀ = P. In general, this is not the case. What is the case for all permutation matrices P is that PᵀP = PPᵀ = I.)

4. ‖A⁻¹‖₂ = 1/σ_{n−1}.

Proof: The only item that is less than totally obvious is (3). Clearly A⁻¹ = V Σ⁻¹ Uᴴ. The problem is that in Σ⁻¹ the diagonal entries are not ordered from largest to smallest. The permutation fixes this.

Corollary 3.28 If A ∈ C^{m×n} has linearly independent columns, then AᴴA is invertible (nonsingular) and (AᴴA)⁻¹ = V_L Σ_TL⁻² V_Lᴴ.

Proof: Since A has linearly independent columns, A = U_L Σ_TL V_Lᴴ is the reduced SVD, where U_L has n columns and V_L is unitary. Hence

    AᴴA = ( U_L Σ_TL V_Lᴴ )ᴴ ( U_L Σ_TL V_Lᴴ ) = V_L Σ_TLᴴ ( U_Lᴴ U_L ) Σ_TL V_Lᴴ = V_L Σ_TL Σ_TL V_Lᴴ = V_L Σ_TL² V_Lᴴ.

Since V_L is unitary and Σ_TL is diagonal with nonzero diagonal entries, they are both nonsingular. Thus

    ( V_L Σ_TL² V_Lᴴ ) ( V_L Σ_TL⁻² V_Lᴴ ) = I.

This means AᴴA is invertible and (AᴴA)⁻¹ is as given.

3.7 Projection onto the Column Space

Definition 3.29 Let U_L ∈ C^{m×k} have orthonormal columns. The projection of a vector y ∈ Cᵐ onto C(U_L) is the vector U_L x that minimizes ‖y − U_L x‖₂, where x ∈ Cᵏ. We will also call this vector the component of y in C(U_L).

Theorem 3.30 Let U_L ∈ C^{m×k} have orthonormal columns. The projection of y onto C(U_L) is given by U_L U_Lᴴ y.
Proof: The vector U_L x that we want must satisfy

    ‖U_L x − y‖₂ = min_{w ∈ Cᵏ} ‖U_L w − y‖₂.

Now, the 2-norm is invariant under multiplication by the unitary matrix Uᴴ, where U = ( U_L U_R ):

    ‖U_L x − y‖₂² = min_{w ∈ Cᵏ} ‖U_L w − y‖₂²
                  = min_{w ∈ Cᵏ} ‖Uᴴ(U_L w − y)‖₂²                (since the 2-norm is preserved)
                  = min_{w ∈ Cᵏ} ‖( U_Lᴴ ; U_Rᴴ )( U_L w − y )‖₂²
                  = min_{w ∈ Cᵏ} ‖( U_Lᴴ U_L w − U_Lᴴ y ; U_Rᴴ U_L w − U_Rᴴ y )‖₂²
                  = min_{w ∈ Cᵏ} ‖( w − U_Lᴴ y ; −U_Rᴴ y )‖₂²     (since U_Lᴴ U_L = I and U_Rᴴ U_L = 0)
                  = min_{w ∈ Cᵏ} ( ‖w − U_Lᴴ y‖₂² + ‖U_Rᴴ y‖₂² )  (since ‖( u ; v )‖₂² = ‖u‖₂² + ‖v‖₂²)
                  = ( min_{w ∈ Cᵏ} ‖w − U_Lᴴ y‖₂² ) + ‖U_Rᴴ y‖₂².

This is minimized when w = U_Lᴴ y. Thus, the vector in the space spanned by the columns of U_L that is closest to y is given by U_L w = U_L U_Lᴴ y.
Corollary 3.31 Let A ∈ C^{m×n} and let A = U_L Σ_TL V_Lᴴ be its reduced SVD. Then the projection of y ∈ Cᵐ onto C(A) is given by U_L U_Lᴴ y.

Proof: This follows immediately from the fact that C(A) = C(U_L).
Corollary 3.32 Let A ∈ C^{m×n} have linearly independent columns. Then the projection of y ∈ Cᵐ onto C(A) is given by A(AᴴA)⁻¹Aᴴy.

Proof: From Corollary 3.28, we know that AᴴA is nonsingular and that (AᴴA)⁻¹ = V_L Σ_TL⁻² V_Lᴴ. Now,

    A(AᴴA)⁻¹Aᴴy = ( U_L Σ_TL V_Lᴴ )( V_L Σ_TL⁻² V_Lᴴ )( U_L Σ_TL V_Lᴴ )ᴴ y
                = U_L Σ_TL ( V_Lᴴ V_L ) Σ_TL⁻¹ Σ_TL⁻¹ ( V_Lᴴ V_L ) Σ_TL U_Lᴴ y = U_L U_Lᴴ y.

Hence the projection of y onto C(A) is given by A(AᴴA)⁻¹Aᴴy.

We saw this result, for real-valued matrices, already in Section 3.1.1.

Definition 3.33 Let A have linearly independent columns. Then (AᴴA)⁻¹Aᴴ is called the pseudo-inverse or Moore-Penrose generalized inverse of matrix A.
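For a real matrix with linearly independent columns, the pseudo-inverse and the resulting projection can be sketched as follows (Python/NumPy, not part of the original notes; forming (AᴴA)⁻¹Aᴴ via the normal equations is for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 3
A = rng.standard_normal((m, n))    # such a random A has linearly independent columns
y = rng.standard_normal(m)

# (A^H A)^{-1} A^H, computed for a real A by solving A^T A X = A^T.
pinv = np.linalg.solve(A.T @ A, A.T)

print(np.allclose(pinv, np.linalg.pinv(A)))   # matches NumPy's Moore-Penrose inverse

z = A @ (pinv @ y)                 # projection of y onto C(A): A (A^H A)^{-1} A^H y
print(np.allclose(A.T @ (y - z), 0.0))        # the residual y - z lies in N(A^T)
```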

3.8 Low-rank Approximation of a Matrix

We are now ready to answer the question "How then does one compute the best rank-k approximation of a matrix so that the amount of data that must be stored best captures the matrix?" posed in Section 3.1.1.

Theorem 3.34 Let A ∈ C^{m×n} have SVD A = UΣVᴴ and assume A has rank r. Partition

    U = ( U_L U_R ),  V = ( V_L V_R ),  and  Σ = ( Σ_TL 0 ; 0 Σ_BR ),

where U_L ∈ C^{m×k}, V_L ∈ C^{n×k}, and Σ_TL ∈ R^{k×k} with k ≤ r. Then B = U_L Σ_TL V_Lᴴ is the matrix in C^{m×n} closest to A in the following sense:

    ‖A − B‖₂ = min_{C ∈ C^{m×n}, rank(C) ≤ k} ‖A − C‖₂ = σ_k.

In other words, B is the rank-k matrix closest to A as measured by the 2-norm.

Proof: First, if B is as defined, then clearly ‖A − B‖₂ = σ_k:

    ‖A − B‖₂ = ‖Uᴴ(A − B)V‖₂ = ‖UᴴAV − UᴴBV‖₂
             = ‖( Σ_TL 0 ; 0 Σ_BR ) − ( Σ_TL 0 ; 0 0 )‖₂ = ‖( 0 0 ; 0 Σ_BR )‖₂ = ‖Σ_BR‖₂ = σ_k,

since the largest diagonal entry of Σ_BR is σ_k.

Next, assume that C has rank t ≤ k and ||A − C||_2 < ||A − B||_2 . We will show that this leads to a contradiction.

• The null space of C has dimension at least n − k, since dim(N(C)) + rank(C) = n.

• If x ∈ N(C) then

    ||Ax||_2 = ||(A − C)x||_2 ≤ ||A − C||_2 ||x||_2 < σ_k ||x||_2 .

• Partition U = ( u_0 | ⋯ | u_{m−1} ) and V = ( v_0 | ⋯ | v_{n−1} ). Then ||A v_j||_2 = ||σ_j u_j||_2 = σ_j ≥ σ_k for j = 0, . . . , k. Now, let x be any linear combination of v_0 , . . . , v_k : x = χ_0 v_0 + ⋯ + χ_k v_k . Notice that

    ||x||_2^2 = ||χ_0 v_0 + ⋯ + χ_k v_k||_2^2 = |χ_0|^2 + ⋯ + |χ_k|^2 .

  Then

    ||Ax||_2^2 = ||A(χ_0 v_0 + ⋯ + χ_k v_k)||_2^2 = ||χ_0 A v_0 + ⋯ + χ_k A v_k||_2^2
               = ||χ_0 σ_0 u_0 + ⋯ + χ_k σ_k u_k||_2^2 = ||χ_0 σ_0 u_0||_2^2 + ⋯ + ||χ_k σ_k u_k||_2^2
               = |χ_0|^2 σ_0^2 + ⋯ + |χ_k|^2 σ_k^2 ≥ (|χ_0|^2 + ⋯ + |χ_k|^2) σ_k^2 ,

  so that ||Ax||_2 ≥ σ_k ||x||_2 . In other words, vectors in the subspace of all linear combinations of {v_0 , . . . , v_k} satisfy ||Ax||_2 ≥ σ_k ||x||_2 . The dimension of this subspace is k + 1 (since {v_0 , . . . , v_k} form an orthonormal basis).

Chapter 3. Notes on Orthogonality and the Singular Value Decomposition


Both these subspaces are subspaces of C^n . Since their dimensions add up to more than n, there must be at least one nonzero vector z that lies in both, satisfying both ||Az||_2 < σ_k ||z||_2 and ||Az||_2 ≥ σ_k ||z||_2 , which is a contradiction.
The above theorem tells us how to pick the best approximation to a given matrix of a given desired rank. In Section 3.1.1 we discussed how a low-rank matrix can be used to compress data. The SVD thus gives the best such rank-k approximation. In the next section, we revisit this.
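For a diagonal matrix the SVD can be read off directly, so the theorem is easy to check in a few lines of code. The following pure-Python sketch (a hypothetical toy example, not code from these notes) confirms that dropping all but the k largest diagonal entries leaves a 2-norm error equal to σ_k :

```python
# Toy check of the best rank-k approximation theorem for A = diag(5, 3, 2, 1):
# the SVD is trivial (U = V = I, Sigma = A), so the best rank-k approximation
# keeps the k largest diagonal entries, and ||A - B||_2 equals sigma_k,
# the largest singular value that was dropped.

sigmas = [5.0, 3.0, 2.0, 1.0]    # singular values, in nonincreasing order

def rank_k_error(sigmas, k):
    """2-norm error of the best rank-k approximation of diag(sigmas)."""
    dropped = sigmas[k:]         # entries zeroed out in Sigma_BR
    return max(dropped) if dropped else 0.0

for k in range(len(sigmas) + 1):
    print(k, rank_k_error(sigmas, k))
```

For k = 0, 1, 2, 3 the error is 5, 3, 2, 1, respectively, i.e., σ_k each time; for k = 4 the approximation is exact.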

3.9 An Application

Let us revisit data compression. Let Y ∈ R^{m×n} be a matrix that, for example, stores a picture. In this case, the (i, j) entry in Y is, for example, a number that represents the grayscale value of pixel (i, j). The following instructions, executed in octave or matlab, generate the picture of Mexican artist Frida Kahlo in Figure 3.3 (top-left). The file FridaPNG.png can be found in Programming/chapter03/.

octave> IMG = imread( 'FridaPNG.png' ); % this reads the image
octave> Y = IMG( :,:,1 );
octave> imshow( Y )                     % this displays the image
Although the picture is black and white, it was read as if it is a color image, which means an m × n × 3 array of pixel information is stored. Setting Y = IMG( :,:,1 ) extracts a single matrix of pixel information. (If you start with a color picture, you will want to approximate IMG( :,:,1 ), IMG( :,:,2 ), and IMG( :,:,3 ) separately.)
Now, let Y = U Σ V^T be the SVD of matrix Y . Partition, conformally,

    U = ( U_L | U_R ),   V = ( V_L | V_R ),   and   Σ = [ Σ_TL  0 ; 0  Σ_BR ],

where U_L and V_L have k columns and Σ_TL is k × k, so that

    Y = ( U_L | U_R ) [ Σ_TL  0 ; 0  Σ_BR ] ( V_L | V_R )^T
      = ( U_L | U_R ) [ Σ_TL V_L^T ; Σ_BR V_R^T ]
      = U_L Σ_TL V_L^T + U_R Σ_BR V_R^T .

Recall that then U_L Σ_TL V_L^T is the best rank-k approximation to Y .
Let us approximate the matrix that stores the picture with U_L Σ_TL V_L^T :

>> IMG = imread( 'FridaPNG.png' );  % read the picture
>> Y = IMG( :,:,1 );


[Figure 3.3: Multiple pictures as generated by the code. Panels: original picture, and the rank-k approximations for k = 1, 2, 5, 10, 25.]


Figure 3.4: Distribution of singular values for the picture.


>> imshow( Y );                  % this displays the image
>> k = 1;
>> [ U, Sigma, V ] = svd( Y );
>> UL = U( :, 1:k );             % first k columns
>> VL = V( :, 1:k );             % first k columns
>> SigmaTL = Sigma( 1:k, 1:k );  % TL submatrix of Sigma
>> Yapprox = uint8( UL * SigmaTL * VL' );
>> imshow( Yapprox );

As one increases k, the approximation gets better, as illustrated in Figure 3.3. The graph in Figure 3.4 helps explain this. The original matrix Y is 387 × 469, with 181,503 entries. When k = 10, matrices U_L , V_L , and Σ_TL are 387 × 10, 469 × 10, and 10 × 10, respectively, requiring only 8,660 entries to be stored.
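The storage savings follow from a simple count: the rank-k factors require mk + k² + nk entries instead of mn. A quick pure-Python check (the helper name is made up) reproduces the numbers in the text:

```python
# Storage needed for the rank-k approximation vs. the full picture:
# Y is m x n, approximated by U_L (m x k), Sigma_TL (k x k), V_L (n x k).

def storage(m, n, k):
    full = m * n                      # entries of Y
    lowrank = m * k + k * k + n * k   # entries of U_L, Sigma_TL, V_L
    return full, lowrank

full, lowrank = storage(387, 469, 10)
print(full, lowrank)    # 181503 8660, matching the text
```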

3.10 SVD and the Condition Number of a Matrix

In Notes on Norms we saw that if Ax = b and A(x + δx) = b + δb, then

    ||δx||_2 /||x||_2 ≤ κ_2(A) ||δb||_2 /||b||_2 ,

where κ_2(A) = ||A||_2 ||A^{-1}||_2 is the condition number of A, using the 2-norm.


Homework 3.35 Show that if A ∈ C^{m×m} is nonsingular, then
• ||A||_2 = σ_0 , the largest singular value;
• ||A^{-1}||_2 = 1/σ_{m−1} , the inverse of the smallest singular value; and
• κ_2(A) = σ_0 /σ_{m−1} .
* SEE ANSWER

If we go back to the example of A ∈ R^{2×2} , recall the following pictures that show how A transforms the unit circle:

[Pictures: left, R^2 : Domain of A; right, R^2 : Range (codomain) of A.]

In this case, the ratio σ_0 /σ_{n−1} represents the ratio between the major and minor axes of the ellipse on the right. Notice that the more elongated the ellipse on the right is, the worse (larger) the condition number.
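This can be sanity-checked numerically for a diagonal matrix, whose singular values are the absolute values of its diagonal entries. The following pure-Python sketch (a hypothetical example, not code from these notes) samples random vectors and verifies that every stretch factor ||Dx||_2 /||x||_2 lies between the smallest and largest singular value, so that κ_2(D) = σ_0 /σ_{m−1}:

```python
import math
import random

# D = diag(4, 2, 0.5): singular values 4, 2, 0.5, so kappa_2(D) = 4/0.5 = 8.
diag = [4.0, 2.0, 0.5]

def stretch(x):
    """||D x||_2 / ||x||_2 for this diagonal D."""
    dx = [d * xi for d, xi in zip(diag, x)]
    return math.sqrt(sum(v * v for v in dx)) / math.sqrt(sum(v * v for v in x))

random.seed(0)
ratios = [stretch([random.gauss(0, 1) for _ in diag]) for _ in range(10000)]

sigma_max, sigma_min = max(map(abs, diag)), min(map(abs, diag))
assert sigma_min - 1e-12 <= min(ratios) and max(ratios) <= sigma_max + 1e-12
print(sigma_max / sigma_min)   # kappa_2(D) = 8.0
```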

3.11 An Algorithm for Computing the SVD?

It would seem that the proof of the existence of the SVD is constructive in the sense that it provides an algorithm for computing the SVD of a given matrix A ∈ C^{m×m} . Not so fast! Observe that

• Computing ||A||_2 is nontrivial.
• Computing the vector that maximizes max_{||x||_2 =1} ||Ax||_2 is nontrivial.
• Given a vector q_0 , computing vectors q_1 , . . . , q_{m−1} so that q_0 , . . . , q_{m−1} are mutually orthonormal is expensive (as we will see when we discuss the QR factorization).

Towards the end of the course we will discuss algorithms for computing the eigenvalues and eigenvectors of a matrix, and related algorithms for computing the SVD.

3.12 Wrapup

3.12.1 Additional exercises

Homework 3.36 Let D ∈ R^{n×n} be a diagonal matrix with entries δ_0 , δ_1 , . . . , δ_{n−1} on its diagonal. Show that
1. max_{x≠0} ||Dx||_2 /||x||_2 = max_{0≤i<n} |δ_i | .
2. min_{x≠0} ||Dx||_2 /||x||_2 = min_{0≤i<n} |δ_i | .
* SEE ANSWER
Homework 3.37 Let A ∈ C^{n×n} have singular values σ_0 ≥ σ_1 ≥ ⋯ ≥ σ_{n−1} (where σ_{n−1} may equal zero). Prove that σ_{n−1} ≤ ||Ax||_2 /||x||_2 ≤ σ_0 (assuming x ≠ 0).
* SEE ANSWER
Homework 3.38 Let A ∈ C^{n×n} have the property that for all vectors x ∈ C^n it holds that ||Ax||_2 = ||x||_2 . Use the SVD to prove that A is unitary.
* SEE ANSWER
Homework 3.39 Use the SVD to prove that ||A||_2 = ||A^T ||_2 .
* SEE ANSWER

Homework 3.40 Compute the SVD of

    [  1   2
      −2   1 ] .

* SEE ANSWER

3.12.2 Summary

Chapter 4
Notes on Gram-Schmidt QR Factorization

A classic problem in linear algebra is the computation of an orthonormal basis for the space spanned by a given set of linearly independent vectors: Given a linearly independent set of vectors {a_0 , . . . , a_{n−1}} ⊂ C^m , we would like to find a set of mutually orthonormal vectors {q_0 , . . . , q_{n−1}} ⊂ C^m so that

    Span({a_0 , . . . , a_{n−1}}) = Span({q_0 , . . . , q_{n−1}}).

This problem is equivalent to the problem of, given a matrix A = ( a_0 | ⋯ | a_{n−1} ), computing a matrix Q = ( q_0 | ⋯ | q_{n−1} ) with Q^H Q = I so that C(A) = C(Q), where C(A) denotes the column space of A.
A review at the undergraduate level of this topic (with animated illustrations) can be found in Week 11 of Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].

4.1 Opening Remarks

Video from Fall 2014
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

4.1.1 Launch

4.1.2 Outline

4.1 Opening Remarks ................................. 85
    4.1.1 Launch .................................... 85
    4.1.2 Outline ................................... 86
    4.1.3 What you will learn ....................... 87
4.2 Classical Gram-Schmidt (CGS) Process ............ 88
4.3 Modified Gram-Schmidt (MGS) Process ............. 93
4.4 In Practice, MGS is More Accurate ............... 97
4.5 Cost ............................................ 99
    4.5.1 Cost of CGS ............................... 100
    4.5.2 Cost of MGS ............................... 100
4.6 Wrapup .......................................... 101
    4.6.1 Additional exercises ...................... 101
    4.6.2 Summary ................................... 101

4.1.3 What you will learn

4.2 Classical Gram-Schmidt (CGS) Process

Given a set of linearly independent vectors {a_0 , . . . , a_{n−1}} ⊂ C^m , the Gram-Schmidt process computes an orthonormal basis {q_0 , . . . , q_{n−1}} that spans the same subspace as the original vectors, i.e.,

    Span({a_0 , . . . , a_{n−1}}) = Span({q_0 , . . . , q_{n−1}}).
The process proceeds as described in Figure 4.1 and in the algorithms in Figure 4.2.
Homework 4.1 What happens in the Gram-Schmidt algorithm if the columns of A are NOT linearly independent? How might one fix this? How can the Gram-Schmidt algorithm be used to identify which
columns of A are linearly independent?
* SEE ANSWER
Homework 4.2 Convince yourself that the relation between the vectors {a_j} and {q_j} in the algorithms in Figure 4.2 is given by

    ( a_0 | a_1 | ⋯ | a_{n−1} ) = ( q_0 | q_1 | ⋯ | q_{n−1} ) [ ρ_{0,0}  ρ_{0,1}  ⋯  ρ_{0,n−1}
                                                                   0      ρ_{1,1}  ⋯  ρ_{1,n−1}
                                                                   ⋮        ⋮      ⋱      ⋮
                                                                   0        0      ⋯  ρ_{n−1,n−1} ] ,

where

    q_i^H q_j = 1 for i = j, and 0 otherwise,

and

    ρ_{i,j} = q_i^H a_j                                for i < j,
    ρ_{j,j} = || a_j − Σ_{k=0}^{j−1} ρ_{k,j} q_k ||_2 ,
    ρ_{i,j} = 0                                        otherwise.

* SEE ANSWER

Thus, this relationship between the linearly independent vectors {a_j} and the orthonormal vectors {q_j} can be concisely stated as

    A = QR,

where A and Q are m × n matrices (m ≥ n), Q^H Q = I, and R is an n × n upper triangular matrix.
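The classical Gram-Schmidt process that produces this factorization fits in a few lines of code. The following pure-Python sketch (the notes themselves use MATLAB and the FLAME API; this translation, the helper `cgs`, and the small test matrix are all illustrative assumptions) mirrors the left algorithm in Figure 4.2 for a small real matrix:

```python
import math

def cgs(A):
    """Classical Gram-Schmidt QR of a real m x n matrix with linearly
    independent columns, stored as a list of n columns of length m.
    Returns (Q, R) with Q a list of orthonormal columns."""
    m, n = len(A[0]), len(A)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(A[j])
        for k in range(j):
            # rho_{k,j} := q_k^H a_j, computed from the original column (CGS)
            R[k][j] = sum(Q[k][i] * A[j][i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))    # rho_{j,j} := ||a_j perp||_2
        Q.append([x / R[j][j] for x in v])            # q_j := a_j perp / rho_{j,j}
    return Q, R

# a small 3 x 2 example: columns a_0 = (1,1,0), a_1 = (1,0,1)
A = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
Q, R = cgs(A)

# Q has orthonormal columns and A = QR: rebuild each column and compare.
for j in range(2):
    for i in range(3):
        rebuilt = sum(Q[k][i] * R[k][j] for k in range(2))
        assert abs(rebuilt - A[j][i]) < 1e-12
```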
Theorem 4.3 Let A have linearly independent columns and let A = QR, where A, Q ∈ C^{m×n} with n ≤ m, R ∈ C^{n×n} , Q^H Q = I, and R is an upper triangular matrix with nonzero diagonal entries. Then, for 0 < k < n, the first k columns of A span the same space as the first k columns of Q.
Proof: Partition

    A → ( A_L | A_R ),   Q → ( Q_L | Q_R ),   and   R → [ R_TL  R_TR ; 0  R_BR ],

where A_L , Q_L ∈ C^{m×k} and R_TL ∈ C^{k×k} . Then R_TL is nonsingular (since it is upper triangular and has no zero on its diagonal), Q_L^H Q_L = I, and A_L = Q_L R_TL . We want to show that C(A_L) = C(Q_L):

Steps                                         Comment

ρ_{0,0} := ||a_0||_2                          Compute the length of vector a_0 .
q_0 := a_0 /ρ_{0,0}                           Set q_0 := a_0 /ρ_{0,0} , creating a unit vector in the direction of a_0 .
                                              Clearly, Span({a_0}) = Span({q_0}). (Why?)

ρ_{0,1} := q_0^H a_1
a_1^⊥ := a_1 − ρ_{0,1} q_0                    Compute a_1^⊥ , the component of vector a_1 orthogonal to q_0 .
ρ_{1,1} := ||a_1^⊥||_2                        Compute ρ_{1,1} , the length of a_1^⊥ .
q_1 := a_1^⊥ /ρ_{1,1}                         Set q_1 := a_1^⊥ /ρ_{1,1} , creating a unit vector in the direction of a_1^⊥ .
                                              Now, q_0 and q_1 are mutually orthonormal and Span({a_0 , a_1}) = Span({q_0 , q_1}). (Why?)

ρ_{0,2} := q_0^H a_2
ρ_{1,2} := q_1^H a_2
a_2^⊥ := a_2 − ρ_{0,2} q_0 − ρ_{1,2} q_1      Compute a_2^⊥ , the component of vector a_2 orthogonal to q_0 and q_1 .
ρ_{2,2} := ||a_2^⊥||_2                        Compute ρ_{2,2} , the length of a_2^⊥ .
q_2 := a_2^⊥ /ρ_{2,2}                         Set q_2 := a_2^⊥ /ρ_{2,2} , creating a unit vector in the direction of a_2^⊥ .
                                              Now, {q_0 , q_1 , q_2} is an orthonormal basis and Span({a_0 , a_1 , a_2}) = Span({q_0 , q_1 , q_2}). (Why?)

And so forth.

Figure 4.1: Gram-Schmidt orthogonalization.

First algorithm (compute all ρ_{k,j} first, then subtract):
    for j = 0, . . . , n − 1
        for k = 0, . . . , j − 1
            ρ_{k,j} := q_k^H a_j
        end
        a_j^⊥ := a_j
        for k = 0, . . . , j − 1
            a_j^⊥ := a_j^⊥ − ρ_{k,j} q_k
        end
        ρ_{j,j} := ||a_j^⊥||_2
        q_j := a_j^⊥ /ρ_{j,j}
    end

Second algorithm (interleave the two inner loops):
    for j = 0, . . . , n − 1
        a_j^⊥ := a_j
        for k = 0, . . . , j − 1
            ρ_{k,j} := q_k^H a_j
            a_j^⊥ := a_j^⊥ − ρ_{k,j} q_k
        end
        ρ_{j,j} := ||a_j^⊥||_2
        q_j := a_j^⊥ /ρ_{j,j}
    end

Third algorithm (inner loops as matrix-vector operations):
    for j = 0, . . . , n − 1
        ( ρ_{0,j} , . . . , ρ_{j−1,j} )^T := ( q_0 | ⋯ | q_{j−1} )^H a_j
        a_j^⊥ := a_j − ( q_0 | ⋯ | q_{j−1} ) ( ρ_{0,j} , . . . , ρ_{j−1,j} )^T
        ρ_{j,j} := ||a_j^⊥||_2
        q_j := a_j^⊥ /ρ_{j,j}
    end

Figure 4.2: Three equivalent (Classical) Gram-Schmidt algorithms.


We first show that C(A_L) ⊆ C(Q_L). Let y ∈ C(A_L). Then there exists x ∈ C^k such that y = A_L x. But then y = Q_L z, where z = R_TL x, which means that y ∈ C(Q_L). Hence C(A_L) ⊆ C(Q_L).
We next show that C(Q_L) ⊆ C(A_L). Let y ∈ C(Q_L). Then there exists z ∈ C^k such that y = Q_L z. But then y = A_L x, where x = R_TL^{-1} z, from which we conclude that y ∈ C(A_L). Hence C(Q_L) ⊆ C(A_L).
Since C(A_L) ⊆ C(Q_L) and C(Q_L) ⊆ C(A_L), we conclude that C(Q_L) = C(A_L).
Theorem 4.4 Let A ∈ C^{m×n} have linearly independent columns. Then there exist Q ∈ C^{m×n} with Q^H Q = I and upper triangular R with no zeroes on the diagonal such that A = QR. This is known as the QR factorization. If the diagonal elements of R are chosen to be real and positive, the QR factorization is unique.
Proof: (By induction.) Note that n ≤ m since A has linearly independent columns.

Base case: n = 1. In this case A = ( a_0 ), where a_0 is its only column. Since A has linearly independent columns, a_0 ≠ 0. Then

    A = ( a_0 ) = ( q_0 ) ( ρ_{0,0} ) ,

where ρ_{0,0} = ||a_0||_2 and q_0 = a_0 /ρ_{0,0} , so that Q = ( q_0 ) and R = ( ρ_{0,0} ).

Inductive step: Assume that the result is true for all matrices with n − 1 linearly independent columns. We will show it is true for A ∈ C^{m×n} with linearly independent columns.
Let A ∈ C^{m×n} . Partition A → ( A_0 | a_1 ). By the induction hypothesis, there exist Q_0 and R_00 such that Q_0^H Q_0 = I, R_00 is upper triangular with nonzero diagonal entries, and A_0 = Q_0 R_00 . Now,

Algorithm: [Q, R] := CGS_unb_var1(A)

Partition A → ( A_L | A_R ), Q → ( Q_L | Q_R ), R → [ R_TL  R_TR ; 0  R_BR ]
    where A_L and Q_L have 0 columns and R_TL is 0 × 0
while n(A_L) ≠ n(A) do
    Repartition
        ( A_L | A_R ) → ( A_0 | a_1 | A_2 ),   ( Q_L | Q_R ) → ( Q_0 | q_1 | Q_2 ),
        [ R_TL  R_TR ; 0  R_BR ] → [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ]

    r_01 := Q_0^H a_1
    a_1^⊥ := a_1 − Q_0 r_01
    ρ_11 := ||a_1^⊥||_2
    q_1 := a_1^⊥ /ρ_11

    Continue with
        ( A_0 | a_1 | A_2 ) → ( A_L | A_R ),   ( Q_0 | q_1 | Q_2 ) → ( Q_L | Q_R ),
        [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ] → [ R_TL  R_TR ; 0  R_BR ]
endwhile

The same computation, indexed:
    for j = 0, . . . , n − 1
        ( ρ_{0,j} , . . . , ρ_{j−1,j} )^T := ( q_0 | ⋯ | q_{j−1} )^H a_j          (r_01 := Q_0^H a_1)
        a_j^⊥ := a_j − ( q_0 | ⋯ | q_{j−1} ) ( ρ_{0,j} , . . . , ρ_{j−1,j} )^T    (a_1^⊥ := a_1 − Q_0 r_01)
        ρ_{j,j} := ||a_j^⊥||_2                                                    (ρ_11 := ||a_1^⊥||_2)
        q_j := a_j^⊥ /ρ_{j,j}                                                     (q_1 := a_1^⊥ /ρ_11)
    end

Figure 4.3: (Classical) Gram-Schmidt algorithm for computing the QR factorization of a matrix A.

compute r_01 = Q_0^H a_1 and a_1^⊥ = a_1 − Q_0 r_01 , the component of a_1 orthogonal to C(Q_0). Because the columns of A are linearly independent, a_1^⊥ ≠ 0. Let ρ_11 = ||a_1^⊥||_2 and q_1 = a_1^⊥ /ρ_11 . Then

    ( Q_0 | q_1 ) [ R_00  r_01 ; 0  ρ_11 ] = ( Q_0 R_00 | Q_0 r_01 + q_1 ρ_11 ) = ( A_0 | Q_0 r_01 + a_1^⊥ ) = ( A_0 | a_1 ) = A.

Hence Q = ( Q_0 | q_1 ) and R = [ R_00  r_01 ; 0  ρ_11 ].
By the Principle of Mathematical Induction the result holds for all matrices A ∈ C^{m×n} with m ≥ n.
The proof motivates the algorithm in Figure 4.3 (left) in FLAME notation¹.
An alternative way of motivating that algorithm is as follows. Consider A = QR. Partition A, Q, and R to yield

    ( A_0 | a_1 | A_2 ) = ( Q_0 | q_1 | Q_2 ) [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ] .

Assume that Q_0 and R_00 have already been computed. Since corresponding columns of both sides must be equal, we find that

    a_1 = Q_0 r_01 + q_1 ρ_11 .        (4.1)

Also, Q_0^H Q_0 = I and Q_0^H q_1 = 0, since the columns of Q are mutually orthonormal. Hence Q_0^H a_1 = Q_0^H Q_0 r_01 + Q_0^H q_1 ρ_11 = r_01 . This shows how r_01 can be computed from Q_0 and a_1 , which are already known. Next, a_1^⊥ = a_1 − Q_0 r_01 is computed from (4.1). This is the component of a_1 that is perpendicular to the columns of Q_0 . We know it is nonzero since the columns of A are linearly independent. Since ρ_11 q_1 = a_1^⊥ and we know that q_1 has unit length, we now compute ρ_11 = ||a_1^⊥||_2 and q_1 = a_1^⊥ /ρ_11 , which completes a derivation of the algorithm in Figure 4.3.

Homework 4.5 Let A have linearly independent columns and let A = QR be a QR factorization of A. Partition

    A → ( A_L | A_R ),   Q → ( Q_L | Q_R ),   and   R → [ R_TL  R_TR ; 0  R_BR ],

where A_L and Q_L have k columns and R_TL is k × k. Show that
1. A_L = Q_L R_TL : Q_L R_TL equals the QR factorization of A_L ,
2. C(A_L) = C(Q_L): the first k columns of Q form an orthonormal basis for the space spanned by the first k columns of A,
3. R_TR = Q_L^H A_R ,
4. (A_R − Q_L R_TR)^H Q_L = 0,
5. A_R − Q_L R_TR = Q_R R_BR , and
6. C(A_R − Q_L R_TR) = C(Q_R).
* SEE ANSWER
¹The FLAME notation should be intuitively obvious. If it is not, you may want to review the earlier weeks in Linear Algebra: Foundations to Frontiers - Notes to LAFF With.


[y^⊥ , r] = Proj_orthog_to_Q_CGS(Q, y)       (used by classical Gram-Schmidt)
    y^⊥ = y
    for i = 0, . . . , k − 1
        ρ_i := q_i^H y
        y^⊥ := y^⊥ − ρ_i q_i
    endfor

[y^⊥ , r] = Proj_orthog_to_Q_MGS(Q, y)       (used by modified Gram-Schmidt)
    y^⊥ = y
    for i = 0, . . . , k − 1
        ρ_i := q_i^H y^⊥
        y^⊥ := y^⊥ − ρ_i q_i
    endfor

Figure 4.4: Two different ways of computing y^⊥ = (I − QQ^H)y, the component of y orthogonal to C(Q), where Q has k orthonormal columns.

4.3 Modified Gram-Schmidt (MGS) Process

We start by considering the following problem: Given y ∈ C^m and Q ∈ C^{m×k} with orthonormal columns, compute y^⊥ , the component of y orthogonal to the columns of Q. This is a key step in the Gram-Schmidt process in Figure 4.3.
Recall that if A has linearly independent columns, then A(A^H A)^{-1} A^H y equals the projection of y onto the column space of A (i.e., the component of y in C(A)) and y − A(A^H A)^{-1} A^H y = (I − A(A^H A)^{-1} A^H)y equals the component of y orthogonal to C(A). If Q has orthonormal columns, then Q^H Q = I and hence QQ^H y equals the projection of y onto the column space of Q (i.e., the component of y in C(Q)) and y − QQ^H y = (I − QQ^H)y equals the component of y orthogonal to C(Q).
Thus, mathematically, the solution to the stated problem is given by

    y^⊥ = (I − QQ^H)y = y − QQ^H y
        = y − ( q_0 | ⋯ | q_{k−1} ) ( q_0 | ⋯ | q_{k−1} )^H y
        = y − ( q_0 | ⋯ | q_{k−1} ) ( q_0^H y , . . . , q_{k−1}^H y )^T
        = y − [ (q_0^H y)q_0 + ⋯ + (q_{k−1}^H y)q_{k−1} ]
        = y − (q_0^H y)q_0 − ⋯ − (q_{k−1}^H y)q_{k−1} .

This can be computed by the algorithm in Figure 4.4 (left) and is used by what is often called the Classical Gram-Schmidt (CGS) algorithm given in Figure 4.3.
An alternative algorithm for computing y^⊥ is given in Figure 4.4 (right) and is used by the Modified Gram-Schmidt (MGS) algorithm, also given in Figure 4.5. This approach is mathematically equivalent to
Algorithm: [A, R] := Gram-Schmidt(A) (overwrites A with Q)

Partition A → ( A_L | A_R ), R → [ R_TL  R_TR ; 0  R_BR ]
    where A_L has 0 columns and R_TL is 0 × 0
while n(A_L) ≠ n(A) do
    Repartition
        ( A_L | A_R ) → ( A_0 | a_1 | A_2 ),
        [ R_TL  R_TR ; 0  R_BR ] → [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ]
        where a_1 is a column and ρ_11 is a scalar

    Update (CGS):
        r_01 := A_0^H a_1
        a_1 := a_1 − A_0 r_01
        ρ_11 := ||a_1||_2
        a_1 := a_1 /ρ_11
    Update (MGS):
        [a_1 , r_01] = Proj_orthog_to_Q_MGS(A_0 , a_1)
        ρ_11 := ||a_1||_2
        a_1 := a_1 /ρ_11
    Update (MGS, alternative):
        ρ_11 := ||a_1||_2
        a_1 := a_1 /ρ_11
        r_12^T := a_1^H A_2
        A_2 := A_2 − a_1 r_12^T

    Continue with
        ( A_0 | a_1 | A_2 ) → ( A_L | A_R ),
        [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ] → [ R_TL  R_TR ; 0  R_BR ]
endwhile

Figure 4.5: Left: Classical Gram-Schmidt algorithm. Middle: Modified Gram-Schmidt algorithm. Right: Modified Gram-Schmidt algorithm where, every time a new column of Q, q_1 , is computed, the component of all future columns in the direction of this new vector is subtracted out.
the algorithm to its left for the following reason:
The algorithm on the left in Figure 4.4 computes

    y^⊥ := y − (q_0^H y)q_0 − ⋯ − (q_{k−1}^H y)q_{k−1}

by, in the ith step, computing the component of y in the direction of q_i , namely (q_i^H y)q_i , and then subtracting this
(a) MGS algorithm that computes Q and R from A:
    for j = 0, . . . , n − 1
        a_j^⊥ := a_j
        for k = 0, . . . , j − 1
            ρ_{k,j} := q_k^H a_j^⊥
            a_j^⊥ := a_j^⊥ − ρ_{k,j} q_k
        end
        ρ_{j,j} := ||a_j^⊥||_2
        q_j := a_j^⊥ /ρ_{j,j}
    end

(b) MGS algorithm that computes Q and R from A, overwriting A with Q:
    for j = 0, . . . , n − 1
        for k = 0, . . . , j − 1
            ρ_{k,j} := a_k^H a_j
            a_j := a_j − ρ_{k,j} a_k
        end
        ρ_{j,j} := ||a_j||_2
        a_j := a_j /ρ_{j,j}
    end

(c) MGS algorithm that normalizes the jth column to have unit length to compute q_j (overwriting a_j with the result) and then subtracts the component in the direction of q_j off the rest of the columns (a_{j+1} , . . . , a_{n−1}):
    for j = 0, . . . , n − 1
        ρ_{j,j} := ||a_j||_2
        a_j := a_j /ρ_{j,j}
        for k = j + 1, . . . , n − 1
            ρ_{j,k} := a_j^H a_k
            a_k := a_k − ρ_{j,k} a_j
        end
    end

(d) Slight modification of the algorithm in (c) that computes ρ_{j,k} in a separate loop:
    for j = 0, . . . , n − 1
        ρ_{j,j} := ||a_j||_2
        a_j := a_j /ρ_{j,j}
        for k = j + 1, . . . , n − 1
            ρ_{j,k} := a_j^H a_k
        end
        for k = j + 1, . . . , n − 1
            a_k := a_k − ρ_{j,k} a_j
        end
    end

(e) Algorithm in (d) rewritten without loops:
    for j = 0, . . . , n − 1
        ρ_{j,j} := ||a_j||_2
        a_j := a_j /ρ_{j,j}
        ( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} ) := a_j^H ( a_{j+1} | ⋯ | a_{n−1} )
        ( a_{j+1} | ⋯ | a_{n−1} ) := ( a_{j+1} − ρ_{j,j+1} a_j | ⋯ | a_{n−1} − ρ_{j,n−1} a_j )
    end

(f) Algorithm in (e) rewritten to expose the row-vector-times-matrix multiplication a_j^H ( a_{j+1} | ⋯ | a_{n−1} ) and the rank-1 update ( a_{j+1} | ⋯ | a_{n−1} ) − a_j ( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} ):
    for j = 0, . . . , n − 1
        ρ_{j,j} := ||a_j||_2
        a_j := a_j /ρ_{j,j}
        ( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} ) := a_j^H ( a_{j+1} | ⋯ | a_{n−1} )
        ( a_{j+1} | ⋯ | a_{n−1} ) := ( a_{j+1} | ⋯ | a_{n−1} ) − a_j ( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} )
    end

Figure 4.6: Various equivalent MGS algorithms.

Algorithm: [A, R] := MGS_unb_var1(A)

Partition A → ( A_L | A_R ), R → [ R_TL  R_TR ; 0  R_BR ]
    where A_L has 0 columns and R_TL is 0 × 0
while n(A_L) ≠ n(A) do
    Repartition
        ( A_L | A_R ) → ( A_0 | a_1 | A_2 ),
        [ R_TL  R_TR ; 0  R_BR ] → [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ]

    ρ_11 := ||a_1||_2          (ρ_{j,j} := ||a_j||_2)
    a_1 := a_1 /ρ_11           (a_j := a_j /ρ_{j,j})
    r_12^T := a_1^H A_2        (( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} ) := a_j^H ( a_{j+1} | ⋯ | a_{n−1} ))
    A_2 := A_2 − a_1 r_12^T    (( a_{j+1} | ⋯ | a_{n−1} ) := ( a_{j+1} | ⋯ | a_{n−1} ) − a_j ( ρ_{j,j+1} | ⋯ | ρ_{j,n−1} ))

    Continue with
        ( A_0 | a_1 | A_2 ) → ( A_L | A_R ),
        [ R_00  r_01  R_02 ; 0  ρ_11  r_12^T ; 0  0  R_22 ] → [ R_TL  R_TR ; 0  R_BR ]
endwhile

Figure 4.7: Modified Gram-Schmidt algorithm for computing the QR factorization of a matrix A.
off the vector y that already contains

    y^⊥ = y − (q_0^H y)q_0 − ⋯ − (q_{i−1}^H y)q_{i−1} ,

leaving us with

    y^⊥ = y − (q_0^H y)q_0 − ⋯ − (q_{i−1}^H y)q_{i−1} − (q_i^H y)q_i .

Now, notice that

    q_i^H ( y − (q_0^H y)q_0 − ⋯ − (q_{i−1}^H y)q_{i−1} )
        = q_i^H y − (q_0^H y) q_i^H q_0 − ⋯ − (q_{i−1}^H y) q_i^H q_{i−1}
        = q_i^H y ,

since q_i^H q_0 = ⋯ = q_i^H q_{i−1} = 0.

What this means is that we can use y^⊥ in our computation of ρ_i instead:

    ρ_i := q_i^H y^⊥ = q_i^H y ,

an insight that justifies the equivalent algorithm in Figure 4.4 (right).
Next, we massage the MGS algorithm into the third (right-most) algorithm given in Figure 4.5. For this, consider the equivalent algorithms in Figures 4.6 and 4.7.

4.4 In Practice, MGS is More Accurate

In theory, all Gram-Schmidt algorithms discussed in the previous sections are equivalent: they compute the exact same QR factorization. In practice, in the presence of round-off error, MGS is more accurate than CGS. We will (hopefully) get into detail about this later, but for now we will illustrate it with a classic example.
When storing real (or complex, for that matter) valued numbers in a computer, only limited accuracy can be maintained, leading to round-off error when a number is stored and/or when computation with numbers is performed. The machine epsilon or unit roundoff error is defined as the largest positive number ε_mach such that the stored value of 1 + ε_mach is rounded to 1. Now, let us consider a computer where the only error that is ever incurred is when 1 + ε_mach is computed and rounded to 1. Let ε = √ε_mach and consider the matrix

    A = ( a_0 | a_1 | a_2 ) = [ 1 1 1
                                ε 0 0
                                0 ε 0
                                0 0 ε ] .        (4.2)
In Figure 4.8 (left) we execute the CGS algorithm. It yields the approximate matrix

    Q ≈ [ 1     0       0
          ε  −√2/2   −√2/2
          0   √2/2      0
          0     0     √2/2 ] .

If we now ask the question "Are the columns of Q orthonormal?" we can check this by computing Q^H Q, which should equal I, the identity. But

    Q^H Q = [ 1 + ε_mach   −√2ε/2   −√2ε/2
               −√2ε/2         1       1/2
               −√2ε/2        1/2       1   ] .

Clearly, the second and third columns of Q are not mutually orthonormal. What is going on? The answer lies with how a_2^⊥ is computed in the last step of each of the algorithms.

First iteration (identical for CGS and MGS):
    ρ_{0,0} = ||a_0||_2 = √(1 + ε²) = √(1 + ε_mach), which is rounded to 1
    q_0 = a_0 /ρ_{0,0} = a_0 /1 = (1, ε, 0, 0)^T

Second iteration (identical for CGS and MGS):
    ρ_{0,1} = q_0^H a_1 = 1
    a_1^⊥ = a_1 − ρ_{0,1} q_0 = (0, −ε, ε, 0)^T
    ρ_{1,1} = ||a_1^⊥||_2 = √(2ε²) = √2 ε
    q_1 = a_1^⊥ /ρ_{1,1} = (0, −ε, ε, 0)^T /(√2 ε) = (0, −√2/2, √2/2, 0)^T

Third iteration (CGS):
    ρ_{0,2} = q_0^H a_2 = 1
    ρ_{1,2} = q_1^H a_2 = 0
    a_2^⊥ = a_2 − ρ_{0,2} q_0 − ρ_{1,2} q_1 = (0, −ε, 0, ε)^T
    ρ_{2,2} = ||a_2^⊥||_2 = √(2ε²) = √2 ε
    q_2 = a_2^⊥ /ρ_{2,2} = (0, −√2/2, 0, √2/2)^T

Third iteration (MGS):
    ρ_{0,2} = q_0^H a_2 = 1
    a_2^⊥ = a_2 − ρ_{0,2} q_0 = (0, −ε, 0, ε)^T
    ρ_{1,2} = q_1^H a_2^⊥ = (√2/2)ε
    a_2^⊥ = a_2^⊥ − ρ_{1,2} q_1 = (0, −ε/2, −ε/2, ε)^T
    ρ_{2,2} = ||a_2^⊥||_2 = √((6/4)ε²) = (√6/2)ε
    q_2 = a_2^⊥ /ρ_{2,2} = (0, −√6/6, −√6/6, 2√6/6)^T

Figure 4.8: Execution of the CGS algorithm (left) and MGS algorithm (right) on the example in Eqn. (4.2).

In the CGS algorithm, we find that

    a_2^⊥ := a_2 − (q_0^H a_2)q_0 − (q_1^H a_2)q_1 .

Now, q_0 has a relatively small error in it and hence (q_0^H a_2)q_0 has a relatively small error in it. It is likely that a part of that error is in the direction of q_1 . Relative to (q_0^H a_2)q_0 , that error in the direction of q_1 is small, but relative to a_2 − (q_0^H a_2)q_0 it is not. The point is that then a_2 − (q_0^H a_2)q_0 has a relatively large error in it in the direction of q_1 . Subtracting (q_1^H a_2)q_1 does not fix this, and since in the end a_2^⊥ is small, it has a relatively large error in the direction of q_1 . This error is amplified when q_2 is computed by normalizing a_2^⊥ .
In the MGS algorithm, we find that

    a_2^⊥ := a_2 − (q_0^H a_2)q_0 ,

after which

    a_2^⊥ := a_2^⊥ − (q_1^H a_2^⊥)q_1 = [a_2 − (q_0^H a_2)q_0 ] − (q_1^H [a_2 − (q_0^H a_2)q_0 ])q_1 .

This time, if a_2 − (q_0^H a_2)q_0 has an error in the direction of q_1 , this error is subtracted out when (q_1^H a_2^⊥)q_1 is subtracted from a_2^⊥ . This explains the better orthogonality between the computed vectors q_1 and q_2 .

Of course, we have only argued via an example that MGS is more accurate than CGS. A more thorough analysis is needed to explain why this is generally so. This is beyond the scope of this note.
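The example above can be reproduced in double precision. The sketch below (pure Python, not the notes' MATLAB/FLAME code; the helper names are made up) runs both CGS and MGS on the matrix of Eqn. (4.2) with ε = 2^-26, so that ε² equals the machine epsilon and √(1 + ε²) rounds to 1:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(cols, modified):
    """Orthonormalize the given columns; CGS if modified is False, MGS if True."""
    Q = []
    for a in cols:
        v = list(a)
        for q in Q:
            # MGS projects out of the updated vector; CGS out of the original
            rho = dot(q, v) if modified else dot(q, a)
            v = [vi - rho * qi for vi, qi in zip(v, q)]
        nrm = math.sqrt(dot(v, v))
        Q.append([vi / nrm for vi in v])
    return Q

eps = 2.0 ** -26            # sqrt of machine epsilon for doubles
A = [[1.0, eps, 0.0, 0.0],  # columns of the 4 x 3 matrix in Eqn. (4.2)
     [1.0, 0.0, eps, 0.0],
     [1.0, 0.0, 0.0, eps]]

Qc = gram_schmidt(A, modified=False)
Qm = gram_schmidt(A, modified=True)
print(abs(dot(Qc[1], Qc[2])))   # about 0.5: CGS loses orthogonality
print(abs(dot(Qm[1], Qm[2])))   # tiny: MGS keeps q_1 and q_2 orthogonal
```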

4.5 Cost

Let us examine the cost of computing the QR factorization of an m × n matrix A. We will count a multiply and an add each as one floating point operation.
We start by reviewing the cost, in floating point operations (flops), of various vector-vector and matrix-vector operations:

Name    Operation                             Approximate cost (in flops)

Vector-vector operations (x, y ∈ C^n , α ∈ C):
    Dot     α := x^H y                        2n
    Axpy    y := αx + y                       2n
    Scal    x := αx                           n
    Nrm2    α := ||x||_2                      2n

Matrix-vector operations (A ∈ C^{m×n} , α, β ∈ C, with x and y vectors of appropriate size):
    Matrix-vector multiplication (Gemv)
            y := αAx + βy                     2mn
            y := αA^H x + βy                  2mn
    Rank-1 update (Ger)
            A := αyx^H + A                    2mn

Now, consider the algorithms in Figure 4.5. Notice that the columns of A are of size m. During the kth iteration (0 ≤ k < n), A_0 has k columns and A_2 has n − k − 1 columns.

4.5.1 Cost of CGS

Operation                 Approximate cost (in flops)
r_01 := A_0^H a_1         2mk
a_1 := a_1 − A_0 r_01     2mk
ρ_11 := ||a_1||_2         2m
a_1 := a_1 /ρ_11          m

Thus, the total cost is (approximately)

    Σ_{k=0}^{n−1} [2mk + 2mk + 2m + m]
        = Σ_{k=0}^{n−1} [3m + 4mk]
        = 3mn + 4m Σ_{k=0}^{n−1} k
        ≈ 3mn + 4m (n²/2)        (since Σ_{k=0}^{n−1} k = n(n−1)/2 ≈ n²/2)
        = 3mn + 2mn²
        ≈ 2mn²                   (3mn is of lower order).

4.5.2 Cost of MGS

Operation                  Approximate cost (in flops)
ρ_11 := ||a_1||_2          2m
a_1 := a_1 /ρ_11           m
r_12^T := a_1^H A_2        2m(n − k − 1)
A_2 := A_2 − a_1 r_12^T    2m(n − k − 1)

Thus, the total cost is (approximately)

    Σ_{k=0}^{n−1} [2m + m + 2m(n − k − 1) + 2m(n − k − 1)]
        = Σ_{k=0}^{n−1} [3m + 4m(n − k − 1)]
        = 3mn + 4m Σ_{k=0}^{n−1} (n − k − 1)
        = 3mn + 4m Σ_{i=0}^{n−1} i       (change of variable: i = n − k − 1)
        ≈ 3mn + 4m (n²/2)                (since Σ_{i=0}^{n−1} i = n(n−1)/2 ≈ n²/2)
        = 3mn + 2mn²
        ≈ 2mn²                           (3mn is of lower order).

4.6 Wrapup

4.6.1 Additional exercises

Homework 4.6 Implement CGS unb var1 using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME. Try it!)
* SEE ANSWER
Homework 4.7 Implement MGS unb var1 using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME. Try it!)
* SEE ANSWER

4.6.2 Summary


Chapter 5
Notes on the FLAME APIs

Video
Read disclaimer regarding the videos in the preface!
No video.

5.0.1 Outline

Video ............................................... 103
    5.0.1 Outline ................................... 104
5.1 Motivation ...................................... 104
5.2 Install FLAME@lab ............................... 104
5.3 An Example: Gram-Schmidt Orthogonalization ...... 104
    5.3.1 The Spark Webpage ......................... 104
    5.3.2 Implementing CGS with FLAME@lab ........... 105
    5.3.3 Editing the code skeleton ................. 107
    5.3.4 Testing ................................... 108
5.4 Implementing the Other Algorithms ............... 109

5.1 Motivation

In the course so far, we have frequently used the FLAME notation to express linear algebra algorithms. In this note we show how to translate such algorithms into code, using various QR factorization algorithms as examples.

5.2 Install FLAME@lab

The API we will use we refer to as the FLAME@lab API, an API that targets the M-script language used by Matlab and Octave (an open source Matlab implementation). This API is very intuitive, and hence we will spend (almost) no time explaining it.
Download all files from http://www.cs.utexas.edu/users/flame/Notes/FLAMEatlab/ and place them in the same directory as the remaining files that you will create as part of the exercises in this document. (Unless you know how to set up paths in Matlab/Octave, in which case you can put them wherever you please, and set the path.)

5.3 An Example: Gram-Schmidt Orthogonalization

Let us start by considering the various Gram-Schmidt based QR factorization algorithms from Notes on Gram-Schmidt QR Factorization, typeset using the FLAME notation in Figure 5.2.

5.3.1 The Spark Webpage

We wish to typeset the code so that it closely resembles the algorithms in Figure 5.2. The FLAME notation itself uses white space to better convey the algorithms. We want to do the same for the codes that implement the algorithms. However, typesetting that code is somewhat bothersome because of the


[y^⊥ , r] = Proj_orthog_to_Q_CGS(Q, y)       (used by classical Gram-Schmidt)
    y^⊥ = y
    for i = 0, . . . , k − 1
        ρ_i := q_i^H y
        y^⊥ := y^⊥ − ρ_i q_i
    endfor

[y^⊥ , r] = Proj_orthog_to_Q_MGS(Q, y)       (used by modified Gram-Schmidt)
    y^⊥ = y
    for i = 0, . . . , k − 1
        ρ_i := q_i^H y^⊥
        y^⊥ := y^⊥ − ρ_i q_i
    endfor

Figure 5.1: Two different ways of computing y^⊥ = (I − QQ^H)y, the component of y orthogonal to C(Q), where Q has k orthonormal columns.
careful spacing that is required. For this reason, we created a webpage that creates a code skeleton. We call this page the Spark page:
http://www.cs.utexas.edu/users/flame/Spark/.
When you open the link, you will get a page that looks something like the picture in Figure 5.3.

5.3.2 Implementing CGS with FLAME@lab

We will focus on the Classical Gram-Schmidt algorithm on the left, which we show by itself in Figure 5.4 (left). To its right, we show how the menu on the left side of the Spark webpage needs to be filled out.
Some comments:

Name: Choose a name that describes the algorithm/operation being implemented.
Type of function: Later you will learn about blocked algorithms. For now, we implement unblocked algorithms.
Variant number: Notice that there are a number of algorithmic variants for implementing the Gram-Schmidt algorithm. We choose to call the first one Variant 1.
Number of operands: This routine requires two operands: one each for matrices A and R. (A will be overwritten by the matrix Q.)
Operand 1: We indicate that A is a matrix through which we march from left to right (L->R) and it is both input and output.
Operand 2: We indicate that R is a matrix through which we march from top-left to bottom-right (TL->BR) and it is both input and output. Our API requires you to pass in the array in which to put an output, so an appropriately sized R must be passed in.
Pick an output language: A number of different representations are supported, including APIs for M-script (FLAME@lab), C (FLAMEC), LaTeX (FLaTeX), and Python (FlamePy). Pick FLAME@lab.


Algorithm: [A, R] := Gram-Schmidt(A) (overwrites A with Q)

Partition A → ( A_L | A_R ), R → ( R_TL  R_TR ; 0  R_BR )
    where A_L has 0 columns and R_TL is 0 × 0
while n(A_L) ≠ n(A) do
    Repartition
        ( A_L | A_R ) → ( A0 | a1 | A2 ),
        ( R_TL  R_TR ; 0  R_BR ) → ( R00  r01  R02 ; 0  ρ11  r12^T ; 0  0  R22 )
        where a1 and q1 are columns and ρ11 is a scalar

    CGS:
        r01 := A0^H a1
        a1 := a1 − A0 r01
        ρ11 := ‖a1‖2
        a1 := a1/ρ11

    MGS:
        [a1, r01] = Proj_orthog_to_Q_MGS(A0, a1)
        ρ11 := ‖a1‖2
        a1 := a1/ρ11

    MGS (alternative):
        ρ11 := ‖a1‖2
        a1 := a1/ρ11
        r12^T := a1^H A2
        A2 := A2 − a1 r12^T

    Continue with
        ( A_L | A_R ) ← ( A0 | a1 | A2 ),
        ( R_TL  R_TR ; 0  R_BR ) ← ( R00  r01  R02 ; 0  ρ11  r12^T ; 0  0  R22 )
endwhile

Figure 5.2: Left: Classical Gram-Schmidt algorithm. Middle: Modified Gram-Schmidt algorithm. Right: Modified Gram-Schmidt algorithm where, every time a new column of Q, q1, is computed, the components of all future columns in the direction of this new vector are subtracted out.

To the right of the menu, you now find what we call a code skeleton for the implementation, as shown in Figure 5.5. In Figure 5.6 we show the algorithm and generated code skeleton side-by-side.


Figure 5.3: The Spark webpage.

5.3.3 Editing the code skeleton

At this point, one should copy the code skeleton into one's favorite text editor. (We highly recommend emacs for the serious programmer.) Once this is done, there are two things left to do:
Fix the code skeleton: The Spark webpage guesses the code skeleton. One detail that it sometimes gets wrong is the stopping criterion. In this case, the algorithm should stay in the loop as long as n(A_L) ≠ n(A) (the width of A_L is not yet the width of A). In our example, the Spark webpage guessed that the column size of matrix A is to be used for the stopping criterion:

    while ( size( AL, 2 ) < size( A, 2 ) )

which happens to be correct. (When you implement the Householder QR factorization, you may not be so lucky...)
The update statements: The Spark webpage can't guess what the actual updates to the various parts of matrices A and R should be. It fills in

    %
    %    update line 1
    %        :
    %    update line n
    %

Thus, one has to manually translate

    r01 := A0^H a1
    a1 := a1 − A0 r01
    ρ11 := ‖a1‖2
    a1 := a1/ρ11

into appropriate M-script code:

    r01 = A0' * a1;
    a1 = a1 - A0 * r01;
    rho11 = norm( a1 );
    a1 = a1 / rho11;
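For readers who want to experiment outside of MATLAB/Octave, the same updates can be sketched in plain Python. This is an illustrative translation, not part of the FLAME@lab API: the function name cgs and the list-of-rows matrix representation are our own choices.

```python
# A pure-Python sketch of unblocked Classical Gram-Schmidt (Variant 1).
# Column j of Q plays the role of a1, columns 0, ..., j-1 play the role of A0,
# and R[0:j][j], R[j][j] hold r01 and rho11 of the update statements above.

def cgs(A):
    """Factor A = Q R, Q with orthonormal columns; returns (Q, R)."""
    m, n = len(A), len(A[0])
    Q = [row[:] for row in A]                    # Q starts as a copy of A
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # r01 := A0^H a1 (projections onto the already-computed columns)
        for i in range(j):
            R[i][j] = sum(Q[k][i] * Q[k][j] for k in range(m))
        # a1 := a1 - A0 r01
        for i in range(j):
            for k in range(m):
                Q[k][j] -= Q[k][i] * R[i][j]
        # rho11 := ||a1||_2 ; a1 := a1 / rho11
        R[j][j] = sum(Q[k][j] ** 2 for k in range(m)) ** 0.5
        for k in range(m):
            Q[k][j] /= R[j][j]
    return Q, R
```

For a small matrix such as A = [[1, 1], [1, 0], [0, 1]], multiplying the computed Q and R reproduces A, and the columns of Q are orthonormal.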


Algorithm: [A, R] := Gram-Schmidt(A) (overwrites A with Q)

Partition A → ( A_L | A_R ), R → ( R_TL  R_TR ; 0  R_BR )
    where A_L has 0 columns and R_TL is 0 × 0
while n(A_L) ≠ n(A) do
    Repartition
        ( A_L | A_R ) → ( A0 | a1 | A2 ),
        ( R_TL  R_TR ; 0  R_BR ) → ( R00  r01  R02 ; 0  ρ11  r12^T ; 0  0  R22 )

    r01 := A0^H a1
    a1 := a1 − A0 r01
    ρ11 := ‖a1‖2
    a1 := a1/ρ11

    Continue with
        ( A_L | A_R ) ← ( A0 | a1 | A2 ),
        ( R_TL  R_TR ; 0  R_BR ) ← ( R00  r01  R02 ; 0  ρ11  r12^T ; 0  0  R22 )
endwhile

[Right panel of the figure: the generated FLAME@lab code skeleton.]

Figure 5.4: Left: Classical Gram-Schmidt algorithm. Right: Generated code-skeleton for CGS.
(Notice: if one forgets the ";", the results of the assignments will be printed by Matlab/Octave when the code is executed.)

At this point, one saves the resulting code in the file CGS_unb_var1.m. The .m ending is important, since the name of the file is used to find the routine when using Matlab/Octave.

5.3.4 Testing

To now test the routine, one starts Octave and, for example, executes the commands

    > A = rand( 5, 4 )
    > R = zeros( 4, 4 )
    > [ Q, R ] = CGS_unb_var1( A, R )
    > A - Q * triu( R )

The result should be (approximately) a 5 × 4 zero matrix.


Figure 5.5: The Spark webpage filled out for CGS Variant 1.
(The first time you execute the above, you may get a bunch of warnings from Octave. Just ignore
those.)

5.4 Implementing the Other Algorithms

Next, we leave it to the reader to implement

• The Modified Gram-Schmidt algorithm (MGS_unb_var1, corresponding to the right-most algorithm in Figure 5.2).

• The Householder QR factorization algorithm and the algorithm to form Q from "Notes on Householder QR Factorization". The routine for computing a Householder transformation (similar to Figure 5.1) can be found at

    http://www.cs.utexas.edu/users/flame/Notes/FLAMEatlab/Housev.m

  That routine implements the algorithm on the left in Figure 5.1. Try and see what happens if you replace it with the algorithm to its right.

Note: For the Householder QR factorization and the form-Q algorithm, how to start the algorithm when the matrix is not square is a bit tricky. Thus, you may assume that the matrix is square.


[Figure content: the Classical Gram-Schmidt algorithm of Figure 5.4, shown side-by-side with the generated FLAME@lab code skeleton.]

Figure 5.6: Left: Classical Gram-Schmidt algorithm. Right: Generated code-skeleton for CGS.

Chapter 6

Notes on Householder QR Factorization

Video
Read disclaimer regarding the videos in the preface! This is the video recorded in Fall 2014.
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

6.1 Opening Remarks

6.1.1 Launch

A fundamental problem to avoid in numerical codes is the situation where one starts with large values and ends up with small values with large relative errors in them. This is known as catastrophic cancellation. The Gram-Schmidt algorithms can inherently fall victim to this: column a_j is successively reduced in length as components in the directions of {q0, . . . , q_{j−1}} are subtracted, leaving a small vector if a_j was almost in the span of the first j columns of A. Application of a unitary transformation to a matrix or vector inherently preserves length. Thus, it would be beneficial if the QR factorization could be implemented as the successive application of unitary transformations. The Householder QR factorization accomplishes this.
The first fundamental insight is that the product of unitary matrices is itself unitary. If, given A ∈ C^(m×n) (with m ≥ n), one could find a sequence of unitary matrices, {H0, . . . , Hn−1}, such that

    Hn−1 · · · H0 A = ( R ; 0 )

(R stacked on top of a zero block), where R ∈ C^(n×n) is upper triangular, then

    A = H0^H · · · Hn−1^H ( R ; 0 ) = Q ( R ; 0 ) = ( Q_L | Q_R ) ( R ; 0 ) = Q_L R,

where Q = H0^H · · · Hn−1^H and Q_L equals the first n columns of Q. Then A = Q_L R is the QR factorization of A. The second fundamental insight will be that the desired unitary transformations {H0, . . . , Hn−1} can be computed and applied cheaply.

6.1.2 Outline

6.1 Opening Remarks . . . . . 111
    6.1.1 Launch . . . . . 111
    6.1.2 Outline . . . . . 113
    6.1.3 What you will learn . . . . . 114
6.2 Householder Transformations (Reflectors) . . . . . 115
    6.2.1 The general case . . . . . 115
    6.2.2 As implemented for the Householder QR factorization (real case) . . . . . 116
    6.2.3 The complex case (optional) . . . . . 117
    6.2.4 A routine for computing the Householder vector . . . . . 118
6.3 Householder QR Factorization . . . . . 119
6.4 Forming Q . . . . . 122
6.5 Applying Q^H . . . . . 127
6.6 Blocked Householder QR Factorization . . . . . 129
    6.6.1 The UT transform: Accumulating Householder transformations . . . . . 129
    6.6.2 The WY transform . . . . . 131
    6.6.3 A blocked algorithm . . . . . 131
    6.6.4 Variations on a theme . . . . . 135
6.7 Enrichments . . . . . 138
6.8 Wrapup . . . . . 138
    6.8.1 Additional exercises . . . . . 138
    6.8.2 Summary . . . . . 142

6.1.3 What you will learn

Figure 6.1: Left: Illustration that shows how, given vector x = z + u^H x u and unit length vector u, the subspace orthogonal to u becomes a mirror for reflecting x, represented by the transformation (I − 2uu^H). Right: Illustration that shows how to compute u given vectors x and y with ‖x‖2 = ‖y‖2, using v = x − y.

6.2 Householder Transformations (Reflectors)

6.2.1 The general case

In this section we discuss Householder transformations, also referred to as reflectors.

Definition 6.1 Let u ∈ C^n be a vector of unit length (‖u‖2 = 1). Then H = I − 2uu^H is said to be a reflector or Householder transformation.

We observe:

• Any vector z that is perpendicular to u is left unchanged:

    (I − 2uu^H)z = z − 2u(u^H z) = z.

• Any vector x can be written as x = z + u^H x u, where z is perpendicular to u and u^H x u is the component of x in the direction of u. Then, using u^H z = 0 and u^H u = 1,

    (I − 2uu^H)x = (I − 2uu^H)(z + u^H x u) = z + u^H x u − 2u(u^H z) − 2(u^H x)(u^H u)u
                 = z + u^H x u − 2 u^H x u = z − u^H x u.

This can be interpreted as follows: the space perpendicular to u acts as a mirror: any vector in that space (along the mirror) is not reflected, while any other vector has the component that is orthogonal to the space (the component outside, orthogonal to, the mirror) reversed in direction, as illustrated in Figure 6.1. Notice that a reflection preserves the length of the vector.


Homework 6.2 Show that if H is a reflector, then

• HH = I (reflecting the reflection of a vector results in the original vector);
• H = H^H;
• H^H H = I (a reflector is unitary).

* SEE ANSWER
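A quick numeric sanity check of these properties (an illustrative sketch, not part of the notes): build H = I − 2uu^T for a real unit vector u and verify that H is symmetric and that HH = I.

```python
# Build the reflector H = I - 2 u u^T for a unit-length real vector u
# and verify Homework 6.2 numerically for a small example.
def reflector(u):
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j]
             for j in range(n)] for i in range(n)]

# (0.6, 0.8) has unit length, so it qualifies as a Householder direction.
H = reflector([0.6, 0.8])
HH = [[sum(H[i][k] * H[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
# HH is (numerically) the 2x2 identity, and H equals its own transpose.
```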
Next, let us ask the question of how to reflect a given x ∈ C^n into another vector y ∈ C^n with ‖x‖2 = ‖y‖2. In other words, how do we compute a vector u so that (I − 2uu^H)x = y? From our discussion above, we need to find a vector u that is perpendicular to the space with respect to which we will reflect. From Figure 6.1 (right) we notice that the vector from y to x, v = x − y, is perpendicular to the desired space. Thus, u must equal a unit vector in the direction of v: u = v/‖v‖2.

Remark 6.3 In subsequent discussion we will prefer to give Householder transformations as I − uu^H/τ, where τ = u^H u/2, so that u no longer needs to be a unit vector, just a direction. The reason for this will become obvious later.

In the next subsection, we will need to find a Householder transformation H that maps a vector x to a multiple of the first unit basis vector (e0).

Let us first discuss how to find H in the case where x ∈ R^n. We seek v so that (I − (2/(v^T v)) vv^T)x = ±‖x‖2 e0. Since the resulting vector that we want is y = ±‖x‖2 e0, we must choose v = x − y = x ∓ ‖x‖2 e0.

Homework 6.4 Show that if x ∈ R^n, v = x ∓ ‖x‖2 e0, and τ = v^T v/2, then (I − (1/τ) vv^T)x = ±‖x‖2 e0.
* SEE ANSWER

In practice, we choose v = x + sign(χ1)‖x‖2 e0, where χ1 denotes the first element of x. The reason is as follows: the first element of v, ν1, will be ν1 = χ1 ∓ ‖x‖2. If χ1 is positive and ‖x‖2 is almost equal to χ1, then χ1 − ‖x‖2 is a small number, and if there is error in χ1 and/or ‖x‖2, this error becomes large relative to the result χ1 − ‖x‖2, due to catastrophic cancellation. Regardless of whether χ1 is positive or negative, we can avoid this by choosing v = x + sign(χ1)‖x‖2 e0.
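The effect of this choice can be seen directly in floating point arithmetic. The following sketch (our own example with made-up numbers, not from the notes) takes x nearly parallel to e0 with χ1 > 0; computing χ1 − ‖x‖2 then subtracts two nearly equal numbers and loses essentially all significant digits, while the recommended χ1 + ‖x‖2 is computed accurately.

```python
import math

# x = (chi1, 1e-8): almost a multiple of e0, with chi1 > 0 (made-up example).
chi1 = 1.0
x2 = 1e-8
norm_x = math.hypot(chi1, x2)   # ||x||_2; so close to 1 that it rounds to 1.0

bad = chi1 - norm_x             # exact value is about -5e-17, but the computed
                                # difference retains no correct significant digits
good = chi1 + norm_x            # nu1 for v = x + sign(chi1) ||x||_2 e0
```

In double precision, norm_x rounds to 1.0, so bad evaluates to 0.0 even though the exact difference is roughly −5·10^−17, while good is computed to full accuracy.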

6.2.2 As implemented for the Householder QR factorization (real case)

Next, we discuss a slight variant of the above discussion that is used in practice. To do so, we view x as a vector that consists of its first element, χ1, and the rest of the vector, x2. More precisely, partition

    x = ( χ1 ; x2 ),

where χ1 equals the first element of x and x2 is the rest of x. Then we wish to find a Householder vector u = ( 1 ; u2 ) so that

    ( I − (1/τ) ( 1 ; u2 ) ( 1 ; u2 )^H ) ( χ1 ; x2 ) = ( ∓‖x‖2 ; 0 ).

Notice that y in the previous discussion equals the vector ( ∓‖x‖2 ; 0 ), so the direction of u is given by

    v = ( χ1 ± ‖x‖2 ; x2 ).

We now wish to normalize this vector so that its first entry equals 1:

    u = v/ν1 = ( χ1 ± ‖x‖2 ; x2 ) / (χ1 ± ‖x‖2) = ( 1 ; x2/ν1 ),

where ν1 = χ1 ± ‖x‖2 equals the first element of v. (Note that if ν1 = 0 then u2 can be set to 0.)

6.2.3 The complex case (optional)

Next, let us work out the complex case, dealing explicitly with x as a vector that consists of its first element, χ1, and the rest of the vector, x2. More precisely, partition

    x = ( χ1 ; x2 ),

where χ1 equals the first element of x and x2 is the rest of x. Then we wish to find a Householder vector u = ( 1 ; u2 ) so that

    ( I − (1/τ) ( 1 ; u2 ) ( 1 ; u2 )^H ) ( χ1 ; x2 ) = ( ∓θ‖x‖2 ; 0 ).

Here θ denotes a complex scalar on the complex unit circle. By the same argument as before,

    v = ( χ1 ± θ‖x‖2 ; x2 ).

We now wish to normalize this vector so that its first entry equals 1:

    u = v/ν1 = ( χ1 ± θ‖x‖2 ; x2 ) / (χ1 ± θ‖x‖2) = ( 1 ; x2/ν1 ),

where ν1 = χ1 ± θ‖x‖2. (If ν1 = 0 then we set u2 to 0.)


Homework 6.5 Verify that

    ( I − (1/τ) ( 1 ; u2 ) ( 1 ; u2 )^H ) ( χ1 ; x2 ) = ( ρ ; 0 ),

where τ = u^H u/2 = (1 + u2^H u2)/2 and ρ = ∓θ‖x‖2.

Hint: ρ ρ̄ = |ρ|^2 = ‖x‖2^2 since H preserves the norm. Also, ‖x‖2^2 = |χ1|^2 + ‖x2‖2^2 and √(z z̄) = |z|.

* SEE ANSWER

Again, the choice of θ is important. For the complex case we choose θ = sign(χ1) = χ1/|χ1|.

6.2.4 A routine for computing the Householder vector

We will refer to the vector ( 1 ; u2 ) as the Householder vector that reflects x into ∓θ‖x‖2 e0 and introduce the notation

    [ ( ρ ; u2 ), τ ] := Housev( ( χ1 ; x2 ) )

as the computation of the above mentioned vector u2, and scalars ρ and τ, from vector x. We will use the notation H(x) for the transformation I − (1/τ) uu^H where u and τ are computed by Housev(x).

The function

    function [ rho, u2, tau ] = Housev( chi1, x2 )

implements the function Housev. It can be found in

    Programming/chapter06/Housev.m (see file only) (view in MATLAB)

Homework 6.6 Function Housev.m implements the steps in Figure 6.2 (left). Update this implementation with the equivalent steps in Figure 6.2 (right), which is closer to how it is implemented in practice.
* SEE ANSWER

Algorithm: [ ( ρ ; u2 ), τ ] = Housev( ( χ1 ; x2 ) )

Simple formulation:                     Efficient computation:
    χ2 := ‖x2‖2                             χ2 := ‖x2‖2
    α := ‖( χ1 ; χ2 )‖2 (= ‖x‖2)            α := ‖( χ1 ; χ2 )‖2 (= ‖x‖2)
    ρ = −sign(χ1)‖x‖2                       ρ := −sign(χ1)α
    ν1 = χ1 + sign(χ1)‖x‖2                  ν1 := χ1 − ρ
    u2 = x2/ν1                              u2 := x2/ν1
                                            χ2 := χ2/|ν1| (= ‖u2‖2)
    τ = (1 + u2^H u2)/2                     τ := (1 + χ2^2)/2

Figure 6.2: Computing the Householder transformation. Left: simple formulation. Right: efficient computation. Note: I have not completely double-checked these formulas for the complex case. They work for the real case.
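The efficient formulation on the right can be sketched in Python for the real case. This is an illustrative implementation; the name housev, the list representation, and the handling of x = 0 are our own choices, not the notes' Housev.m.

```python
import math

# Housev for the real case, following the "efficient computation" column of
# Figure 6.2. Returns rho (the new first element, -sign(chi1)*||x||_2), u2
# (the tail of the Householder vector, whose implicit first element is 1),
# and tau = (1 + u2^T u2)/2.
def housev(chi1, x2):
    chi2 = math.sqrt(sum(e * e for e in x2))      # ||x2||_2
    alpha = math.hypot(chi1, chi2)                # ||x||_2
    if alpha == 0.0:
        return chi1, [0.0] * len(x2), 0.5         # degenerate x = 0 (assumption)
    sign = 1.0 if chi1 >= 0.0 else -1.0
    rho = -sign * alpha
    nu1 = chi1 - rho                              # = chi1 + sign(chi1)*||x||_2
    u2 = [e / nu1 for e in x2]
    chi2 = chi2 / abs(nu1)                        # = ||u2||_2
    tau = (1.0 + chi2 * chi2) / 2.0
    return rho, u2, tau
```

Applying H = I − uu^T/τ with u = (1 ; u2) to the original x then yields (ρ ; 0); for x = (3, 4, 0) one gets ρ = −5 and Hx = (−5, 0, 0).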

6.3 Householder QR Factorization

Let A be an m × n matrix with m ≥ n. We will now show how to compute A → QR, the QR factorization, as a sequence of Householder transformations applied to A, which eventually zeroes out all elements of that matrix below the diagonal. The process is illustrated in Figure 6.3.

In the first iteration, we partition

    A → ( α11  a12^T ; a21  A22 ).

Let

    [ ( ρ11 ; u21 ), τ1 ] = Housev( ( α11 ; a21 ) )

be the Householder transform computed from the first column of A. Then applying this Householder transform to A yields

    ( α11  a12^T ; a21  A22 ) := ( I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( α11  a12^T ; a21  A22 )
                               = ( ρ11  a12^T − w12^T ; 0  A22 − u21 w12^T ),

where w12^T = (a12^T + u21^H A22)/τ1. Computation of a full QR factorization of A will now proceed with the updated matrix A22.

Figure 6.3: Illustration of Householder QR factorization. In each iteration, the current matrix is partitioned as ( α11  a12^T ; a21  A22 ); then

    [ ( α11 ; u21 ), τ1 ] = Housev( ( α11 ; a21 ) )

is computed from the current column, the matrix is updated as

    ( α11  a12^T ; a21  A22 ) := ( α11  a12^T − w12^T ; 0  A22 − u21 w12^T ),

and the computation moves forward to the next column.


Now let us assume that after k iterations of the algorithm matrix A contains

    A → ( R_TL  R_TR ; 0  A_BR ) = ( R00  r01  R02 ; 0  α11  a12^T ; 0  a21  A22 ),

where R_TL and R00 are k × k upper triangular matrices. Let

    [ ( ρ11 ; u21 ), τ1 ] = Housev( ( α11 ; a21 ) )

and update

    A := ( I  0 ; 0  I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( R00  r01  R02 ; 0  α11  a12^T ; 0  a21  A22 )
       = ( I − (1/τ1) ( 0 ; 1 ; u21 ) ( 0 ; 1 ; u21 )^H ) ( R00  r01  R02 ; 0  α11  a12^T ; 0  a21  A22 )
       = ( R00  r01  R02 ; 0  ρ11  a12^T − w12^T ; 0  0  A22 − u21 w12^T ),

where again w12^T = (a12^T + u21^H A22)/τ1. Let

    Hk = I − (1/τ1) ( 0k ; 1 ; u21 ) ( 0k ; 1 ; u21 )^H

be the Householder transform so computed during the (k + 1)st iteration. Then upon completion matrix A contains

    R = ( R_TL ; 0 ) = Hn−1 · · · H1 H0 Â,

where Â denotes the original contents of A and R_TL is an upper triangular matrix. Rearranging this, we find that

    Â = H0 H1 · · · Hn−1 R,

which shows that if Q = H0 H1 · · · Hn−1, then Â = QR.

Homework 6.7 Show that

    ( I  0 ; 0  I − (1/τ1) ( 1 ; u2 ) ( 1 ; u2 )^H ) = I − (1/τ1) ( 0 ; 1 ; u2 ) ( 0 ; 1 ; u2 )^H.

* SEE ANSWER
Typically, the algorithm overwrites the original matrix A with the upper triangular matrix, and at each step u21 is stored over the elements that become zero, thus overwriting a21. (It is for this reason that the first element of u was normalized to equal 1.) In this case Q is usually not explicitly formed, as it can be stored as the separate Householder vectors below the diagonal of the overwritten matrix. The algorithm that overwrites A in this manner is given in Figure 6.4.

We will let

    [{U\R}, t] = HQR(A)

denote the operation that computes the QR factorization of m × n matrix A, with m ≥ n, via Householder transformations. It returns the Householder vectors and matrix R in the first argument and the vector of scalars τi that are computed as part of the Householder transformations in t.

Homework 6.8 Given A ∈ R^(m×n), show that the cost of the algorithm in Figure 6.4 is given by

    C_HQR(m, n) ≈ 2mn^2 − (2/3)n^3 flops.

* SEE ANSWER

Homework 6.9 Implement the algorithm in Figure 6.4 as

    function [ A_out, t_out ] = HQR_unb_var1( A, t )

Input is an m × n matrix A and vector t of size n. Output is the overwritten matrix A and the vector of scalars that define the Householder transformations. You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
* SEE ANSWER
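As a stepping stone toward Homework 6.9, here is a hedged, dense-matrix Python sketch of the factorization (not the FLAME@lab routine asked for): each reflector is formed as an explicit m × m matrix and accumulated into Q, which is wasteful but makes the mathematics easy to check.

```python
import math

# Dense Householder QR sketch: computes Q and R with A = Q R, R upper
# triangular. The Householder vector u for each column has a unit first
# element (cf. Housev); H = I - (1/tau) u u^T is built explicitly.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def householder_qr(A):
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]        # trailing part of column k
        alpha = math.sqrt(sum(e * e for e in x))  # ||x||_2
        if alpha == 0.0:
            continue                              # column already zero
        sign = 1.0 if x[0] >= 0.0 else -1.0
        nu1 = x[0] + sign * alpha
        u = [0.0] * k + [1.0] + [e / nu1 for e in x[1:]]
        tau = (1.0 + sum(e * e for e in u[k + 1:])) / 2.0
        H = [[(1.0 if i == j else 0.0) - u[i] * u[j] / tau
              for j in range(m)] for i in range(m)]
        R = matmul(H, R)                          # R = H_{n-1} ... H_0 A
        Q = matmul(Q, H)                          # Q = H_0 ... H_{n-1}
    return Q, R
```

Since each H is symmetric with H H = I, the accumulated Q satisfies Q R = A exactly in exact arithmetic, and R comes out (numerically) upper triangular.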

6.4 Forming Q

Given A ∈ C^(m×n), let [A, t] = HQR(A) yield the matrix A with the Householder vectors stored below the diagonal, R stored on and above the diagonal, and the τi stored in vector t. We now discuss how to form the first n columns of Q = H0 H1 · · · Hn−1. The computation is illustrated in Figure 6.5.

Notice that to pick out the first n columns we must form

    Q ( In×n ; 0 ) = H0 · · · Hn−1 ( In×n ; 0 ) = H0 · · · Hk−1 [ Hk · · · Hn−1 ( In×n ; 0 ) ] = H0 · · · Hk−1 Bk,

where Bk is defined as indicated.


Algorithm: [A, t] = HQR_UNB_VAR1(A, t)

Partition A → ( A_TL  A_TR ; A_BL  A_BR ), t → ( t_T ; t_B )
    where A_TL is 0 × 0 and t_T has 0 elements
while n(A_BR) ≠ 0 do
    Repartition
        ( A_TL  A_TR ; A_BL  A_BR ) → ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) → ( t0 ; τ1 ; t2 )
        where α11 and τ1 are scalars

    [ ( α11 ; a21 ), τ1 ] := Housev( ( α11 ; a21 ) )
    Update ( a12^T ; A22 ) := ( I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( a12^T ; A22 )
    via the steps
        w12^T := (a12^T + a21^H A22)/τ1
        ( a12^T ; A22 ) := ( a12^T − w12^T ; A22 − a21 w12^T )

    Continue with
        ( A_TL  A_TR ; A_BL  A_BR ) ← ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) ← ( t0 ; τ1 ; t2 )
endwhile

Figure 6.4: Unblocked Householder transformation based QR factorization.

Figure 6.5: Illustration of the computation of Q. The matrix is initialized to ( In×n ; 0 ) and the Householder transformations are applied in reverse order. In each (backward) step, the current ( α11  a12^T ; a21  A22 ) is updated via

    α11 := 1 − 1/τ1
    a12^T := −(u21^H A22)/τ1
    A22 := A22 + u21 a12^T
    a21 := −u21/τ1

after which the computation moves backward (up and to the left).


Lemma 6.10 Bk has the form

    Bk = Hk · · · Hn−1 ( In×n ; 0 ) = ( Ik×k  0 ; 0  B̃k ).

Proof: The proof of this is by induction on k:

Base case: k = n. Then Bn = ( In×n ; 0 ), which has the desired form.

Inductive step: Assume the result is true for Bk. We show it is true for Bk−1:

    Bk−1 = Hk−1 Hk · · · Hn−1 ( In×n ; 0 ) = Hk−1 Bk = Hk−1 ( Ik×k  0 ; 0  B̃k ).

Writing Hk−1 = I − (1/τk) ( 0 ; 1 ; uk ) ( 0 ; 1 ; uk )^H and partitioning Ik×k = ( I(k−1)×(k−1)  0 ; 0  1 ), we find

    Bk−1 = ( I(k−1)×(k−1)  0  0 ; 0  1  0 ; 0  0  B̃k ) − (1/τk) ( 0 ; 1 ; uk ) ( 0  1  uk^H B̃k )
         = ( I(k−1)×(k−1)  0  0 ; 0  1 − 1/τk  −yk^T ; 0  −uk/τk  B̃k − uk yk^T ),    where yk^T = uk^H B̃k/τk,
         = ( I(k−1)×(k−1)  0 ; 0  B̃k−1 ),    with B̃k−1 = ( 1 − 1/τk  −yk^T ; −uk/τk  B̃k − uk yk^T ).

By the Principle of Mathematical Induction, the result holds for B0, . . . , Bn.
Theorem 6.11 Given [A, t] = HQR(A, t) from Figure 6.4, the algorithm in Figure 6.6 overwrites A with the first n = n(A) columns of Q as defined by the Householder transformations stored below the diagonal of A and in the vector t.

Proof: The algorithm is justified by the proof of Lemma 6.10.

Homework 6.12 Implement the algorithm in Figure 6.6 as

    function A_out = FormQ_unb_var1( A, t )

Algorithm: [A] = FORMQ_UNB_VAR1(A, t)

Partition A → ( A_TL  A_TR ; A_BL  A_BR ), t → ( t_T ; t_B )
    where A_TL is n(A) × n(A) and t_T has n(A) elements
while n(A_TL) ≠ 0 do
    Repartition
        ( A_TL  A_TR ; A_BL  A_BR ) → ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) → ( t0 ; τ1 ; t2 )
        where α11 and τ1 are scalars

    Update ( α11  a12^T ; a21  A22 ) := ( I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( 1  0 ; 0  A22 )
    via the steps
        α11 := 1 − 1/τ1
        a12^T := −(a21^H A22)/τ1
        A22 := A22 + a21 a12^T
        a21 := −a21/τ1

    Continue with
        ( A_TL  A_TR ; A_BL  A_BR ) ← ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) ← ( t0 ; τ1 ; t2 )
endwhile

Figure 6.6: Algorithm for overwriting A with Q from the Householder transformations stored as Householder vectors below the diagonal of A (as produced by [A, t] = HQR_UNB_VAR1(A, t)).


You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
* SEE ANSWER

Homework 6.13 Given A ∈ C^(m×n), show that the cost of the algorithm in Figure 6.6 is given by

    C_FormQ(m, n) ≈ 2mn^2 − (2/3)n^3 flops.

* SEE ANSWER

Homework 6.14 If m = n, then Q could be accumulated by the sequence

    Q = ( · · · ((I H0) H1) · · · Hn−1 ).

Give a high-level reason why this would be (much) more expensive than the algorithm in Figure 6.6.
* SEE ANSWER

6.5 Applying Q^H

In a future Note, we will see that the QR factorization is used to solve the linear least-squares problem. To do so, we need to be able to compute ŷ = Q^H y where Q^H = Hn−1 · · · H0.

Let us start by computing H0 y:

    ( I − (1/τ1) ( 1 ; u2 ) ( 1 ; u2 )^H ) ( ψ1 ; y2 ) = ( ψ1 ; y2 ) − ω1 ( 1 ; u2 ) = ( ψ1 − ω1 ; y2 − ω1 u2 ),

where ω1 = (ψ1 + u2^H y2)/τ1. More generally, let us compute Hk y:

    ( I − (1/τ1) ( 0 ; 1 ; u2 ) ( 0 ; 1 ; u2 )^H ) ( y0 ; ψ1 ; y2 ) = ( y0 ; ψ1 − ω1 ; y2 − ω1 u2 ),

where ω1 = (ψ1 + u2^H y2)/τ1. This motivates the algorithm in Figure 6.7 for computing y := Hn−1 · · · H0 y given the output matrix A and vector t from routine HQR.

The cost of this algorithm can be analyzed as follows: when y_T is of length k, the bulk of the computation is in an inner product with vectors of length m − k (to compute ω1) and an axpy operation with vectors of length m − k to subsequently update ψ1 and y2. Thus, the cost is approximately given by

    Σ_{k=0}^{n−1} 4(m − k) ≈ 4mn − 2n^2.

Notice that this is much cheaper than forming Q and then multiplying.
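The approximation in the sum above can be checked directly. The following sketch (our own example, with made-up sizes) compares the exact flop sum against 4mn − 2n^2:

```python
# Exact flop count for applying Q^H: iteration k does one inner product and
# one axpy with vectors of length m - k, roughly 4(m - k) flops in total.
def apply_qh_flops(m, n):
    return sum(4 * (m - k) for k in range(n))

m, n = 1000, 100
exact = apply_qh_flops(m, n)        # closed form: 4*m*n - 2*n*(n - 1)
approx = 4 * m * n - 2 * n * n      # the approximation 4mn - 2n^2
```

For m = 1000, n = 100 the exact count is 380200 versus the approximation 380000, a relative difference under 0.1% — and both are far below the roughly 2mn^2 flops needed to form Q and then multiply.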


Algorithm: [y] = APPLYQT_UNB_VAR1(A, t, y)

Partition A → ( A_TL  A_TR ; A_BL  A_BR ), t → ( t_T ; t_B ), y → ( y_T ; y_B )
    where A_TL is 0 × 0 and t_T, y_T have 0 elements
while n(A_BR) ≠ 0 do
    Repartition
        ( A_TL  A_TR ; A_BL  A_BR ) → ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) → ( t0 ; τ1 ; t2 ), ( y_T ; y_B ) → ( y0 ; ψ1 ; y2 )
        where α11, τ1, and ψ1 are scalars

    Update ( ψ1 ; y2 ) := ( I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( ψ1 ; y2 )
    via the steps
        ω1 := (ψ1 + a21^H y2)/τ1
        ( ψ1 ; y2 ) := ( ψ1 − ω1 ; y2 − ω1 a21 )

    Continue with
        ( A_TL  A_TR ; A_BL  A_BR ) ← ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( t_T ; t_B ) ← ( t0 ; τ1 ; t2 ), ( y_T ; y_B ) ← ( y0 ; ψ1 ; y2 )
endwhile

Figure 6.7: Algorithm for computing y := Hn−1 · · · H0 y given the output from the algorithm HQR_UNB_VAR1.


6.6 Blocked Householder QR Factorization

6.6.1 The UT transform: Accumulating Householder transformations

Through a series of exercises, we will show how the application of a sequence of k Householder transformations can be accumulated. What we mean by this will become clear.

Homework 6.15 Consider u1 ∈ C^m with u1 ≠ 0 (the zero vector), U0 ∈ C^(m×k), and nonsingular T00 ∈ C^(k×k). Define τ1 = (u1^H u1)/2, so that

    H1 = I − (1/τ1) u1 u1^H

equals a Householder transformation, and let

    Q0 = I − U0 T00^(−1) U0^H.

Show that

    Q0 H1 = (I − U0 T00^(−1) U0^H)(I − (1/τ1) u1 u1^H) = I − ( U0 | u1 ) ( T00  t01 ; 0  τ1 )^(−1) ( U0 | u1 )^H,

where t01 = U0^H u1.
* SEE ANSWER

Homework 6.16 Consider ui ∈ C^m with ui ≠ 0 (the zero vector). Define τi = (ui^H ui)/2, so that

    Hi = I − (1/τi) ui ui^H

equals a Householder transformation, and let U = ( u0 | u1 | · · · | uk−1 ). Show that

    H0 H1 · · · Hk−1 = I − U T^(−1) U^H,

where T is an upper triangular matrix.
* SEE ANSWER

The above exercises can be summarized in the algorithm for computing T from U in Figure 6.8.

Homework 6.17 Implement the algorithm in Figure 6.8 as

    function T = FormT_unb_var1( U, t, T )

* SEE ANSWER

In [28] we call the transformation I − U T^(−1) U^H that equals the accumulated Householder transformations the UT transform and prove that T can instead be computed as

    T = triu( U^H U )

(the upper triangular part of U^H U) followed by either dividing the diagonal elements by two or setting them to τ0, . . . , τk−1 (in order). In that paper, we point out similar published results [10, 35, 46, 32].
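This result is easy to verify numerically for a small case. The sketch below (our own example, real arithmetic, k = 2) builds H0 H1 explicitly and compares it against I − U T^(−1) U^T with T = triu(U^T U) and its diagonal elements divided by two:

```python
# Verify the UT transform for two reflectors in R^3.
def reflector(u):
    # H = I - (1/tau) u u^T with tau = u^T u / 2, a Householder transformation.
    tau = sum(e * e for e in u) / 2.0
    return [[(1.0 if i == j else 0.0) - u[i] * u[j] / tau
             for j in range(len(u))] for i in range(len(u))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

u0, u1 = [1.0, 2.0, 2.0], [1.0, -1.0, 1.0]    # arbitrary example vectors
H0H1 = matmul(reflector(u0), reflector(u1))

# T = triu(U^T U) with the diagonal elements divided by two:
tau0 = sum(e * e for e in u0) / 2.0            # diagonal entries tau_i
tau1 = sum(e * e for e in u1) / 2.0
s = sum(a * b for a, b in zip(u0, u1))         # u0^T u1, the strictly upper entry
# Invert the 2x2 upper triangular T = [[tau0, s], [0, tau1]] directly:
Tinv = [[1.0 / tau0, -s / (tau0 * tau1)], [0.0, 1.0 / tau1]]
U = [[u0[i], u1[i]] for i in range(3)]         # U = (u0 u1)
UT = matmul(matmul(U, Tinv), [u0, u1])         # U T^{-1} U^T
B = [[(1.0 if i == j else 0.0) - UT[i][j] for j in range(3)] for i in range(3)]
# B agrees elementwise with H0H1.
```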


Algorithm: [T] := FORMT_UNB_VAR1(U, t, T)

Partition U → ( U_TL  U_TR ; U_BL  U_BR ), t → ( t_T ; t_B ), T → ( T_TL  T_TR ; T_BL  T_BR )
    where U_TL is 0 × 0, t_T has 0 rows, T_TL is 0 × 0
while m(U_TL) < m(U) do
    Repartition
        ( U_TL  U_TR ; U_BL  U_BR ) → ( U00  u01  U02 ; u10^T  υ11  u12^T ; U20  u21  U22 ),
        ( t_T ; t_B ) → ( t0 ; τ1 ; t2 ),
        ( T_TL  T_TR ; T_BL  T_BR ) → ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
        where υ11 is 1 × 1, τ1 has 1 row, τ11 is 1 × 1

    t01 := (u10^T)^H + U20^H u21
    τ11 := τ1

    Continue with
        ( U_TL  U_TR ; U_BL  U_BR ) ← ( U00  u01  U02 ; u10^T  υ11  u12^T ; U20  u21  U22 ),
        ( t_T ; t_B ) ← ( t0 ; τ1 ; t2 ),
        ( T_TL  T_TR ; T_BL  T_BR ) ← ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
endwhile

Figure 6.8: Algorithm that computes T from U and the vector t so that I − U T^(−1) U^H equals the UT transform. Here U is assumed to be the output of, for example, HouseQR_unb_var1. This means that it is a lower trapezoidal matrix, with ones on the diagonal.

6.6.2 The WY transform

An alternative way of expressing a Householder transform is

    I − β vv^T,

where β = 2/(v^T v) (= 1/τ, where τ is as discussed before). This leads to an alternative accumulation of Householder transforms known as the compact WY transform [35] (which is itself a refinement of the WY representation of products of Householder transformations [10]):

    I − U S U^H,

where upper triangular matrix S relates to the matrix T in the UT transform via S = T^(−1). Obviously, T can be computed first and then inverted via the insights in the next exercise. Alternatively, inversion of matrix T can be incorporated into the algorithm that computes T (which is what is done in the implementation in LAPACK), yielding the algorithm in Figure 6.10.

An algorithm that computes the inverse of an upper triangular matrix T based on the insights in the next exercise is given in Figure 6.9.
Homework 6.18 Assuming all inverses exist, show that

    ( T00  t01 ; 0  τ1 )^(−1) = ( T00^(−1)  −T00^(−1) t01/τ1 ; 0  1/τ1 ).

* SEE ANSWER
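This block formula can be spot-checked numerically. The sketch below (an illustrative example with made-up entries) assembles the inverse of a 3 × 3 upper triangular matrix from the inverse of its 2 × 2 leading block and verifies T T^(−1) = I:

```python
# Check the block inverse formula [[T00, t01], [0, tau1]]^{-1}
# = [[T00^{-1}, -T00^{-1} t01 / tau1], [0, 1/tau1]] for a small example.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

T00 = [[2.0, 1.0], [0.0, 4.0]]
T00inv = [[0.5, -0.125], [0.0, 0.25]]          # inverse of T00
t01, tau1 = [1.0, 2.0], 5.0

T = [[2.0, 1.0, 1.0], [0.0, 4.0, 2.0], [0.0, 0.0, 5.0]]
top = matmul(T00inv, [[t01[0]], [t01[1]]])     # T00^{-1} t01
Tinv = [[0.5, -0.125, -top[0][0] / tau1],
        [0.0, 0.25, -top[1][0] / tau1],
        [0.0, 0.0, 1.0 / tau1]]
P = matmul(T, Tinv)                            # should be the 3x3 identity
```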

6.6.3 A blocked algorithm

A QR factorization that exploits the insights that resulted in the UT transform can now be described. Partition

    A → ( A11  A12 ; A21  A22 ),

where A11 is b × b. We can use the unblocked algorithm in Figure 6.4 to factor the panel ( A11 ; A21 ):

    [ ( A11 ; A21 ), t1 ] := HOUSEQR_UNB_VAR1( ( A11 ; A21 ) ),

overwriting the entries below the diagonal with the Householder vectors ( U11 ; U21 ) (with the ones on the diagonal implicitly stored) and the upper triangular part with R11.

Algorithm: [T] := UT_TRINV_UNB_VAR1(T)

Partition T → ( T_TL  T_TR ; T_BL  T_BR )
    where T_TL is 0 × 0
while m(T_TL) < m(T) do
    Repartition
        ( T_TL  T_TR ; T_BL  T_BR ) → ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
        where τ11 is 1 × 1

    t01 := −T00 t01/τ11
    τ11 := 1/τ11

    Continue with
        ( T_TL  T_TR ; T_BL  T_BR ) ← ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
endwhile

Figure 6.9: Unblocked algorithm for inverting an upper triangular matrix. The algorithm assumes that T00 has already been inverted, and computes the next column of T in the current iteration.


Algorithm: [S] := FORMS_UNB_VAR1(U, t, S)

Partition U → ( U_TL  U_TR ; U_BL  U_BR ), t → ( t_T ; t_B ), S → ( S_TL  S_TR ; S_BL  S_BR )
    where U_TL is 0 × 0, t_T has 0 rows, S_TL is 0 × 0
while m(U_TL) < m(U) do
    Repartition
        ( U_TL  U_TR ; U_BL  U_BR ) → ( U00  u01  U02 ; u10^T  υ11  u12^T ; U20  u21  U22 ),
        ( t_T ; t_B ) → ( t0 ; τ1 ; t2 ),
        ( S_TL  S_TR ; S_BL  S_BR ) → ( S00  s01  S02 ; s10^T  σ11  s12^T ; S20  s21  S22 )
        where υ11 is 1 × 1, τ1 has 1 row, σ11 is 1 × 1

    σ11 := 1/τ1
    s01 := −σ11 S00 ((u10^T)^H + U20^H u21)

    Continue with
        ( U_TL  U_TR ; U_BL  U_BR ) ← ( U00  u01  U02 ; u10^T  υ11  u12^T ; U20  u21  U22 ),
        ( t_T ; t_B ) ← ( t0 ; τ1 ; t2 ),
        ( S_TL  S_TR ; S_BL  S_BR ) ← ( S00  s01  S02 ; s10^T  σ11  s12^T ; S20  s21  S22 )
endwhile

Figure 6.10: Algorithm that computes S from U and the vector t so that I − U S U^H equals the compact WY transform. Here U is assumed to be the output of, for example, HouseQR_unb_var1. This means that it is a lower trapezoidal matrix, with ones on the diagonal.


Algorithm: [A, t] := HOUSEQR_BLK_VAR1(A, t)

Partition A → ( A_TL  A_TR ; A_BL  A_BR ), t → ( t_T ; t_B )
    where A_TL is 0 × 0, t_T has 0 rows
while m(A_TL) < m(A) do
    Determine block size b
    Repartition
        ( A_TL  A_TR ; A_BL  A_BR ) → ( A00  A01  A02 ; A10  A11  A12 ; A20  A21  A22 ),
        ( t_T ; t_B ) → ( t0 ; t1 ; t2 )
        where A11 is b × b, t1 has b rows

    [ ( A11 ; A21 ), t1 ] := HQR( ( A11 ; A21 ), t1 )
    T11 := FORMT( ( A11 ; A21 ), t1 )
    W12 := T11^(−H) (U11^H A12 + U21^H A22)
    ( A12 ; A22 ) := ( A12 − U11 W12 ; A22 − U21 W12 )

    Continue with
        ( A_TL  A_TR ; A_BL  A_BR ) ← ( A00  A01  A02 ; A10  A11  A12 ; A20  A21  A22 ),
        ( t_T ; t_B ) ← ( t0 ; t1 ; t2 )
endwhile

Figure 6.11: Blocked Householder transformation based QR factorization.
Form T11 from the Householder vectors using the procedure described in Section 6.6.1:

    T11 := FORMT( ( A11 ; A21 ), t1 )


Now we need to also apply the Householder transformations to the rest of the columns:

    ( A12 ; A22 ) := ( I − ( U11 ; U21 ) T11^(−1) ( U11 ; U21 )^H )^H ( A12 ; A22 )
                   = ( A12 − U11 W12 ; A22 − U21 W12 ),

where

    W12 = T11^(−H) (U11^H A12 + U21^H A22).

This motivates the blocked algorithm in Figure 6.11.

6.6.4 Variations on a theme

Merging the unblocked Householder QR factorization and the formation of T

There are many possible algorithms for computing the QR factorization. For example, the unblocked algorithm from Figure 6.4 can be merged with the unblocked algorithm for forming T in Figure 6.8 to yield the algorithm in Figure 6.12.

An alternative unblocked merged algorithm

Let us now again compute the QR factorization of A simultaneously with the forming of T, but now taking advantage of the fact that T is partially computed to change the algorithm into what some would consider a left-looking algorithm.

Partition

    A → ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 )  and  T → ( T00  t01  T02 ; 0  τ11  t12^T ; 0  0  T22 ).

Assume that ( A00 ; a10^T ; A20 ) has been factored and overwritten with ( U00 ; u10^T ; U20 ) and R00, while also computing T00. In the next step, we need to apply the previous Householder transformations to the next column of A and then update that column with the next column of R and U. In addition, the next column of T must be computed. This means:


Algorithm: [A, T] := HOUSEQR_AND_FORMT_UNB_VAR1(A, T)

Partition A → ( A_TL  A_TR ; A_BL  A_BR ), T → ( T_TL  T_TR ; T_BL  T_BR )
    where A_TL is 0 × 0, T_TL is 0 × 0
while m(A_TL) < m(A) do
    Repartition
        ( A_TL  A_TR ; A_BL  A_BR ) → ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( T_TL  T_TR ; T_BL  T_BR ) → ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
        where α11 is 1 × 1, τ11 is 1 × 1

    [ ( α11 ; a21 ), τ11 ] := Housev( ( α11 ; a21 ) )
    Update ( a12^T ; A22 ) := ( I − (1/τ11) ( 1 ; u21 ) ( 1 ; u21 )^H ) ( a12^T ; A22 )
    via the steps
        w12^T := (a12^T + a21^H A22)/τ11
        ( a12^T ; A22 ) := ( a12^T − w12^T ; A22 − a21 w12^T )
    t01 := (a10^T)^H + A20^H a21

    Continue with
        ( A_TL  A_TR ; A_BL  A_BR ) ← ( A00  a01  A02 ; a10^T  α11  a12^T ; A20  a21  A22 ),
        ( T_TL  T_TR ; T_BL  T_BR ) ← ( T00  t01  T02 ; t10^T  τ11  t12^T ; T20  t21  T22 )
endwhile

Figure 6.12: Unblocked Householder transformation based QR factorization merged with the computation of T for the UT transform.


Algorithm: [A, T] := HouseQR_and_FormT_unb_var2(A, T)

[Algorithm display garbled in extraction. Each iteration first applies the previous transformations to the next column via \( w_{01} := T_{00}^{-H} \left( U_{00}^H a_{01} + (u_{10}^T)^H \alpha_{11} + U_{20}^H a_{21} \right) \) and \( \begin{pmatrix} a_{01} \\ \alpha_{11} \\ a_{21} \end{pmatrix} := \begin{pmatrix} a_{01} - U_{00} w_{01} \\ \alpha_{11} - u_{10}^T w_{01} \\ a_{21} - U_{20} w_{01} \end{pmatrix} \), then computes \( \left[ \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}, \tau_{11} \right] := \mbox{Housev}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \) and \( t_{01} := (a_{10}^T)^H + A_{20}^H a_{21} \).]

Figure 6.13: Alternative unblocked Householder transformation based QR factorization merged with the computation of T for the UT transform.


Update
\[
\begin{pmatrix} a_{01} \\ \alpha_{11} \\ a_{21} \end{pmatrix}
:=
\left( I - \begin{pmatrix} U_{00} \\ u_{10}^T \\ U_{20} \end{pmatrix} T_{00}^{-1} \begin{pmatrix} U_{00} \\ u_{10}^T \\ U_{20} \end{pmatrix}^H \right)^H
\begin{pmatrix} a_{01} \\ \alpha_{11} \\ a_{21} \end{pmatrix}.
\]
Compute the next Householder transform:
\[
\left[ \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}, \tau_{11} \right] := \mbox{Housev}\left( \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right).
\]
Compute the rest of the next column of T:
\[
t_{01} := (a_{10}^T)^H + A_{20}^H a_{21}.
\]
This yields the algorithm in Figure 6.13.
Alternative blocked algorithm (Variant 2)

An alternative blocked variant, which uses either of the unblocked factorization routines that merge the formation of T, is given in Figure 6.14.

6.7 Enrichments

6.8 Wrapup

6.8.1 Additional exercises

Homework 6.19 In Section 4.4 we discuss how MGS yields a higher quality solution than does CGS in terms of the orthogonality of the computed columns. The classic example, already mentioned in Chapter 4, that illustrates this is
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix},
\]
where \( \epsilon = \sqrt{\epsilon_{\rm mach}} \). In this exercise, you will compare and contrast the quality of the computed matrix Q for CGS, MGS, and Householder QR factorization.
Start with the matrix

format long    % print out 16 digits
eps = 1.0e-8   % roughly the square root of the machine epsilon
A = [

Algorithm: [A, t] := HQR_blk_var1(A, t)

[Algorithm display garbled in extraction. Each iteration factors the current panel via \( \left[ \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, t_1, T_{11} \right] := \mbox{HouseQR\_FormT\_unb\_varx}\left( \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, t_1 \right) \), computes \( W_{12} := T_{11}^{-H} ( U_{11}^H A_{12} + U_{21}^H A_{22} ) \), and updates \( \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} := \begin{pmatrix} A_{12} - U_{11} W_{12} \\ A_{22} - U_{21} W_{12} \end{pmatrix} \).]

Figure 6.14: Alternative blocked Householder transformation based QR factorization.


1   1   1
eps 0   0
0   eps 0
0   0   eps ]
With the various routines you implemented for Chapter 4 and the current chapter, compute

[ Q_CGS, R_CGS ] = CGS_unb_var1( A )
[ Q_MGS, R_MGS ] = MGS_unb_var1( A )
[ A_HQR, t_HQR ] = HQR_unb_var1( A )
Q_HQR = FormQ_unb_var1( A_HQR, t_HQR )

Finally, check whether the columns of the various computed matrices Q are mutually orthonormal:

Q_CGS' * Q_CGS
Q_MGS' * Q_MGS
Q_HQR' * Q_HQR

What do you notice? When we discuss numerical stability of algorithms we will gain insight into why HQR produces high quality mutually orthogonal columns.
Check how well QR approximates A:

A - Q_CGS * triu( R_CGS )
A - Q_MGS * triu( R_MGS )
A - Q_HQR * triu( A_HQR( 1:3, 1:3 ) )

What you will notice is that all three approximate A well.
Later, we will see how the QR factorization can be used to solve Ax = b and linear least-squares problems. At that time we will examine how accurate the solution is, depending on which method for QR factorization is used to compute Q and R.
* SEE ANSWER

Homework 6.20 Consider the matrix \( \begin{pmatrix} A \\ B \end{pmatrix} \), where A has linearly independent columns. Let

• \( A = Q_A R_A \) be the QR factorization of A,
• \( \begin{pmatrix} R_A \\ B \end{pmatrix} = Q_B R_B \) be the QR factorization of \( \begin{pmatrix} R_A \\ B \end{pmatrix} \), and
• \( \begin{pmatrix} A \\ B \end{pmatrix} = Q R \) be the QR factorization of \( \begin{pmatrix} A \\ B \end{pmatrix} \).


Algorithm: [R, B, t] := HQR_update_unb_var1(R, B, t)

[Algorithm outline garbled in extraction: the loop repartitions R, B, and t; the update steps are to be filled in as part of Homework 6.21.]

Figure 6.15: Outline for algorithm that computes the QR factorization of a triangular matrix, R, appended with a matrix B. Upon completion, the Householder vectors are stored in B and the scalars associated with the Householder transformations in t.


Assume that the diagonal entries of RA , RB , and R are all positive. Show that R = RB .
* SEE ANSWER

Homework 6.21 Consider the matrix \( \begin{pmatrix} R \\ B \end{pmatrix} \), where R is an upper triangular matrix. Propose a modification of HQR_unb_var1 that overwrites R and B with the Householder vectors and the updated matrix R. Importantly, the algorithm should take advantage of the zeroes in R (in other words, it should avoid computing with the zeroes below its diagonal). An outline for the algorithm is given in Figure 6.15.
* SEE ANSWER

Homework 6.22 Implement the algorithm from Homework 6.21.


* SEE ANSWER

6.8.2 Summary

Chapter 7
Notes on Rank Revealing Householder QR Factorization

7.1 Opening Remarks

7.1.1 Launch

It may be good at this point to skip forward to Section 12.4.1 and learn about permutation matrices.
Given \( A \in \mathbb{C}^{m \times n} \), with \( m \geq n \), the reduced QR factorization \( A = QR \) yields \( Q \in \mathbb{C}^{m \times n} \), the columns of which form an orthonormal basis for the column space of A. If the matrix has rank r with r < n, things get a bit more complicated. Let P be a permutation matrix so that the first r columns of \( A P^T \) are linearly independent. Specifically, if we partition
\[
A P^T \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix},
\]
where \( A_L \) has r columns, we can find an orthonormal basis for the column space of \( A_L \) by computing its QR factorization \( A_L = Q_L R_{TL} \), after which
\[
A P^T = Q_L \begin{pmatrix} R_{TL} & R_{TR} \end{pmatrix},
\]
where \( R_{TR} = Q_L^H A_R \).
Now, if A is merely almost of rank r (in other words, A = B + E where B has rank r and \( \| E \|_2 \) is small), then
\[
A P^T = \begin{pmatrix} A_L & A_R \end{pmatrix} = \begin{pmatrix} Q_L & Q_R \end{pmatrix} \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \approx Q_L \begin{pmatrix} R_{TL} & R_{TR} \end{pmatrix},
\]
where \( \| R_{BR} \|_2 \) is small. This is known as the rank revealing QR factorization (RRQR) and it yields a rank r approximation to A given by
\[
A P^T \approx Q_L \begin{pmatrix} R_{TL} & R_{TR} \end{pmatrix}.
\]
The magnitudes of the diagonal elements of R are typically chosen to be in decreasing order and some criterion is used to declare the matrix of rank r, at which point \( R_{BR} \) can be set to zero. Computing the RRQR is commonly called the QR factorization with column pivoting (QRP), for reasons that will become apparent.

Chapter 7. Notes on Rank Revealing Householder QR Factorization

7.1.2 Outline

7.1 Opening Remarks . . . 143
  7.1.1 Launch . . . 143
  7.1.2 Outline . . . 144
  7.1.3 What you will learn . . . 145
7.2 Modifying MGS to Compute QR Factorization with Column Pivoting . . . 146
7.3 Unblocked Householder QR Factorization with Column Pivoting . . . 147
  7.3.1 Basic algorithm . . . 147
  7.3.2 Alternative unblocked Householder QR factorization with column pivoting . . . 147
7.4 Blocked HQRP . . . 151
7.5 Computing Q . . . 151
7.6 Enrichments . . . 151
  7.6.1 QR factorization with randomization for column pivoting . . . 151
7.7 Wrapup . . . 154
  7.7.1 Additional exercises . . . 154
  7.7.2 Summary . . . 154

7.1.3 What you will learn

7.2 Modifying MGS to Compute QR Factorization with Column Pivoting

We are going to give a modification of the MGS algorithm for computing QRP and then explain why it works. Let us call this the MGS algorithm with column pivoting (MGSP). In the discussion below, p is a vector of integers that will indicate how columns must be swapped in the course of the computation.
Partition
\[
A \rightarrow \begin{pmatrix} a_1 & A_2 \end{pmatrix}, \quad
Q \rightarrow \begin{pmatrix} q_1 & Q_2 \end{pmatrix}, \quad
R \rightarrow \begin{pmatrix} \rho_{11} & r_{12}^T \\ 0 & R_{22} \end{pmatrix}, \quad
p \rightarrow \begin{pmatrix} \pi_1 \\ p_2 \end{pmatrix}.
\]

• Determine the index \( \pi_1 \) of the column of \( \begin{pmatrix} a_1 & A_2 \end{pmatrix} \) that is longest.
• Permute \( \begin{pmatrix} a_1 & A_2 \end{pmatrix} := \begin{pmatrix} a_1 & A_2 \end{pmatrix} P(\pi_1)^T \), swapping column \( a_1 \) with the column that is longest.
• Compute \( \rho_{11} := \| a_1 \|_2 \).
• \( q_1 := a_1 / \rho_{11} \).
• Compute \( r_{12}^T := q_1^T A_2 \).
• Update \( A_2 := A_2 - q_1 r_{12}^T \).
• Continue the process with the updated matrix \( A_2 \).

The elements on the diagonal of R will be in non-increasing order (and positive) because updating \( A_2 := A_2 - q_1 r_{12}^T \) inherently does not increase the length of the columns of \( A_2 \). After all, the component in the direction of \( q_1 \) is being subtracted from each column of \( A_2 \), leaving the component orthogonal to \( q_1 \).


ka0 k22
0


ka k2


1 2 T
1
Homework 7.1 Let A = a0 a1 an1 , v = . =
, q q = 1 (of same size
..
..


2
n1
kan1 k2




1

T
T
as the columns of A), and r = A q = . . Compute B := A qr with B = b0 b1 bn1 .
..

n1
Then

kb0 k22
0 20

kb k2 2
1 2
1

=
.
.
.
..
..

2
2
kbn1 k2
n1 n1
* SEE ANSWER
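The identity is easy to confirm numerically (an illustrative NumPy check, not the requested proof):

```python
import numpy as np

# Verify that the squared column norms of B = A - q r^T satisfy
# ||b_i||^2 = ||a_i||^2 - rho_i^2 when q has unit 2-norm and r = A^T q.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
q = rng.standard_normal(5)
q /= np.linalg.norm(q)           # q^T q = 1
r = A.T @ q                      # rho_i = a_i^T q
B = A - np.outer(q, r)           # b_i = a_i - rho_i q
v = np.sum(A * A, axis=0)        # nu_i = ||a_i||^2
assert np.allclose(np.sum(B * B, axis=0), v - r**2)
```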

7.3. Unblocked Householder QR Factorization with Column Pivoting

147

Building on the last exercise, we make an important observation that greatly reduces the cost of determining the column that is longest. Let us start by computing v as the vector such that the ith entry in v equals the square of the length of the ith column of A. In other words, the ith entry of v equals the dot product of the ith column of A with itself. In the above outline for MGS with column pivoting, we can then also partition
\[
v \rightarrow \begin{pmatrix} \nu_1 \\ v_2 \end{pmatrix}.
\]
The question becomes how \( v_2 \) before the update \( A_2 := A_2 - q_1 r_{12}^T \) compares to \( v_2 \) after that update. The answer is that the ith entry of \( v_2 \) must be updated by subtracting off the square of the ith entry of \( r_{12}^T \).
Let us introduce the functions v = ComputeWeights(A) and v = UpdateWeights(v, r) to compute the described weight vector v and to update a weight vector v by subtracting from its elements the squares of the corresponding entries of r. Also, the function DeterminePivot returns the index of the largest entry in the vector, and swaps that entry with the first entry. An MGS algorithm with column pivoting, MGSP, is then given in Figure 7.1. In that algorithm, A is overwritten with Q.
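The whole MGSP idea fits in a short NumPy sketch (a hypothetical helper, with the ComputeWeights/DeterminePivot/UpdateWeights logic inlined; it is not the course's MATLAB routine):

```python
import numpy as np

def mgsp(A):
    """MGS with column pivoting. Returns Q, R, and pivot order p
    such that A[:, p] = Q @ R, with diag(R) non-increasing.

    Column weights (squared lengths) are updated cheaply each step
    instead of being recomputed from scratch.
    """
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    p = np.arange(n)
    v = np.sum(A * A, axis=0)            # ComputeWeights
    for j in range(n):
        k = j + np.argmax(v[j:])         # DeterminePivot
        A[:, [j, k]] = A[:, [k, j]]      # swap columns of A ...
        R[:, [j, k]] = R[:, [k, j]]      # ... the computed part of R ...
        v[[j, k]] = v[[k, j]]            # ... the weights ...
        p[[j, k]] = p[[k, j]]            # ... and the pivot record
        R[j, j] = np.linalg.norm(A[:, j])
        Q[:, j] = A[:, j] / R[j, j]
        R[j, j+1:] = Q[:, j] @ A[:, j+1:]
        A[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
        v[j+1:] -= R[j, j+1:] ** 2       # UpdateWeights
    return Q, R, p
```

Note how the weight update is a vector operation of length n − j, rather than recomputing n − j column norms at cost \( O(m(n-j)) \).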

7.3 Unblocked Householder QR Factorization with Column Pivoting

7.3.1 Basic algorithm

The insights we gained from discussing the MGSP algorithm can be extended to QR factorization algorithms based on Householder transformations. The unblocked QR factorization discussed in Section 6.3 can be supplemented with column pivoting, yielding HQRP_unb_var1 in Figure 7.2.

7.3.2 Alternative unblocked Householder QR factorization with column pivoting

A moment of reflection tells us that there is no straightforward way to add column pivoting to HQR_and_FormT_unb_var2. The reason is that the remaining columns of A must be at least partially updated in order to have enough information to determine how to permute columns in the next step. Instead we introduce yet another variant of HQR that at least partially overcomes this, and can also be used in a blocked HQRP algorithm. Connoisseurs of the LU factorization will notice it has a bit of the flavor of the Crout variant for computing that factorization (see Section 12.9.4).
The insights that underlie the blocked QR factorization discussed in Section 6.6 can be summarized by
\[
\underbrace{\left( I - U T^{-H} U^H \right)}_{Q^H} \hat{A} = R,
\]
where \( \hat{A} \) represents the original contents of A, U is the matrix of Householder vectors, T is computed from U as described in Section 6.6.1, and R is upper triangular.
Now, partially into the computation we find ourselves in the state where
\[
\left( I - \begin{pmatrix} U_{TL} \\ U_{BL} \end{pmatrix} T_{TL}^{-H} \begin{pmatrix} U_{TL} \\ U_{BL} \end{pmatrix}^H \right)
\begin{pmatrix} \hat{A}_{TL} & \hat{A}_{TR} \\ \hat{A}_{BL} & \hat{A}_{BR} \end{pmatrix}
=
\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & \tilde{A}_{BR} \end{pmatrix}.
\]

Algorithm: [A, R, p] := MGSP(A, R, p)

[Algorithm display garbled in extraction. After v := ComputeWeights(A), each iteration determines the pivot via \( \left[ \begin{pmatrix} \nu_1 \\ v_2 \end{pmatrix}, \pi_1 \right] = \mbox{DeterminePivot}\begin{pmatrix} \nu_1 \\ v_2 \end{pmatrix} \), permutes \( \begin{pmatrix} a_1 & A_2 \end{pmatrix} := \begin{pmatrix} a_1 & A_2 \end{pmatrix} P(\pi_1)^T \), computes \( \rho_{11} := \| a_1 \|_2 \), \( q_1 := a_1 / \rho_{11} \), \( r_{12}^T := q_1^T A_2 \), updates \( A_2 := A_2 - q_1 r_{12}^T \), and finally \( v_2 := \mbox{UpdateWeights}(v_2, r_{12}) \).]

Figure 7.1: Modified Gram-Schmidt algorithm with column pivoting.

Algorithm: [A, t, p] = HQRP_unb_var1(A)

[Algorithm display garbled in extraction. After v := ComputeWeights(A), each iteration determines the pivot from the weight vector, permutes \( \begin{pmatrix} a_{01} & A_{02} \\ \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} \) accordingly, computes \( \left[ \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}, \tau_{11} \right] := \mbox{Housev}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \), updates \( w_{12}^T := (a_{12}^T + a_{21}^H A_{22})/\tau_{11} \) and \( \begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix} := \begin{pmatrix} a_{12}^T - w_{12}^T \\ A_{22} - a_{21} w_{12}^T \end{pmatrix} \), and finally \( v_2 := \mbox{UpdateWeights}(v_2, a_{12}) \).]

Figure 7.2: Simple unblocked rank revealing QR factorization via Householder transformations (HQRP, unblocked Variant 1).


The matrix A at that point contains
\[
\begin{pmatrix} U \backslash R_{TL} & R_{TR} \\ U_{BL} & \tilde{A}_{BR} \end{pmatrix},
\]
by which we mean that the Householder vectors computed so far are stored below the diagonal of \( R_{TL} \).
Let us manipulate this a bit:
\[
\begin{pmatrix} \hat{A}_{TL} & \hat{A}_{TR} \\ \hat{A}_{BL} & \hat{A}_{BR} \end{pmatrix}
-
\begin{pmatrix} U_{TL} \\ U_{BL} \end{pmatrix}
\underbrace{ T_{TL}^{-H} \begin{pmatrix} U_{TL} \\ U_{BL} \end{pmatrix}^H \begin{pmatrix} \hat{A}_{TL} & \hat{A}_{TR} \\ \hat{A}_{BL} & \hat{A}_{BR} \end{pmatrix} }_{\begin{pmatrix} W_{TL} & W_{TR} \end{pmatrix}}
=
\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & \tilde{A}_{BR} \end{pmatrix},
\]
from which we find that \( \tilde{A}_{BR} = \hat{A}_{BR} - U_{BL} W_{TR} \).
Now, what if we do not bother to update \( A_{BR} \), so that instead at this point in the computation A contains
\[
\begin{pmatrix} U \backslash R_{TL} & R_{TR} \\ U_{BL} & \hat{A}_{BR} \end{pmatrix},
\]
but we also have computed and stored \( W_{TR} \)? How would we be able to compute another Householder vector, row of R, and row of W?
Repartition so that the first row and column of \( \hat{A}_{BR} \), and the corresponding blocks of U, W, R, and the partially updated \( \tilde{\alpha}_{11}, \tilde{a}_{12}^T, \tilde{a}_{21}, \tilde{A}_{22} \), are exposed. [The repartitioned equation itself is garbled in extraction.] The important observation is that
\[
\begin{pmatrix} \tilde{\alpha}_{11} & \tilde{a}_{12}^T \\ \tilde{a}_{21} & \tilde{A}_{22} \end{pmatrix}
=
\begin{pmatrix}
\hat{\alpha}_{11} - u_{10}^T w_{01} & \hat{a}_{12}^T - u_{10}^T W_{02} \\
\hat{a}_{21} - U_{20} w_{01} & \hat{A}_{22} - U_{20} W_{02}
\end{pmatrix}.
\]
What does this mean? To update the next rows of A and R according to the QR factorization via Householder transformations:
• The contents of A must be updated by
\( \alpha_{11} := \alpha_{11} - a_{10}^T w_{01} \) ( \( = \hat{\alpha}_{11} - u_{10}^T w_{01} \) ),
\( a_{21} := a_{21} - A_{20} w_{01} \) ( \( = \hat{a}_{21} - U_{20} w_{01} \) ),
\( a_{12}^T := a_{12}^T - a_{10}^T W_{02} \) ( \( = \hat{a}_{12}^T - u_{10}^T W_{02} \) ).
This updates those parts of A consistent with prior computations;
• The Householder vector can then be computed, updating \( \alpha_{11} \) and \( a_{21} \);
• \( w_{12}^T := ( a_{12}^T + a_{21}^H A_{22} - (a_{21}^H A_{20}) W_{02} ) / \tau_{11} \) ( \( = ( a_{12}^T + u_{21}^H ( \hat{A}_{22} - U_{20} W_{02} ) )/\tau_{11} = ( a_{12}^T + u_{21}^H \tilde{A}_{22} )/\tau_{11} \) ). This computes the next row of W (or, more precisely, the part of that row that will be used in future iterations); and
• \( a_{12}^T := a_{12}^T - w_{12}^T \). This computes the remainder of the next row of R, overwriting A.
The resulting algorithm, to which column pivoting is added, is presented in Figure 7.3. In that figure, HQRP_unb_var1 is also given for comparison. An extra parameter, r, is added to stop the process after r columns of A have been completely computed. This allows the algorithm to be used to compute only the first r columns of Q (or rather the Householder vectors from which those columns can be computed) and also allows it to be incorporated into a blocked algorithm, as we will discuss next.

7.4 Blocked HQRP

Finally, a blocked algorithm that incorporates column pivoting and uses HQRP_unb_var3 is given in Figure 7.4. The idea is that the unblocked algorithm is executed, leaving the updating of \( A_{22} \) to happen periodically, via a matrix-matrix multiplication. While this casts some of the computation in terms of matrix-matrix multiplication, a careful analysis shows that a comparable amount of computation is in lower performing operations like matrix-vector multiplication.

7.5 Computing Q

Given the Householder vectors computed as part of any of the HQRP algorithms, any number of desired columns of matrix Q can be computed using the techniques described in Section 6.4.

7.6 Enrichments

7.6.1 QR factorization with randomization for column pivoting

The blocked algorithm for rank-revealing QR factorization has the fundamental problem that only about half of the computation can be cast in terms of (high performing) matrix-matrix multiplication. The following paper shows how randomization can be used to overcome this problem:

Per-Gunnar Martinsson, Gregorio Quintana-Orti, Nathan Heavner, Robert van de Geijn. Householder QR Factorization: Adding Randomization for Column Pivoting. FLAME Working Note #78, arXiv:1512.02671. Dec. 2015.


Algorithm: [A, t, p, v, W] = HQRP_unb_var3(A, t, p, v, W, r)

[Algorithm display garbled in extraction. Variant 3 differs from Variant 1 in that each iteration first applies the delayed updates \( \alpha_{11} := \alpha_{11} - a_{10}^T w_{01} \), \( a_{21} := a_{21} - A_{20} w_{01} \), and \( a_{12}^T := a_{12}^T - a_{10}^T W_{02} \), then computes \( \left[ \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}, \tau_{11} \right] := \mbox{Housev}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \), \( w_{12}^T := ( a_{12}^T + a_{21}^H A_{22} - (a_{21}^H A_{20}) W_{02} ) / \tau_{11} \), and \( a_{12}^T := a_{12}^T - w_{12}^T \), leaving \( A_{22} \) untouched. The weight vector is maintained via \( v_2 = \mbox{UpdateWeights}(v_2, a_{12}) \).]

Figure 7.3: HQRP unblocked Variant 3, which updates r rows and columns of A but leaves the trailing matrix \( A_{BR} \) pristine. Here v has already been initialized before calling the routine so that it can be called from a blocked algorithm. The HQRP unblocked Variant 1 from Figure 7.2 is also given for reference.


Algorithm: [A, t, p] := HQRP_blk(A, t, p, r)

[Algorithm display garbled in extraction. After v := ComputeWeights(A), each iteration determines a block size b, factors the current panel with HQRP_unb_var3 (which also fills in the corresponding rows of W), and then applies the accumulated update \( A_{22} := A_{22} - U_2 W_2 \).]

Figure 7.4: Blocked HQRP algorithm. Note: W starts as a \( b \times n(A) \) matrix. If a uniform block size is not used during the computation, resizing may be necessary.

7.7 Wrapup

7.7.1 Additional exercises

7.7.2 Summary

Chapter 8
Notes on Solving Linear Least-Squares Problems

For a motivation of the linear least-squares problem, read Week 10 (Sections 10.3-10.5) of Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].

Video
Read disclaimer regarding the videos in the preface!
No video... Camera ran out of memory...

8.0.1 Launch

Chapter 8. Notes on Solving Linear Least-Squares Problems

8.0.2 Outline

8.0.1 Launch . . . 155
8.0.2 Outline . . . 156
8.0.3 What you will learn . . . 157
8.1 The Linear Least-Squares Problem . . . 158
8.2 Method of Normal Equations . . . 158
8.3 Solving the LLS Problem Via the QR Factorization . . . 159
  8.3.1 Simple derivation of the solution . . . 159
  8.3.2 Alternative derivation of the solution . . . 160
8.4 Via Householder QR Factorization . . . 161
8.5 Via the Singular Value Decomposition . . . 162
  8.5.1 Simple derivation of the solution . . . 162
  8.5.2 Alternative derivation of the solution . . . 163
8.6 What If A Does Not Have Linearly Independent Columns? . . . 163
8.7 Exercise: Using the LQ factorization to solve underdetermined systems . . . 170
8.8 Wrapup . . . 170
  8.8.1 Additional exercises . . . 170
  8.8.2 Summary . . . 170

8.0.3 What you will learn

8.1 The Linear Least-Squares Problem

Let \( A \in \mathbb{C}^{m \times n} \) and \( y \in \mathbb{C}^m \). Then the linear least-squares problem (LLS) is given by
\[
\mbox{Find } x \mbox{ s.t. } \| A x - y \|_2 = \min_{z \in \mathbb{C}^n} \| A z - y \|_2 .
\]
In other words, x is the vector that minimizes the expression \( \| A x - y \|_2 \). Equivalently, we can solve
\[
\mbox{Find } x \mbox{ s.t. } \| A x - y \|_2^2 = \min_{z \in \mathbb{C}^n} \| A z - y \|_2^2 .
\]
If x solves the linear least-squares problem, then Ax is the vector in \( \mathcal{C}(A) \) (the column space of A) closest to the vector y.

8.2 Method of Normal Equations

Let \( A \in \mathbb{R}^{m \times n} \) have linearly independent columns (which implies \( m \geq n \)). Let \( f : \mathbb{R}^n \rightarrow \mathbb{R} \) be defined by
\[
f(x) = \| A x - y \|_2^2 = (Ax - y)^T (Ax - y) = x^T A^T A x - x^T A^T y - y^T A x + y^T y
= x^T A^T A x - 2 x^T A^T y + y^T y .
\]
This function is minimized when the gradient is zero, \( \nabla f(x) = 0 \). Now,
\[
\nabla f(x) = 2 A^T A x - 2 A^T y .
\]
If A has linearly independent columns then \( A^T A \) is nonsingular. Hence, the x that minimizes \( \| A x - y \|_2 \) solves \( A^T A x = A^T y \). This is known as the method of normal equations. Notice that then
\[
x = \underbrace{( A^T A )^{-1} A^T}_{A^\dagger} y ,
\]
where \( A^\dagger \) is known as the pseudo inverse or Moore-Penrose pseudo inverse.
In practice, one performs the following steps:
• Form \( B = A^T A \), a symmetric positive-definite (SPD) matrix. Cost: approximately \( m n^2 \) floating point operations (flops), if one takes advantage of symmetry.
• Compute the Cholesky factor L, a lower triangular matrix, so that \( B = L L^T \). This factorization, discussed in Week 8 (Section 8.4.2) of Linear Algebra: Foundations to Frontiers - Notes to LAFF With and to be revisited later in this course, exists since B is SPD. Cost: approximately \( \frac{1}{3} n^3 \) flops.
• Compute \( \hat{y} = A^T y \). Cost: \( 2 m n \) flops.
• Solve \( L z = \hat{y} \) and \( L^T x = z \). Cost: \( n^2 \) flops each.

Thus, the total cost of solving the LLS problem via normal equations is approximately \( m n^2 + \frac{1}{3} n^3 \) flops.

Remark 8.1 We will later discuss that if A is not well-conditioned (its columns are nearly linearly dependent), the Method of Normal Equations is numerically unstable because \( A^T A \) is ill-conditioned.

The above discussion can be generalized to the case where \( A \in \mathbb{C}^{m \times n} \). In that case, x must solve \( A^H A x = A^H y \).
A geometric explanation of the method of normal equations (for the case where A is real valued) can be found in Week 10 (Sections 10.3-10.5) of Linear Algebra: Foundations to Frontiers - Notes to LAFF With.

8.3 Solving the LLS Problem Via the QR Factorization

Assume \( A \in \mathbb{C}^{m \times n} \) has linearly independent columns and let \( A = Q_L R_{TL} \) be its QR factorization. We wish to compute the solution to the LLS problem: find \( x \in \mathbb{C}^n \) such that
\[
\| A x - y \|_2^2 = \min_{z \in \mathbb{C}^n} \| A z - y \|_2^2 .
\]

8.3.1 Simple derivation of the solution

Notice that we know that, if A has linearly independent columns, the solution is given by \( x = (A^H A)^{-1} A^H y \) (the solution to the normal equations). Now,
\[
\begin{array}{rcl@{\qquad}l}
x & = & (A^H A)^{-1} A^H y & \mbox{solution to the normal equations} \\
& = & \left( (Q_L R_{TL})^H (Q_L R_{TL}) \right)^{-1} (Q_L R_{TL})^H y & A = Q_L R_{TL} \\
& = & \left( R_{TL}^H Q_L^H Q_L R_{TL} \right)^{-1} R_{TL}^H Q_L^H y & (BC)^H = C^H B^H \\
& = & \left( R_{TL}^H R_{TL} \right)^{-1} R_{TL}^H Q_L^H y & Q_L^H Q_L = I \\
& = & R_{TL}^{-1} R_{TL}^{-H} R_{TL}^H Q_L^H y & (BC)^{-1} = C^{-1} B^{-1} \\
& = & R_{TL}^{-1} Q_L^H y & R_{TL}^{-H} R_{TL}^H = I .
\end{array}
\]
Thus, the x that solves \( R_{TL} x = Q_L^H y \) solves the LLS problem.

8.3.2 Alternative derivation of the solution

We know that there exists a matrix \( Q_R \) such that \( Q = \begin{pmatrix} Q_L & Q_R \end{pmatrix} \) is unitary. Now,
\[
\begin{array}{rcl@{\qquad}l}
\min_{z \in \mathbb{C}^n} \| A z - y \|_2^2
& = & \min_{z \in \mathbb{C}^n} \| Q_L R_{TL} z - y \|_2^2 & \mbox{(substitute } A = Q_L R_{TL} \mbox{)} \\
& = & \min_{z \in \mathbb{C}^n} \| Q^H ( Q_L R_{TL} z - y ) \|_2^2 & \mbox{(two-norm is preserved since } Q^H \mbox{ is unitary)} \\
& = & \min_{z \in \mathbb{C}^n} \left\| \begin{pmatrix} Q_L^H \\ Q_R^H \end{pmatrix} Q_L R_{TL} z - \begin{pmatrix} Q_L^H \\ Q_R^H \end{pmatrix} y \right\|_2^2 & \mbox{(partitioning, distributing)} \\
& = & \min_{z \in \mathbb{C}^n} \left\| \begin{pmatrix} R_{TL} z \\ 0 \end{pmatrix} - \begin{pmatrix} Q_L^H y \\ Q_R^H y \end{pmatrix} \right\|_2^2 & \mbox{(partitioned matrix-matrix multiplication)} \\
& = & \min_{z \in \mathbb{C}^n} \left\| \begin{pmatrix} R_{TL} z - Q_L^H y \\ - Q_R^H y \end{pmatrix} \right\|_2^2 & \mbox{(partitioned matrix addition)} \\
& = & \min_{z \in \mathbb{C}^n} \left( \| R_{TL} z - Q_L^H y \|_2^2 + \| Q_R^H y \|_2^2 \right) & \left( \mbox{property of the 2-norm: } \left\| \begin{pmatrix} x \\ y \end{pmatrix} \right\|_2^2 = \| x \|_2^2 + \| y \|_2^2 \right) \\
& = & \min_{z \in \mathbb{C}^n} \| R_{TL} z - Q_L^H y \|_2^2 + \| Q_R^H y \|_2^2 & ( Q_R^H y \mbox{ is independent of } z ) \\
& = & \| Q_R^H y \|_2^2 & \mbox{(minimized by } x \mbox{ that satisfies } R_{TL} x = Q_L^H y \mbox{)} .
\end{array}
\]
Thus, the desired x that minimizes the linear least-squares problem solves \( R_{TL} x = Q_L^H y \). The solution is unique because \( R_{TL} \) is nonsingular (because A has linearly independent columns).
In practice, one performs the following steps:
• Compute the QR factorization \( A = Q_L R_{TL} \). If Gram-Schmidt or Modified Gram-Schmidt is used, this costs \( 2 m n^2 \) flops.
• Form \( \hat{y} = Q_L^H y \). Cost: \( 2 m n \) flops.
• Solve \( R_{TL} x = \hat{y} \) (triangular solve). Cost: \( n^2 \) flops.
Thus, the total cost of solving the LLS problem via (Modified) Gram-Schmidt QR factorization is approximately \( 2 m n^2 \) flops.
Notice that the solution computed by the Method of Normal Equations (generalized to the complex case) is given by
\[
(A^H A)^{-1} A^H y = \left( (Q_L R_{TL})^H (Q_L R_{TL}) \right)^{-1} (Q_L R_{TL})^H y = \left( R_{TL}^H Q_L^H Q_L R_{TL} \right)^{-1} R_{TL}^H Q_L^H y
= \left( R_{TL}^H R_{TL} \right)^{-1} R_{TL}^H Q_L^H y = R_{TL}^{-1} R_{TL}^{-H} R_{TL}^H Q_L^H y = R_{TL}^{-1} Q_L^H y = R_{TL}^{-1} \hat{y} = x ,
\]
where \( R_{TL} x = \hat{y} \). This shows that the two approaches compute the same solution, generalizes the Method of Normal Equations to complex valued problems, and shows that the Method of Normal Equations computes the desired result without requiring multivariate calculus.

8.4 Via Householder QR Factorization

Given \( A \in \mathbb{C}^{m \times n} \) with linearly independent columns, the Householder QR factorization yields n Householder transformations, \( H_0, \ldots, H_{n-1} \), so that
\[
\underbrace{H_{n-1} \cdots H_0}_{Q^H} A = \begin{pmatrix} R_{TL} \\ 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} Q_L & Q_R \end{pmatrix}.
\]
We wish to solve \( R_{TL} x = \underbrace{Q_L^H y}_{\hat{y}} \). But
\[
\hat{y} = Q_L^H y = \begin{pmatrix} I & 0 \end{pmatrix} \begin{pmatrix} Q_L^H \\ Q_R^H \end{pmatrix} y
= \begin{pmatrix} I & 0 \end{pmatrix} Q^H y
= \begin{pmatrix} I & 0 \end{pmatrix} \underbrace{( H_{n-1} \cdots H_0 y )}_{w = \begin{pmatrix} w_T \\ w_B \end{pmatrix}} = w_T .
\]
This suggests the following approach:
• Compute \( H_0, \ldots, H_{n-1} \) so that \( H_{n-1} \cdots H_0 A = \begin{pmatrix} R_{TL} \\ 0 \end{pmatrix} \), storing the Householder vectors that define \( H_0, \ldots, H_{n-1} \) over the elements in A that they zero out (see Notes on Householder QR Factorization). Cost: \( 2 m n^2 - \frac{2}{3} n^3 \) flops.
• Form \( w = H_{n-1} ( \cdots ( H_0 y ) \cdots ) \) (see Notes on Householder QR Factorization). Partition \( w = \begin{pmatrix} w_T \\ w_B \end{pmatrix} \), where \( w_T \in \mathbb{C}^n \). Then \( \hat{y} = w_T \). Cost: \( 4 m n - 2 n^2 \) flops. (See Notes on Householder QR Factorization regarding this.)
• Solve \( R_{TL} x = \hat{y} \). Cost: \( n^2 \) flops.
Thus, the total cost of solving the LLS problem via Householder QR factorization is approximately \( 2 m n^2 - \frac{2}{3} n^3 \) flops. This is cheaper than using (Modified) Gram-Schmidt QR factorization, and hence preferred (because it is also numerically more stable, as we will discuss later in the course).
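A compact end-to-end sketch of this approach in NumPy (using the common \( u = x + \mathrm{sign}(\chi_1)\|x\|_2 e_0 \) formulation with \( H = I - 2 u u^T \) for unit-length u, a variant of the Housev convention used in these notes), applying each reflector to y on the fly and then back-substituting:

```python
import numpy as np

def house_lls(A, y):
    """Solve min_x ||Ax - y||_2 by triangularizing A with Householder
    reflectors, accumulating w = H_{n-1} ... H_0 y as we go, and then
    solving R_TL x = w_T."""
    R = A.astype(float).copy()
    w = y.astype(float).copy()
    m, n = R.shape
    for j in range(n):
        x = R[j:, j]
        u = x.copy()
        u[0] += np.copysign(np.linalg.norm(x), x[0])  # stable sign choice
        u /= np.linalg.norm(u)                        # unit 2-norm
        # apply H_j = I - 2 u u^T to the trailing part of R and to y
        R[j:, j:] -= 2.0 * np.outer(u, u @ R[j:, j:])
        w[j:] -= 2.0 * u * (u @ w[j:])
    return np.linalg.solve(np.triu(R[:n, :n]), w[:n])
```

Note that Q is never formed: only the reflectors are applied, which is exactly why this route costs less than (Modified) Gram-Schmidt.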

8.5 Via the Singular Value Decomposition

Given \( A \in \mathbb{C}^{m \times n} \) with linearly independent columns, let \( A = U \Sigma V^H \) be its SVD. Partition
\[
U = \begin{pmatrix} U_L & U_R \end{pmatrix}
\quad \mbox{and} \quad
\Sigma = \begin{pmatrix} \Sigma_{TL} \\ 0 \end{pmatrix},
\]
where \( U_L \in \mathbb{C}^{m \times n} \) and \( \Sigma_{TL} \in \mathbb{R}^{n \times n} \), so that
\[
A = \begin{pmatrix} U_L & U_R \end{pmatrix} \begin{pmatrix} \Sigma_{TL} \\ 0 \end{pmatrix} V^H = U_L \Sigma_{TL} V^H .
\]
We wish to compute the solution to the LLS problem: find \( x \in \mathbb{C}^n \) such that
\[
\| A x - y \|_2^2 = \min_{z \in \mathbb{C}^n} \| A z - y \|_2^2 .
\]

8.5.1 Simple derivation of the solution

Notice that we know that, if A has linearly independent columns, the solution is given by \( x = (A^H A)^{-1} A^H y \) (the solution to the normal equations). Now,
\[
\begin{array}{rcl@{\qquad}l}
x & = & (A^H A)^{-1} A^H y & \mbox{solution to the normal equations} \\
& = & \left( (U_L \Sigma_{TL} V^H)^H (U_L \Sigma_{TL} V^H) \right)^{-1} (U_L \Sigma_{TL} V^H)^H y & A = U_L \Sigma_{TL} V^H \\
& = & \left( (V \Sigma_{TL} U_L^H)(U_L \Sigma_{TL} V^H) \right)^{-1} (V \Sigma_{TL} U_L^H) y & (BCD)^H = D^H C^H B^H \mbox{ and } \Sigma_{TL}^H = \Sigma_{TL} \\
& = & \left( V \Sigma_{TL} \Sigma_{TL} V^H \right)^{-1} V \Sigma_{TL} U_L^H y & U_L^H U_L = I \\
& = & V \Sigma_{TL}^{-1} \Sigma_{TL}^{-1} V^H V \Sigma_{TL} U_L^H y & V^{-1} = V^H \mbox{ and } (BCD)^{-1} = D^{-1} C^{-1} B^{-1} \\
& = & V \Sigma_{TL}^{-1} U_L^H y & V^H V = I \mbox{ and } \Sigma_{TL}^{-1} \Sigma_{TL} = I .
\end{array}
\]

8.5.2 Alternative derivation of the solution

We now discuss a derivation of the result that does not depend on the normal equations, in preparation for the more general case discussed in the next section.
\[
\begin{array}{rcl@{\qquad}l}
\min_{z \in \mathbb{C}^n} \| A z - y \|_2^2
& = & \min_{z \in \mathbb{C}^n} \| U \Sigma V^H z - y \|_2^2 & \mbox{(substitute } A = U \Sigma V^H \mbox{)} \\
& = & \min_{z \in \mathbb{C}^n} \| U ( \Sigma V^H z - U^H y ) \|_2^2 & \mbox{(substitute } U U^H = I \mbox{ and factor out } U \mbox{)} \\
& = & \min_{z \in \mathbb{C}^n} \| \Sigma V^H z - U^H y \|_2^2 & \mbox{(multiplication by a unitary matrix preserves the two-norm)} \\
& = & \min_{z \in \mathbb{C}^n} \left\| \begin{pmatrix} \Sigma_{TL} V^H z \\ 0 \end{pmatrix} - \begin{pmatrix} U_L^H y \\ U_R^H y \end{pmatrix} \right\|_2^2 & \mbox{(partition, partitioned matrix-matrix multiplication)} \\
& = & \min_{z \in \mathbb{C}^n} \left\| \begin{pmatrix} \Sigma_{TL} V^H z - U_L^H y \\ - U_R^H y \end{pmatrix} \right\|_2^2 & \mbox{(partitioned matrix-matrix multiplication and addition)} \\
& = & \min_{z \in \mathbb{C}^n} \| \Sigma_{TL} V^H z - U_L^H y \|_2^2 + \| U_R^H y \|_2^2 & \left( \left\| \begin{pmatrix} v_T \\ v_B \end{pmatrix} \right\|_2^2 = \| v_T \|_2^2 + \| v_B \|_2^2 \right).
\end{array}
\]
The x that solves \( \Sigma_{TL} V^H x = U_L^H y \) minimizes the expression. That x is given by \( x = V \Sigma_{TL}^{-1} U_L^H y \).
This suggests the following approach:
• Compute the reduced SVD: \( A = U_L \Sigma_{TL} V^H \). Cost: greater than computing the QR factorization! We will discuss this in a future note.
• Form \( \hat{y} = \Sigma_{TL}^{-1} U_L^H y \). Cost: approximately \( 2 m n \) flops.
• Compute \( x = V \hat{y} \). Cost: approximately \( 2 n^2 \) flops.
8.6 What If A Does Not Have Linearly Independent Columns?

In the above discussions we assumed that A has linearly independent columns. Things get a bit trickier if A does not have linearly independent columns. There is a variant of the QR factorization, known as the QR factorization with column pivoting, that can be used to find the solution. We instead focus on using the SVD.
Given \( A \in \mathbb{C}^{m \times n} \) with \( \mbox{rank}(A) = r < n \), let \( A = U \Sigma V^H \) be its SVD. Partition
\[
U = \begin{pmatrix} U_L & U_R \end{pmatrix}, \quad
V = \begin{pmatrix} V_L & V_R \end{pmatrix}, \quad \mbox{and} \quad
\Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix},
\]
where \( U_L \in \mathbb{C}^{m \times r} \), \( V_L \in \mathbb{C}^{n \times r} \), and \( \Sigma_{TL} \in \mathbb{R}^{r \times r} \), so that
\[
A = \begin{pmatrix} U_L & U_R \end{pmatrix} \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_L & V_R \end{pmatrix}^H = U_L \Sigma_{TL} V_L^H .
\]
Now,
\begin{align*}
\min_{z \in \mathbb{C}^n} \|Az - y\|_2^2
&= \min_{z \in \mathbb{C}^n} \|U \Sigma V^H z - y\|_2^2 && \text{(substitute } A = U \Sigma V^H \text{)} \\
&= \min_{z \in \mathbb{C}^n} \|U \Sigma V^H z - U U^H y\|_2^2 && (U U^H = I) \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \|U \Sigma V^H V w - U U^H y\|_2^2 && \text{(minimizing over } w \in \mathbb{C}^n \text{ with } z = V w \text{ is the same as minimizing over } z \in \mathbb{C}^n \text{)} \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \|U (\Sigma w - U^H y)\|_2^2 && \text{(factor out } U \text{ and } V^H V = I \text{)} \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \|\Sigma w - U^H y\|_2^2 && (\|U v\|_2 = \|v\|_2) \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \left\| \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} w_T \\ w_B \end{pmatrix} - \begin{pmatrix} U_L^H \\ U_R^H \end{pmatrix} y \right\|_2^2 && \text{(partition } \Sigma, w, \text{ and } U \text{)} \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \left\| \begin{pmatrix} \Sigma_{TL} w_T - U_L^H y \\ -U_R^H y \end{pmatrix} \right\|_2^2 && \text{(partitioned matrix-matrix multiplication)} \\
&= \min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \left( \|\Sigma_{TL} w_T - U_L^H y\|_2^2 + \|U_R^H y\|_2^2 \right) && \left( \left\| \begin{pmatrix} v_T \\ v_B \end{pmatrix} \right\|_2^2 = \|v_T\|_2^2 + \|v_B\|_2^2 \right)
\end{align*}
Since $\Sigma_{TL}$ is diagonal with no zeroes on its diagonal, we know that $\Sigma_{TL}^{-1}$ exists. Choosing $w_T = \Sigma_{TL}^{-1} U_L^H y$ means that
$$
\min_{\substack{w \in \mathbb{C}^n \\ z = V w}} \|\Sigma_{TL} w_T - U_L^H y\|_2^2 = 0,
$$
which obviously minimizes the entire expression. We conclude that
$$
x = V w = \begin{pmatrix} V_L & V_R \end{pmatrix} \begin{pmatrix} \Sigma_{TL}^{-1} U_L^H y \\ w_B \end{pmatrix} = V_L \Sigma_{TL}^{-1} U_L^H y + V_R w_B
$$
characterizes all solutions to the linear least-squares problem, where $w_B$ can be chosen to be any vector of size $n - r$. By choosing $w_B = 0$, and hence $x = V_L \Sigma_{TL}^{-1} U_L^H y$, we choose the vector $x$ that itself has minimal 2-norm.
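A small Python/NumPy sketch (an added illustration; the tolerance used to detect the numerical rank is a choice, not something the notes prescribe) confirms that $V_L \Sigma_{TL}^{-1} U_L^H y$ is exactly the minimum-norm least-squares solution that the pseudoinverse produces:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2
# Build a rank-deficient A (rank r < n) as a product of thin factors.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
y = rng.standard_normal(m)

U, sigma, VH = np.linalg.svd(A)
r_eff = int(np.sum(sigma > 1e-10 * sigma[0]))   # numerical rank (tolerance is a choice)
UL = U[:, :r_eff]
VL = VH[:r_eff, :].conj().T

# Minimum-norm solution: x = V_L Sigma_TL^{-1} U_L^H y (i.e., w_B = 0).
x = VL @ ((UL.conj().T @ y) / sigma[:r_eff])

print(r_eff)                                    # 2
print(np.allclose(x, np.linalg.pinv(A) @ y))    # True
```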


The sequence of pictures on the following pages reasons through the insights that we gained so far
(in Notes on the Singular Value Decomposition and this note). These pictures can be downloaded as a
PowerPoint presentation from
http://www.cs.utexas.edu/users/flame/Notes/Spaces.pptx


[Figure: the four fundamental subspaces of A. In the domain: the row space $\mathcal{C}(V_L)$ (dim $r$) and the null space $\mathcal{C}(V_R)$ (dim $n - r$), with $x = x_r + x_n$. In the codomain: the column space $\mathcal{C}(U_L)$ (dim $r$) and the left null space $\mathcal{C}(U_R)$ (dim $m - r$), with $Ax = \hat y$.]

If $A \in \mathbb{C}^{m \times n}$ and
$$
A = \begin{pmatrix} U_L & U_R \end{pmatrix} \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_L & V_R \end{pmatrix}^H = U_L \Sigma_{TL} V_L^H
$$
equals the SVD, where $U_L \in \mathbb{C}^{m \times r}$, $V_L \in \mathbb{C}^{n \times r}$, and $\Sigma_{TL} \in \mathbb{C}^{r \times r}$, then

• The row space of $A$ equals $\mathcal{C}(V_L)$, the column space of $V_L$;
• The null space of $A$ equals $\mathcal{C}(V_R)$, the column space of $V_R$;
• The column space of $A$ equals $\mathcal{C}(U_L)$; and
• The left null space of $A$ equals $\mathcal{C}(U_R)$.

Also, given a vector $x \in \mathbb{C}^n$, the matrix $A$ maps $x$ to $\hat y = Ax$, which must be in $\mathcal{C}(A) = \mathcal{C}(U_L)$.


[Figure: the same four subspaces, now with a vector $y$ lying outside the column space $\mathcal{C}(U_L)$.]

Given an arbitrary $y \in \mathbb{C}^m$ not in $\mathcal{C}(A) = \mathcal{C}(U_L)$, there cannot exist a vector $x \in \mathbb{C}^n$ such that $Ax = y$.


[Figure: the four subspaces with $\hat y = U_L U_L^H y$ the projection of $y$ onto the column space; $x_r = V_L \Sigma_{TL}^{-1} U_L^H y$ in the row space satisfies $A x_r = U_L \Sigma_{TL} V_L^H x_r = \hat y$, while $x_n = V_R w_B$ in the null space satisfies $A x_n = 0$, and $x = x_r + x_n$.]

The solution to the linear least-squares problem, $x$, equals any vector that is mapped by $A$ to the projection of $y$ onto the column space of $A$, $\hat y = U_L U_L^H y$. This solution can be written as the sum of a (unique) vector in the row space of $A$ and any vector in the null space of $A$. The vector in the row space of $A$ is given by
$$
x_r = V_L \Sigma_{TL}^{-1} U_L^H y = V_L \Sigma_{TL}^{-1} U_L^H \hat y .
$$


The sequence of pictures, and their explanations, suggests a much simpler path towards the formula for solving the LLS problem.

• We know that we are looking for the solution $x$ to the equation $A x = U_L U_L^H y$.

• We know that there must be a solution $x_r$ in the row space of $A$. It suffices to find $w_T$ such that $x_r = V_L w_T$.

• Hence we search for $w_T$ that satisfies $A V_L w_T = U_L U_L^H y$. Since there is a one-to-one mapping by $A$ from the row space of $A$ to the column space of $A$, we know that $w_T$ is unique. Thus, if we find a solution to the above, we have found the solution.

• Multiplying both sides of the equation by $U_L^H$ yields $U_L^H A V_L w_T = U_L^H y$.

• Since $A = U_L \Sigma_{TL} V_L^H$, we can rewrite the above equation as $\Sigma_{TL} w_T = U_L^H y$, so that $w_T = \Sigma_{TL}^{-1} U_L^H y$.

• Hence $x_r = V_L \Sigma_{TL}^{-1} U_L^H y$.

• Adding any vector in the null space of $A$ to $x_r$ also yields a solution. Hence all solutions to the LLS problem can be characterized by $x = V_L \Sigma_{TL}^{-1} U_L^H y + V_R w_B$.

Here is yet another important way of looking at the problem:

• We start by considering the LLS problem: Find $x \in \mathbb{C}^n$ such that
$$ \|A x - y\|_2^2 = \min_{z \in \mathbb{C}^n} \|A z - y\|_2^2 . $$

• We changed this into the problem of finding $w_L$ that satisfies $\Sigma_{TL} w_L = v_T$, where $x = V_L w_L$ and $\hat y = U_L U_L^H y = U_L v_T$.

• Thus, by expressing $x$ in the right basis (the columns of $V_L$) and the projection of $y$ in the right basis (the columns of $U_L$), the problem became trivial, since the matrix that relates the solution to the right-hand side is diagonal.

8.7 Exercise: Using the LQ factorization to solve underdetermined systems

We next discuss another special case of the LLS problem: Let $A \in \mathbb{C}^{m \times n}$, where $m < n$ and $A$ has linearly independent rows. A series of exercises will lead you to a practical algorithm for solving the problem of describing all solutions to the LLS problem
$$ \|Ax - y\|_2 = \min_z \|Az - y\|_2 . $$
You may want to review Notes on the QR Factorization as you do this exercise.
Homework 8.2 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Show that there exist a lower triangular matrix $L_L \in \mathbb{C}^{m \times m}$ and a matrix $Q_T \in \mathbb{C}^{m \times n}$ with orthonormal rows such that $A = L_L Q_T$, noting that $L_L$ does not have any zeroes on its diagonal. Letting $L = \begin{pmatrix} L_L & 0 \end{pmatrix} \in \mathbb{C}^{m \times n}$ and unitary $Q = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}$, reason that $A = L Q$. Don't overthink the problem: use results you have seen before.
* SEE ANSWER
Homework 8.3 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Consider
$$ \|Ax - y\|_2 = \min_z \|Az - y\|_2 . $$
Use the fact that $A = L_L Q_T$, where $L_L \in \mathbb{C}^{m \times m}$ is lower triangular and $Q_T$ has orthonormal rows, to argue that any vector of the form $Q_T^H L_L^{-1} y + Q_B^H w_B$ (where $w_B$ is any vector in $\mathbb{C}^{n - m}$) is a solution to the LLS problem. Here $Q = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}$.
* SEE ANSWER
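A hedged Python/NumPy illustration (not part of the notes): since $A = L_L Q_T$ is just the conjugate transpose of the QR factorization $A^H = Q_T^H L_L^H$, `numpy.linalg.qr` applied to $A^H$ yields the factors, and $x = Q_T^H L_L^{-1} y$ (the $w_B = 0$ choice) solves the underdetermined system with minimal 2-norm:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))   # linearly independent rows with probability 1
y = rng.standard_normal(m)

# LQ factorization via the QR factorization of A^H: A^H = Q_T^H L_L^H.
QTH, LLH = np.linalg.qr(A.conj().T)   # QTH: n x m, LLH: m x m upper triangular
QT = QTH.conj().T                     # m x n with orthonormal rows
LL = LLH.conj().T                     # m x m lower triangular

# Minimum-norm solution (w_B = 0): x = Q_T^H L_L^{-1} y.
x = QT.conj().T @ np.linalg.solve(LL, y)

print(np.allclose(A @ x, y))                  # x solves Ax = y exactly
print(np.allclose(x, np.linalg.pinv(A) @ y))  # and has minimal 2-norm
```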

Homework 8.4 Continuing Homework 8.2, use Figure 8.1 to give a Classical Gram-Schmidt inspired algorithm for computing $L_L$ and $Q_T$. (The best way to check you got the algorithm right is to implement it!)
* SEE ANSWER
Homework 8.5 Continuing Homework 8.2, use Figure 8.2 to give a Householder QR factorization inspired algorithm for computing $L$ and $Q$, leaving $L$ in the lower triangular part of $A$ and $Q$ stored as Householder vectors above the diagonal of $A$. (The best way to check you got the algorithm right is to implement it!)
* SEE ANSWER

8.8 Wrapup

8.8.1 Additional exercises

8.8.2 Summary

Algorithm: [L, Q] := LQ_CGS_UNB(A, L, Q)

Partition A → (A_T; A_B), L → (L_TL, L_TR; L_BL, L_BR), Q → (Q_T; Q_B),
  where A_T has 0 rows, L_TL is 0 × 0, Q_T has 0 rows.
  (Here "(X; Y)" denotes X stacked on top of Y.)
while m(A_T) < m(A) do
  Repartition
    (A_T; A_B) → (A_0; a_1^T; A_2),
    (L_TL, L_TR; L_BL, L_BR) → (L_00, l_01, L_02; l_10^T, λ_11, l_12^T; L_20, l_21, L_22),
    (Q_T; Q_B) → (Q_0; q_1^T; Q_2),
    where a_1^T has 1 row, λ_11 is 1 × 1, q_1^T has 1 row.

  [update statements to be filled in as part of Homework 8.4]

  Continue with
    (A_T; A_B) ← (A_0; a_1^T; A_2),
    (L_TL, L_TR; L_BL, L_BR) ← (L_00, l_01, L_02; l_10^T, λ_11, l_12^T; L_20, l_21, L_22),
    (Q_T; Q_B) ← (Q_0; q_1^T; Q_2)
endwhile

Figure 8.1: Algorithm skeleton for CGS-like LQ factorization.


Algorithm: [A, t] := HLQ_UNB(A, t)

Partition A → (A_TL, A_TR; A_BL, A_BR), t → (t_T; t_B),
  where A_TL is 0 × 0, t_T has 0 rows.
while m(A_TL) < m(A) do
  Repartition
    (A_TL, A_TR; A_BL, A_BR) → (A_00, a_01, A_02; a_10^T, α_11, a_12^T; A_20, a_21, A_22),
    (t_T; t_B) → (t_0; τ_1; t_2),
    where α_11 is 1 × 1, τ_1 has 1 row.

  [update statements to be filled in as part of Homework 8.5]

  Continue with
    (A_TL, A_TR; A_BL, A_BR) ← (A_00, a_01, A_02; a_10^T, α_11, a_12^T; A_20, a_21, A_22),
    (t_T; t_B) ← (t_0; τ_1; t_2)
endwhile

Figure 8.2: Algorithm skeleton for Householder QR factorization inspired LQ factorization.

Chapter 10

Notes on the Condition of a Problem

9.1 Opening Remarks

9.1.1 Launch

Correctness in the presence of error (e.g., when floating point computations are performed) takes on a different meaning. For many problems for which computers are used, there is one correct answer, and we expect that answer to be computed by our program. The problem is that, as we will see later, most real numbers cannot be stored exactly in a computer memory. They are stored as approximations, floating point numbers, instead. Hence storing them and/or computing with them inherently incurs error.

Naively, we would like to be able to define a program that computes with floating point numbers as being "correct" if it computes an answer that is close to the exact answer. Unfortunately, some problems that are computed this way have the property that a small change in the input yields a large change in the output. Surely we can't blame the program for not computing an answer close to the exact answer in this case. The mere act of storing the input data as a floating point number may cause a completely different output, even if all computation is exact. We will later define stability to be a property of a program. It is what takes the place of correctness. In this note, we instead will focus on when a problem is a "good" problem, meaning that in exact arithmetic a small change in the input will always cause at most a small change in the output, or a "bad" problem, meaning that a small change in the input may yield a large change in the output. A good problem will be called well-conditioned. A bad problem will be called ill-conditioned.

Notice that "small" and "large" are vague. To some degree, norms help us measure size. To some degree, "small" and "large" will be in the eyes of the beholder (in other words, situation dependent).

Video
Read disclaimer regarding the videos in the preface!
Video did not turn out...


9.1.2 Outline

9.1 Opening Remarks . . . 173
    9.1.1 Launch . . . 173
          Video . . . 173
    9.1.2 Outline . . . 174
    9.1.3 What you will learn . . . 175
9.2 Notation . . . 176
9.3 The Prototypical Example: Solving a Linear System . . . 176
9.4 Condition Number of a Rectangular Matrix . . . 180
9.5 Why Using the Method of Normal Equations Could be Bad . . . 181
9.6 Why Multiplication with Unitary Matrices is a Good Thing . . . 182
9.7 Balancing a Matrix . . . 182
9.8 Wrapup . . . 184
    9.8.1 Additional exercises . . . 184
9.9 Wrapup . . . 185
    9.9.1 Additional exercises . . . 185
    9.9.2 Summary . . . 185

9.1.3 What you will learn

9.2 Notation

Throughout this note, we will talk about small changes (perturbations) to scalars, vectors, and matrices. To denote these, we attach a "delta" to the symbol for a scalar, vector, or matrix:

• A small change to scalar $\chi \in \mathbb{C}$ will be denoted by $\delta\chi \in \mathbb{C}$;
• A small change to vector $x \in \mathbb{C}^n$ will be denoted by $\delta x \in \mathbb{C}^n$; and
• A small change to matrix $A \in \mathbb{C}^{m \times n}$ will be denoted by $\Delta A \in \mathbb{C}^{m \times n}$.

Notice that the delta touches the $\chi$, $x$, and $A$, so that, for example, $\delta x$ is not mistaken for $\delta \cdot x$.

9.3 The Prototypical Example: Solving a Linear System

Assume that $A \in \mathbb{R}^{n \times n}$ is nonsingular and $x, y \in \mathbb{R}^n$ with $Ax = y$. The problem here is the function that computes $x$ from $y$ and $A$. Let us assume that no error is introduced in the matrix $A$ when it is stored, but that in the process of storing $y$ a small error is introduced: $\delta y \in \mathbb{R}^n$, so that now $y + \delta y$ is stored. The question becomes by how much the solution $x$ changes as a function of $\delta y$. In particular, we would like to quantify how a relative change in the right-hand side $y$ ($\|\delta y\| / \|y\|$ in some norm) translates to a relative change in the solution $x$ ($\|\delta x\| / \|x\|$). It turns out that we will need to compute norms of matrices, using the norm induced by the vector norm that we use.

Since $Ax = y$, if we use a consistent (induced) matrix norm,
$$ \|y\| = \|Ax\| \leq \|A\| \|x\| \quad \text{or, equivalently,} \quad \frac{1}{\|x\|} \leq \|A\| \frac{1}{\|y\|} . \qquad (9.1) $$
Also, $A(x + \delta x) = y + \delta y$ together with $Ax = y$ implies that $A \, \delta x = \delta y$, so that $\delta x = A^{-1} \delta y$. Hence
$$ \|\delta x\| = \|A^{-1} \delta y\| \leq \|A^{-1}\| \|\delta y\| . \qquad (9.2) $$
Combining (9.1) and (9.2) we conclude that
$$ \frac{\|\delta x\|}{\|x\|} \leq \|A\| \|A^{-1}\| \frac{\|\delta y\|}{\|y\|} . $$

What does this mean? It means that the relative error in the solution is at worst the relative error in the right-hand side, amplified by $\|A\| \|A^{-1}\|$. So, if that quantity is small and the relative error in the right-hand side is small and exact arithmetic is used, then one is guaranteed a solution with a relatively small error.

The quantity $\kappa_{\|\cdot\|}(A) = \|A\| \|A^{-1}\|$ is called the condition number of nonsingular matrix $A$ (associated with norm $\|\cdot\|$).


Are we overestimating by how much the relative error can be amplified? The answer to this is no. For every nonsingular matrix $A$, there exist a right-hand side $y$ and perturbation $\delta y$ such that, if $A(x + \delta x) = y + \delta y$,
$$ \frac{\|\delta x\|}{\|x\|} = \|A\| \|A^{-1}\| \frac{\|\delta y\|}{\|y\|} . $$
In order for this equality to hold, we need to find $y$ and $\delta y$ such that
$$ \|y\| = \|Ax\| = \|A\| \|x\| \quad \text{or, equivalently,} \quad \|A\| = \frac{\|Ax\|}{\|x\|} $$
and
$$ \|\delta x\| = \|A^{-1} \delta y\| = \|A^{-1}\| \|\delta y\| \quad \text{or, equivalently,} \quad \|A^{-1}\| = \frac{\|A^{-1} \delta y\|}{\|\delta y\|} . $$
In other words, $x$ can be chosen as a vector that maximizes $\|Ax\| / \|x\|$ and $\delta y$ should maximize $\|A^{-1} \delta y\| / \|\delta y\|$. The vector $y$ is then chosen as $y = Ax$.
What if we use the 2-norm? For this norm, $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \sigma_0 / \sigma_{n-1}$. So, the ratio between the largest and smallest singular values determines whether a matrix is well-conditioned or ill-conditioned. To show for what vectors the maximal magnification is attained, consider the SVD
$$
A = U \Sigma V^H = \begin{pmatrix} u_0 & u_1 & \cdots & u_{n-1} \end{pmatrix} \begin{pmatrix} \sigma_0 & & \\ & \ddots & \\ & & \sigma_{n-1} \end{pmatrix} \begin{pmatrix} v_0 & v_1 & \cdots & v_{n-1} \end{pmatrix}^H .
$$
Recall that

• $\|A\|_2 = \sigma_0$, $v_0$ is the vector that maximizes $\max_{\|z\|_2 = 1} \|Az\|_2$, and $A v_0 = \sigma_0 u_0$;
• $\|A^{-1}\|_2 = 1/\sigma_{n-1}$, $u_{n-1}$ is the vector that maximizes $\max_{\|z\|_2 = 1} \|A^{-1} z\|_2$, and $A v_{n-1} = \sigma_{n-1} u_{n-1}$.

Now, take $y = \sigma_0 u_0$. Then $Ax = y$ is solved by $x = v_0$. Take $\delta y = \beta \sigma_{n-1} u_{n-1}$. Then $A \, \delta x = \delta y$ is solved by $\delta x = \beta v_{n-1}$. Now,
$$ \frac{\|\delta y\|_2}{\|y\|_2} = \frac{|\beta| \sigma_{n-1}}{\sigma_0} \quad \text{and} \quad \frac{\|\delta x\|_2}{\|x\|_2} = |\beta| . $$
Hence
$$ \frac{\|\delta x\|_2}{\|x\|_2} = \frac{\sigma_0}{\sigma_{n-1}} \frac{\|\delta y\|_2}{\|y\|_2} . $$
This is depicted in Figure 9.1 for $n = 2$.

The SVD can be used to show that $A$ maps the unit ball to an ellipsoid. The singular values are the lengths of the various axes of the ellipsoid. The condition number thus captures the eccentricity of the ellipsoid: the ratio between the lengths of the largest and smallest axes. This is also illustrated in Figure 9.1.

[Figure: left, $\mathbb{R}^2$, the domain of $A$, showing $x$ and $\delta x$ on the unit circle; right, $\mathbb{R}^2$, the codomain of $A$, showing the image ellipse with $y$ and $\delta y$.]

Figure 9.1: Illustration for choices of vectors $y$ and $\delta y$ that result in $\frac{\|\delta x\|_2}{\|x\|_2} = \frac{\sigma_0}{\sigma_{n-1}} \frac{\|\delta y\|_2}{\|y\|_2}$. Because of the eccentricity of the ellipse, the relatively small change $\delta y$ relative to $y$ is amplified into a relatively large change $\delta x$ relative to $x$. On the right, we see that $\|\delta y\|_2 / \|y\|_2 = |\beta| \sigma_1 / \sigma_0$ (since $\|u_0\|_2 = \|u_1\|_2 = 1$). On the left, we see that $\|\delta x\|_2 / \|x\|_2 = |\beta|$ (since $\|v_0\|_2 = \|v_1\|_2 = 1$).
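The worst case can be reproduced numerically. The following Python/NumPy sketch (an added illustration, not part of the notes) builds $A$ with prescribed singular values, takes $y$ along $u_0$ and $\delta y$ along $u_{n-1}$, and observes amplification by exactly $\kappa_2(A)$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.array([100.0, 0.1])                 # kappa_2(A) = 1000
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = U @ np.diag(sigma) @ V.T                   # A = U Sigma V^T

beta = 1e-6
y = sigma[0] * U[:, 0]                         # y along u_0, so x = v_0
dy = beta * sigma[1] * U[:, 1]                 # perturbation along u_1
x = np.linalg.solve(A, y)
dx = np.linalg.solve(A, dy)

amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(dy) / np.linalg.norm(y))
print(np.isclose(amplification, np.linalg.cond(A)))   # amplified by exactly kappa_2(A)
```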

Number of accurate digits. Notice that for scalars $\alpha$ and $\beta$, $\log_{10}(\alpha / \beta) = \log_{10}(\alpha) - \log_{10}(\beta)$ roughly equals the number of leading decimal digits of $\alpha + \beta$ that are accurate (that is, that agree with those of $\alpha$). For example, if $\alpha = 32.512$ and $\beta = 0.02$, then $\alpha + \beta = 32.532$, which has three accurate digits (the leading digits 3, 2, and 5). Now, $\log_{10}(32.512) - \log_{10}(0.02) \approx 1.5 - (-1.7) = 3.2$.

Now, if
$$ \frac{\|\delta x\|}{\|x\|} = \kappa(A) \frac{\|\delta y\|}{\|y\|} , $$
then
$$ \log_{10}(\|\delta x\|) - \log_{10}(\|x\|) = \log_{10}(\kappa(A)) + \log_{10}(\|\delta y\|) - \log_{10}(\|y\|) $$
so that
$$ \log_{10}(\|x\|) - \log_{10}(\|\delta x\|) = \left[ \log_{10}(\|y\|) - \log_{10}(\|\delta y\|) \right] - \log_{10}(\kappa(A)) . $$
In other words, if there were $k$ digits of accuracy in the right-hand side, then it is possible that (due only to the condition number of $A$) there are only $k - \log_{10}(\kappa(A))$ digits of accuracy in the solution. If we start with only 8 digits of accuracy and $\kappa(A) = 10^5$, we may only get 3 digits of accuracy. If $\kappa(A) \geq 10^8$, we may not get any digits of accuracy...
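This loss of digits is easy to observe (Python/NumPy illustration, not part of the notes; the Hilbert matrix is a standard ill-conditioned example):

```python
import numpy as np

n = 10
# Hilbert matrix: entries 1/(i+j+1), notoriously ill-conditioned.
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
y = H @ x_true

x = np.linalg.solve(H, y)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

digits_lost = np.log10(np.linalg.cond(H))   # roughly 13 of the ~16 available digits
print(digits_lost > 12)
print(rel_err)
```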
Homework 9.1 Show that, if $A$ is a nonsingular matrix, then for a consistent matrix norm, $\kappa(A) \geq 1$.
* SEE ANSWER
We conclude from this that we can generally only expect as much relative accuracy in the solution as
we had in the right-hand side.


Alternative exposition. Note: the below links conditioning of matrices to the relative condition number of a more general function. For a more thorough treatment, you may want to read Lecture 12 of Trefethen and Bau. That book discusses the subject in much more generality than is needed for our discussion of linear algebra. Thus, if this alternative exposition baffles you, just skip it!

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous function such that $f(y) = x$. Let $\|\cdot\|$ be a vector norm. Consider for $y \neq 0$
$$
\kappa_f(y) = \lim_{\delta \to 0} \sup_{\|\delta y\| \leq \delta} \left( \frac{\|f(y + \delta y) - f(y)\|}{\|f(y)\|} \right) \Big/ \left( \frac{\|\delta y\|}{\|y\|} \right) .
$$
Letting $f(y + \delta y) = x + \delta x$, we find that
$$
\kappa_f(y) = \lim_{\delta \to 0} \sup_{\|\delta y\| \leq \delta} \left( \frac{\|\delta x\|}{\|x\|} \right) \Big/ \left( \frac{\|\delta y\|}{\|y\|} \right) .
$$
(Obviously, if $y = 0$ or $\delta y = 0$ or $f(y) = 0$, things get a bit hairy, so let's not allow that.)

Roughly speaking, $\kappa_f(y)$ equals the maximum that a(n infinitesimally) small relative error in $y$ is magnified into a relative error in $f(y)$. This can be considered the relative condition number of function $f$. A large relative condition number means a small relative error in the input ($y$) can be magnified into a large relative error in the output ($x = f(y)$). This is bad, since small errors will invariably occur.
Now, if $f(y) = x$ is the function that returns $x$ where $Ax = y$ for a nonsingular matrix $A \in \mathbb{C}^{n \times n}$, then via an argument similar to what we did earlier in this section we find that $\kappa_f(y) \leq \kappa(A) = \|A\| \|A^{-1}\|$, the condition number of matrix $A$:
\begin{align*}
& \lim_{\delta \to 0} \sup_{\|\delta y\| \leq \delta} \left( \frac{\|f(y + \delta y) - f(y)\|}{\|f(y)\|} \right) \Big/ \left( \frac{\|\delta y\|}{\|y\|} \right) \\
&\quad = \lim_{\delta \to 0} \sup_{\|\delta y\| \leq \delta} \left( \frac{\|A^{-1}(y + \delta y) - A^{-1} y\|}{\|A^{-1} y\|} \right) \Big/ \left( \frac{\|\delta y\|}{\|y\|} \right) \\
&\quad = \lim_{\delta \to 0} \max_{\substack{\|z\| = 1 \\ \delta y = \delta z}} \left( \frac{\|A^{-1} \delta y\|}{\|A^{-1} y\|} \right) \Big/ \left( \frac{\|\delta y\|}{\|y\|} \right) \\
&\quad = \lim_{\delta \to 0} \max_{\substack{\|z\| = 1 \\ \delta y = \delta z}} \frac{\|A^{-1} \delta y\|}{\|\delta y\|} \cdot \frac{\|y\|}{\|A^{-1} y\|} \\
&\quad = \lim_{\delta \to 0} \max_{\|z\| = 1} \frac{\|A^{-1} (\delta z)\|}{\|\delta z\|} \cdot \frac{\|y\|}{\|A^{-1} y\|} \\
&\quad = \left( \max_{\|z\| = 1} \|A^{-1} z\| \right) \frac{\|y\|}{\|A^{-1} y\|} \\
&\quad = \left( \max_{\|z\| = 1} \|A^{-1} z\| \right) \frac{\|Ax\|}{\|x\|} \\
&\quad \leq \left( \max_{\|z\| = 1} \|A^{-1} z\| \right) \left( \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} \right) \\
&\quad = \|A\| \|A^{-1}\| ,
\end{align*}
where $\|\cdot\|$ is the matrix norm induced by vector norm $\|\cdot\|$.

9.4 Condition Number of a Rectangular Matrix

Given $A \in \mathbb{C}^{m \times n}$ with linearly independent columns and $y \in \mathbb{C}^m$, consider the linear least-squares (LLS) problem
$$ \|Ax - y\|_2 = \min_w \|Aw - y\|_2 \qquad (9.3) $$
and the perturbed problem
$$ \|A(x + \delta x) - (y + \delta y)\|_2 = \min_{w + \delta w} \|A(w + \delta w) - (y + \delta y)\|_2 . \qquad (9.4) $$
We will again bound by how much the relative error in $y$ is amplified.

Notice that the solutions to (9.3) and (9.4) respectively satisfy
\begin{align*}
A^H A x &= A^H y , \\
A^H A (x + \delta x) &= A^H (y + \delta y) ,
\end{align*}
so that $A^H A \, \delta x = A^H \delta y$ (subtracting the first equation from the second) and hence
$$ \|\delta x\|_2 = \|(A^H A)^{-1} A^H \delta y\|_2 \leq \|(A^H A)^{-1} A^H\|_2 \|\delta y\|_2 . $$
Now, let $z = A (A^H A)^{-1} A^H y$ be the projection of $y$ onto $\mathcal{C}(A)$ and let $\theta$ be the angle between $z$ and $y$. Let us assume that $y$ is not orthogonal to $\mathcal{C}(A)$, so that $z \neq 0$. Then $\cos \theta = \|z\|_2 / \|y\|_2$, so that
$$ \cos \theta \, \|y\|_2 = \|z\|_2 = \|Ax\|_2 \leq \|A\|_2 \|x\|_2 $$
and hence
$$ \frac{1}{\|x\|_2} \leq \frac{\|A\|_2}{\cos \theta \, \|y\|_2} . $$
We conclude that
$$ \frac{\|\delta x\|_2}{\|x\|_2} \leq \frac{\|A\|_2 \|(A^H A)^{-1} A^H\|_2}{\cos \theta} \frac{\|\delta y\|_2}{\|y\|_2} = \frac{1}{\cos \theta} \frac{\sigma_0}{\sigma_{n-1}} \frac{\|\delta y\|_2}{\|y\|_2} , $$
where $\sigma_0$ and $\sigma_{n-1}$ are (respectively) the largest and smallest singular values of $A$, because of the following result:

Homework 9.2 If $A$ has linearly independent columns, show that $\|(A^H A)^{-1} A^H\|_2 = 1/\sigma_{n-1}$, where $\sigma_{n-1}$ equals the smallest singular value of $A$. Hint: use the SVD of $A$.
* SEE ANSWER

[Figure 9.2: Linear least-squares problem $\|Ax - y\|_2 = \min_v \|Av - y\|_2$. Vector $z$ is the projection of $y$ onto $\mathcal{C}(A)$.]

The condition number of $A \in \mathbb{C}^{m \times n}$ with linearly independent columns is $\kappa_2(A) = \sigma_0 / \sigma_{n-1}$.

Notice the effect of the $\cos \theta$. When $y$ is almost perpendicular to $\mathcal{C}(A)$, then its projection $z$ is small and $\cos \theta$ is small. Hence a small relative change in $y$ can be greatly amplified. This makes sense: if $y$ is almost perpendicular to $\mathcal{C}(A)$, then $x \approx 0$, and any small $\delta y \in \mathcal{C}(A)$ can yield a relatively large change $\delta x$.
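The claim in Homework 9.2 can also be checked numerically (Python/NumPy illustration, not part of the notes; for full column rank, $(A^H A)^{-1} A^H$ is the pseudoinverse of $A$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 7, 4
A = rng.standard_normal((m, n))        # linearly independent columns with probability 1

sigma = np.linalg.svd(A, compute_uv=False)
P = np.linalg.inv(A.conj().T @ A) @ A.conj().T   # (A^H A)^{-1} A^H

print(np.isclose(np.linalg.norm(P, 2), 1.0 / sigma[-1]))   # equals 1/sigma_{n-1}
```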

9.5 Why Using the Method of Normal Equations Could be Bad

Homework 9.3 Let $A$ have linearly independent columns. Show that $\kappa_2(A^H A) = \kappa_2(A)^2$.
* SEE ANSWER

Homework 9.4 Let $A \in \mathbb{C}^{n \times n}$ have linearly independent columns.
• Show that $Ax = y$ if and only if $A^H A x = A^H y$.
• Reason that using the method of normal equations to solve $Ax = y$ has a condition number of $\kappa_2(A)^2$.
* SEE ANSWER

Let $A \in \mathbb{C}^{m \times n}$ have linearly independent columns. If one uses the Method of Normal Equations to solve the linear least-squares problem $\min_x \|Ax - y\|_2$, one ends up solving the square linear system $A^H A x = A^H y$. Now, $\kappa_2(A^H A) = \kappa_2(A)^2$. Hence, using this method squares the condition number of the matrix being used.
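Numerically (Python/NumPy illustration, not part of the notes; a column is made nearly dependent on another to create moderate ill-conditioning):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 10))
A[:, 0] = A[:, 1] + 1e-4 * rng.standard_normal(50)   # nearly dependent columns

kappa = np.linalg.cond(A)                 # 2-norm condition number
kappa_normal = np.linalg.cond(A.conj().T @ A)

print(np.isclose(kappa_normal, kappa**2, rtol=1e-3))   # condition number is squared
```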

9.6 Why Multiplication with Unitary Matrices is a Good Thing

Next, consider the computation $C = AB$, where $A \in \mathbb{C}^{m \times m}$ is nonsingular and $B, \Delta B, C, \Delta C \in \mathbb{C}^{m \times n}$. Then
\begin{align*}
C + \Delta C &= A (B + \Delta B) \\
C &= A B
\end{align*}
so that $\Delta C = A \, \Delta B$. Thus,
$$ \|\Delta C\|_2 = \|A \, \Delta B\|_2 \leq \|A\|_2 \|\Delta B\|_2 . $$
Also, $B = A^{-1} C$, so that
$$ \|B\|_2 = \|A^{-1} C\|_2 \leq \|A^{-1}\|_2 \|C\|_2 $$
and hence
$$ \frac{1}{\|C\|_2} \leq \|A^{-1}\|_2 \frac{1}{\|B\|_2} . $$
Thus,
$$ \frac{\|\Delta C\|_2}{\|C\|_2} \leq \|A\|_2 \|A^{-1}\|_2 \frac{\|\Delta B\|_2}{\|B\|_2} = \kappa_2(A) \frac{\|\Delta B\|_2}{\|B\|_2} . $$
This means that the relative error in matrix $C = AB$ is at most $\kappa_2(A)$ greater than the relative error in $B$.

The following exercise gives us a hint as to why algorithms that cast computation in terms of multiplication by unitary matrices avoid the buildup of error:

Homework 9.5 Let $U \in \mathbb{C}^{n \times n}$ be unitary. Show that $\kappa_2(U) = 1$.
* SEE ANSWER

This means that the relative error in matrix $C = UB$ is no greater than the relative error in $B$ when $U$ is unitary.

Homework 9.6 Characterize the set of all square matrices $A$ with $\kappa_2(A) = 1$.
* SEE ANSWER
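A one-line check of Homework 9.5 (Python/NumPy; an orthogonal Q from a QR factorization stands in for a unitary matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrix

print(np.isclose(np.linalg.cond(Q), 1.0))          # kappa_2(Q) = 1
```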

9.7 Balancing a Matrix

Consider the following problem: You buy two items, apples and oranges, at prices of $\chi_0$ and $\chi_1$ dollars per kg., respectively, but you forgot how much each was. What you do remember is that the first time you bought
\begin{align*}
2 \text{ kg. of apples} \times \chi_0 \tfrac{\text{dollars}}{\text{kg. of apples}} + 3 \text{ kg. of oranges} \times \chi_1 \tfrac{\text{dollars}}{\text{kg. of oranges}} &= 8 \text{ dollars} \\
3 \text{ kg. of apples} \times \chi_0 \tfrac{\text{dollars}}{\text{kg. of apples}} + 2 \text{ kg. of oranges} \times \chi_1 \tfrac{\text{dollars}}{\text{kg. of oranges}} &= 5 \text{ dollars}
\end{align*}
In matrix notation this becomes
$$ \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} = \begin{pmatrix} 8 \\ 5 \end{pmatrix} . $$
The condition number of the matrix $A = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$ is $\kappa_2(A) = 5$.

Now, let us change the problem to
\begin{align*}
2 \text{ kg. of apples} \times \chi_0 \tfrac{\text{dollars}}{\text{kg. of apples}} + 3000 \text{ g. of oranges} \times \psi_1 \tfrac{\text{dollars}}{\text{g. of oranges}} &= 8 \text{ dollars} \\
3 \text{ kg. of apples} \times \chi_0 \tfrac{\text{dollars}}{\text{kg. of apples}} + 2000 \text{ g. of oranges} \times \psi_1 \tfrac{\text{dollars}}{\text{g. of oranges}} &= 5 \text{ dollars}
\end{align*}
Clearly, this is an equivalent problem, except that $\psi_1$ is computed as the cost per gram of oranges. In matrix notation this becomes
$$ \begin{pmatrix} 2 & 3000 \\ 3 & 2000 \end{pmatrix} \begin{pmatrix} \chi_0 \\ \psi_1 \end{pmatrix} = \begin{pmatrix} 8 \\ 5 \end{pmatrix} . $$
The condition number of the matrix $B = \begin{pmatrix} 2 & 3000 \\ 3 & 2000 \end{pmatrix}$ is $\kappa_2(B) \approx 2600$.

So what is going on? One way to look at this is that matrix $B$ is close to singular: if only three significant digits are stored, then compared to 3000 and 2000 the values 2 and 3 are not that different from 0. So in that case
$$ \begin{pmatrix} 2 & 3000 \\ 3 & 2000 \end{pmatrix} \approx \begin{pmatrix} 0 & 3000 \\ 0 & 2000 \end{pmatrix} , $$
which is a singular matrix, with condition number equal to infinity.

Notice that the vectors $x = \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}$ and $z = \begin{pmatrix} \chi_0 \\ \psi_1 \end{pmatrix}$ are related by
$$ \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1000 \end{pmatrix}}_{D_R} \begin{pmatrix} \chi_0 \\ \psi_1 \end{pmatrix} . $$
Also, $Ax = b$ can be changed to
$$ \underbrace{A D_R}_{B} \, \underbrace{D_R^{-1} x}_{z} = b . $$

What this in turn shows is that sometimes a poorly conditioned matrix can be transformed into a well-conditioned matrix (or, at least, a better conditioned matrix). When presented with $Ax = b$, a procedure for solving for $x$ may want to start by balancing the norms of rows and columns of $A$:
$$ \underbrace{D_L A D_R^{-1}}_{B} \, \underbrace{D_R x}_{z} = \underbrace{D_L b}_{\hat b} , $$
where $D_L$ and $D_R$ are diagonal matrices chosen so that $\kappa(B)$ is smaller than $\kappa(A)$. This is known as balancing the matrix. The example also shows how this can be thought of as carefully picking the units in terms of which the entries of $x$ and $b$ are expressed, so that large values are not artificially introduced into the matrix.
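The apples-and-oranges example, in Python/NumPy (an added illustration; `numpy.linalg.cond` returns the 2-norm condition number):

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])
DR = np.diag([1.0, 1000.0])      # change of units: dollars/kg -> dollars/g for oranges
B = A @ DR                       # the badly scaled matrix [[2, 3000], [3, 2000]]

print(round(np.linalg.cond(A)))  # 5
print(round(np.linalg.cond(B)))  # about 2600

# Balancing: undoing the column scaling recovers the well-conditioned matrix.
print(round(np.linalg.cond(B @ np.linalg.inv(DR))))   # 5
```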

9.8 Wrapup

9.8.1 Additional exercises

In our discussions of the conditioning of $Ax = y$, where $A$ is nonsingular, we assumed that error only occurred in $y$: $A(x + \delta x) = y + \delta y$. Now, $A$ is also input to the problem and hence can also have error in it. The following questions relate to this.

Homework 9.7 Let $\|\cdot\|$ be a vector norm with corresponding induced matrix norm. If $A \in \mathbb{C}^{n \times n}$ is nonsingular and
$$ \frac{\|\Delta A\|}{\|A\|} < \frac{1}{\kappa(A)} , \qquad (9.5) $$
then $A + \Delta A$ is nonsingular.
* SEE ANSWER

Homework 9.8 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $Ax = y$, and $(A + \Delta A)(x + \delta x) = y$. Show that (for induced norm)
$$ \frac{\|\delta x\|}{\|x + \delta x\|} \leq \kappa(A) \frac{\|\Delta A\|}{\|A\|} . $$
* SEE ANSWER

Homework 9.9 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\|\Delta A\| / \|A\| < 1/\kappa(A)$, $y \neq 0$, $Ax = y$, and $(A + \Delta A)(x + \delta x) = y$. Show that (for induced norm)
$$ \frac{\|\delta x\|}{\|x\|} \leq \frac{\kappa(A) \frac{\|\Delta A\|}{\|A\|}}{1 - \kappa(A) \frac{\|\Delta A\|}{\|A\|}} . $$
* SEE ANSWER

Homework 9.10 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\|\Delta A\| / \|A\| < 1/\kappa(A)$, $y \neq 0$, $Ax = y$, and $(A + \Delta A)(x + \delta x) = y + \delta y$. Show that (for induced norm)
$$ \frac{\|\delta x\|}{\|x\|} \leq \frac{\kappa(A) \left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\delta y\|}{\|y\|} \right)}{1 - \kappa(A) \frac{\|\Delta A\|}{\|A\|}} . $$
* SEE ANSWER
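A numerical spot-check of the bound in Homework 9.9 (Python/NumPy illustration, not part of the notes; $A$ is built with known singular values so that $\kappa_2(A) = 4$ and the hypothesis $\|\Delta A\| / \|A\| < 1/\kappa(A)$ holds):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag([1.0, 2.0, 3.0, 4.0]) @ V.T     # kappa_2(A) = 4
dA = 1e-6 * rng.standard_normal((n, n))
y = rng.standard_normal(n)

x = np.linalg.solve(A, y)
dx = np.linalg.solve(A + dA, y) - x

kappa = np.linalg.cond(A)
r = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
bound = kappa * r / (1 - kappa * r)

print(np.linalg.norm(dx) / np.linalg.norm(x) <= bound)   # the bound holds
```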

9.9 Wrapup

9.9.1 Additional exercises

9.9.2 Summary

Chapter 10

Notes on the Stability of an Algorithm

Based on "Goal-Oriented and Modular Stability Analysis" [7, 8] by Paolo Bientinesi and Robert van de Geijn.

Video
Read disclaimer regarding the videos in the preface!
* Lecture on the Stability of an Algorithm
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

10.0.1 Launch

10.0.2 Outline

10.0.1 Launch . . . 187
10.0.2 Outline . . . 188
10.0.3 What you will learn . . . 189
10.1 Motivation . . . 190
10.2 Floating Point Numbers . . . 191
10.3 Notation . . . 193
10.4 Floating Point Computation . . . 193
    10.4.1 Model of floating point computation . . . 193
    10.4.2 Stability of a numerical algorithm . . . 194
    10.4.3 Absolute value of vectors and matrices . . . 194
10.5 Stability of the Dot Product Operation . . . 195
    10.5.1 An algorithm for computing DOT . . . 195
    10.5.2 A simple start . . . 195
    10.5.3 Preparation . . . 197
    10.5.4 Target result . . . 199
    10.5.5 A proof in traditional format . . . 200
    10.5.6 A weapon of math induction for the war on (backward) error (optional) . . . 200
    10.5.7 Results . . . 203
10.6 Stability of a Matrix-Vector Multiplication Algorithm . . . 203
    10.6.1 An algorithm for computing GEMV . . . 203
    10.6.2 Analysis . . . 203
10.7 Stability of a Matrix-Matrix Multiplication Algorithm . . . 205
    10.7.1 An algorithm for computing GEMM . . . 205
    10.7.2 Analysis . . . 205
    10.7.3 An application . . . 206
10.8 Wrapup . . . 207
    10.8.1 Additional exercises . . . 207
    10.8.2 Summary . . . 207

10.0.3 What you will learn


Figure 10.1: In this illustration, $f : D \to R$ is a function to be evaluated. The function $\check f$ represents the implementation of the function that uses floating point arithmetic, thus incurring errors. The fact that for a nearby value $\check x$ the computed value equals the exact function applied to the slightly perturbed input, $\check f(x) = f(\check x)$, means that the error in the computation can be attributed to a small change in the input. If this is true, then $\check f$ is said to be a (numerically) stable implementation of $f$ for input $x$.

10.1 Motivation

Correctness in the presence of error (e.g., when floating point computations are performed) takes on a different meaning. For many problems for which computers are used, there is one correct answer and we expect that answer to be computed by our program. The problem is that most real numbers cannot be stored exactly in a computer memory. They are stored as approximations, floating point numbers, instead. Hence storing them and/or computing with them inherently incurs error. The question thus becomes "When is a program correct in the presence of such errors?"

Let us assume that we wish to evaluate the mapping $f : D \to R$, where $D \subset \mathbb{R}^n$ is the domain and $R \subset \mathbb{R}^m$ is the range (codomain). Now, we will let $\check f : D \to R$ denote a computer implementation of this function. Generally, for $x \in D$ it is the case that $f(x) \neq \check f(x)$. Thus, the computed value is not "correct". From the Notes on Conditioning, we know that it may not even be the case that $\check f(x)$ is "close to" $f(x)$. After all, even if $\check f$ is an exact implementation of $f$, the mere act of storing $x$ may introduce a small error $\delta x$, and $f(x + \delta x)$ may be far from $f(x)$ if $f$ is ill-conditioned.

The following defines a property that captures correctness in the presence of the kinds of errors that are introduced by computer arithmetic:

Definition 10.1 Given the mapping $f : D \to R$, where $D \subset \mathbb{R}^n$ is the domain and $R \subset \mathbb{R}^m$ is the range (codomain), let $\check f : D \to R$ be a computer implementation of this function. We will call $\check f$ a (numerically) stable implementation of $f$ on domain $D$ if for all $x \in D$ there exists a $\check x$ "close" to $x$ such that $\check f(x) = f(\check x)$.

In other words, $\check f$ is a stable implementation if the error that is introduced is similar to that introduced when $f$ is evaluated with a slightly changed input. This is illustrated in Figure 10.1 for a specific input $x$. If an implementation is not stable, it is numerically unstable.

10.2 Floating Point Numbers

Only a finite number of (binary) digits can be used to store a real number. For so-called single-precision and double-precision floating point numbers, 32 bits and 64 bits are typically employed, respectively. Let us focus on double precision numbers.

Recall that any real number can be written as $\mu \times \beta^e$, where $\beta$ is the base (an integer greater than one), $\mu \in (-1, 1)$ is the mantissa, and $e$ is the exponent (an integer). For our discussion, we will define $F$ as the set of all numbers $\chi = \mu \times \beta^e$ such that $\beta = 2$, $\mu = \pm . \delta_0 \delta_1 \cdots \delta_{t-1}$ has only $t$ (binary) digits ($\delta_j \in \{0, 1\}$), $\delta_0 = 0$ iff $\mu = 0$ (the mantissa is normalized), and $-L \leq e \leq U$. Elements in $F$ can be stored with a finite number of (binary) digits.

• There is a largest number (in absolute value) that can be stored. Any number that is larger "overflows". Typically, this causes a special value denoting an infinity or NaN (Not-a-Number) to be stored.
• There is a smallest number (in absolute value) that can be stored. Any number that is smaller "underflows". Typically, this causes a zero to be stored.
Example 10.2 For $x \in \mathbb{R}^n$, consider computing
$$ \|x\|_2 = \sqrt{ \sum_{i=0}^{n-1} \chi_i^2 } . \qquad (10.1) $$
Notice that
$$ \|x\|_2 \leq \sqrt{n} \max_{i=0}^{n-1} |\chi_i| $$
and hence, unless some $\chi_i$ is close to overflowing, the result will not overflow. The problem is that if some element $\chi_i$ has the property that $\chi_i^2$ overflows, intermediate results in the computation in (10.1) will overflow. The solution is to determine $k$ such that
$$ |\chi_k| = \max_{i=0}^{n-1} |\chi_i| $$
and to then instead compute
$$ \|x\|_2 = |\chi_k| \sqrt{ \sum_{i=0}^{n-1} \left( \frac{\chi_i}{\chi_k} \right)^2 } . $$
It can be argued that the same approach also avoids underflow, if underflow can be avoided.
In our further discussions, we will ignore overflow and underflow issues.
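The scaling trick of Example 10.2 can be sketched as follows; `norm2` is a hypothetical helper name, and the sketch ignores the (harmless) extra rounding the divisions introduce:

```python
import math

# A sketch of the scaled 2-norm from Example 10.2: dividing by the entry of
# largest magnitude ensures no intermediate square can overflow.
def norm2(x):
    amax = max(abs(chi) for chi in x)
    if amax == 0.0:
        return 0.0
    return amax * math.sqrt(sum((chi / amax) ** 2 for chi in x))

big = 1.0e200                       # big * big overflows to inf in double precision
print(norm2([3.0, 4.0]))            # 5.0
print(norm2([big, big]) == big * math.sqrt(2.0))  # naive sum of squares would overflow
```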
What is important is that any time a real number is stored in our computer, it is stored as the nearest
floating point number (element in F). We first assume that it is truncated (which makes our explanation
slightly simpler).

Chapter 10. Notes on the Stability of an Algorithm


Let positive $\chi$ be represented by
$$ \chi = .\delta_0\delta_1\cdots \times 2^e, $$
where $\delta_0 = 1$ (the mantissa is normalized). If $t$ digits are stored by our floating point system, then $\check{\chi} = .\delta_0\delta_1\cdots\delta_{t-1} \times 2^e$ is stored. Let $\delta\chi = \chi - \check{\chi}$. Then
$$ \delta\chi = \underbrace{.\delta_0\delta_1\cdots\delta_{t-1}\delta_t\cdots}_{\chi}\times 2^e - \underbrace{.\delta_0\delta_1\cdots\delta_{t-1}}_{\check{\chi}}\times 2^e = .\underbrace{0\cdots 0}_{t}\delta_t\cdots\times 2^e < .\underbrace{0\cdots 0}_{t-1}1\times 2^e = 2^{e-t}. $$
Also, since $\chi$ is positive,
$$ \chi = .\delta_0\delta_1\cdots\times 2^e \ge .1 \times 2^e = 2^{e-1}. $$
Thus,
$$ \frac{\delta\chi}{\chi} \le \frac{2^{e-t}}{2^{e-1}} = 2^{1-t}, $$
which can also be written as
$$ \delta\chi \le 2^{1-t}\chi. $$
A careful analysis of what happens when $\chi$ might equal zero or be negative yields
$$ |\delta\chi| \le 2^{1-t}|\chi|. $$
Now, in practice any base $\beta$ can be used and floating point computation uses rounding rather than truncating. A similar analysis can be used to show that then
$$ |\delta\chi| \le u|\chi|, $$
where $u = \frac{1}{2}\beta^{1-t}$ is known as the machine epsilon or unit roundoff. When using single-precision or double-precision real arithmetic, $u \approx 10^{-8}$ or $10^{-16}$, respectively. The quantity $u$ is machine dependent; it is a function of the parameters characterizing the machine arithmetic.
The unit roundoff is often alternatively defined as the largest positive floating point number which can be added to the number stored as 1 without changing the number stored as 1. In the notation introduced below, fl(1 + u) = 1.
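This alternative characterization is easy to check empirically. A minimal sketch, assuming IEEE double precision ($\beta = 2$, $t = 53$) with round-to-nearest arithmetic:

```python
import sys

# Halve u until adding it to 1.0 no longer changes the stored result,
# i.e. until fl(1 + u) = 1: the final u is the unit roundoff.
u = 1.0
while 1.0 + u > 1.0:
    u /= 2.0

assert u == 2.0 ** -53                    # u = (1/2) * 2**(1-t) with t = 53
assert 1.0 + u == 1.0 and 1.0 + 2.0 * u > 1.0
assert sys.float_info.epsilon == 2.0 * u  # Python reports the gap 2u, not u
print(u)                                  # 1.1102230246251565e-16
```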

Homework 10.3 Assume a floating point number system with $\beta = 2$ and a mantissa with $t$ digits, so that a typical positive number is written as $.d_0d_1\ldots d_{t-1} \times 2^e$, with $d_i \in \{0,1\}$.
• Write the number 1 as a floating point number.
• What is the largest positive real number $u$ (represented as a binary fraction) such that the floating point representation of $1 + u$ equals the floating point representation of 1? (Assume rounded arithmetic.)
• Show that $u = \frac{1}{2} \cdot 2^{1-t}$.
* SEE ANSWER

10.3 Notation

When discussing error analyses, we will distinguish between exact and computed quantities. The function fl(expression) returns the result of the evaluation of expression, where every operation is executed in floating point arithmetic. For example, assuming that the expressions are evaluated from left to right, $\mathrm{fl}(\chi + \psi + \zeta/\omega)$ is equivalent to $\mathrm{fl}(\mathrm{fl}(\mathrm{fl}(\chi) + \mathrm{fl}(\psi)) + \mathrm{fl}(\mathrm{fl}(\zeta)/\mathrm{fl}(\omega)))$. Equality between the quantities lhs and rhs is denoted by lhs = rhs. Assignment of rhs to lhs is denoted by lhs := rhs (lhs becomes rhs). In the context of a program, the statements lhs := rhs and lhs := fl(rhs) are equivalent. Given an assignment $\kappa :=$ expression, we use the notation $\check{\kappa}$ (pronounced "check kappa") to denote the quantity resulting from fl(expression), which is actually stored in the variable $\kappa$.

10.4 Floating Point Computation

We introduce definitions and results regarding floating point arithmetic. In this note, we focus on real
valued arithmetic only. Extensions to complex arithmetic are straightforward.

10.4.1 Model of floating point computation

The Standard Computational Model (SCM) assumes that, for any two floating point numbers $\chi$ and $\psi$, the basic arithmetic operations satisfy the equality
$$ \mathrm{fl}(\chi \mathbin{\mathrm{op}} \psi) = (\chi \mathbin{\mathrm{op}} \psi)(1 + \epsilon), \quad |\epsilon| \le u, \quad \mathrm{op} \in \{+, -, *, /\}. $$
The quantity $\epsilon$ is a function of $\chi$, $\psi$ and op. Sometimes we add a subscript ($\epsilon_+$, $\epsilon_*$, ...) to indicate what operation generated the $(1+\epsilon)$ error factor. We always assume that all the input variables to an operation are floating point numbers. We can interpret the SCM as follows: these operations are performed exactly, and it is only in storing the result that a roundoff error occurs (equal to that introduced when a real number is stored as a floating point number).
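The SCM bound can be checked empirically for individual operations by recovering the exact relative error with rational arithmetic. A sketch (`fractions.Fraction` converts a double to an exact rational):

```python
from fractions import Fraction
import random

u = Fraction(1, 2**53)  # unit roundoff for IEEE double precision
random.seed(0)
for _ in range(1000):
    chi, psi = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
    exact = Fraction(chi) + Fraction(psi)
    if exact == 0:
        continue
    # fl(chi + psi) = (chi + psi)(1 + eps): recover eps exactly.
    eps = (Fraction(chi + psi) - exact) / exact
    assert abs(eps) <= u
print("SCM bound |eps| <= u verified for 1000 random additions")
```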
Remark 10.4 Given $\chi, \psi \in F$, performing any operation $\mathrm{op} \in \{+, -, *, /\}$ with $\chi$ and $\psi$ in floating point arithmetic, $[\chi \mathbin{\mathrm{op}} \psi]$, is a stable operation: Let $\zeta = \chi \mathbin{\mathrm{op}} \psi$ and $\check{\zeta} = \zeta + \delta\zeta = [\chi \mathbin{\mathrm{op}} \psi]$. Then $|\delta\zeta| \le u|\zeta|$ and hence $\check{\zeta}$ is close to $\zeta$ (it has roughly $t$ correct binary digits).
For certain problems it is convenient to use the Alternative Computational Model (ACM) [26], which also assumes for the basic arithmetic operations that
$$ \mathrm{fl}(\chi \mathbin{\mathrm{op}} \psi) = \frac{\chi \mathbin{\mathrm{op}} \psi}{1 + \epsilon}, \quad |\epsilon| \le u, \quad \mathrm{op} \in \{+, -, *, /\}. $$
As for the standard computational model, the quantity $\epsilon$ is a function of $\chi$, $\psi$ and op. Note that the $\epsilon$'s produced using the standard and alternative models are generally not equal.
Remark 10.5 The Taylor series expansion of $1/(1+\epsilon)$ is given by
$$ \frac{1}{1+\epsilon} = 1 + (-\epsilon) + O(\epsilon^2), $$
which explains how the SCM and ACM are related.
Remark 10.6 Sometimes it is more convenient to use the SCM and sometimes the ACM. Trial and error, and eventually experience, will determine which one to use.

10.4.2 Stability of a numerical algorithm

In the presence of round-off error, an algorithm involving numerical computations cannot be expected to yield the exact result. Thus, the conventional notion of correctness applies only to the execution of algorithms in exact arithmetic. Here we briefly introduce the notion of stability of algorithms.
Let $f : D \rightarrow R$ be a mapping from the domain $D$ to the range $R$ and let $\check{f} : D \rightarrow R$ represent the mapping that captures the execution in floating point arithmetic of a given algorithm which computes $f$.
The algorithm is said to be backward stable if for all $x \in D$ there exists a perturbed input $\check{x} \in D$, close to $x$, such that $\check{f}(x) = f(\check{x})$. In other words, the computed result equals the result obtained when the exact function is applied to slightly perturbed data. The difference between $x$ and $\check{x}$, $\delta x = \check{x} - x$, is the perturbation to the original input $x$.
The reasoning behind backward stability is as follows. The input to a function typically has some errors associated with it. Uncertainty may be due to measurement errors when obtaining the input and/or may be the result of converting real numbers to floating point numbers when storing the input on a computer. If it can be shown that an implementation is backward stable, then it has been proved that the result could have been obtained through exact computations performed on slightly corrupted input. Thus, one can think of the error introduced by the implementation as being comparable to the error introduced when obtaining the input data in the first place.
When discussing error analyses, $\delta x$, the difference between $x$ and $\check{x}$, is the backward error and the difference $\check{f}(x) - f(x)$ is the forward error. Throughout the remainder of this note we will be concerned with bounding the backward and/or forward errors introduced by the algorithms executed with floating point arithmetic.
The algorithm is said to be forward stable on domain $D$ if for all $x \in D$ it is the case that $\check{f}(x) \approx f(x)$. In other words, the computed result equals a slight perturbation of the exact result.

10.4.3 Absolute value of vectors and matrices

In the above discussion of error, the vague notions of "near" and "slightly perturbed" are used. Making these notions exact usually requires the introduction of measures of size for vectors and matrices, i.e., norms. Instead, for the operations analyzed in this note, all bounds are given in terms of the absolute values of the individual elements of the vectors and/or matrices. While it is easy to convert such bounds to bounds involving norms, the converse is not true.
Definition 10.7 Given $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$,
$$ |x| = \begin{pmatrix} |\chi_0| \\ |\chi_1| \\ \vdots \\ |\chi_{n-1}| \end{pmatrix} \quad\text{and}\quad |A| = \begin{pmatrix} |\alpha_{0,0}| & |\alpha_{0,1}| & \ldots & |\alpha_{0,n-1}| \\ |\alpha_{1,0}| & |\alpha_{1,1}| & \ldots & |\alpha_{1,n-1}| \\ \vdots & \vdots & \ddots & \vdots \\ |\alpha_{m-1,0}| & |\alpha_{m-1,1}| & \ldots & |\alpha_{m-1,n-1}| \end{pmatrix}. $$
Definition 10.8 Let $M \in \{<, \le, =, \ge, >\}$ and $x, y \in \mathbb{R}^n$. Then
$$ |x| \mathbin{M} |y| \iff |\chi_i| \mathbin{M} |\psi_i|, $$
with $i = 0, \ldots, n-1$. Similarly, given $A$ and $B \in \mathbb{R}^{m \times n}$,
$$ |A| \mathbin{M} |B| \iff |\alpha_{ij}| \mathbin{M} |\beta_{ij}|, $$
with $i = 0, \ldots, m-1$ and $j = 0, \ldots, n-1$.


The next lemma is exploited in later sections:
Lemma 10.9 Let $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$. Then $|AB| \le |A||B|$.
Homework 10.10 Prove Lemma 10.9.
* SEE ANSWER
The fact that the bounds that we establish can be easily converted into bounds involving norms is a consequence of the following theorem, where $\|\cdot\|_F$ indicates the Frobenius matrix norm.
Theorem 10.11 Let $A, B \in \mathbb{R}^{m \times n}$. If $|A| \le |B|$ then $\|A\|_1 \le \|B\|_1$, $\|A\|_\infty \le \|B\|_\infty$, and $\|A\|_F \le \|B\|_F$.
Homework 10.12 Prove Theorem 10.11.
* SEE ANSWER

10.5 Stability of the Dot Product Operation

The matrix-vector multiplication algorithm discussed in the next section requires the computation of the dot (inner) product (DOT) of vectors $x, y \in \mathbb{R}^n$: $\kappa := x^T y$. In this section, we give an algorithm for this operation and the related error results.

10.5.1 An algorithm for computing DOT

We will consider the algorithm given in Figure 10.2. It uses the FLAME notation [25, 4] to express the computation
$$ \kappa := \Big( \big( (\chi_0\psi_0 + \chi_1\psi_1) + \cdots \big) + \chi_{n-2}\psi_{n-2} \Big) + \chi_{n-1}\psi_{n-1} \qquad (10.2) $$
in the indicated order.

10.5.2 A simple start

Before giving a general result, let us focus on the case where $n = 2$:
$$ \kappa := \chi_0\psi_0 + \chi_1\psi_1. $$
Then, under the computational model given in Section 10.4, if $\kappa := \chi_0\psi_0 + \chi_1\psi_1$ is executed, the computed result, $\check{\kappa}$, satisfies
$$ \begin{aligned} \check{\kappa} &= \left[ \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}^T \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \right] = [\chi_0\psi_0 + \chi_1\psi_1] = [[\chi_0\psi_0] + [\chi_1\psi_1]] \\ &= [\chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})] \\ &= \left(\chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})\right)(1+\epsilon_+^{(1)}) \\ &= \chi_0\psi_0(1+\epsilon_*^{(0)})(1+\epsilon_+^{(1)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})(1+\epsilon_+^{(1)}) \end{aligned} $$


Algorithm: DOT: κ := x^T y

    Partition x → ( xT / xB ), y → ( yT / yB ),
      where xT and yT are empty
    while m(xT) < m(x) do
      Repartition ( xT / xB ) → ( x0 / χ1 / x2 ), ( yT / yB ) → ( y0 / ψ1 / y2 )
      κ := κ + χ1 ψ1
      Continue with ( xT / xB ) ← ( x0, χ1 / x2 ), ( yT / yB ) ← ( y0, ψ1 / y2 )
    endwhile

Figure 10.2: Algorithm for computing κ := x^T y.
so that
$$ \check{\kappa} = \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}^T \begin{pmatrix} (1+\epsilon_*^{(0)})(1+\epsilon_+^{(1)}) & 0 \\ 0 & (1+\epsilon_*^{(1)})(1+\epsilon_+^{(1)}) \end{pmatrix} \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}, $$
where $|\epsilon_*^{(0)}|, |\epsilon_*^{(1)}|, |\epsilon_+^{(1)}| \le u$.
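Stripped of the FLAME partitioning, the loop in Figure 10.2 is just a left-to-right accumulation. A sketch:

```python
# Compute kappa := x^T y in the order fixed by (10.2): left to right.
def dot(x, y):
    assert len(x) == len(y)
    kappa = 0.0
    for chi1, psi1 in zip(x, y):
        kappa = kappa + chi1 * psi1   # kappa := kappa + chi_1 * psi_1
    return kappa

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```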

Homework 10.13 Repeat the above steps for the computation
$$ \kappa := ((\chi_0\psi_0 + \chi_1\psi_1) + \chi_2\psi_2), $$
computing in the indicated order.
* SEE ANSWER

10.5.3 Preparation

Under the computational model given in Section 10.4, the computed result of (10.2), $\check{\kappa}$, satisfies
$$ \begin{aligned} \check{\kappa} &= \left( \cdots \left( \left( \chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)}) \right)(1+\epsilon_+^{(1)}) + \cdots \right)(1+\epsilon_+^{(n-2)}) + \chi_{n-1}\psi_{n-1}(1+\epsilon_*^{(n-1)}) \right)(1+\epsilon_+^{(n-1)}) \\ &= \sum_{i=0}^{n-1} \left( \chi_i\psi_i(1+\epsilon_*^{(i)}) \prod_{j=i}^{n-1}(1+\epsilon_+^{(j)}) \right), \end{aligned} \qquad (10.3) $$
where $\epsilon_+^{(0)} = 0$ and $|\epsilon_*^{(j)}|, |\epsilon_+^{(j)}| \le u$ for $j = 1, \ldots, n-1$.


Clearly, a notation to keep expressions from becoming unreadable is desirable. For this reason we introduce the symbol $\theta_j$:
Lemma 10.14 Let $\epsilon_i \in \mathbb{R}$, $0 \le i \le n-1$, $nu < 1$, and $|\epsilon_i| \le u$. Then $\exists\, \theta_n \in \mathbb{R}$ such that
$$ \prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1} = 1 + \theta_n, $$
with $|\theta_n| \le nu/(1-nu)$.


Proof: By Mathematical Induction.
Base case. $n = 1$. Trivial.
Inductive Step. The Inductive Hypothesis (I.H.) tells us that for all $\epsilon_i \in \mathbb{R}$, $0 \le i \le n-1$, $nu < 1$, and $|\epsilon_i| \le u$, there exists a $\theta_n \in \mathbb{R}$ such that
$$ \prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1} = 1 + \theta_n, \quad\text{with } |\theta_n| \le nu/(1-nu). $$
We will show that if $\epsilon_i \in \mathbb{R}$, $0 \le i \le n$, $(n+1)u < 1$, and $|\epsilon_i| \le u$, then there exists a $\theta_{n+1} \in \mathbb{R}$ such that
$$ \prod_{i=0}^{n}(1+\epsilon_i)^{\pm 1} = 1 + \theta_{n+1}, \quad\text{with } |\theta_{n+1}| \le (n+1)u/(1-(n+1)u). $$
Case 1: $\prod_{i=0}^{n}(1+\epsilon_i)^{\pm 1} = \left(\prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1}\right)(1+\epsilon_n)$. See Exercise 10.15.
Case 2: $\prod_{i=0}^{n}(1+\epsilon_i)^{\pm 1} = \left(\prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1}\right)/(1+\epsilon_n)$. By the I.H. there exists a $\theta_n$ such that $(1+\theta_n) = \prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1}$ and $|\theta_n| \le nu/(1-nu)$. Then
$$ \frac{\prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1}}{1+\epsilon_n} = \frac{1+\theta_n}{1+\epsilon_n} = 1 + \underbrace{\frac{\theta_n - \epsilon_n}{1+\epsilon_n}}_{\theta_{n+1}}, $$
which tells us how to pick $\theta_{n+1}$. Now
$$ |\theta_{n+1}| = \left|\frac{\theta_n - \epsilon_n}{1+\epsilon_n}\right| \le \frac{|\theta_n| + u}{1-u} \le \frac{\frac{nu}{1-nu} + u}{1-u} = \frac{nu + (1-nu)u}{(1-nu)(1-u)} = \frac{(n+1)u - nu^2}{1-(n+1)u+nu^2} \le \frac{(n+1)u}{1-(n+1)u}. $$
By the Principle of Mathematical Induction, the result holds.

Homework 10.15 Complete the proof of Lemma 10.14.
* SEE ANSWER

The quantity $\theta_n$ will be used throughout this note. It is not intended to be a specific number. Instead, it is an order of magnitude identified by the subscript $n$, which indicates the number of error factors of the form $(1+\epsilon_i)$ and/or $(1+\epsilon_i)^{-1}$ that are grouped together to form $(1+\theta_n)$.
Since the bound on $|\theta_n|$ occurs often, we assign it a symbol as follows:
Definition 10.16 For all $n \ge 1$ and $nu < 1$, define $\gamma_n := nu/(1-nu)$.
With this notation, (10.3) simplifies to
$$ \check{\kappa} = \chi_0\psi_0(1+\theta_n) + \chi_1\psi_1(1+\theta_n) + \cdots + \chi_{n-1}\psi_{n-1}(1+\theta_2) \qquad (10.4) $$
$$ = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \\ \vdots \\ \chi_{n-1} \end{pmatrix}^T \left( I + \begin{pmatrix} \theta_n & 0 & 0 & \cdots & 0 \\ 0 & \theta_n & 0 & \cdots & 0 \\ 0 & 0 & \theta_{n-1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \theta_2 \end{pmatrix} \right) \begin{pmatrix} \psi_0 \\ \psi_1 \\ \psi_2 \\ \vdots \\ \psi_{n-1} \end{pmatrix}, \qquad (10.5) $$
where $|\theta_j| \le \gamma_j$, $j = 2, \ldots, n$.
Two instances of the symbol $\theta_n$, appearing even in the same expression, typically do not represent the same number. For example, in (10.4) a $(1+\theta_n)$ multiplies each of the terms $\chi_0\psi_0$ and $\chi_1\psi_1$, but these two instances of $\theta_n$, as a rule, do not denote the same quantity. In particular, one should be careful when factoring out such quantities.
As part of the analyses, the following bounds will be useful to bound error that accumulates:
Lemma 10.17 If $n, b \ge 1$ then $\gamma_n \le \gamma_{n+b}$ and $\gamma_n + \gamma_b + \gamma_n\gamma_b \le \gamma_{n+b}$.
Homework 10.18 Prove Lemma 10.17.
* SEE ANSWER
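Lemma 10.17 can be spot-checked with exact rational arithmetic, which sidesteps rounding in the check itself (a sketch):

```python
from fractions import Fraction

u = Fraction(1, 2**53)              # unit roundoff, IEEE double precision

def gamma(n):
    assert n * u < 1
    return n * u / (1 - n * u)

for n in (1, 2, 10, 1000):
    for b in (1, 5, 100):
        assert gamma(n) <= gamma(n + b)
        # gamma_n + gamma_b + gamma_n*gamma_b = 1/((1-nu)(1-bu)) - 1 <= gamma_{n+b}
        assert gamma(n) + gamma(b) + gamma(n) * gamma(b) <= gamma(n + b)
print("Lemma 10.17 holds for all sampled (n, b)")
```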

10.5.4 Target result

It is of interest to accumulate the roundoff error encountered during computation as a perturbation of input and/or output parameters:
$$ \check{\kappa} = (x + \delta x)^T y; \qquad (\check{\kappa}\text{ is the exact output for a slightly perturbed } x) $$
$$ \check{\kappa} = x^T(y + \delta y); \qquad (\check{\kappa}\text{ is the exact output for a slightly perturbed } y) $$
$$ \check{\kappa} = x^T y + \delta\kappa. \qquad (\check{\kappa}\text{ equals the exact result plus an error}) $$
The first two are backward error results (error is accumulated onto input parameters, showing that the algorithm is numerically stable since it yields the exact output for a slightly perturbed input) while the last one is a forward error result (error is accumulated onto the answer). We will see that in different situations, a different error result may be needed by analyses of operations that require a dot product.
Let us focus on the second result. Ideally one would show that each of the entries of $y$ is slightly perturbed relative to that entry:
$$ \delta y = \begin{pmatrix} \sigma_0\psi_0 \\ \vdots \\ \sigma_{n-1}\psi_{n-1} \end{pmatrix} = \begin{pmatrix} \sigma_0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{n-1} \end{pmatrix} \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix} = \Sigma y, $$
where each $\sigma_i$ is "small" and $\Sigma = \mathrm{diag}(\sigma_0, \ldots, \sigma_{n-1})$. The following special structure of $\Sigma$, inspired by (10.5), will be used in the remainder of this note:
$$ \Sigma^{(n)} = \begin{cases} 0 \times 0 \text{ matrix} & \text{if } n = 0, \\ \theta_1 & \text{if } n = 1, \\ \mathrm{diag}(\theta_n, \theta_n, \theta_{n-1}, \ldots, \theta_2) & \text{otherwise}. \end{cases} \qquad (10.6) $$
Recall that $\theta_j$ is an order of magnitude variable with $|\theta_j| \le \gamma_j$.

Homework 10.19 Let $k \ge 0$ and assume that $|\epsilon_1|, |\epsilon_2| \le u$, with $\epsilon_1 = 0$ if $k = 0$. Show that
$$ \begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & (1+\epsilon_1) \end{pmatrix}(1+\epsilon_2) = \left(I + \Sigma^{(k+1)}\right). $$
Hint: reason the case where $k = 0$ separately from the case where $k > 0$.
* SEE ANSWER
We state a theorem that captures how error is accumulated by the algorithm.
Theorem 10.20 Let $x, y \in \mathbb{R}^n$ and let $\kappa := x^T y$ be computed by executing the algorithm in Figure 10.2. Then
$$ \check{\kappa} = [x^T y] = x^T(I + \Sigma^{(n)})y. $$

10.5.5 A proof in traditional format

In the below proof, we will pick symbols for various (sub)vectors so that the proof can be easily related to the alternative framework to be presented in Section 10.5.6.
Proof: By Mathematical Induction on $n$, the length of vectors $x$ and $y$.
Base case. $m(x) = m(y) = 0$. Trivial.
Inductive Step. I.H.: Assume that if $x_T, y_T \in \mathbb{R}^k$, $k > 0$, then
$$ \mathrm{fl}(x_T^T y_T) = x_T^T(I + \Sigma_T)y_T, \quad\text{where } \Sigma_T = \Sigma^{(k)}. $$
We will show that when $x_T, y_T \in \mathbb{R}^{k+1}$, the equality $\mathrm{fl}(x_T^T y_T) = x_T^T(I + \Sigma_T)y_T$ holds true again. Assume that $x_T, y_T \in \mathbb{R}^{k+1}$, and partition $x_T \rightarrow \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}$ and $y_T \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}$. Then
$$ \begin{aligned} \mathrm{fl}\left( \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix} \right) &= \mathrm{fl}(\mathrm{fl}(x_0^T y_0) + \mathrm{fl}(\chi_1\psi_1)) && \text{(definition)} \\ &= \mathrm{fl}(x_0^T(I + \Sigma_0)y_0 + \mathrm{fl}(\chi_1\psi_1)) && \text{(I.H. with } x_T = x_0,\ y_T = y_0,\ \Sigma_0 = \Sigma^{(k)}) \\ &= \left( x_0^T(I + \Sigma_0)y_0 + \chi_1\psi_1(1+\epsilon_*) \right)(1+\epsilon_+) && \text{(SCM, twice)} \\ &= \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T \begin{pmatrix} (I + \Sigma_0) & 0 \\ 0 & (1+\epsilon_*) \end{pmatrix}(1+\epsilon_+) \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix} && \text{(rearrangement)} \\ &= x_T^T(I + \Sigma_T)y_T && \text{(renaming)}, \end{aligned} $$
where $|\epsilon_*|, |\epsilon_+| \le u$, $\epsilon_+ = 0$ if $k = 0$, and $(I + \Sigma_T) = \begin{pmatrix} (I + \Sigma_0) & 0 \\ 0 & (1+\epsilon_*) \end{pmatrix}(1+\epsilon_+)$ so that $\Sigma_T = \Sigma^{(k+1)}$.
By the Principle of Mathematical Induction, the result holds.

10.5.6 A weapon of math induction for the war on (backward) error (optional)

We focus the reader's attention on Figure 10.3, in which we present a framework, which we will call the "error worksheet", for presenting the inductive proof of Theorem 10.20 side-by-side with the algorithm for DOT. This framework, in a slightly different form, was first introduced in [3]. The expressions enclosed by curly braces (in the grey boxes) are predicates describing the state of the variables used in the algorithm and in their analysis. In the worksheet, we use superscripts to indicate the iteration number; thus, the symbols $v^i$ and $v^{i+1}$ do not denote two different variables, but two different states of variable $v$.

[Figure 10.3, the error worksheet, is not reproduced here. It presents the DOT algorithm of Figure 10.2 side by side with its error analysis: each point of the loop is annotated with the predicate $\{\, \check{\kappa} = x_T^T(I + \Sigma_T)y_T \ \wedge\ \Sigma_T = \Sigma^{(k)} \ \wedge\ m(x_T) = k \,\}$, and the "error side" records how the update $\kappa := \kappa + \chi_1\psi_1$ introduces the factors $(1+\epsilon_*)$ and $(1+\epsilon_+)$ (SCM, twice) so that, by Exercise 10.19, the predicate is maintained with $\Sigma^{(k+1)}$.]

Figure 10.3: Error worksheet completed to establish the backward error result for the given algorithm that computes the DOT operation.


The proof presented in Figure 10.3 goes hand in hand with the algorithm, as it shows that before and after each iteration of the loop that computes $\kappa := x^T y$, the variables $\check{\kappa}$, $x_T$, $y_T$, $\Sigma_T$ are such that the predicate
$$ \{\, \check{\kappa} = x_T^T(I + \Sigma_T)y_T \ \wedge\ k = m(x_T) \ \wedge\ \Sigma_T = \Sigma^{(k)} \,\} \qquad (10.7) $$
holds true. This relation is satisfied at each iteration of the loop, so it is also satisfied when the loop completes. Upon completion, the loop guard is $m(x_T) = m(x) = n$, which implies that $\check{\kappa} = x^T(I + \Sigma^{(n)})y$, i.e., the thesis of the theorem, is satisfied too.
In detail, the inductive proof of Theorem 10.20 is captured by the error worksheet as follows:
Base case. In Step 2a, i.e., before the execution of the loop, predicate (10.7) is satisfied, as $k = m(x_T) = 0$.
Inductive step. Assume that the predicate (10.7) holds true at Step 2b, i.e., at the top of the loop. Then Steps 6, 7, and 8 in Figure 10.3 prove that the predicate is satisfied again at Step 2c, i.e., the bottom of the loop. Specifically,
• Step 6 holds by virtue of the equalities $x_0 = x_T$, $y_0 = y_T$, and $\Sigma_0 = \Sigma_T$.
• The update in Step 8-left introduces the error indicated in Step 8-right (SCM, twice), yielding the updated error operands and leaving the variables in the state indicated in Step 7.
• Finally, the redefinition of $\Sigma_T$ in Step 5b transforms the predicate in Step 7 into that of Step 2c, completing the inductive step.
By the Principle of Mathematical Induction, the predicate (10.7) holds for all iterations. In particular, when the loop terminates, the predicate becomes
$$ \check{\kappa} = x^T(I + \Sigma^{(n)})y \ \wedge\ n = m(x_T). $$
This completes the discussion of the proof as captured by Figure 10.3.
In the derivation of algorithms, the concept of loop-invariant plays a central role. Let $L$ be a loop and $P$ a predicate. If $P$ is true before the execution of $L$, at the beginning and at the end of each iteration of $L$, and after the completion of $L$, then predicate $P$ is a loop-invariant with respect to $L$. Similarly, we give the definition of error-invariant.
Definition 10.21 We call the predicate involving the operands and error operands in Steps 2a-d the error-invariant for the analysis. This predicate is true before and after each iteration of the loop.
For any algorithm, the loop-invariant and the error-invariant are related in that the former describes the
status of the computation at the beginning and the end of each iteration, while the latter captures an error
result for the computation indicated by the loop-invariant.
The reader will likely think that the error worksheet is overkill when proving the error result for the dot product. We agree. However, it links a proof by induction to the execution of a loop, which we believe is useful. Elsewhere, as more complex operations are analyzed, the benefits of the structure that the error worksheet provides will become more obvious. (We will analyze more complex algorithms as the course proceeds.)

10.5.7 Results

A number of useful consequences of Theorem 10.20 follow. These will be used later as an inventory (library) of error results from which to draw when analyzing operations and algorithms that utilize DOT.
Corollary 10.22 Under the assumptions of Theorem 10.20 the following relations hold:
R1-B: (Backward analysis) $\check{\kappa} = (x + \delta x)^T y$, where $|\delta x| \le \gamma_n|x|$, and $\check{\kappa} = x^T(y + \delta y)$, where $|\delta y| \le \gamma_n|y|$;
R1-F: (Forward analysis) $\check{\kappa} = x^T y + \delta\kappa$, where $|\delta\kappa| \le \gamma_n|x|^T|y|$.
Proof: We leave the proof of R1-B as an exercise. For R1-F, let $\delta\kappa = x^T\Sigma^{(n)}y$, where $\Sigma^{(n)}$ is as in Theorem 10.20. Then
$$ |\delta\kappa| = |x^T\Sigma^{(n)}y| \le |\chi_0||\theta_n||\psi_0| + |\chi_1||\theta_n||\psi_1| + \cdots + |\chi_{n-1}||\theta_2||\psi_{n-1}| \le \gamma_n|\chi_0||\psi_0| + \gamma_n|\chi_1||\psi_1| + \cdots + \gamma_2|\chi_{n-1}||\psi_{n-1}| \le \gamma_n|x|^T|y|. $$
Homework 10.23 Prove R1-B.
* SEE ANSWER
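R1-F can be checked numerically: compute the dot product in floating point, the exact value in rational arithmetic, and compare against the bound (a sketch):

```python
from fractions import Fraction
import random

u = Fraction(1, 2**53)              # unit roundoff, IEEE double precision
random.seed(1)
n = 100
gamma_n = n * u / (1 - n * u)

x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

kappa_check = 0.0                   # computed result, left to right as in Figure 10.2
for chi, psi in zip(x, y):
    kappa_check += chi * psi

exact = sum(Fraction(chi) * Fraction(psi) for chi, psi in zip(x, y))
bound = gamma_n * sum(abs(Fraction(chi) * Fraction(psi)) for chi, psi in zip(x, y))
assert abs(Fraction(kappa_check) - exact) <= bound   # |delta_kappa| <= gamma_n |x|^T |y|
```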

10.6 Stability of a Matrix-Vector Multiplication Algorithm

In this section, we discuss the numerical stability of the specific matrix-vector multiplication algorithm that computes y := Ax via dot products. This allows us to show how results for the dot product can be used in the setting of a more complicated algorithm.

10.6.1 An algorithm for computing GEMV

We will consider the algorithm given in Figure 10.4 for computing y := Ax, which computes y via dot products.

10.6.2 Analysis

Assume $A \in \mathbb{R}^{m \times n}$ and partition
$$ A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}. $$

Algorithm: GEMV: y := Ax

    Partition A → ( AT / AB ), y → ( yT / yB ),
      where AT and yT are empty
    while m(AT) < m(A) do
      Repartition ( AT / AB ) → ( A0 / a1^T / A2 ), ( yT / yB ) → ( y0 / ψ1 / y2 )
      ψ1 := a1^T x
      Continue with ( AT / AB ) ← ( A0, a1^T / A2 ), ( yT / yB ) ← ( y0, ψ1 / y2 )
    endwhile

Figure 10.4: Algorithm for computing y := Ax.

Then
$$ \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix} := \begin{pmatrix} a_0^T x \\ a_1^T x \\ \vdots \\ a_{m-1}^T x \end{pmatrix}. $$
From Corollary 10.22 R1-B regarding the dot product we know that
$$ \check{y} = \begin{pmatrix} \check{\psi}_0 \\ \check{\psi}_1 \\ \vdots \\ \check{\psi}_{m-1} \end{pmatrix} = \begin{pmatrix} (a_0 + \delta a_0)^T x \\ (a_1 + \delta a_1)^T x \\ \vdots \\ (a_{m-1} + \delta a_{m-1})^T x \end{pmatrix} = \left( \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} + \begin{pmatrix} \delta a_0^T \\ \delta a_1^T \\ \vdots \\ \delta a_{m-1}^T \end{pmatrix} \right) x = (A + \Delta A)x, $$
where $|\delta a_i| \le \gamma_n|a_i|$, $i = 0, \ldots, m-1$, and hence $|\Delta A| \le \gamma_n|A|$.


Also, from Corollary 10.22 R1-F regarding the dot product we know that
$$ \check{y} = \begin{pmatrix} \check{\psi}_0 \\ \check{\psi}_1 \\ \vdots \\ \check{\psi}_{m-1} \end{pmatrix} = \begin{pmatrix} a_0^T x + \delta\psi_0 \\ a_1^T x + \delta\psi_1 \\ \vdots \\ a_{m-1}^T x + \delta\psi_{m-1} \end{pmatrix} = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} x + \begin{pmatrix} \delta\psi_0 \\ \delta\psi_1 \\ \vdots \\ \delta\psi_{m-1} \end{pmatrix} = Ax + \delta y, $$
where $|\delta\psi_i| \le \gamma_n|a_i|^T|x|$ and hence $|\delta y| \le \gamma_n|A||x|$.


The above observations can be summarized in the following theorem:
Theorem 10.24 Error results for matrix-vector multiplication. Let $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$ and consider the assignment y := Ax implemented via the algorithm in Figure 10.4. Then these equalities hold:
R1-B: $\check{y} = (A + \Delta A)x$, where $|\Delta A| \le \gamma_n|A|$.
R2-F: $\check{y} = Ax + \delta y$, where $|\delta y| \le \gamma_n|A||x|$.
Homework 10.25 In the above theorem, could one instead prove the result
$$ \check{y} = A(x + \delta x), $$
where $\delta x$ is small?
* SEE ANSWER
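The forward bound R2-F can be checked per row in the same way as for the dot product (a sketch):

```python
from fractions import Fraction
import random

u = Fraction(1, 2**53)
random.seed(2)
m, n = 20, 50
gamma_n = n * u / (1 - n * u)

A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [random.uniform(-1.0, 1.0) for _ in range(n)]

# y_check[i] = fl(a_i^T x), accumulated left to right as in Figure 10.4.
y_check = []
for row in A:
    psi = 0.0
    for alpha, chi in zip(row, x):
        psi += alpha * chi
    y_check.append(psi)

for i, row in enumerate(A):
    exact = sum(Fraction(a) * Fraction(c) for a, c in zip(row, x))
    bound = gamma_n * sum(abs(Fraction(a) * Fraction(c)) for a, c in zip(row, x))
    assert abs(Fraction(y_check[i]) - exact) <= bound  # |delta_psi_i| <= gamma_n |a_i|^T |x|
```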

10.7 Stability of a Matrix-Matrix Multiplication Algorithm

In this section, we discuss the numerical stability of the specific matrix-matrix multiplication algorithm that computes C := AB via the matrix-vector multiplication algorithm from the last section, where $C \in \mathbb{R}^{m \times n}$, $A \in \mathbb{R}^{m \times k}$, and $B \in \mathbb{R}^{k \times n}$.

10.7.1 An algorithm for computing GEMM

We will consider the algorithm given in Figure 10.5 for computing C := AB, which computes one column at a time so that the matrix-vector multiplication algorithm from the last section can be used.

10.7.2 Analysis

Partition
$$ C = \begin{pmatrix} c_0 & c_1 & \cdots & c_{n-1} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_0 & b_1 & \cdots & b_{n-1} \end{pmatrix}. $$
Then
$$ \begin{pmatrix} c_0 & c_1 & \cdots & c_{n-1} \end{pmatrix} := \begin{pmatrix} Ab_0 & Ab_1 & \cdots & Ab_{n-1} \end{pmatrix}. $$
Algorithm: GEMM: C := AB

    Partition C → ( CL | CR ), B → ( BL | BR ),
      where CL and BL are empty
    while n(CL) < n(C) do
      Repartition ( CL | CR ) → ( C0 | c1 | C2 ), ( BL | BR ) → ( B0 | b1 | B2 )
      c1 := A b1
      Continue with ( CL | CR ) ← ( C0 c1 | C2 ), ( BL | BR ) ← ( B0 b1 | B2 )
    endwhile

Figure 10.5: Algorithm for computing C := AB one column at a time.
From Theorem 10.24 R2-F regarding matrix-vector multiplication we know that
$$ \check{C} = \begin{pmatrix} \check{c}_0 & \check{c}_1 & \cdots & \check{c}_{n-1} \end{pmatrix} = \begin{pmatrix} Ab_0 + \delta c_0 & Ab_1 + \delta c_1 & \cdots & Ab_{n-1} + \delta c_{n-1} \end{pmatrix} = \begin{pmatrix} Ab_0 & Ab_1 & \cdots & Ab_{n-1} \end{pmatrix} + \begin{pmatrix} \delta c_0 & \delta c_1 & \cdots & \delta c_{n-1} \end{pmatrix} = AB + \Delta C, $$
where $|\delta c_j| \le \gamma_k|A||b_j|$, $j = 0, \ldots, n-1$, and hence $|\Delta C| \le \gamma_k|A||B|$.
The above observations can be summarized in the following theorem:
Theorem 10.26 (Forward) error results for matrix-matrix multiplication. Let $C \in \mathbb{R}^{m \times n}$, $A \in \mathbb{R}^{m \times k}$, and $B \in \mathbb{R}^{k \times n}$ and consider the assignment C := AB implemented via the algorithm in Figure 10.5. Then the following equality holds:
R1-F: $\check{C} = AB + \Delta C$, where $|\Delta C| \le \gamma_k|A||B|$.
Homework 10.27 In the above theorem, could one instead prove the result
$$ \check{C} = (A + \Delta A)(B + \Delta B), $$
where $\Delta A$ and $\Delta B$ are small?
* SEE ANSWER

10.7.3 An application

A collaborator of ours recently implemented a matrix-matrix multiplication algorithm and wanted to check if it gave the correct answer. To do so, he followed the following steps:
• He created random matrices $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$, with positive entries in the range (0, 1).
• He computed $C = AB$ with an implementation that was known to be correct and assumed it yields the exact solution. (Of course, it has error in it as well. We discuss how he compensated for that, below.)
• He computed $\check{C} = AB$ with his new implementation.
• He computed $\Delta C = \check{C} - C$ and checked that each of its entries satisfied $|\delta\gamma_{i,j}| \le 2ku\,\gamma_{i,j}$.
In the above, he took advantage of the fact that $A$ and $B$ had positive entries so that $|A||B| = AB = C$. He also approximated $\gamma_k = \frac{ku}{1-ku}$ with $ku$, and introduced the factor 2 to compensate for the fact that $C$ itself was inexactly computed.
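The described test is easy to script. In the sketch below, `new_matmul` is a hypothetical stand-in for the implementation under test, and `math.fsum` (correctly rounded summation) plays the role of the trusted reference:

```python
import math
import random

u = 2.0 ** -53
random.seed(3)
m, k, n = 8, 30, 7

def reference(A, B):
    # Trusted implementation: correctly rounded sums of (rounded) products.
    return [[math.fsum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def new_matmul(A, B):
    # Implementation under test: plain left-to-right accumulation.
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):
        for i in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[random.random() for _ in range(k)] for _ in range(m)]
B = [[random.random() for _ in range(n)] for _ in range(k)]
C, C_new = reference(A, B), new_matmul(A, B)
# Positive entries, so |A||B| = AB = C and the test is |delta| <= 2*k*u*C[i][j].
assert all(abs(C_new[i][j] - C[i][j]) <= 2 * k * u * C[i][j]
           for i in range(m) for j in range(n))
```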

10.8 Wrapup

10.8.1 Additional exercises

10.8.2 Summary

Chapter 11

Notes on Performance

How to attain high performance on modern architectures is of importance: linear algebra is fundamental to scientific computing. Scientific computing often involves very large problems that require the fastest computers in the world to be employed. One wants to use such computers efficiently.
For now, we suggest the reader become familiar with the following resources:
• Week 5 of Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30]. Focus on Section 5.4 Enrichment.
• Kazushige Goto and Robert van de Geijn. Anatomy of high-performance matrix multiplication [23]. ACM Transactions on Mathematical Software, 34 (3), 2008.
• Field G. Van Zee and Robert van de Geijn. BLIS: A Framework for Rapid Instantiation of BLAS Functionality [43]. ACM Transactions on Mathematical Software, to appear.
• Robert van de Geijn. How to Optimize Gemm. wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm. (An exercise on how to write a high-performance matrix-matrix multiplication in C.)
• A similar exercise: Michael Lehn. GEMM: From Pure C to SSE Optimized Micro Kernels. http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/index.html.


Chapter 12

Notes on Gaussian Elimination and LU Factorization

12.1 Opening Remarks

12.1.1 Launch

The LU factorization is also known as the LU decomposition and the operations it performs are equivalent to those performed by Gaussian elimination. For details, we recommend that the reader consult Weeks 6 and 7 of Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].

Video
Read disclaimer regarding the videos in the preface!
Lecture on Gaussian Elimination and LU factorization:
* YouTube
* Download from UT Box
* View After Local Download
Lecture on deriving dense linear algebra algorithms:
* YouTube
* Download from UT Box
* View After Local Download
* Slides
(For help on viewing, see Appendix A.)


12.1.2 Outline

12.1 Opening Remarks
    12.1.1 Launch
    12.1.2 Outline
    12.1.3 What you will learn
12.2 Definition and Existence
12.3 LU Factorization
    12.3.1 First derivation
    12.3.2 Gauss transforms
    12.3.3 Cost of LU factorization
12.4 LU Factorization with Partial Pivoting
    12.4.1 Permutation matrices
    12.4.2 The algorithm
12.5 Proof of Theorem 12.3
12.6 LU with Complete Pivoting
12.7 Solving Ax = y Via the LU Factorization with Pivoting
12.8 Solving Triangular Systems of Equations
    12.8.1 Lz = y
    12.8.2 Ux = z
12.9 Other LU Factorization Algorithms
    12.9.1 Variant 1: Bordered algorithm
    12.9.2 Variant 2: Left-looking algorithm
    12.9.3 Variant 3: Up-looking variant
    12.9.4 Variant 4: Crout variant
    12.9.5 Variant 5: Classical LU factorization
    12.9.6 All algorithms
    12.9.7 Formal derivation of algorithms
12.10 Numerical Stability Results
12.11 Is LU with Partial Pivoting Stable?
12.12 Blocked Algorithms
    12.12.1 Blocked classical LU factorization (Variant 5)
    12.12.2 Blocked classical LU factorization with pivoting (Variant 5)
12.13 Variations on a Triple-Nested Loop
12.14 Inverting a Matrix
    12.14.1 Basic observations
    12.14.2 Via the LU factorization with pivoting
    12.14.3 Gauss-Jordan inversion
    12.14.4 (Almost) never, ever invert a matrix
12.15 Efficient Condition Number Estimation
    12.15.1 The problem
    12.15.2 Insights
    12.15.3 A simple approach
    12.15.4 Discussion
12.16 Wrapup
    12.16.1 Additional exercises
    12.16.2 Summary


Chapter 12. Notes on Gaussian Elimination and LU Factorization

12.1.3  What you will learn

12.2  Definition and Existence

Definition 12.1 (LU factorization (decomposition)) Given a matrix $A \in \mathbb{C}^{m \times n}$ with $m \geq n$, its LU factorization is given by $A = LU$ where $L \in \mathbb{C}^{m \times n}$ is unit lower trapezoidal and $U \in \mathbb{C}^{n \times n}$ is upper triangular.
The first question we will ask is when the LU factorization exists. For this, we need a definition.
Definition 12.2 The $k \times k$ principal leading submatrix of a matrix $A$ is defined to be the square matrix $A_{TL} \in \mathbb{C}^{k \times k}$ such that
$$
A = \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}.
$$
This definition allows us to indicate when a matrix has an LU factorization:
Theorem 12.3 (Existence) Let $A \in \mathbb{C}^{m \times n}$ with $m \geq n$ have linearly independent columns. Then $A$ has a unique LU factorization if and only if all its principal leading submatrices are nonsingular.
The proof of this theorem is a bit involved and can be found in Section 12.5.

12.3

LU Factorization

We are going to present two different ways of deriving the most commonly known algorithm. The first is a straightforward derivation. The second presents the operation as the application of a sequence of Gauss transforms.

12.3.1

First derivation

Partition $A$, $L$, and $U$ as follows:
$$
A \rightarrow \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}, \quad \mbox{and} \quad
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}.
$$
Then $A = LU$ means that
$$
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}
= \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ \upsilon_{11} l_{21} & l_{21} u_{12}^T + L_{22} U_{22} \end{pmatrix}.
$$
This means that
$$
\alpha_{11} = \upsilon_{11}, \quad a_{12}^T = u_{12}^T, \quad a_{21} = \upsilon_{11} l_{21}, \quad A_{22} = l_{21} u_{12}^T + L_{22} U_{22},
$$
or, equivalently,
$$
\alpha_{11} = \upsilon_{11}, \quad a_{12}^T = u_{12}^T, \quad a_{21} = \upsilon_{11} l_{21}, \quad A_{22} - l_{21} u_{12}^T = L_{22} U_{22}.
$$
If we let $U$ overwrite the original matrix $A$, this suggests the algorithm

• $l_{21} = a_{21}/\alpha_{11}$.

• $a_{21} = 0$.

• $A_{22} := A_{22} - l_{21} a_{12}^T$.

• Continue by overwriting the updated $A_{22}$ with its LU factorization.

This is captured in the algorithm in Figure 12.1.
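In code, the steps above become a doubly nested loop around a scaling and a rank-1 update. The notes implement such algorithms in M-script with the FLAME@lab API; what follows is merely a minimal pure-Python sketch of the same update sequence (the function name is ours, not part of the notes):

```python
def lu_right_looking(A):
    """Overwrite the square matrix A (a list of lists of floats) with its
    LU factorization: U on and above the diagonal, the multipliers of the
    unit lower triangular L strictly below it."""
    n = len(A)
    for k in range(n):
        # l21 := a21 / alpha11 (the multipliers overwrite a21)
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
        # A22 := A22 - l21 * a12^T (rank-1 update of the trailing matrix)
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A
```

For example, factoring [[2, 1], [4, 5]] yields L = [[1, 0], [2, 1]] and U = [[2, 1], [0, 3]], stored together as [[2, 1], [2, 3]].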

12.3.2  Gauss transforms

Definition 12.4 A matrix $L_k$ of the form
$$
L_k = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix},
$$
where $I_k$ is $k \times k$, is called a Gauss transform.

Example 12.5 Gauss transforms can be used to take multiples of a row and subtract these multiples from other rows:
$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & -\lambda_{21} & 1 & 0 \\
0 & -\lambda_{31} & 0 & 1
\end{pmatrix}
\begin{pmatrix} \hat a_0^T \\ \hat a_1^T \\ \hat a_2^T \\ \hat a_3^T \end{pmatrix}
=
\begin{pmatrix} \hat a_0^T \\ \hat a_1^T \\ \hat a_2^T - \lambda_{21} \hat a_1^T \\ \hat a_3^T - \lambda_{31} \hat a_1^T \end{pmatrix}.
$$
Notice the similarity with what one does in Gaussian elimination: take multiples of one row and subtract these from other rows.
Now assume that the LU factorization in the previous subsection has proceeded to where $A$ contains
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix},
$$
where $A_{00}$ is upper triangular (recall: it is being overwritten by $U$!). What we would like to do is eliminate the elements in $a_{21}$ by taking multiples of the current row $\left( \alpha_{11} \;\; a_{12}^T \right)$ and subtracting these from the rest of the rows $\left( a_{21} \;\; A_{22} \right)$. The vehicle is a Gauss transform: we must determine $l_{21}$ so that
$$
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
$$


Algorithm: Compute LU factorization of $A$, overwriting $L$ with factor $L$ and $A$ with factor $U$

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $L \rightarrow \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}$
  where $A_{TL}$ and $L_{TL}$ are $0 \times 0$
while $n(A_{TL}) < n(A)$ do
  Repartition
  $$
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix}
  $$
  where $\alpha_{11}$ and $\lambda_{11}$ are $1 \times 1$

  $l_{21} := a_{21}/\alpha_{11}$
  $A_{22} := A_{22} - l_{21} a_{12}^T$
  ($a_{21} := 0$)

  or, alternatively,

  $l_{21} := a_{21}/\alpha_{11}$
  $$
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
  :=
  \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
  =
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}
  $$

  Continue with
  $$
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix} \leftarrow
  \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix}
  $$
endwhile

Figure 12.1: Most commonly known algorithm for overwriting a matrix with its LU factorization.


This means we must pick $l_{21} = a_{21}/\alpha_{11}$, since
$$
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} - \alpha_{11} l_{21} & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
$$
The resulting algorithm is summarized in Figure 12.1 under "or, alternatively,". Notice that this algorithm is identical to the algorithm for computing the LU factorization discussed before!
How can this be? The following set of exercises explains it.
Homework 12.6 Show that
$$
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}^{-1}
=
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}.
$$
* SEE ANSWER

Now, clearly, what the algorithm does is compute a sequence of $n$ Gauss transforms $\hat L_0, \ldots, \hat L_{n-1}$ such that $\hat L_{n-1} \hat L_{n-2} \cdots \hat L_1 \hat L_0 A = U$. Or, equivalently, $A = L_0 L_1 \cdots L_{n-2} L_{n-1} U$, where $L_k = \hat L_k^{-1}$. What we will show next is that $L = L_0 L_1 \cdots L_{n-2} L_{n-1}$ is the unit lower triangular matrix computed by the LU factorization.

Homework 12.7 Let $\bar L_k = L_0 L_1 \cdots L_k$. Assume that $\bar L_{k-1}$ has the form
$$
\bar L_{k-1} = \begin{pmatrix} \bar L_{00} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & 0 & I \end{pmatrix},
$$
where $\bar L_{00}$ is $k \times k$. Show that $\bar L_k$ is given by
$$
\bar L_k = \begin{pmatrix} \bar L_{00} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & l_{21} & I \end{pmatrix}.
\qquad \left( \mbox{Recall: } L_k = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}. \right)
$$
* SEE ANSWER
What this exercise shows is that $L = L_0 L_1 \cdots L_{n-2} L_{n-1}$ is the triangular matrix that is created by simply placing the computed vectors $l_{21}$ below the diagonal of a unit lower triangular matrix.
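This fact is easy to check numerically. The following pure-Python snippet (the helper names are ours, not part of the notes) forms two inverse Gauss transforms for made-up multipliers and confirms that their product is obtained by simply placing the multipliers below the diagonal:

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_gauss_transform(n, k, l21):
    """The inverse of a Gauss transform: the identity with the
    multipliers l21 placed in column k, below the diagonal."""
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, lam in enumerate(l21, start=k + 1):
        L[i][k] = lam
    return L

L0 = inverse_gauss_transform(3, 0, [2.0, 3.0])
L1 = inverse_gauss_transform(3, 1, [4.0])
# The product L0 L1 simply places each l21 below the diagonal:
assert matmul(L0, L1) == [[1.0, 0.0, 0.0],
                          [2.0, 1.0, 0.0],
                          [3.0, 4.0, 1.0]]
```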

12.3.3  Cost of LU factorization

The cost of the LU factorization algorithm given in Figure 12.1 can be analyzed as follows. Assume $A$ is $n \times n$. During the $k$th iteration, $A_{TL}$ is initially $k \times k$.

• Computing $l_{21} := a_{21}/\alpha_{11}$ is typically implemented as $\beta := 1/\alpha_{11}$ followed by the scaling $l_{21} := \beta a_{21}$. The reason is that divisions are expensive relative to multiplications. We will ignore the cost of the division (which is insignificant if $n$ is large). Thus, we count this as $n - k - 1$ multiplies.

• The rank-1 update of $A_{22}$ requires $(n-k-1)^2$ multiplications and $(n-k-1)^2$ additions.

Thus, the total cost (in flops) can be approximated by
$$
\sum_{k=0}^{n-1} \left[ (n-k-1) + 2(n-k-1)^2 \right]
= \sum_{j=0}^{n-1} \left( j + 2j^2 \right) \qquad \mbox{(change of variable: } j = n-k-1\mbox{)}
$$
$$
= \sum_{j=0}^{n-1} j + 2 \sum_{j=0}^{n-1} j^2
\approx \frac{n(n-1)}{2} + 2 \int_0^n x^2 \, dx
= \frac{n(n-1)}{2} + \frac{2}{3} n^3
\approx \frac{2}{3} n^3.
$$
Notice that this involves roughly half the number of floating point operations required for a Householder-transformation-based QR factorization.

12.4  LU Factorization with Partial Pivoting

It is well-known that the LU factorization is numerically unstable under general circumstances. In particular, a backward stability analysis, given for example in [3, 9, 7] and summarized in Section 12.10, shows that the computed matrices $\check L$ and $\check U$ satisfy
$$
(A + \Delta A) = \check L \check U \quad \mbox{where} \quad |\Delta A| \leq \gamma_n |\check L| |\check U|.
$$
(This is the backward error result for the Crout variant of LU factorization, discussed later in this note. Some of the other variants have an error result of $(A + \Delta A) = \check L \check U$ where $|\Delta A| \leq \gamma_n (|A| + |\check L| |\check U|)$.) Now, if $\alpha_{11}$ is small in magnitude compared to the entries of $a_{21}$, then not only will $l_{21}$ have large entries, but the update $A_{22} - l_{21} a_{12}^T$ will potentially introduce large entries into the updated $A_{22}$ (in other words, into the part of matrix $A$ from which the future matrix $U$ will be computed), a phenomenon referred to as element growth. To overcome this, we will swap rows in $A$ as the factorization proceeds, resulting in an algorithm known as LU factorization with partial pivoting.

12.4.1  Permutation matrices

Definition 12.8 An $n \times n$ matrix $P$ is said to be a permutation matrix, or permutation, if, when applied to a vector $x = (\chi_0, \chi_1, \ldots, \chi_{n-1})^T$, it merely rearranges the order of the elements in that vector. Such a permutation can be represented by the vector of integers $(\pi_0, \pi_1, \ldots, \pi_{n-1})^T$, where $\{\pi_0, \pi_1, \ldots, \pi_{n-1}\}$ is a permutation of the integers $\{0, 1, \ldots, n-1\}$, and the permuted vector $Px$ is given by $(\chi_{\pi_0}, \chi_{\pi_1}, \ldots, \chi_{\pi_{n-1}})^T$.


Algorithm: Compute LU factorization with partial pivoting of $A$, overwriting $L$ with factor $L$ and $A$ with factor $U$. The pivot vector is returned in $p$.

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $L \rightarrow \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}$, $p \rightarrow \begin{pmatrix} p_T \\ p_B \end{pmatrix}$
  where $A_{TL}$ and $L_{TL}$ are $0 \times 0$ and $p_T$ is $0 \times 1$
while $n(A_{TL}) < n(A)$ do
  Repartition
  $$
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix}, \quad
  \begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow
  \begin{pmatrix} p_0 \\ \pi_1 \\ p_2 \end{pmatrix}
  $$
  where $\alpha_{11}$, $\lambda_{11}$, and $\pi_1$ are $1 \times 1$

  $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$
  $\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := P(\pi_1) \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}$
  $l_{21} := a_{21}/\alpha_{11}$
  $A_{22} := A_{22} - l_{21} a_{12}^T$
  ($a_{21} := 0$)

  Continue with the repartitioned matrices and vector as above
endwhile

Figure 12.2: LU factorization with partial pivoting.


If $P$ is a permutation matrix, then $PA$ rearranges the rows of $A$ exactly as the elements of $x$ are rearranged by $Px$.

We will see that when discussing the LU factorization with partial pivoting, a fundamental tool is the permutation matrix that swaps the first element of a vector with the $\pi$-th element of that vector. We will denote that matrix by
$$
P(\pi) =
\begin{cases}
I_n & \mbox{if } \pi = 0, \\[4pt]
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & I_{\pi-1} & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & I_{n-\pi-1}
\end{pmatrix} & \mbox{otherwise,}
\end{cases}
$$
where $n$ is the dimension of the permutation matrix. In the following we will use the notation $P_n$ to indicate that the matrix $P$ is of size $n$. Let $p$ be a vector of integers satisfying the conditions
$$
p = (\pi_0, \ldots, \pi_{k-1})^T, \quad \mbox{where } 1 \leq k \leq n \mbox{ and } 0 \leq \pi_i < n - i,
\qquad (12.1)
$$
then $P_n(p)$ will denote the permutation
$$
P_n(p) =
\begin{pmatrix} I_{k-1} & 0 \\ 0 & P_{n-k+1}(\pi_{k-1}) \end{pmatrix}
\begin{pmatrix} I_{k-2} & 0 \\ 0 & P_{n-k+2}(\pi_{k-2}) \end{pmatrix}
\cdots
\begin{pmatrix} 1 & 0 \\ 0 & P_{n-1}(\pi_1) \end{pmatrix}
P_n(\pi_0).
$$

Remark 12.9 In the algorithms, the subscript that indicates the matrix dimensions is omitted.

Example 12.10 Let $a_0^T, a_1^T, \ldots, a_{n-1}^T$ be the rows of a matrix $A$. The application of $P(p)$ to $A$ yields a matrix that results from swapping row $a_0^T$ with $a_{\pi_0}^T$, then swapping $a_1^T$ with $a_{\pi_1+1}^T$, $a_2^T$ with $a_{\pi_2+2}^T$, until finally $a_{k-1}^T$ is swapped with $a_{\pi_{k-1}+k-1}^T$.

Remark 12.11 For those familiar with how pivot information is stored in LINPACK and LAPACK, notice that those packages store the vector of pivot information $(\pi_0 + 1, \pi_1 + 2, \ldots, \pi_{k-1} + k)^T$.
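In code, applying $P(p)$ to the rows of a matrix, as described in Example 12.10, is just a sequence of row swaps. The following is a pure-Python sketch (the function name and the list-of-rows representation are ours):

```python
def apply_pivots(p, A):
    """Apply P(p) to the rows of A, as in Example 12.10:
    for i = 0, 1, ..., swap row i with row i + p[i]."""
    for i, pi in enumerate(p):
        A[i], A[i + pi] = A[i + pi], A[i]
    return A
```

Under the LINPACK/LAPACK convention of Remark 12.11 one would instead store $\pi_i + i + 1$; converting between the two conventions is a matter of adding or subtracting the one-based row index.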

12.4.2  The algorithm

Having introduced our notation for permutation matrices, we can now define the LU factorization with partial pivoting: Given an $n \times n$ matrix $A$, we wish to compute a) a vector $p$ of $n$ integers which satisfies the conditions (12.1), b) a unit lower trapezoidal matrix $L$, and c) an upper triangular matrix $U$ so that $P(p)A = LU$. An algorithm for computing this operation is typically represented by
$$
[A, p] := \mbox{LUpiv}(A),
$$
where upon completion $A$ has been overwritten by $\{L\backslash U\}$.
Let us start with revisiting the first derivation of the LU factorization. The first step is to find a first
permutation matrix P(1 ) such that the element on the diagonal in the first column is maximal in value.
For this, we will introduce the function maxi(x) which, given a vector x, returns the index of the element
in x with maximal magnitude (absolute value). The algorithm then proceeds as follows:


Partition $A$ and $L$ as follows:
$$
A \rightarrow \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} \quad \mbox{and} \quad
L \rightarrow \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}.
$$

• Compute $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$.

• Permute the rows: $\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := P(\pi_1) \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}$.

• Compute $l_{21} := a_{21}/\alpha_{11}$.

• Update $A_{22} := A_{22} - l_{21} a_{12}^T$.

Now, in general, assume that the computation has proceeded to the point where matrix $A$ has been overwritten by
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix},
$$
where $A_{00}$ is upper triangular. If no pivoting were added, one would compute $l_{21} := a_{21}/\alpha_{11}$ followed by the update
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
:=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
$$
Now, instead one performs the steps

• Compute $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$.

• Permute the rows:
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
:=
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
$$

• Compute $l_{21} := a_{21}/\alpha_{11}$.

• Update
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
:=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
$$
This algorithm is summarized in Figure 12.2.


Now, what this algorithm computes is a sequence of Gauss transforms $\hat L_0, \ldots, \hat L_{n-1}$ and permutations $P_0, \ldots, P_{n-1}$ such that
$$
\hat L_{n-1} P_{n-1} \cdots \hat L_0 P_0 A = U
$$
or, equivalently,
$$
A = P_0^T L_0 \cdots P_{n-1}^T L_{n-1} U,
$$
where $L_k = \hat L_k^{-1}$. What we will finally show is that there are Gauss transforms $\bar L_0, \ldots, \bar L_{n-1}$ (here the bar does NOT mean conjugation; it is just a symbol) such that
$$
A = P_0^T \cdots P_{n-1}^T \underbrace{\bar L_0 \cdots \bar L_{n-1}}_{L} U
$$
or, equivalently,
$$
P(p) A = P_{n-1} \cdots P_0 A = \underbrace{\bar L_0 \cdots \bar L_{n-1}}_{L} U,
$$
which is what we set out to compute.
Here is the insight. Assume that after $k$ steps of LU factorization we have computed $p_T$, $L_{TL}$, $L_{BL}$, etc., so that
$$
P(p_T) A = \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix} \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix},
$$
where $A_{TL}$ is upper triangular and $k \times k$. Now compute the next step of LU factorization with partial pivoting:

• Partition $\begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}$

• Compute $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$

• Permute $\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} := \begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix} \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}$

• Compute $l_{21} := a_{21}/\alpha_{11}$.


Algorithm: Compute LU factorization with partial pivoting of $A$, overwriting $L$ with factor $L$ and $A$ with factor $U$. The pivot vector is returned in $p$.

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $L \rightarrow \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}$, $p \rightarrow \begin{pmatrix} p_T \\ p_B \end{pmatrix}$
  where $A_{TL}$ and $L_{TL}$ are $0 \times 0$ and $p_T$ is $0 \times 1$
while $n(A_{TL}) < n(A)$ do
  Repartition $A$, $L$, and $p$ as in Figure 12.2, where $\alpha_{11}$, $\lambda_{11}$, and $\pi_1$ are $1 \times 1$

  $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$
  $$
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ l_{10}^T & \alpha_{11} & a_{12}^T \\ L_{20} & a_{21} & A_{22} \end{pmatrix}
  :=
  \begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ l_{10}^T & \alpha_{11} & a_{12}^T \\ L_{20} & a_{21} & A_{22} \end{pmatrix}
  $$
  $l_{21} := a_{21}/\alpha_{11}$
  $$
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
  :=
  \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
  =
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}
  $$

  Continue with the repartitioned matrices and vector as above
endwhile

Figure 12.3: LU factorization with partial pivoting.


Algorithm: Compute LU factorization with partial pivoting of $A$, overwriting $A$ with factors $L$ and $U$. The pivot vector is returned in $p$.

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $p \rightarrow \begin{pmatrix} p_T \\ p_B \end{pmatrix}$
  where $A_{TL}$ is $0 \times 0$ and $p_T$ is $0 \times 1$
while $n(A_{TL}) < n(A)$ do
  Repartition
  $$
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow
  \begin{pmatrix} p_0 \\ \pi_1 \\ p_2 \end{pmatrix}
  $$
  where $\alpha_{11}$ and $\pi_1$ are $1 \times 1$

  $\pi_1 = \mbox{maxi}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$
  $$
  \begin{pmatrix} a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
  :=
  P(\pi_1)
  \begin{pmatrix} a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
  $$
  $a_{21} := a_{21}/\alpha_{11}$
  $A_{22} := A_{22} - a_{21} a_{12}^T$

  Continue with the repartitioned matrix and vector as above
endwhile

Figure 12.4: LU factorization with partial pivoting, overwriting $A$ with the factors.
• Update
$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
:=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}
$$

After this,
$$
P(p_T) A =
\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
\qquad (12.2)
$$
But
$$
\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} L_{TL} & 0 \\ P(\pi_1) L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} \bar L_{TL} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & l_{21} & I \end{pmatrix},
$$
where $\begin{pmatrix} \bar l_{10}^T \\ \bar L_{20} \end{pmatrix} = P(\pi_1) L_{BL}$ and $\bar L_{TL} = L_{TL}$. (Note: some of the $L$'s and $l$'s carry a bar.) Here we use the fact that $P(\pi_1) = P(\pi_1)^T$ because of its very special structure. Bringing the permutation to the left of (12.2) and repartitioning, we get
$$
\underbrace{\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix} P(p_0)}_{P\left( \begin{pmatrix} p_0 \\ \pi_1 \end{pmatrix} \right)} A
=
\underbrace{\begin{pmatrix} \bar L_{00} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & l_{21} & I \end{pmatrix}}_{\bar L}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21} a_{12}^T \end{pmatrix}.
$$
This explains how the algorithm in Figure 12.3 computes $p$, $L$, and $U$ (overwriting $A$ with $U$) so that $P(p)A = LU$.

Finally, we recognize that $L$ can overwrite the entries of $A$ below its diagonal, yielding the algorithm in Figure 12.4.
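In pure Python, the algorithm of Figure 12.4 (overwriting $A$ with both factors) can be sketched as follows; the function name and the representation of $p$ as a Python list of offsets are ours, not part of the notes:

```python
def lu_partial_pivoting(A):
    """Overwrite square A with its factors so that P(p) A = L U:
    U on and above the diagonal, the multipliers of unit lower
    triangular L below it.  Returns the pivot vector p, where step k
    swaps row k with row k + p[k]."""
    n = len(A)
    p = []
    for k in range(n):
        # pi_1 = maxi(...): offset of the entry of largest magnitude
        # on or below the diagonal in the current column
        pi = max(range(k, n), key=lambda i: abs(A[i][k])) - k
        p.append(pi)
        # Permute the rows
        A[k], A[k + pi] = A[k + pi], A[k]
        # a21 := a21 / alpha11, then A22 := A22 - a21 * a12^T
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p
```

With pivoting, every multiplier stored below the diagonal has magnitude at most 1, which is exactly what limits element growth in practice.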

12.5  Proof of Theorem 12.3

Proof:
(⇒) Let nonsingular $A$ have a (unique) LU factorization. We will show that its principal leading submatrices are nonsingular. Let
$$
\underbrace{\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix}}_{U}
$$
be the LU factorization of $A$, where $A_{TL}$, $L_{TL}$, and $U_{TL}$ are $k \times k$. Notice that $U$ cannot have a zero on the diagonal, since then $A$ would not have linearly independent columns. Now, the $k \times k$ principal leading submatrix $A_{TL}$ equals $A_{TL} = L_{TL} U_{TL}$, which is nonsingular since $L_{TL}$ has a unit diagonal and $U_{TL}$ has no zeroes on the diagonal. Since $k$ was chosen arbitrarily, this means that all principal leading submatrices are nonsingular.
(⇐) We will do a proof by induction on $n$.

Base case: $n = 1$. Then $A$ has the form $A = \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}$, where $\alpha_{11}$ is a scalar. Since the principal leading submatrices are nonsingular, $\alpha_{11} \neq 0$. Hence
$$
A = \underbrace{\begin{pmatrix} 1 \\ a_{21}/\alpha_{11} \end{pmatrix}}_{L} \underbrace{\alpha_{11}}_{U}
$$
is the LU factorization of $A$. This LU factorization is unique because the first element of $L$ must be 1.
Inductive step: Assume the result is true for all matrices with $n = k$. Show that it is true for matrices with $n = k+1$.

Let $A$ of size $n = k+1$ have nonsingular principal leading submatrices. Now, if an LU factorization of $A$ exists, $A = LU$, then it would have the form
$$
\underbrace{\begin{pmatrix} A_{00} & a_{01} \\ a_{10}^T & \alpha_{11} \\ A_{20} & a_{21} \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \\ L_{20} & l_{21} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{pmatrix}}_{U}.
\qquad (12.3)
$$
If we can show that the different parts of $L$ and $U$ exist and are unique, we are done. Equation (12.3) can be rewritten as
$$
\begin{pmatrix} A_{00} \\ a_{10}^T \\ A_{20} \end{pmatrix}
= \begin{pmatrix} L_{00} U_{00} \\ l_{10}^T U_{00} \\ L_{20} U_{00} \end{pmatrix}
\quad \mbox{and} \quad
\begin{pmatrix} a_{01} \\ \alpha_{11} \\ a_{21} \end{pmatrix}
= \begin{pmatrix} L_{00} u_{01} \\ l_{10}^T u_{01} + \upsilon_{11} \\ L_{20} u_{01} + l_{21} \upsilon_{11} \end{pmatrix}.
$$
Now, by the induction hypothesis $L_{00}$, $l_{10}^T$, $L_{20}$, and $U_{00}$ exist and are unique. So the question is whether $u_{01}$, $\upsilon_{11}$, and $l_{21}$ exist and are unique:

• $u_{01}$ exists and is unique. Since $L_{00}$ is nonsingular (it has ones on its diagonal), $L_{00} u_{01} = a_{01}$ has a solution that is unique.

• $\upsilon_{11}$ exists, is unique, and is nonzero. Since $l_{10}^T$ and $u_{01}$ exist and are unique, $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$ exists and is unique. It is also nonzero, since the principal leading submatrix of $A$ given by
$$
\begin{pmatrix} A_{00} & a_{01} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}
\begin{pmatrix} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{pmatrix}
$$
is nonsingular by assumption, and therefore $\upsilon_{11}$ must be nonzero.

• $l_{21}$ exists and is unique. Since $\upsilon_{11}$ exists and is nonzero, $l_{21} = a_{21}/\upsilon_{11}$ exists and is uniquely determined.

Thus the $m \times (k+1)$ matrix $A$ has a unique LU factorization.

By the Principle of Mathematical Induction the result holds.
Homework 12.12 Implement LU factorization with partial pivoting with the FLAME@lab API, in M-script.
* SEE ANSWER

12.6  LU with Complete Pivoting

LU factorization with partial pivoting builds on the insight that pivoting (rearranging) rows in a linear
system does not change the solution: if Ax = b then P(p)Ax = P(p)b, where p is a pivot vector. Now, if
r is another pivot vector, then notice that P(r)T P(r) = I (a simple property of pivot matrices) and AP(r)T
permutes the columns of A in exactly the same order as P(r)A permutes the rows of A.
What this means is that if Ax = b then P(p)AP(r)T [P(r)x] = P(p)b. This supports the idea that one
might want to not only permute rows of A, as in partial pivoting, but also columns of A. This is done in a
variation on LU factorization that is known as LU factorization with complete pivoting.
The idea is as follows: Given matrix $A$, partition
$$
A = \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}.
$$
Now, instead of finding the largest element in magnitude in the first column, find the largest element in magnitude in the entire matrix. Let's say it is element $(\pi_0, \rho_0)$. Then one permutes
$$
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}
:= P(\pi_0) \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} P(\rho_0)^T,
$$
making $\alpha_{11}$ the largest element in magnitude. This reduces the magnitude of the multipliers and hence element growth.

It can be shown that LU with complete pivoting indeed limits element growth much more effectively than partial pivoting does. The problem is that it requires $O(n^2)$ comparisons per iteration. Worse, it completely destroys the ability to utilize blocked algorithms, which attain much greater performance.
In practice LU with complete pivoting is not used.

12.7  Solving Ax = y Via the LU Factorization with Pivoting

Given a nonsingular matrix $A \in \mathbb{C}^{m \times m}$, the above discussions have yielded algorithms for computing a permutation matrix $P$, a unit lower triangular matrix $L$, and an upper triangular matrix $U$ such that $PA = LU$. We now discuss how these can be used to solve the system of linear equations $Ax = y$.

Starting with
$$Ax = y,$$
we multiply both sides of the equation by the permutation matrix $P$:
$$P A x = \underbrace{P y}_{\hat y},$$
and substitute $LU$ for $PA$:
$$L \underbrace{(U x)}_{z} = \hat y.$$
We now notice that we can solve the lower triangular system
$$L z = \hat y,$$
after which $x$ can be computed by solving the upper triangular system
$$U x = z.$$
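Assuming the packed representation of the previous sections ($L$'s multipliers below the diagonal of the overwritten $A$, $U$ on and above it, and a pivot vector of offsets), the three steps might be sketched in pure Python as follows (the function name is ours):

```python
def solve_via_lu(p, LU, y):
    """Solve A x = y given P(p) A = L U in packed form: first apply
    the pivots to y, then forward substitution with the unit lower
    triangular L, then back substitution with U."""
    n = len(LU)
    y = list(y)
    # yhat := P(p) y
    for i, pi in enumerate(p):
        y[i], y[i + pi] = y[i + pi], y[i]
    # Solve L z = yhat (unit diagonal, so no division)
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    # Solve U x = z
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y
```

Note that the factorization, which costs $O(n^3)$ flops, is done once; each subsequent right-hand side costs only $O(n^2)$.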

12.8  Solving Triangular Systems of Equations

12.8.1  Lz = y

First, we discuss solving $Lz = y$ where $L$ is a unit lower triangular matrix.

Variant 1

Consider $Lz = y$ where $L$ is unit lower triangular. Partition
$$
L \rightarrow \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}, \quad \mbox{and} \quad
y \rightarrow \begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}.
$$
Then
$$
\underbrace{\begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}}_{z}
=
\underbrace{\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}}_{y}.
$$


Algorithm: Solve $Lz = y$, overwriting $y$ (Variant 1)

Partition $L \rightarrow \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$
  where $L_{TL}$ is $0 \times 0$ and $y_T$ has 0 rows
while $m(L_{TL}) < m(L)$ do
  Repartition
  $$
  \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}, \quad
  \begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow
  \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}
  $$

  $y_2 := y_2 - \psi_1 l_{21}$

  Continue with the repartitioned matrix and vector as above
endwhile

Algorithm: Solve $Lz = y$, overwriting $y$ (Variant 2)

Partition and repartition as in Variant 1.
while $m(L_{TL}) < m(L)$ do

  $\psi_1 := \psi_1 - l_{10}^T y_0$

endwhile

Figure 12.5: Algorithms for the solution of a unit lower triangular system $Lz = y$ that overwrite $y$ with $z$.


Multiplying out the left-hand side yields
$$
\begin{pmatrix} \zeta_1 \\ \zeta_1 l_{21} + L_{22} z_2 \end{pmatrix}
=
\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}
$$
and the equalities
$$
\zeta_1 = \psi_1, \qquad \zeta_1 l_{21} + L_{22} z_2 = y_2,
$$
which can be rearranged as
$$
\zeta_1 = \psi_1, \qquad L_{22} z_2 = y_2 - \zeta_1 l_{21}.
$$
These insights justify the algorithm in Figure 12.5 (left), which overwrites y with the solution to Lz = y.
Variant 2

An alternative algorithm can be derived as follows: Partition
$$
L \rightarrow \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} z_0 \\ \zeta_1 \end{pmatrix}, \quad \mbox{and} \quad
y \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}.
$$
Then
$$
\underbrace{\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} z_0 \\ \zeta_1 \end{pmatrix}}_{z}
=
\underbrace{\begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}}_{y}.
$$
Multiplying out the left-hand side yields
$$
\begin{pmatrix} L_{00} z_0 \\ l_{10}^T z_0 + \zeta_1 \end{pmatrix}
=
\begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}
$$
and the equalities
$$
L_{00} z_0 = y_0, \qquad l_{10}^T z_0 + \zeta_1 = \psi_1.
$$
The idea now is as follows: Assume that the elements of $z_0$ were computed in previous iterations of the algorithm in Figure 12.5 (right), overwriting $y_0$. Then in the current iteration we can compute $\zeta_1 := \psi_1 - l_{10}^T z_0$, overwriting $\psi_1$.
Discussion

Notice that Variant 1 casts the computation in terms of an AXPY operation while Variant 2 casts it in terms
of DOT products.
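The two variants can be sketched side by side in pure Python (the function names are ours); both overwrite a copy of $y$ with $z$:

```python
def lz_solve_axpy(L, y):
    """Variant 1: once zeta1 = psi1 is final, immediately subtract
    zeta1 * l21 from the remainder of y (an AXPY per iteration)."""
    y = list(y)
    for j in range(len(L)):
        for i in range(j + 1, len(L)):
            y[i] -= L[i][j] * y[j]
    return y

def lz_solve_dot(L, y):
    """Variant 2: compute psi1 := psi1 - l10^T z0, where z0 has
    already overwritten y0 (a DOT product per iteration)."""
    y = list(y)
    for i in range(len(L)):
        y[i] -= sum(L[i][j] * y[j] for j in range(i))
    return y
```

Both perform the same flops; which is faster in practice depends on how the matrix is stored (by columns favors AXPY, by rows favors DOT).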


12.8.2  Ux = z

Next, we discuss solving $Ux = y$ where $U$ is an upper triangular matrix (with no assumptions about its diagonal entries).

Homework 12.13 Derive an algorithm for solving $Ux = y$, overwriting $y$ with the solution, that casts most computation in terms of DOT products. Hint: Partition
$$
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}.
$$
Call this Variant 1 and use Figure 12.6 to state the algorithm.
* SEE ANSWER
Homework 12.14 Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most
computation in terms of AXPY operations. Call this Variant 2 and use Figure 12.6 to state the algorithm.
* SEE ANSWER

12.9  Other LU Factorization Algorithms

There are actually five different (unblocked) algorithms for computing the LU factorization that were discovered over the course of the centuries.¹ The LU factorization in Figure 12.1 is sometimes called the classical LU factorization or the right-looking algorithm. We now briefly describe how to derive the other algorithms.
Finding the algorithms starts with the following observations.

• Our algorithms will overwrite the matrix $A$, and hence we introduce $\hat A$ to denote the original contents of $A$. We will say that the precondition for the algorithm is that
$$A = \hat A$$
($A$ starts by containing the original contents of $A$).

• We wish to overwrite $A$ with $L$ and $U$. Thus, the postcondition for the algorithm (the state in which we wish to exit the algorithm) is that
$$A = \{L\backslash U\} \ \wedge \ LU = \hat A$$
($A$ is overwritten by $L$ below the diagonal and $U$ on and above the diagonal, where multiplying $L$ and $U$ yields the original matrix $\hat A$).

• All the algorithms will march through the matrices from top-left to bottom-right. Thus, at a representative point in the algorithm, the matrices are viewed as quadrants:
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}, \quad \mbox{and} \quad
U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix},
$$
where $A_{TL}$, $L_{TL}$, and $U_{TL}$ are all square and equally sized.
¹ For a thorough discussion of the different LU factorization algorithms that also gives a historic perspective, we recommend Matrix Algorithms Volume 1 by G.W. Stewart [36].


Algorithm: Solve $Uz = y$, overwriting $y$ (Variant 1 / Variant 2)

Partition $U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$
  where $U_{BR}$ is $0 \times 0$ and $y_B$ has 0 rows
while $m(U_{BR}) < m(U)$ do
  Repartition
  $$
  \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}, \quad
  \begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow
  \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}
  $$

  (updates left blank, to be filled in as part of Homeworks 12.13 and 12.14)

  Continue with the repartitioned matrix and vector as above
endwhile

Figure 12.6: Algorithms for the solution of an upper triangular system $Ux = y$ that overwrite $y$ with $x$.


In terms of these exposed quadrants, in the end we wish for matrix $A$ to contain
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & L\backslash U_{BR} \end{pmatrix}
\quad \mbox{where} \quad
\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}
\begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix}
=
\begin{pmatrix} \hat A_{TL} & \hat A_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{pmatrix}.
$$
Manipulating this yields what we call the Partitioned Matrix Expression (PME), which can be viewed as a recursive definition of the LU factorization:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & L\backslash U_{BR} \end{pmatrix}
\quad \wedge \quad
\begin{array}{ll}
L_{TL} U_{TL} = \hat A_{TL}, & L_{TL} U_{TR} = \hat A_{TR}, \\
L_{BL} U_{TL} = \hat A_{BL}, & L_{BL} U_{TR} + L_{BR} U_{BR} = \hat A_{BR}.
\end{array}
$$

Now, consider the code skeleton for the LU factorization in Figure 12.7. At the top of the loop (right after the while), we want to maintain certain contents in matrix $A$. Since we are in a loop, we haven't yet overwritten $A$ with the final result. Instead, some progress toward this final result has been made. The way we can find the state of $A$ that we would like to maintain is to take the PME and delete subexpressions. For example, consider the following condition on the contents of $A$:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \hat A_{BR} - L_{BL} U_{TR} \end{pmatrix}
\quad \wedge \quad
L_{TL} U_{TL} = \hat A_{TL}, \ L_{TL} U_{TR} = \hat A_{TR}, \ L_{BL} U_{TL} = \hat A_{BL},
$$
obtained by deleting the subexpression $L_{BL} U_{TR} + L_{BR} U_{BR} = \hat A_{BR}$ from the PME. What we are saying is that $A_{TL}$, $A_{TR}$, and $A_{BL}$ have been completely updated with the corresponding parts of $L$ and $U$, and $A_{BR}$ has been partially updated. This is exactly the state that the algorithm we discussed previously in this document maintains! What is left is to factor $A_{BR}$, since it contains $\hat A_{BR} - L_{BL} U_{TR}$, and $\hat A_{BR} - L_{BL} U_{TR} = L_{BR} U_{BR}$.

By carefully analyzing the order in which computation must occur (in compiler lingo: by performing a dependence analysis), we can identify five states that can be maintained at the top of the loop by deleting subexpressions from the PME. These are called loop invariants and are listed in Figure 12.8.

Key to figuring out what updates must occur in the loop for each of the variants is to look at how the
matrices are repartitioned at the top and bottom of the loop body.


Algorithm: $A := \mbox{LU}(A)$

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $L \rightarrow \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix}$, $U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix}$
  where $A_{TL}$, $L_{TL}$, and $U_{TL}$ are $0 \times 0$
while $m(A_{TL}) < m(A)$ do
  Repartition
  $$
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}, \quad
  \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}
  $$
  where $\alpha_{11}$, $\lambda_{11}$, and $\upsilon_{11}$ are $1 \times 1$

  (updates for the chosen variant go here)

  Continue with the repartitioned matrices as above
endwhile

Figure 12.7: Code skeleton for LU factorization.

For each variant, the loop invariant consists of the contents maintained in $A$ at the top of the loop, together with the constraints (in terms of the original matrix $\hat A$) defining the parts of $L$ and $U$ computed so far:

Variant 1 (Bordered):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & \hat A_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}
$$

Variant 2 (Left-looking):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & \hat A_{TR} \\ L_{BL} & \hat A_{BR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}, \ L_{BL} U_{TL} = \hat A_{BL}
$$

Variant 3 (Up-looking):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}, \ L_{TL} U_{TR} = \hat A_{TR}
$$

Variant 4 (Crout variant):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \hat A_{BR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}, \ L_{TL} U_{TR} = \hat A_{TR}, \ L_{BL} U_{TL} = \hat A_{BL}
$$

Variant 5 (Classical LU):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \hat A_{BR} - L_{BL} U_{TR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}, \ L_{TL} U_{TR} = \hat A_{TR}, \ L_{BL} U_{TL} = \hat A_{BL}
$$

Figure 12.8: Loop invariants for various LU factorization algorithms.

12.9.1  Variant 1: Bordered algorithm

Consider the loop invariant:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L\backslash U_{TL} & \hat A_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{pmatrix}
\ \wedge \ L_{TL} U_{TL} = \hat A_{TL}.
$$
At the top of the loop, after repartitioning, $A$ contains
$$
\begin{pmatrix} L\backslash U_{00} & \hat a_{01} & \hat A_{02} \\ \hat a_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\ \hat A_{20} & \hat a_{21} & \hat A_{22} \end{pmatrix},
$$
while at the bottom it must contain
$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & \hat A_{02} \\ l_{10}^T & \upsilon_{11} & \hat a_{12}^T \\ \hat A_{20} & \hat a_{21} & \hat A_{22} \end{pmatrix},
$$
where the entries that differ between the two displays are to be computed. Now, considering $LU = \hat A$, we notice that
$$
\begin{array}{lll}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22},
\end{array}
$$
where the parts computed in earlier iterations are already known. The equalities $L_{00} u_{01} = \hat a_{01}$, $l_{10}^T U_{00} = \hat a_{10}^T$, and $l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11}$ can be used to compute the desired parts of $L$ and $U$:

• Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result.

• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.

Homework 12.15 If $A$ is an $n \times n$ matrix, show that the cost of Variant 1 is approximately $\frac{2}{3} n^3$ flops.
* SEE ANSWER

12.9.2  Variant 2: Left-looking algorithm

Consider the loop invariant:

AT L
ABL

AT R
ABR

L\UT L

bT R
A

LBL

bBR
A

bT L
LT LUT L = A
bBL
LBLUT L = A

(
((b(
(T(
LT(
(
LU
R = AT R
(
(
((((
(
(
b
(
L (U
R + LBRUBR = ABR
(T(
(BL

Chapter 12. Notes on Gaussian Elimination and LU Factorization

238

At the top of the loop, after repartitioning, A contains
\[
\begin{pmatrix}
L\backslash U_{00} & \hat a_{01} & \hat A_{02} \\
l_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\
L_{20} & \hat a_{21} & \hat A_{22}
\end{pmatrix}
\]
while at the bottom it must contain
\[
\begin{pmatrix}
L\backslash U_{00} & u_{01} & \hat A_{02} \\
l_{10}^T & \upsilon_{11} & \hat a_{12}^T \\
L_{20} & l_{21} & \hat A_{22}
\end{pmatrix}
\]
where the entries in blue are to be computed. Now, considering $LU = \hat A$ we notice that
\[
\begin{array}{lll}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}
\end{array}
\]
The equalities in yellow can be used to compute the desired parts of L and U:

- Solve $L_{00} u_{01} = \hat a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result.
- Compute $\upsilon_{11} = \hat\alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
- Compute $l_{21} := (\hat a_{21} - L_{20} u_{01})/\upsilon_{11}$, overwriting $a_{21}$ with the result.
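The left-looking variant touches exactly one column per iteration, which the following pure-Python sketch (mine, not the notes' FLAME@lab code, and assuming no pivoting is required) makes explicit:

```python
def lu_left_looking(A):
    """Variant 2 (left-looking) LU, overwriting A with L\\U column by column."""
    n = len(A)
    for j in range(n):
        # Solve L00 u01 = a01: forward substitution on the part of
        # column j above the diagonal (L00 has unit diagonal).
        for r in range(j):
            for k in range(r):
                A[r][j] -= A[r][k] * A[k][j]
        # upsilon11 = alpha11 - l10^T u01 and a21 := a21 - L20 u01,
        # done in one sweep over rows j..n-1 of column j.
        for r in range(j, n):
            for k in range(j):
                A[r][j] -= A[r][k] * A[k][j]
        # l21 := (a21 - L20 u01) / upsilon11.
        for r in range(j + 1, n):
            A[r][j] /= A[j][j]
    return A
```

Because only column j is written in iteration j, this ordering is attractive when the matrix is stored by columns.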

12.9.3 Variant 3: Up-looking variant

Homework 12.16 Derive the up-looking variant for computing the LU factorization.
* SEE ANSWER

12.9.4 Variant 4: Crout variant

Consider the loop invariant:
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \hat A_{BR} \end{pmatrix}
\qquad \wedge \qquad
L_{TL} U_{TL} = \hat A_{TL}, \quad L_{TL} U_{TR} = \hat A_{TR}, \quad L_{BL} U_{TL} = \hat A_{BL},
\]
where the remaining condition, $L_{BL} U_{TR} + L_{BR} U_{BR} = \hat A_{BR}$, is struck from the invariant: it does not (yet) hold.

At the top of the loop, after repartitioning, A contains
\[
\begin{pmatrix}
L\backslash U_{00} & u_{01} & U_{02} \\
l_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\
L_{20} & \hat a_{21} & \hat A_{22}
\end{pmatrix}
\]
while at the bottom it must contain
\[
\begin{pmatrix}
L\backslash U_{00} & u_{01} & U_{02} \\
l_{10}^T & \upsilon_{11} & u_{12}^T \\
L_{20} & l_{21} & \hat A_{22}
\end{pmatrix}
\]
where the entries in blue are to be computed. Now, considering $LU = \hat A$ we notice that
\[
\begin{array}{lll}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}
\end{array}
\]
The equalities in yellow can be used to compute the desired parts of L and U:

- Compute $\upsilon_{11} = \hat\alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
- Compute $l_{21} := (\hat a_{21} - L_{20} u_{01})/\upsilon_{11}$, overwriting $a_{21}$ with the result.
- Compute $u_{12}^T := \hat a_{12}^T - l_{10}^T U_{02}$, overwriting $a_{12}^T$ with the result.
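The Crout variant computes, in step j, the jth column of L and the jth row of U. A pure-Python sketch (my illustration, no pivoting) of these three updates:

```python
def lu_crout(A):
    """Variant 4 (Crout): step j finalizes column j of L and row j of U."""
    n = len(A)
    for j in range(n):
        # upsilon11 = alpha11 - l10^T u01 and a21 := a21 - L20 u01:
        # update column j from the diagonal down.
        for r in range(j, n):
            for k in range(j):
                A[r][j] -= A[r][k] * A[k][j]
        # l21 := (a21 - L20 u01) / upsilon11.
        for r in range(j + 1, n):
            A[r][j] /= A[j][j]
        # u12^T := a12^T - l10^T U02: update row j to the right of the diagonal.
        for c in range(j + 1, n):
            for k in range(j):
                A[j][c] -= A[j][k] * A[k][c]
    return A
```

Note that, in contrast with the classical (right-looking) variant, the trailing submatrix A22 is never touched until its own row and column come up, which is why the Crout variant is convenient for the error analysis cited in Section 12.10.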

12.9.5 Variant 5: Classical LU factorization

We have already derived this algorithm. You may want to try rederiving it using the techniques discussed in this section.

12.9.6 All algorithms

All five algorithms for LU factorization are summarized in Figure 12.9.


Homework 12.17 Implement all five LU factorization algorithms with the FLAME@lab API, in M-script.
* SEE ANSWER
Homework 12.18 Which of the five variants can be modified to incorporate partial pivoting?
* SEE ANSWER

12.9.7 Formal derivation of algorithms

The described approach to deriving algorithms, linking the process to the a priori identification of loop
invariants, was first proposed in [25]. It was refined into what we call the worksheet for deriving algorithms hand-in-hand with their proofs of correctness, in [4]. A book that describes the process at a level
also appropriate for the novice is The Science of Programming Matrix Computations [40].


Algorithm: A := L\U = LU(A)
Partition A → ( ATL ATR ; ABL ABR ), where ATL is 0 × 0
while n(ATL) < n(A) do
  Repartition ( ATL ATR ; ABL ABR ) → ( A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ), where α11 is 1 × 1
  (A00 contains L00 and U00 in its strictly lower and upper triangular part, respectively.)

  Variant 1 (Bordered):      a01 := L00^{-1} a01 ; a10^T := a10^T U00^{-1} ; α11 := α11 − a10^T a01
  Variant 2 (Left-looking):  a01 := L00^{-1} a01 ; α11 := α11 − a10^T a01 ; a21 := (a21 − A20 a01)/α11
  Variant 3 (Up-looking):    exercise in Homework 12.16
  Variant 4 (Crout):         α11 := α11 − a10^T a01 ; a21 := (a21 − A20 a01)/α11 ; a12^T := a12^T − a10^T A02
  Variant 5 (Classical LU):  a21 := a21/α11 ; A22 := A22 − a21 a12^T

  Continue with ( ATL ATR ; ABL ABR ) ← ( A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 )
endwhile

Figure 12.9: All five LU factorization algorithms.

12.10 Numerical Stability Results

The numerical stability of various LU factorization algorithms as well as the triangular solve algorithms
can be found in standard graduate level numerical linear algebra texts and references [22, 26, 36]. Of
particular interest may be the analysis of the Crout variant (Variant 4) in [9], since it uses our notation as
well as the results in Notes on Numerical Stability. (We recommend the technical report version [7] of
the paper, since it has more details as well as exercises to help the reader understand.) In that paper, a
systematic approach towards the derivation of backward error results is given that mirrors the systematic
approach to deriving the algorithms given in [?, 4, 40].
Here are pertinent results from that paper, assuming floating point arithmetic obeys the model of computation given in Notes on Numerical Stability (as well as [9, 7, 26]). It is assumed that the reader is
familiar with those notes.
Theorem 12.19 Let $A \in \mathbb{R}^{n \times n}$ and let the LU factorization of A be computed via the Crout variant (Variant 4), yielding approximate factors $\hat L$ and $\hat U$. Then
\[
A + \Delta A = \hat L \hat U \quad\text{with}\quad |\Delta A| \le \gamma_n |\hat L| |\hat U|.
\]

Theorem 12.20 Let $L \in \mathbb{R}^{n \times n}$ be lower triangular and $y, z \in \mathbb{R}^n$ with $Lz = y$. Let $\hat z$ be the approximate solution that is computed. Then
\[
(L + \Delta L)\hat z = y \quad\text{with}\quad |\Delta L| \le \gamma_n |L|.
\]

Theorem 12.21 Let $U \in \mathbb{R}^{n \times n}$ be upper triangular and $x, z \in \mathbb{R}^n$ with $Ux = z$. Let $\hat x$ be the approximate solution that is computed. Then
\[
(U + \Delta U)\hat x = z \quad\text{with}\quad |\Delta U| \le \gamma_n |U|.
\]

Theorem 12.22 Let $A \in \mathbb{R}^{n \times n}$ and $x, y \in \mathbb{R}^n$ with $Ax = y$. Let $\hat x$ be the approximate solution computed via the following steps:
- Compute the LU factorization, yielding approximate factors $\hat L$ and $\hat U$.
- Solve $\hat L z = y$, yielding approximate solution $\hat z$.
- Solve $\hat U x = \hat z$, yielding approximate solution $\hat x$.
Then
\[
(A + \Delta A)\hat x = y \quad\text{with}\quad |\Delta A| \le (3\gamma_n + \gamma_n^2) |\hat L| |\hat U|.
\]

The analysis of LU factorization without partial pivoting is related to that of LU factorization with partial pivoting. We have shown that LU with partial pivoting is equivalent to the LU factorization without partial pivoting on a pre-permuted matrix: PA = LU, where P is a permutation matrix. The permutation doesn't involve any floating point operations and therefore does not generate error. It can therefore be argued that, as a result, the error that is accumulated is equivalent with or without partial pivoting.

12.11 Is LU with Partial Pivoting Stable?

Homework 12.23 Apply LU with partial pivoting to
\[
A =
\begin{pmatrix}
 1 &  0 &  0 & \cdots & 0 & 1 \\
-1 &  1 &  0 & \cdots & 0 & 1 \\
-1 & -1 &  1 & \cdots & 0 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & -1 & \cdots & 1 & 1 \\
-1 & -1 & -1 & \cdots & -1 & 1
\end{pmatrix}.
\]
Pivot only when necessary.


* SEE ANSWER
From this exercise we conclude that even LU factorization with partial pivoting can yield large (exponential) element growth in U. You may enjoy the collection of problems for which Gaussian elimination with partial pivoting is unstable by Stephen Wright [48].
In practice, such growth does not seem to happen, and LU factorization with partial pivoting is considered to be stable.
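The growth in Homework 12.23 is easy to observe numerically. The sketch below (my illustration, not from the notes) builds the matrix for n = 6 and runs Gaussian elimination; no pivoting is ever triggered for this matrix, since each pivot already has the largest magnitude in its column, and the last column of U doubles with every step:

```python
def growth_matrix(n):
    """1 on the diagonal, -1 strictly below it, 1 in the last column."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
        A[i][n - 1] = 1.0
        for j in range(i):
            A[i][j] = -1.0
    return A

def eliminate(A):
    """Plain Gaussian elimination; overwrites A with U (multipliers discarded)."""
    n = len(A)
    for j in range(n):
        for i in range(j + 1, n):
            m = A[i][j] / A[j][j]
            for k in range(j, n):
                A[i][k] -= m * A[j][k]
    return A

U = eliminate(growth_matrix(6))
print([row[-1] for row in U])  # last column of U: 1, 2, 4, 8, 16, 32
```

For general n the largest element of U is $2^{n-1}$, which is the exponential growth the exercise demonstrates.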

12.12 Blocked Algorithms

It is well-known that matrix-matrix multiplication can achieve high performance on most computer architectures [2, 23, 20]. As a result, many dense matrix algorithms are reformulated to be rich in matrix-matrix
multiplication operations. An interface to a library of such operations is known as the level-3 Basic Linear
Algebra Subprograms (BLAS) [18]. In this section, we show how LU factorization can be rearranged so
that most computation is in matrix-matrix multiplications.

12.12.1 Blocked classical LU factorization (Variant 5)

Partition A, L, and U as follows:
\[
A \rightarrow \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}, \quad\text{and}\quad
U \rightarrow \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix},
\]
where $A_{11}$, $L_{11}$, and $U_{11}$ are $b \times b$. Then $A = LU$ means that
\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} U_{11} & L_{11} U_{12} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} \end{pmatrix}.
\]
This means that
\[
A_{11} = L_{11} U_{11} \qquad
A_{12} = L_{11} U_{12} \qquad
A_{21} = L_{21} U_{11} \qquad
A_{22} = L_{21} U_{12} + L_{22} U_{22}
\]


Algorithm: A := LU_BLK(A)
Partition A → ( ATL ATR ; ABL ABR ), where ATL is 0 × 0
while m(ATL) < m(A) do
  Determine block size b
  Repartition ( ATL ATR ; ABL ABR ) → ( A00 A01 A02 ; A10 A11 A12 ; A20 A21 A22 ), where A11 is b × b
  (A00 contains L00 and U00 in its strictly lower and upper triangular part, respectively.)

  Variant 1: A01 := L00^{-1} A01 ; A10 := A10 U00^{-1} ; A11 := A11 − A10 A01 ; A11 := LU(A11)
  Variant 2: A01 := L00^{-1} A01 ; A11 := A11 − A10 A01 ; A11 := LU(A11) ; A21 := (A21 − A20 A01) U11^{-1}
  Variant 3: A10 := A10 U00^{-1} ; A11 := A11 − A10 A01 ; A11 := LU(A11) ; A12 := L11^{-1} (A12 − A10 A02)
  Variant 4: A11 := A11 − A10 A01 ; A11 := LU(A11) ; A21 := (A21 − A20 A01) U11^{-1} ; A12 := L11^{-1} (A12 − A10 A02)
  Variant 5: A11 := LU(A11) ; A21 := A21 U11^{-1} ; A12 := L11^{-1} A12 ; A22 := A22 − A21 A12

  Continue with ( ATL ATR ; ABL ABR ) ← ( A00 A01 A02 ; A10 A11 A12 ; A20 A21 A22 )
endwhile

Figure 12.10: Blocked algorithms for computing the LU factorization.


Algorithm: [A, p] := LU_PIV_BLK(A, p)
Partition A → ( ATL ATR ; ABL ABR ), p → ( pT ; pB ), where ATL is 0 × 0 and pT has 0 elements
while n(ATL) < n(A) do
  Determine block size b
  Repartition ( ATL ATR ; ABL ABR ) → ( A00 A01 A02 ; A10 A11 A12 ; A20 A21 A22 ),
              ( pT ; pB ) → ( p0 ; p1 ; p2 ), where A11 is b × b and p1 has b elements

  Variant 2:
    A01 := L00^{-1} A01
    A11 := A11 − A10 A01
    A21 := A21 − A20 A01
    [ ( A11 ; A21 ), p1 ] := LUpiv( ( A11 ; A21 ) )
    ( A10 A12 ; A20 A22 ) := P(p1) ( A10 A12 ; A20 A22 )

  Variant 4:
    A11 := A11 − A10 A01
    A21 := A21 − A20 A01
    [ ( A11 ; A21 ), p1 ] := LUpiv( ( A11 ; A21 ) )
    ( A10 A12 ; A20 A22 ) := P(p1) ( A10 A12 ; A20 A22 )
    A12 := A12 − A10 A02
    A12 := L11^{-1} A12

  Variant 5:
    [ ( A11 ; A21 ), p1 ] := LUpiv( ( A11 ; A21 ) )
    ( A10 A12 ; A20 A22 ) := P(p1) ( A10 A12 ; A20 A22 )
    A12 := L11^{-1} A12
    A22 := A22 − A21 A12

  Continue with ( ATL ATR ; ABL ABR ) ← ( A00 A01 A02 ; A10 A11 A12 ; A20 A21 A22 ),
                ( pT ; pB ) ← ( p0 ; p1 ; p2 )
endwhile

Figure 12.11: Blocked algorithms for computing the LU factorization with partial pivoting.


or, equivalently,
\[
A_{11} = L_{11} U_{11} \qquad
A_{12} = L_{11} U_{12} \qquad
A_{21} = L_{21} U_{11} \qquad
A_{22} - L_{21} U_{12} = L_{22} U_{22}.
\]
If we let L and U overwrite the original matrix A, this suggests the algorithm:

- Compute the LU factorization $A_{11} = L_{11} U_{11}$, overwriting $A_{11}$ with $L\backslash U_{11}$. Notice that any of the unblocked algorithms previously discussed in this note can be used for this factorization.
- Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be expressed as $A_{12} := L_{11}^{-1} A_{12}$.)
- Solve $L_{21} U_{11} = A_{21}$, overwriting $A_{21}$ with $L_{21}$. (This can also be expressed as $A_{21} := A_{21} U_{11}^{-1}$.)
- Update $A_{22} := A_{22} - A_{21} A_{12}$.
- Continue by overwriting the updated $A_{22}$ with its LU factorization.

If b is small relative to n, then most computation is in the last step, which is a matrix-matrix multiplication.
Similarly, blocked algorithms for the other variants can be derived. All are given in Figure 12.10.
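The four steps of the blocked Variant 5 can be prototyped directly. The following pure-Python sketch (my illustration only; a real implementation would call level-3 BLAS for the triangular solves and the rank-b update) factors one b × b panel at a time, without pivoting:

```python
def lu_unblocked(A, lo, hi):
    """Classical LU on the diagonal block A[lo:hi, lo:hi], in place."""
    for j in range(lo, hi):
        for i in range(j + 1, hi):
            A[i][j] /= A[j][j]
            for k in range(j + 1, hi):
                A[i][k] -= A[i][j] * A[j][k]

def lu_blocked(A, b):
    """Blocked Variant 5: factor panel, two triangular solves, rank-b update."""
    n = len(A)
    for p in range(0, n, b):
        q = min(p + b, n)
        lu_unblocked(A, p, q)                      # A11 := LU(A11)
        for i in range(q, n):                      # A21 := A21 U11^{-1}
            for j in range(p, q):
                for k in range(p, j):
                    A[i][j] -= A[i][k] * A[k][j]
                A[i][j] /= A[j][j]
        for i in range(p, q):                      # A12 := L11^{-1} A12
            for j in range(q, n):
                for k in range(p, i):
                    A[i][j] -= A[i][k] * A[k][j]
        for i in range(q, n):                      # A22 := A22 - A21 A12
            for j in range(q, n):
                for k in range(p, q):
                    A[i][j] -= A[i][k] * A[k][j]
    return A
```

In a tuned implementation the last loop nest, which performs roughly $2(n-q)^2 b$ flops per iteration, dominates and is cast as one matrix-matrix multiplication.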

12.12.2 Blocked classical LU factorization with pivoting (Variant 5)

Pivoting can be added to some of the blocked algorithms. Let us focus once again on Variant 5.
Partition A, L, and U as follows:
\[
A \rightarrow \begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix},\quad
L \rightarrow \begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & L_{22} \end{pmatrix},\quad\text{and}\quad
U \rightarrow \begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & U_{11} & U_{12} \\ 0 & 0 & U_{22} \end{pmatrix},
\]
where $A_{00}$, $L_{00}$, and $U_{00}$ are $k \times k$, and $A_{11}$, $L_{11}$, and $U_{11}$ are $b \times b$.
Assume that the computation has proceeded to the point where A contains
\[
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & \hat A_{11} - L_{10} U_{01} & \hat A_{12} - L_{10} U_{02} \\ L_{20} & \hat A_{21} - L_{20} U_{01} & \hat A_{22} - L_{20} U_{02} \end{pmatrix}
\]
where, as before, $\hat A$ denotes the original contents of A and
\[
P(p_0) \begin{pmatrix} \hat A_{00} & \hat A_{01} & \hat A_{02} \\ \hat A_{10} & \hat A_{11} & \hat A_{12} \\ \hat A_{20} & \hat A_{21} & \hat A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & I & 0 \\ L_{20} & 0 & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & A_{11} & A_{12} \\ 0 & A_{21} & A_{22} \end{pmatrix}.
\]
In the current blocked step, we now perform the following computations:


- Compute the LU factorization with pivoting of the current panel:
\[
P(p_1) \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} = \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} U_{11},
\]
overwriting $A_{11}$ with $L\backslash U_{11}$ and $A_{21}$ with $L_{21}$.
- Correspondingly, swap rows in the remainder of the matrix:
\[
\begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix} := P(p_1) \begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix}
\]
- Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be more concisely written as $A_{12} := L_{11}^{-1} A_{12}$.)
- Update $A_{22} := A_{22} - A_{21} A_{12}$.

Careful consideration shows that this puts the matrix A in the state
\[
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & L\backslash U_{11} & U_{12} \\ L_{20} & L_{21} & \hat A_{22} - L_{20} U_{02} - L_{21} U_{12} \end{pmatrix}
\]
where
\[
P\!\begin{pmatrix} p_0 \\ p_1 \end{pmatrix}
\begin{pmatrix} \hat A_{00} & \hat A_{01} & \hat A_{02} \\ \hat A_{10} & \hat A_{11} & \hat A_{12} \\ \hat A_{20} & \hat A_{21} & \hat A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & U_{11} & U_{12} \\ 0 & 0 & A_{22} \end{pmatrix}.
\]
Similarly, blocked algorithms with pivoting for some of the other variants can be derived. All are given in Figure 12.11.

12.13 Variations on a Triple-Nested Loop

All LU factorization algorithms presented in this note perform exactly the same floating point operations (with some rearrangement of data thrown in for the algorithms that perform pivoting) as does the triple-nested loop that implements Gaussian elimination:

for j = 0, . . . , n−1                              (zero the elements below the (j, j) element)
    for i = j+1, . . . , n−1
        λ_{i,j} := α_{i,j} / α_{j,j}                (compute multiplier λ_{i,j}, overwriting α_{i,j})
        for k = j+1, . . . , n−1                    (subtract λ_{i,j} times the jth row from the ith row)
            α_{i,k} := α_{i,k} − λ_{i,j} α_{j,k}
        endfor
    endfor
endfor
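As a concreteness check, the loop above transliterates almost one-for-one into executable code; this pure-Python version (my addition) overwrites the strictly lower triangular part of A with the multipliers and the rest with U:

```python
def gaussian_elimination(A):
    """Triple-nested loop; overwrites alpha_ij with the multiplier lambda_ij."""
    n = len(A)
    for j in range(n):                    # zero the elements below the (j, j) element
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]            # compute multiplier, overwriting alpha_ij
            for k in range(j + 1, n):     # subtract lambda_ij times row j from row i
                A[i][k] -= A[i][j] * A[j][k]
    return A
```

Reordering the three loops (and adjusting the loop bodies accordingly) yields the five variants discussed in Section 12.9.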

12.14 Inverting a Matrix

12.14.1 Basic observations

If the n × n matrix A is invertible (nonsingular, has linearly independent columns, etc.), then its inverse, $X = A^{-1}$, satisfies
\[
AX = I,
\]
where I is the identity matrix. Partitioning X and I by columns, we find that
\[
A \begin{pmatrix} x_0 & \cdots & x_{n-1} \end{pmatrix} = \begin{pmatrix} e_0 & \cdots & e_{n-1} \end{pmatrix},
\]
where $e_j$ equals the jth standard basis vector. Thus $A x_j = e_j$, and hence any method for solving $Ax = y$ can be used to solve each of these problems, of which there are n.

12.14.2 Via the LU factorization with pivoting

We have learned how to compute the LU factorization with partial pivoting, yielding P, L, and U such that PA = LU, at an approximate cost of $\frac{2}{3} n^3$ flops.
Obviously, one can then solve $A x_j = e_j$ for each column of X or, equivalently,
\[
L (U x_j) = P e_j.
\]
The cost of this would be approximately $2n^2$ flops per column $x_j$, for a total of $2n^3$ flops. This is in addition to the cost of factoring A in the first place, for a grand total of $\frac{8}{3} n^3$ flops.
Alternatively, one can manipulate $AX = I$ into
\[
LUX = P
\]
or, equivalently,
\[
UX = L^{-1} P
\]
and finally
\[
U (X P^T) = L^{-1}.
\]
This suggests the steps:


- Compute P, L, and U such that PA = LU.
  This costs approximately $\frac{2}{3} n^3$ flops.
- Invert L, yielding the unit lower triangular matrix $L^{-1}$.
  Homework 12.26 will show that this can be done at a cost of approximately $\frac{1}{3} n^3$ flops.
- Solve $UY = L^{-1}$ for Y.
  Homework exercises will show that this can be done at a cost of approximately $n^3$ flops. (I need to check this!)
- Permute the columns of the result: $X = Y P$.
  This incurs an $O(n^2)$ cost.

The total cost now is $2n^3$ flops.
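As a sanity check on the first (column-by-column) approach, here is a pure-Python sketch of my own that factors A once and then solves $L(Ux_j) = e_j$ for each column. For simplicity it omits pivoting, so it assumes A needs none:

```python
def invert_via_lu(A):
    """Invert A by one LU factorization plus n forward/back solves (no pivoting)."""
    n = len(A)
    # Factor: A is overwritten with L (unit diagonal, strictly below) and U.
    for j in range(n):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
            for k in range(j + 1, n):
                A[i][k] -= A[i][j] * A[j][k]
    X = []
    for j in range(n):                        # solve A x_j = e_j, column by column
        z = [1.0 if i == j else 0.0 for i in range(n)]
        for i in range(n):                    # forward substitution: L z = e_j
            for k in range(i):
                z[i] -= A[i][k] * z[k]
        for i in reversed(range(n)):          # back substitution: U x_j = z
            for k in range(i + 1, n):
                z[i] -= A[i][k] * z[k]
            z[i] /= A[i][i]
        X.append(z)
    # The rows of X are the columns x_j; transpose to obtain the inverse.
    return [[X[j][i] for j in range(n)] for i in range(n)]
```

Each of the n solves costs roughly $2n^2$ flops, matching the $2n^3$ total quoted above for the solve phase.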
Homework 12.24 Let L be a unit lower triangular matrix partitioned as
\[
L = \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}.
\]
Show that
\[
L^{-1} = \begin{pmatrix} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{pmatrix}.
\]
* SEE ANSWER

Homework 12.25 Use Homework 12.24 and Figure 12.12 to propose an algorithm for overwriting a unit
lower triangular matrix L with its inverse.
* SEE ANSWER
Homework 12.26 Show that the cost of the algorithm in Homework 12.25 is approximately $\frac{1}{3} n^3$ flops, where L is n × n.
* SEE ANSWER
Homework 12.27 Given an n × n upper triangular matrix U and an n × n unit lower triangular matrix L, propose an algorithm for computing Y where UY = L. Be sure to take advantage of zeros in U and L. The cost of the resulting algorithm should be approximately $n^3$ flops.
There is a way of, given that L and U have overwritten matrix A, overwriting this matrix with Y. Try to find that algorithm. It is not easy...
* SEE ANSWER

12.14.3 Gauss-Jordan inversion

The above discussion breaks the computation of A1 down into several steps. Alternatively, one can use
variants of the Gauss-Jordan algorithm to invert a matrix, also at a cost of 2n3 flops. For details, see
Chapter 8 of Linear Algebra: Foundations to Frontiers [30] and/or [33]. A thorough discussion of
algorithms for inverting symmetric positive definite (SPD) matrices, which also discusses algorithms for
inverting triangular matrices, can be found in [5].


Algorithm: [L] := LINV_UNB_VAR1(L)
Partition L → ( LTL LTR ; LBL LBR ), where LTL is 0 × 0
while m(LTL) < m(L) do
  Repartition ( LTL LTR ; LBL LBR ) → ( L00 l01 L02 ; l10^T λ11 l12^T ; L20 l21 L22 ), where λ11 is 1 × 1

    (updates to be filled in as part of Homework 12.25)

  Continue with ( LTL LTR ; LBL LBR ) ← ( L00 l01 L02 ; l10^T λ11 l12^T ; L20 l21 L22 )
endwhile

Figure 12.12: Algorithm skeleton for inverting a unit lower triangular matrix.

12.14.4 (Almost) never, ever invert a matrix

There are very few applications for which explicit inversion of a matrix is needed. When talking about solving Ax = y, the term "inverting the matrix" is often used, since $x = A^{-1} y$. However, notice that inverting the matrix and then multiplying it times vector y requires $2n^3$ flops for the inversion and then $2n^2$ flops for the multiplication. In contrast, computing the LU factorization requires $\frac{2}{3} n^3$ flops, after which the solves with L and U require an additional $2n^2$ flops. Thus, it is essentially always beneficial to instead compute the LU factorization and to use it to solve for x.
There are also numerical stability benefits to this, which we will not discuss now.

12.15 Efficient Condition Number Estimation

Given that the condition number of a matrix tells one how accurately one can expect the solution to Ax = y
to be when y is slightly perturbed, a routine that solves this problem should ideally also return the condition
number of the matrix so that a user can assess the quality of the solution (or, rather, the difficulty of the
posed problem).
Here, we briefly discuss simple techniques that underlie the more sophisticated methods used by LAPACK, focusing on $A \in \mathbb{R}^{n \times n}$.

12.15.1 The problem

The problem is to estimate $\kappa(A) = \|A\| \, \|A^{-1}\|$ for some matrix norm induced by a vector norm. If A is a dense matrix, then inverting the matrix requires approximately $2n^3$ flops, which is more expensive than solving Ax = y via, for example, LU factorization with partial pivoting. A second problem is that computing the 2-norms $\|A\|_2$ and $\|A^{-1}\|_2$ is less than straightforward and computationally expensive.

12.15.2 Insights

The second problem can be solved by choosing either the 1-norm or the ∞-norm. The first problem can be solved by settling for an estimate of the condition number. For this we will try to find a lower bound close to the actual condition number, since then the user gains insight into how many digits of accuracy might be lost, and will know the problem is poorly conditioned if the estimate is large. If the estimate is small, the jury is out on whether the matrix is actually well-conditioned.
Notice that
\[
\|A^{-1}\|_\infty = \max_{\|y\|_\infty = 1} \|A^{-1} y\|_\infty \ge \|A^{-1} d\|_\infty = \|x\|_\infty,
\]
where $Ax = d$ and $\|d\|_\infty = 1$. So, we want to find a vector (direction) d with $\|d\|_\infty = 1$ that has the property that $\|x\|_\infty$ is nearly maximally magnified. Remember from the proof of the fact that
\[
\|A\|_\infty = \max_j \|\hat a_j\|_1
\]
(where $\hat a_j^T$ equals the jth row of A) that the maximum
\[
\|A\|_\infty = \max_{\|y\|_\infty = 1} \|A y\|_\infty
\]
is attained for a vector
\[
y = \begin{pmatrix} \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix},
\]
which has the property that $\|y\|_\infty = 1$. So, it would be good to look for a vector d with the special property that
\[
d = \begin{pmatrix} \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix}.
\]

12.15.3 A simple approach

It helps to read up on how to solve triangular linear systems, in Chapter ??.

Condition number estimation is usually required in the context of finding the solution to Ax = y, where one also wants to know how well-conditioned the problem is. This means that typically the LU factorization with partial pivoting is already being computed as part of the solution method. In other words, the permutation matrix P, unit lower triangular matrix L, and upper triangular matrix U are known so that PA = LU. Now
\[
\|A^{-1}\|_1 = \|A^{-T}\|_\infty = \|((P^T L U)^T)^{-1}\|_\infty = \|(U^T L^T P)^{-1}\|_\infty = \|P^T L^{-T} U^{-T}\|_\infty = \|L^{-T} U^{-T}\|_\infty.
\]
So, we are looking for x that satisfies $U^T L^T x = d$, where d has ±1 for each of its elements. Importantly, we get to choose d.
A simple approach proceeds by picking d in an effort to make $\|z\|_\infty$ large, where $U^T z = d$. After this, $L^T x = z$ is solved, and the estimate for $\|A^{-T}\|_\infty$ is $\|x\|_\infty$.
Partition
\[
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}, \quad\text{and}\quad
d \rightarrow \begin{pmatrix} \delta_1 \\ d_2 \end{pmatrix}.
\]
Then $U^T z = d$ means
\[
\begin{pmatrix} \upsilon_{11} & 0 \\ u_{12} & U_{22}^T \end{pmatrix}
\begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}
=
\begin{pmatrix} \delta_1 \\ d_2 \end{pmatrix}
\]
and hence
\[
\upsilon_{11} \zeta_1 = \delta_1 \qquad \zeta_1 u_{12} + U_{22}^T z_2 = d_2.
\]
Now, $\delta_1$ can be chosen as ±1. (The choice does not make a difference. Might as well make it $\delta_1 = 1$.) Then $\zeta_1 = 1/\upsilon_{11}$.
Now assume that the process has proceeded to the point where
\[
U \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} z_0 \\ \zeta_1 \\ z_2 \end{pmatrix}, \quad\text{and}\quad
d \rightarrow \begin{pmatrix} d_0 \\ \delta_1 \\ d_2 \end{pmatrix}
\]

Algorithm: z := APPROX_NORM_UINVT_UNB_VAR1(U, z)      (dot-product based)
Partition U → ( UTL UTR ; UBL UBR ), z → ( zT ; zB ), where UTL is 0 × 0 and zT has 0 rows
while m(UTL) < m(U) do
  Repartition ( UTL UTR ; UBL UBR ) → ( U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ),
              ( zT ; zB ) → ( z0 ; ζ1 ; z2 )

    ζ1 := −u01^T z0
    ζ1 := (sign(ζ1) + ζ1)/υ11

  Continue with ( UTL UTR ; UBL UBR ) ← ( U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ),
                ( zT ; zB ) ← ( z0 ; ζ1 ; z2 )
endwhile

Algorithm: z := APPROX_NORM_UINVT_UNB_VAR2(U, z)      (AXPY based)
z := 0
Partition U → ( UTL UTR ; UBL UBR ), z → ( zT ; zB ), where UTL is 0 × 0 and zT has 0 rows
while m(UTL) < m(U) do
  Repartition as above

    ζ1 := (sign(ζ1) + ζ1)/υ11
    z2 := z2 − ζ1 u12

  Continue with (as above)
endwhile

Figure 12.13: Two equivalent algorithms for computing z so that ‖z‖∞ ≈ ‖U^{−T}‖∞.


so that $U^T z = d$ means
\[
U_{00}^T z_0 = d_0 \qquad
u_{01}^T z_0 + \upsilon_{11} \zeta_1 = \delta_1 \qquad
U_{02}^T z_0 + \zeta_1 u_{12} + U_{22}^T z_2 = d_2.
\]
Assume that $d_0$ and $z_0$ have already been computed, and now $\delta_1$ and $\zeta_1$ are to be computed. Then
\[
\zeta_1 = (\delta_1 - u_{01}^T z_0)/\upsilon_{11}.
\]
Constrained by the fact that $|\delta_1| = 1$, the magnitude of $\zeta_1$ can be (locally) maximized by choosing
\[
\delta_1 := -\mathrm{sign}(u_{01}^T z_0)
\]
and then
\[
\zeta_1 := (\delta_1 - u_{01}^T z_0)/\upsilon_{11}.
\]
This suggests the algorithm in Figure 12.13 (left). By then taking the output vector z and solving the unit upper triangular system $L^T x = z$, we compute $\|x\|_\infty \approx \|A^{-T}\|_\infty = \|A^{-1}\|_1$ so that
\[
\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 \approx \|A\|_1 \|x\|_\infty.
\]


An alternative algorithm that casts the computation in terms of AXPY operations instead of dot products, but computes the exact same z as the previous algorithm, can be derived as follows. The first (top) entry of d can again be chosen to equal ±1. Assume we have proceeded to the point where
\[
U \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} z_0 \\ \zeta_1 \\ z_2 \end{pmatrix}, \quad\text{and}\quad
d \rightarrow \begin{pmatrix} d_0 \\ \delta_1 \\ d_2 \end{pmatrix},
\]
so that $U^T z = d$ again means
\[
U_{00}^T z_0 = d_0 \qquad
u_{01}^T z_0 + \upsilon_{11} \zeta_1 = \delta_1 \qquad
U_{02}^T z_0 + \zeta_1 u_{12} + U_{22}^T z_2 = d_2.
\]
Assume that $d_0$ and $z_0$ have already been computed and that $\zeta_1$ contains $-u_{01}^T z_0$ while $z_2$ contains $-U_{02}^T z_0$. Then
\[
\zeta_1 := (\delta_1 - u_{01}^T z_0)/\upsilon_{11} = (\delta_1 + \zeta_1)/\upsilon_{11}.
\]
Constrained by the fact that $|\delta_1| = 1$, the magnitude of $\zeta_1$ can be (locally) maximized by choosing
\[
\delta_1 := -\mathrm{sign}(u_{01}^T z_0) = \mathrm{sign}(\zeta_1)
\]
and then
\[
\zeta_1 := (\mathrm{sign}(\zeta_1) + \zeta_1)/\upsilon_{11}.
\]
After this, $z_2$ is updated with
\[
z_2 := z_2 - \zeta_1 u_{12}.
\]
This suggests the algorithm in Figure 12.13 (right). It computes the same z as does the algorithm in Figure 12.13 (left). By then taking the output vector z and solving the unit upper triangular system $L^T x = z$, we again compute $\|x\|_\infty \approx \|A^{-T}\|_\infty = \|A^{-1}\|_1$ so that
\[
\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 \approx \|A\|_1 \|x\|_\infty.
\]
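The whole estimation procedure fits in a few lines. The sketch below is my illustration of the dot-product (Figure 12.13, left) scheme; for simplicity it factors A without pivoting (real codes reuse the pivoted factorization already at hand). Because $\|x\|_\infty \le \|A^{-T}\|_\infty$, the returned value is a lower bound on $\kappa_1(A)$ in exact arithmetic:

```python
def estimate_condition_1(A):
    """Lower-bound estimate of kappa_1(A) via the d = (+-1,...,+-1) heuristic."""
    n = len(A)
    LU = [row[:] for row in A]
    for j in range(n):                    # LU factorization, no pivoting (assumed safe)
        for i in range(j + 1, n):
            LU[i][j] /= LU[j][j]
            for k in range(j + 1, n):
                LU[i][k] -= LU[i][j] * LU[j][k]
    # Solve U^T z = d, choosing each delta_1 = +-1 to make |zeta_1| large.
    z = [0.0] * n
    for i in range(n):
        s = sum(LU[k][i] * z[k] for k in range(i))   # u01^T z0
        d = 1.0 if s <= 0.0 else -1.0                # delta_1 := -sign(u01^T z0)
        z[i] = (d - s) / LU[i][i]
    # Solve L^T x = z (unit upper triangular system), overwriting z with x.
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            z[i] -= LU[k][i] * z[k]
    norm_A_1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return norm_A_1 * max(abs(v) for v in z)         # ||A||_1 * ||x||_inf
```

For small, well-behaved matrices this heuristic is often exact; on the 2 × 2 SPD example used elsewhere in these notes it recovers $\kappa_1(A)$ exactly.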

12.15.4 Discussion

There is a subtle reason for why we estimate $\|A^{-1}\|_1$ by estimating $\|A^{-T}\|_\infty$ instead. The reason is that when the LU factorization with pivoting, PA = LU, is computed, poor conditioning of the problem tends to translate more into poor conditioning of matrix U than of L, since L is chosen to be unit lower triangular and the magnitudes of its strictly lower triangular elements are bounded by one. So, by choosing d based on U, we use information about the triangular matrix that is more important.
A secondary reason is that if element growth happens during the factorization, that element growth exhibits itself in artificially large elements of U, which then translates to a less well-conditioned matrix U. In this case the condition number of A may not be all that bad, but when its factorization is used to solve Ax = y the effective conditioning is worse, and that effective conditioning is due to U.

12.16 Wrapup

12.16.1 Additional exercises

12.16.2 Summary

Chapter 13

Notes on Cholesky Factorization

13.1 Opening Remarks

Video
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

13.1.1 Launch

13.1.2 Outline

13.1 Opening Remarks . . . 255
  13.1.1 Launch . . . 255
  13.1.2 Outline . . . 256
  13.1.3 What you will learn . . . 257
13.2 Definition and Existence . . . 258
13.3 Application . . . 258
13.4 An Algorithm . . . 259
13.5 Proof of the Cholesky Factorization Theorem . . . 260
13.6 Blocked Algorithm . . . 261
13.7 Alternative Representation . . . 261
13.8 Cost . . . 264
13.9 Solving the Linear Least-Squares Problem via the Cholesky Factorization . . . 265
13.10 Other Cholesky Factorization Algorithms . . . 265
13.11 Implementing the Cholesky Factorization with the (Traditional) BLAS . . . 267
  13.11.1 What are the BLAS? . . . 267
  13.11.2 A simple implementation in Fortran . . . 270
  13.11.3 Implementation with calls to level-1 BLAS . . . 270
  13.11.4 Matrix-vector operations (level-2 BLAS) . . . 270
  13.11.5 Matrix-matrix operations (level-3 BLAS) . . . 274
  13.11.6 Impact on performance . . . 274
13.12 Alternatives to the BLAS . . . 275
  13.12.1 The FLAME/C API . . . 275
  13.12.2 BLIS . . . 275
13.13 Wrapup . . . 276
  13.13.1 Additional exercises . . . 276
  13.13.2 Summary . . . 276

13.1.3 What you will learn

13.2 Definition and Existence

This operation is only defined for Hermitian positive definite matrices:

Definition 13.1 A matrix $A \in \mathbb{C}^{m \times m}$ is Hermitian positive definite (HPD) if and only if it is Hermitian ($A^H = A$) and for all nonzero vectors $x \in \mathbb{C}^m$ it is the case that $x^H A x > 0$. If in addition $A \in \mathbb{R}^{m \times m}$ then A is said to be symmetric positive definite (SPD).

(If you feel uncomfortable with complex arithmetic, just replace the word "Hermitian" with "Symmetric" in this document and the Hermitian transpose operation, $\cdot^H$, with the transpose operation, $\cdot^T$.)

Example 13.2 Consider the case where m = 1, so that A is a real scalar, α. Notice that then A is SPD if and only if α > 0. This is because then for all nonzero $\chi \in \mathbb{R}$ it is the case that $\alpha \chi^2 > 0$.

First some exercises:
Homework 13.3 Let $B \in \mathbb{C}^{m \times n}$ have linearly independent columns. Prove that $A = B^H B$ is HPD.
* SEE ANSWER
Homework 13.4 Let $A \in \mathbb{C}^{m \times m}$ be HPD. Show that its diagonal elements are real and positive.
* SEE ANSWER
We will prove the following theorem in Section 13.5:

Theorem 13.5 (Cholesky Factorization Theorem) Given an HPD matrix A there exists a lower triangular matrix L such that $A = L L^H$.

Obviously, there similarly exists an upper triangular matrix U such that $A = U^H U$, since we can choose $U^H = L$.
The lower triangular matrix L is known as the Cholesky factor and $L L^H$ is known as the Cholesky factorization of A. It is unique if the diagonal elements of L are restricted to be positive.
The operation that overwrites the lower triangular part of matrix A with its Cholesky factor will be denoted by A := Chol(A), which should be read as "A becomes its Cholesky factor." Typically, only the lower (or upper) triangular part of A is stored, and it is that part that is then overwritten with the result. In this discussion, we will assume that the lower triangular part of A is stored and overwritten.

13.3 Application

The Cholesky factorization is used to solve the linear system Ax = y when A is HPD: Substituting the factors into the equation yields $L L^H x = y$. Letting $z = L^H x$,
\[
A x = L \underbrace{(L^H x)}_{z} = L z = y.
\]
Thus, z can be computed by solving the triangular system of equations $L z = y$, and subsequently the desired solution x can be computed by solving the triangular linear system $L^H x = z$.
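The two triangular solves are easy to sketch. The following pure-Python illustration (mine, real SPD case, list-of-lists storage) solves Ax = y given the Cholesky factor L with $A = L L^T$:

```python
def solve_with_cholesky_factor(L, y):
    """Solve A x = y given A = L L^T (real SPD): L z = y, then L^T x = z."""
    n = len(L)
    z = y[:]
    for i in range(n):                    # forward substitution: L z = y
        for k in range(i):
            z[i] -= L[i][k] * z[k]
        z[i] /= L[i][i]
    for i in reversed(range(n)):          # back substitution: L^T x = z
        for k in range(i + 1, n):
            z[i] -= L[k][i] * z[k]
        z[i] /= L[i][i]
    return z
```

Each solve costs roughly $n^2$ flops, so the factorization (discussed next) dominates the total cost.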

13.4 An Algorithm

The most common algorithm for computing A := Chol(A) can be derived as follows: Consider $A = L L^H$. Partition
\[
A = \begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix}
\quad\text{and}\quad
L = \begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}.
\tag{13.1}
\]

Remark 13.6 We adopt the commonly used notation where Greek lower case letters refer to scalars, lower case letters refer to (column) vectors, and upper case letters refer to matrices. (This convention is often attributed to Alston Householder.) The $\star$ refers to a part of A that is neither stored nor updated.

By substituting these partitioned matrices into $A = L L^H$ we find that
\[
\begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}^H
=
\begin{pmatrix} |\lambda_{11}|^2 & \star \\ \bar\lambda_{11} l_{21} & l_{21} l_{21}^H + L_{22} L_{22}^H \end{pmatrix},
\]
from which we conclude that
\[
|\lambda_{11}| = \sqrt{\alpha_{11}} \qquad
l_{21} = a_{21}/\lambda_{11} \qquad
L_{22} = \mathrm{Chol}(A_{22} - l_{21} l_{21}^H).
\]
These equalities motivate the algorithm

1. Partition $A = \begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix}$.
2. Overwrite $\alpha_{11} := \lambda_{11} = \sqrt{\alpha_{11}}$. (Picking $\lambda_{11} = \sqrt{\alpha_{11}}$ makes it positive and real, and ensures uniqueness.)
3. Overwrite $a_{21} := l_{21} = a_{21}/\lambda_{11}$.
4. Overwrite $A_{22} := A_{22} - l_{21} l_{21}^H$ (updating only the lower triangular part of $A_{22}$). This operation is called a symmetric rank-1 update.
5. Continue with $A = A_{22}$. (Back to Step 1.)

Remark 13.7 Similar to the tril function in Matlab, we use tril(B) to denote the lower triangular part of matrix B.
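The five steps above translate directly into code. Here is a pure-Python sketch of the right-looking algorithm I wrote for the real (SPD) case, overwriting only the lower triangle of A with L:

```python
import math

def chol(A):
    """Right-looking Cholesky (real SPD); overwrites the lower triangle with L."""
    n = len(A)
    for j in range(n):
        A[j][j] = math.sqrt(A[j][j])          # alpha11 := sqrt(alpha11)
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                # a21 := a21 / lambda11
        for i in range(j + 1, n):             # A22 := A22 - l21 l21^T, lower part only
            for k in range(j + 1, i + 1):
                A[i][k] -= A[i][j] * A[k][j]
    return A
```

Only entries A[i][k] with k ≤ i are read or written, matching the convention that the strictly upper triangular part is neither stored nor updated.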

13.5 Proof of the Cholesky Factorization Theorem

In this section, we partition A as in (13.1):
\[
A = \begin{pmatrix} \alpha_{11} & a_{21}^H \\ a_{21} & A_{22} \end{pmatrix}.
\]
The following lemmas are key to the proof:

Lemma 13.8 Let $A \in \mathbb{C}^{n \times n}$ be HPD. Then $\alpha_{11}$ is real and positive.
Proof: This is a special case of Homework 13.4.

Lemma 13.9 Let $A \in \mathbb{C}^{n \times n}$ be HPD and $l_{21} = a_{21}/\sqrt{\alpha_{11}}$. Then $A_{22} - l_{21} l_{21}^H$ is HPD.
Proof: Since A is Hermitian, so are $A_{22}$ and $A_{22} - l_{21} l_{21}^H$.
Let $x_2 \in \mathbb{C}^{n-1}$ be an arbitrary nonzero vector. Define
\[
x = \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}
\quad\text{where}\quad
\chi_1 = -\frac{a_{21}^H x_2}{\alpha_{11}}.
\]
Then, since $x \ne 0$,
\[
\begin{aligned}
0 < x^H A x
&= \begin{pmatrix} \bar\chi_1 & x_2^H \end{pmatrix}
   \begin{pmatrix} \alpha_{11} & a_{21}^H \\ a_{21} & A_{22} \end{pmatrix}
   \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} \\
&= \alpha_{11} |\chi_1|^2 + \bar\chi_1 a_{21}^H x_2 + x_2^H a_{21} \chi_1 + x_2^H A_{22} x_2 \\
&= \alpha_{11} \frac{(x_2^H a_{21})(a_{21}^H x_2)}{\alpha_{11}\,\alpha_{11}}
 - \frac{(x_2^H a_{21})(a_{21}^H x_2)}{\alpha_{11}}
 - \frac{(x_2^H a_{21})(a_{21}^H x_2)}{\alpha_{11}}
 + x_2^H A_{22} x_2 \\
&= x_2^H \left( A_{22} - \frac{a_{21} a_{21}^H}{\alpha_{11}} \right) x_2
 \quad\text{(since $x_2^H a_{21} a_{21}^H x_2$ is real and hence equals $a_{21}^H x_2\, x_2^H a_{21}$)} \\
&= x_2^H \left( A_{22} - l_{21} l_{21}^H \right) x_2.
\end{aligned}
\]
We conclude that $A_{22} - l_{21} l_{21}^H$ is HPD.

Proof of the Cholesky Factorization Theorem: Proof by induction.

Base case: n = 1. Clearly the result is true for a 1 × 1 matrix $A = \alpha_{11}$: In this case, the fact that A is HPD means that $\alpha_{11}$ is real and positive, and a Cholesky factor is then given by $\lambda_{11} = \sqrt{\alpha_{11}}$, with uniqueness if we insist that $\lambda_{11}$ is positive.

Inductive step: Assume the result is true for HPD matrices of size (n−1) × (n−1). We will show that it holds for $A \in \mathbb{C}^{n \times n}$. Let $A \in \mathbb{C}^{n \times n}$ be HPD. Partition A and L as in (13.1) and let $\lambda_{11} = \sqrt{\alpha_{11}}$ (which is well-defined by Lemma 13.8), $l_{21} = a_{21}/\lambda_{11}$, and $L_{22} = \mathrm{Chol}(A_{22} - l_{21} l_{21}^H)$ (which exists as a consequence of the Inductive Hypothesis and Lemma 13.9). Then L is the desired Cholesky factor of A.

By the principle of mathematical induction, the theorem holds.

13.6. Blocked Algorithm

13.6

261

Blocked Algorithm

In order to attain high performance, the computation is cast in terms of matrix-matrix multiplication by so-called blocked algorithms. For the Cholesky factorization a blocked version of the algorithm can be derived by partitioning

$$A \rightarrow \begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}
\quad \text{and} \quad
L \rightarrow \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix},$$

where A11 and L11 are b × b. By substituting these partitioned matrices into A = LL^H we find that

$$\begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
  \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}^H
= \begin{pmatrix} L_{11} L_{11}^H & \star \\ L_{21} L_{11}^H & L_{21} L_{21}^H + L_{22} L_{22}^H \end{pmatrix}.$$

From this we conclude that

$$L_{11} = \mathrm{Chol}(A_{11}), \qquad
L_{21} = A_{21} L_{11}^{-H}, \qquad
L_{22} = \mathrm{Chol}(A_{22} - L_{21} L_{21}^H).$$

An algorithm is then described by the steps

1. Partition A → ( A11 ⋆ ; A21 A22 ), where A11 is b × b.
2. Overwrite A11 := L11 = Chol(A11).
3. Overwrite A21 := L21 = A21 L11^{−H}.
4. Overwrite A22 := A22 − L21 L21^H (updating only the lower triangular part).
5. Continue with A = A22. (Back to Step 1.)


Remark 13.10 The Cholesky factorization A11 := L11 = Chol(A11) can be computed with the unblocked algorithm or by calling the blocked Cholesky factorization algorithm recursively.

Remark 13.11 Operations like L21 = A21 L11^{−H} are computed by solving the equivalent linear system with multiple right-hand sides L11 L21^H = A21^H.
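Steps 1–5 above can be transcribed into code almost directly. The following plain-Python sketch (our own illustration, not the book's M-script or FLAME implementations; it restricts to the real symmetric positive definite case and represents matrices as lists of lists) pairs a blocked driver with the unblocked right-looking algorithm used on the diagonal blocks:

```python
import math

def chol_unb(A):
    """Unblocked right-looking Cholesky: overwrite the lower triangle of the
    real SPD matrix A (list of lists) with its Cholesky factor L."""
    n = len(A)
    for j in range(n):
        A[j][j] = math.sqrt(A[j][j])            # alpha11 := sqrt(alpha11)
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                  # a21 := a21 / alpha11
        for k in range(j + 1, n):               # A22 := A22 - tril(a21 a21^T)
            for i in range(k, n):
                A[i][k] -= A[i][j] * A[k][j]
    return A

def chol_blk(A, b=2):
    """Blocked right-looking Cholesky with block size b (real case)."""
    n = len(A)
    for j in range(0, n, b):
        jb = min(b, n - j)
        # Step 2: A11 := Chol(A11), via the unblocked algorithm on a copy.
        A11 = [[A[j + r][j + c] for c in range(jb)] for r in range(jb)]
        chol_unb(A11)
        for r in range(jb):
            for c in range(r + 1):
                A[j + r][j + c] = A11[r][c]
        # Step 3: A21 := A21 L11^{-T}, i.e. solve X L11^T = A21 row by row.
        for i in range(j + jb, n):
            for c in range(jb):
                s = A[i][j + c] - sum(A[i][j + k] * A[j + c][j + k] for k in range(c))
                A[i][j + c] = s / A[j + c][j + c]
        # Step 4: A22 := A22 - tril(A21 A21^T) (lower triangle only).
        for k in range(j + jb, n):
            for i in range(k, n):
                A[i][k] -= sum(A[i][j + c] * A[k][j + c] for c in range(jb))
    return A
```

Starting from A = L0 L0^T for a known lower triangular L0, both routines recover L0 in the lower triangle of A, while the strictly upper triangular part is left untouched, mirroring Step 4's remark.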

13.7 Alternative Representation

When explaining the above algorithm in a classroom setting, invariably it is accompanied by a picture sequence like the one in Figure 13.1 (left)¹ and the (verbal) explanation:

¹ Picture modified from a similar one in [25].

Chapter 13. Notes on Cholesky Factorization

[Figure 13.1 appears here: a sequence of panels labeled "Beginning of iteration", "Repartition", "Update", and "End of iteration", with quadrants marked "done" and "partially updated"; the graphics are not reproduced in this extraction.]
Figure 13.1: Left: Progression of pictures that explain Cholesky factorization algorithm. Right: Same
pictures, annotated with labels and updates.


Algorithm: A := CHOL_UNB(A)

  Partition A → ( A_TL ⋆ ; A_BL A_BR ), where A_TL is 0 × 0
  while m(A_TL) < m(A) do
    Repartition
      ( A_TL ⋆ ; A_BL A_BR ) → ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 )
    α11 := √α11
    a21 := a21 / α11
    A22 := A22 − tril(a21 a21^H)
    Continue with
      ( A_TL ⋆ ; A_BL A_BR ) ← ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 )
  endwhile

Algorithm: A := CHOL_BLK(A)

  Partition A → ( A_TL ⋆ ; A_BL A_BR ), where A_TL is 0 × 0
  while m(A_TL) < m(A) do
    Repartition
      ( A_TL ⋆ ; A_BL A_BR ) → ( A00 ⋆ ⋆ ; A10 A11 ⋆ ; A20 A21 A22 )
    A11 := Chol(A11)
    A21 := A21 tril(A11)^{−H}
    A22 := A22 − tril(A21 A21^H)
    Continue with
      ( A_TL ⋆ ; A_BL A_BR ) ← ( A00 ⋆ ⋆ ; A10 A11 ⋆ ; A20 A21 A22 )
  endwhile
Figure 13.2: Unblocked and blocked algorithms for computing the Cholesky factorization in FLAME
notation.


Beginning of iteration: At some stage of the algorithm (at the top of the loop), the computation has moved through the matrix to the point indicated by the thick lines. Notice that we have finished with the parts of the matrix that are in the top-left, top-right (which is not to be touched), and bottom-left quadrants. The bottom-right quadrant has been updated to the point where we only need to perform a Cholesky factorization of it.

Repartition: We now repartition the bottom-right submatrix to expose α11, a21, and A22.

Update: α11, a21, and A22 are updated as discussed before.

End of iteration: The thick lines are moved, since we now have completed more of the computation, and only a factorization of A22 (which becomes the new bottom-right quadrant) remains to be performed.

Continue: The above steps are repeated until the submatrix ABR is empty.

To motivate our notation, we annotate this progression of pictures as in Figure 13.1 (right). In those pictures, "T", "B", "L", and "R" stand for "Top", "Bottom", "Left", and "Right", respectively. This then motivates the format of the algorithm in Figure 13.2 (left). It uses what we call the FLAME notation for representing algorithms [25, 24, 40]. A similar explanation can be given for the blocked algorithm, which is given in Figure 13.2 (right). In the algorithms, m(A) indicates the number of rows of matrix A.
Remark 13.12 The indices in our more stylized presentation of the algorithms are subscripts rather than
indices in the conventional sense.
Remark 13.13 The notation in Figures 13.1 and 13.2 allows the contents of matrix A at the beginning of the iteration to be formally stated:

$$A = \begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix}
   = \begin{pmatrix} L_{TL} & \star \\ L_{BL} & \hat{A}_{BR} - \mathrm{tril}(L_{BL} L_{BL}^H) \end{pmatrix},$$

where L_TL = Chol(Â_TL), L_BL = Â_BL L_TL^{−H}, and Â_TL, Â_BL, and Â_BR denote the original contents of the quadrants A_TL, A_BL, and A_BR, respectively.
Homework 13.14 Implement the Cholesky factorization with M-script.
* SEE ANSWER

13.8 Cost

The cost of the Cholesky factorization of A ∈ C^{m×m} can be analyzed as follows: In Figure 13.2 (left), during the kth iteration (starting k at zero), A00 is k × k. Thus, the operations in that iteration cost

- α11 := √α11: negligible when k is large.
- a21 := a21/α11: approximately (m − k − 1) flops.
- A22 := A22 − tril(a21 a21^H): approximately (m − k − 1)^2 flops. (A rank-1 update of all of A22 would have cost 2(m − k − 1)^2 flops. Approximately half the entries of A22 are updated.)


Thus, the total cost in flops is given by

$$C_{\mathrm{Chol}}(m) \approx
\underbrace{\sum_{k=0}^{m-1} (m-k-1)^2}_{\text{(due to update of } A_{22})}
+ \underbrace{\sum_{k=0}^{m-1} (m-k-1)}_{\text{(due to update of } a_{21})}
= \sum_{j=0}^{m-1} j^2 + \sum_{j=0}^{m-1} j
\approx \frac{1}{3} m^3 + \frac{1}{2} m^2 \approx \frac{1}{3} m^3,$$

which allows us to state the (obvious) fact that most computation is in the update of A22. It can be shown that the blocked Cholesky factorization algorithm performs exactly the same number of floating point operations.

Comparing the cost of the Cholesky factorization to that of the LU factorization from "Notes on LU Factorization", we see that taking advantage of symmetry cuts the cost approximately in half.
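A quick numerical check (a throwaway script of ours, not from the text) confirms that the exact flop sum approaches m³/3 as m grows:

```python
def chol_flops(m):
    """Exact flop count of the right-looking Cholesky algorithm:
    in iteration k, (m-k-1) flops for a21 := a21/alpha11 and
    (m-k-1)^2 flops for the symmetric update of A22."""
    return sum((m - k - 1) ** 2 + (m - k - 1) for k in range(m))

for m in (10, 100, 1000):
    # the ratio to m^3/3 approaches 1 as m grows
    print(m, chol_flops(m) / (m ** 3 / 3))
```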

13.9 Solving the Linear Least-Squares Problem via the Cholesky Factorization

Recall that if B ∈ C^{m×n} has linearly independent columns, then A = B^H B is HPD. Also, recall from "Notes on Linear Least-Squares" that the solution x ∈ C^n to the linear least-squares (LLS) problem

$$\|Bx - y\|_2 = \min_{z \in \mathbb{C}^n} \|Bz - y\|_2$$

equals the solution to the normal equations

$$\underbrace{B^H B}_{A}\, x = \underbrace{B^H y}_{\hat{y}}.$$

This makes it obvious how the Cholesky factorization can be (and often is) used to solve the LLS problem.
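Concretely: form the normal equations, factor A = LL^H, and solve two triangular systems. The following plain-Python sketch (our own illustration for the real case, using list-of-lists matrices; `lls_via_cholesky` is a name we made up) walks through those steps:

```python
import math

def lls_via_cholesky(B, y):
    """Solve min_x ||B x - y||_2 for real B with independent columns by
    forming the normal equations A x = B^T y with A = B^T B (SPD) and
    using A = L L^T: solve L z = B^T y, then L^T x = z."""
    m, n = len(B), len(B[0])
    A = [[sum(B[k][i] * B[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [sum(B[k][i] * y[k] for k in range(m)) for i in range(n)]
    # Cholesky factor L (lower triangular), computed column by column.
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    # Forward substitution: L z = b.
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For example, fitting a line through the points (1, 6), (2, 0), (3, 0) in the least-squares sense yields intercept 8 and slope −3.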
Homework 13.15 Consider B ∈ C^{m×n} with linearly independent columns. Recall that B has a QR factorization, B = QR, where Q has orthonormal columns and R is an upper triangular matrix with positive diagonal elements. How are the Cholesky factorization of B^H B and the QR factorization of B related?
* SEE ANSWER

13.10 Other Cholesky Factorization Algorithms

There are actually three different unblocked and three different blocked algorithms for computing the Cholesky factorization. The algorithms we discussed so far in this note are sometimes called right-looking algorithms. A systematic derivation of all these algorithms, as well as their blocked counterparts, is given in Chapter 6 of [40]. In this section, a sequence of exercises leads to what is often referred to as the bordered Cholesky factorization algorithm.


Algorithm: A := CHOL_UNB(A) (bordered algorithm)

  Partition A → ( A_TL ⋆ ; A_BL A_BR ), where A_TL is 0 × 0
  while m(A_TL) < m(A) do
    Repartition
      ( A_TL ⋆ ; A_BL A_BR ) → ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 ), where α11 is 1 × 1

    (updates to be filled in as part of Homework 13.17)

    Continue with
      ( A_TL ⋆ ; A_BL A_BR ) ← ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 )
  endwhile

Figure 13.3: Unblocked Cholesky factorization, bordered variant, for Homework 13.17.


Homework 13.16 Let A be SPD and partition

$$A \rightarrow \begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}.$$

(Hint: For this exercise, use techniques similar to those in Section 13.5.)

1. Show that A00 is SPD.

2. Assuming that A00 = L00 L00^T, where L00 is lower triangular and nonsingular, argue that the assignment l10^T := a10^T L00^{−T} is well-defined.

3. Assuming that A00 is SPD, A00 = L00 L00^T where L00 is lower triangular and nonsingular, and l10^T = a10^T L00^{−T}, show that α11 − l10^T l10 > 0 so that λ11 := √(α11 − l10^T l10) is well-defined.

4. Show that

$$\begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
= \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}
  \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}^T.$$

* SEE ANSWER

Homework 13.17 Use the results in the last exercise to give an alternative proof by induction of the
Cholesky Factorization Theorem and to give an algorithm for computing it by filling in Figure 13.3. This
algorithm is often referred to as the bordered Cholesky factorization algorithm.
* SEE ANSWER
Homework 13.18 Show that the cost of the bordered algorithm is, approximately, (1/3)n^3 flops.
* SEE ANSWER
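One possible realization of the bordered variant is sketched below in plain Python for the real SPD case (our own illustration against which to compare your answer to Homework 13.17, not the official solution; `chol_bordered` is a name we made up). At step j, the leading j × j block already contains L00; l10^T is obtained by a forward substitution with L00 and λ11 by Part 3 of Homework 13.16:

```python
import math

def chol_bordered(A):
    """Bordered Cholesky for a real SPD matrix A (list of lists): for each j,
    compute l10^T = a10^T L00^{-T} by solving L00 l10 = a10 (forward
    substitution), then lambda11 = sqrt(alpha11 - l10^T l10)."""
    n = len(A)
    for j in range(n):
        for k in range(j):
            # l10[k] := (a10[k] - sum_{i<k} L00[k][i] * l10[i]) / L00[k][k]
            A[j][k] = (A[j][k] - sum(A[j][i] * A[k][i] for i in range(k))) / A[k][k]
        A[j][j] = math.sqrt(A[j][j] - sum(A[j][k] ** 2 for k in range(j)))
    return A
```

Note that, unlike the right-looking variant, this algorithm never touches the trailing submatrix A22; all work happens in the current row.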

13.11 Implementing the Cholesky Factorization with the (Traditional) BLAS

The Basic Linear Algebra Subprograms (BLAS) are an interface to commonly used fundamental linear
algebra operations. In this section, we illustrate how the unblocked and blocked Cholesky factorization
algorithms can be implemented in terms of the BLAS. The explanation draws from the entry we wrote for
the BLAS in the Encyclopedia of Parallel Computing [39].

13.11.1 What are the BLAS?

The BLAS interface [29, 19, 18] was proposed to support portable high-performance implementation of
applications that are matrix and/or vector computation intensive. The idea is that one casts computation in
terms of the BLAS interface, leaving the architecture-specific optimization of that interface to an expert.

Simple

  Algorithm:
    for j = 1 : n
      α_{j,j} := sqrt(α_{j,j})
      for i = j+1 : n
        α_{i,j} := α_{i,j} / α_{j,j}
      endfor
      for k = j+1 : n
        for i = k : n
          α_{i,k} := α_{i,k} − α_{i,j} α_{k,j}
        endfor
      endfor
    endfor

  Code:
    do j=1, n
      A(j,j) = sqrt(A(j,j))
      do i=j+1,n
        A(i,j) = A(i,j) / A(j,j)
      enddo
      do k=j+1,n
        do i=k,n
          A(i,k) = A(i,k) - A(i,j) * A(k,j)
        enddo
      enddo
    enddo

Vector-vector

  Algorithm:
    for j = 1 : n
      α_{j,j} := sqrt(α_{j,j})
      α_{j+1:n,j} := α_{j+1:n,j} / α_{j,j}
      for k = j+1 : n
        α_{k:n,k} := −α_{k,j} α_{k:n,j} + α_{k:n,k}
      endfor
    endfor

  Code:
    do j=1, n
      A( j,j ) = sqrt( A( j,j ) )
      call dscal( n-j, 1.0d00 / A(j,j), A(j+1,j), 1 )
      do k=j+1,n
        call daxpy( n-k+1, -A(k,j), A(k,j), 1, A(k,k), 1 )
      enddo
    enddo
Figure 13.4: Simple and vector-vector (level-1 BLAS) based representations of the right-looking algorithm.

Matrix-vector

  Algorithm:
    for j = 1 : n
      α_{j,j} := sqrt(α_{j,j})
      α_{j+1:n,j} := α_{j+1:n,j} / α_{j,j}
      α_{j+1:n,j+1:n} := −tril(α_{j+1:n,j} α_{j+1:n,j}^T) + α_{j+1:n,j+1:n}
    endfor

  Code:
    do j=1, n
      A(j,j) = sqrt(A(j,j))
      call dscal( n-j, 1.0d00 / A(j,j), A(j+1,j), 1 )
      call dsyr( 'Lower triangular', n-j, -1.0, A(j+1,j), 1,
                 A(j+1,j+1), lda )
    enddo

FLAME notation

  Algorithm: as in Figure 13.2 (left).

  Code:
    int Chol_unb_var3( FLA_Obj A )
    {
      FLA_Obj ATL,  ATR,  A00,  a01,     A02,
              ABL,  ABR,  a10t, alpha11, a12t,
                          A20,  a21,     A22;

      FLA_Part_2x2( A, &ATL, &ATR,
                       &ABL, &ABR, 0, 0, FLA_TL );

      while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){
        FLA_Repart_2x2_to_3x3(
          ATL, /**/ ATR,    &A00,  /**/ &a01,     &A02,
          /* ********** */  /* ************************** */
                            &a10t, /**/ &alpha11, &a12t,
          ABL, /**/ ABR,    &A20,  /**/ &a21,     &A22,
          1, 1, FLA_BR );
        /*-------------------------------------------------*/
        FLA_Sqrt( alpha11 );
        FLA_Invscal( alpha11, a21 );
        FLA_Syr( FLA_LOWER_TRIANGULAR, FLA_MINUS_ONE,
                 a21, A22 );
        /*-------------------------------------------------*/
        FLA_Cont_with_3x3_to_2x2(
          &ATL, /**/ &ATR,   A00,  a01,     /**/ A02,
                             a10t, alpha11, /**/ a12t,
          /* *********** */  /* *********************** */
          &ABL, /**/ &ABR,   A20,  a21,     /**/ A22,
          FLA_TL );
      }
      return FLA_SUCCESS;
    }
Figure 13.5: Matrix-vector (level-2 BLAS) based representations of the right-looking algorithm.

13.11.2 A simple implementation in Fortran

We start with a simple implementation in Fortran. A simple algorithm that does not use the BLAS and the corresponding code are given in the row labeled "Simple" in Figure 13.4. This sets the stage for our explanation of how the algorithm and code can be represented with vector-vector, matrix-vector, and matrix-matrix operations, and the corresponding calls to BLAS routines.

13.11.3 Implementation with calls to level-1 BLAS

The first BLAS interface [29] was proposed in the 1970s, when vector supercomputers like the early Cray architectures reigned. On such computers, it sufficed to cast computation in terms of vector operations. As long as memory was accessed mostly contiguously, near-peak performance could be achieved. This interface is now referred to as the level-1 BLAS. It was used for the implementation of the first widely used dense linear algebra package, LINPACK [17].

Let x and y be vectors of appropriate length and α be a scalar. In this and other notes we encounter vector-vector operations such as scaling of a vector (x := αx), the inner (dot) product (α := x^T y), and scaled vector addition (y := αx + y). This last operation is known as an axpy, which stands for "alpha times x plus y".

Our Cholesky factorization algorithm expressed in terms of such vector-vector operations and the corresponding code are given in Figure 13.4 in the row labeled "Vector-vector". If the operations supported by dscal and daxpy achieve high performance on a target architecture (as they did in the days of vector supercomputers) then so will the implementation of the Cholesky factorization, since it casts most computation in terms of those operations. Unfortunately, vector-vector operations perform O(n) computation on O(n) data, meaning that these days the bandwidth to memory typically limits performance, since retrieving a data item from memory is often more than an order of magnitude more costly than a floating point operation with that data item.

We summarize information about level-1 BLAS in Figure 13.6.
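To make the level-1 calling convention concrete, here is a plain-Python mimic of scal and axpy with the stride (increment) arguments the BLAS uses (a sketch of the semantics only, not a binding to any actual BLAS library):

```python
def scal(n, alpha, x, incx):
    """x := alpha * x for n elements of x, strided by incx (mimics dscal)."""
    for i in range(n):
        x[i * incx] *= alpha

def axpy(n, alpha, x, incx, y, incy):
    """y := alpha * x + y, the 'alpha times x plus y' operation (mimics daxpy)."""
    for i in range(n):
        y[i * incy] += alpha * x[i * incx]
```

With increments of 1 these traverse contiguous vectors, exactly the access pattern under which the vector machines of the era achieved near-peak performance.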

13.11.4 Matrix-vector operations (level-2 BLAS)

The next level of BLAS supports operations with matrices and vectors. The simplest example of such an operation is the matrix-vector product: y := Ax, where x and y are vectors and A is a matrix. Another example is the computation A22 := −a21 a21^T + A22 (a symmetric rank-1 update) in the Cholesky factorization. Here only the lower (or upper) triangular part of the matrix is updated, taking advantage of symmetry.

The use of the symmetric rank-1 update is illustrated in Figure 13.5, in the row labeled "Matrix-vector". There dsyr is the routine that implements a double precision symmetric rank-1 update. Readability of the code is improved by casting computation in terms of routines that implement the operations that appear in the algorithm: dscal for a21 := a21/α11 and dsyr for A22 := −a21 a21^T + A22.

If the operation supported by dsyr achieves high performance on a target architecture then so will this implementation of the Cholesky factorization, since it casts most computation in terms of that operation. Unfortunately, matrix-vector operations perform O(n^2) computation on O(n^2) data, meaning that these days the bandwidth to memory typically limits performance.

We summarize information about level-2 BLAS in Figure 13.7.
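In the same spirit as before, a plain-Python mimic of the symmetric rank-1 update performed by dsyr (lower-triangular case, unit strides; a sketch of the semantics, not a real BLAS binding) looks like:

```python
def syr_lower(n, alpha, x, A):
    """A := alpha * x * x^T + A, updating only the lower triangle of the
    n x n matrix A (list of lists), as dsyr does with uplo='Lower'."""
    for j in range(n):
        for i in range(j, n):
            A[i][j] += alpha * x[i] * x[j]
```

Calling it with alpha = −1 and x = a21 performs exactly the update A22 := −a21 a21^T + A22 from the algorithm; entries strictly above the diagonal are never touched.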

A prototypical calling sequence for a level-1 BLAS routine is

  ?axpy( n, alpha, x, incx, y, incy ),

which implements the scaled vector addition operation y := αx + y. Here

- The "?" indicates the data type. The choices for this first letter are
    s  single precision
    d  double precision
    c  single precision complex
    z  double precision complex
- The operation is identified as axpy: alpha times x plus y.
- n indicates the number of elements in the vectors x and y.
- alpha is the scalar α.
- x and y indicate the memory locations where the first elements of x and y are stored, respectively.
- incx and incy equal the increment by which one has to stride through memory for the elements of vectors x and y, respectively.

The following are the most frequently used level-1 BLAS:

  routine/function   operation
  swap               x ↔ y
  scal               x := αx
  copy               y := x
  axpy               y := αx + y
  dot                x^T y
  nrm2               ‖x‖_2
  asum               ‖re(x)‖_1 + ‖im(x)‖_1
  imax               min(k) such that |re(χ_k)| + |im(χ_k)| = max_i (|re(χ_i)| + |im(χ_i)|)

Figure 13.6: Summary of the most commonly used level-1 BLAS.


The naming convention for level-2 BLAS routines is given by

  ?XXYY,

where

- ? can take on the values s, d, c, z.
- XX indicates the shape of the matrix:
    ge  general (rectangular)
    sy  symmetric
    he  Hermitian
    tr  triangular
- YY indicates the operation to be performed:
    mv  matrix-vector multiplication
    sv  solve vector
    r   rank-1 update
    r2  rank-2 update

In addition, operations with banded matrices are supported, which we do not discuss here.

A representative call to a level-2 BLAS operation is given by

  dsyr( uplo, n, alpha, x, incx, A, lda )

which implements the operation A := αxx^T + A, updating the lower or upper triangular part of A by choosing uplo as 'Lower triangular' or 'Upper triangular', respectively. The parameter lda (the leading dimension of matrix A) indicates the increment by which memory has to be traversed in order to address successive elements in a row of matrix A.

The following table gives the most commonly used level-2 BLAS operations:

  routine/function   operation
  gemv               general matrix-vector multiplication
  symv               symmetric matrix-vector multiplication
  trmv               triangular matrix-vector multiplication
  trsv               triangular solve vector
  ger                general rank-1 update
  syr                symmetric rank-1 update
  syr2               symmetric rank-2 update

Figure 13.7: Summary of the most commonly used level-2 BLAS.

Matrix-matrix

  Algorithm:
    for j = 1 : n in steps of nb
      b := min(n − j + 1, nb)
      A_{j:j+b−1, j:j+b−1} := Chol(A_{j:j+b−1, j:j+b−1})
      A_{j+b:n, j:j+b−1} := A_{j+b:n, j:j+b−1} A_{j:j+b−1, j:j+b−1}^{−H}
      A_{j+b:n, j+b:n} := A_{j+b:n, j+b:n} − tril(A_{j+b:n, j:j+b−1} A_{j+b:n, j:j+b−1}^H)
    endfor

  Code:
    do j=1, n, nb
      jb = min( nb, n-j+1 )
      call chol( jb, A( j, j ), lda )
      call dtrsm( 'Right', 'Lower triangular',
                  'Transpose', 'Nonunit diag',
                  n-j-jb+1, jb, 1.0d00, A( j, j ), lda,
                  A( j+jb, j ), lda )
      call dsyrk( 'Lower triangular', 'No transpose',
                  n-j-jb+1, jb, -1.0d00, A( j+jb, j ), lda,
                  1.0d00, A( j+jb, j+jb ), lda )
    enddo

FLAME notation

  Algorithm:
    Partition A → ( A_TL ⋆ ; A_BL A_BR ), where A_TL is 0 × 0
    while m(A_TL) < m(A) do
      Determine block size b
      Repartition
        ( A_TL ⋆ ; A_BL A_BR ) → ( A00 ⋆ ⋆ ; A10 A11 ⋆ ; A20 A21 A22 ), where A11 is b × b
      A11 := Chol(A11)
      A21 := A21 tril(A11)^{−H}
      A22 := A22 − tril(A21 A21^H)
      Continue with
        ( A_TL ⋆ ; A_BL A_BR ) ← ( A00 ⋆ ⋆ ; A10 A11 ⋆ ; A20 A21 A22 )
    endwhile

  (The FLAME/C panel of the original figure shows the corresponding blocked code; it is not reproduced in this extraction.)
Figure 13.8: Blocked algorithm and implementation with level-3 BLAS.


The naming convention for level-3 BLAS routines is similar to that for the level-2 BLAS.

A representative call to a level-3 BLAS operation is given by

  dsyrk( uplo, trans, n, k, alpha, A, lda, beta, C, ldc )

which implements the operation C := αAA^T + βC or C := αA^T A + βC depending on whether trans is chosen as 'No transpose' or 'Transpose', respectively. It updates the lower or upper triangular part of C depending on whether uplo equals 'Lower triangular' or 'Upper triangular', respectively. The parameters lda and ldc are the leading dimensions of arrays A and C, respectively.

The following table gives the most commonly used level-3 BLAS operations:

  routine/function   operation
  gemm               general matrix-matrix multiplication
  symm               symmetric matrix-matrix multiplication
  trmm               triangular matrix-matrix multiplication
  trsm               triangular solve with multiple right-hand sides
  syrk               symmetric rank-k update
  syr2k              symmetric rank-2k update

Figure 13.9: Summary of the most commonly used level-3 BLAS.

13.11.5 Matrix-matrix operations (level-3 BLAS)

Finally, we turn to the implementation of the blocked Cholesky factorization algorithm from Section 13.6. The algorithm is expressed with FLAME notation and Matlab-like notation in Figure 13.8. The routines dtrsm and dsyrk are level-3 BLAS routines:

- The call to dtrsm implements A21 := L21, where L21 L11^T = A21.
- The call to dsyrk implements A22 := −L21 L21^T + A22.

The bulk of the computation is now cast in terms of matrix-matrix operations, which can achieve high performance.

We summarize information about level-3 BLAS in Figure 13.9.

13.11.6 Impact on performance

Figure 13.10 illustrates the performance benefits that come from using the different levels of BLAS on a typical architecture.

[Figure 13.10 shows a plot of GFlops versus n (one thread, n from 0 to 2000) for implementations labeled "Hand optimized", "BLAS3", "BLAS2", "BLAS1", and "triple loops"; the graphic is not reproduced in this extraction.]

Figure 13.10: Performance of the different implementations of Cholesky factorization that use different
levels of BLAS. The target processor has a peak of 11.2 Gflops (billions of floating point operations per
second). BLAS1, BLAS2, and BLAS3 indicate that the bulk of computation was cast in terms of level-1,
-2, or -3 BLAS, respectively.

13.12 Alternatives to the BLAS

13.12.1 The FLAME/C API

In a number of places in these notes we presented algorithms in FLAME notation. Clearly, there is a
disconnect between this notation and how the algorithms are then encoded with the BLAS interface. In
Figures 13.4, 13.5, and 13.8 we also show how the FLAME API for the C programming language [6]
allows the algorithms to be more naturally translated into code. While the traditional BLAS interface
underlies the implementation of Cholesky factorization and other algorithms in the widely used LAPACK
library [1], the FLAME/C API is used in our libflame library [25, 41, 42].

13.12.2 BLIS

The implementations that call BLAS in this paper are coded in Fortran. More recently, the languages of
choice for scientific computing have become C and C++. While there is a C interface to the traditional
BLAS called the CBLAS [11], we believe a more elegant such interface is the BLAS-like Library Instantiation Software (BLIS) interface [43]. BLIS is not only a framework for rapid implementation of the
traditional BLAS, but also presents an alternative interface for C and C++ users.

13.13 Wrapup

13.13.1 Additional exercises

13.13.2 Summary

Chapter 14

Notes on Eigenvalues and Eigenvectors


If you have forgotten how to find the eigenvalues and eigenvectors of 2 2 and 3 3 matrices, you may
want to review
Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30].

Video
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

14.0.1 Outline

Video
14.0.1 Outline
14.1 Definition
14.2 The Schur and Spectral Factorizations
14.3 Relation Between the SVD and the Spectral Decomposition

14.1 Definition

Definition 14.1 Let A ∈ C^{m×m}. Then λ ∈ C and nonzero x ∈ C^m are said to be an eigenvalue and corresponding eigenvector if Ax = λx. The tuple (λ, x) is said to be an eigenpair. The set of all eigenvalues of A is denoted by Λ(A) and is called the spectrum of A. The spectral radius of A, ρ(A), equals the magnitude of the largest eigenvalue in magnitude:

$$\rho(A) = \max_{\lambda \in \Lambda(A)} |\lambda|.$$

The action of A on an eigenvector x is as if it were multiplied by a scalar: the direction does not change, only its length is scaled, since Ax = λx.
Theorem 14.2 Scalar λ is an eigenvalue of A if and only if

- (λI − A) is singular;
- (λI − A) has a nontrivial null-space;
- (λI − A) has linearly dependent columns;
- det(λI − A) = 0;
- (λI − A)x = 0 has a nontrivial solution;
- etc.
The following exercises expose some other basic properties of eigenvalues and eigenvectors:
Homework 14.3 Eigenvectors are not unique.
* SEE ANSWER
Homework 14.4 Let λ be an eigenvalue of A and let E_λ(A) = {x ∈ C^m | Ax = λx} denote the set of all eigenvectors of A associated with λ (including the zero vector, which is not really considered an eigenvector). Show that this set is a (nontrivial) subspace of C^m.
* SEE ANSWER
Definition 14.5 Given A ∈ C^{m×m}, the function p_m(λ) = det(λI − A) is a polynomial of degree at most m. This polynomial is called the characteristic polynomial of A.


The definition of p_m(λ) and the fact that it is a polynomial of degree at most m are consequences of the definition of the determinant of an arbitrary square matrix. This definition is not particularly enlightening other than that it allows one to succinctly relate eigenvalues to the roots of the characteristic polynomial.
Remark 14.6 The relation between eigenvalues and the roots of the characteristic polynomial yields a disconcerting insight: a general formula for the eigenvalues of an m × m matrix with m > 4 does not exist. The reason is that there is no general formula for the roots of a polynomial of degree m > 4. Given any polynomial p_m(λ) of degree m, an m × m matrix can be constructed such that its characteristic polynomial is p_m(λ). If

$$p_m(\lambda) = \alpha_0 + \alpha_1 \lambda + \cdots + \alpha_{m-1}\lambda^{m-1} + \lambda^m$$

and

$$A = \begin{pmatrix}
-\alpha_{m-1} & -\alpha_{m-2} & \cdots & -\alpha_1 & -\alpha_0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix},$$

then p_m(λ) = det(λI − A). Hence, we conclude that no general formula can be found for the eigenvalues of m × m matrices when m > 4. What we will see in future Notes is that we instead create algorithms that converge to the eigenvalues and/or eigenvectors of matrices.
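For the smallest nontrivial case this correspondence is easy to verify by hand and by code. The following plain-Python check (an illustration of ours for m = 2; both helper names are made up) builds the companion matrix of a monic quadratic and confirms that its characteristic polynomial has the prescribed roots:

```python
def companion_2x2(a1, a0):
    """Companion matrix of the monic quadratic p(l) = a0 + a1*l + l^2."""
    return [[-a1, -a0], [1.0, 0.0]]

def char_poly_2x2(A, l):
    """Evaluate det(l*I - A) for a 2x2 matrix A at the scalar l."""
    return (l - A[0][0]) * (l - A[1][1]) - (-A[0][1]) * (-A[1][0])

A = companion_2x2(a1=-5.0, a0=6.0)   # p(l) = l^2 - 5l + 6 = (l - 2)(l - 3)
for l in (2.0, 3.0):                  # roots of p are eigenvalues: det vanishes
    assert abs(char_poly_2x2(A, l)) < 1e-12
```

Of course, for m = 2 the quadratic formula still yields the eigenvalues in closed form; the remark is about m > 4.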
Theorem 14.7 Let A ∈ C^{m×m} and p_m(λ) be its characteristic polynomial. Then λ ∈ Λ(A) if and only if p_m(λ) = 0.

Proof: This is an immediate consequence of Theorem 14.2.

In other words, λ is an eigenvalue of A if and only if it is a root of p_m(λ). This has the immediate consequence that A has at most m eigenvalues and, if one counts multiple roots by their multiplicity, it has exactly m eigenvalues. (One says "Matrix A ∈ C^{m×m} has m eigenvalues, multiplicity counted.")
Homework 14.8 The eigenvalues of a diagonal matrix equal the values on its diagonal. The eigenvalues
of a triangular matrix equal the values on its diagonal.
* SEE ANSWER
Corollary 14.9 If A ∈ R^{m×m} is real valued then some or all of its eigenvalues may be complex valued. In this case, if λ ∈ Λ(A) then so is its conjugate λ̄.
Proof: It can be shown that if A is real valued, then the coefficients of its characteristic polynomial are
all real valued. Complex roots of a polynomial with real coefficients come in conjugate pairs.
It is not hard to see that an eigenvalue that is a root of multiplicity k has at most k eigenvectors. It is, however, not necessarily the case that an eigenvalue that is a root of multiplicity k also has k linearly independent eigenvectors. In other words, the null space of λI − A may have dimension less than the algebraic multiplicity of λ. The prototypical counterexample is the k × k matrix

$$J(\mu) = \begin{pmatrix}
\mu & 1 & 0 & \cdots & 0 \\
0 & \mu & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \mu & 1 \\
0 & \cdots & 0 & 0 & \mu
\end{pmatrix},$$

where k > 1. Observe that λI − J(μ) is singular if and only if λ = μ. Since μI − J(μ) has k − 1 linearly independent columns, its null-space has dimension one: all eigenvectors are scalar multiples of each other. This matrix is known as a Jordan block.
Definition 14.10 A matrix A Cmm that has fewer than m linearly independent eigenvectors is said to
be defective. A matrix that does have m linearly independent eigenvectors is said to be nondefective.
Theorem 14.11 Let A ∈ C^{m×m}. There exist a nonsingular matrix X and diagonal matrix Λ such that A = XΛX^{−1} if and only if A is nondefective.

Proof:

(⇒). Assume there exist nonsingular matrix X and diagonal matrix Λ so that A = XΛX^{−1}. Then, equivalently, AX = XΛ. Partition X by columns so that

$$A \begin{pmatrix} x_0 & x_1 & \cdots & x_{m-1} \end{pmatrix}
= \begin{pmatrix} x_0 & x_1 & \cdots & x_{m-1} \end{pmatrix}
  \begin{pmatrix} \lambda_0 & & & \\ & \lambda_1 & & \\ & & \ddots & \\ & & & \lambda_{m-1} \end{pmatrix}
= \begin{pmatrix} \lambda_0 x_0 & \lambda_1 x_1 & \cdots & \lambda_{m-1} x_{m-1} \end{pmatrix}.$$

Then, clearly, Ax_j = λ_j x_j, so that A has m linearly independent eigenvectors and is thus nondefective.

(⇐). Assume that A is nondefective. Let {x_0, ..., x_{m−1}} equal m linearly independent eigenvectors corresponding to eigenvalues {λ_0, ..., λ_{m−1}}. If X = ( x_0 x_1 ··· x_{m−1} ) then AX = XΛ, where Λ = diag(λ_0, ..., λ_{m−1}). Hence A = XΛX^{−1}.
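A tiny 2 × 2 instance makes the "⇒" direction concrete (an illustrative check of ours, not part of the text): build A = XΛX^{−1} explicitly and confirm that each column of X is an eigenvector with the corresponding diagonal entry of Λ as its eigenvalue.

```python
# A = X diag(2, 5) X^{-1} with X = [[1, 1], [0, 1]]; then A x_j = lambda_j x_j.
X = [[1.0, 1.0], [0.0, 1.0]]
Xinv = [[1.0, -1.0], [0.0, 1.0]]          # inverse of the unit upper triangular X
lam = [2.0, 5.0]
# Form A = X * diag(lam) * Xinv explicitly.
XL = [[X[i][j] * lam[j] for j in range(2)] for i in range(2)]
A = [[sum(XL[i][k] * Xinv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
for j in range(2):
    x = [X[0][j], X[1][j]]                 # j-th column of X is an eigenvector
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    assert all(abs(Ax[i] - lam[j] * x[i]) < 1e-12 for i in range(2))
```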
Definition 14.12 Let λ ∈ Λ(A) and p_m(λ) be the characteristic polynomial of A. Then the algebraic multiplicity of λ is defined as the multiplicity of λ as a root of p_m(λ).

Definition 14.13 Let λ ∈ Λ(A). Then the geometric multiplicity of λ is defined to be the dimension of E_λ(A). In other words, the geometric multiplicity of λ equals the number of linearly independent eigenvectors that are associated with λ.


Theorem 14.14 Let A ∈ C^{m×m}. Let the eigenvalues of A be given by λ_0, λ_1, ..., λ_{k−1}, where an eigenvalue is listed exactly n times if it has geometric multiplicity n. Then there exists a nonsingular matrix X such that

$$A = X \begin{pmatrix}
J(\lambda_0) & 0 & \cdots & 0 \\
0 & J(\lambda_1) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & J(\lambda_{k-1})
\end{pmatrix} X^{-1}.$$

For our discussion, the sizes of the Jordan blocks J(λ_i) are not particularly important. Indeed, this decomposition, known as the Jordan Canonical Form of matrix A, is not particularly interesting in practice. For this reason, we don't discuss it further and do not give its proof.

14.2 The Schur and Spectral Factorizations

Theorem 14.15 Let A, Y, B ∈ C^{m×m}, assume Y is nonsingular, and let B = Y^{−1}AY. Then Λ(A) = Λ(B).

Proof: Let λ ∈ Λ(A) and x be an associated eigenvector. Then Ax = λx if and only if Y^{−1}AYY^{−1}x = λY^{−1}x if and only if B(Y^{−1}x) = λ(Y^{−1}x).

Definition 14.16 Matrices A and B are said to be similar if there exists a nonsingular matrix Y such that B = Y^{−1}AY.

Given a nonsingular matrix Y, the transformation Y^{−1}AY is called a similarity transformation of A. It is not hard to expand the last proof to show that if A is similar to B and λ ∈ Λ(A) has algebraic/geometric multiplicity k, then λ ∈ Λ(B) has algebraic/geometric multiplicity k.
The following is the fundamental theorem for the algebraic eigenvalue problem:
Theorem 14.17 Schur Decomposition Theorem Let A Cmm . Then there exist a unitary matrix Q and
upper triangular matrix U such that A = QUQH . This decomposition is called the Schur decomposition
of matrix A.
In the above theorem, Λ(A) = Λ(U) and hence the eigenvalues of A can be found on the diagonal of U.
Proof: We will outline how to construct Q so that Q^H AQ = U, an upper triangular matrix.

Since a polynomial of degree m has at least one root, matrix A has at least one eigenvalue, λ_1, and corresponding eigenvector q_1, where we normalize this eigenvector to have length one. Thus Aq_1 = λ_1 q_1. Choose Q_2 so that Q = ( q_1 Q_2 ) is unitary. Then

$$Q^H A Q
= \begin{pmatrix} q_1 & Q_2 \end{pmatrix}^H A \begin{pmatrix} q_1 & Q_2 \end{pmatrix}
= \begin{pmatrix} q_1^H A q_1 & q_1^H A Q_2 \\ Q_2^H A q_1 & Q_2^H A Q_2 \end{pmatrix}
= \begin{pmatrix} \lambda_1 q_1^H q_1 & w^T \\ \lambda_1 Q_2^H q_1 & Q_2^H A Q_2 \end{pmatrix}
= \begin{pmatrix} \lambda_1 & w^T \\ 0 & B \end{pmatrix},$$

where w^T = q_1^H A Q_2 and B = Q_2^H A Q_2. This insight can be used to construct an inductive proof.

One should not mistake the above theorem and its proof as a constructive way to compute the Schur decomposition: finding an eigenvalue and/or the eigenvector associated with it is difficult.

Lemma 14.18 Let A ∈ C^{m×m} be of the form

$$A = \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix}.$$

Assume that Q_TL and Q_BR are unitary of appropriate size. Then

$$A = \begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}^H
     \begin{pmatrix} Q_{TL} A_{TL} Q_{TL}^H & Q_{TL} A_{TR} Q_{BR}^H \\ 0 & Q_{BR} A_{BR} Q_{BR}^H \end{pmatrix}
     \begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}.$$

Homework 14.19 Prove Lemma 14.18. Then generalize it to a result for block upper triangular matrices:

$$A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
0 & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & A_{N-1,N-1}
\end{pmatrix}.$$

* SEE ANSWER

Corollary 14.20 Let A ∈ C^{m×m} be of the form

$$A = \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix}.$$

Then Λ(A) = Λ(A_TL) ∪ Λ(A_BR).

Homework 14.21 Prove Corollary 14.20. Then generalize it to a result for block upper triangular matrices.
* SEE ANSWER
A theorem that will later allow the eigenvalues and eigenvectors of a real matrix to be computed (mostly)
without requiring complex arithmetic is given by
Theorem 14.22 Let A ∈ R^{m×m}. Then there exist a unitary matrix Q ∈ R^{m×m} and quasi upper triangular
matrix U ∈ R^{m×m} such that A = QUQ^T.
A quasi upper triangular matrix is a block upper triangular matrix where the blocks on the diagonal are
1×1 or 2×2. Complex eigenvalues of A are found as the complex eigenvalues of those 2×2 blocks on
the diagonal.
Theorem 14.23 (Spectral Decomposition Theorem) Let A ∈ C^{m×m} be Hermitian. Then there exist a
unitary matrix Q and diagonal matrix Λ ∈ R^{m×m} such that A = QΛQ^H. This decomposition is called the
Spectral decomposition of matrix A.
Proof: From the Schur Decomposition Theorem we know that there exist a unitary matrix Q and upper triangular
matrix U such that A = QUQ^H. Since A = A^H we know that QUQ^H = QU^H Q^H and hence U = U^H. But
a Hermitian triangular matrix is diagonal with real valued diagonal entries.
What we conclude is that a Hermitian matrix is nondefective and its eigenvectors can be chosen to
form an orthogonal basis.
Homework 14.24 Let A be Hermitian and let λ and μ be distinct eigenvalues with eigenvectors x_λ and x_μ,
respectively. Then x_λ^H x_μ = 0. (In other words, the eigenvectors of a Hermitian matrix corresponding to
distinct eigenvalues are orthogonal.)
* SEE ANSWER

14.3 Relation Between the SVD and the Spectral Decomposition

Homework 14.25 Let A ∈ C^{m×m} be a Hermitian matrix, A = QΛQ^H its Spectral Decomposition, and
A = UΣV^H its SVD. Relate Q, U, V, Σ, and Λ.
* SEE ANSWER
Homework 14.26 Let A ∈ C^{m×m} and A = UΣV^H its SVD. Relate the Spectral decompositions of A^H A and
AA^H to U, V, and Σ.
* SEE ANSWER

Chapter 15

Notes on the Power Method and Related Methods

You may want to review Chapter 12 of
Linear Algebra: Foundations to Frontiers - Notes to LAFF With [30]
in which the Power Method and Inverse Power Methods are discussed at a more rudimentary level.

Video
Read disclaimer regarding the videos in the preface!
Tragically, I forgot to turn on the camera... This was a great lecture!

15.0.1 Outline

Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
15.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
15.1 The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . 286
15.1.1 First attempt . . . . . . . . . . . . . . . . . . . . . . . . . . 287
15.1.2 Second attempt . . . . . . . . . . . . . . . . . . . . . . . . . 287
15.1.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 288
15.1.4 Practical Power Method . . . . . . . . . . . . . . . . . . . . 291
15.1.5 The Rayleigh quotient . . . . . . . . . . . . . . . . . . . . . 292
15.1.6 What if |λ0| = |λ1|? . . . . . . . . . . . . . . . . . . . . . . 292
15.2 The Inverse Power Method . . . . . . . . . . . . . . . . . . . . . 292
15.3 Rayleigh-quotient Iteration . . . . . . . . . . . . . . . . . . . . . 293

15.1 The Power Method

The Power Method is a simple method that under mild conditions yields an eigenvector corresponding to the
eigenvalue that is largest in magnitude.
Throughout this section we will assume that a given matrix A ∈ C^{m×m} is nondeficient: there exist a
nonsingular matrix X and diagonal matrix Λ such that A = XΛX^{-1}. (Sometimes this is called a diagonalizable matrix since there exists a matrix X so that
X^{-1}AX = Λ or, equivalently, A = XΛX^{-1}.)
From Notes on Eigenvalues and Eigenvectors we know then that the columns of X equal eigenvectors
of A and the elements on the diagonal of Λ equal the eigenvalues:

$$
X = \begin{pmatrix} x_0 & x_1 & \cdots & x_{m-1} \end{pmatrix}
\quad \mbox{and} \quad
\Lambda = \begin{pmatrix} \lambda_0 & & & \\ & \lambda_1 & & \\ & & \ddots & \\ & & & \lambda_{m-1} \end{pmatrix}
$$

so that

Ax_i = λi x_i for i = 0, . . . , m−1.

For most of this section we will assume that

|λ0| > |λ1| ≥ · · · ≥ |λm−1|.

In particular, λ0 is the eigenvalue with maximal absolute value.

15.1.1 First attempt

Now, let v^(0) ∈ C^m be an initial guess. Our (first attempt at the) Power Method iterates as follows:

for k = 0, . . .
    v^(k+1) = Av^(k)
endfor

Clearly v^(k) = A^k v^(0). Let

v^(0) = Xy = ψ0 x0 + ψ1 x1 + · · · + ψm−1 xm−1.

What does this mean? We view the columns of X as forming a basis for C^m and then the elements in
vector y = X^{-1} v^(0) equal the coefficients for describing v^(0) in that basis. Then

$$
\begin{array}{r c l}
v^{(1)} = A v^{(0)} & = & A \left( \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} \right) \\
& = & \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1} , \\
v^{(2)} = A v^{(1)} & = & \psi_0 \lambda_0^2 x_0 + \psi_1 \lambda_1^2 x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^2 x_{m-1} , \\
& \vdots & \\
v^{(k)} = A v^{(k-1)} & = & \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1} .
\end{array}
$$

Now, as long as ψ0 ≠ 0, clearly ψ0 λ0^k x0 will eventually dominate, which means that v^(k) will start pointing
in the direction of x0. In other words, it will start pointing in the direction of an eigenvector corresponding
to λ0. The problem is that it will become infinitely long if |λ0| > 1 or infinitesimally short if |λ0| < 1. All
is good if |λ0| = 1.

15.1.2 Second attempt

Again, let v^(0) ∈ C^m be an initial guess. The second attempt at the Power Method iterates as follows:

for k = 0, . . .
    v^(k+1) = Av^(k)/λ0
endfor
It is not hard to see that then

$$
\begin{array}{r c l}
v^{(k)} = A v^{(k-1)} / \lambda_0 = A^k v^{(0)} / \lambda_0^k
& = & \psi_0 \left( \frac{\lambda_0}{\lambda_0} \right)^k x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \\
& = & \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} .
\end{array}
$$

Clearly lim_{k→∞} v^(k) = ψ0 x0, as long as ψ0 ≠ 0, since |λj/λ0| < 1 for j > 0.
Another way of stating this is to notice that

$$
A^k = \underbrace{( A A \cdots A )}_{k \mbox{ times}}
    = \underbrace{( X \Lambda X^{-1} )( X \Lambda X^{-1} ) \cdots ( X \Lambda X^{-1} )}_{k \mbox{ times}}
    = X \Lambda^k X^{-1}
$$


so that

$$
v^{(k)} = A^k v^{(0)} / \lambda_0^k
        = A^k X y / \lambda_0^k
        = X \Lambda^k X^{-1} X y / \lambda_0^k
        = X \Lambda^k y / \lambda_0^k
        = X \left( \Lambda^k / \lambda_0^k \right) y
        = X \begin{pmatrix}
            1 & & & \\
            & \left( \frac{\lambda_1}{\lambda_0} \right)^k & & \\
            & & \ddots & \\
            & & & \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
          \end{pmatrix} y .
$$

Now, since |λj/λ0| < 1 for j > 0, we can argue that

$$
\lim_{k \rightarrow \infty} v^{(k)}
= \lim_{k \rightarrow \infty} X \begin{pmatrix}
    1 & & & \\
    & \left( \frac{\lambda_1}{\lambda_0} \right)^k & & \\
    & & \ddots & \\
    & & & \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
  \end{pmatrix} y
= X \begin{pmatrix}
    1 & 0 & \cdots & 0 \\
    0 & 0 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & 0
  \end{pmatrix} y
= X \psi_0 e_0 = \psi_0 X e_0 = \psi_0 x_0 .
$$

Thus, as long as ψ0 ≠ 0 (which means v^(0) must have a component in the direction of x0) this method will
eventually yield a vector in the direction of x0. However, this time the problem is that we don't know λ0
when we start.

15.1.3 Convergence

Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a
few definitions regarding rates of convergence.
Definition 15.1 Let α0, α1, α2, . . . ∈ C be an infinite sequence of scalars. Then αk is said to converge to α
if

lim_{k→∞} |αk − α| = 0.

Let x0, x1, x2, . . . ∈ C^m be an infinite sequence of vectors. Then xk is said to converge to x in the ‖ · ‖
norm if

lim_{k→∞} ‖xk − x‖ = 0.

Notice that because of the equivalence of norms, if the sequence converges in one norm, it converges in
all norms.
Definition 15.2 Let α0, α1, α2, . . . ∈ C be an infinite sequence of scalars that converges to α. Then

• αk is said to converge linearly to α if for large enough k

    |αk+1 − α| ≤ C|αk − α|

  for some constant C < 1.

• αk is said to converge super-linearly to α if

    |αk+1 − α| ≤ Ck|αk − α|

  with Ck → 0.

• αk is said to converge quadratically to α if for large enough k

    |αk+1 − α| ≤ C|αk − α|²

  for some constant C.

• αk is said to converge super-quadratically to α if

    |αk+1 − α| ≤ Ck|αk − α|²

  with Ck → 0.

• αk is said to converge cubically to α if for large enough k

    |αk+1 − α| ≤ C|αk − α|³

  for some constant C.
Linear convergence can be slow. Let's say that for k ≥ K we observe that

|αk+1 − α| ≤ C|αk − α|.

Then, clearly, |αk+n − α| ≤ C^n|αk − α|. If C = 0.99, progress may be very, very slow. If |αk − α| = 1, then

|αk+1 − α|  ≤ 0.99000
|αk+2 − α|  ≤ 0.98010
|αk+3 − α|  ≤ 0.97030
|αk+4 − α|  ≤ 0.96060
|αk+5 − α|  ≤ 0.95099
|αk+6 − α|  ≤ 0.94148
|αk+7 − α|  ≤ 0.93206
|αk+8 − α|  ≤ 0.92274
|αk+9 − α|  ≤ 0.91351
Quadratic convergence is fast. Now

|αk+1 − α| ≤ C|αk − α|²
|αk+2 − α| ≤ C|αk+1 − α|² ≤ C(C|αk − α|²)² = C³|αk − α|⁴
|αk+3 − α| ≤ C|αk+2 − α|² ≤ C(C³|αk − α|⁴)² = C⁷|αk − α|⁸
    ...
|αk+n − α| ≤ C^{2^n − 1}|αk − α|^{2^n}

Even with C = 0.99 and |αk − α| = 1, then

|αk+1 − α|   ≤ 0.99000
|αk+2 − α|   ≤ 0.970299
|αk+3 − α|   ≤ 0.932065
|αk+4 − α|   ≤ 0.860058
|αk+5 − α|   ≤ 0.732303
|αk+6 − α|   ≤ 0.530905
|αk+7 − α|   ≤ 0.279042
|αk+8 − α|   ≤ 0.077085
|αk+9 − α|   ≤ 0.005882
|αk+10 − α|  ≤ 0.000034
If we consider α the correct result then, eventually, the number of correct digits roughly doubles in each
iteration. This can be explained as follows: If |αk − α| < 1, then the number of correct decimal digits is
given by

−log10 |αk − α|.

Since log10 is a monotonically increasing function,

log10 |αk+1 − α| ≤ log10 ( C|αk − α|² ) = log10(C) + 2 log10 |αk − α| ≈ 2 log10 |αk − α|

and hence

−log10 |αk+1 − α| ≥ 2 ( −log10 |αk − α| ),

where −log10 |αk+1 − α| is the number of correct digits in αk+1 and −log10 |αk − α| is the number of
correct digits in αk.
Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration
to the next.
We now define a convenient norm.
Lemma 15.3 Let X ∈ C^{m×m} be nonsingular. Define ‖ · ‖_X : C^m → R by ‖y‖_X = ‖Xy‖ for some given norm
‖ · ‖ : C^m → R. Then ‖ · ‖_X is a norm.
Homework 15.4 Prove Lemma 15.3.
* SEE ANSWER
With this new norm, we can do our convergence analysis:

$$
v^{(k)} - \psi_0 x_0 = A^k v^{(0)} / \lambda_0^k - \psi_0 x_0
= X \begin{pmatrix}
    1 & & & \\
    & \left( \frac{\lambda_1}{\lambda_0} \right)^k & & \\
    & & \ddots & \\
    & & & \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
  \end{pmatrix} X^{-1} v^{(0)} - \psi_0 x_0 .
$$

Hence

$$
X^{-1} ( v^{(k)} - \psi_0 x_0 ) =
\begin{pmatrix}
  0 & & & \\
  & \left( \frac{\lambda_1}{\lambda_0} \right)^k & & \\
  & & \ddots & \\
  & & & \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
\end{pmatrix} X^{-1} v^{(0)}
$$

and

$$
X^{-1} ( v^{(k+1)} - \psi_0 x_0 ) =
\begin{pmatrix}
  0 & & & \\
  & \frac{\lambda_1}{\lambda_0} & & \\
  & & \ddots & \\
  & & & \frac{\lambda_{m-1}}{\lambda_0}
\end{pmatrix} X^{-1} ( v^{(k)} - \psi_0 x_0 ) .
$$

Now, let ‖ · ‖ be a p-norm¹ and its induced matrix norm and ‖ · ‖_{X^{-1}} as defined in Lemma 15.3. Then

$$
\begin{array}{r c l}
\| v^{(k+1)} - \psi_0 x_0 \|_{X^{-1}} & = & \| X^{-1} ( v^{(k+1)} - \psi_0 x_0 ) \| \\
& = & \left\| \begin{pmatrix}
  0 & & & \\
  & \frac{\lambda_1}{\lambda_0} & & \\
  & & \ddots & \\
  & & & \frac{\lambda_{m-1}}{\lambda_0}
\end{pmatrix} X^{-1} ( v^{(k)} - \psi_0 x_0 ) \right\| \\
& \leq & \left| \frac{\lambda_1}{\lambda_0} \right| \| X^{-1} ( v^{(k)} - \psi_0 x_0 ) \|
  = \left| \frac{\lambda_1}{\lambda_0} \right| \| v^{(k)} - \psi_0 x_0 \|_{X^{-1}} .
\end{array}
$$

This shows that, in this norm, the convergence of v^(k) to ψ0 x0 is linear: The difference between the current
approximation, v^(k), and the solution, ψ0 x0, is reduced by at least a constant factor in each iteration.

15.1.4 Practical Power Method

The following algorithm, known as the Power Method, avoids the problem of v^(k) growing or shrinking in
length, without requiring λ0 to be known, by scaling it to be of unit length at each step:

for k = 0, . . .
    v^(k+1) = Av^(k)
    v^(k+1) = v^(k+1)/‖v^(k+1)‖
endfor

¹ We choose a p-norm to make sure that the norm of a diagonal matrix equals the absolute value of the largest element (in
magnitude) on its diagonal.

15.1.5 The Rayleigh quotient

A question is how to extract an approximation of λ0 given an approximation of x0. The following theorem
provides the answer:
Theorem 15.5 If x is an eigenvector of A then λ = x^H Ax/(x^H x) is the associated eigenvalue of A. This
ratio is known as the Rayleigh quotient.
Proof: Let x be an eigenvector of A and λ the associated eigenvalue. Then Ax = λx. Multiplying on the
left by x^H yields x^H Ax = λ x^H x which, since x ≠ 0, means that λ = x^H Ax/(x^H x).
Clearly this ratio as a function of x is continuous and hence an approximation to x0 when plugged into this
formula would yield an approximation to λ0.
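A tiny, hypothetical Python sketch of this extraction (real symmetric case, so ^H reduces to transpose):

```python
# Compute the Rayleigh quotient x^T A x / (x^T x) for a small real example.

def rayleigh_quotient(A, x):
    """Return x^T A x / (x^T x) for A stored as a list of rows."""
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    return sum(x_i * ax_i for x_i, ax_i in zip(x, Ax)) / \
           sum(x_i * x_i for x_i in x)

A = [[2.0, 1.0], [1.0, 2.0]]
x = [1.0, 1.0]   # an exact eigenvector of this A, with eigenvalue 3
print(rayleigh_quotient(A, x))
```

For an exact eigenvector the quotient returns the eigenvalue exactly; for an approximate eigenvector (e.g., a Power Method iterate) it returns an approximation to the associated eigenvalue.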

15.1.6 What if |λ0| = |λ1|?

Now, what if

|λ0| = · · · = |λk−1| > |λk| ≥ . . . ≥ |λm−1|?

By extending the above analysis one can easily show that v^(k) will converge to a vector in the subspace
spanned by the eigenvectors associated with λ0, . . . , λk−1.
An important special case is when k = 2: if A is real valued then λ0 still may be complex valued, in
which case its conjugate λ̄0 is also an eigenvalue and it has the same magnitude as λ0. We deduce that v^(k) will always
be in the space spanned by the eigenvectors corresponding to λ0 and λ̄0.

15.2 The Inverse Power Method

The Power Method homes in on an eigenvector associated with the largest (in magnitude) eigenvalue. The
Inverse Power Method homes in on an eigenvector associated with the smallest eigenvalue (in magnitude).
Throughout this section we will assume that a given matrix A ∈ C^{m×m} is nondeficient and nonsingular,
so that there exist a nonsingular matrix X and diagonal matrix Λ such that A = XΛX^{-1}. We further assume that Λ =
diag(λ0, · · · , λm−1) and

|λ0| ≥ |λ1| ≥ · · · ≥ |λm−2| > |λm−1|.

Theorem 15.6 Let A ∈ C^{m×m} be nonsingular. Then λ and x are an eigenvalue and associated eigenvector
of A if and only if 1/λ and x are an eigenvalue and associated eigenvector of A^{-1}.
Homework 15.7 Assume that

|λ0| ≥ |λ1| ≥ · · · ≥ |λm−2| > |λm−1| > 0.

Show that

|1/λm−1| > |1/λm−2| ≥ |1/λm−3| ≥ · · · ≥ |1/λ0|.

* SEE ANSWER

Thus, an eigenvector associated with the smallest (in magnitude) eigenvalue of A is an eigenvector associated with the largest (in magnitude) eigenvalue of A^{-1}. This suggests the following naive iteration:

for k = 0, . . .
    v^(k+1) = A^{-1}v^(k)
    v^(k+1) = λm−1 v^(k+1)
endfor

Of course, we would want to factor A = LU once and solve L(Uv^(k+1)) = v^(k) rather than multiplying
with A^{-1}. From the analysis of the convergence of the second attempt for a Power Method algorithm
we conclude that now

$$
\| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq \left| \frac{\lambda_{m-1}}{\lambda_{m-2}} \right| \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} .
$$

A practical Inverse Power Method algorithm is given by

for k = 0, . . .
    v^(k+1) = A^{-1}v^(k)
    v^(k+1) = v^(k+1)/‖v^(k+1)‖
endfor

Often, we would expect the Inverse Power Method to converge faster than the Power Method. For
example, take the case where the |λk| are equally spaced between 0 and m: |λk| = m − k. Then

$$
\left| \frac{\lambda_{m-1}}{\lambda_{m-2}} \right| = \frac{1}{2}
\quad \mbox{and} \quad
\left| \frac{\lambda_1}{\lambda_0} \right| = \frac{m-1}{m} ,
$$

which means that the Power Method converges much more slowly than the Inverse Power Method.
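A hypothetical Python sketch of the practical Inverse Power Method on a 2×2 example. In practice one would factor A = LU once and solve against the factors; here, for a self-contained illustration, the 2×2 system is solved directly by Cramer's rule:

```python
# Inverse Power Method sketch: iterate with A^{-1} and renormalize.

def solve2(A, b):
    """Solve a 2x2 system A x = b by Cramer's rule (illustration only)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def inverse_power_method(A, v, num_iters=50):
    """v^(k+1) = A^{-1} v^(k), rescaled to unit length at each step."""
    for _ in range(num_iters):
        v = solve2(A, v)                       # apply A^{-1} via a solve
        length = (v[0] ** 2 + v[1] ** 2) ** 0.5
        v = [v[0] / length, v[1] / length]
    return v

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1
v = inverse_power_method(A, [1.0, 0.0])
# v approximates (1, -1)/sqrt(2), the eigenvector associated with the
# smallest eigenvalue, 1.
print(v)
```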

15.3 Rayleigh-quotient Iteration

The next observation is captured in the following lemma:

Lemma 15.8 Let A ∈ C^{m×m} and μ ∈ C. Then (λ, x) is an eigenpair of A if and only if (λ − μ, x) is an
eigenpair of (A − μI).
Homework 15.9 Prove Lemma 15.8.
* SEE ANSWER
The matrix A − μI is referred to as the matrix A that has been shifted by μ. What the lemma says is that
shifting A by μ shifts the spectrum of A by μ:
Lemma 15.10 Let A ∈ C^{m×m}, A = XΛX^{-1}, and μ ∈ C. Then A − μI = X(Λ − μI)X^{-1}.
Homework 15.11 Prove Lemma 15.10.
* SEE ANSWER
This suggests the following (naive) iteration: Pick a value μ close to λm−1. Iterate

for k = 0, . . .
    v^(k+1) = (A − μI)^{-1}v^(k)
    v^(k+1) = (λm−1 − μ)v^(k+1)
endfor

Of course one would solve (A − μI)v^(k+1) = v^(k) rather than computing and applying the inverse of A − μI.
If we index the eigenvalues so that |λ0 − μ| ≥ · · · ≥ |λm−2 − μ| > |λm−1 − μ| then

$$
\| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq \left| \frac{\lambda_{m-1} - \mu}{\lambda_{m-2} - \mu} \right| \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} .
$$

The closer to λm−1 the shift μ (so named because it shifts the spectrum of A) is chosen, the more favorable
the ratio that dictates convergence.
A more practical algorithm is given by

for k = 0, . . .
    v^(k+1) = (A − μI)^{-1}v^(k)
    v^(k+1) = v^(k+1)/‖v^(k+1)‖
endfor

The question now becomes how to choose μ so that it is a good guess for λm−1. Often an application
inherently supplies a reasonable approximation for the smallest eigenvalue or an eigenvalue of particular
interest. However, we know that eventually v^(k) becomes a good approximation for xm−1 and therefore
the Rayleigh quotient gives us a way to find a good approximation for λm−1. This suggests the (naive)
Rayleigh-quotient iteration:

for k = 0, . . .
    μk = v^(k)H Av^(k)/(v^(k)H v^(k))
    v^(k+1) = (A − μk I)^{-1}v^(k)
    v^(k+1) = (λm−1 − μk)v^(k+1)
endfor
Now²

$$
\| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq \left| \frac{\lambda_{m-1} - \mu_k}{\lambda_{m-2} - \mu_k} \right| \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}}
$$

with

lim_{k→∞} (λm−1 − μk) = 0,

which means super-linear convergence is observed. In fact, it can be shown that once k is large enough

‖v^(k+1) − ψm−1 xm−1‖_{X^{-1}} ≤ C‖v^(k) − ψm−1 xm−1‖²_{X^{-1}},

which is known as quadratic convergence. Roughly speaking this means that every iteration doubles
the number of correct digits in the current approximation. To prove this, one shows that |λm−1 − μk| ≤
C‖v^(k) − ψm−1 xm−1‖_{X^{-1}}.

² I think... I have not checked this thoroughly. But the general idea holds. λm−1 has to be defined as the eigenvalue to which
the method eventually converges.

Better yet, it can be shown that if A is Hermitian, then, once k is large enough,

‖v^(k+1) − ψm−1 xm−1‖_{X^{-1}} ≤ C‖v^(k) − ψm−1 xm−1‖³_{X^{-1}},

which is known as cubic convergence. Roughly speaking this means that every iteration triples the number
of correct digits in the current approximation. This is mind-bogglingly fast convergence!
A practical Rayleigh quotient iteration is given by

v^(0) = v^(0)/‖v^(0)‖2
for k = 0, . . .
    μk = v^(k)H Av^(k)        (Now ‖v^(k)‖2 = 1)
    v^(k+1) = (A − μk I)^{-1}v^(k)
    v^(k+1) = v^(k+1)/‖v^(k+1)‖
endfor
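The practical iteration above can be sketched as follows, again as a hypothetical Python illustration on a 2×2 symmetric matrix, with the shifted solve done directly by Cramer's rule. If the shift ever hits an eigenvalue exactly, the solve breaks down; the sketch treats that as convergence:

```python
# Rayleigh-quotient iteration sketch on a small symmetric example.

def solve2(A, b):
    """Solve a 2x2 system A x = b by Cramer's rule (illustration only)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def rayleigh_quotient_iteration(A, v, num_iters=10):
    """mu_k = v^T A v; solve (A - mu_k I) v^(k+1) = v^(k); renormalize."""
    length = (v[0] ** 2 + v[1] ** 2) ** 0.5
    v = [v[0] / length, v[1] / length]
    mu = 0.0
    for _ in range(num_iters):
        Av = [A[0][0] * v[0] + A[0][1] * v[1],
              A[1][0] * v[0] + A[1][1] * v[1]]
        mu = v[0] * Av[0] + v[1] * Av[1]      # Rayleigh quotient (unit v)
        shifted = [[A[0][0] - mu, A[0][1]],
                   [A[1][0], A[1][1] - mu]]
        try:
            v = solve2(shifted, v)
        except ZeroDivisionError:             # shift hit an eigenvalue
            return mu, v
        length = (v[0] ** 2 + v[1] ** 2) ** 0.5
        v = [v[0] / length, v[1] / length]
    return mu, v

A = [[2.0, 1.0], [1.0, 2.0]]                  # eigenvalues 3 and 1
mu, v = rayleigh_quotient_iteration(A, [1.0, 0.1])
print(mu)
```

Starting from this initial guess the iterate locks onto the eigenvalue 3 within a handful of iterations, illustrating the very rapid convergence discussed above. (Which eigenpair is found depends on the starting vector.)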

Chapter 16

Notes on the QR Algorithm and other Dense Eigensolvers

Video
Read disclaimer regarding the videos in the preface!
Tragically, the camera ran out of memory for the first lecture... Here is the second lecture, which discusses
the implicit QR algorithm
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)

In most of this note, we focus on the case where A is symmetric and real valued. The reason for this is
that many of the techniques can be more easily understood in that setting.

16.0.1 Outline

Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
16.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
16.2 Subspace Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
16.3 The QR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
16.3.1 A basic (unshifted) QR algorithm . . . . . . . . . . . . . . . . . . . 304
16.3.2 A basic shifted QR algorithm . . . . . . . . . . . . . . . . . . . . . 304
16.4 Reduction to Tridiagonal Form . . . . . . . . . . . . . . . . . . . . . . 306
16.4.1 Householder transformations (reflectors) . . . . . . . . . . . . . . . 306
16.4.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.5 The QR algorithm with a Tridiagonal Matrix . . . . . . . . . . . . . . . 309
16.5.1 Givens rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
16.6 QR Factorization of a Tridiagonal Matrix . . . . . . . . . . . . . . . . . 310
16.7 The Implicitly Shifted QR Algorithm . . . . . . . . . . . . . . . . . . . 312
16.7.1 Upper Hessenberg and tridiagonal matrices . . . . . . . . . . . . . . 312
16.7.2 The Implicit Q Theorem . . . . . . . . . . . . . . . . . . . . . . . . 313
16.7.3 The Francis QR Step . . . . . . . . . . . . . . . . . . . . . . . . . . 314
16.7.4 A complete algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 316
16.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
16.8.1 More on reduction to tridiagonal form . . . . . . . . . . . . . . . . 320
16.8.2 Optimizing the tridiagonal QR algorithm . . . . . . . . . . . . . . . 320
16.9 Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
16.9.1 Jacobi's method for the symmetric eigenvalue problem . . . . . . . . 320
16.9.2 Cuppen's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 323
16.9.3 The Method of Multiple Relatively Robust Representations (MRRR) . 323
16.10 The Nonsymmetric QR Algorithm . . . . . . . . . . . . . . . . . . . . . 323
16.10.1 A variant of the Schur decomposition . . . . . . . . . . . . . . . . . 323
16.10.2 Reduction to upper Hessenberg form . . . . . . . . . . . . . . . . . 324
16.10.3 The implicitly double-shifted QR algorithm . . . . . . . . . . . . . . 327

16.1 Preliminaries

The QR algorithm is a standard method for computing all eigenvalues and eigenvectors of a matrix. In this
note, we focus on the real valued symmetric eigenvalue problem (the case where A ∈ R^{n×n} is symmetric). For this case,
recall the Spectral Decomposition Theorem:
Theorem 16.1 If A ∈ R^{n×n} is symmetric then there exists a unitary matrix Q and diagonal matrix Λ such that A = QΛQ^T.
We will partition Q = ( q0 · · · qn−1 ) and assume that Λ = diag(λ0, · · · , λn−1), so that throughout
this note qi and λi refer to the ith column of Q and the ith diagonal element of Λ, which means that each
tuple (λi, qi) is an eigenpair.

16.2 Subspace Iteration

We start with a matrix V ∈ R^{n×r} with normalized columns and iterate something like

V^(0) = V
for k = 0, . . . convergence
    V^(k+1) = AV^(k)
    Normalize the columns to be of unit length.
end for

The problem with this approach is that all columns will (likely) converge to an eigenvector associated with
the dominant eigenvalue, since the Power Method is being applied to all columns simultaneously. We will
now lead the reader through a succession of insights towards a practical algorithm.
Let us examine what V̂ = AV looks like, for the simple case where V = ( v0 v1 v2 ) (three columns).
We know that

$$
v_j = Q \underbrace{Q^T v_j}_{y_j} .
$$

Hence

$$
v_0 = \sum_{j=0}^{n-1} \gamma_{0,j} q_j , \quad
v_1 = \sum_{j=0}^{n-1} \gamma_{1,j} q_j , \quad \mbox{and} \quad
v_2 = \sum_{j=0}^{n-1} \gamma_{2,j} q_j ,
$$


where γ_{i,j} equals the ith element of y_j. Then

$$
\begin{array}{r c l}
AV = A \begin{pmatrix} v_0 & v_1 & v_2 \end{pmatrix}
& = & A \begin{pmatrix} \sum_{j=0}^{n-1} \gamma_{0,j} q_j & \sum_{j=0}^{n-1} \gamma_{1,j} q_j & \sum_{j=0}^{n-1} \gamma_{2,j} q_j \end{pmatrix} \\
& = & \begin{pmatrix} \sum_{j=0}^{n-1} \gamma_{0,j} A q_j & \sum_{j=0}^{n-1} \gamma_{1,j} A q_j & \sum_{j=0}^{n-1} \gamma_{2,j} A q_j \end{pmatrix} \\
& = & \begin{pmatrix} \sum_{j=0}^{n-1} \gamma_{0,j} \lambda_j q_j & \sum_{j=0}^{n-1} \gamma_{1,j} \lambda_j q_j & \sum_{j=0}^{n-1} \gamma_{2,j} \lambda_j q_j \end{pmatrix} .
\end{array}
$$
If we happened to know λ0, λ1, and λ2 then we could divide the columns by these, respectively, and
get new vectors

$$
\begin{pmatrix} \widehat v_0 & \widehat v_1 & \widehat v_2 \end{pmatrix}
=
\begin{pmatrix}
\sum_{j=0}^{n-1} \gamma_{0,j} \left( \frac{\lambda_j}{\lambda_0} \right) q_j &
\sum_{j=0}^{n-1} \gamma_{1,j} \left( \frac{\lambda_j}{\lambda_1} \right) q_j &
\sum_{j=0}^{n-1} \gamma_{2,j} \left( \frac{\lambda_j}{\lambda_2} \right) q_j
\end{pmatrix}
=
\begin{pmatrix}
\gamma_{0,0} q_0 + {} &
\gamma_{1,0} \left( \frac{\lambda_0}{\lambda_1} \right) q_0 + {} &
\gamma_{2,0} \left( \frac{\lambda_0}{\lambda_2} \right) q_0 + \gamma_{2,1} \left( \frac{\lambda_1}{\lambda_2} \right) q_1 + {} \\
 &
\gamma_{1,1} q_1 + {} &
\gamma_{2,2} q_2 + {} \\
\sum_{j=1}^{n-1} \gamma_{0,j} \left( \frac{\lambda_j}{\lambda_0} \right) q_j &
\sum_{j=2}^{n-1} \gamma_{1,j} \left( \frac{\lambda_j}{\lambda_1} \right) q_j &
\sum_{j=3}^{n-1} \gamma_{2,j} \left( \frac{\lambda_j}{\lambda_2} \right) q_j
\end{pmatrix}
\qquad (16.1)
$$
Assume that |λ0| > |λ1| > |λ2| > |λ3| ≥ · · · ≥ |λn−1|. Then, similar as for the power method,

• The first column will see components in the direction of {q1, . . . , qn−1} shrink relative to the
  component in the direction of q0.

• The second column will see components in the direction of {q2, . . . , qn−1} shrink relative to the
  component in the direction of q1, but the component in the direction of q0 increases, relatively,
  since |λ0/λ1| > 1.

• The third column will see components in the direction of {q3, . . . , qn−1} shrink relative to the
  component in the direction of q2, but the components in the directions of q0 and q1 increase,
  relatively, since |λ0/λ2| > 1 and |λ1/λ2| > 1.

How can we make it so that v_j converges to a vector in the direction of q_j?

• If we happen to know q0, then we can subtract out the component of

$$
\widehat v_1 = \gamma_{1,0} \left( \frac{\lambda_0}{\lambda_1} \right) q_0 + \gamma_{1,1} q_1 + \sum_{j=2}^{n-1} \gamma_{1,j} \left( \frac{\lambda_j}{\lambda_1} \right) q_j
$$

  in the direction of q0:

$$
\widehat v_1 - q_0^T \widehat v_1 \, q_0 = \gamma_{1,1} q_1 + \sum_{j=2}^{n-1} \gamma_{1,j} \left( \frac{\lambda_j}{\lambda_1} \right) q_j
$$

  so that we are left with the component in the direction of q1 and components in directions of
  q2, . . . , qn−1 that are suppressed every time through the loop.

• Similarly, if we also know q1, the components of v̂2 in the direction of q0 and q1 can be subtracted
  from that vector.


• We do not know λ0, λ1, and λ2, but from the discussion about the Power Method we remember that
  we can just normalize the so updated v̂0, v̂1, and v̂2 to have unit length.

How can we make these insights practical?

• We do not know q0, q1, and q2, but we can informally argue that if we keep iterating,

  – The vector v̂0, normalized in each step, will eventually point in the direction of q0.
  – S(v̂0, v̂1) will eventually equal S(q0, q1). In each iteration, we can subtract the component of v̂1 in the direction of v̂0 from v̂1, and then
    normalize v̂1, so that eventually we obtain a vector that points in the direction of q1.
  – S(v̂0, v̂1, v̂2) will eventually equal S(q0, q1, q2). In each iteration, we can subtract the component of v̂2 in the directions of v̂0 and v̂1 from v̂2,
    and then normalize the result, to make v̂2 eventually point in the direction of q2.

What we recognize is that normalizing v̂0, subtracting out the component of v̂1 in the direction of v̂0, and
then normalizing v̂1, etc., is exactly what the Gram-Schmidt process does. And thus, we can use any
convenient (and stable) QR factorization method. This also shows how the method can be generalized to
work with more than three columns and even all columns simultaneously.
The algorithm now becomes:

V^(0) = I_{n×p}    (I_{n×p} represents the first p columns of I)
for k = 0, . . . convergence
    AV^(k) → V^(k+1) R^(k+1)    (QR factorization with R^(k+1) ∈ R^{p×p})
end for
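The algorithm above can be sketched as follows. This is a hypothetical Python illustration with p = 2 columns; classical Gram-Schmidt is used as the QR factorization purely for compactness (for this tiny, well-conditioned example it suffices, though it is not the stable choice in general):

```python
# Subspace iteration sketch: orthonormalize the columns of A V^(k).

def mat_mul(A, B):
    """Product of A (list of rows) and B (list of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def gram_schmidt_q(V):
    """Return the orthonormalized columns of V (rows in, rows out)."""
    n, p = len(V), len(V[0])
    cols = [[V[i][j] for i in range(n)] for j in range(p)]
    q = []
    for v in cols:
        for u in q:                                  # subtract components
            proj = sum(u_i * v_i for u_i, v_i in zip(u, v))
            v = [v_i - proj * u_i for v_i, u_i in zip(v, u)]
        length = sum(v_i * v_i for v_i in v) ** 0.5  # normalize
        q.append([v_i / length for v_i in v])
    return [[q[j][i] for j in range(p)] for i in range(n)]

def subspace_iteration(A, p, num_iters=100):
    """V^(0) = first p columns of I; repeat V^(k+1) = qr(A V^(k))."""
    n = len(A)
    V = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(n)]
    for _ in range(num_iters):
        V = gram_schmidt_q(mat_mul(A, V))
    return V

A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
V = subspace_iteration(A, 2)
# The two columns of V approximate the eigenvectors associated with the
# two largest (in magnitude) eigenvalues, 2 + sqrt(2) and 2.
print(V)
```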
Now consider again (16.1), focusing on the third column:

$$
\gamma_{2,0} \left( \frac{\lambda_0}{\lambda_2} \right) q_0 + \gamma_{2,1} \left( \frac{\lambda_1}{\lambda_2} \right) q_1 + \gamma_{2,2} q_2 + \sum_{j=3}^{n-1} \gamma_{2,j} \left( \frac{\lambda_j}{\lambda_2} \right) q_j
=
\gamma_{2,0} \left( \frac{\lambda_0}{\lambda_2} \right) q_0 + \gamma_{2,1} \left( \frac{\lambda_1}{\lambda_2} \right) q_1 + \gamma_{2,2} q_2 + \gamma_{2,3} \left( \frac{\lambda_3}{\lambda_2} \right) q_3 + \sum_{j=4}^{n-1} \gamma_{2,j} \left( \frac{\lambda_j}{\lambda_2} \right) q_j .
$$

This shows that, if the components in the direction of q0 and q1 are subtracted out, it is the component
in the direction of q3 that is diminished in length the most slowly, dictated by the ratio |λ3/λ2|. This, of
course, generalizes: the jth column of V^(k), v_j^(k), will have a component in the direction of q_{j+1}, of length
|q_{j+1}^T v_j^(k)|, that can be expected to shrink most slowly.
We demonstrate this in Figure 16.1, which shows the execution of the algorithm with p = n for a 5×5
matrix, and shows how |q_{j+1}^T v_j^(k)| converges to zero as a function of k.

Figure 16.1: Convergence of the subspace iteration for a 5×5 matrix. This graph is mislabeled: x should
be labeled with v. The (linear) convergence of v_j to a vector in the direction of q_j is dictated by how
quickly the component in the direction q_{j+1} converges to zero. The line labeled |q_{j+1}^T x_j| plots the length
of the component in the direction q_{j+1} as a function of the iteration number.

Next, we observe that if V ∈ R^{n×n} in the above iteration (which means we are iterating with n vectors
at a time), then AV yields a next-to-last column of the form

$$
\sum_{j=0}^{n-3} \gamma_{n-2,j} \left( \frac{\lambda_j}{\lambda_{n-2}} \right) q_j + \gamma_{n-2,n-2} \, q_{n-2} + \gamma_{n-2,n-1} \left( \frac{\lambda_{n-1}}{\lambda_{n-2}} \right) q_{n-1} ,
$$

where γ_{i,j} = q_j^T v_i. Thus, given that the components in the direction of q_j, j = 0, . . . , n−2, can be expected
in later iterations to be greatly reduced by the QR factorization that subsequently happens with AV, we
notice that it is |λn−1/λn−2| that dictates how fast the component in the direction of qn−1 disappears from v_{n−2}^(k).
This is a ratio we also saw in the Inverse Power Method and that we noticed we could accelerate in the
Rayleigh Quotient Iteration: At each iteration we should shift the matrix to (A − μk I) where μk ≈ λn−1.
Since the last column of V^(k) is supposed to be converging to qn−1, it seems reasonable to use μk =
v_{n−1}^{(k)T} A v_{n−1}^{(k)} (recall that v_{n−1}^{(k)} has unit length, so this is the Rayleigh quotient).
The above discussion motivates the iteration

Figure 16.2: Convergence of the shifted subspace iteration for a 5×5 matrix, plotting the lengths of the
components |q_j^T x_4^(k)|, j = 0, . . . , 3, as a function of the iteration number k. This graph is mislabeled: x
should be labeled with v. What this graph shows is that the components of v4 in the directions q0 through
q3 disappear very quickly. The vector v4 quickly points in the direction of the eigenvector associated
with the smallest (in magnitude) eigenvalue. Just like the Rayleigh-quotient iteration is not guaranteed to
converge to the eigenvector associated with the smallest (in magnitude) eigenvalue, the shifted subspace
iteration may home in on a different eigenvector than the one associated with the smallest (in magnitude)
eigenvalue. Something is wrong in this graph: All curves should quickly drop to (near) zero!

V^(0) := I    (V^(0) ∈ R^{n×n}!)
for k := 0, . . . until convergence
    μk := v_{n−1}^{(k)T} A v_{n−1}^{(k)}    (Rayleigh quotient)
    (A − μk I)V^(k) → V^(k+1) R^(k+1)    (QR factorization)
end for

Notice that this does not require one to solve with (A − μk I), unlike in the Rayleigh Quotient Iteration.
However, it does require a QR factorization, which requires more computation than the LU factorization
(approximately 4/3 n³ flops).
We demonstrate the convergence in Figure 16.2, which shows the execution of the algorithm with a
5×5 matrix and illustrates how |q_j^T v_{n−1}^(k)| converges to zero as a function of k.

Subspace iteration:

Â^(0) := A
V̂^(0) := I
for k := 0, . . . until convergence
    A V̂^(k) → V̂^(k+1) R̂^(k+1)    (QR factorization)
    Â^(k+1) := V̂^(k+1)T A V̂^(k+1)
end for

QR algorithm:

A^(0) := A
V^(0) := I
for k := 0, . . . until convergence
    A^(k) → Q^(k+1) R^(k+1)    (QR factorization)
    A^(k+1) := R^(k+1) Q^(k+1)
    V^(k+1) := V^(k) Q^(k+1)
end for

Figure 16.3: Basic subspace iteration and basic QR algorithm.

16.3 The QR Algorithm

The QR algorithm is a classic algorithm for computing all eigenvalues and eigenvectors of a matrix.
While we explain it for the symmetric eigenvalue problem, it generalizes to the nonsymmetric eigenvalue
problem as well.

16.3.1 A basic (unshifted) QR algorithm

We have informally argued that the columns of the orthogonal matrices V^(k) ∈ R^{n×n} generated by the
(unshifted) subspace iteration converge to eigenvectors of matrix A. (The exact conditions under which
this happens have not been fully discussed.) In Figure 16.3 (left), we restate the subspace iteration. In
it, we denote matrices V^(k) and R^(k) from the subspace iteration by V̂^(k) and R̂^(k) to distinguish them from
the ones computed by the algorithm on the right. The algorithm on the left also computes the matrix
Â^(k) = V̂^(k)T A V̂^(k), a matrix that hopefully converges to Λ, the diagonal matrix with the eigenvalues of A
on its diagonal. To the right is the QR algorithm. The claim is that the two algorithms compute the same
quantities.
Homework 16.2 Prove that in Figure 16.3, V̂^(k) = V^(k), and Â^(k) = A^(k), k = 0, . . ..
* SEE ANSWER
We conclude that if V̂^(k) converges to the matrix of orthonormal eigenvectors when the subspace iteration
is applied to V^(0) = I, then A^(k) converges to the diagonal matrix with eigenvalues along the diagonal.
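The unshifted QR algorithm can be sketched in a few lines. The following hypothetical Python illustration uses classical Gram-Schmidt for the QR factorization (adequate for this tiny, well-conditioned example; it is not the stable choice in general) and applies the algorithm to a 2×2 symmetric matrix with eigenvalues 3 and 1:

```python
# Basic (unshifted) QR algorithm sketch: A^(k+1) := R^(k+1) Q^(k+1).

def qr_factorization(A):
    """Classical Gram-Schmidt QR of a small square matrix (list of rows)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        for i, u in enumerate(q_cols):
            R[i][j] = sum(u_k * v_k for u_k, v_k in zip(u, v))
            v = [v_k - R[i][j] * u_k for v_k, u_k in zip(v, u)]
        R[j][j] = sum(v_k * v_k for v_k in v) ** 0.5
        q_cols.append([v_k / R[j][j] for v_k in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_algorithm(A, num_iters=50):
    """Repeat: factor A^(k) = Q R, then form A^(k+1) = R Q."""
    for _ in range(num_iters):
        Q, R = qr_factorization(A)
        A = mat_mul(R, Q)
    return A

A = qr_algorithm([[2.0, 1.0], [1.0, 2.0]])
# The diagonal of A now approximates the eigenvalues (3 and 1, largest
# first), and the off-diagonal entries have converged to (near) zero.
print(A)
```

Each step is a similarity transformation (A^(k+1) = Q^(k+1)T A^(k) Q^(k+1)), so the eigenvalues are preserved while the matrix drifts toward diagonal form.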

16.3.2 A basic shifted QR algorithm

In Figure 16.4 (left), we restate the subspace iteration with shifting. In it, we denote matrices V^(k) and R^(k)
from the subspace iteration by V̂^(k) and R̂^(k) to distinguish them from the ones computed by the algorithm
on the right. The algorithm on the left also computes the matrix Â^(k) = V̂^(k)T A V̂^(k), a matrix that hopefully
converges to Λ, the diagonal matrix with the eigenvalues of A on its diagonal. To the right is the shifted
QR algorithm. The claim is that the two algorithms compute the same quantities.

Subspace iteration:

Â^(0) := A
V̂^(0) := I
for k := 0, . . . until convergence
    μ̂k := v̂_{n−1}^{(k)T} A v̂_{n−1}^{(k)}
    (A − μ̂k I)V̂^(k) → V̂^(k+1) R̂^(k+1)    (QR factorization)
    Â^(k+1) := V̂^(k+1)T A V̂^(k+1)
end for

QR algorithm:

A^(0) := A
V^(0) := I
for k := 0, . . . until convergence
    μk = α_{n−1,n−1}^{(k)}
    A^(k) − μk I → Q^(k+1) R^(k+1)    (QR factorization)
    A^(k+1) := R^(k+1) Q^(k+1) + μk I
    V^(k+1) := V^(k) Q^(k+1)
end for

Figure 16.4: Basic shifted subspace iteration and basic shifted QR algorithm.

Homework 16.3 Prove that in Figure 16.4, V̂^(k) = V^(k), and Â^(k) = A^(k), k = 0, . . ..
* SEE ANSWER
We conclude that if V̂^(k) converges to the matrix of orthonormal eigenvectors when the shifted subspace
iteration is applied to V^(0) = I, then A^(k) converges to the diagonal matrix with eigenvalues along the
diagonal.
The convergence of the basic shifted QR algorithm is illustrated below. Pay particular attention to the
convergence of the last row and column.

2.01131953448

(0)

A = 0.05992695085
0.14820940917

2.63492207667

(2)
A =
0.47798481637
0.07654607908

2.96578660126

(4)

A = 0.18177690194
0.00000000000

0.05992695085 0.14820940917

2.21466116574 0.34213192482

(1)

A = 0.34213192482 2.54202325042
0.31816754245 0.57052186467

2.87588550968 0.32971207176

(3)
A =
0.32971207176 2.12411444949
0.00024210487 0.00014361630

2.9912213907 0.093282073553

(5)

A = 0.0932820735 2.008778609226
0.0000000000 0.000000000000

2.30708673171 0.93623515213

0.93623515213 1.68159373379

0.47798481637 0.07654607908

2.35970859985 0.06905042811

0.06905042811 1.00536932347

0.18177690194 0.00000000000

2.03421339873 0.00000000000

0.00000000000 1.00000000000

0.31816754245

0.57052186467

1.24331558383
0.00024210487

0.00014361630

1.00000004082
0.00000000000

0.00000000000

1.00000000000
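The notes use MATLAB for their exercises; purely as an illustration (not part of the notes), the iteration on the right of Figure 16.4 can be sketched in NumPy. The test matrix below is made up for the example; it is not the matrix that generated the iterates shown above.

```python
import numpy as np

def shifted_qr(A, num_iters=100):
    """Basic shifted QR algorithm from Figure 16.4 (right):
    A^(k) - mu_k I -> Q^(k+1) R^(k+1), then
    A^(k+1) := R^(k+1) Q^(k+1) + mu_k I (a similarity transformation),
    with the shift mu_k taken to be the trailing diagonal element."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)                       # accumulates Q^(1) Q^(2) ...
    for _ in range(num_iters):
        mu = A[n - 1, n - 1]            # shift
        Q, R = np.linalg.qr(A - mu * np.eye(n))
        A = R @ Q + mu * np.eye(n)      # equals Q^T (A - mu I) Q + mu I
        V = V @ Q
    return A, V

# Symmetric test matrix with known eigenvalues 1, 2, 4.
rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A0 = Q0 @ np.diag([4.0, 2.0, 1.0]) @ Q0.T
Ak, V = shifted_qr(A0)
```

After enough iterations the off-diagonal part of `Ak` is negligible and its diagonal holds the eigenvalues, while `V` accumulates the orthonormal eigenvectors, so that `V.T @ A0 @ V` reproduces `Ak`.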

Once the off-diagonal elements of the last row and column have converged (are sufficiently small), the
problem can be deflated by applying the following theorem:
Theorem 16.4 Let
\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
0 & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & A_{N-1,N-1}
\end{pmatrix}
\]
where the $A_{k,k}$ are all square. Then $\Lambda(A) = \bigcup_{k=0}^{N-1} \Lambda(A_{k,k})$.
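As a quick numerical illustration of Theorem 16.4 (the blocks below are hypothetical, chosen for the example), the spectrum of a block upper triangular matrix is the union of the spectra of its diagonal blocks, regardless of the off-diagonal block:

```python
import numpy as np

# Block upper triangular matrix with two 2x2 diagonal blocks.
A00 = np.array([[4.0, 1.0],
                [0.5, 3.0]])
A11 = np.array([[1.0, 2.0],
                [0.0, 2.0]])
A01 = np.array([[7.0, 8.0],
                [9.0, 10.0]])          # arbitrary off-diagonal block
A = np.block([[A00, A01],
              [np.zeros((2, 2)), A11]])

# Lambda(A) = Lambda(A00) union Lambda(A11).
eigs = np.sort(np.linalg.eigvals(A))
expected = np.sort(np.concatenate([np.linalg.eigvals(A00),
                                   np.linalg.eigvals(A11)]))
```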

Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers


Homework 16.5 Prove the above theorem.


* SEE ANSWER
In other words, once the last row and column have converged, the algorithm can continue with the submatrix that consists of the first $n-1$ rows and columns.

The problem with the QR algorithm, as stated, is that each iteration requires $O(n^3)$ operations, which is too expensive given that many iterations are required to find all eigenvalues and eigenvectors.

16.4 Reduction to Tridiagonal Form

In the next section, we will see that if $A^{(0)}$ is a tridiagonal matrix, then so are all $A^{(k)}$. This reduces the cost of each iteration from $O(n^3)$ to $O(n)$. We first show how unitary similarity transformations can be used to reduce a matrix to tridiagonal form.

16.4.1 Householder transformations (reflectors)

We briefly review the main tool employed to reduce a matrix to tridiagonal form: the Householder transform, also known as a reflector. Full details were given in Chapter 6.

Definition 16.6 Let $u \in \mathbb{R}^n$, $\tau \in \mathbb{R}$. Then $H = H(u) = I - uu^T/\tau$, where $\tau = \frac{1}{2}u^Tu$, is said to be a reflector or Householder transformation.

We observe:

• Let $z$ be any vector that is perpendicular to $u$. Applying a Householder transform $H(u)$ to $z$ leaves the vector unchanged: $H(u)z = z$.

• Let any vector $x$ be written as $x = z + (u^Tx/u^Tu)\,u$, where $z$ is perpendicular to $u$ and $(u^Tx/u^Tu)\,u$ is the component of $x$ in the direction of $u$. Then $H(u)x = z - (u^Tx/u^Tu)\,u$.

This can be interpreted as follows: the space perpendicular to $u$ acts as a mirror: any vector in that space (along the mirror) is not reflected, while any other vector has the component that is orthogonal to the space (the component outside and orthogonal to the mirror) reversed in direction. Notice that a reflection preserves the length of the vector. Also, it is easy to verify that:

1. $HH = I$ (reflecting the reflection of a vector results in the original vector);

2. $H = H^T$, and so $H^TH = HH^T = I$ (a reflection is an orthogonal matrix and thus preserves the norm); and

3. if $H_0, \ldots, H_{k-1}$ are Householder transformations and $Q = H_0 H_1 \cdots H_{k-1}$, then $Q^TQ = QQ^T = I$ (an accumulation of reflectors is an orthogonal matrix).

As part of the reduction to condensed form operations, given a vector $x$ we will wish to find a Householder transformation, $H(u)$, such that $H(u)x$ equals a vector with zeroes below the first element: $H(u)x = \pm\|x\|_2 e_0$, where $e_0$ equals the first column of the identity matrix. It can be easily checked that choosing $u = x \mp \|x\|_2 e_0$ yields the desired $H(u)$. Notice that any nonzero scaling of $u$ has the same property, and the convention is to scale $u$ so that the first element equals one. Let us define $[u, \tau, h] = \text{HouseV}(x)$ to be the function that returns $u$ with first element equal to one, $\tau = \frac{1}{2}u^Tu$, and $h = H(u)x$.
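The notes implement HouseV in MATLAB in Chapter 6; purely as an illustration, a NumPy sketch (choosing the sign of $\|x\|_2 e_0$ so as to avoid cancellation, one common convention) might look like:

```python
import numpy as np

def house_v(x):
    """[u, tau, h] := HouseV(x): compute u with first element 1 and
    tau = u'u/2 so that h = (I - u u'/tau) x has zeroes below its
    first element, i.e. h = -+ ||x||_2 e_0."""
    x = np.asarray(x, dtype=float)
    nx = np.linalg.norm(x)
    u = x.copy()
    # u = x -+ ||x||_2 e_0, sign chosen to avoid cancellation.
    u[0] += nx if x[0] >= 0 else -nx
    u /= u[0]                      # scale so the first element equals one
    tau = (u @ u) / 2
    h = x - u * ((u @ x) / tau)    # h = H(u) x
    return u, tau, h

u, tau, h = house_v(np.array([3.0, 4.0, 0.0, 12.0]))
```

With this sign convention `h[0]` equals $-\|x\|_2 = -13$ and the entries below it are (numerically) zero; $H = I - uu^T/\tau$ is symmetric and orthogonal.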

16.4.2 Algorithm

The first step towards computing the eigenvalue decomposition of a symmetric matrix is to reduce the matrix to tridiagonal form.

The basic algorithm for reducing a symmetric matrix to tridiagonal form, overwriting the original matrix with the result, can be explained as follows. We assume that the symmetric matrix $A$ is stored only in the lower triangular part of the matrix and that only the diagonal and subdiagonal of the symmetric tridiagonal matrix are computed, overwriting those parts of $A$. Finally, the Householder vectors used to zero out parts of $A$ overwrite the entries that they annihilate (set to zero).

• Partition $A \rightarrow \begin{pmatrix} \alpha_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix}$.

• Let $[u_{21}, \tau, a_{21}] := \text{HouseV}(a_{21})$.¹

• Update
\[
\begin{pmatrix} \alpha_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} 1 & 0 \\ 0 & H \end{pmatrix}
\begin{pmatrix} \alpha_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & H \end{pmatrix}
= \begin{pmatrix} \alpha_{11} & a_{21}^T H \\ H a_{21} & H A_{22} H \end{pmatrix}
\]
where $H = H(u_{21})$. Note that $a_{21} := Ha_{21}$ need not be executed since this update was performed by the instance of HouseV above.² Also, $a_{12}^T$ is neither stored nor updated due to symmetry. Finally, only the lower triangular part of $HA_{22}H$ is computed, overwriting $A_{22}$. The update of $A_{22}$ warrants closer scrutiny:
\[
\begin{aligned}
A_{22} &:= \left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right) A_{22} \left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right) \\
&= \left(A_{22} - \tfrac{1}{\tau}u_{21}\underbrace{u_{21}^TA_{22}}_{y_{21}^T}\right)\left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right) \\
&= A_{22} - \tfrac{1}{\tau}u_{21}y_{21}^T - \tfrac{1}{\tau}\underbrace{A_{22}u_{21}}_{y_{21}}u_{21}^T + \tfrac{1}{\tau^2}u_{21}\underbrace{y_{21}^Tu_{21}}_{2\beta}u_{21}^T \\
&= A_{22} - u_{21}\left(\tfrac{1}{\tau}y_{21}^T - \tfrac{\beta}{\tau^2}u_{21}^T\right) - \left(\tfrac{1}{\tau}y_{21} - \tfrac{\beta}{\tau^2}u_{21}\right)u_{21}^T \\
&= A_{22} - u_{21}\underbrace{\tfrac{1}{\tau}\left(y_{21} - \tfrac{\beta}{\tau}u_{21}\right)^T}_{w_{21}^T} - \underbrace{\tfrac{1}{\tau}\left(y_{21} - \tfrac{\beta}{\tau}u_{21}\right)}_{w_{21}}u_{21}^T \\
&= \underbrace{A_{22} - u_{21}w_{21}^T - w_{21}u_{21}^T}_{\text{symmetric rank-2 update}}.
\end{aligned}
\]

¹ Note that the semantics here indicate that $a_{21}$ is overwritten by $Ha_{21}$.
² In practice, the zeros below the first element of $Ha_{21}$ are not actually written. Instead, the implementation overwrites these elements with the corresponding elements of the vector $u_{21}$.

Algorithm: $[A, t] := \text{TriRed\_unb}(A)$

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$ where $A_{TL}$ is $0\times 0$ and $t_T$ has 0 elements

while $m(A_{TL}) < m(A)$ do

  Repartition
  \[
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow
  \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}
  \]
  where $\alpha_{11}$ and $\tau_1$ are scalars

  $[u_{21}, \tau_1, a_{21}] := \text{HouseV}(a_{21})$
  $y_{21} := A_{22}u_{21}$
  $\beta := u_{21}^Ty_{21}/2$
  $w_{21} := (y_{21} - \beta u_{21}/\tau_1)/\tau_1$
  $A_{22} := A_{22} - \text{tril}\left(u_{21}w_{21}^T + w_{21}u_{21}^T\right)$ (symmetric rank-2 update)

  Continue with
  \[
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} t_T \\ t_B \end{pmatrix} \leftarrow
  \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}
  \]

endwhile

Figure 16.5: Basic algorithm for reduction of a symmetric matrix to tridiagonal form.
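As an illustration, the loop of Figure 16.5 can be sketched in NumPy. This is a dense sketch, not the packed-storage implementation the notes describe: it updates the full matrix rather than only its lower triangle, and it accumulates $Q$ explicitly.

```python
import numpy as np

def house_v(x):
    # Householder vector with first element scaled to one (Section 16.4.1).
    u = np.array(x, dtype=float)
    nx = np.linalg.norm(u)
    u[0] += nx if u[0] >= 0 else -nx
    u /= u[0]
    return u, (u @ u) / 2

def tri_red(A):
    """Reduce symmetric A to tridiagonal T = Q^T A Q via Householder
    similarity transformations, using the symmetric rank-2 update
    A22 := A22 - u21 w21^T - w21 u21^T of Figure 16.5."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        u, tau = house_v(T[k + 1:, k])
        # a21 := H a21 (only its first entry remains nonzero).
        a_new = T[k + 1:, k] - u * ((u @ T[k + 1:, k]) / tau)
        T[k + 1:, k] = a_new
        T[k, k + 1:] = a_new                    # symmetry
        y = T[k + 1:, k + 1:] @ u               # y21 := A22 u21
        beta = (u @ y) / 2                      # beta := u21^T y21 / 2
        w = (y - (beta / tau) * u) / tau        # w21
        T[k + 1:, k + 1:] -= np.outer(u, w) + np.outer(w, u)
        Q[:, k + 1:] -= np.outer((Q[:, k + 1:] @ u) / tau, u)
    return T, Q

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
T, Q = tri_red(A)
```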



[Figure: four panels labeled "Original matrix", "First iteration", "Second iteration", and "Third iteration", showing the sparsity pattern after each step.]

Figure 16.6: Illustration of reduction of a symmetric matrix to tridiagonal form. The $\times$s denote nonzero elements in the matrix. The gray entries above the diagonal are not actually updated.
Continue this process with the updated A22 .
This is captured in the algorithm in Figure 16.5. It is also illustrated in Figure 16.6.
The total cost for reducing $A \in \mathbb{R}^{n\times n}$ is approximately
\[
\sum_{k=0}^{n-1} 4(n-k-1)^2 \text{ flops} \approx \frac{4}{3}n^3 \text{ flops}.
\]
This equals, approximately, the cost of one QR factorization of matrix $A$.

16.5 The QR algorithm with a Tridiagonal Matrix

We are now ready to describe an algorithm for the QR algorithm with a tridiagonal matrix.

16.5.1 Givens rotations

First, we introduce another important class of unitary matrices known as Givens rotations. Given a vector
\[
x = \begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix} \in \mathbb{R}^2,
\]
there exists an orthogonal matrix $G$ such that $G^Tx = \begin{pmatrix} \|x\|_2 \\ 0 \end{pmatrix}$. The Householder transformation is one example of such a matrix $G$. An alternative is the Givens rotation:
\[
G = \begin{pmatrix} \gamma & -\sigma \\ \sigma & \gamma \end{pmatrix}
\]
where $\gamma^2 + \sigma^2 = 1$. (Notice that $\gamma$ and $\sigma$ can be thought of as the cosine and sine of an angle.) Then
\[
G^TG = \begin{pmatrix} \gamma & \sigma \\ -\sigma & \gamma \end{pmatrix}\begin{pmatrix} \gamma & -\sigma \\ \sigma & \gamma \end{pmatrix}
= \begin{pmatrix} \gamma^2+\sigma^2 & -\gamma\sigma+\gamma\sigma \\ -\gamma\sigma+\gamma\sigma & \gamma^2+\sigma^2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
which means that a Givens rotation is a unitary matrix.

Now, if $\gamma = \chi_1/\|x\|_2$ and $\sigma = \chi_2/\|x\|_2$, then $\gamma^2 + \sigma^2 = (\chi_1^2 + \chi_2^2)/\|x\|_2^2 = 1$ and
\[
\begin{pmatrix} \gamma & \sigma \\ -\sigma & \gamma \end{pmatrix}\begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix}
= \begin{pmatrix} (\chi_1^2+\chi_2^2)/\|x\|_2 \\ (-\chi_2\chi_1 + \chi_1\chi_2)/\|x\|_2 \end{pmatrix}
= \begin{pmatrix} \|x\|_2 \\ 0 \end{pmatrix}.
\]
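For illustration, computing $\gamma$ and $\sigma$ in NumPy (a minimal sketch; robust implementations guard against overflow, as LAPACK's `dlartg` does):

```python
import numpy as np

def givens(chi1, chi2):
    """Return (gamma, sigma) with gamma^2 + sigma^2 = 1 such that
    G^T [chi1; chi2] = [||x||_2; 0] for G = [gamma -sigma; sigma gamma]."""
    nx = np.hypot(chi1, chi2)
    if nx == 0.0:
        return 1.0, 0.0
    return chi1 / nx, chi2 / nx

gamma, sigma = givens(3.0, 4.0)
G = np.array([[gamma, -sigma],
              [sigma,  gamma]])
```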

16.6 QR Factorization of a Tridiagonal Matrix

Now, consider the $4\times 4$ tridiagonal matrix
\[
\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & 0 & 0 \\
\alpha_{1,0} & \alpha_{1,1} & \alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}.
\]
From $\begin{pmatrix} \alpha_{0,0} \\ \alpha_{1,0} \end{pmatrix}$ one can compute $\gamma_{1,0}$ and $\sigma_{1,0}$ so that
\[
\begin{pmatrix} \gamma_{1,0} & \sigma_{1,0} \\ -\sigma_{1,0} & \gamma_{1,0} \end{pmatrix}
\begin{pmatrix} \alpha_{0,0} \\ \alpha_{1,0} \end{pmatrix} =
\begin{pmatrix} \hat\alpha_{0,0} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat\alpha_{1,1} & \hat\alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
\gamma_{1,0} & \sigma_{1,0} & 0 & 0 \\
-\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & 0 & 0 \\
\alpha_{1,0} & \alpha_{1,1} & \alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}.
\]
Next, from $\begin{pmatrix} \hat\alpha_{1,1} \\ \alpha_{2,1} \end{pmatrix}$ one can compute $\gamma_{2,1}$ and $\sigma_{2,1}$ so that
\[
\begin{pmatrix} \gamma_{2,1} & \sigma_{2,1} \\ -\sigma_{2,1} & \gamma_{2,1} \end{pmatrix}
\begin{pmatrix} \hat\alpha_{1,1} \\ \alpha_{2,1} \end{pmatrix} =
\begin{pmatrix} \hat{\hat\alpha}_{1,1} \\ 0 \end{pmatrix}.
\]


Then
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat{\hat\alpha}_{1,1} & \hat{\hat\alpha}_{1,2} & \hat{\hat\alpha}_{1,3} \\
0 & 0 & \hat\alpha_{2,2} & \hat\alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \gamma_{2,1} & \sigma_{2,1} & 0 \\
0 & -\sigma_{2,1} & \gamma_{2,1} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat\alpha_{1,1} & \hat\alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}.
\]
Finally, from $\begin{pmatrix} \hat\alpha_{2,2} \\ \alpha_{3,2} \end{pmatrix}$ one can compute $\gamma_{3,2}$ and $\sigma_{3,2}$ so that
\[
\begin{pmatrix} \gamma_{3,2} & \sigma_{3,2} \\ -\sigma_{3,2} & \gamma_{3,2} \end{pmatrix}
\begin{pmatrix} \hat\alpha_{2,2} \\ \alpha_{3,2} \end{pmatrix} =
\begin{pmatrix} \hat{\hat\alpha}_{2,2} \\ 0 \end{pmatrix}.
\]

Then
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat{\hat\alpha}_{1,1} & \hat{\hat\alpha}_{1,2} & \hat{\hat\alpha}_{1,3} \\
0 & 0 & \hat{\hat\alpha}_{2,2} & \hat{\hat\alpha}_{2,3} \\
0 & 0 & 0 & \hat\alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \gamma_{3,2} & \sigma_{3,2} \\
0 & 0 & -\sigma_{3,2} & \gamma_{3,2}
\end{pmatrix}
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat{\hat\alpha}_{1,1} & \hat{\hat\alpha}_{1,2} & \hat{\hat\alpha}_{1,3} \\
0 & 0 & \hat\alpha_{2,2} & \hat\alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}.
\]

The matrix $Q$ is the orthogonal matrix that results from multiplying the different Givens rotations together:
\[
Q = \begin{pmatrix}
\gamma_{1,0} & -\sigma_{1,0} & 0 & 0 \\
\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \gamma_{2,1} & -\sigma_{2,1} & 0 \\
0 & \sigma_{2,1} & \gamma_{2,1} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \gamma_{3,2} & -\sigma_{3,2} \\
0 & 0 & \sigma_{3,2} & \gamma_{3,2}
\end{pmatrix}. \quad (16.2)
\]
However, it is typically not explicitly formed.


The next question is how to compute $RQ$ given the QR factorization of the tridiagonal matrix. Postmultiplying the upper triangular factor by the Givens rotations, one rotation at a time, the first product
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{0,1} & \hat\alpha_{0,2} & 0 \\
0 & \hat{\hat\alpha}_{1,1} & \hat{\hat\alpha}_{1,2} & \hat{\hat\alpha}_{1,3} \\
0 & 0 & \hat{\hat\alpha}_{2,2} & \hat{\hat\alpha}_{2,3} \\
0 & 0 & 0 & \hat\alpha_{3,3}
\end{pmatrix}
\begin{pmatrix}
\gamma_{1,0} & -\sigma_{1,0} & 0 & 0 \\
\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
combines the first two columns, introducing the entry $\tau_{1,0}$ below the diagonal. Applying the second rotation then combines the next pair of columns, introducing $\tau_{2,1}$, and the third rotation introduces $\tau_{3,2}$, so that
\[
RQ = \begin{pmatrix}
\tau_{0,0} & \tau_{0,1} & \tau_{0,2} & 0 \\
\tau_{1,0} & \tau_{1,1} & \tau_{1,2} & \tau_{1,3} \\
0 & \tau_{2,1} & \tau_{2,2} & \tau_{2,3} \\
0 & 0 & \tau_{3,2} & \tau_{3,3}
\end{pmatrix}.
\]
A symmetry argument can be used to motivate that $\tau_{0,2} = \tau_{1,3} = 0$.
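To make the factor-then-multiply process concrete, here is a NumPy sketch of one explicit shifted QR step on a symmetric tridiagonal matrix (an illustration only; the test matrix is made up). As the symmetry argument predicts, the result $RQ + \mu I$ is again tridiagonal up to roundoff:

```python
import numpy as np

def givens(chi1, chi2):
    nx = np.hypot(chi1, chi2)
    return (1.0, 0.0) if nx == 0.0 else (chi1 / nx, chi2 / nx)

def tridiag_qr_step(T, mu):
    """One explicit shifted QR step on symmetric tridiagonal T:
    factor T - mu I = Q R with n-1 Givens rotations, then form
    R Q + mu I, which is similar to T and again tridiagonal."""
    n = T.shape[0]
    R = T - mu * np.eye(n)
    rotations = []
    for i in range(n - 1):                 # annihilate the subdiagonal
        gamma, sigma = givens(R[i, i], R[i + 1, i])
        G = np.array([[gamma, -sigma], [sigma, gamma]])
        R[i:i + 2, :] = G.T @ R[i:i + 2, :]
        rotations.append((i, G))
    for i, G in rotations:                 # form R Q, one rotation at a time
        R[:, i:i + 2] = R[:, i:i + 2] @ G
    return R + mu * np.eye(n)

T = (np.diag([4.0, 3.0, 2.0, 1.0])
     + np.diag([1.0, 1.0, 1.0], 1) + np.diag([1.0, 1.0, 1.0], -1))
T1 = tridiag_qr_step(T, T[-1, -1])
```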

16.7 The Implicitly Shifted QR Algorithm

16.7.1 Upper Hessenberg and tridiagonal matrices

Definition 16.7 A matrix is said to be upper Hessenberg if all entries below its first subdiagonal equal zero.

In other words, if matrix $A \in \mathbb{R}^{n\times n}$ is upper Hessenberg, it looks like
\[
A = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \alpha_{0,2} & \cdots & \alpha_{0,n-2} & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,n-2} & \alpha_{1,n-1} \\
0 & \alpha_{2,1} & \alpha_{2,2} & \cdots & \alpha_{2,n-2} & \alpha_{2,n-1} \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & \alpha_{n-2,n-3} & \alpha_{n-2,n-2} & \alpha_{n-2,n-1} \\
0 & \cdots & 0 & 0 & \alpha_{n-1,n-2} & \alpha_{n-1,n-1}
\end{pmatrix}.
\]
Obviously, a tridiagonal matrix is a special case of an upper Hessenberg matrix.

16.7.2 The Implicit Q Theorem

The following theorem sets up one of the most remarkable algorithms in numerical linear algebra, which
allows us to greatly simplify the implementation of the shifted QR algorithm when A is tridiagonal.
Theorem 16.8 (Implicit Q Theorem) Let A, B Rnn where B is upper Hessenberg and has only positive
elements on its first subdiagonal and assume there exists a unitary matrix Q such that QT AQ = B. Then Q
and B are uniquely determined by A and the first column of Q.
Proof: Partition
\[
Q = \begin{pmatrix} q_0 & q_1 & q_2 & \cdots & q_{n-2} & q_{n-1} \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix}
\beta_{0,0} & \beta_{0,1} & \beta_{0,2} & \cdots & \beta_{0,n-2} & \beta_{0,n-1} \\
\beta_{1,0} & \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,n-2} & \beta_{1,n-1} \\
0 & \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,n-2} & \beta_{2,n-1} \\
0 & 0 & \beta_{3,2} & \cdots & \beta_{3,n-2} & \beta_{3,n-1} \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & \beta_{n-1,n-2} & \beta_{n-1,n-1}
\end{pmatrix}.
\]
Notice that $AQ = QB$ and hence
\[
A\begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix} = \begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix} B.
\]
Equating the first column on the left and right, we notice that
\[
Aq_0 = \beta_{0,0}q_0 + \beta_{1,0}q_1.
\]
Now, $q_0$ is given and $\|q_0\|_2 = 1$ since $Q$ is unitary. Hence
\[
q_0^TAq_0 = \beta_{0,0}q_0^Tq_0 + \beta_{1,0}q_0^Tq_1 = \beta_{0,0}.
\]
Next,
\[
\beta_{1,0}q_1 = Aq_0 - \beta_{0,0}q_0 =: \hat q_1.
\]
Since $\|q_1\|_2 = 1$ (it is a column of a unitary matrix) and $\beta_{1,0}$ is assumed to be positive, we know that
\[
\beta_{1,0} = \|\hat q_1\|_2.
\]
Finally,
\[
q_1 = \hat q_1/\beta_{1,0}.
\]
The point is that the first column of $B$ and second column of $Q$ are prescribed by the first column of $Q$ and the fact that $B$ has positive elements on the first subdiagonal. In this way, each column of $Q$ and each column of $B$ can be determined, one by one.


Homework 16.9 Give all the details of the above proof.


* SEE ANSWER
Notice the similarity between the above proof and the proof of the existence and uniqueness of the QR
factorization!
To take advantage of the special structure of A being symmetric, the theorem can be expanded to
Theorem 16.10 (Implicit Q Theorem) Let A, B Rnn where B is upper Hessenberg and has only positive elements on its first subdiagonal and assume there exists a unitary matrix Q such that QT AQ = B.
Then Q and B are uniquely determined by A and the first column of Q. If A is symmetric, then B is also
symmetric and hence tridiagonal.

16.7.3 The Francis QR Step

The Francis QR Step combines the steps $A^{(k)} - \mu_k I \rightarrow Q^{(k+1)}R^{(k+1)}$ and $A^{(k+1)} := R^{(k+1)}Q^{(k+1)} + \mu_k I$ into a single step.
Now, consider the $4\times 4$ tridiagonal matrix
\[
\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & 0 & 0 \\
\alpha_{1,0} & \alpha_{1,1} & \alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix} - \mu I.
\]
The first Givens rotation is computed from $\begin{pmatrix} \alpha_{0,0} - \mu \\ \alpha_{1,0} \end{pmatrix}$, yielding $\gamma_{1,0}$ and $\sigma_{1,0}$ so that
\[
\begin{pmatrix} \gamma_{1,0} & \sigma_{1,0} \\ -\sigma_{1,0} & \gamma_{1,0} \end{pmatrix}
\begin{pmatrix} \alpha_{0,0} - \mu \\ \alpha_{1,0} \end{pmatrix}
\]
has a zero second entry. Now, to preserve eigenvalues, any orthogonal matrix that is applied from the left must also have its transpose applied from the right. Let us compute

\[
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{1,0} & \hat\alpha_{2,0} & 0 \\
\hat\alpha_{1,0} & \hat\alpha_{1,1} & \hat\alpha_{2,1} & 0 \\
\hat\alpha_{2,0} & \hat\alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
\gamma_{1,0} & \sigma_{1,0} & 0 & 0 \\
-\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & 0 & 0 \\
\alpha_{1,0} & \alpha_{1,1} & \alpha_{1,2} & 0 \\
0 & \alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}
\begin{pmatrix}
\gamma_{1,0} & -\sigma_{1,0} & 0 & 0 \\
\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\]
which introduces a nonzero entry, the "bulge," at position $(2,0)$ (and, by symmetry, at $(0,2)$). Next, from $\begin{pmatrix} \hat\alpha_{1,0} \\ \hat\alpha_{2,0} \end{pmatrix}$ one can compute $\gamma_{2,0}$ and $\sigma_{2,0}$ so that
\[
\begin{pmatrix} \gamma_{2,0} & \sigma_{2,0} \\ -\sigma_{2,0} & \gamma_{2,0} \end{pmatrix}
\begin{pmatrix} \hat\alpha_{1,0} \\ \hat\alpha_{2,0} \end{pmatrix} =
\begin{pmatrix} \check\alpha_{1,0} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \check\alpha_{1,0} & 0 & 0 \\
\check\alpha_{1,0} & \check\alpha_{1,1} & \check\alpha_{2,1} & \hat\alpha_{3,1} \\
0 & \check\alpha_{2,1} & \check\alpha_{2,2} & \hat\alpha_{3,2} \\
0 & \hat\alpha_{3,1} & \hat\alpha_{3,2} & \alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \gamma_{2,0} & \sigma_{2,0} & 0 \\
0 & -\sigma_{2,0} & \gamma_{2,0} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\hat\alpha_{0,0} & \hat\alpha_{1,0} & \hat\alpha_{2,0} & 0 \\
\hat\alpha_{1,0} & \hat\alpha_{1,1} & \hat\alpha_{2,1} & 0 \\
\hat\alpha_{2,0} & \hat\alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
0 & 0 & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \gamma_{2,0} & -\sigma_{2,0} & 0 \\
0 & \sigma_{2,0} & \gamma_{2,0} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\]
which chases the bulge from position $(2,0)$ to position $(3,1)$ and


From: Gene H Golub <golub@stanford.edu>


Date: Sun, 19 Aug 2007 13:54:47 -0700 (PDT)
Subject: John Francis, Co-Inventor of QR
Dear Colleagues,
For many years, I have been interested in meeting J G F Francis, one of
the co-inventors of the QR algorithm for computing eigenvalues of general
matrices. Through a lead provided by the late Erin Brent and with the aid
of Google, I finally made contact with him.
John Francis was born in 1934 in London and currently lives in Hove, near
Brighton. His residence is about a quarter mile from the sea; he is a
widower. In 1954, he worked at the National Research Development Corp
(NRDC) and attended some lectures given by Christopher Strachey.
In 1955,56 he was a student at Cambridge but did not complete a degree.
He then went back to NRDC as an assistant to Strachey where he got
involved in flutter computations and this led to his work on QR.
After leaving NRDC in 1961, he worked at the Ferranti Corp and then at the
University of Sussex. Subsequently, he had positions with various
industrial organizations and consultancies. He is now retired. His
interests were quite general and included Artificial Intelligence,
computer languages, systems engineering. He has not returned to numerical
computation.
He was surprised to learn there are many references to his work and
that the QR method is considered one of the ten most important
algorithms of the 20th century. He was unaware of such developments as
TeX and Math Lab. Currently he is working on a degree at the Open
University.
John Francis did remarkable work and we are all in his debt. Along with
the conjugate gradient method, it provided us with one of the basic tools
of numerical analysis.
Gene Golub
Figure 16.7: Posting by the late Gene Golub in NA Digest Sunday, August 19, 2007 Volume 07 : Issue
34. An article on the ten most important algorithms of the 20th century, published in SIAM News, can be
found at http://www.uta.edu/faculty/rcli/TopTen/topten.pdf.

Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers

again preserves eigenvalues. Finally, from $\begin{pmatrix} \check\alpha_{2,1} \\ \hat\alpha_{3,1} \end{pmatrix}$ one can compute $\gamma_{3,1}$ and $\sigma_{3,1}$ so that
\[
\begin{pmatrix} \gamma_{3,1} & \sigma_{3,1} \\ -\sigma_{3,1} & \gamma_{3,1} \end{pmatrix}
\begin{pmatrix} \check\alpha_{2,1} \\ \hat\alpha_{3,1} \end{pmatrix} =
\begin{pmatrix} \check{\check\alpha}_{2,1} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\hat\alpha_{0,0} & \check\alpha_{1,0} & 0 & 0 \\
\check\alpha_{1,0} & \check\alpha_{1,1} & \check{\check\alpha}_{2,1} & 0 \\
0 & \check{\check\alpha}_{2,1} & \check{\check\alpha}_{2,2} & \check\alpha_{3,2} \\
0 & 0 & \check\alpha_{3,2} & \check\alpha_{3,3}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \gamma_{3,1} & \sigma_{3,1} \\
0 & 0 & -\sigma_{3,1} & \gamma_{3,1}
\end{pmatrix}
\begin{pmatrix}
\hat\alpha_{0,0} & \check\alpha_{1,0} & 0 & 0 \\
\check\alpha_{1,0} & \check\alpha_{1,1} & \check\alpha_{2,1} & \hat\alpha_{3,1} \\
0 & \check\alpha_{2,1} & \check\alpha_{2,2} & \hat\alpha_{3,2} \\
0 & \hat\alpha_{3,1} & \hat\alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \gamma_{3,1} & -\sigma_{3,1} \\
0 & 0 & \sigma_{3,1} & \gamma_{3,1}
\end{pmatrix},
\]
and the matrix is again tridiagonal.

The matrix $Q$ is the orthogonal matrix that results from multiplying the different Givens rotations together:
\[
Q = \begin{pmatrix}
\gamma_{1,0} & -\sigma_{1,0} & 0 & 0 \\
\sigma_{1,0} & \gamma_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \gamma_{2,0} & -\sigma_{2,0} & 0 \\
0 & \sigma_{2,0} & \gamma_{2,0} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \gamma_{3,1} & -\sigma_{3,1} \\
0 & 0 & \sigma_{3,1} & \gamma_{3,1}
\end{pmatrix}.
\]
It is important to note that the first column of $Q$ is given by
\[
\begin{pmatrix} \gamma_{1,0} \\ \sigma_{1,0} \\ 0 \\ 0 \end{pmatrix},
\]
which is exactly the same first column had $Q$ been computed as in Section 16.6 (Equation 16.2). Thus, by the Implicit Q Theorem, the tridiagonal matrix that results from this approach is equal to the tridiagonal matrix that would be computed by applying the QR factorization from Section 16.6 to $A - \mu I$, $A - \mu I \rightarrow QR$, followed by the formation of $RQ + \mu I$ using the algorithm for computing $RQ$ in Section 16.6.
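As an illustration, the Francis step can be sketched in NumPy on a dense-stored tridiagonal matrix (production codes operate only on the diagonal and subdiagonal vectors). Only the first rotation involves the shift; each subsequent rotation chases the bulge one position down:

```python
import numpy as np

def givens(chi1, chi2):
    nx = np.hypot(chi1, chi2)
    return (1.0, 0.0) if nx == 0.0 else (chi1 / nx, chi2 / nx)

def francis_step(T, mu):
    """One implicitly shifted (Francis) QR step on symmetric tridiagonal T,
    returning the updated matrix and the accumulated orthogonal Q."""
    n = T.shape[0]
    T = np.array(T, dtype=float)
    Q = np.eye(n)
    # First rotation: from the first column of the shifted matrix.
    gamma, sigma = givens(T[0, 0] - mu, T[1, 0])
    for i in range(n - 1):
        G = np.array([[gamma, -sigma], [sigma, gamma]])
        T[i:i + 2, :] = G.T @ T[i:i + 2, :]   # apply G^T from the left ...
        T[:, i:i + 2] = T[:, i:i + 2] @ G     # ... and G from the right
        Q[:, i:i + 2] = Q[:, i:i + 2] @ G
        if i < n - 2:
            # Next rotation zeroes the bulge that appeared at T[i+2, i].
            gamma, sigma = givens(T[i + 1, i], T[i + 2, i])
    return T, Q

T0 = (np.diag([4.0, 3.0, 2.0, 1.0])
      + np.diag([1.0, 1.0, 1.0], 1) + np.diag([1.0, 1.0, 1.0], -1))
T1, Q = francis_step(T0, T0[-1, -1])
```

Consistent with the Implicit Q Theorem, the first column of `Q` is determined by the shifted first column of `T0` (its entries below the second are exactly zero), and `T1 = Q.T @ T0 @ Q` is again tridiagonal.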
The successive elimination of elements $\hat\alpha_{i+1,i}$ is often referred to as chasing the bulge, while the entire process that introduces the bulge and then chases it is known as a Francis Implicit QR Step. Obviously, the method generalizes to matrices of arbitrary size, as illustrated in Figure 16.8. An algorithm for the chasing of the bulge is given in Figure 16.9. (Note that in those figures $T$ is used for $A$, something that needs to be made consistent in these notes, eventually.) In practice, the tridiagonal matrix is not stored as a matrix. Instead, its diagonal and subdiagonal are stored as vectors.

16.7.4 A complete algorithm

This last section shows how one iteration of the QR algorithm can be performed on a tridiagonal matrix by implicitly shifting and then chasing the bulge. All that is left to complete the algorithm is to note that:

• The shift $\mu_k$ can be chosen to equal $\alpha_{n-1,n-1}$ (the last element on the diagonal, which tends to converge to the eigenvalue smallest in magnitude). In practice, choosing the shift to be an eigenvalue of the bottom-right $2\times 2$ matrix works better. This is known as the Wilkinson Shift.


[Figure: one step of the bulge chase, showing the partitioned matrix at the beginning of the iteration, after the repartitioning, after the update, and at the end of the iteration. The $+$ symbols mark the bulge before and after it is chased one position down the diagonal.]

Figure 16.8: One step of chasing the bulge in the implicitly shifted symmetric QR algorithm.
• If an element of the subdiagonal (and corresponding element on the superdiagonal) becomes small enough, it can be considered to be zero and the problem deflates (decouples) into two smaller tridiagonal matrices. "Small" is often taken to mean that $|\alpha_{i+1,i}| \leq \varepsilon(|\alpha_{i,i}| + |\alpha_{i+1,i+1}|)$, where $\varepsilon$ is some quantity close to the machine epsilon (unit roundoff).

• If $A = Q_T T Q_T^T$ reduced $A$ to the tridiagonal matrix $T$ before the QR algorithm commenced, then the Givens rotations encountered as part of the implicitly shifted QR algorithm can be applied from the

Algorithm: $T := \text{ChaseBulge}(T)$

Partition $T \rightarrow \begin{pmatrix} T_{TL} & \star & \star \\ T_{ML} & T_{MM} & \star \\ 0 & T_{BM} & T_{BR} \end{pmatrix}$ where $T_{TL}$ is $0\times 0$ and $T_{MM}$ is $3\times 3$

while $m(T_{BR}) > 0$ do

  Repartition
  \[
  \begin{pmatrix} T_{TL} & \star & \star \\ T_{ML} & T_{MM} & \star \\ 0 & T_{BM} & T_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix}
  T_{00} & \star & & & \\
  t_{10}^T & \tau_{11} & \star & & \\
  & t_{21} & T_{22} & \star & \\
  & & t_{32}^T & \tau_{33} & \star \\
  & & & t_{43} & T_{44}
  \end{pmatrix}
  \]
  where $\tau_{11}$ and $\tau_{33}$ are scalars (during the final step, $\tau_{33}$ is $0\times 0$)

  Compute $(\gamma, \sigma)$ s.t. $G_{\gamma,\sigma}^T t_{21} = \begin{pmatrix} \tau_{21} \\ 0 \end{pmatrix}$, and assign $t_{21} := \begin{pmatrix} \tau_{21} \\ 0 \end{pmatrix}$
  $T_{22} := G_{\gamma,\sigma}^T T_{22} G_{\gamma,\sigma}$
  $t_{32}^T := t_{32}^T G_{\gamma,\sigma}$ (not performed during final step)

  Continue with
  \[
  \begin{pmatrix} T_{TL} & \star & \star \\ T_{ML} & T_{MM} & \star \\ 0 & T_{BM} & T_{BR} \end{pmatrix} \leftarrow
  \begin{pmatrix}
  T_{00} & \star & & & \\
  t_{10}^T & \tau_{11} & \star & & \\
  & t_{21} & T_{22} & \star & \\
  & & t_{32}^T & \tau_{33} & \star \\
  & & & t_{43} & T_{44}
  \end{pmatrix}
  \]

endwhile

Figure 16.9: Chasing the bulge.


right to the appropriate columns of Q so that upon completion Q is overwritten with the eigenvectors
of A. Notice that applying a Givens rotation to a pair of columns of Q requires O(n) computation
per Givens rotation. For each Francis implicit QR step O(n) Givens rotations are computed, making the application of Givens rotations to Q of cost O( n2 ) per iteration of the implicitly shifted QR
algorithm. Typically a few (2-3) iterations are needed per eigenvalue that is uncovered (by deflation) meaning that O(n) iterations are needed. Thus, the QR algorithm is roughly of cost O(n3 )
if the eigenvalues are accumulated (in addition to the cost of forming the Q from the reduction to
tridiagonal form, which takes another O(n3 ) operations.)


• If an element on the subdiagonal becomes zero (or very small), and hence the corresponding element of the superdiagonal, then the problem can be deflated: If
\[
T = \begin{pmatrix} T_{00} & 0 \\ 0 & T_{11} \end{pmatrix},
\]
then

  - The computation can continue separately with $T_{00}$ and $T_{11}$.
  - One can pick the shift from the bottom-right of $T_{00}$ as one continues finding the eigenvalues of $T_{00}$, thus accelerating the computation.
  - One can pick the shift from the bottom-right of $T_{11}$ as one continues finding the eigenvalues of $T_{11}$, thus accelerating the computation.
  - One must continue to accumulate the eigenvectors by applying the rotations to the appropriate columns of $Q$.
  - Because of the connection between the QR algorithm and the Inverse Power Method, subdiagonal entries near the bottom-right of $T$ are more likely to converge to zero, so most deflation will happen there.

• A question becomes when an element on the subdiagonal, $\tau_{i+1,i}$, can be considered to be zero. The answer is when $|\tau_{i+1,i}|$ is small relative to $|\tau_{i,i}|$ and $|\tau_{i+1,i+1}|$. A typical condition that is used is
\[
|\tau_{i+1,i}| \leq u\,\sqrt{|\tau_{i,i}\,\tau_{i+1,i+1}|},
\]
where $u$ is the unit roundoff.
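For illustration, this criterion can be checked per subdiagonal entry with a small hypothetical helper (the function name and test matrix are made up for the example):

```python
import numpy as np

def deflation_points(T, tol=np.finfo(float).eps):
    """Indices i at which tridiagonal T may be deflated, using the
    criterion |tau_{i+1,i}| <= tol * sqrt(|tau_{i,i} tau_{i+1,i+1}|)."""
    d = np.diag(T)
    e = np.diag(T, -1)
    return [i for i in range(len(e))
            if abs(e[i]) <= tol * np.sqrt(abs(d[i] * d[i + 1]))]

T = np.diag([2.0, 3.0, 4.0])
T[1, 0] = T[0, 1] = 1e-20      # negligible relative to the diagonal: deflates
T[2, 1] = T[1, 2] = 0.5        # not negligible
```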
• If $A \in \mathbb{C}^{n\times n}$ is Hermitian, then its Spectral Decomposition is also computed via the following steps, which mirror those for the real symmetric case:

  - Reduce to tridiagonal form. Householder-transformation-based similarity transformations can again be used for this. This leaves one with a tridiagonal matrix, $T$, with real values along the diagonal (because the matrix is Hermitian) and values on the subdiagonal and superdiagonal that may be complex valued.
  - The matrix $Q_T$ such that $A = Q_T T Q_T^H$ can then be formed from the Householder transformations.
  - A simple step can be used to then change this tridiagonal form to have real values even on the subdiagonal and superdiagonal. The matrix $Q_T$ can be updated accordingly.
  - The tridiagonal QR algorithm that we described can then be used to diagonalize the matrix, accumulating the eigenvectors by applying the encountered Givens rotations to $Q_T$. This is where the real expense is: applying the Givens rotations to matrix $T$ requires $O(n)$ per sweep, while applying the Givens rotations to $Q_T$ requires $O(n^2)$ per sweep.

For details, see some of our papers mentioned in the next section.

16.8 Further Reading

16.8.1 More on reduction to tridiagonal form

The reduction to tridiagonal form can only be partially cast in terms of matrix-matrix multiplication [21]. This is a severe hindrance to high performance for that first step towards computing all eigenvalues and eigenvectors of a symmetric matrix. Worse, a considerable fraction of the total cost of the computation is in that first step.

For a detailed discussion on the blocked algorithm that uses FLAME notation, we recommend [45]:

Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS), Vol. 39, No. 1, 2012.

(Reduction to tridiagonal form is one case of what is more generally referred to as condensed form.)

16.8.2 Optimizing the tridiagonal QR algorithm

As the Givens rotations are applied to the tridiagonal matrix, they are also applied to a matrix in which the eigenvectors are accumulated. While one Francis Implicit QR Step requires $O(n)$ computation, this accumulation of the eigenvectors requires $O(n^2)$ computation with $O(n^2)$ data. We have learned before that this means the cost of accessing data dominates on current architectures.

In a recent paper, we showed how accumulating the Givens rotations for several Francis Steps allows one to attain performance similar to that attained by a matrix-matrix multiplication. Details can be found in [44]:

Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí.
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance.
ACM Transactions on Mathematical Software (TOMS), Vol. 40, No. 3, 2014.

16.9 Other Algorithms

16.9.1 Jacobi's method for the symmetric eigenvalue problem

(Not to be mistaken for the Jacobi iteration for solving linear systems.)

The oldest algorithm for computing the eigenvalues and eigenvectors of a matrix is due to Jacobi and dates back to 1846 [27]. This is a method that keeps resurfacing, since it parallelizes easily. The operation count tends to be higher (by a constant factor) than that of reduction to tridiagonal form followed by the tridiagonal QR algorithm.
The idea is as follows: Given a symmetric $2\times 2$ matrix
\[
A_{31} = \begin{pmatrix} \alpha_{11} & \alpha_{13} \\ \alpha_{31} & \alpha_{33} \end{pmatrix},
\quad \text{with } \alpha_{13} = \alpha_{31}.
\]


[Figure: two sweeps of the column-cyclic Jacobi algorithm applied to a $4\times 4$ symmetric matrix. In each sweep the off-diagonal elements $(1,0)$, $(2,0)$, $(3,0)$, $(2,1)$, $(3,1)$, and $(3,2)$ are zeroed in turn; previously introduced zeros may become nonzero again.]

Figure 16.10: Column-cyclic Jacobi algorithm.


There exists a rotation (which is of course unitary)
\[
J_{31} = \begin{pmatrix} \gamma & -\sigma \\ \sigma & \gamma \end{pmatrix}, \qquad \gamma^2 + \sigma^2 = 1,
\]
such that
\[
J_{31}^T A_{31} J_{31} =
\begin{pmatrix} \gamma & \sigma \\ -\sigma & \gamma \end{pmatrix}
\begin{pmatrix} \alpha_{11} & \alpha_{31} \\ \alpha_{31} & \alpha_{33} \end{pmatrix}
\begin{pmatrix} \gamma & -\sigma \\ \sigma & \gamma \end{pmatrix} =
\begin{pmatrix} \hat\alpha_{11} & 0 \\ 0 & \hat\alpha_{33} \end{pmatrix}.
\]
We know this exists since the Spectral Decomposition of the $2\times 2$ matrix exists. Such a rotation is called a Jacobi rotation. (Notice that it is different from a Givens rotation because it diagonalizes a $2\times 2$ matrix when used as a unitary similarity transformation. By contrast, a Givens rotation zeroes an element when applied from one side of a matrix.)

Homework 16.11 In the above discussion, show that $\alpha_{11}^2 + 2\alpha_{31}^2 + \alpha_{33}^2 = \hat\alpha_{11}^2 + \hat\alpha_{33}^2$.
* SEE ANSWER

Jacobi rotations can be used to selectively zero off-diagonal elements by observing the following. Embed the rotation in a blocked identity so that it acts on (block) rows and columns 1 and 3:
\[
J^T A J =
\begin{pmatrix}
I & 0 & 0 & 0 & 0 \\
0 & \gamma & 0 & \sigma & 0 \\
0 & 0 & I & 0 & 0 \\
0 & -\sigma & 0 & \gamma & 0 \\
0 & 0 & 0 & 0 & I
\end{pmatrix}^T
\begin{pmatrix}
A_{00} & a_{10} & A_{20}^T & a_{30} & A_{40}^T \\
a_{10}^T & \alpha_{11} & a_{21}^T & \alpha_{31} & a_{41}^T \\
A_{20} & a_{21} & A_{22} & a_{32} & A_{42}^T \\
a_{30}^T & \alpha_{31} & a_{32}^T & \alpha_{33} & a_{43}^T \\
A_{40} & a_{41} & A_{42} & a_{43} & A_{44}
\end{pmatrix}
\begin{pmatrix}
I & 0 & 0 & 0 & 0 \\
0 & \gamma & 0 & \sigma & 0 \\
0 & 0 & I & 0 & 0 \\
0 & -\sigma & 0 & \gamma & 0 \\
0 & 0 & 0 & 0 & I
\end{pmatrix}
= \begin{pmatrix}
A_{00} & \hat a_{10} & A_{20}^T & \hat a_{30} & A_{40}^T \\
\hat a_{10}^T & \hat\alpha_{11} & \hat a_{21}^T & 0 & \hat a_{41}^T \\
A_{20} & \hat a_{21} & A_{22} & \hat a_{32} & A_{42}^T \\
\hat a_{30}^T & 0 & \hat a_{32}^T & \hat\alpha_{33} & \hat a_{43}^T \\
A_{40} & \hat a_{41} & A_{42} & \hat a_{43} & A_{44}
\end{pmatrix}
= \hat A,
\]
where only (block) rows and columns 1 and 3 change:
\[
\begin{pmatrix} \hat a_{10}^T & \hat a_{21}^T & \hat a_{41}^T \\ \hat a_{30}^T & \hat a_{32}^T & \hat a_{43}^T \end{pmatrix}
= \begin{pmatrix} \gamma & \sigma \\ -\sigma & \gamma \end{pmatrix}
\begin{pmatrix} a_{10}^T & a_{21}^T & a_{41}^T \\ a_{30}^T & a_{32}^T & a_{43}^T \end{pmatrix}.
\]
Importantly,
\[
\begin{aligned}
a_{10}^Ta_{10} + a_{30}^Ta_{30} &= \hat a_{10}^T\hat a_{10} + \hat a_{30}^T\hat a_{30}, \\
a_{21}^Ta_{21} + a_{32}^Ta_{32} &= \hat a_{21}^T\hat a_{21} + \hat a_{32}^T\hat a_{32}, \\
a_{41}^Ta_{41} + a_{43}^Ta_{43} &= \hat a_{41}^T\hat a_{41} + \hat a_{43}^T\hat a_{43}.
\end{aligned}
\]


What this means is that if one defines off$(A)$ as the square of the Frobenius norm of the off-diagonal elements of $A$,
\[
\text{off}(A) = \|A\|_F^2 - \|\text{diag}(A)\|_F^2,
\]
then off$(\hat A) = \text{off}(A) - 2\alpha_{31}^2$.

• The good news: every time a Jacobi rotation is used to zero an off-diagonal element, off$(A)$ decreases by twice the square of that element.

• The bad news: a previously introduced zero may become nonzero in the process.

The original algorithm developed by Jacobi searched for the largest (in absolute value) off-diagonal element and zeroed it, repeating this process until all off-diagonal elements were small. The algorithm was applied by hand by one of his students, Seidel (of Gauss-Seidel fame). The problem with this is that searching for the largest off-diagonal element requires $O(n^2)$ comparisons. Computing and applying one Jacobi rotation as a similarity transformation requires $O(n)$ flops. Thus, for large $n$ this is not practical. Instead, it can be shown that zeroing the off-diagonal elements by columns (or rows) also converges to a diagonal matrix. This is known as the column-cyclic Jacobi algorithm. We illustrate this in Figure 16.10.
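As an illustration of the column-cyclic sweep, here is a dense NumPy sketch. The angle formula used to construct each Jacobi rotation is one standard choice, not taken from the notes:

```python
import numpy as np

def cyclic_jacobi_sweep(A):
    """One column-cyclic Jacobi sweep: zero each off-diagonal element
    (p, q) in turn with a Jacobi rotation used as a similarity
    transformation. Earlier zeros may be destroyed, but off(A)
    decreases with every rotation."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for q in range(n):                  # column-cyclic order (Figure 16.10)
        for p in range(q + 1, n):
            if A[p, q] == 0.0:
                continue
            # Angle diagonalizing the 2x2 [[A_qq, A_qp], [A_pq, A_pp]].
            theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
            c, s = np.cos(theta), np.sin(theta)
            J = np.eye(n)
            J[q, q] = J[p, p] = c
            J[q, p], J[p, q] = -s, s
            A = J.T @ A @ J
    return A

def off(A):
    """Square of the Frobenius norm of the off-diagonal part."""
    return np.linalg.norm(A, 'fro') ** 2 - np.linalg.norm(np.diag(A)) ** 2

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A0 = (B + B.T) / 2
A1 = cyclic_jacobi_sweep(A0)
```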

16.9.2 Cuppen's Algorithm

To be added at a future time.

16.9.3 The Method of Multiple Relatively Robust Representations (MRRR)

Even once the problem has been reduced to tridiagonal form, the computation of the eigenvalues and eigenvectors via the QR algorithm requires $O(n^3)$ computations. A method that reduces this to $O(n^2)$ time (which can be argued to achieve the lower bound for computation, within a constant, because the $n$ vectors must be at least written) is achieved by the Method of Multiple Relatively Robust Representations (MRRR) by Dhillon and Parlett [14, 13, 15, 16]. The details of that method go beyond the scope of this note.

16.10 The Nonsymmetric QR Algorithm

The QR algorithm that we have described can be modified to compute the Schur decomposition of a nonsymmetric matrix. We briefly describe the high-level ideas.

16.10.1 A variant of the Schur decomposition

Let $A \in \mathbb{R}^{n\times n}$ be nonsymmetric. Recall:

• There exists a unitary matrix $Q \in \mathbb{C}^{n\times n}$ and upper triangular matrix $R \in \mathbb{C}^{n\times n}$ such that $A = QRQ^H$.

• Importantly: even if $A$ is real valued, the eigenvalues and eigenvectors may be complex valued. The eigenvalues will come in conjugate pairs.

A variation of the Schur Decomposition theorem is:


Theorem 16.12 Let $A \in \mathbb{R}^{n\times n}$. Then there exist a unitary matrix $Q \in \mathbb{R}^{n\times n}$ and a quasi upper triangular matrix $R \in \mathbb{R}^{n\times n}$ such that $A = QRQ^T$.

Here "quasi upper triangular matrix" means that the matrix is block upper triangular with blocks on the diagonal that are $1\times 1$ or $2\times 2$.

Remark 16.13 The important thing is that this alternative to the Schur decomposition can be computed using only real arithmetic.

16.10.2 Reduction to upperHessenberg form

The basic algorithm for reducing a real-valued nonsymmetric matrix to upperHessenberg form, overwriting the original matrix with the result, can be explained similarly to the explanation of the reduction of a symmetric matrix to tridiagonal form. We assume that the upperHessenberg matrix overwrites the upperHessenberg part of $A$ and that the Householder vectors used to zero out parts of $A$ overwrite the entries that they annihilate (set to zero).

• Assume that the process has proceeded to where
\[
A = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
\]
with $A_{00}$ being $k\times k$ and upperHessenberg, $a_{10}^T$ being a row vector with only a last nonzero entry, and the rest of the submatrices updated according to the application of the previous $k$ Householder transformations.
• Let $[u_{21}, \tau, a_{21}] := \text{HouseV}(a_{21})$.³

• Update
\[
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H \end{pmatrix}
= \begin{pmatrix} A_{00} & a_{01} & A_{02}H \\ a_{10}^T & \alpha_{11} & a_{12}^TH \\ 0 & Ha_{21} & HA_{22}H \end{pmatrix}
\]
where $H = H(u_{21})$. Note that $a_{21} := Ha_{21}$ need not be executed since this update was performed by the instance of HouseV above.⁴

³ Note that the semantics here indicate that $a_{21}$ is overwritten by $Ha_{21}$.
⁴ In practice, the zeros below the first element of $Ha_{21}$ are not actually written. Instead, the implementation overwrites these elements with the corresponding elements of the vector $u_{21}$.


Algorithm: $[A, t] := \text{HessRed\_unb}(A)$

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$ where $A_{TL}$ is $0\times 0$ and $t_T$ has 0 elements

while $m(A_{TL}) < m(A)$ do

  Repartition
  \[
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow
  \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}
  \]
  where $\alpha_{11}$ is a scalar

  $[u_{21}, \tau_1, a_{21}] := \text{HouseV}(a_{21})$
  $y_{01} := A_{02}u_{21}$
  $A_{02} := A_{02} - \frac{1}{\tau_1}y_{01}u_{21}^T$
  $\psi_{11} := a_{12}^Tu_{21}$
  $a_{12}^T := a_{12}^T - \frac{\psi_{11}}{\tau_1}u_{21}^T$
  $y_{21} := A_{22}u_{21}$
  $\beta := u_{21}^Ty_{21}/2$
  $z_{21} := (y_{21} - \beta u_{21}/\tau_1)/\tau_1$
  $w_{21} := (A_{22}^Tu_{21} - \beta u_{21}/\tau_1)/\tau_1$
  $A_{22} := A_{22} - (u_{21}w_{21}^T + z_{21}u_{21}^T)$ (rank-2 update)

  Continue with
  \[
  \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
  \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
  \begin{pmatrix} t_T \\ t_B \end{pmatrix} \leftarrow
  \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}
  \]

endwhile

Figure 16.11: Basic algorithm for reduction of a nonsymmetric matrix to upperHessenberg form.
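For illustration, a dense NumPy sketch of the reduction (applying each reflector to full rows and columns rather than via the ger/axpy/rank-2 updates of Figure 16.11):

```python
import numpy as np

def hess_red(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q by Householder
    similarity transformations (a dense sketch of Figure 16.11)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k]
        u = x.copy()
        u[0] += np.linalg.norm(x) if x[0] >= 0 else -np.linalg.norm(x)
        if u[0] == 0.0:
            continue                       # column already reduced
        u /= u[0]
        tau = (u @ u) / 2
        # Apply I - u u^T/tau from the left to rows k+1: ...
        H[k + 1:, k:] -= np.outer(u, (u @ H[k + 1:, k:]) / tau)
        # ... and from the right to columns k+1:
        H[:, k + 1:] -= np.outer((H[:, k + 1:] @ u) / tau, u)
        Q[:, k + 1:] -= np.outer((Q[:, k + 1:] @ u) / tau, u)
    return H, Q

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
H, Q = hess_red(A)
```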


[Figure: four panels labeled "Original matrix", "First iteration", "Second iteration", and "Third iteration", showing the sparsity pattern after each step.]

Figure 16.12: Illustration of reduction of a nonsymmetric matrix to upperHessenberg form. The $\times$s denote nonzero elements in the matrix.
The update $A_{02}H$ requires
\[
A_{02} := A_{02}\left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right)
= A_{02} - \tfrac{1}{\tau}\underbrace{A_{02}u_{21}}_{y_{01}}u_{21}^T
= A_{02} - \tfrac{1}{\tau}y_{01}u_{21}^T \quad (\text{ger}).
\]
The update $a_{12}^TH$ requires
\[
a_{12}^T := a_{12}^T\left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right)
= a_{12}^T - \tfrac{1}{\tau}\underbrace{a_{12}^Tu_{21}}_{\psi_{11}}u_{21}^T
= a_{12}^T - \tfrac{\psi_{11}}{\tau}u_{21}^T \quad (\text{axpy}).
\]
The update of $A_{22}$ requires
\[
\begin{aligned}
A_{22} &:= \left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right)A_{22}\left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right) \\
&= \left(A_{22} - \tfrac{1}{\tau}u_{21}u_{21}^TA_{22}\right)\left(I - \tfrac{1}{\tau}u_{21}u_{21}^T\right) \\
&= A_{22} - \tfrac{1}{\tau}u_{21}u_{21}^TA_{22} - \tfrac{1}{\tau}A_{22}u_{21}u_{21}^T + \tfrac{1}{\tau^2}u_{21}\underbrace{u_{21}^TA_{22}u_{21}}_{2\beta}u_{21}^T \\
&= A_{22} - u_{21}\left(\tfrac{1}{\tau}u_{21}^TA_{22} - \tfrac{\beta}{\tau^2}u_{21}^T\right) - \left(\tfrac{1}{\tau}A_{22}u_{21} - \tfrac{\beta}{\tau^2}u_{21}\right)u_{21}^T \\
&= A_{22} - u_{21}\underbrace{\tfrac{1}{\tau}\left(A_{22}^Tu_{21} - \tfrac{\beta}{\tau}u_{21}\right)^T}_{w_{21}^T} - \underbrace{\tfrac{1}{\tau}\left(A_{22}u_{21} - \tfrac{\beta}{\tau}u_{21}\right)}_{z_{21}}u_{21}^T \\
&= \underbrace{A_{22} - u_{21}w_{21}^T - z_{21}u_{21}^T}_{\text{rank-2 update}}.
\end{aligned}
\]
• Continue this process with the updated $A$.


This is captured in the algorithm in Figure 16.11. It is also illustrated in Figure 16.12.
Homework 16.14 Give the approximate total cost for reducing a nonsymmetric A ∈ R^{n×n} to upper Hessenberg form.
* SEE ANSWER
For a detailed discussion on the blocked algorithm that uses FLAME notation, we recommend [34]

Gregorio Quintana-Ortí and Robert A. van de Geijn.
Improving the performance of reduction to Hessenberg form.
ACM Transactions on Mathematical Software (TOMS), Vol. 32, No. 2, 2006

and [45]

Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS), Vol. 39, No. 1, 2012
In those papers, citations to earlier work can be found.

16.10.3 The implicitly double-shifted QR algorithm

To be added at a future date!


Chapter 17

Notes on the Method of Relatively Robust Representations (MRRR)
The purpose of this note is to give some high-level idea of how the Method of Relatively Robust Representations (MRRR) works.


Chapter 17. Notes on the Method of Relatively Robust Representations (MRRR)

17.0.1 Outline

17.0.1 Outline
17.1 MRRR, from 35,000 Feet
17.2 Cholesky Factorization, Again
17.3 The LDLT Factorization
17.4 The UDU^T Factorization
17.5 The UDU^T Factorization
17.6 The Twisted Factorization
17.7 Computing an Eigenvector from the Twisted Factorization

17.1 MRRR, from 35,000 Feet

The Method of Relatively Robust Representations (MRRR) is an algorithm that, given a tridiagonal matrix,
computes eigenvectors associated with that matrix in O(n) time per eigenvector. This means it computes
all eigenvectors in O(n2 ) time, which is much faster than the tridiagonal QR algorithm (which requires
O(n3 ) computation). Notice that highly accurate eigenvalues of a tridiagonal matrix can themselves be
computed in O(n2 ) time. So, it is legitimate to start by assuming that we have these highly accurate
eigenvalues. For our discussion, we only need one.
The MRRR algorithm has at least two benefits over the symmetric QR algorithm for tridiagonal matrices:

• It can compute all eigenvalues and eigenvectors of a tridiagonal matrix in O(n²) time (versus O(n³) time for the symmetric QR algorithm).
• It can efficiently compute eigenvectors corresponding to a subset of eigenvalues.

The benefit when computing all eigenvalues and eigenvectors of a dense matrix is considerably less, because transforming the eigenvectors of the tridiagonal matrix back into the eigenvectors of the dense matrix requires an extra O(n³) computation, as discussed in [44]. In addition, making the method totally robust has been tricky, since the method does not rely exclusively on unitary similarity transformations.

The fundamental idea is that of a twisted factorization of the tridiagonal matrix. This note builds up to what that factorization is, how to compute it, and how to then compute an eigenvector with it. We start by reminding the reader of what the Cholesky factorization of a symmetric positive definite (SPD) matrix is. Then we discuss the Cholesky factorization of a tridiagonal SPD matrix. This then leads to the LDLT factorization of an indefinite matrix and of an indefinite tridiagonal matrix. Next follows a discussion of the UDU^T factorization of an indefinite matrix, which then finally yields the twisted factorization. When the matrix is nearly singular, an approximation of the twisted factorization can then be used to compute an approximate eigenvector.
Notice that the devil is in the details of the MRRR algorithm. We will not tackle those details.


Algorithm: A := Chol_unb(A)

Partition A → ( ATL ⋆ ; ABL ABR )
  where ATL is 0 × 0
while m(ATL) < m(A) do
  Repartition
    ( ATL ⋆ ; ABL ABR ) → ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 )

  α11 := √α11
  a21 := a21/α11
  A22 := A22 − a21 a21^T

  Continue with
    ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 ) → ( ATL ⋆ ; ABL ABR )
endwhile

Figure 17.1: Unblocked algorithm for computing the Cholesky factorization. Updates to A22 affect only the lower triangular part.


Algorithm: A := Chol_tri_unb(A)

Partition A → ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL )
  where AFF is 0 × 0
while m(AFF) < m(A) do
  Repartition
    ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL ) →
    ( A00 ⋆ ⋆ ⋆ ; α10 e_L^T α11 ⋆ ⋆ ; 0 α21 α22 ⋆ ; 0 0 α32 e_F A33 )

  α11 := √α11
  α21 := α21/α11
  α22 := α22 − α21²

  Continue with
    ( A00 ⋆ ⋆ ⋆ ; α10 e_L^T α11 ⋆ ⋆ ; 0 α21 α22 ⋆ ; 0 0 α32 e_F A33 ) →
    ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL )
endwhile

Figure 17.2: Algorithm for computing the Cholesky factorization of a tridiagonal matrix.

17.2 Cholesky Factorization, Again

We have discussed in class the Cholesky factorization, A = L L^T, which requires a matrix to be symmetric positive definite. (We will restrict our discussion to real matrices.) The following computes the Cholesky factorization:

• Partition A → ( α11 a21^T ; a21 A22 ).
• Update α11 := √α11.
• Update a21 := a21/α11.
• Update A22 := A22 − a21 a21^T (updating only the lower triangular part).
• Continue to compute the Cholesky factorization of the updated A22.

The resulting algorithm is given in Figure 17.1.
In the special case where A is tridiagonal and SPD, the algorithm needs to be modified so that it can take advantage of zero elements:

• Partition A → ( α11 ⋆ ⋆ ; α21 α22 ⋆ ; 0 α32 e_F A33 ), where ⋆ indicates the symmetric part that is not stored. Here e_F indicates the unit basis vector with a 1 as first element.
• Update α11 := √α11.
• Update α21 := α21/α11.
• Update α22 := α22 − α21².
• Continue to compute the Cholesky factorization of ( α22 ⋆ ; α32 e_F A33 ).

The resulting algorithm is given in Figure 17.2. In that figure, it helps to interpret F, M, and L as First, Middle, and Last. In that figure, e_F and e_L are the unit basis vectors with a 1 as first and last element, respectively. Notice that the Cholesky factor of a tridiagonal matrix is a lower bidiagonal matrix that overwrites the lower triangular part of the tridiagonal matrix A.

Naturally, the whole matrix need not be stored. But that detail is not important for our discussion.
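A minimal sketch of this tridiagonal Cholesky factorization, storing only the diagonal and subdiagonal (a Python illustration, not the FLAME implementation of Figure 17.2; the function and variable names are ours):

```python
import math

def chol_tridiag(diag, sub):
    """Cholesky factorization of a tridiagonal SPD matrix.

    diag holds the n diagonal entries, sub the n-1 subdiagonal entries.
    Returns (l_diag, l_sub): the diagonal and subdiagonal of the lower
    bidiagonal Cholesky factor L with A = L L^T.
    """
    n = len(diag)
    l_diag = [0.0] * n
    l_sub = [0.0] * (n - 1)
    a11 = diag[0]
    for i in range(n):
        l_diag[i] = math.sqrt(a11)             # alpha11 := sqrt(alpha11)
        if i < n - 1:
            l_sub[i] = sub[i] / l_diag[i]      # alpha21 := alpha21/alpha11
            a11 = diag[i + 1] - l_sub[i] ** 2  # alpha22 := alpha22 - alpha21^2
    return l_diag, l_sub
```

Note that each step costs O(1), so the whole factorization costs O(n), in contrast to the O(n³) cost of the dense algorithm.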

17.3 The LDLT Factorization

Now, one can alternatively compute A = L D L^T, where L is unit lower triangular and D is diagonal. We will look at how this is done first and will then note that this factorization can be computed for any indefinite

Algorithm: A := LDLT_unb(A)

Partition A → ( ATL ⋆ ; ABL ABR )
  where ATL is 0 × 0
while m(ATL) < m(A) do
  Repartition
    ( ATL ⋆ ; ABL ABR ) → ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 )
    where α11 is 1 × 1

  (updates to be filled in as part of Homework 17.1)

  Continue with
    ( A00 ⋆ ⋆ ; a10^T α11 ⋆ ; A20 a21 A22 ) → ( ATL ⋆ ; ABL ABR )
endwhile

Figure 17.3: Unblocked algorithm for computing A → LDLT, overwriting A.


(nonsingular) symmetric matrix. (Notice: the so computed L is not the same as the L computed by the Cholesky factorization.) Partition

  A → ( α11 a21^T ; a21 A22 ),  L → ( 1 0 ; l21 L22 ),  and  D → ( δ1 0 ; 0 D22 ).

Then

  ( α11 a21^T ; a21 A22 ) = ( 1 0 ; l21 L22 ) ( δ1 0 ; 0 D22 ) ( 1 l21^T ; 0 L22^T )
                          = ( δ1  δ1 l21^T ; δ1 l21  δ1 l21 l21^T + L22 D22 L22^T ).

This suggests the following algorithm for overwriting the strictly lower triangular part of A with the strictly lower triangular part of L and the diagonal of A with D:

• Partition A → ( α11 a21^T ; a21 A22 ).
• δ1 := α11 (no-op).
• Compute l21 := a21/α11.
• Update A22 := A22 − l21 a21^T (updating only the lower triangular part).
• a21 := l21.
• Continue with computing A22 → L22 D22 L22^T.

This algorithm will complete as long as δ1 ≠ 0, which happens when A is nonsingular. This is equivalent to A not having zero as an eigenvalue. Such a matrix is also called indefinite.
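The steps can be sketched for a dense symmetric matrix as follows (a Python illustration rather than the FLAME notation used in the figures; it is not the fill-in answer for the figures, and the names are ours):

```python
import numpy as np

def ldlt(A):
    """Compute A = L D L^T for a symmetric nonsingular matrix.

    Returns (L, d): L unit lower triangular, d the diagonal of D.
    Sketch only: no pivoting, so it breaks down if a leading
    delta1 becomes (nearly) zero, as discussed in the text.
    """
    A = np.array(A, dtype=float, copy=True)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k]                     # delta1 := alpha11
        l21 = A[k + 1:, k] / d[k]          # l21 := a21/alpha11
        A[k + 1:, k + 1:] -= np.outer(l21, A[k + 1:, k])  # A22 -= l21 a21^T
        L[k + 1:, k] = l21                 # a21 := l21
    return L, d
```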
Homework 17.1 Modify the algorithm in Figure 17.1 so that it computes the LDLT factorization. (Fill in
Figure 17.3.)
* SEE ANSWER
Homework 17.2 Modify the algorithm in Figure 17.2 so that it computes the LDLT factorization of a
tridiagonal matrix. (Fill in Figure 17.4.) What is the approximate cost, in floating point operations, of
computing the LDLT factorization of a tridiagonal matrix? Count a divide, multiply, and add/subtract as
a floating point operation each. Show how you came up with the algorithm, similar to how we derived the
algorithm for the tridiagonal Cholesky factorization.
* SEE ANSWER
Notice that computing the LDLT factorization of an indefinite matrix is not a good idea when A is nearly singular. This would lead to some δ1 being nearly zero, meaning that dividing by it leads to very large entries in l21 and corresponding element growth in A22. (In other words, strange things will happen downstream from the small δ1.) Not a good thing. Unfortunately, we are going to need something like this factorization specifically for the case where A is nearly singular.

17.4 The UDU^T Factorization

The fact that the LU, Cholesky, and LDLT factorizations start in the top-left corner of the matrix and work towards the bottom-right is an accident of history: Gaussian elimination works in that direction, hence so does LU factorization, and the rest kind of follow.

One can imagine, given an SPD matrix A, instead computing A = U U^T, where U is upper triangular. Such a computation starts in the lower-right corner of the matrix and works towards the top-left. Similarly, an algorithm can be created for computing A = U D U^T, where U is unit upper triangular and D is diagonal.
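A matching sketch for the bottom-right-to-top-left sweep (again a Python illustration with names of our choosing; deriving the FLAME version is the subject of Homework 17.3):

```python
import numpy as np

def udut(A):
    """Compute A = U D U^T, with U unit upper triangular and D diagonal.

    The factorization proceeds from the bottom-right corner toward the
    top-left. Sketch only; assumes no pivot becomes zero.
    """
    A = np.array(A, dtype=float, copy=True)
    n = A.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for k in range(n - 1, -1, -1):
        d[k] = A[k, k]
        u01 = A[:k, k] / d[k]                 # column above the pivot
        A[:k, :k] -= np.outer(u01, A[:k, k])  # A00 -= u01 a01^T
        U[:k, k] = u01
    return U, d
```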


Algorithm: A := LDLT_tri_unb(A)

Partition A → ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL )
  where AFF is 0 × 0
while m(AFF) < m(A) do
  Repartition
    ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL ) →
    ( A00 ⋆ ⋆ ⋆ ; α10 e_L^T α11 ⋆ ⋆ ; 0 α21 α22 ⋆ ; 0 0 α32 e_F A33 )
    where α22 is a scalar

  (updates to be filled in as part of Homework 17.2)

  Continue with
    ( A00 ⋆ ⋆ ⋆ ; α10 e_L^T α11 ⋆ ⋆ ; 0 α21 α22 ⋆ ; 0 0 α32 e_F A33 ) →
    ( AFF ⋆ ⋆ ; αMF e_L^T αMM ⋆ ; 0 αLM e_F ALL )
endwhile

Figure 17.4: Algorithm for computing the LDLT factorization of a tridiagonal matrix.


Algorithm: A := UDUT_unb(A)

Partition A → ( ATL ATR ; ⋆ ABR )
  where ABR is 0 × 0
while m(ABR) < m(A) do
  Repartition
    ( ATL ATR ; ⋆ ABR ) → ( A00 a01 A02 ; ⋆ α11 a12^T ; ⋆ ⋆ A22 )
    where α11 is 1 × 1

  (updates to be filled in as part of Homework 17.3)

  Continue with
    ( A00 a01 A02 ; ⋆ α11 a12^T ; ⋆ ⋆ A22 ) → ( ATL ATR ; ⋆ ABR )
endwhile

Figure 17.5: Unblocked algorithm for computing A = UDU^T. Updates to A00 affect only the upper triangular part.

Homework 17.3 Derive an algorithm that, given an indefinite matrix A, computes A = UDU^T. Overwrite only the upper triangular part of A. (Fill in Figure 17.5.) Show how you came up with the algorithm, similar to how we derived the algorithm for LDLT.
* SEE ANSWER

17.5 The UDU^T Factorization

Homework 17.4 Derive an algorithm that, given an indefinite tridiagonal matrix A, computes A = UDU^T. Overwrite only the upper triangular part of A. (Fill in Figure 17.6.) Show how you came up with the algorithm, similar to how we derived the algorithm for LDLT.
* SEE ANSWER

Notice that the UDU^T factorization has the exact same shortcomings as LDLT when applied to a singular matrix. If D has a small element on the diagonal, it is again downstream that this can cause large elements in the factored matrix. But now downstream means towards the top-left, since the algorithm moves in the opposite direction.

17.6 The Twisted Factorization

Let us assume that A is a tridiagonal matrix and that we are given one of its eigenvalues (or rather, a very good approximation), λ, and let us assume that this eigenvalue has multiplicity one. Indeed, we are going to assume that it is well-separated, meaning there is no other eigenvalue close to it.

Here is a way to compute an associated eigenvector: Find a nonzero vector in the null space of B = A − λI. Let us think back to how one was taught to do this:

• Reduce B to row-echelon form. This may require pivoting.
• Find a column that has no pivot in it. This identifies an independent (free) variable.
• Set the element in vector x corresponding to the independent variable to one.
• Solve for the dependent variables.

There are a number of problems with this approach:

• It does not take advantage of symmetry.
• It is inherently unstable, since you are working with a matrix, B = A − λI, that inherently has a bad condition number. (Indeed, it is infinity if λ is an exact eigenvalue.)
• λ is not an exact eigenvalue when it is computed by a computer, and hence B is not exactly singular. So, no independent variables will even be found.

These are only some of the problems. To overcome this, one computes something called a twisted factorization.

Again, let B be a tridiagonal matrix. Assume that λ is an approximate eigenvalue of B and let A = B − λI. We are going to compute an approximate eigenvector of B associated with λ by computing a vector that is in the null space of a singular matrix that is close to A. We will assume that λ is an eigenvalue of


Algorithm: A := UDUT_tri_unb(A)

Partition A → ( AFF αFM e_L 0 ; ⋆ αMM αML e_F^T ; ⋆ ⋆ ALL )
  where AFF is 0 × 0
while m(ALL) < m(A) do
  Repartition
    ( AFF αFM e_L 0 ; ⋆ αMM αML e_F^T ; ⋆ ⋆ ALL ) →
    ( A00 α01 e_L ⋆ ⋆ ; ⋆ α11 α12 ⋆ ; ⋆ ⋆ α22 α23 e_F^T ; ⋆ ⋆ ⋆ A33 )

  (updates to be filled in as part of Homework 17.4)

  Continue with
    ( A00 α01 e_L ⋆ ⋆ ; ⋆ α11 α12 ⋆ ; ⋆ ⋆ α22 α23 e_F^T ; ⋆ ⋆ ⋆ A33 ) →
    ( AFF αFM e_L 0 ; ⋆ αMM αML e_F^T ; ⋆ ⋆ ALL )
endwhile

Figure 17.6: Algorithm for computing the UDU^T factorization of a tridiagonal matrix.

B that has multiplicity one, and is well-separated from other eigenvalues. Thus, the singular matrix close to A has a null space with dimension one, and we would expect A = LDLT to have one element on the diagonal of D that is essentially zero. Ditto for A = UEU^T, where E is diagonal and we use this letter to be able to distinguish the two diagonal matrices.

Thus, we have:

• λ is an approximate (but not exact) eigenvalue of B.
• A = B − λI is indefinite.
• A = LDLT. L is bidiagonal, unit lower triangular, and D is diagonal with one small element on the diagonal.
• A = UEU^T. U is bidiagonal, unit upper triangular, and E is diagonal with one small element on the diagonal.
Let

  A = ( A00 α10 e_L 0 ; α10 e_L^T α11 α21 e_F^T ; 0 α21 e_F A22 ),

  L = ( L00 0 0 ; λ10 e_L^T 1 0 ; 0 λ21 e_F L22 ),   D = ( D00 0 0 ; 0 δ1 0 ; 0 0 D22 ),

  U = ( U00 υ01 e_L 0 ; 0 1 υ21 e_F^T ; 0 0 U22 ),   and   E = ( E00 0 0 ; 0 ε1 0 ; 0 0 E22 ),

where all the partitioning is conformal.

Homework 17.5 Show that, provided η1 is chosen appropriately,

  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )
  ( D00 0 0 ; 0 η1 0 ; 0 0 E22 )
  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )^T
  = ( A00 α10 e_L 0 ; α10 e_L^T α11 α21 e_F^T ; 0 α21 e_F A22 ).

(Hint: multiply out A = LDLT and A = UEU^T with the partitioned matrices first. Then multiply out the above. Compare and match...) How should η1 be chosen? What is the cost of computing the twisted factorization, given that you have already computed the LDLT and UDU^T factorizations? A "Big O" estimate is sufficient. Be sure to take into account what e_L^T D00 e_L and e_F^T E22 e_F equal in your cost estimate.
* SEE ANSWER

17.7 Computing an Eigenvector from the Twisted Factorization

The way the method now works for computing the desired approximate eigenvector is to find the η1 that is smallest in value of all possible such values. In other words, you can partition A, L, U, D, and E in many ways, singling out any of the diagonal values of these matrices. The partitioning chosen is the one that makes η1 the smallest of all possibilities. It is then set to zero, so that

  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )
  ( D00 0 0 ; 0 0 0 ; 0 0 E22 )
  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )^T
  ≈ ( A00 α10 e_L 0 ; α10 e_L^T α11 α21 e_F^T ; 0 α21 e_F A22 ).

Because λ was assumed to have multiplicity one and to be well-separated, the resulting twisted factorization has nice properties. We won't get into what that exactly means.
Homework 17.6 Compute x0, χ1, and x2 so that

  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )
  ( D00 0 0 ; 0 0 0 ; 0 0 E22 )
  ( L00 0 0 ; λ10 e_L^T 1 υ21 e_F^T ; 0 0 U22 )^T
  ( x0 ; χ1 ; x2 ) = 0,

where x = ( x0 ; χ1 ; x2 ) is not a zero vector. (Hint: χ1 = 1.) What is the cost of this computation, given that L00 and U22 have special structure?
* SEE ANSWER
The vector x that is so computed is the desired approximate eigenvector of B.
Now, if the eigenvalues of B are well separated, and one follows essentially this procedure to find
eigenvectors associated with each eigenvalue, the resulting eigenvectors are quite orthogonal to each other.
We know that the eigenvectors of a symmetric matrix with distinct eigenvalues should be orthogonal, so
this is a desirable property.
The approach becomes very tricky when eigenvalues are clustered, meaning that some of them are
very close to each other. A careful scheme that shifts the matrix is then used to make eigenvalues in a
cluster relatively well separated. But that goes beyond this note.
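For intuition only, the goal the twisted factorization serves — recovering an eigenvector from a highly accurate eigenvalue — can be mimicked with plain inverse iteration. The following Python sketch is NOT the MRRR machinery (it uses a dense O(n³) solve and lacks MRRR's robustness and orthogonality guarantees); it merely illustrates the effect:

```python
import numpy as np

def eigvec_from_eigval(T, lam, iters=2):
    """Given symmetric T and an accurate, well-separated eigenvalue lam,
    recover an associated eigenvector by inverse iteration.

    Simple illustration; MRRR instead builds a twisted factorization so
    that each eigenvector costs only O(n) and the result is robust.
    """
    n = T.shape[0]
    x = np.ones(n) / np.sqrt(n)
    # A tiny shift keeps T - lam I numerically nonsingular.
    M = T - (lam + 1e-10) * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(M, x)
        x /= np.linalg.norm(x)
    return x
```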

Chapter 18

Notes on Computing the SVD of a Matrix

For simplicity we will focus on the case where A ∈ R^{n×n}. We will assume that the reader has read Chapter 16.

Chapter 18. Notes on Computing the SVD of a Matrix

18.0.1 Outline

18.0.1 Outline
18.1 Background
18.2 Reduction to Bidiagonal Form
18.3 The QR Algorithm with a Bidiagonal Matrix
18.4 Putting it all together

18.1 Background

Recall:

Theorem 18.1 If A ∈ R^{m×n} then there exist unitary matrices U ∈ R^{m×m} and V ∈ R^{n×n}, and a diagonal matrix Σ ∈ R^{m×n}, such that A = UΣV^T. This is known as the Singular Value Decomposition.

There is a relation between the SVD of a matrix A and the Spectral Decomposition of A^T A:

Homework 18.2 If A = UΣV^T is the SVD of A then A^T A = V Σ² V^T is the Spectral Decomposition of A^T A.
* SEE ANSWER

The above exercise suggests steps for computing the Reduced SVD:

• Form C = A^T A.
• Compute unitary V and diagonal Σ² such that C = V Σ² V^T, ordering the eigenvalues from largest to smallest on the diagonal of Σ².
• Form W = AV. Then W = UΣ, so that U and Σ can be computed from W by choosing the diagonal elements of Σ to equal the lengths of the corresponding columns of W and normalizing those columns by those lengths.

The problem with this approach is that forming A^T A squares the condition number of the problem. We will show how to avoid this.
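These steps translate directly into a few lines of Python (illustration only; as just noted, this squares the condition number and is not how a production SVD is computed):

```python
import numpy as np

def svd_via_gram(A):
    """Compute the SVD of square A from the spectral decomposition
    of A^T A. Illustration only: numerically inferior to
    bidiagonalization-based methods because cond(A^T A) = cond(A)^2.
    """
    C = A.T @ A                        # form C = A^T A
    lam, V = np.linalg.eigh(C)         # C = V diag(lam) V^T
    order = np.argsort(lam)[::-1]      # largest eigenvalue first
    lam, V = lam[order], V[:, order]
    W = A @ V                          # W = U Sigma
    sigma = np.linalg.norm(W, axis=0)  # column lengths give Sigma
    U = W / sigma                      # normalize columns to get U
    return U, sigma, V
```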

18.2 Reduction to Bidiagonal Form

In the last chapter, we saw that it is beneficial to start by reducing a matrix to tridiagonal or upper Hessenberg form when computing the Spectral or Schur decompositions. The corresponding step when computing the SVD of a given matrix is to reduce the matrix to bidiagonal form.

We assume that the bidiagonal matrix overwrites matrix A. Householder vectors will again play a role, and will again overwrite elements that are annihilated (set to zero).


Algorithm: [A, t, r] := BiDRed_unb(A)

Partition A → ( ATL ATR ; ABL ABR ), t → ( tT ; tB ), r → ( rT ; rB )
  where ATL is 0 × 0 and tT, rT have 0 elements
while m(ATL) < m(A) do
  Repartition
    ( ATL ATR ; ABL ABR ) → ( A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ),
    ( tT ; tB ) → ( t0 ; τ1 ; t2 ),  ( rT ; rB ) → ( r0 ; ρ1 ; r2 )
    where α11, τ1, and ρ1 are scalars

  [ ( 1 ; u21 ), τ1, ( α11 ; a21 ) ] := HouseV( ( α11 ; a21 ) )
  w12^T := (a12^T + u21^T A22)/τ1
  a12^T := a12^T − w12^T
  A22 := A22 − u21 w12^T   (rank-1 update)
  [v12, ρ1, a12] := HouseV(a12)
  w21 := A22 v12/ρ1
  A22 := A22 − w21 v12^T   (rank-1 update)

  Continue with
    ( A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ) → ( ATL ATR ; ABL ABR ),
    ( t0 ; τ1 ; t2 ) → ( tT ; tB ),  ( r0 ; ρ1 ; r2 ) → ( rT ; rB )
endwhile

Figure 18.1: Basic algorithm for reduction of a nonsymmetric matrix to bidiagonal form.

[Figure 18.2 shows five panels: Original matrix, First iteration, Second iteration, Third iteration, Fourth iteration.]

Figure 18.2: Illustration of reduction to bidiagonal form.

Partition A → ( α11 a12^T ; a21 A22 ). In the first iteration, this can be visualized as a matrix of nonzero (×) entries.

We introduce zeroes below the diagonal in the first column by computing a Householder transformation and applying it from the left:

• Let [ ( 1 ; u21 ), τ1, ( α11 ; a21 ) ] := HouseV( ( α11 ; a21 ) ). This not only computes the Householder vector, but also zeroes the entries in a21 and updates α11.
• Update

    ( α11 a12^T ; a21 A22 ) := ( I − (1/τ1) ( 1 ; u21 ) ( 1 ; u21 )^T ) ( α11 a12^T ; a21 A22 ) = ( α11 ã12^T ; 0 Ã22 ),

  where
  – α11 is updated as part of the computation of the Householder vector;
  – w12^T := (a12^T + u21^T A22)/τ1;
  – ã12^T := a12^T − w12^T; and
  – Ã22 := A22 − u21 w12^T.

This introduces zeroes below the diagonal in the first column. The zeroes below α11 are not actually written. Instead, the implementation overwrites these elements with the corresponding elements of the vector u21.
Next, we introduce zeroes in the first row to the right of the element on the superdiagonal by computing a Householder transformation from a12^T:

• Let [v12, ρ1, a12] := HouseV(a12). This not only computes the Householder vector, but also zeroes all but the first entry in a12^T, updating that first entry.
• Update

    ( a12^T ; A22 ) := ( a12^T ; A22 ) ( I − (1/ρ1) v12 v12^T ) = ( α12 e_0^T ; Ã22 ),

  where
  – α12 equals the first element of the updated a12^T;
  – w21 := A22 v12/ρ1;
  – Ã22 := A22 − w21 v12^T.

This introduces zeroes to the right of the superdiagonal in the first row. The zeroes to the right of the first element of a12^T are not actually written. Instead, the implementation overwrites these elements with the corresponding elements of the vector v12^T.

Continue this process with the updated A22.

The above observations are captured in the algorithm in Figure 18.1. It is also illustrated in Figure 18.2.
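The alternating left/right Householder sweeps can be sketched in Python as follows (an illustration that forms U and V explicitly, unlike the text's algorithm, which stores the Householder vectors in place of the zeroed entries):

```python
import numpy as np

def house(x):
    """Normalized Householder vector u such that
    (I - 2 u u^T) x has zeroes below its first entry."""
    u = x.astype(float).copy()
    u[0] += np.copysign(np.linalg.norm(x), x[0])
    nrm = np.linalg.norm(u)
    return u / nrm if nrm > 0.0 else u

def bidiag_reduce(A):
    """Reduce square A to upper bidiagonal B = U^T A V."""
    B = np.array(A, dtype=float, copy=True)
    n = B.shape[0]
    U, V = np.eye(n), np.eye(n)
    for j in range(n):
        # Zero column j below the diagonal, applying from the left.
        u = house(B[j:, j])
        B[j:, j:] -= 2.0 * np.outer(u, u @ B[j:, j:])
        U[:, j:] -= 2.0 * np.outer(U[:, j:] @ u, u)
        if j < n - 2:
            # Zero row j beyond the superdiagonal, applying from the right.
            v = house(B[j, j + 1:])
            B[j:, j + 1:] -= 2.0 * np.outer(B[j:, j + 1:] @ v, v)
            V[:, j + 1:] -= 2.0 * np.outer(V[:, j + 1:] @ v, v)
    return B, U, V
```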

Homework 18.4 Give the approximate total cost for reducing A ∈ R^{n×n} to bidiagonal form.
* SEE ANSWER
For a detailed discussion on the blocked algorithm that uses FLAME notation, we recommend [45]

Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS), Vol. 39, No. 1, 2012

18.3 The QR Algorithm with a Bidiagonal Matrix

Let B = U_B^T A V_B be bidiagonal, where U_B and V_B have orthonormal columns.

Lemma 18.5 Let B ∈ R^{m×n} be upper bidiagonal and T ∈ R^{n×n} with T = B^T B. Then T is tridiagonal.
Proof: Partition

  B = ( B00 β10 e_l ; 0 β11 ),

where e_l denotes the unit basis vector with a 1 in the last entry. Then

  B^T B = ( B00 β10 e_l ; 0 β11 )^T ( B00 β10 e_l ; 0 β11 )
        = ( B00^T 0 ; β10 e_l^T β11 ) ( B00 β10 e_l ; 0 β11 )
        = ( B00^T B00  β10 B00^T e_l ; β10 e_l^T B00  β10² + β11² ),

where β10 B00^T e_l is (clearly) a vector with only a nonzero in the last entry.

The above proof also shows that if B has no zeroes on its superdiagonal, then neither does B^T B. This means that one can apply the QR algorithm to matrix B^T B. The problem, again, is that this squares the condition number of the problem.
The following extension of the Implicit Q Theorem comes to the rescue:

Theorem 18.6 Let C, B ∈ R^{m×n}. If there exist unitary matrices U ∈ R^{m×m} and V ∈ R^{n×n} so that B = U^T C V is upper bidiagonal and has positive values on its superdiagonal, then B and V are uniquely determined by C and the first column of V.

The above theorem supports an algorithm that, starting with an upper bidiagonal matrix B, implicitly performs a shifted QR algorithm with T = B^T B.
Consider the 5 × 5 bidiagonal matrix

  B = ( β0,0 β0,1 0    0    0    ;
        0    β1,1 β1,2 0    0    ;
        0    0    β2,2 β2,3 0    ;
        0    0    0    β3,3 β3,4 ;
        0    0    0    0    β4,4 ).

Then T = B^T B equals

  T = ( β0,0²     ⋆ ⋆ ⋆ ⋆ ;
        β0,1 β0,0 ⋆ ⋆ ⋆ ⋆ ;
        0         ⋆ ⋆ ⋆ ⋆ ;
        0         ⋆ ⋆ ⋆ ⋆ ;
        0         ⋆ ⋆ ⋆ β3,4² + β4,4² ),

where the ⋆s indicate "don't care" entries. Now, if an iteration of the implicitly shifted QR algorithm were executed with this matrix, then the shift would be κ = β3,4² + β4,4² (actually, it is usually computed from the bottom-right 2 × 2 matrix, a minor detail), and the first Givens rotation would be computed so that

  ( γ0,1 σ0,1 ; −σ0,1 γ0,1 )^T ( β0,0² − κ ; β0,1 β0,0 )

has a zero second entry. Let us call the n × n matrix with this as its top-left submatrix G0. Then applying this from the left and right of T means forming G0^T T G0 = G0^T B^T B G0 = (B G0)^T (B G0). What if we only apply it to B? Then

  B G0 = ( β̃0,0 β̃0,1 0    0    0    ;
           β1,0 β̃1,1 β1,2 0    0    ;
           0    0    β2,2 β2,3 0    ;
           0    0    0    β3,3 β3,4 ;
           0    0    0    0    β4,4 ),

which has a "bulge": the nonzero entry β1,0 below the diagonal.
The idea now is to apply Givens rotations from the left and right to "chase" the bulge, except that now the rotations applied from the two sides need not be the same. Thus, next, one can compute γ1,0 and σ1,0 so that

  ( γ1,0 σ1,0 ; −σ1,0 γ1,0 )^T ( β0,0 ; β1,0 ) = ( β̃0,0 ; 0 ).

Applying this rotation from the left annihilates β1,0 but creates a new bulge β0,2 above the superdiagonal. Continuing, one can compute γ0,2 and σ0,2 so that

  ( γ0,2 σ0,2 ; −σ0,2 γ0,2 )^T ( β0,1 ; β0,2 ) = ( β̃0,1 ; 0 ),

and applying this from the right annihilates β0,2 while creating a bulge β2,1 below the subdiagonal. And so forth, as illustrated in Figure 18.3.
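Each annihilation in the chase needs only a 2 × 2 Givens rotation. A minimal sketch of how such a rotation can be computed (sign conventions vary across texts; this version zeroes the second entry when applied as shown):

```python
import math

def givens(alpha, beta):
    """Compute (gamma, sigma) so that

        [  gamma  sigma ] [ alpha ]   [ r ]
        [ -sigma  gamma ] [ beta  ] = [ 0 ],

    with r = sqrt(alpha^2 + beta^2).
    """
    r = math.hypot(alpha, beta)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate
    return alpha / r, beta / r
```

Computing a rotation this way costs O(1), which is why one full sweep over an n × n bidiagonal matrix costs only O(n).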

Figure 18.3: Illustration of one sweep of an implicit QR algorithm with a bidiagonal matrix.

18.4 Putting it all together

Algorithm: B := ChaseBulgeBiD(B)

Partition B → ( BTL BTM 0 ; ⋆ BMM BMR ; 0 ⋆ BBR )
  where BTL is 0 × 0 and BMM is 2 × 2
while m(BBR) ≥ 0 do
  Repartition, exposing the scalars β1,1, β2,1 (the bulge), β1,2, β1,3, β2,2, β2,3, β3,2, β3,3
    (during the final step, β3,3 is 0 × 0)

  Compute (γL1, σL1) s.t. ( γL1 σL1 ; −σL1 γL1 )^T ( β1,1 ; β2,1 ) = ( β̃1,1 ; 0 ),
    overwriting β1,1 with β̃1,1
  ( β1,2 β1,3 ; β2,2 β2,3 ) := ( γL1 σL1 ; −σL1 γL1 )^T ( β1,2 0 ; β2,2 β2,3 )
  if m(BBR) ≠ 0
    Compute (γR1, σR1) s.t. ( γR1 σR1 ; −σR1 γR1 )^T ( β1,2 ; β1,3 ) = ( β̃1,2 ; 0 ),
      overwriting β1,2 with β̃1,2
    ( β1,2 0 ; β2,2 β2,3 ; β3,2 β3,3 ) := ( β1,2 β1,3 ; β2,2 β2,3 ; 0 β3,3 ) ( γR1 σR1 ; −σR1 γR1 )

  Continue with the repartitioned B
endwhile

Figure 18.4: Chasing the bulge.

More Chapters to be Added in the Future!

Answers

Chapter 1. Notes on Simple Vector and Matrix Operations (Answers)

Homework 1.1 Partition A by columns and by rows:

  A = ( a0 a1 ··· a_{n−1} ) = ( ǎ0^T ; ǎ1^T ; ··· ; ǎ_{m−1}^T ).

Convince yourself that the following hold:

  ( a0 a1 ··· a_{n−1} )^T = ( a0^T ; a1^T ; ··· ; a_{n−1}^T ).
  ( ǎ0^T ; ǎ1^T ; ··· ; ǎ_{m−1}^T )^T = ( ǎ0 ǎ1 ··· ǎ_{m−1} ).
  ( a0 a1 ··· a_{n−1} )^H = ( a0^H ; a1^H ; ··· ; a_{n−1}^H ).
  ( ǎ0^T ; ǎ1^T ; ··· ; ǎ_{m−1}^T )^H = ( conj(ǎ0) conj(ǎ1) ··· conj(ǎ_{m−1}) ).

* BACK TO TEXT

Homework 1.2 Partition x into subvectors:

  x = ( x0 ; x1 ; ··· ; x_{N−1} ).

Convince yourself that the following hold:

  x^T = ( x0^T x1^T ··· x_{N−1}^T ).
  x^H = ( x0^H x1^H ··· x_{N−1}^H ).

* BACK TO TEXT

Homework 1.3 Partition A into blocks:

  A = ( A_{0,0}    A_{0,1}    ···  A_{0,N−1}   ;
        A_{1,0}    A_{1,1}    ···  A_{1,N−1}   ;
        ···                                     ;
        A_{M−1,0}  A_{M−1,1}  ···  A_{M−1,N−1} ),

where A_{i,j} ∈ C^{m_i × n_j}. Here Σ_{i=0}^{M−1} m_i = m and Σ_{j=0}^{N−1} n_j = n.

Convince yourself that the following hold:

  A^T = ( A_{0,0}^T    A_{1,0}^T    ···  A_{M−1,0}^T   ;
          A_{0,1}^T    A_{1,1}^T    ···  A_{M−1,1}^T   ;
          ···                                           ;
          A_{0,N−1}^T  A_{1,N−1}^T  ···  A_{M−1,N−1}^T ).

  A^H = ( A_{0,0}^H    A_{1,0}^H    ···  A_{M−1,0}^H   ;
          A_{0,1}^H    A_{1,1}^H    ···  A_{M−1,1}^H   ;
          ···                                           ;
          A_{0,N−1}^H  A_{1,N−1}^H  ···  A_{M−1,N−1}^H ).

* BACK TO TEXT

Homework 1.4 Convince yourself of the following:

  x^T = ( χ0 χ1 ··· χ_{n−1} ).
  (αx)^T = α x^T.
  (αx)^H = conj(α) x^H.
  α ( x0 ; x1 ; ··· ; x_{N−1} ) = ( α x0 ; α x1 ; ··· ; α x_{N−1} ).

* BACK TO TEXT

Homework 1.5 Convince yourself of the following:

  ( x0 ; x1 ; ··· ; x_{N−1} ) + ( y0 ; y1 ; ··· ; y_{N−1} ) = ( x0 + y0 ; x1 + y1 ; ··· ; x_{N−1} + y_{N−1} ).

(Provided x_i, y_i ∈ C^{n_i} and Σ_{i=0}^{N−1} n_i = n.)

* BACK TO TEXT

Homework 1.6 Convince yourself of the following:

  ( x0 ; x1 ; ··· ; x_{N−1} )^H ( y0 ; y1 ; ··· ; y_{N−1} ) = Σ_{i=0}^{N−1} x_i^H y_i.

(Provided x_i, y_i ∈ C^{n_i} and Σ_{i=0}^{N−1} n_i = n.)

* BACK TO TEXT
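A quick numerical check of the blocked inner-product identity (a Python illustration; the block sizes are chosen arbitrarily):

```python
import numpy as np

# Split x and y into conforming subvectors and compare the inner
# products: x^H y should equal the sum of the blockwise x_i^H y_i.
rng = np.random.default_rng(3)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)
whole = np.vdot(x, y)  # np.vdot conjugates its first argument: x^H y
blocks = sum(np.vdot(x[i:i + 2], y[i:i + 2]) for i in range(0, 6, 2))
assert np.isclose(whole, blocks)
```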

Homework 1.7 Prove that x^H y = conj(y^H x).

Answer:

  x^H y = Σ_{i=0}^{n−1} conj(χ_i) ψ_i = conj( Σ_{i=0}^{n−1} χ_i conj(ψ_i) ) = conj(y^H x).

* BACK TO TEXT

Homework 1.8 Follow the instructions in the accompanying video to implement the two algorithms for computing y := Ax + y, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.

Answer: Implementations are given in Figure 1.5. They can also be found in

Programming/chapter01_answers/Mvmult_unb_var1.m (see file only) (view in MATLAB)

and

Programming/chapter01_answers/Mvmult_unb_var2.m (see file only) (view in MATLAB)

* BACK TO TEXT

Homework 1.9 Implement the two algorithms for computing A := yx^T + A, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.

Answer: Implementations can be found in

function [ y_out ] = Mvmult_unb_var1( A, x, y )

  [ AL, AR ] = FLA_Part_1x2( A, 0, 'FLA_LEFT' );
  [ xT, ...
    xB ] = FLA_Part_2x1( x, 0, 'FLA_TOP' );

  while ( size( AL, 2 ) < size( A, 2 ) )
    [ A0, a1, A2 ] = FLA_Repart_1x2_to_1x3( AL, AR, 1, 'FLA_RIGHT' );
    [ x0, ...
      chi1, ...
      x2 ] = FLA_Repart_2x1_to_3x1( xT, xB, 1, 'FLA_BOTTOM' );

    % y = chi1 * a1 + y;
    y = laff_axpy( chi1, a1, y );

    [ AL, AR ] = FLA_Cont_with_1x3_to_1x2( A0, a1, A2, 'FLA_LEFT' );
    [ xT, ...
      xB ] = FLA_Cont_with_3x1_to_2x1( x0, chi1, x2, 'FLA_TOP' );
  end

  y_out = y;
return

function [ y_out ] = Mvmult_unb_var2( A, x, y )

  [ AT, ...
    AB ] = FLA_Part_2x1( A, 0, 'FLA_TOP' );
  [ yT, ...
    yB ] = FLA_Part_2x1( y, 0, 'FLA_TOP' );

  while ( size( AT, 1 ) < size( A, 1 ) )
    [ A0, ...
      a1t, ...
      A2 ] = FLA_Repart_2x1_to_3x1( AT, AB, 1, 'FLA_BOTTOM' );
    [ y0, ...
      psi1, ...
      y2 ] = FLA_Repart_2x1_to_3x1( yT, yB, 1, 'FLA_BOTTOM' );

    % psi1 = a1t * x + psi1;
    psi1 = laff_dots( a1t, x, psi1 );

    [ AT, ...
      AB ] = FLA_Cont_with_3x1_to_2x1( A0, a1t, A2, 'FLA_TOP' );
    [ yT, ...
      yB ] = FLA_Cont_with_3x1_to_2x1( y0, psi1, y2, 'FLA_TOP' );
  end

  y_out = [ yT
            yB ];
return

Figure 1.5: Implementations of the two unblocked variants for computing y := Ax + y.

Programming/chapter01_answers/Rank1_unb_var1.m (see file only) (view in MATLAB)

and

Programming/chapter01_answers/Rank1_unb_var2.m (see file only) (view in MATLAB)
* BACK TO TEXT

Homework 1.10 Prove that the matrix xyT where x and y are vectors has rank at most one, thus explaining
the name rank-1 update.
Answer: There are a number of ways of proving this:
The rank of a matrix equals the number of linearly independent columns it has. But each column of xyT
is a multiple of vector x and hence there is at most one linearly independent column.
The rank of a matrix equals the number of linearly independent rows it has. But each row of xyT is a
multiple of row vector yT and hence there is at most one linearly independent row.
The rank of a matrix equals the dimension of the column space of the matrix. If z is in the column space of xy^T then there must be a vector w such that z = (xy^T)w. But then z = (y^T w) x and hence z is a multiple of x. This means the column space is at most one dimensional and hence the rank is at most one.
There are probably a half dozen other arguments... For now, do not use an argument that utilizes the SVD,
since we have not yet learned about it.
* BACK TO TEXT
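The claim is easy to confirm numerically. The following quick check uses plain NumPy rather than the MATLAB/laff setting of these notes (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1))  # column vector x
y = rng.standard_normal((4, 1))  # column vector y
A = x @ y.T                      # the rank-1 update matrix x y^T

# Every column of A is a multiple of x, so the rank is at most one.
print(np.linalg.matrix_rank(A))  # 1  (it would be 0 only if x = 0 or y = 0)
```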

Homework 1.11 Implement C := AB + C via matrix-vector multiplications, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.
Answer: An implementation can be found in
Programming/chapter01_answers/Gemm_unb_var1.m (see file only) (view in MATLAB)
* BACK TO TEXT
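For readers without the laff/Spark infrastructure, the structure of this variant can be sketched in a few lines of NumPy: C := AB + C is computed one column of C at a time, each column update being a matrix-vector multiply. This mirrors the shape of the Gemm_unb_var1 answer file but is not the authoritative laff-based implementation:

```python
import numpy as np

def gemm_via_matvec(A, B, C):
    """Compute C := A B + C, one column of C at a time (each a matrix-vector multiply)."""
    C = C.copy()
    for j in range(B.shape[1]):
        C[:, j] += A @ B[:, j]  # c_j := A b_j + c_j
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((4, 5))
assert np.allclose(gemm_via_matvec(A, B, C), A @ B + C)
```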

Homework 1.12 Argue that the given matrix-matrix multiplication algorithm with $m \times n$ matrix C, $m \times k$ matrix A, and $k \times n$ matrix B costs, approximately, $2mnk$ flops. Answer: This can be easily argued by noting that each update of a row of matrix C costs, approximately, $2nk$ flops. There are $m$ such rows to be computed.
* BACK TO TEXT


Homework 1.13 Implement C := AB + C via row vector-matrix multiplications, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME. Hint: $y^T := x^TA + y^T$ is not supported by a laff routine. You can use laff_gemv instead.
Answer: An implementation can be found in
Programming/chapter01_answers/Gemm_unb_var2.m (see file only) (view in MATLAB)
* BACK TO TEXT

Homework 1.9 Argue that the given matrix-matrix multiplication algorithm with $m \times n$ matrix C, $m \times k$ matrix A, and $k \times n$ matrix B costs, approximately, $2mnk$ flops. Answer: This can be easily argued by noting that each rank-1 update of matrix C costs, approximately, $2mn$ flops. There are $k$ such rank-1 updates to be computed.
* BACK TO TEXT

Homework 1.10 Implement C := AB + C via rank-1 updates, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.
Answer: An implementation can be found in
Programming/chapter01_answers/Gemm_unb_var3.m (see file only) (view in MATLAB)
* BACK TO TEXT


Chapter 2. Notes on Vector and Matrix Norms (Answers)


Homework 2.2 Prove that if $\nu : \mathbb{C}^n \rightarrow \mathbb{R}$ is a norm, then $\nu(0) = 0$ (where the first 0 denotes the zero vector in $\mathbb{C}^n$).
Answer: Let $x \in \mathbb{C}^n$, $\vec 0$ the zero vector of size $n$, and $0$ the scalar zero. Then
$\nu(\vec 0) = \nu(0 \cdot x)$   ($0 \cdot x = \vec 0$)
$= |0|\,\nu(x)$   ($\nu(\cdot)$ is homogeneous)
$= 0$   (algebra)
* BACK TO TEXT

Homework 2.7 The vector 1-norm is a norm.
Answer: We show that the three conditions are met. Let $x, y \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$ be arbitrarily chosen. Then
- $x \neq 0 \Rightarrow \|x\|_1 > 0$ ($\|\cdot\|_1$ is positive definite): Notice that $x \neq 0$ means that at least one of its components is nonzero. Let's assume that $\chi_j \neq 0$. Then
$\|x\|_1 = |\chi_0| + \cdots + |\chi_{n-1}| \geq |\chi_j| > 0.$
- $\|\alpha x\|_1 = |\alpha|\,\|x\|_1$ ($\|\cdot\|_1$ is homogeneous):
$\|\alpha x\|_1 = |\alpha\chi_0| + \cdots + |\alpha\chi_{n-1}| = |\alpha||\chi_0| + \cdots + |\alpha||\chi_{n-1}| = |\alpha|\left(|\chi_0| + \cdots + |\chi_{n-1}|\right) = |\alpha|\,\|x\|_1.$
- $\|x + y\|_1 \leq \|x\|_1 + \|y\|_1$ ($\|\cdot\|_1$ obeys the triangle inequality):
$\|x + y\|_1 = |\chi_0 + \psi_0| + |\chi_1 + \psi_1| + \cdots + |\chi_{n-1} + \psi_{n-1}| \leq |\chi_0| + |\psi_0| + |\chi_1| + |\psi_1| + \cdots + |\chi_{n-1}| + |\psi_{n-1}| = \|x\|_1 + \|y\|_1.$
* BACK TO TEXT

Homework 2.9 The vector $\infty$-norm is a norm.
Answer: We show that the three conditions are met. Let $x, y \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$ be arbitrarily chosen. Then
- $x \neq 0 \Rightarrow \|x\|_\infty > 0$ ($\|\cdot\|_\infty$ is positive definite): Notice that $x \neq 0$ means that at least one of its components is nonzero. Let's assume that $\chi_j \neq 0$. Then
$\|x\|_\infty = \max_i |\chi_i| \geq |\chi_j| > 0.$
- $\|\alpha x\|_\infty = |\alpha|\,\|x\|_\infty$ ($\|\cdot\|_\infty$ is homogeneous):
$\|\alpha x\|_\infty = \max_i |\alpha\chi_i| = \max_i |\alpha||\chi_i| = |\alpha|\max_i |\chi_i| = |\alpha|\,\|x\|_\infty.$
- $\|x + y\|_\infty \leq \|x\|_\infty + \|y\|_\infty$ ($\|\cdot\|_\infty$ obeys the triangle inequality):
$\|x + y\|_\infty = \max_i |\chi_i + \psi_i| \leq \max_i \left(|\chi_i| + |\psi_i|\right) \leq \max_i \left(|\chi_i| + \max_j |\psi_j|\right) = \max_i |\chi_i| + \max_j |\psi_j| = \|x\|_\infty + \|y\|_\infty.$
* BACK TO TEXT

Homework 2.13 Show that the Frobenius norm is a norm.
Answer: The answer is to realize that if $A = \left(\, a_0 \;\; a_1 \;\; \cdots \;\; a_{n-1} \,\right)$ then
$\|A\|_F = \sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} |\alpha_{i,j}|^2} = \sqrt{\sum_{j=0}^{n-1}\sum_{i=0}^{m-1} |\alpha_{i,j}|^2} = \sqrt{\sum_{j=0}^{n-1} \|a_j\|_2^2} = \left\| \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix} \right\|_2.$
In other words, it equals the vector 2-norm of the vector that is created by stacking the columns of A on top of each other. The fact that the Frobenius norm is a norm then comes from realizing this connection and exploiting it.
Alternatively, just grind through the three conditions!
* BACK TO TEXT

Homework 2.20 Let $A \in \mathbb{C}^{m \times n}$ and partition $A = \begin{pmatrix} \check a_0^T \\ \check a_1^T \\ \vdots \\ \check a_{m-1}^T \end{pmatrix}$. Show that
$\|A\|_\infty = \max_{0 \leq i < m} \|\check a_i\|_1 = \max_{0 \leq i < m} \left(|\alpha_{i,0}| + |\alpha_{i,1}| + \cdots + |\alpha_{i,n-1}|\right).$
Answer: First,
$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{\|x\|_\infty = 1} \max_i |\check a_i^T x| = \max_{\|x\|_\infty = 1} \max_i \left|\sum_{p=0}^{n-1} \alpha_{i,p}\chi_p\right| \leq \max_{\|x\|_\infty = 1} \max_i \sum_{p=0}^{n-1} |\alpha_{i,p}||\chi_p| \leq \max_{\|x\|_\infty = 1} \max_i \sum_{p=0}^{n-1} |\alpha_{i,p}|\,\|x\|_\infty = \max_i \sum_{p=0}^{n-1} |\alpha_{i,p}| = \max_i \|\check a_i\|_1,$
so that $\|A\|_\infty \leq \max_i \|\check a_i\|_1$.
We also want to show that $\|A\|_\infty \geq \max_i \|\check a_i\|_1$. Let $k$ be such that $\max_i \|\check a_i\|_1 = \|\check a_k\|_1$ and pick $y = \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix}$ so that $\check a_k^T y = |\alpha_{k,0}| + |\alpha_{k,1}| + \cdots + |\alpha_{k,n-1}| = \|\check a_k\|_1$. (This is a matter of picking $|\psi_i| = 1$ so that $\psi_i \alpha_{k,i} = |\alpha_{k,i}|$.) Then $\|y\|_\infty = 1$ and
$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty \geq \|Ay\|_\infty = \max_i |\check a_i^T y| \geq |\check a_k^T y| = \|\check a_k\|_1 = \max_i \|\check a_i\|_1.$
* BACK TO TEXT


Homework 2.21 Let $y \in \mathbb{C}^m$ and $x \in \mathbb{C}^n$. Show that $\|yx^H\|_2 = \|y\|_2\|x\|_2$.
Answer: W.l.o.g. assume that $x \neq 0$.
We know by the Hölder inequality that $|x^H z| \leq \|x\|_2\|z\|_2$. Hence
$\|yx^H\|_2 = \max_{\|z\|_2 = 1} \|yx^Hz\|_2 = \max_{\|z\|_2 = 1} |x^Hz|\,\|y\|_2 \leq \max_{\|z\|_2 = 1} \|x\|_2\|z\|_2\|y\|_2 = \|x\|_2\|y\|_2.$
But also
$\|yx^H\|_2 = \max_{z \neq 0} \|yx^Hz\|_2/\|z\|_2 \geq \|yx^Hx\|_2/\|x\|_2 = \|y\|_2\|x\|_2.$
Hence $\|yx^H\|_2 = \|y\|_2\|x\|_2$.
* BACK TO TEXT
Homework 2.25 Show that $\|Ax\|_\mu \leq \|A\|_{\mu,\nu}\|x\|_\nu$.
Answer: W.l.o.g. let $x \neq 0$.
$\|A\|_{\mu,\nu} = \max_{y \neq 0} \frac{\|Ay\|_\mu}{\|y\|_\nu} \geq \frac{\|Ax\|_\mu}{\|x\|_\nu}.$
Rearranging this establishes the result.
* BACK TO TEXT
Homework 2.26 Show that $\|AB\|_{\mu,\nu} \leq \|A\|_{\mu,\nu}\|B\|_\nu$ (where $\|B\|_\nu$ denotes the norm induced by the vector $\nu$-norm).
Answer:
$\|AB\|_{\mu,\nu} = \max_{\|x\|_\nu = 1} \|ABx\|_\mu \leq \max_{\|x\|_\nu = 1} \|A\|_{\mu,\nu}\|Bx\|_\nu = \|A\|_{\mu,\nu}\max_{\|x\|_\nu = 1} \|Bx\|_\nu = \|A\|_{\mu,\nu}\|B\|_\nu.$
* BACK TO TEXT
Homework 2.27 Show that the Frobenius norm, $\|\cdot\|_F$, is submultiplicative.
Answer: Partition $A = \begin{pmatrix} \check a_0^H \\ \check a_1^H \\ \vdots \\ \check a_{m-1}^H \end{pmatrix}$ and $B = \left(\, b_0 \;\; b_1 \;\; \cdots \;\; b_{n-1} \,\right)$. Then
$\|AB\|_F^2 = \left\| \begin{pmatrix} \check a_0^Hb_0 & \check a_0^Hb_1 & \cdots & \check a_0^Hb_{n-1} \\ \check a_1^Hb_0 & \check a_1^Hb_1 & \cdots & \check a_1^Hb_{n-1} \\ \vdots & \vdots & & \vdots \\ \check a_{m-1}^Hb_0 & \check a_{m-1}^Hb_1 & \cdots & \check a_{m-1}^Hb_{n-1} \end{pmatrix} \right\|_F^2 = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} |\check a_i^Hb_j|^2 \leq \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} \|\check a_i\|_2^2\|b_j\|_2^2$ (Cauchy–Schwarz)
$= \left(\sum_{i=0}^{m-1} \|\check a_i\|_2^2\right)\left(\sum_{j=0}^{n-1} \|b_j\|_2^2\right) = \|A\|_F^2\|B\|_F^2,$
so that $\|AB\|_F^2 \leq \|A\|_F^2\|B\|_F^2$. Taking the square root of both sides establishes the desired result.
* BACK TO TEXT
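A quick sanity check of submultiplicativity on random matrices (a NumPy sketch, outside the notes' MATLAB setting):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))

fro = lambda M: np.linalg.norm(M, 'fro')
# ||AB||_F <= ||A||_F ||B||_F
print(fro(A @ B) <= fro(A) * fro(B))  # True
```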

Homework 2.28 Let $\|\cdot\|$ be a matrix norm induced by the $\|\cdot\|$ vector norm. Show that $\kappa(A) = \|A\|\|A^{-1}\| \geq 1$.
Answer:
$\|I\| = \|AA^{-1}\| \leq \|A\|\|A^{-1}\|.$
But
$\|I\| = \max_{\|x\| = 1} \|Ix\| = \max_{\|x\| = 1} \|x\| = 1.$
Hence $1 \leq \|A\|\|A^{-1}\|$.
* BACK TO TEXT

Homework 2.29 A vector $x \in \mathbb{R}^2$ can be represented by the point to which it points when rooted at the origin. For example, the vector $x = \binom{2}{1}$ can be represented by the point (2, 1). With this in mind, plot
1. The points corresponding to the set $\{x \mid \|x\|_2 = 1\}$. [The unit circle.]
2. The points corresponding to the set $\{x \mid \|x\|_1 = 1\}$. [The diamond with vertices $(\pm 1, 0)$ and $(0, \pm 1)$.]
3. The points corresponding to the set $\{x \mid \|x\|_\infty = 1\}$. [The square with corners $(\pm 1, \pm 1)$.]
* BACK TO TEXT


Homework 2.30 Consider
$A = \begin{pmatrix} 1 & 2 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$
1. Compute the 1-norm of each column and pick the largest of these:
$\|A\|_1 = \max(|1| + |{-1}|,\; |2| + |1|,\; |{-1}| + |0|) = \max(2, 3, 1) = 3.$
2. Compute the 1-norm of each row and pick the largest of these:
$\|A\|_\infty = \max(|1| + |2| + |{-1}|,\; |{-1}| + |1| + |0|) = \max(4, 2) = 4.$
3. $\|A\|_F = \sqrt{1^2 + 2^2 + (-1)^2 + (-1)^2 + 1^2 + 0^2} = \sqrt{8} = 2\sqrt 2.$
* BACK TO TEXT
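These three values are easy to confirm with NumPy (an illustrative check, not part of the notes' MATLAB exercises):

```python
import numpy as np

A = np.array([[ 1.0, 2.0, -1.0],
              [-1.0, 1.0,  0.0]])

print(np.linalg.norm(A, 1))       # 3.0        (largest column 1-norm)
print(np.linalg.norm(A, np.inf))  # 4.0        (largest row 1-norm)
print(np.linalg.norm(A, 'fro'))   # 2.8284...  (= 2 sqrt(2))
```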

Homework 2.31 Show that for all $x \in \mathbb{C}^n$
1. $\|x\|_2 \leq \|x\|_1 \leq \sqrt n\,\|x\|_2$.
Answer:
$\|x\|_2 \leq \|x\|_1$: We will instead prove that $\|x\|_2^2 \leq \|x\|_1^2$:
$\|x\|_2^2 = \sum_i |\chi_i|^2 \leq \left(\sum_i |\chi_i|\right)^2 = \|x\|_1^2.$
Hence $\|x\|_2 \leq \|x\|_1$.
$\|x\|_1 \leq \sqrt n\,\|x\|_2$: This requires the following trick: Let $e$ be the vector of size $n$ with only ones as its entries and $|x|$ be the vector $x$ except with its elements replaced by their absolute values. Notice that $\|x\|_1 = |x|^Te$ and $\|\,|x|\,\|_2 = \|x\|_2$. Now
$\|x\|_1 = |x|^Te \leq \|\,|x|\,\|_2\|e\|_2 = \sqrt n\,\|x\|_2.$
Here, the inequality comes from applying the Cauchy–Schwarz inequality.
2. $\|x\|_\infty \leq \|x\|_1 \leq n\|x\|_\infty$.
Answer:
$\|x\|_\infty \leq \|x\|_1$: Let $k$ be the index such that $|\chi_k| = \|x\|_\infty$. Then
$\|x\|_\infty = |\chi_k| \leq \sum_i |\chi_i| = \|x\|_1.$
$\|x\|_1 \leq n\|x\|_\infty$: $\|x\|_1 = \sum_i |\chi_i| \leq \sum_i |\chi_k| = n|\chi_k| = n\|x\|_\infty.$
3. $(1/\sqrt n)\|x\|_2 \leq \|x\|_\infty \leq \sqrt n\,\|x\|_2$.
Answer:
$(1/\sqrt n)\|x\|_2 \leq \|x\|_\infty$: Let $k$ be the index such that $|\chi_k| = \|x\|_\infty$. Then
$\|x\|_\infty = |\chi_k| = \sqrt{|\chi_k|^2} = \sqrt{\frac{n|\chi_k|^2}{n}} \geq \sqrt{\frac{\sum_i |\chi_i|^2}{n}} = \frac{\|x\|_2}{\sqrt n}.$
$\|x\|_\infty \leq \sqrt n\,\|x\|_2$: This follows from the fact that $\|x\|_\infty \leq \|x\|_1 \leq \sqrt n\,\|x\|_2$.
* BACK TO TEXT
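All three chains of inequalities can be spot-checked on a random complex vector (a NumPy sketch with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

one = np.linalg.norm(x, 1)
two = np.linalg.norm(x, 2)
inf = np.linalg.norm(x, np.inf)

assert two <= one <= np.sqrt(n) * two                 # part 1
assert inf <= one <= n * inf                          # part 2
assert two / np.sqrt(n) <= inf <= np.sqrt(n) * two    # part 3
```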

Homework 2.32 (I need to double check that this is true!)
Prove that if for all $x$, $\|x\|_\nu \leq \|x\|_\mu$, then $\|A\|_\mu \leq \|A\|_{\mu,\nu}$.
Answer:
$\|A\|_\mu = \max_{\|x\|_\mu = 1} \|Ax\|_\mu \leq \max_{\|x\|_\nu \leq 1} \|Ax\|_\mu = \max_{\|x\|_\nu = 1} \|Ax\|_\mu = \|A\|_{\mu,\nu},$
since the set $\{x : \|x\|_\mu = 1\}$ is contained in $\{x : \|x\|_\nu \leq 1\}$.
* BACK TO TEXT

Homework 2.33 Partition
$A = \left(\, a_0 \;\; a_1 \;\; \cdots \;\; a_{n-1} \,\right) = \begin{pmatrix} \check a_0^T \\ \check a_1^T \\ \vdots \\ \check a_{m-1}^T \end{pmatrix}.$
Prove that
1. $\|A\|_F = \sqrt{\|a_0\|_2^2 + \|a_1\|_2^2 + \cdots + \|a_{n-1}\|_2^2}$.
2. $\|A\|_F = \|A^T\|_F$.
3. $\|A\|_F = \sqrt{\|\check a_0\|_2^2 + \|\check a_1\|_2^2 + \cdots + \|\check a_{m-1}\|_2^2}$.
(Note: $\check a_i = (\check a_i^T)^T$.)
* BACK TO TEXT

Homework 2.34 For $e_j \in \mathbb{R}^n$ (a standard basis vector), compute
$\|e_j\|_2 = 1$
$\|e_j\|_1 = 1$
$\|e_j\|_\infty = 1$
$\|e_j\|_p = 1$
* BACK TO TEXT

Homework 2.35 For $I \in \mathbb{R}^{n \times n}$ (the identity matrix), compute
$\|I\|_F = \sqrt n$
$\|I\|_1 = 1$
$\|I\|_\infty = 1$
* BACK TO TEXT

Homework 2.36 Let $\|\cdot\|$ be a vector norm defined for a vector of any size (arbitrary $n$). Let $\|\cdot\|$ be the induced matrix norm. Prove that $\|I\| = 1$ (where $I$ equals the identity matrix).
Answer:
$\|I\| = \max_{\|x\| = 1} \|Ix\| = \max_{\|x\| = 1} \|x\| = 1.$
Conclude that $\|I\|_p = 1$ for any $p$-norm.
* BACK TO TEXT

Homework 2.37 Let $D = \begin{pmatrix} \delta_0 & & & \\ & \delta_1 & & \\ & & \ddots & \\ & & & \delta_{n-1} \end{pmatrix}$ (a diagonal matrix). Compute
$\|D\|_1 = \max_i |\delta_i|$
$\|D\|_\infty = \max_i |\delta_i|$
* BACK TO TEXT


Homework 2.38 Let $D = \mathrm{diag}(\delta_0, \delta_1, \ldots, \delta_{n-1})$ (a diagonal matrix). Then
$\|D\|_p = \max_i |\delta_i|.$
Prove your answer.
Answer:
$\|D\|_p = \max_{\|x\|_p = 1} \|Dx\|_p = \max_{\|x\|_p = 1} \sqrt[p]{|\delta_0\chi_0|^p + \cdots + |\delta_{n-1}\chi_{n-1}|^p} \leq \max_{\|x\|_p = 1} \sqrt[p]{\max_k |\delta_k|^p\,|\chi_0|^p + \cdots + \max_k |\delta_k|^p\,|\chi_{n-1}|^p} = \max_k |\delta_k| \max_{\|x\|_p = 1} \sqrt[p]{|\chi_0|^p + \cdots + |\chi_{n-1}|^p} = \max_k |\delta_k| \max_{\|x\|_p = 1} \|x\|_p = \max_k |\delta_k|.$
Also,
$\|D\|_p = \max_{\|x\|_p = 1} \|Dx\|_p \geq \|De_J\|_p = \|\delta_Je_J\|_p = |\delta_J|\,\|e_J\|_p = |\delta_J| = \max_k |\delta_k|,$
where $J$ is chosen to be the index so that $|\delta_J| = \max_k |\delta_k|$. Thus
$\max_k |\delta_k| \leq \|D\|_p \leq \max_k |\delta_k|,$
from which we conclude that $\|D\|_p = \max_k |\delta_k|$.
* BACK TO TEXT
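The result (together with Homework 2.37 for $p = 1, \infty$) can be checked numerically; NumPy's matrix norms for a diagonal matrix should all return $\max_i |\delta_i|$ (an illustrative sketch):

```python
import numpy as np

d = np.array([0.5, -3.0, 2.0, -0.25])
D = np.diag(d)

# For a diagonal matrix the induced 1-, 2-, and infinity-norms all equal max |delta_i|.
for p in (1, 2, np.inf):
    assert np.isclose(np.linalg.norm(D, p), np.max(np.abs(d)))
print(np.max(np.abs(d)))  # 3.0
```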
Homework 2.39 Let $y \in \mathbb{C}^n$. Show that $\|y^H\|_2 = \|y\|_2$.
Answer:
$\|y^H\|_2 = \max_{\|x\|_2 = 1} \|y^Hx\|_2 = \max_{\|x\|_2 = 1} |y^Hx| \leq \max_{\|x\|_2 = 1} \|y\|_2\|x\|_2 = \|y\|_2 \max_{\|x\|_2 = 1} \|x\|_2 = \|y\|_2.$
Also,
$\|y^H\|_2 = \max_{x \neq 0} \frac{\|y^Hx\|_2}{\|x\|_2} \geq \frac{\|y^Hy\|_2}{\|y\|_2} = \frac{\|y\|_2^2}{\|y\|_2} = \|y\|_2.$
Hence $\|y^H\|_2 = \|y\|_2$.
* BACK TO TEXT


Chapter 3. Notes on Orthogonality and the SVD (Answers)




Homework 3.4 Let $Q \in \mathbb{C}^{m \times n}$ (with $n \leq m$). Partition $Q = \left(\, q_0 \;\; q_1 \;\; \cdots \;\; q_{n-1} \,\right)$. Show that $Q$ is an orthonormal matrix if and only if $q_0, q_1, \ldots, q_{n-1}$ are mutually orthonormal.
Answer: Let $Q \in \mathbb{C}^{m \times n}$ (with $n \leq m$). Partition $Q = \left(\, q_0 \;\; q_1 \;\; \cdots \;\; q_{n-1} \,\right)$. Then
$Q^HQ = \begin{pmatrix} q_0^H \\ q_1^H \\ \vdots \\ q_{n-1}^H \end{pmatrix}\left(\, q_0 \;\; q_1 \;\; \cdots \;\; q_{n-1} \,\right) = \begin{pmatrix} q_0^Hq_0 & q_0^Hq_1 & \cdots & q_0^Hq_{n-1} \\ q_1^Hq_0 & q_1^Hq_1 & \cdots & q_1^Hq_{n-1} \\ \vdots & \vdots & & \vdots \\ q_{n-1}^Hq_0 & q_{n-1}^Hq_1 & \cdots & q_{n-1}^Hq_{n-1} \end{pmatrix}.$
Now, $Q^HQ = I$ means that this matrix equals the identity: $q_i^Hq_j = 1$ if $i = j$ and $q_i^Hq_j = 0$ otherwise. Clearly $Q$ is orthonormal if and only if $q_0, q_1, \ldots, q_{n-1}$ are mutually orthonormal.
* BACK TO TEXT
Homework 3.6 Let $Q \in \mathbb{C}^{m \times m}$. Show that if $Q$ is unitary then $Q^{-1} = Q^H$ and $QQ^H = I$.
Answer: If $Q$ is unitary, then $Q^HQ = I$. If $A, B \in \mathbb{C}^{m \times m}$, then the matrix $B$ such that $BA = I$ is the inverse of $A$. Hence $Q^{-1} = Q^H$. Also, if $BA = I$ then $AB = I$, and hence $QQ^H = I$.
* BACK TO TEXT
Homework 3.7 Let $Q_0, Q_1 \in \mathbb{C}^{m \times m}$ both be unitary. Show that their product, $Q_0Q_1$, is unitary.
Answer: Obviously, $Q_0Q_1$ is a square matrix.
$(Q_0Q_1)^H(Q_0Q_1) = Q_1^H\underbrace{Q_0^HQ_0}_{I}Q_1 = \underbrace{Q_1^HQ_1}_{I} = I.$
Hence $Q_0Q_1$ is unitary.
* BACK TO TEXT

Homework 3.8 Let $Q_0, Q_1, \ldots, Q_{k-1} \in \mathbb{C}^{m \times m}$ all be unitary. Show that their product, $Q_0Q_1\cdots Q_{k-1}$, is unitary.
Answer: Strictly speaking, we should do a proof by induction. But instead we will make the more informal argument that
$(Q_0Q_1\cdots Q_{k-1})^H(Q_0Q_1\cdots Q_{k-1}) = Q_{k-1}^H\cdots Q_1^H\underbrace{Q_0^HQ_0}_{I}Q_1\cdots Q_{k-1} = Q_{k-1}^H\cdots \underbrace{Q_1^HQ_1}_{I}\cdots Q_{k-1} = \cdots = I.$
* BACK TO TEXT

Homework 3.9 Let $U \in \mathbb{C}^{m \times m}$ be unitary and $x \in \mathbb{C}^m$, then $\|Ux\|_2 = \|x\|_2$.
Answer: $\|Ux\|_2^2 = (Ux)^H(Ux) = x^H\underbrace{U^HU}_{I}x = x^Hx = \|x\|_2^2$. Hence $\|Ux\|_2 = \|x\|_2$.
* BACK TO TEXT

Homework 3.10 Let $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ be unitary matrices and $A \in \mathbb{C}^{m \times n}$. Then
$\|UA\|_2 = \|AV\|_2 = \|A\|_2.$
Answer:
$\|UA\|_2 = \max_{\|x\|_2 = 1} \|UAx\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \|A\|_2.$
$\|AV\|_2 = \max_{\|x\|_2 = 1} \|AVx\|_2 = \max_{\|Vx\|_2 = 1} \|A(Vx)\|_2 = \max_{\|y\|_2 = 1} \|Ay\|_2 = \|A\|_2.$
* BACK TO TEXT
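The invariance is easy to observe numerically with random orthogonal (real unitary) matrices obtained from QR factorizations (a NumPy sketch; the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4))
# Random orthogonal matrices: Q factors of random square matrices.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))

assert np.isclose(np.linalg.norm(U @ A, 2), np.linalg.norm(A, 2))
assert np.isclose(np.linalg.norm(A @ V, 2), np.linalg.norm(A, 2))
```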

Homework 3.11 Let $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ be unitary matrices and $A \in \mathbb{C}^{m \times n}$. Then
$\|UA\|_F = \|AV\|_F = \|A\|_F.$
Answer:
Partition $A = \left(\, a_0 \;\; a_1 \;\; \cdots \;\; a_{n-1} \,\right)$. Then it is easy to show that $\|A\|_F^2 = \sum_{j=0}^{n-1} \|a_j\|_2^2$. Thus
$\|UA\|_F^2 = \|\left(\, Ua_0 \;\; Ua_1 \;\; \cdots \;\; Ua_{n-1} \,\right)\|_F^2 = \sum_{j=0}^{n-1} \|Ua_j\|_2^2 = \sum_{j=0}^{n-1} \|a_j\|_2^2 = \|A\|_F^2.$
Hence $\|UA\|_F = \|A\|_F$.
Partition $A = \begin{pmatrix} \check a_0^T \\ \check a_1^T \\ \vdots \\ \check a_{m-1}^T \end{pmatrix}$. Then it is easy to show that $\|A\|_F^2 = \sum_{i=0}^{m-1} \|(\check a_i^T)^H\|_2^2$. Thus
$\|AV\|_F^2 = \sum_{i=0}^{m-1} \|(\check a_i^TV)^H\|_2^2 = \sum_{i=0}^{m-1} \|V^H(\check a_i^T)^H\|_2^2 = \sum_{i=0}^{m-1} \|(\check a_i^T)^H\|_2^2 = \|A\|_F^2.$
Hence $\|AV\|_F = \|A\|_F$.
* BACK TO TEXT
Homework 3.13 Let $D = \mathrm{diag}(\delta_0, \ldots, \delta_{n-1})$. Show that $\|D\|_2 = \max_{i=0}^{n-1} |\delta_i|$.
Answer:
$\|D\|_2^2 = \max_{\|x\|_2 = 1} \|Dx\|_2^2 = \max_{\|x\|_2 = 1} \sum_{j=0}^{n-1} |\delta_j\chi_j|^2 = \max_{\|x\|_2 = 1} \sum_{j=0}^{n-1} |\delta_j|^2|\chi_j|^2 \leq \max_{\|x\|_2 = 1} \sum_{j=0}^{n-1} \left(\max_i |\delta_i|^2\right)|\chi_j|^2 = \left(\max_i |\delta_i|\right)^2 \max_{\|x\|_2 = 1} \|x\|_2^2 = \left(\max_i |\delta_i|\right)^2,$
so that $\|D\|_2 \leq \max_{i=0}^{n-1} |\delta_i|$.
Also, choose $j$ so that $|\delta_j| = \max_{i=0}^{n-1} |\delta_i|$. Then
$\|D\|_2 = \max_{\|x\|_2 = 1} \|Dx\|_2 \geq \|De_j\|_2 = \|\delta_je_j\|_2 = |\delta_j|\,\|e_j\|_2 = |\delta_j| = \max_{i=0}^{n-1} |\delta_i|,$
so that $\max_{i=0}^{n-1} |\delta_i| \leq \|D\|_2 \leq \max_{i=0}^{n-1} |\delta_i|$, which implies that $\|D\|_2 = \max_{i=0}^{n-1} |\delta_i|$.
* BACK TO TEXT

Homework 3.14 Let $A = \begin{pmatrix} A_T \\ 0 \end{pmatrix}$. Use the SVD of $A_T$ to show that $\|A\|_2 = \|A_T\|_2$.
Answer: Let $A_T = U_T\Sigma_TV_T^H$ be the SVD of $A_T$. Then
$A = \begin{pmatrix} A_T \\ 0 \end{pmatrix} = \begin{pmatrix} U_T\Sigma_TV_T^H \\ 0 \end{pmatrix} = \begin{pmatrix} U_T & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} \Sigma_T \\ 0 \end{pmatrix}V_T^H,$
which is the SVD of $A$. As a result, clearly the largest singular value of $A_T$ equals the largest singular value of $A$, and hence $\|A\|_2 = \|A_T\|_2$.
* BACK TO TEXT

Homework 3.15 Assume that $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary matrices. Let $A, B \in \mathbb{C}^{m \times n}$ with $B = UAV^H$. Show that the singular values of $A$ equal the singular values of $B$.
Answer: Let $A = U_A\Sigma_AV_A^H$ be the SVD of $A$. Then $B = UU_A\Sigma_AV_A^HV^H = (UU_A)\Sigma_A(VV_A)^H$, where both $UU_A$ and $VV_A$ are unitary. This gives us the SVD for $B$, and it shows that the singular values of $B$ equal the singular values of $A$.
* BACK TO TEXT

Homework 3.16 Let $A \in \mathbb{C}^{m \times n}$ with $A = \begin{pmatrix} \sigma_0 & 0 \\ 0 & B \end{pmatrix}$ and assume that $\|A\|_2 = \sigma_0$. Show that $\|B\|_2 \leq \|A\|_2$. (Hint: Use the SVD of $B$.)
Answer: Let $B = U_B\Sigma_BV_B^H$ be the SVD of $B$. Then
$A = \begin{pmatrix} \sigma_0 & 0 \\ 0 & U_B\Sigma_BV_B^H \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & U_B \end{pmatrix}\begin{pmatrix} \sigma_0 & 0 \\ 0 & \Sigma_B \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & V_B \end{pmatrix}^H,$
which shows the relationship between the SVD of $A$ and $B$. Since $\|A\|_2 = \sigma_0$, it must be the case that the diagonal entries of $\Sigma_B$ are less than or equal to $\sigma_0$ in magnitude, which means that $\|B\|_2 \leq \|A\|_2$.
* BACK TO TEXT

Homework 3.17 Prove Lemma 3.12 for $m \leq n$.
Answer: You can use the following as an outline for your proof:
Proof: First, let us observe that if $A = 0$ (the zero matrix) then the theorem trivially holds: $A = UDV^H$ where $U = I_{m \times m}$, $V = I_{n \times n}$, and $D = 0$, so that $D_{TL}$ is $0 \times 0$. Thus, w.l.o.g. assume that $A \neq 0$.
We will employ a proof by induction on $m$.
Base case: $m = 1$. In this case $A = \check a_0^T$ where $\check a_0^T \in \mathbb{R}^{1 \times n}$ is its only row. By assumption, $\check a_0^T \neq 0$. Let $v_0 = (\check a_0^T)^H/\|\check a_0^T\|_2$ and choose $V_1 \in \mathbb{C}^{n \times (n-1)}$ so that $V = \left(\, v_0 \;\; V_1 \,\right)$ is unitary. Then
$A = \check a_0^T = \underbrace{\left(1\right)}_{U}\underbrace{\left(\, \|\check a_0^T\|_2 \;\; 0 \,\right)}_{D}\underbrace{\left(\, v_0 \;\; V_1 \,\right)^H}_{V^H} = UDV^H,$
where $D_{TL} = \left(\|\check a_0^T\|_2\right)$ and $U = \left(1\right)$.
Inductive step: Similarly modify the inductive step of the proof of the theorem.
By the Principle of Mathematical Induction the result holds for all matrices $A \in \mathbb{C}^{m \times n}$ with $m \leq n$.
* BACK TO TEXT

Homework 3.35 Show that if $A \in \mathbb{C}^{m \times m}$ is nonsingular, then
$\|A\|_2 = \sigma_0$, the largest singular value;
$\|A^{-1}\|_2 = 1/\sigma_{m-1}$, the inverse of the smallest singular value; and
$\kappa_2(A) = \sigma_0/\sigma_{m-1}$.
Answer: Let $A = U\Sigma V^H$ be the SVD of $A$. By the unitary invariance of the 2-norm and Homework 3.13, $\|A\|_2 = \|\Sigma\|_2 = \sigma_0$. Since $A$ is nonsingular, $\sigma_{m-1} > 0$ and $A^{-1} = V\Sigma^{-1}U^H$, which is (up to reordering the diagonal entries $1/\sigma_0 \leq \cdots \leq 1/\sigma_{m-1}$) the SVD of $A^{-1}$. Hence $\|A^{-1}\|_2 = 1/\sigma_{m-1}$, and $\kappa_2(A) = \|A\|_2\|A^{-1}\|_2 = \sigma_0/\sigma_{m-1}$.
* BACK TO TEXT

Homework 3.36 Let $A \in \mathbb{C}^{n \times n}$ have singular values $\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{n-1}$ (where $\sigma_{n-1}$ may equal zero). Prove that $\sigma_{n-1} \leq \|Ax\|_2/\|x\|_2 \leq \sigma_0$ (assuming $x \neq 0$).
Answer:
$\frac{\|Ax\|_2}{\|x\|_2} \leq \max_{y \neq 0} \frac{\|Ay\|_2}{\|y\|_2} = \sigma_0$
(for example, as a consequence of how $\sigma_0$ shows up in the derivation of the SVD). Also,
$\frac{\|Ax\|_2}{\|x\|_2} \geq \min_{y \neq 0} \frac{\|Ay\|_2}{\|y\|_2} = \min_{y \neq 0} \frac{\|U\Sigma V^Hy\|_2}{\|y\|_2} = \min_{y \neq 0} \frac{\|\Sigma V^Hy\|_2}{\|V^Hy\|_2} = \min_{z \neq 0} \frac{\|\Sigma z\|_2}{\|z\|_2} = \sigma_{n-1}$
(through a simple argument about the smallest element (in magnitude) on the diagonal of a diagonal matrix).
* BACK TO TEXT

Homework 3.37 Let $D \in \mathbb{R}^{n \times n}$ be a diagonal matrix with entries $\delta_0, \delta_1, \ldots, \delta_{n-1}$ on its diagonal. Show that
1. $\max_{x \neq 0} \|Dx\|_2/\|x\|_2 = \max_{0 \leq i < n} |\delta_i|$. Answer: It is usually simpler to square both sides: $\max_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 = \max_{0 \leq i < n} \delta_i^2$:
$\max_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 = \max_{\|x\|_2 = 1} \|Dx\|_2^2 = \max_{\|x\|_2 = 1} \left(\delta_0^2\chi_0^2 + \cdots + \delta_{n-1}^2\chi_{n-1}^2\right) \leq \max_{\|x\|_2 = 1} \left(\max_{0 \leq i < n} \delta_i^2\,\chi_0^2 + \cdots + \max_{0 \leq i < n} \delta_i^2\,\chi_{n-1}^2\right) = \max_{0 \leq i < n} \delta_i^2 \max_{\|x\|_2 = 1} \|x\|_2^2 = \max_{0 \leq i < n} \delta_i^2.$
Also, letting $k$ be the index such that $\delta_k^2 = \max_{0 \leq i < n} \delta_i^2$,
$\max_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 \geq \|De_k\|_2^2/\|e_k\|_2^2 = \|De_k\|_2^2 = e_k^TD^2e_k = \delta_k^2 = \max_{0 \leq i < n} \delta_i^2.$
2. $\min_{x \neq 0} \|Dx\|_2/\|x\|_2 = \min_{0 \leq i < n} |\delta_i|$. Answer: Again, it is usually simpler to square both sides: $\min_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 = \min_{0 \leq i < n} \delta_i^2$:
$\min_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 = \min_{\|x\|_2 = 1} \|Dx\|_2^2 = \min_{\|x\|_2 = 1} \left(\delta_0^2\chi_0^2 + \cdots + \delta_{n-1}^2\chi_{n-1}^2\right) \geq \min_{\|x\|_2 = 1} \left(\min_{0 \leq i < n} \delta_i^2\,\chi_0^2 + \cdots + \min_{0 \leq i < n} \delta_i^2\,\chi_{n-1}^2\right) = \min_{0 \leq i < n} \delta_i^2 \min_{\|x\|_2 = 1} \|x\|_2^2 = \min_{0 \leq i < n} \delta_i^2.$
Also, letting $k$ be the index such that $\delta_k^2 = \min_{0 \leq i < n} \delta_i^2$,
$\min_{x \neq 0} \|Dx\|_2^2/\|x\|_2^2 \leq \|De_k\|_2^2/\|e_k\|_2^2 = \|De_k\|_2^2 = e_k^TD^2e_k = \delta_k^2 = \min_{0 \leq i < n} \delta_i^2.$
* BACK TO TEXT

Homework 3.38 Let $A \in \mathbb{C}^{n \times n}$ have the property that for all vectors $x \in \mathbb{C}^n$ it holds that $\|Ax\|_2 = \|x\|_2$. Use the SVD to prove that $A$ is unitary.
Answer: Let $A = U\Sigma V^H$ be the singular value decomposition. It suffices to show that $\Sigma = I$ or, equivalently, that $\sigma_0 = \cdots = \sigma_{n-1} = 1$. This then means $A$ is the result of multiplying three unitary matrices, and hence unitary itself.
Answer 1: Partition
$U = \left(\, u_0 \;\; \cdots \;\; u_{n-1} \,\right)$ and $V = \left(\, v_0 \;\; \cdots \;\; v_{n-1} \,\right).$
We will consider $\|Ax\|_2^2 = \|x\|_2^2$. Then
$\|x\|_2^2 = \|U\Sigma V^Hx\|_2^2 = x^HV\Sigma U^HU\Sigma V^Hx = x^HV\Sigma^2V^Hx.$
Recall that, by convention, $\Sigma$ has nonnegative real values on its diagonal. Now, pick $x = v_i$. Then
$1 = \|v_i\|_2^2 = v_i^HV\Sigma^2V^Hv_i = e_i^H\Sigma^2e_i = \sigma_i^2.$
Hence $\sigma_i = 1$.
Answer 2: $\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{n-1}$, and
$\sigma_0 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = 1, \qquad \sigma_{n-1} = \min_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = 1.$
Hence $\sigma_i = 1$ for $0 \leq i < n$.
Answer 3: (Does not use the SVD) $\|Ax\|_2 = \|x\|_2$ is equivalent to $\|Ax\|_2^2 = \|x\|_2^2$ and to $x^HA^HAx = x^Hx$. Let $a_i$ denote the $i$th column of $A$. Pick $x = e_i$. Then we see that $e_i^HA^HAe_i = (Ae_i)^HAe_i = a_i^Ha_i = 1$, and hence the columns of $A$ are all of length one. Now, consider $x = e_i + e_j$ where $i \neq j$:
$2 = \|e_i + e_j\|_2^2 = (e_i + e_j)^HA^HA(e_i + e_j) = e_i^HA^HAe_i + e_i^HA^HAe_j + e_j^HA^HAe_i + e_j^HA^HAe_j = 2 + 2\,\mathrm{Re}(a_i^Ha_j).$
This means that $\mathrm{Re}(a_i^Ha_j) = 0$; repeating the argument with $x = e_i + \mathrm{i}\,e_j$ shows that $\mathrm{Im}(a_i^Ha_j) = 0$ as well. Hence the columns of $A$ are mutually orthonormal and $A$ is unitary.
* BACK TO TEXT

Homework 3.39 Use the SVD to prove that $\|A\|_2 = \|A^T\|_2$.
Answer: Let $A = U\Sigma V^H$ be the SVD of $A$. Then
$\|A^T\|_2 = \|(U\Sigma V^H)^T\|_2 = \|\overline V\Sigma U^T\|_2 = \|\Sigma\|_2 = \sigma_0 = \|A\|_2,$
where the middle equality holds by the unitary invariance of the 2-norm ($\overline V$ and $U^T$ are unitary).
* BACK TO TEXT



Homework 3.40 Compute the SVD of $\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}$.
Answer: The trick is to recognize that the columns are already mutually orthogonal, so you only need to normalize each of the vectors:
$\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} = \underbrace{\frac{1}{\sqrt 5}\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}}_{U}\underbrace{\begin{pmatrix} \sqrt 5 & 0 \\ 0 & \sqrt 5 \end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{V^H}.$
* BACK TO TEXT


Chapter 4. Notes on Gram-Schmidt QR Factorization (Answers)


Homework 4.1 What happens in the Gram-Schmidt algorithm if the columns of A are NOT linearly independent? Answer: If $a_j$ is the first column such that $\{a_0, \ldots, a_j\}$ are linearly dependent, then $a_j^\perp$ (the component of $a_j$ orthogonal to the previously computed columns of $Q$) will equal the zero vector and the process breaks down.
How might one fix this? Answer: When a vector $a_j^\perp = 0$ is encountered, the columns can be rearranged so that that column (or those columns) come last.
How can the Gram-Schmidt algorithm be used to identify which columns of A are linearly independent? Answer: Again, if $a_j^\perp = 0$ for some $j$, then the columns are linearly dependent.
* BACK TO TEXT

Homework 4.2 Convince yourself that the relation between the vectors $\{a_j\}$ and $\{q_j\}$ in the algorithms in Figure 4.2 is given by
$\left(\, a_0 \;\; a_1 \;\; \cdots \;\; a_{n-1} \,\right) = \left(\, q_0 \;\; q_1 \;\; \cdots \;\; q_{n-1} \,\right)\begin{pmatrix} \rho_{0,0} & \rho_{0,1} & \cdots & \rho_{0,n-1} \\ 0 & \rho_{1,1} & \cdots & \rho_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{n-1,n-1} \end{pmatrix},$
where
$q_i^Hq_j = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{otherwise} \end{cases}$ and $\rho_{i,j} = \begin{cases} q_i^Ha_j & \text{for } i < j \\ \left\|a_j - \sum_{i=0}^{j-1}\rho_{i,j}q_i\right\|_2 & \text{for } i = j \\ 0 & \text{otherwise.} \end{cases}$
Answer: Just watch the video for this lecture!
* BACK TO TEXT

Homework 4.5 Let A have linearly independent columns and let $A = QR$ be a QR factorization of A. Partition
$A \rightarrow \left(\, A_L \;\; A_R \,\right), \quad Q \rightarrow \left(\, Q_L \;\; Q_R \,\right), \quad \text{and} \quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix},$
where $A_L$ and $Q_L$ have $k$ columns and $R_{TL}$ is $k \times k$. Show that
1. $A_L = Q_LR_{TL}$: $Q_LR_{TL}$ equals the QR factorization of $A_L$,
2. $\mathcal{C}(A_L) = \mathcal{C}(Q_L)$: the first $k$ columns of Q form an orthonormal basis for the space spanned by the first $k$ columns of A,
3. $R_{TR} = Q_L^HA_R$,
4. $(A_R - Q_LR_{TR})^HQ_L = 0$,
5. $A_R - Q_LR_{TR} = Q_RR_{BR}$, and
6. $\mathcal{C}(A_R - Q_LR_{TR}) = \mathcal{C}(Q_R)$.
Answer: Consider the fact that $A = QR$. Then
$\left(\, A_L \;\; A_R \,\right) = \left(\, Q_L \;\; Q_R \,\right)\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} = \left(\, Q_LR_{TL} \;\;\; Q_LR_{TR} + Q_RR_{BR} \,\right).$
Hence
$A_L = Q_LR_{TL} \quad \text{and} \quad A_R = Q_LR_{TR} + Q_RR_{BR}.$
The left equation answers 1.
Rearranging the right equation yields $A_R - Q_LR_{TR} = Q_RR_{BR}$, which answers 5.
$\mathcal{C}(A_L) = \mathcal{C}(Q_L)$ can be shown by showing that $\mathcal{C}(A_L) \subset \mathcal{C}(Q_L)$ and $\mathcal{C}(Q_L) \subset \mathcal{C}(A_L)$:
- $\mathcal{C}(A_L) \subset \mathcal{C}(Q_L)$: Let $y \in \mathcal{C}(A_L)$. Then there exists $x$ such that $A_Lx = y$. But then $Q_LR_{TL}x = y$ and hence $Q_L(R_{TL}x) = y$, which means that $y \in \mathcal{C}(Q_L)$.
- $\mathcal{C}(Q_L) \subset \mathcal{C}(A_L)$: Let $y \in \mathcal{C}(Q_L)$. Then there exists $x$ such that $Q_Lx = y$. But then $A_LR_{TL}^{-1}x = y$ and hence $A_L(R_{TL}^{-1}x) = y$, which means that $y \in \mathcal{C}(A_L)$. (Notice that $R_{TL}$ is nonsingular because it is a triangular matrix that has only nonzeroes on its diagonal.)
This answers 2.
Take $A_R - Q_LR_{TR} = Q_RR_{BR}$ and multiply both sides by $Q_L^H$:
$Q_L^H(A_R - Q_LR_{TR}) = Q_L^HQ_RR_{BR}$
is equivalent to
$Q_L^HA_R - \underbrace{Q_L^HQ_L}_{I}R_{TR} = \underbrace{Q_L^HQ_R}_{0}R_{BR} = 0.$
Rearranging yields 3.
Since $A_R - Q_LR_{TR} = Q_RR_{BR}$ we find that $(A_R - Q_LR_{TR})^HQ_L = (Q_RR_{BR})^HQ_L$ and
$(A_R - Q_LR_{TR})^HQ_L = R_{BR}^HQ_R^HQ_L = 0,$
which answers 4.
The proof of 6. follows similarly to the proof of 2.
* BACK TO TEXT

Homework 4.6 Implement GS_unb_var1 using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME. Try it!)
Answer: Implementations can be found in Programming/chapter04_answers.
* BACK TO TEXT

Homework 4.7 Implement MGS_unb_var1 using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B. (I'm not sure if you can visualize the algorithm with PictureFLAME. Try it!)
Answer: Implementations can be found in Programming/chapter04_answers.
* BACK TO TEXT


Chapter 6. Notes on Householder QR Factorization (Answers)

Homework 6.2 Show that if H is a reflector, then
- $HH = I$ (reflecting the reflection of a vector results in the original vector).
Answer:
$(I - 2uu^H)(I - 2uu^H) = I - 2uu^H - 2uu^H + 4u\underbrace{u^Hu}_{1}u^H = I - 4uu^H + 4uu^H = I.$
- $H = H^H$.
Answer:
$(I - 2uu^H)^H = I - 2(u^H)^Hu^H = I - 2uu^H.$
- $H^HH = I$ (a reflector is unitary).
Answer:
$H^HH = HH = I.$
* BACK TO TEXT

Homework 6.4 Show that if $x \in \mathbb{R}^n$, $v = x - \|x\|_2e_0$, and $\tau = v^Tv/2$, then $(I - \frac{1}{\tau}vv^T)x = \|x\|_2e_0$.
Answer: This is surprisingly messy...
* BACK TO TEXT

Homework 6.5 Verify that
$\left(I - \frac{1}{\tau}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H\right)\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \rho \\ 0 \end{pmatrix},$
where $\tau = u^Hu/2 = (1 + u_2^Hu_2)/2$ and $\rho = \mp\|x\|_2$.
Hint: $\rho\bar\rho = |\rho|^2 = \|x\|_2^2$ since H preserves the norm. Also, $\|x\|_2^2 = |\chi_1|^2 + \|x_2\|_2^2$ and $z\bar z = |z|^2$.
Answer: Again, surprisingly messy...
* BACK TO TEXT


Homework 6.6 Function Housev.m implements the steps in Figure 6.2 (left). Update this implementation with the equivalent steps in Figure 6.2 (right), which is closer to how it is implemented in practice.
Answer: See Programming/chapter06_answers/Housev_alt.m
* BACK TO TEXT

Homework 6.7 Show that
$\begin{pmatrix} I & 0 \\ 0 & I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix} = I - \frac{1}{\tau_1}\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}^H.$
Answer:
$\begin{pmatrix} I & 0 \\ 0 & I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix} = I - \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix} = I - \frac{1}{\tau_1}\begin{pmatrix} 0 & 0 \\ 0 & \begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix} = I - \frac{1}{\tau_1}\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}^H.$
* BACK TO TEXT

Homework 6.8 Given $A \in \mathbb{R}^{m \times n}$ show that the cost of the algorithm in Figure 6.4 is given by
$C_{HQR}(m, n) \approx 2mn^2 - \frac{2}{3}n^3 \text{ flops.}$
Answer: The bulk of the computation is in $w_{12}^T = (a_{12}^T + u_{21}^HA_{22})/\tau_1$ and $A_{22} - u_{21}w_{12}^T$. During the $k$th iteration (when $R_{TL}$ is $k \times k$), this means a matrix-vector multiplication ($u_{21}^HA_{22}$) and rank-1 update with matrix $A_{22}$, which is of size approximately $(m - k) \times (n - k)$, for a cost of $4(m - k)(n - k)$ flops. Thus the total cost is approximately
$\sum_{k=0}^{n-1} 4(m - k)(n - k) = 4\sum_{j=0}^{n-1}(m - n + j)j = 4(m - n)\sum_{j=0}^{n-1}j + 4\sum_{j=0}^{n-1}j^2 = 2(m - n)n(n - 1) + 4\sum_{j=0}^{n-1}j^2 \approx 2(m - n)n^2 + 4\int_0^n x^2\,dx = 2mn^2 - 2n^3 + \frac{4}{3}n^3 = 2mn^2 - \frac{2}{3}n^3.$
* BACK TO TEXT

Homework 6.9 Implement the algorithm in Figure 6.4 as
function [ A_out, t_out ] = HQR_unb_var1( A, t )
Input is an $m \times n$ matrix A and vector t of size n. Output is the overwritten matrix A and the vector of scalars that define the Householder transformations. You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
Answer: See Programming/chapter06_answers/HQR_unb_var1.m.
* BACK TO TEXT

Homework 6.12 Implement the algorithm in Figure 6.6 as
function A_out = FormQ_unb_var1( A, t )
You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
Answer: See Programming/chapter06_answers/FormQ_unb_var1.m.
* BACK TO TEXT

Homework 6.13 Given $A \in \mathbb{C}^{m \times n}$ the cost of the algorithm in Figure 6.6 is given by
$C_{FormQ}(m, n) \approx 2mn^2 - \frac{2}{3}n^3 \text{ flops.}$
Answer: The answer for Homework 6.8 can be easily modified to establish this result.
* BACK TO TEXT


Homework 6.14 If $m = n$ then Q could be accumulated by the sequence
$Q = (\cdots((I \cdot H_0)H_1)\cdots H_{n-1}).$
Give a high-level reason why this would be (much) more expensive than the algorithm in Figure 6.6.
Answer: The benefit of accumulating Q via the sequence $H_0(H_1(\cdots(H_{n-1} \cdot I)\cdots))$ is that many zeroes are preserved, thus reducing the necessary computation. In contrast, $I \cdot H_0$ creates a dense matrix, after which the application of subsequent Householder transformations benefits less from zeroes.
* BACK TO TEXT
Homework 6.15 Consider $u_1 \in \mathbb{C}^m$ with $u_1 \neq 0$ (the zero vector), $U_0 \in \mathbb{C}^{m \times k}$, and nonsingular $T_{00} \in \mathbb{C}^{k \times k}$. Define $\tau_1 = (u_1^Hu_1)/2$, so that
$H_1 = I - \frac{1}{\tau_1}u_1u_1^H$
equals a Householder transformation, and let
$Q_0 = I - U_0T_{00}^{-1}U_0^H.$
Show that
$Q_0H_1 = (I - U_0T_{00}^{-1}U_0^H)\left(I - \frac{1}{\tau_1}u_1u_1^H\right) = I - \left(\, U_0 \;\; u_1 \,\right)\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}^{-1}\left(\, U_0 \;\; u_1 \,\right)^H,$
where $t_{01} = U_0^Hu_1$.
Answer:
$Q_0H_1 = (I - U_0T_{00}^{-1}U_0^H)\left(I - \frac{1}{\tau_1}u_1u_1^H\right) = I - U_0T_{00}^{-1}U_0^H - \frac{1}{\tau_1}u_1u_1^H + \frac{1}{\tau_1}U_0T_{00}^{-1}U_0^Hu_1u_1^H = I - U_0T_{00}^{-1}U_0^H - \frac{1}{\tau_1}u_1u_1^H + \frac{1}{\tau_1}U_0T_{00}^{-1}t_{01}u_1^H.$
Also, using the inverse from Homework 6.18,
$I - \left(\, U_0 \;\; u_1 \,\right)\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}^{-1}\left(\, U_0 \;\; u_1 \,\right)^H = I - \left(\, U_0 \;\; u_1 \,\right)\begin{pmatrix} T_{00}^{-1} & -T_{00}^{-1}t_{01}/\tau_1 \\ 0 & 1/\tau_1 \end{pmatrix}\begin{pmatrix} U_0^H \\ u_1^H \end{pmatrix} = I - U_0\left(T_{00}^{-1}U_0^H - T_{00}^{-1}t_{01}u_1^H/\tau_1\right) - \frac{1}{\tau_1}u_1u_1^H = I - U_0T_{00}^{-1}U_0^H - \frac{1}{\tau_1}u_1u_1^H + \frac{1}{\tau_1}U_0T_{00}^{-1}t_{01}u_1^H.$
The two expressions are identical.
* BACK TO TEXT

Homework 6.16 Consider $u_i \in \mathbb{C}^m$ with $u_i \neq 0$ (the zero vector). Define $\tau_i = (u_i^Hu_i)/2$, so that
$H_i = I - \frac{1}{\tau_i}u_iu_i^H$
equals a Householder transformation, and let
$U = \left(\, u_0 \;\; u_1 \;\; \cdots \;\; u_{k-1} \,\right).$
Show that
$H_0H_1\cdots H_{k-1} = I - UT^{-1}U^H,$
where $T$ is an upper triangular matrix.
Answer: The result follows from Homework 6.15 via a proof by induction.
* BACK TO TEXT

Homework 6.17 Implement the algorithm in Figure 6.8 as
function T = FormT_unb_var1( U, t, T )
* BACK TO TEXT

Homework 6.18 Assuming all inverses exist, show that
$\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}^{-1} = \begin{pmatrix} T_{00}^{-1} & -T_{00}^{-1}t_{01}/\tau_1 \\ 0 & 1/\tau_1 \end{pmatrix}.$
Answer:
$\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}\begin{pmatrix} T_{00}^{-1} & -T_{00}^{-1}t_{01}/\tau_1 \\ 0 & 1/\tau_1 \end{pmatrix} = \begin{pmatrix} T_{00}T_{00}^{-1} & -T_{00}T_{00}^{-1}t_{01}/\tau_1 + t_{01}/\tau_1 \\ 0 & \tau_1/\tau_1 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & 1 \end{pmatrix}.$
* BACK TO TEXT



Homework 6.19 Let $A = \left(\, a_0 \;\; a_1 \;\; \cdots \;\; a_{n-1} \,\right)$,
$v = \begin{pmatrix} \nu_0 \\ \nu_1 \\ \vdots \\ \nu_{n-1} \end{pmatrix} = \begin{pmatrix} \|a_0\|_2^2 \\ \|a_1\|_2^2 \\ \vdots \\ \|a_{n-1}\|_2^2 \end{pmatrix},$
$q$ with $q^Tq = 1$ (of the same size as the columns of A), and $r = A^Tq = \begin{pmatrix} \rho_0 \\ \rho_1 \\ \vdots \\ \rho_{n-1} \end{pmatrix}$. Compute $B := A - qr^T$ with $B = \left(\, b_0 \;\; b_1 \;\; \cdots \;\; b_{n-1} \,\right)$. Then
$\begin{pmatrix} \|b_0\|_2^2 \\ \|b_1\|_2^2 \\ \vdots \\ \|b_{n-1}\|_2^2 \end{pmatrix} = \begin{pmatrix} \nu_0 - \rho_0^2 \\ \nu_1 - \rho_1^2 \\ \vdots \\ \nu_{n-1} - \rho_{n-1}^2 \end{pmatrix}.$
Answer:
$a_i = (a_i - a_i^Tq\,q) + a_i^Tq\,q$
and
$(a_i - a_i^Tq\,q)^Tq = a_i^Tq - a_i^Tq\,q^Tq = a_i^Tq - a_i^Tq = 0.$
This means that
$\|a_i\|_2^2 = \|(a_i - a_i^Tq\,q) + a_i^Tq\,q\|_2^2 = \|a_i - a_i^Tq\,q\|_2^2 + \|a_i^Tq\,q\|_2^2 = \|a_i - \rho_iq\|_2^2 + \|\rho_iq\|_2^2 = \|b_i\|_2^2 + \rho_i^2,$
so that
$\|b_i\|_2^2 = \|a_i\|_2^2 - \rho_i^2 = \nu_i - \rho_i^2.$
* BACK TO TEXT

Homework 6.20 In Section 4.4 we discuss how MGS yields a higher quality solution than does CGS in terms of the orthogonality of the computed columns. The classic example, already mentioned in Chapter 4, that illustrates this is
$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix},$
where $\epsilon = \sqrt{\epsilon_{\mathrm{mach}}}$. In this exercise, you will compare and contrast the quality of the computed matrix Q for CGS, MGS, and Householder QR factorization.
Start with the matrix

format long          % print out 16 digits
eps = 1.0e-8         % roughly the square root of the machine epsilon
A = [ 1    1    1
      eps  0    0
      0    eps  0
      0    0    eps ]

With the various routines you implemented for Chapter 4 and the current chapter compute

[ Q_CGS, R_CGS ] = CGS_unb_var1( A )
[ Q_MGS, R_MGS ] = MGS_unb_var1( A )
[ A_HQR, t_HQR ] = HQR_unb_var1( A )
Q_HQR = FormQ_unb_var1( A_HQR, t_HQR )

Finally, check whether the columns of the various computed matrices Q are mutually orthonormal:

Q_CGS' * Q_CGS
Q_MGS' * Q_MGS
Q_HQR' * Q_HQR

What do you notice? When we discuss numerical stability of algorithms we will gain insight into why HQR produces high quality mutually orthogonal columns.
Check how well QR approximates A:

A - Q_CGS * triu( R_CGS )
A - Q_MGS * triu( R_MGS )
A - Q_HQR * triu( A_HQR( 1:3, 1:3 ) )

What you notice is that all approximate A well.
Later, we will see how the QR factorization can be used to solve Ax = b and linear least-squares problems. At that time we will examine how accurate the solution is depending on which method for QR factorization is used to compute Q and R.
Answer:

>> Q_CGS' * Q_CGS
ans =
   1.000000000000000  -0.000000007071068  -0.000000007071068
  -0.000000007071068   1.000000000000000   0.500000000000000
  -0.000000007071068   0.500000000000000   1.000000000000000

>> Q_MGS' * Q_MGS
ans =
   1.000000000000000  -0.000000007071068  -0.000000004082483
  -0.000000007071068   1.000000000000000   0.000000000000000
  -0.000000004082483   0.000000000000000   1.000000000000000

>> Q_HQR' * Q_HQR
ans =
   1.000000000000000                   0                   0
                   0   1.000000000000000                   0
                   0                   0   1.000000000000000

Clearly, the orthogonality of the matrix Q computed via Householder QR factorization is much more accurate.
* BACK TO TEXT
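For readers working outside MATLAB, the same experiment can be reproduced with a short NumPy sketch. The cgs and mgs functions below are my own plain-loop stand-ins for CGS_unb_var1 and MGS_unb_var1, and np.linalg.qr (LAPACK's Householder-based factorization) stands in for HQR:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: project each column against the ORIGINAL column."""
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    for j in range(n):
        R[:j, j] = Q[:, :j].T @ A[:, j]
        v = A[:, j] - Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def mgs(A):
    """Modified Gram-Schmidt: update remaining columns as each q is computed."""
    Q = A.astype(float).copy()
    n = Q.shape[1]; R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] /= R[j, j]
        R[j, j+1:] = Q[:, j] @ Q[:, j+1:]
        Q[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
    return Q, R

eps = 1.0e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

loss = lambda Q: np.linalg.norm(Q.T @ Q - np.eye(3))  # loss of orthogonality
Q_cgs, _ = cgs(A)
Q_mgs, _ = mgs(A)
Q_hqr, _ = np.linalg.qr(A)
print(loss(Q_cgs), loss(Q_mgs), loss(Q_hqr))
```

Running this shows the same ordering as the MATLAB output above: CGS loses orthogonality badly (off-diagonal entries near 0.5), MGS is accurate to roughly the size of eps, and Householder QR is orthogonal to machine precision.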

Homework 6.21 Consider the matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ where A has linearly independent columns. Let
- $A = Q_AR_A$ be the QR factorization of A,
- $\begin{pmatrix} R_A \\ B \end{pmatrix} = Q_BR_B$ be the QR factorization of $\begin{pmatrix} R_A \\ B \end{pmatrix}$, and
- $\begin{pmatrix} A \\ B \end{pmatrix} = QR$ be the QR factorization of $\begin{pmatrix} A \\ B \end{pmatrix}$.
Assume that the diagonal entries of $R_A$, $R_B$, and R are all positive. Show that $R = R_B$.
Answer:
$\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} Q_AR_A \\ B \end{pmatrix} = \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} R_A \\ B \end{pmatrix} = \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix}Q_BR_B.$
Also, $\begin{pmatrix} A \\ B \end{pmatrix} = QR$. By the uniqueness of the QR factorization (given the positive diagonal entries),
$Q = \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix}Q_B \quad \text{and} \quad R = R_B.$
* BACK TO TEXT



Homework 6.22 Consider the matrix $\begin{pmatrix} R \\ B \end{pmatrix}$ where R is an upper triangular matrix. Propose a modification of HQR_unb_var1 that overwrites R and B with Householder vectors and updated matrix R. Importantly, the algorithm should take advantage of the zeroes in R (in other words, it should avoid computing with the zeroes below its diagonal). An outline for the algorithm is given in Figure 6.15.
Answer: See Figure 6.6.
* BACK TO TEXT

Homework 6.23 Implement the algorithm from Homework 6.22.


* BACK TO TEXT


Algorithm: [R, B, t] := HQR_UPDATE_UNB_VAR1(R, B, t)

Partition $R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix}$, $B \rightarrow \left(\, B_L \;\; B_R \,\right)$, $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$
  where $R_{TL}$ is $0 \times 0$, $B_L$ has 0 columns, $t_T$ has 0 rows
while $m(R_{TL}) < m(R)$ do
  Repartition
    $\begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ r_{10}^T & \rho_{11} & r_{12}^T \\ R_{20} & r_{21} & R_{22} \end{pmatrix}$, $\left(\, B_L \;\; B_R \,\right) \rightarrow \left(\, B_0 \;\; b_1 \;\; B_2 \,\right)$, $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
    where $\rho_{11}$ is $1 \times 1$, $b_1$ has 1 column, $\tau_1$ has 1 row

  Update $\begin{pmatrix} \rho_{11} \\ b_1 \end{pmatrix}$, $r_{12}^T$, and $B_2$ via the steps
    $\left[ \begin{pmatrix} \rho_{11} \\ b_1 \end{pmatrix}, \tau_1 \right] := \text{Housev}\begin{pmatrix} \rho_{11} \\ b_1 \end{pmatrix}$
    $w_{12}^T := (r_{12}^T + b_1^HB_2)/\tau_1$
    $r_{12}^T := r_{12}^T - w_{12}^T$
    $B_2 := B_2 - b_1w_{12}^T$

  Continue with the repartitioned quadrants moved back into
    $\begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix}$, $\left(\, B_L \;\; B_R \,\right)$, $\begin{pmatrix} t_T \\ t_B \end{pmatrix}$
endwhile

Figure 6.6: Algorithm that computes the QR factorization of a triangular matrix, R, appended with a matrix B.

Chapter 8. Notes on Solving Linear Least-squares Problems (Answers)


Homework 8.2 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Show that there exist a lower triangular matrix $L_L \in \mathbb{C}^{m \times m}$ and a matrix $Q_T \in \mathbb{C}^{m \times n}$ with orthonormal rows such that $A = L_LQ_T$, noting that $L_L$ does not have any zeroes on the diagonal. Letting $L = \left(\, L_L \;\; 0 \,\right)$ be $\mathbb{C}^{m \times n}$ and unitary $Q = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}$, reason that $A = LQ$.
Don't overthink the problem: use results you have seen before.
Answer: We know that $A^H \in \mathbb{C}^{n \times m}$ has linearly independent columns (and $n > m$). Hence there exist a unitary matrix $\widehat Q = \left(\, \widehat Q_L \;\; \widehat Q_R \,\right)$ and $R = \begin{pmatrix} R_T \\ 0 \end{pmatrix}$ with upper triangular matrix $R_T$ that has no zeroes on its diagonal such that $A^H = \widehat Q_LR_T$. It is easy to see that the desired $Q_T$ equals $\widehat Q_L^H$ and the desired $L_L$ equals $R_T^H$.
* BACK TO TEXT

Homework 8.3 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Consider
\[
\| A x - y \|_2 = \min_z \| A z - y \|_2 .
\]
Use the fact that $A = L_L Q_T$, where $L_L \in \mathbb{C}^{m \times m}$ is lower triangular and $Q_T$ has orthonormal rows, to argue that any vector of the form $Q_T^H L_L^{-1} y + Q_B^H w_B$ (where $w_B$ is any vector in $\mathbb{C}^{n-m}$) is a solution to the LLS problem. Here $Q = \left( \begin{array}{c} Q_T \\ Q_B \end{array} \right)$.
Answer:
\begin{eqnarray*}
\min_z \| A z - y \|_2
& = & \min_z \| L Q z - y \|_2 \\
& = & \min_w \| L w - y \|_2 \quad ( z = Q^H w ) \\
& = & \min_w \left\| \left( \begin{array}{c c} L_L & 0 \end{array} \right) \left( \begin{array}{c} w_T \\ w_B \end{array} \right) - y \right\|_2 \\
& = & \min_{w_T} \| L_L w_T - y \|_2 .
\end{eqnarray*}
Hence $w_T = L_L^{-1} y$ minimizes. But then
\[
z = Q^H w = \left( \begin{array}{c c} Q_T^H & Q_B^H \end{array} \right) \left( \begin{array}{c} w_T \\ w_B \end{array} \right) = Q_T^H w_T + Q_B^H w_B
\]
describes all solutions to the LLS problem.
* BACK TO TEXT
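A quick numerical check of this characterization (NumPy used for illustration; variable names are chosen here): compute the LQ factorization of $A$ via the QR factorization of $A^H$ as in Homework 8.2, form the $w_B = 0$ solution, and confirm that it solves the problem. With $w_B = 0$ the solution lies in the row space of $A$, so it is in fact the minimum-norm solution and also matches `lstsq`:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))          # full row rank (generically)
y = rng.standard_normal(m)

# LQ factorization of A via the QR factorization of A^H (Homework 8.2):
Q, R = np.linalg.qr(A.T)                 # A^H = Q_L R_T (reduced QR)
LL, QT = R.T, Q.T                        # A = L_L Q_T

x = QT.T @ np.linalg.solve(LL, y)        # the w_B = 0 solution
assert np.allclose(A @ x, y)             # zero residual: x solves the LLS problem
# w_B = 0 lies in the row space of A, so it is the minimum-norm solution:
assert np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0])
```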

Homework 8.4 Continuing Exercise 8.2, use Figure 8.1 to give a Classical Gram-Schmidt inspired algorithm for computing $L_L$ and $Q_T$. (The best way to check you got the algorithm right is to implement it!)
Answer:
* BACK TO TEXT

Homework 8.5 Continuing Exercise 8.2, use Figure 8.2 to give a Householder QR factorization inspired algorithm for computing $L$ and $Q$, leaving $L$ in the lower triangular part of $A$ and $Q$ stored as Householder vectors above the diagonal of $A$. (The best way to check you got the algorithm right is to implement it!)
Answer:
* BACK TO TEXT


Chapter 9. Notes on the Condition of a Problem (Answers)


Homework 9.1 Show that, if $A$ is a nonsingular matrix, for a consistent matrix norm, $\kappa(A) \geq 1$.
Answer: Here consistent matrix norm means a matrix norm induced by a vector norm, so that $\| I \| = 1$. Let $\| \cdot \|$ be the norm that is used to define $\kappa(A) = \| A \| \| A^{-1} \|$. Then
\[
1 = \| I \| = \| A A^{-1} \| \leq \| A \| \| A^{-1} \| = \kappa(A) .
\]
* BACK TO TEXT

Homework 9.2 If $A$ has linearly independent columns, show that $\| ( A^H A )^{-1} A^H \|_2 = 1/\sigma_{n-1}$, where $\sigma_{n-1}$ equals the smallest singular value of $A$. Hint: Use the SVD of $A$.
Answer: Let $A = U \Sigma V^H$ be the reduced SVD of $A$. Then
\begin{eqnarray*}
\| ( A^H A )^{-1} A^H \|_2
& = & \| ( ( U \Sigma V^H )^H U \Sigma V^H )^{-1} ( U \Sigma V^H )^H \|_2 \\
& = & \| ( V \Sigma U^H U \Sigma V^H )^{-1} V \Sigma U^H \|_2 \\
& = & \| ( V \Sigma^{-1} \Sigma^{-1} V^H ) V \Sigma U^H \|_2 \\
& = & \| V \Sigma^{-1} U^H \|_2 \\
& = & \| \Sigma^{-1} \|_2 \\
& = & 1/\sigma_{n-1}
\end{eqnarray*}
(since the two norm of a diagonal matrix equals its largest diagonal element in absolute value).
* BACK TO TEXT

Homework 9.3 Let $A$ have linearly independent columns. Show that $\kappa_2( A^H A ) = \kappa_2(A)^2$.
Answer: Let $A = U \Sigma V^H$ be the reduced SVD of $A$. Then
\begin{eqnarray*}
\kappa_2( A^H A )
& = & \| A^H A \|_2 \| ( A^H A )^{-1} \|_2 \\
& = & \| ( U \Sigma V^H )^H U \Sigma V^H \|_2 \| ( ( U \Sigma V^H )^H U \Sigma V^H )^{-1} \|_2 \\
& = & \| V \Sigma^2 V^H \|_2 \| V ( \Sigma^{-1} )^2 V^H \|_2 \\
& = & \| \Sigma^2 \|_2 \| ( \Sigma^{-1} )^2 \|_2 \\
& = & \frac{\sigma_0^2}{\sigma_{n-1}^2} = \left( \frac{\sigma_0}{\sigma_{n-1}} \right)^2 = \kappa_2(A)^2 .
\end{eqnarray*}
* BACK TO TEXT

Homework 9.4 Let $A \in \mathbb{C}^{n \times n}$ have linearly independent columns.

* Show that $A x = y$ if and only if $A^H A x = A^H y$.
Answer: Since $A$ has linearly independent columns and is square, we know that $A^{-1}$ and $A^{-H}$ exist. If $A x = y$, then multiplying both sides by $A^H$ yields $A^H A x = A^H y$. If $A^H A x = A^H y$, then multiplying both sides by $A^{-H}$ yields $A x = y$.
* Reason that using the method of normal equations to solve $A x = y$ has a condition number of $\kappa_2(A)^2$.
* BACK TO TEXT

Homework 9.5 Let $U \in \mathbb{C}^{n \times n}$ be unitary. Show that $\kappa_2(U) = 1$.
Answer: The SVD of $U$ is given by $U = U \Sigma V^H$ where $\Sigma = I$ and $V = I$. Hence $\sigma_0 = \sigma_{n-1} = 1$ and $\kappa_2(U) = \sigma_0 / \sigma_{n-1} = 1$.
* BACK TO TEXT

Homework 9.6 Characterize the set of all square matrices $A$ with $\kappa_2(A) = 1$.
Answer: If $\kappa_2(A) = 1$ then $\sigma_0 / \sigma_{n-1} = 1$, which means that $\sigma_0 = \sigma_{n-1}$ and hence $\sigma_0 = \sigma_1 = \cdots = \sigma_{n-1}$. Hence the singular value decomposition of $A$ must satisfy $A = U \Sigma V^H = \sigma_0 U V^H$. But $U V^H$ is unitary since $U$ and $V^H$ are, and hence we conclude that $A$ must be a nonzero multiple of a unitary matrix.
* BACK TO TEXT

Homework 9.7 Let $\| \cdot \|$ be a vector norm with corresponding induced matrix norm. If $A \in \mathbb{C}^{n \times n}$ is nonsingular and
\[
\frac{\| \delta\!A \|}{\| A \|} < \frac{1}{\kappa(A)}
\quad\quad (9.1)
\]
then $A + \delta\!A$ is nonsingular.
Answer: Equation (9.1) can be rewritten as $\| \delta\!A \| \| A^{-1} \| < 1$. We will prove that if $A + \delta\!A$ is singular, then $\| \delta\!A \| \| A^{-1} \| \geq 1$.

$A + \delta\!A$ is singular
$\Leftrightarrow$ < equivalent condition >
$( A + \delta\!A ) y = 0$ for some $y \neq 0$
$\Rightarrow$ < linear algebra >
$y = -A^{-1} \delta\!A \, y$
$\Rightarrow$ < take norms >
$\| y \| = \| A^{-1} \delta\!A \, y \|$
$\Rightarrow$ < property of induced matrix norms >
$\| y \| \leq \| A^{-1} \| \| \delta\!A \| \| y \|$
$\Rightarrow$ < $\| y \| > 0$ >
$1 \leq \| A^{-1} \| \| \delta\!A \|$.
* BACK TO TEXT

Homework 9.8 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $A x = y$, and $( A + \delta\!A )( x + \delta\!x ) = y$. Show that (for induced norm)
\[
\frac{\| \delta\!x \|}{\| x + \delta\!x \|} \leq \kappa(A) \frac{\| \delta\!A \|}{\| A \|} .
\]
Answer: $( A + \delta\!A )( x + \delta\!x ) = y$ can be rewritten as $A x + A \delta\!x + \delta\!A ( x + \delta\!x ) = y$. Since $A x = y$ we find that $\delta\!x = -A^{-1} \delta\!A ( x + \delta\!x )$ and hence
\[
\| \delta\!x \| \leq \| A^{-1} \| \| \delta\!A \| \| x + \delta\!x \| .
\]
Thus
\[
\frac{\| \delta\!x \|}{\| x + \delta\!x \|} \leq \| A^{-1} \| \| \delta\!A \| = \| A^{-1} \| \| A \| \frac{\| \delta\!A \|}{\| A \|} = \kappa(A) \frac{\| \delta\!A \|}{\| A \|} .
\]
* BACK TO TEXT

Homework 9.9 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\| \delta\!A \| / \| A \| < 1/\kappa(A)$, $y \neq 0$, $A x = y$, and $( A + \delta\!A )( x + \delta\!x ) = y$. Show that (for induced norm)
\[
\frac{\| \delta\!x \|}{\| x \|} \leq \frac{\kappa(A) \frac{\| \delta\!A \|}{\| A \|}}{1 - \kappa(A) \frac{\| \delta\!A \|}{\| A \|}} .
\]
Answer:
* BACK TO TEXT

Homework 9.10 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\kappa(A) \| \delta\!A \| / \| A \| < 1$, $y \neq 0$, $A x = y$, and $( A + \delta\!A )( x + \delta\!x ) = y + \delta\!y$. Show that (for induced norm)
\[
\frac{\| \delta\!x \|}{\| x \|} \leq \frac{\kappa(A) \left( \frac{\| \delta\!A \|}{\| A \|} + \frac{\| \delta\!y \|}{\| y \|} \right)}{1 - \kappa(A) \frac{\| \delta\!A \|}{\| A \|}} .
\]
Answer:
\[
( A + \delta\!A )( x + \delta\!x ) = y + \delta\!y
\]
is equivalent to
\[
A x + A \delta\!x + \delta\!A \, x + \delta\!A \, \delta\!x = y + \delta\!y .
\]
Since $A x = y$ we can simplify and rearrange this to yield
\[
\delta\!x = A^{-1} ( \delta\!y - \delta\!A \, x - \delta\!A \, \delta\!x ) .
\]
Taking the norm on both sides yields
\[
\| \delta\!x \| = \| A^{-1} ( \delta\!y - \delta\!A \, x - \delta\!A \, \delta\!x ) \| \leq \| A^{-1} \| ( \| \delta\!y \| + \| \delta\!A \| \| x \| + \| \delta\!A \| \| \delta\!x \| ) .
\]
Dividing both sides by $\| x \|$ and using $\| A^{-1} \| = \kappa(A)/\| A \|$ gives
\[
\frac{\| \delta\!x \|}{\| x \|} \leq \kappa(A) \left( \frac{\| \delta\!y \|}{\| A \| \| x \|} + \frac{\| \delta\!A \|}{\| A \|} \right) + \kappa(A) \frac{\| \delta\!A \|}{\| A \|} \frac{\| \delta\!x \|}{\| x \|} .
\]
Bringing terms that involve $\delta\!x$ to the left yields
\[
\left( 1 - \kappa(A) \frac{\| \delta\!A \|}{\| A \|} \right) \frac{\| \delta\!x \|}{\| x \|}
\leq \kappa(A) \left( \frac{\| \delta\!y \|}{\| A \| \| x \|} + \frac{\| \delta\!A \|}{\| A \|} \right)
\leq \kappa(A) \left( \frac{\| \delta\!y \|}{\| y \|} + \frac{\| \delta\!A \|}{\| A \|} \right)
\]
since $\| y \| = \| A x \| \leq \| A \| \| x \|$. Rearranging this yields the desired result.
* BACK TO TEXT


Chapter 10. Notes on the Stability of an Algorithm (Answers)

Homework 10.3 Assume a floating point number system with $\beta = 2$ and a mantissa with $t$ digits, so that a typical positive number is written as $.d_0 d_1 \ldots d_{t-1} \times 2^e$, with $d_i \in \{0, 1\}$.

* Write the number 1 as a floating point number.
Answer: $. \underbrace{1 0 \cdots 0}_{t\ \mathrm{digits}} \times 2^1$.

* What is the largest positive real number $u$ (represented as a binary fraction) such that the floating point representation of $1 + u$ equals the floating point representation of $1$? (Assume rounded arithmetic.)
Answer: $. \underbrace{1 0 \cdots 0}_{t\ \mathrm{digits}} \times 2^1 + . \underbrace{0 \cdots 0}_{t\ \mathrm{digits}} 1 \times 2^1 = . \underbrace{1 0 \cdots 0}_{t\ \mathrm{digits}} 1 \times 2^1$, which rounds to $. \underbrace{1 0 \cdots 0}_{t\ \mathrm{digits}} \times 2^1 = 1$.
It is not hard to see that any larger number rounds to $. \underbrace{1 0 \cdots 0 1}_{t\ \mathrm{digits}} \times 2^1 > 1$.

* Show that $u = \frac{1}{2} 2^{1-t}$.
Answer: $. \underbrace{0 \cdots 0}_{t\ \mathrm{digits}} 1 \times 2^1 = .1 \times 2^{1-t} = \frac{1}{2} 2^{1-t}$.
* BACK TO TEXT
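The same computation can be observed in IEEE double precision, which corresponds to $t = 53$ in the normalization used above (so $u = 2^{-53}$); under the default round-to-nearest mode, the tie $1 + u$ rounds back to $1$:

```python
import math
import sys

u = 2.0 ** -53                             # (1/2) * 2^(1-t) with t = 53
assert 1.0 + u == 1.0                      # fl(1 + u) rounds back to 1
assert 1.0 + math.nextafter(u, 1.0) > 1.0  # any larger value rounds up
assert u == sys.float_info.epsilon / 2     # half the machine epsilon
```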

Homework 10.10 Prove Lemma 10.9.
Answer: Let $C = A B$. Then the $(i,j)$ entry in $|C|$ is given by
\[
| \gamma_{i,j} | = \left| \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j} \right| \leq \sum_{p=0}^{k-1} | \alpha_{i,p} \beta_{p,j} | = \sum_{p=0}^{k-1} | \alpha_{i,p} | | \beta_{p,j} | ,
\]
which equals the $(i,j)$ entry of $|A| |B|$. Thus $|A B| \leq |A| |B|$.
* BACK TO TEXT

Homework 10.12 Prove Theorem 10.11.
Answer:

* Show that if $|A| \leq |B|$ then $\| A \|_1 \leq \| B \|_1$:
Let
\[
A = \left( \begin{array}{c c c} a_0 & \cdots & a_{n-1} \end{array} \right)
\quad \mathrm{and} \quad
B = \left( \begin{array}{c c c} b_0 & \cdots & b_{n-1} \end{array} \right) .
\]
Then
\begin{eqnarray*}
\| A \|_1 & = & \max_{0 \leq j < n} \| a_j \|_1 = \max_{0 \leq j < n} \sum_{i=0}^{m-1} | \alpha_{i,j} | = \sum_{i=0}^{m-1} | \alpha_{i,k} | \quad \mbox{where $k$ is the index that maximizes} \\
& \leq & \sum_{i=0}^{m-1} | \beta_{i,k} | \quad \mbox{since $|A| \leq |B|$} \\
& \leq & \max_{0 \leq j < n} \sum_{i=0}^{m-1} | \beta_{i,j} | = \max_{0 \leq j < n} \| b_j \|_1 = \| B \|_1 .
\end{eqnarray*}

* Show that if $|A| \leq |B|$ then $\| A \|_\infty \leq \| B \|_\infty$:
Note: $\| A \|_\infty = \| A^T \|_1$ and $\| B \|_\infty = \| B^T \|_1$. Also, if $|A| \leq |B|$ then, clearly, $|A^T| \leq |B^T|$. Hence $\| A \|_\infty = \| A^T \|_1 \leq \| B^T \|_1 = \| B \|_\infty$.

* Show that if $|A| \leq |B|$ then $\| A \|_F \leq \| B \|_F$:
\[
\| A \|_F^2 = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \alpha_{i,j}^2 \leq \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \beta_{i,j}^2 = \| B \|_F^2 .
\]
Hence $\| A \|_F \leq \| B \|_F$.


* BACK TO TEXT

Homework 10.13 Repeat the above steps for the computation
\[
\kappa := ( ( \chi_0 \psi_0 + \chi_1 \psi_1 ) + \chi_2 \psi_2 ) ,
\]
computing in the indicated order.
Answer:
* BACK TO TEXT

Homework 10.15 Complete the proof of Lemma 10.14.
Answer: We merely need to fill in the details for Case 1 in the proof:
Case 1: $\prod_{i=0}^{n} ( 1 + \epsilon_i )^{\pm 1} = \left( \prod_{i=0}^{n-1} ( 1 + \epsilon_i )^{\pm 1} \right) ( 1 + \epsilon_n )$. By the I.H. there exists a $\theta_n$ such that $( 1 + \theta_n ) = \prod_{i=0}^{n-1} ( 1 + \epsilon_i )^{\pm 1}$ and $| \theta_n | \leq n u / ( 1 - n u )$. Then
\[
\prod_{i=0}^{n} ( 1 + \epsilon_i )^{\pm 1} = ( 1 + \theta_n )( 1 + \epsilon_n ) = 1 + \underbrace{\theta_n + \epsilon_n + \theta_n \epsilon_n}_{\theta_{n+1}} ,
\]
which tells us how to pick $\theta_{n+1}$. Now
\begin{eqnarray*}
| \theta_{n+1} | & = & | \theta_n + \epsilon_n + \theta_n \epsilon_n | \leq | \theta_n | + | \epsilon_n | + | \theta_n | | \epsilon_n | \leq \frac{n u}{1 - n u} + u + \frac{n u}{1 - n u} u \\
& = & \frac{n u + u ( 1 - n u ) + n u^2}{1 - n u} = \frac{( n + 1 ) u}{1 - n u} \leq \frac{( n + 1 ) u}{1 - ( n + 1 ) u} .
\end{eqnarray*}
* BACK TO TEXT

Homework 10.18 Prove Lemma 10.17.
Answer:
\[
\gamma_n = \frac{n u}{1 - n u} \leq \frac{( n + b ) u}{1 - n u} \leq \frac{( n + b ) u}{1 - ( n + b ) u} = \gamma_{n+b} .
\]
Also,
\begin{eqnarray*}
\gamma_n + \gamma_b + \gamma_n \gamma_b
& = & \frac{n u}{1 - n u} + \frac{b u}{1 - b u} + \frac{n u}{( 1 - n u )} \frac{b u}{( 1 - b u )} \\
& = & \frac{n u ( 1 - b u ) + ( 1 - n u ) b u + b n u^2}{( 1 - n u )( 1 - b u )} \\
& = & \frac{n u - b n u^2 + b u - b n u^2 + b n u^2}{1 - ( n + b ) u + b n u^2}
 =  \frac{( n + b ) u - b n u^2}{1 - ( n + b ) u + b n u^2} \\
& \leq & \frac{( n + b ) u}{1 - ( n + b ) u} = \gamma_{n+b} .
\end{eqnarray*}
* BACK TO TEXT

Homework 10.19 Let $k \geq 0$ and assume that $| \epsilon_1 |, | \epsilon_2 | \leq u$, with $\epsilon_1 = 0$ if $k = 0$. Show that
\[
\left( \begin{array}{c c} I + \Sigma^{(k)} & 0 \\ 0 & ( 1 + \epsilon_1 ) \end{array} \right) ( 1 + \epsilon_2 ) = ( I + \Sigma^{(k+1)} ) .
\]
Hint: reason the case where $k = 0$ separately from the case where $k > 0$.
Answer:
Case: $k = 0$. Then
\[
\left( \begin{array}{c c} I + \Sigma^{(0)} & 0 \\ 0 & ( 1 + \epsilon_1 ) \end{array} \right) ( 1 + \epsilon_2 ) = ( 1 + 0 )( 1 + \epsilon_2 ) = ( 1 + \epsilon_2 ) = ( 1 + \theta_1 ) = ( I + \Sigma^{(1)} ) .
\]
Case: $k = 1$. Then
\begin{eqnarray*}
\left( \begin{array}{c c} I + \Sigma^{(1)} & 0 \\ 0 & ( 1 + \epsilon_1 ) \end{array} \right) ( 1 + \epsilon_2 )
& = & \left( \begin{array}{c c} 1 + \theta_1 & 0 \\ 0 & ( 1 + \epsilon_1 ) \end{array} \right) ( 1 + \epsilon_2 ) \\
& = & \left( \begin{array}{c c} ( 1 + \theta_1 )( 1 + \epsilon_2 ) & 0 \\ 0 & ( 1 + \epsilon_1 )( 1 + \epsilon_2 ) \end{array} \right)
 =  \left( \begin{array}{c c} ( 1 + \theta_2 ) & 0 \\ 0 & ( 1 + \theta_2 ) \end{array} \right) = ( I + \Sigma^{(2)} ) .
\end{eqnarray*}
Case: $k > 1$. Notice that
\begin{eqnarray*}
( I + \Sigma^{(k)} ) ( 1 + \epsilon_2 )
& = & \mathrm{diag}( ( 1 + \theta_k ), ( 1 + \theta_k ), ( 1 + \theta_{k-1} ), \ldots, ( 1 + \theta_2 ) ) ( 1 + \epsilon_2 ) \\
& = & \mathrm{diag}( ( 1 + \theta_{k+1} ), ( 1 + \theta_{k+1} ), ( 1 + \theta_k ), \ldots, ( 1 + \theta_3 ) ) .
\end{eqnarray*}
Then
\[
\left( \begin{array}{c c} I + \Sigma^{(k)} & 0 \\ 0 & ( 1 + \epsilon_1 ) \end{array} \right) ( 1 + \epsilon_2 )
= \left( \begin{array}{c c} ( I + \Sigma^{(k)} )( 1 + \epsilon_2 ) & 0 \\ 0 & ( 1 + \epsilon_1 )( 1 + \epsilon_2 ) \end{array} \right)
= ( I + \Sigma^{(k+1)} ) .
\]
* BACK TO TEXT
Homework 10.23 Prove R1-B.
Answer: From Theorem 10.20 we know that
\[
\check\kappa = x^T ( I + \Sigma^{(n)} ) y = ( x + \underbrace{\Sigma^{(n)} x}_{\delta\!x} )^T y .
\]
Then
\[
| \delta\!x | = | \Sigma^{(n)} x |
= \left| \left( \begin{array}{c} \theta_n \chi_0 \\ \theta_n \chi_1 \\ \theta_{n-1} \chi_2 \\ \vdots \\ \theta_2 \chi_{n-1} \end{array} \right) \right|
= \left( \begin{array}{c} | \theta_n | | \chi_0 | \\ | \theta_n | | \chi_1 | \\ | \theta_{n-1} | | \chi_2 | \\ \vdots \\ | \theta_2 | | \chi_{n-1} | \end{array} \right)
\leq \gamma_n \left( \begin{array}{c} | \chi_0 | \\ | \chi_1 | \\ | \chi_2 | \\ \vdots \\ | \chi_{n-1} | \end{array} \right)
= \gamma_n | x | ,
\]
since $| \theta_j | \leq \gamma_j \leq \gamma_n$ for $2 \leq j \leq n$.
(Note: strictly speaking, one should probably treat the case $n = 1$ separately.)
* BACK TO TEXT

Homework 10.25 In the above theorem, could one instead prove the result
\[
\check y = A ( x + \delta\!x ) ,
\]
where $\delta\!x$ is small?
Answer: The answer is no. The reason is that for each individual element of $\check y$,
\[
\check\psi_i = \check a_i^T ( x + \delta\!x_i ) ,
\]
which would appear to support that
\[
\left( \begin{array}{c} \check\psi_0 \\ \check\psi_1 \\ \vdots \\ \check\psi_{m-1} \end{array} \right)
= \left( \begin{array}{c} \check a_0^T ( x + \delta\!x_0 ) \\ \check a_1^T ( x + \delta\!x_1 ) \\ \vdots \\ \check a_{m-1}^T ( x + \delta\!x_{m-1} ) \end{array} \right) .
\]
However, the $\delta\!x_i$ for each entry $\check\psi_i$ is different, meaning that we cannot factor out a single $x + \delta\!x$ to find that $\check y = A ( x + \delta\!x )$.
* BACK TO TEXT

Homework 10.27 In the above theorem, could one instead prove the result
\[
\check C = ( A + \delta\!A )( B + \delta\!B ) ,
\]
where $\delta\!A$ and $\delta\!B$ are small?
Answer: The answer is no, for reasons similar to why the answer is no for Exercise 10.25.
* BACK TO TEXT


Chapter 11. Notes on Performance (Answers)


No exercises to answer yet.


Chapter 12. Notes on Gaussian Elimination and LU Factorization (Answers)

Homework 12.6 Show that
\[
\left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{array} \right)^{-1}
= \left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{array} \right) .
\]
Answer:
\[
\left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{array} \right)
\left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{array} \right)
= \left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} - l_{21} & I \end{array} \right)
= \left( \begin{array}{c c c} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & I \end{array} \right) .
\]
* BACK TO TEXT

Homework 12.7 Let $\tilde L_k = L_0 L_1 \ldots L_k$. Assume that $\tilde L_{k-1}$ has the form
$\tilde L_{k-1} = \left( \begin{array}{c c c} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & 0 & I \end{array} \right)$, where $L_{00}$ is $k \times k$. Show that $\tilde L_k$ is given by
$\tilde L_k = \left( \begin{array}{c c c} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & l_{21} & I \end{array} \right)$.
(Recall: $\hat L_k^{-1} = \left( \begin{array}{c c c} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{array} \right)$.)
Answer:
\[
\tilde L_k = \tilde L_{k-1} L_k = \tilde L_{k-1} \hat L_k^{-1}
= \left( \begin{array}{c c c} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & 0 & I \end{array} \right)
\left( \begin{array}{c c c} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{array} \right)
= \left( \begin{array}{c c c} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & l_{21} & I \end{array} \right) .
\]
* BACK TO TEXT

function [ A_out ] = LU_unb_var5( A )

  [ ATL, ATR, ...
    ABL, ABR ] = FLA_Part_2x2( A, ...
                               0, 0, FLA_TL );

  while ( size( ATL, 1 ) < size( A, 1 ) )
    [ A00,  a01,     A02, ...
      a10t, alpha11, a12t, ...
      A20,  a21,     A22 ] = FLA_Repart_2x2_to_3x3( ATL, ATR, ...
                                                    ABL, ABR, ...
                                                    1, 1, FLA_BR );
    %------------------------------------------------------------%
    a21 = a21 / alpha11;
    A22 = A22 - a21 * a12t;
    %------------------------------------------------------------%
    [ ATL, ATR, ...
      ABL, ABR ] = FLA_Cont_with_3x3_to_2x2( A00,  a01,     A02, ...
                                             a10t, alpha11, a12t, ...
                                             A20,  a21,     A22, ...
                                             FLA_TL );
  end

  A_out = [ ATL, ATR
            ABL, ABR ];

return

Figure 12.7: Answer to Exercise 12.12.

Homework 12.12 Implement LU factorization with partial pivoting with the FLAME@lab API, in M-script.
Answer: See Figure 12.7.
* BACK TO TEXT
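The same variant can be sketched in NumPy (illustration only; the book uses M-script, and `lu_unb_var5` is a name chosen here). No pivoting is performed, so the test matrix below is made strictly diagonally dominant to keep the pivots safely nonzero:

```python
import numpy as np

def lu_unb_var5(A):
    """Right-looking LU (Variant 5), no pivoting: the loop body mirrors
    a21 := a21/alpha11; A22 := A22 - a21 * a12t from the M-script."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        A[k+1:, k] /= A[k, k]                              # a21 := a21/alpha11
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # A22 := A22 - a21*a12t
    return A                                               # L\U overwrites A

rng = np.random.default_rng(0)
A0 = rng.standard_normal((5, 5))
A0 += np.diag(np.sum(np.abs(A0), axis=1) + 1.0)  # strictly diagonally dominant
LU = lu_unb_var5(A0)
L, U = np.tril(LU, -1) + np.eye(5), np.triu(LU)
assert np.allclose(L @ U, A0)
```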

Homework 12.13 Derive an algorithm for solving $U x = y$, overwriting $y$ with the solution, that casts most computation in terms of DOT products. Hint: Partition
\[
U \rightarrow \left( \begin{array}{c c} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{array} \right) .
\]
Call this Variant 1 and use Figure 12.6 to state the algorithm.

Algorithm: Solve $U z = y$, overwriting $y$ (Variants 1 and 2)

Partition $U \rightarrow \left( \begin{array}{c c} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{array} \right)$, $y \rightarrow \left( \begin{array}{c} y_T \\ y_B \end{array} \right)$
where $U_{BR}$ is $0 \times 0$, $y_B$ has $0$ rows
while $m(U_{BR}) < m(U)$ do
  Repartition
  $\left( \begin{array}{c c} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{array} \right)$,
  $\left( \begin{array}{c} y_T \\ y_B \end{array} \right) \rightarrow \left( \begin{array}{c} y_0 \\ \psi_1 \\ y_2 \end{array} \right)$
  where $\upsilon_{11}$ is $1 \times 1$, $\psi_1$ has $1$ row

  Variant 1: $\psi_1 := \psi_1 - u_{12}^T y_2$; $\psi_1 := \psi_1 / \upsilon_{11}$
  Variant 2: $\psi_1 := \psi_1 / \upsilon_{11}$; $y_0 := y_0 - \psi_1 u_{01}$

  Continue with ...
endwhile

Figure 12.6 (Answer): Algorithms for the solution of upper triangular system $U x = y$ that overwrite $y$ with $x$.

Answer:
Partition
\[
U \rightarrow \left( \begin{array}{c c} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{array} \right) , \quad
x \rightarrow \left( \begin{array}{c} \chi_1 \\ x_2 \end{array} \right) , \quad
y \rightarrow \left( \begin{array}{c} \psi_1 \\ y_2 \end{array} \right) .
\]
Multiplying this out yields
\[
\left( \begin{array}{c} \upsilon_{11} \chi_1 + u_{12}^T x_2 \\ U_{22} x_2 \end{array} \right) = \left( \begin{array}{c} \psi_1 \\ y_2 \end{array} \right) .
\]
So, if we assume that $x_2$ has already been computed and has overwritten $y_2$, then $\chi_1$ can be computed as
\[
\chi_1 = ( \psi_1 - u_{12}^T x_2 ) / \upsilon_{11} ,
\]
which can then overwrite $\psi_1$. The resulting algorithm is given in Figure 12.6 (Answer) (left).
* BACK TO TEXT

Homework 12.14 Derive an algorithm for solving $U x = y$, overwriting $y$ with the solution, that casts most computation in terms of AXPY operations. Call this Variant 2 and use Figure 12.6 to state the algorithm.
Answer: Partition
\[
U \rightarrow \left( \begin{array}{c c} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{array} \right) , \quad
x \rightarrow \left( \begin{array}{c} x_0 \\ \chi_1 \end{array} \right) , \quad
y \rightarrow \left( \begin{array}{c} y_0 \\ \psi_1 \end{array} \right) .
\]
Multiplying this out yields
\[
\left( \begin{array}{c} U_{00} x_0 + u_{01} \chi_1 \\ \upsilon_{11} \chi_1 \end{array} \right) = \left( \begin{array}{c} y_0 \\ \psi_1 \end{array} \right) .
\]
So, $\chi_1 = \psi_1 / \upsilon_{11}$, after which $x_0$ can be computed by solving $U_{00} x_0 = y_0 - \chi_1 u_{01}$. The resulting algorithm is given in Figure 12.6 (Answer) (right).
* BACK TO TEXT
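Both variants can be sketched in NumPy (illustration only; function names are chosen here). Variant 1 computes each $\chi_1$ with a dot product against the already-computed tail; Variant 2 divides first and then applies an AXPY update to $y_0$:

```python
import numpy as np

def back_subst_dot(U, y):
    """Variant 1: psi1 := (psi1 - u12^T y2)/upsilon11, sweeping bottom-up."""
    x = y.astype(float).copy()
    for i in range(U.shape[0] - 1, -1, -1):
        x[i] = (x[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

def back_subst_axpy(U, y):
    """Variant 2: psi1 := psi1/upsilon11, then y0 := y0 - psi1 * u01."""
    x = y.astype(float).copy()
    for i in range(U.shape[0] - 1, -1, -1):
        x[i] /= U[i, i]
        x[:i] -= x[i] * U[:i, i]
    return x

rng = np.random.default_rng(0)
U = np.triu(rng.standard_normal((6, 6)))
np.fill_diagonal(U, np.abs(np.diag(U)) + 1.0)  # keep U well conditioned
y = rng.standard_normal(6)
assert np.allclose(back_subst_dot(U, y), np.linalg.solve(U, y))
assert np.allclose(back_subst_axpy(U, y), np.linalg.solve(U, y))
```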
Homework 12.15 If $A$ is an $n \times n$ matrix, show that the cost of Variant 1 is approximately $\frac{2}{3} n^3$ flops.
Answer: During the $k$th iteration, $L_{00}$ is $k \times k$, for $k = 0, \ldots, n-1$. Then the (approximate) cost of the steps are given by

* Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result. Cost: $k^2$ flops.
* Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T ( l_{10}^T )^T = ( a_{10}^T )^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result. Cost: $k^2$ flops.
* Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result. Cost: $2 k$ flops.

Thus, the total cost is given by
\[
\sum_{k=0}^{n-1} \left( k^2 + k^2 + 2 k \right) \approx 2 \sum_{k=0}^{n-1} k^2 \approx \frac{2}{3} n^3 .
\]
* BACK TO TEXT

Homework 12.16 Derive the up-looking variant for computing the LU factorization.
Answer: Consider the loop invariant:
\[
\left( \begin{array}{c c} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{array} \right)
= \left( \begin{array}{c c} L \backslash U_{TL} & U_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{array} \right)
\wedge L_{TL} U_{TL} = \hat A_{TL}
\wedge L_{TL} U_{TR} = \hat A_{TR} .
\]
At the top of the loop, after repartitioning, $A$ contains
\[
\left( \begin{array}{c c c} L \backslash U_{00} & u_{01} & U_{02} \\ \hat a_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\ \hat A_{20} & \hat a_{21} & \hat A_{22} \end{array} \right) ,
\]
while at the bottom it must contain
\[
\left( \begin{array}{c c c} L \backslash U_{00} & u_{01} & U_{02} \\ l_{10}^T & \upsilon_{11} & u_{12}^T \\ \hat A_{20} & \hat a_{21} & \hat A_{22} \end{array} \right) ,
\]
where $l_{10}^T$, $\upsilon_{11}$, and $u_{12}^T$ are to be computed. Now, considering $L U = \hat A$, we notice that
\[
\begin{array}{l l l}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}
\end{array}
\]
The equalities in the middle row can be used to compute the desired parts of $L$ and $U$:

* Solve $l_{10}^T U_{00} = a_{10}^T$ for $l_{10}^T$, overwriting $a_{10}^T$ with the result.
* Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
* Compute $u_{12}^T := a_{12}^T - l_{10}^T U_{02}$, overwriting $a_{12}^T$ with the result.
* BACK TO TEXT

Homework 12.17 Implement all five LU factorization algorithms with the FLAME@lab API, in Mscript.
* BACK TO TEXT


Homework 12.18 Which of the five variants can be modified to incorporate partial pivoting?
Answer: Variants 2, 4, and 5.
* BACK TO TEXT

Homework 12.23 Apply LU with partial pivoting to
\[
A = \left( \begin{array}{r r r r r r}
1 & 0 & \cdots & 0 & 0 & 1 \\
-1 & 1 & \cdots & 0 & 0 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
-1 & -1 & \cdots & 1 & 0 & 1 \\
-1 & -1 & \cdots & -1 & 1 & 1 \\
-1 & -1 & \cdots & -1 & -1 & 1
\end{array} \right) .
\]
Pivot only when necessary.
Answer: Notice that no pivoting is necessary: Eliminating the entries below the diagonal in the first column yields:
\[
\left( \begin{array}{r r r r r}
1 & 0 & \cdots & 0 & 1 \\
0 & 1 & \cdots & 0 & 2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & -1 & \cdots & 1 & 2 \\
0 & -1 & \cdots & -1 & 2
\end{array} \right) .
\]
Eliminating the entries below the diagonal in the second column yields:
\[
\left( \begin{array}{r r r r r}
1 & 0 & \cdots & 0 & 1 \\
0 & 1 & \cdots & 0 & 2 \\
0 & 0 & \cdots & 0 & 4 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & 4
\end{array} \right) .
\]
Continuing in this fashion, eliminating the entries below the diagonal in the $(n-1)$st column yields:
\[
\left( \begin{array}{c c c c c c}
1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 2 \\
0 & 0 & 1 & \cdots & 0 & 4 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 2^{n-2} \\
0 & 0 & 0 & \cdots & 0 & 2^{n-1}
\end{array} \right) .
\]
* BACK TO TEXT
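The element growth can be watched numerically (a NumPy sketch; `growth_matrix` is a name chosen here). The assertion inside the loop confirms that partial pivoting never needs to swap, since each pivot already has the largest magnitude in its column:

```python
import numpy as np

def growth_matrix(n):
    """Unit diagonal, -1 below the diagonal, last column all ones."""
    A = np.tril(-np.ones((n, n)), -1) + np.eye(n)
    A[:, -1] = 1.0
    return A

n = 6
U = growth_matrix(n)
for k in range(n - 1):
    # Partial pivoting would not swap: the pivot already has maximal magnitude.
    assert abs(U[k, k]) == np.max(np.abs(U[k:, k]))
    U[k+1:, :] -= np.outer(U[k+1:, k] / U[k, k], U[k, :])

# The last column of U grows as 1, 2, 4, ..., 2^(n-1):
assert np.allclose(U[:, -1], 2.0 ** np.arange(n))
```

This is the classic example showing that the growth factor of LU with partial pivoting can be as large as $2^{n-1}$.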

Homework 12.24 Let $L$ be a unit lower triangular matrix partitioned as
\[
L = \left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & 1 \end{array} \right) .
\]
Show that
\[
L^{-1} = \left( \begin{array}{c c} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{array} \right) .
\]
Answer:
Answer 1:
\[
\left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & 1 \end{array} \right)
\left( \begin{array}{c c} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{array} \right)
= \left( \begin{array}{c c} L_{00} L_{00}^{-1} & 0 \\ l_{10}^T L_{00}^{-1} - l_{10}^T L_{00}^{-1} & 1 \end{array} \right)
= \left( \begin{array}{c c} I & 0 \\ 0 & 1 \end{array} \right) .
\]
Answer 2: We know that the inverse of a unit lower triangular matrix is unit lower triangular. Let's call the inverse matrix $X$. Then $L X = I$ and hence
\[
\left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & 1 \end{array} \right)
\left( \begin{array}{c c} X_{00} & 0 \\ x_{10}^T & 1 \end{array} \right)
= \left( \begin{array}{c c} I & 0 \\ 0 & 1 \end{array} \right) .
\]
Now, this means that

* $L_{00} X_{00} = I$ and hence $X_{00} = L_{00}^{-1}$.
* $l_{10}^T X_{00} + x_{10}^T = 0$ and hence $x_{10}^T = -l_{10}^T L_{00}^{-1}$.
* BACK TO TEXT


Algorithm: $[L] := \mbox{LINV\_UNB\_VAR1}(L)$

Partition $L \rightarrow \left( \begin{array}{c c} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{array} \right)$
where $L_{TL}$ is $0 \times 0$
while $m(L_{TL}) < m(L)$ do
  Repartition
  $\left( \begin{array}{c c} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{array} \right)$
  where $\lambda_{11}$ is $1 \times 1$

  $l_{10}^T := -l_{10}^T L_{00}$

  Continue with ...
endwhile

Figure 12.8: Algorithm for inverting a unit lower triangular matrix. The algorithm assumes that in previous iterations $L_{00}$ has been overwritten with $L_{00}^{-1}$.

Homework 12.25 Use Homework 12.24 and Figure 12.12 to propose an algorithm for overwriting a unit lower triangular matrix $L$ with its inverse.
Answer: See Figure 12.8.
* BACK TO TEXT
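The algorithm of Figure 12.8 is short enough to state in NumPy as well (a sketch; `unit_ltri_inv` is a name chosen here). Row $k$ is overwritten with $-l_{10}^T L_{00}$, where the leading $k \times k$ block already holds the inverse:

```python
import numpy as np

def unit_ltri_inv(L):
    """Invert a unit lower triangular matrix, sweeping top-left to
    bottom-right: row k becomes -l10^T * L00, where the leading k x k
    block of the result already holds inv(L00) (Figure 12.8)."""
    L = L.copy()
    for k in range(L.shape[0]):
        L[k, :k] = -L[k, :k] @ L[:k, :k]
    return L

rng = np.random.default_rng(0)
L0 = np.tril(rng.standard_normal((5, 5)), -1) + np.eye(5)
assert np.allclose(unit_ltri_inv(L0), np.linalg.inv(L0))
```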

Homework 12.26 Show that the cost of the algorithm in Homework 12.25 is, approximately, $\frac{1}{3} n^3$ flops, where $L$ is $n \times n$.
Answer: Assume that $L_{00}$ is $k \times k$. Then the cost of the current iteration is, approximately, $k^2$ flops (the cost of multiplying a vector times a lower triangular matrix). Thus, the total cost is, approximately,
\[
\sum_{k=0}^{n-1} k^2 \approx \int_0^n x^2 dx = \frac{1}{3} n^3 .
\]
* BACK TO TEXT

Homework 12.27 Given $n \times n$ upper triangular matrix $U$ and $n \times n$ unit lower triangular matrix $L$, propose an algorithm for computing $Y$ where $U Y = L$. Be sure to take advantage of zeros in $U$ and $L$. The cost of the resulting algorithm should be, approximately, $n^3$ flops.
There is a way of, given that $L$ and $U$ have overwritten matrix $A$, overwriting this matrix with $Y$. Try to find that algorithm. It is not easy...
Answer:
Variant 1
Partition
\[
U \rightarrow \left( \begin{array}{c c} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{array} \right) , \quad
L \rightarrow \left( \begin{array}{c c} 1 & 0 \\ l_{21} & L_{22} \end{array} \right) , \quad \mathrm{and} \quad
Y \rightarrow \left( \begin{array}{c c} \psi_{11} & y_{12}^T \\ y_{21} & Y_{22} \end{array} \right) .
\]
Then, since $U Y = L$,
\[
\left( \begin{array}{c c} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{array} \right)
\left( \begin{array}{c c} \psi_{11} & y_{12}^T \\ y_{21} & Y_{22} \end{array} \right)
= \left( \begin{array}{c c} \upsilon_{11} \psi_{11} + u_{12}^T y_{21} & \upsilon_{11} y_{12}^T + u_{12}^T Y_{22} \\ U_{22} y_{21} & U_{22} Y_{22} \end{array} \right)
= \left( \begin{array}{c c} 1 & 0 \\ l_{21} & L_{22} \end{array} \right) .
\]
This suggests the following steps. We also give the approximate cost of each step when $U_{22}$ is $k \times k$. Assume that $Y_{22}$ has already been computed.

* Solve $U_{22} y_{21} = l_{21}$. Cost: $k^2$ flops.
* $y_{12}^T := -u_{12}^T Y_{22} / \upsilon_{11}$. Cost: $2 k^2$ flops.
* $\psi_{11} := ( 1 - u_{12}^T y_{21} ) / \upsilon_{11}$. Cost: negligible.

The total cost is then, approximately,
\[
\sum_{k=0}^{n-1} 3 k^2 \approx \int_0^n 3 x^2 dx = n^3 .
\]
The algorithm is given in Figure 12.9.
* BACK TO TEXT

Algorithm: $[Y] := \mbox{SOLVE\_UY\_EQ\_L\_UNB\_VAR1}(U, L, Y)$

Partition $U \rightarrow \left( \begin{array}{c c} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{array} \right)$, $L \rightarrow \left( \begin{array}{c c} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{array} \right)$, $Y \rightarrow \left( \begin{array}{c c} Y_{TL} & Y_{TR} \\ Y_{BL} & Y_{BR} \end{array} \right)$
where $U_{BR}$, $L_{BR}$, and $Y_{BR}$ are $0 \times 0$
while $m(U_{BR}) < m(U)$ do
  Repartition
  $\left( \begin{array}{c c} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{array} \right)$,
  $\left( \begin{array}{c c} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{array} \right)$,
  $\left( \begin{array}{c c} Y_{TL} & Y_{TR} \\ Y_{BL} & Y_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} Y_{00} & y_{01} & Y_{02} \\ y_{10}^T & \psi_{11} & y_{12}^T \\ Y_{20} & y_{21} & Y_{22} \end{array} \right)$
  where $\upsilon_{11}$, $\lambda_{11}$, and $\psi_{11}$ are $1 \times 1$

  Solve $U_{22} y_{21} = l_{21}$
  $y_{12}^T := -u_{12}^T Y_{22} / \upsilon_{11}$
  $\psi_{11} := ( 1 - u_{12}^T y_{21} ) / \upsilon_{11}$

  Continue with ...
endwhile

Figure 12.9: Algorithm for solving $U Y = L$.
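A direct (if unoptimized) NumPy transcription of Variant 1, sweeping the partitioning from the bottom-right corner (`solve_UY_eq_L` is a name chosen here; each $y_{21}$ is recomputed with a full triangular solve, matching the $k^2$ flop count per iteration):

```python
import numpy as np

def solve_UY_eq_L(U, L):
    """Solve U Y = L (U upper triangular, L unit lower triangular), sweeping
    the partitioning from the bottom-right corner as in Variant 1."""
    n = U.shape[0]
    Y = np.zeros((n, n))
    for k in range(n - 1, -1, -1):
        if k < n - 1:
            Y[k+1:, k] = np.linalg.solve(U[k+1:, k+1:], L[k+1:, k])  # U22 y21 = l21
        Y[k, k+1:] = -(U[k, k+1:] @ Y[k+1:, k+1:]) / U[k, k]  # y12^T := -u12^T Y22/ups11
        Y[k, k] = (1.0 - U[k, k+1:] @ Y[k+1:, k]) / U[k, k]   # psi11 := (1 - u12^T y21)/ups11
    return Y

rng = np.random.default_rng(0)
n = 5
U = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(U, np.abs(np.diag(U)) + 1.0)            # keep U well conditioned
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)
assert np.allclose(U @ solve_UY_eq_L(U, L), L)
```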

Chapter 13. Notes on Cholesky Factorization (Answers)

Homework 13.3 Let $B \in \mathbb{C}^{m \times n}$ have linearly independent columns. Prove that $A = B^H B$ is HPD.
Answer: Clearly $A$ is Hermitian: $A^H = ( B^H B )^H = B^H B = A$. Let $x \in \mathbb{C}^n$ be a nonzero vector. Then $x^H B^H B x = ( B x )^H ( B x )$. Since $B$ has linearly independent columns we know that $B x \neq 0$. Hence $( B x )^H B x > 0$.
* BACK TO TEXT

Homework 13.4 Let $A \in \mathbb{C}^{m \times m}$ be HPD. Show that its diagonal elements are real and positive.
Answer: Let $e_j$ be the $j$th unit basis vector. Then $0 < e_j^H A e_j = \alpha_{j,j}$.
* BACK TO TEXT

Homework 13.14 Implement the Cholesky factorization with M-script.


* BACK TO TEXT

Homework 13.15 Consider $B \in \mathbb{C}^{m \times n}$ with linearly independent columns. Recall that $B$ has a QR factorization, $B = Q R$, where $Q$ has orthonormal columns and $R$ is an upper triangular matrix with positive diagonal elements. How are the Cholesky factorization of $B^H B$ and the QR factorization of $B$ related?
Answer:
\[
B^H B = ( Q R )^H Q R = R^H \underbrace{Q^H Q}_{I} R = \underbrace{R^H}_{L} \underbrace{R}_{L^H} ,
\]
so the Cholesky factor of $B^H B$ is $L = R^H$.
* BACK TO TEXT

Homework 13.16 Let $A$ be SPD and partition
\[
A \rightarrow \left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right) .
\]
(Hint: For this exercise, use techniques similar to those in Section 13.5.)

1. Show that $A_{00}$ is SPD.
Answer: Assume that $A$ is $n \times n$ so that $A_{00}$ is $(n-1) \times (n-1)$. Let $x_0 \in \mathbb{R}^{n-1}$ be a nonzero vector. Then
\[
x_0^T A_{00} x_0 = \left( \begin{array}{c} x_0 \\ 0 \end{array} \right)^T \left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right) \left( \begin{array}{c} x_0 \\ 0 \end{array} \right) > 0
\]
since $\left( \begin{array}{c} x_0 \\ 0 \end{array} \right)$ is a nonzero vector and $A$ is SPD.

2. Assuming that $A_{00} = L_{00} L_{00}^T$, where $L_{00}$ is lower triangular and nonsingular, argue that the assignment $l_{10}^T := a_{10}^T L_{00}^{-T}$ is well-defined.
Answer: The computation is well-defined because $L_{00}^{-T}$ exists, and hence $l_{10}^T := a_{10}^T L_{00}^{-T}$ uniquely computes $l_{10}^T$.

3. Assuming that $A_{00}$ is SPD, $A_{00} = L_{00} L_{00}^T$ where $L_{00}$ is lower triangular and nonsingular, and $l_{10}^T = a_{10}^T L_{00}^{-T}$, show that $\alpha_{11} - l_{10}^T l_{10} > 0$ so that $\lambda_{11} := \sqrt{\alpha_{11} - l_{10}^T l_{10}}$ is well-defined.
Answer: We want to show that $\alpha_{11} - l_{10}^T l_{10} > 0$. To do so, we are going to construct a nonzero vector $x$ so that $x^T A x = \alpha_{11} - l_{10}^T l_{10}$, at which point we can invoke the fact that $A$ is SPD.
Rather than just giving the answer, we go through the steps that give insight into the thought process leading up to the answer as well. Consider
\[
\left( \begin{array}{c} x_0 \\ \chi_1 \end{array} \right)^T \left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right) \left( \begin{array}{c} x_0 \\ \chi_1 \end{array} \right)
= x_0^T A_{00} x_0 + \chi_1 x_0^T a_{10} + \chi_1 a_{10}^T x_0 + \alpha_{11} \chi_1^2 .
\]
Since we are trying to get to $\alpha_{11} - l_{10}^T l_{10}$, perhaps we should pick $\chi_1 = 1$. Then
\[
\left( \begin{array}{c} x_0 \\ 1 \end{array} \right)^T \left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right) \left( \begin{array}{c} x_0 \\ 1 \end{array} \right)
= x_0^T A_{00} x_0 + x_0^T a_{10} + a_{10}^T x_0 + \alpha_{11} .
\]
The question now becomes how to pick $x_0$ so that
\[
x_0^T A_{00} x_0 + x_0^T a_{10} + a_{10}^T x_0 = -l_{10}^T l_{10} .
\]
Let's try $x_0 = -L_{00}^{-T} l_{10}$ and recall that $l_{10}^T = a_{10}^T L_{00}^{-T}$ so that $L_{00} l_{10} = a_{10}$. Then
\begin{eqnarray*}
\lefteqn{ x_0^T A_{00} x_0 + x_0^T a_{10} + a_{10}^T x_0 } \\
& = & ( -L_{00}^{-T} l_{10} )^T L_{00} L_{00}^T ( -L_{00}^{-T} l_{10} ) + ( -L_{00}^{-T} l_{10} )^T ( L_{00} l_{10} ) + ( L_{00} l_{10} )^T ( -L_{00}^{-T} l_{10} ) \\
& = & l_{10}^T L_{00}^{-1} L_{00} L_{00}^T L_{00}^{-T} l_{10} - l_{10}^T L_{00}^{-1} L_{00} l_{10} - l_{10}^T L_{00}^T L_{00}^{-T} l_{10} \\
& = & l_{10}^T l_{10} - l_{10}^T l_{10} - l_{10}^T l_{10} = -l_{10}^T l_{10} .
\end{eqnarray*}
We now put all these insights together:
\[
0 < \left( \begin{array}{c} -L_{00}^{-T} l_{10} \\ 1 \end{array} \right)^T \left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right) \left( \begin{array}{c} -L_{00}^{-T} l_{10} \\ 1 \end{array} \right)
= \alpha_{11} - l_{10}^T l_{10} .
\]

4. Show that
\[
\left( \begin{array}{c c} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{array} \right)
= \left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{array} \right)
\left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{array} \right)^T .
\]
Answer: This is just a matter of multiplying it all out.
* BACK TO TEXT

Algorithm: $A := \mbox{CHOL\_UNB}(A)$ (bordered algorithm)

Partition $A \rightarrow \left( \begin{array}{c c} A_{TL} & \star \\ A_{BL} & A_{BR} \end{array} \right)$
where $A_{TL}$ is $0 \times 0$
while $m(A_{TL}) < m(A)$ do
  Repartition
  $\left( \begin{array}{c c} A_{TL} & \star \\ A_{BL} & A_{BR} \end{array} \right) \rightarrow \left( \begin{array}{c c c} A_{00} & \star & \star \\ a_{10}^T & \alpha_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{array} \right)$
  where $\alpha_{11}$ is $1 \times 1$

  ($A_{00} = L_{00} L_{00}^T$ has already been computed and $L_{00}$ has overwritten the lower triangular part of $A_{00}$)

  $a_{10}^T := a_{10}^T L_{00}^{-T}$ (Solve $L_{00} l_{10} = a_{10}$, overwriting $a_{10}^T$ with $l_{10}^T$.)
  $\alpha_{11} := \alpha_{11} - a_{10}^T a_{10}$
  $\alpha_{11} := \sqrt{\alpha_{11}}$

  Continue with ...
endwhile

Figure 13.10: Answer for Homework 13.17.

Homework 13.17 Use the results in the last exercise to give an alternative proof by induction of the Cholesky Factorization Theorem and to give an algorithm for computing it by filling in Figure 13.3. This algorithm is often referred to as the bordered Cholesky factorization algorithm.
Answer:
Proof by induction.

Base case: $n = 1$. Clearly the result is true for a $1 \times 1$ matrix $A = \alpha_{11}$: In this case, the fact that $A$ is SPD means that $\alpha_{11}$ is real and positive and a Cholesky factor is then given by $\lambda_{11} = \sqrt{\alpha_{11}}$, with uniqueness if we insist that $\lambda_{11}$ is positive.

Inductive step: Inductive Hypothesis (I.H.): Assume the result is true for SPD matrix $A \in \mathbb{C}^{(n-1) \times (n-1)}$. We will show that it holds for $A \in \mathbb{C}^{n \times n}$. Let $A \in \mathbb{C}^{n \times n}$ be SPD. Partition $A$ and $L$ like
\[
A = \left( \begin{array}{c c} A_{00} & \star \\ a_{10}^T & \alpha_{11} \end{array} \right)
\quad \mathrm{and} \quad
L = \left( \begin{array}{c c} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{array} \right) .
\]
Assume that $A_{00} = L_{00} L_{00}^T$ is the Cholesky factorization of $A_{00}$. By the I.H., we know this exists since $A_{00}$ is $(n-1) \times (n-1)$, and that $L_{00}$ is nonsingular. By the previous homework, we then know that

* $l_{10}^T = a_{10}^T L_{00}^{-T}$ is well-defined,
* $\lambda_{11} = \sqrt{\alpha_{11} - l_{10}^T l_{10}}$ is well-defined, and
* $A = L L^T$.

Hence $L$ is the desired Cholesky factor of $A$.
By the principle of mathematical induction, the theorem holds.
The algorithm is given in Figure 13.10.
* BACK TO TEXT
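The bordered algorithm of Figure 13.10 in NumPy (a sketch; `chol_bordered` is a name chosen here). Since the Cholesky factor with positive diagonal is unique, the result must match `np.linalg.cholesky`:

```python
import numpy as np

def chol_bordered(A):
    """Bordered Cholesky (Figure 13.10): when row k is reached, the leading
    k x k block of L already holds L00; compute l10^T by a triangular solve,
    then the new diagonal entry lambda11."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        if k > 0:
            L[k, :k] = np.linalg.solve(L[:k, :k], A[k, :k])  # L00 l10 = a10
        L[k, k] = np.sqrt(A[k, k] - L[k, :k] @ L[k, :k])     # lambda11
    return L

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                  # SPD
assert np.allclose(chol_bordered(A), np.linalg.cholesky(A))
```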

Homework 13.18 Show that the cost of the bordered algorithm is, approximately, $\frac{1}{3} n^3$ flops.
Answer: During the $k$th iteration, $k = 0, \ldots, n-1$, assume that $A_{00}$ is $k \times k$. Then

* $a_{10}^T := a_{10}^T L_{00}^{-T}$ (implemented as the lower triangular solve $L_{00} l_{10} = a_{10}$) requires, approximately, $k^2$ flops.
* The cost of updating $\alpha_{11}$ can be ignored.

Thus, the total cost equals (approximately)
\[
\sum_{k=0}^{n-1} k^2 \approx \int_0^n x^2 dx = \frac{1}{3} n^3 .
\]
* BACK TO TEXT


Chapter 14. Notes on Eigenvalues and Eigenvectors (Answers)


Homework 14.3 Eigenvectors are not unique.
Answer: If $A x = \lambda x$ for $x \neq 0$, then $A ( \alpha x ) = \alpha A x = \alpha \lambda x = \lambda ( \alpha x )$. Hence any (nonzero) scalar multiple of $x$ is also an eigenvector. This demonstrates we care about the direction of an eigenvector rather than its length.
* BACK TO TEXT

Homework 14.4 Let $\lambda$ be an eigenvalue of $A$ and let $E_\lambda(A) = \{ x \in \mathbb{C}^m \,|\, A x = \lambda x \}$ denote the set of all eigenvectors of $A$ associated with $\lambda$ (including the zero vector, which is not really considered an eigenvector). Show that this set is a (nontrivial) subspace of $\mathbb{C}^m$.
Answer:

* $0 \in E_\lambda(A)$ (since we explicitly include it).
* $x \in E_\lambda(A)$ implies $\alpha x \in E_\lambda(A)$ (by the last exercise).
* $x, y \in E_\lambda(A)$ implies $x + y \in E_\lambda(A)$: $A ( x + y ) = A x + A y = \lambda x + \lambda y = \lambda ( x + y )$.
* BACK TO TEXT

Homework 14.8 The eigenvalues of a diagonal matrix equal the values on its diagonal. The eigenvalues of a triangular matrix equal the values on its diagonal.
Answer: Since a diagonal matrix is a special case of a triangular matrix, it suffices to prove that the eigenvalues of a triangular matrix are the values on its diagonal.
If $A$ is triangular, so is $A - \lambda I$. By definition, $\lambda$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is singular. But a triangular matrix is singular if and only if it has a zero on its diagonal. The triangular matrix $A - \lambda I$ has a zero on its diagonal if and only if $\lambda$ equals one of the diagonal elements of $A$.
* BACK TO TEXT

Homework 14.19 Prove Lemma 14.18. Then generalize it to a result for block upper triangular matrices:
\[
A = \left( \begin{array}{c c c c} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ 0 & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & A_{N-1,N-1} \end{array} \right) .
\]
Answer:
Lemma 14.18 Let $A \in \mathbb{C}^{m \times m}$ be of form $A = \left( \begin{array}{c c} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{array} \right)$. Assume that $Q_{TL}$ and $Q_{BR}$ are unitary of appropriate size. Then
\[
A = \left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right)^H
\left( \begin{array}{c c} Q_{TL} A_{TL} Q_{TL}^H & Q_{TL} A_{TR} Q_{BR}^H \\ 0 & Q_{BR} A_{BR} Q_{BR}^H \end{array} \right)
\left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right) .
\]
Proof:
\begin{eqnarray*}
\lefteqn{
\left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right)^H
\left( \begin{array}{c c} Q_{TL} A_{TL} Q_{TL}^H & Q_{TL} A_{TR} Q_{BR}^H \\ 0 & Q_{BR} A_{BR} Q_{BR}^H \end{array} \right)
\left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right)
} \\
& = & \left( \begin{array}{c c} Q_{TL}^H Q_{TL} A_{TL} Q_{TL}^H Q_{TL} & Q_{TL}^H Q_{TL} A_{TR} Q_{BR}^H Q_{BR} \\ 0 & Q_{BR}^H Q_{BR} A_{BR} Q_{BR}^H Q_{BR} \end{array} \right)
= \left( \begin{array}{c c} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{array} \right) = A .
\end{eqnarray*}
By simple extension, $A = Q^H B Q$ where
\[
Q = \left( \begin{array}{c c c c} Q_{0,0} & 0 & \cdots & 0 \\ 0 & Q_{1,1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Q_{N-1,N-1} \end{array} \right)
\]
and
\[
B = \left( \begin{array}{c c c c}
Q_{0,0} A_{0,0} Q_{0,0}^H & Q_{0,0} A_{0,1} Q_{1,1}^H & \cdots & Q_{0,0} A_{0,N-1} Q_{N-1,N-1}^H \\
0 & Q_{1,1} A_{1,1} Q_{1,1}^H & \cdots & Q_{1,1} A_{1,N-1} Q_{N-1,N-1}^H \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Q_{N-1,N-1} A_{N-1,N-1} Q_{N-1,N-1}^H
\end{array} \right) .
\]
* BACK TO TEXT

Homework 14.21 Prove Corollary 14.20. Then generalize it to a result for block upper triangular matrices.
Answer:
Corollary 14.20 Let $A \in \mathbb{C}^{m \times m}$ be of form $A = \left( \begin{array}{c c} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{array} \right)$. Then $\Lambda(A) = \Lambda(A_{TL}) \cup \Lambda(A_{BR})$.
Proof: This follows immediately from Lemma 14.18. Let $A_{TL} = Q_{TL} T_{TL} Q_{TL}^H$ and $A_{BR} = Q_{BR} T_{BR} Q_{BR}^H$ be the Schur decompositions of the diagonal blocks. Then $\Lambda(A_{TL})$ equals the set of diagonal elements of $T_{TL}$ and $\Lambda(A_{BR})$ equals the set of diagonal elements of $T_{BR}$. Now, by Lemma 14.18, the Schur decomposition of $A$ is given by
\[
A = \left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right)
\underbrace{
\left( \begin{array}{c c} T_{TL} & Q_{TL}^H A_{TR} Q_{BR} \\ 0 & T_{BR} \end{array} \right)
}_{T_A}
\left( \begin{array}{c c} Q_{TL} & 0 \\ 0 & Q_{BR} \end{array} \right)^H .
\]
Hence the diagonal elements of $T_A$ (which is upper triangular) equal the elements of $\Lambda(A)$. The set of those elements is clearly $\Lambda(A_{TL}) \cup \Lambda(A_{BR})$.
The generalization is that if
\[
A = \left( \begin{array}{c c c c} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ 0 & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & A_{N-1,N-1} \end{array} \right) ,
\]
where the blocks on the diagonal are square, then
\[
\Lambda(A) = \Lambda(A_{0,0}) \cup \Lambda(A_{1,1}) \cup \cdots \cup \Lambda(A_{N-1,N-1}) .
\]
* BACK TO TEXT
Homework 14.24 Let $A$ be Hermitian and $\lambda$ and $\mu$ be distinct eigenvalues with eigenvectors $x_\lambda$ and $x_\mu$, respectively. Then $x_\lambda^H x_\mu = 0$. (In other words, the eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are orthogonal.)
Answer: Assume that $A x_\lambda = \lambda x_\lambda$ and $A x_\mu = \mu x_\mu$, for nonzero vectors $x_\lambda$ and $x_\mu$ and $\lambda \neq \mu$. Then
\[
x_\mu^H A x_\lambda = \lambda x_\mu^H x_\lambda
\quad \mathrm{and} \quad
x_\lambda^H A x_\mu = \mu x_\lambda^H x_\mu .
\]
Because $A$ is Hermitian and $\lambda$ is real,
\[
\mu x_\lambda^H x_\mu = x_\lambda^H A x_\mu = ( x_\mu^H A x_\lambda )^H = ( \lambda x_\mu^H x_\lambda )^H = \lambda x_\lambda^H x_\mu .
\]
Hence $\mu x_\lambda^H x_\mu = \lambda x_\lambda^H x_\mu$. If $x_\lambda^H x_\mu \neq 0$ then $\lambda = \mu$, which is a contradiction.
* BACK TO TEXT

Homework 14.25 Let $A \in \mathbb{C}^{m \times m}$ be a Hermitian matrix, $A = Q \Lambda Q^H$ its Spectral Decomposition, and $A = U \Sigma V^H$ its SVD. Relate $Q$, $U$, $V$, $\Lambda$, and $\Sigma$.
Answer: I am going to answer this by showing how to take the Spectral Decomposition of $A$, and turn this into the SVD of $A$. Observations:

* We will assume all eigenvalues are nonzero. It should be pretty obvious how to fix the below if some of them are zero.
* $Q$ is unitary and $\Lambda$ is diagonal. Thus, $A = Q \Lambda Q^H$ is the SVD except that the diagonal elements of $\Lambda$ are not necessarily nonnegative and they are not ordered from largest to smallest in magnitude.
* We can fix the fact that they are not nonnegative with the following observation:
\begin{eqnarray*}
A = Q \Lambda Q^H
& = & Q \, \mathrm{diag}( \lambda_0, \lambda_1, \ldots, \lambda_{m-1} ) \, Q^H \\
& = & Q \, \mathrm{diag}( | \lambda_0 |, | \lambda_1 |, \ldots, | \lambda_{m-1} | ) \, \mathrm{diag}( \mathrm{sign}( \lambda_0 ), \mathrm{sign}( \lambda_1 ), \ldots, \mathrm{sign}( \lambda_{m-1} ) ) \, Q^H \\
& = & Q \tilde\Lambda \tilde Q^H ,
\end{eqnarray*}
where $\tilde\Lambda = \mathrm{diag}( | \lambda_0 |, \ldots, | \lambda_{m-1} | )$ and $\tilde Q = Q \, \mathrm{diag}( \mathrm{sign}( \lambda_0 ), \ldots, \mathrm{sign}( \lambda_{m-1} ) )$. Now $\tilde\Lambda$ has nonnegative diagonal entries and $\tilde Q$ is unitary since it is the product of two unitary matrices.
* Next, we fix the fact that the entries of $\tilde\Lambda$ are not ordered from largest to smallest. We do so by noting that there is a permutation matrix $P$ such that $\Sigma = P \tilde\Lambda P^H$ equals the diagonal matrix with the values ordered from largest to smallest. Then
\[
A = Q \tilde\Lambda \tilde Q^H
= Q \underbrace{P^H P}_{I} \tilde\Lambda \underbrace{P^H P}_{I} \tilde Q^H
= \underbrace{Q P^H}_{U} \underbrace{P \tilde\Lambda P^H}_{\Sigma} \underbrace{P \tilde Q^H}_{V^H}
= U \Sigma V^H .
\]
* BACK TO TEXT
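The construction can be replayed numerically for a real symmetric matrix (a NumPy sketch; variable names are chosen here): fix the signs of the eigenvalues, permute to descending order, and the result is a valid SVD:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # real symmetric (Hermitian)

lam, Q = np.linalg.eigh(A)              # spectral decomposition: A = Q diag(lam) Q^T
D = np.sign(lam)                        # sign fix: A = Q |Lam| (Q D)^T
p = np.argsort(-np.abs(lam))            # permutation ordering |lam| descending
U = Q[:, p]                             # U = Q P^H
S = np.abs(lam)[p]                      # Sigma = P |Lam| P^H
V = (Q * D)[:, p]                       # V = (Q D) P^H

assert np.allclose(U @ np.diag(S) @ V.T, A)                # a valid SVD of A
assert np.allclose(S, np.linalg.svd(A, compute_uv=False))  # singular values match
```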

Homework 14.26 Let $A \in \mathbb{C}^{m \times m}$ and $A = U \Sigma V^H$ its SVD. Relate the Spectral decompositions of $A^H A$ and $A A^H$ to $U$, $V$, and $\Sigma$.
Answer:
\[
A^H A = ( U \Sigma V^H )^H U \Sigma V^H = V \Sigma U^H U \Sigma V^H = V \Sigma^2 V^H .
\]
Thus, the eigenvalues of $A^H A$ equal the squares of the singular values of $A$, and the columns of $V$ are the corresponding eigenvectors.
\[
A A^H = U \Sigma V^H ( U \Sigma V^H )^H = U \Sigma V^H V \Sigma U^H = U \Sigma^2 U^H .
\]
Thus, the eigenvalues of $A A^H$ equal the squares of the singular values of $A$, and the columns of $U$ are the corresponding eigenvectors.
* BACK TO TEXT


Chapter 15. Notes on the Power Method and Related Methods (Answers)
Homework 15.4 Prove Lemma 15.3.
Answer: We need to show that:

* Let $y \neq 0$. Show that $\| y \|_X > 0$: Let $z = X y$. Then $z \neq 0$ since $X$ is nonsingular. Hence $\| y \|_X = \| X y \| = \| z \| > 0$.
* Show that if $\alpha \in \mathbb{C}$ and $y \in \mathbb{C}^m$ then $\| \alpha y \|_X = | \alpha | \| y \|_X$:
\[
\| \alpha y \|_X = \| X ( \alpha y ) \| = \| \alpha X y \| = | \alpha | \| X y \| = | \alpha | \| y \|_X .
\]
* Show that if $x, y \in \mathbb{C}^m$ then $\| x + y \|_X \leq \| x \|_X + \| y \|_X$:
\[
\| x + y \|_X = \| X ( x + y ) \| = \| X x + X y \| \leq \| X x \| + \| X y \| = \| x \|_X + \| y \|_X .
\]
* BACK TO TEXT

Homework 15.7 Assume that
\[
| \lambda_0 | \geq | \lambda_1 | \geq \cdots \geq | \lambda_{m-2} | > | \lambda_{m-1} | > 0 .
\]
Show that
\[
\left| \frac{1}{\lambda_{m-1}} \right| > \left| \frac{1}{\lambda_{m-2}} \right| \geq \left| \frac{1}{\lambda_{m-3}} \right| \geq \cdots \geq \left| \frac{1}{\lambda_0} \right| .
\]
Answer: This follows immediately from the fact that if $\alpha > 0$ and $\beta > 0$ then

* $\alpha > \beta$ implies that $1/\beta > 1/\alpha$, and
* $\alpha \geq \beta$ implies that $1/\beta \geq 1/\alpha$.
* BACK TO TEXT

Homework 15.9 Prove Lemma 15.8.
Answer: If $A x = \lambda x$ then $( A - \mu I ) x = A x - \mu x = \lambda x - \mu x = ( \lambda - \mu ) x$.
* BACK TO TEXT

Homework 15.11 Prove Lemma 15.10.
Answer:
\[
A - \mu I = X \Lambda X^{-1} - \mu X X^{-1} = X ( \Lambda - \mu I ) X^{-1} .
\]
* BACK TO TEXT

Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers (Answers)
Homework 16.2 Prove that in Figure 16.3, V̂^(k) = V^(k) and Â^(k) = A^(k), k = 0, . . ..
Answer: This requires a proof by induction.
Base case: k = 0. We need to show that V̂^(0) = V^(0) and Â^(0) = A^(0). This is trivially true.
Inductive step: Inductive Hypothesis (IH): V̂^(k) = V^(k) and Â^(k) = A^(k).
We need to show that V̂^(k+1) = V^(k+1) and Â^(k+1) = A^(k+1). Notice that
A V̂^(k) = V̂^(k+1) R̂^(k+1)   (QR factorization)
so that
Â^(k) = V̂^(k)T A V̂^(k) = V̂^(k)T V̂^(k+1) R̂^(k+1).
Since V̂^(k)T V̂^(k+1) is unitary, this means that Â^(k) = ( V̂^(k)T V̂^(k+1) ) R̂^(k+1) is a QR factorization of Â^(k).
Now, by the I.H.,
A^(k) = Â^(k) = V̂^(k)T V̂^(k+1) R̂^(k+1)
and in the algorithm on the right
A^(k) = Q^(k+1) R^(k+1).
By the uniqueness of the QR factorization, this means that
Q^(k+1) = V̂^(k)T V̂^(k+1).
But then
V^(k+1) = V^(k) Q^(k+1) = V̂^(k) Q^(k+1) = V̂^(k) V̂^(k)T V̂^(k+1) = V̂^(k+1),
since V̂^(k) V̂^(k)T = I.
Finally,
Â^(k+1) = V̂^(k+1)T A V̂^(k+1) = V^(k+1)T A V^(k+1) = ( V^(k) Q^(k+1) )^T A ( V^(k) Q^(k+1) )
        = Q^(k+1)T V^(k)T A V^(k) Q^(k+1) = Q^(k+1)T A^(k) Q^(k+1)
        = Q^(k+1)T Q^(k+1) R^(k+1) Q^(k+1) = R^(k+1) Q^(k+1) = A^(k+1).
By the Principle of Mathematical Induction the result holds for all k.
* BACK TO TEXT
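The equivalence proved above can be watched in action. A NumPy sketch of the unshifted QR algorithm (illustrative only, and run on a matrix with well-separated eigenvalues so the iteration converges quickly):

```python
import numpy as np

# One pass of the unshifted QR algorithm: A^(k+1) = R^(k) Q^(k), with V
# accumulating the transformations. Each iterate stays similar to A0, and
# V^(k)^T A0 V^(k) reproduces the current iterate.
rng = np.random.default_rng(2)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A0 = Q0 @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q0.T   # known eigenvalues
A = A0.copy()
V = np.eye(4)

for _ in range(60):
    Q, R = np.linalg.qr(A)
    A = R @ Q                  # similarity transform: Q^T A Q
    V = V @ Q                  # accumulate the transformations

# Similarity: V^T A0 V equals the current iterate
assert np.allclose(V.T @ A0 @ V, A, atol=1e-8)
# The diagonal converges to the eigenvalues of A0
assert np.allclose(np.sort(np.diag(A)), [0.5, 1.0, 2.0, 4.0], atol=1e-6)
print("QR iterates remain similar to A0")
```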

Homework 16.3 Prove that in Figure 16.4, V̂^(k) = V^(k) and Â^(k) = A^(k), k = 0, . . ..
Answer: This requires a proof by induction.
Base case: k = 0. We need to show that V̂^(0) = V^(0) and Â^(0) = A^(0). This is trivially true.
Inductive step: Inductive Hypothesis (IH): V̂^(k) = V^(k) and Â^(k) = A^(k).
We need to show that V̂^(k+1) = V^(k+1) and Â^(k+1) = A^(k+1).
The proof of the inductive step is a minor modification of the last proof, except that we also need to show
that μ̂_k = μ_k:
μ̂_k = v̂_{n−1}^(k)T A v̂_{n−1}^(k) = ( V̂^(k) e_{n−1} )^T A ( V̂^(k) e_{n−1} ) = e_{n−1}^T V̂^(k)T A V̂^(k) e_{n−1}
    = e_{n−1}^T Â^(k) e_{n−1} = e_{n−1}^T A^(k) e_{n−1}   (by the I.H.)
    = α_{n−1,n−1}^(k) = μ_k.
By the Principle of Mathematical Induction the result holds for all k.


* BACK TO TEXT

Homework 16.5 Prove the above theorem.


Answer: I believe we already proved this in Notes on Eigenvalues and Eigenvectors.
* BACK TO TEXT

Homework 16.9 Give all the details of the above proof.
Answer: Assume that q_1, . . . , q_k and the column indexed with k − 1 of B have been shown to be uniquely
determined under the stated assumptions. We now show that then q_{k+1} and the column indexed by k of B
are uniquely determined. (This is the inductive step in the proof.) Then
A q_k = β_{0,k} q_0 + β_{1,k} q_1 + · · · + β_{k,k} q_k + β_{k+1,k} q_{k+1}.
We can determine β_{0,k} through β_{k,k} by observing that
q_j^T A q_k = β_{j,k}
for j = 0, . . . , k. Then
β_{k+1,k} q_{k+1} = A q_k − ( β_{0,k} q_0 + β_{1,k} q_1 + · · · + β_{k,k} q_k ) =: q̃_{k+1}.
Since it is assumed that β_{k+1,k} > 0, it can be determined as
β_{k+1,k} = ‖q̃_{k+1}‖_2
and then
q_{k+1} = q̃_{k+1}/β_{k+1,k}.
This way, the columns of Q and B can be determined, one by one.
* BACK TO TEXT
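The inductive step above is a single orthogonalize-and-normalize computation. A hypothetical NumPy illustration (names and dimensions chosen for the example, not taken from the notes):

```python
import numpy as np

# Given orthonormal q_0,...,q_k, the next column of B and q_{k+1} are
# determined by orthogonalizing A q_k against the previous vectors and
# normalizing the remainder.
rng = np.random.default_rng(3)
n, k = 6, 2
A = rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, k + 1)))   # columns q_0,...,q_k

v = A @ Q[:, k]
betas = Q.T @ v                  # beta_{j,k} = q_j^T A q_k, j = 0,...,k
q_tilde = v - Q @ betas          # component orthogonal to q_0,...,q_k
beta_next = np.linalg.norm(q_tilde)
q_next = q_tilde / beta_next

# q_{k+1} is unit length, orthogonal to previous columns, and reconstructs A q_k
assert np.allclose(Q.T @ q_next, 0, atol=1e-12)
assert np.isclose(np.linalg.norm(q_next), 1.0)
assert np.allclose(Q @ betas + beta_next * q_next, v)
print("next column determined uniquely")
```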

Homework 16.11 In the above discussion, show that α_{11}² + 2α_{31}² + α_{33}² = α̂_{11}² + α̂_{33}².
Answer: If Â_{31} = J_{31} A_{31} J_{31}^T then ‖Â_{31}‖_F = ‖A_{31}‖_F because multiplying on the left and/or the right by a
unitary matrix preserves the Frobenius norm of a matrix. Hence
‖ ( α̂_{11}  0 ; 0  α̂_{33} ) ‖_F² = ‖ ( α_{11}  α_{31} ; α_{31}  α_{33} ) ‖_F².
But
‖ ( α_{11}  α_{31} ; α_{31}  α_{33} ) ‖_F² = α_{11}² + 2α_{31}² + α_{33}²
and
‖ ( α̂_{11}  0 ; 0  α̂_{33} ) ‖_F² = α̂_{11}² + α̂_{33}²,
which proves the result.


* BACK TO TEXT
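The Frobenius-norm invariance that drives this argument can be seen in two lines of NumPy (an illustrative sketch, with an arbitrary rotation angle):

```python
import numpy as np

# A Jacobi rotation J is orthogonal, so J A J^T has the same Frobenius norm
# as A; zeroing an off-diagonal entry moves its "weight" onto the diagonal.
theta = 0.37
c, s = np.cos(theta), np.sin(theta)
J = np.array([[c, s],
              [-s, c]])
A = np.array([[3.0, 1.5],
              [1.5, -2.0]])      # symmetric 2x2

B = J @ A @ J.T
assert np.isclose(np.linalg.norm(B, 'fro'), np.linalg.norm(A, 'fro'))
print("Frobenius norm preserved under J A J^T")
```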

Homework 16.14 Give the approximate total cost for reducing a nonsymmetric A ∈ R^{n×n} to upper
Hessenberg form.
Answer: To prove this, assume that in the current iteration of the algorithm A_00 is k × k. The update in
the current iteration is then

[u_21, τ_1, a_21] := HouseV(a_21)                 lower order cost, ignore
y_01 := A_02 u_21                                  2k(n − k − 1) flops, since A_02 is k × (n − k − 1)
A_02 := A_02 − (1/τ_1) y_01 u_21^T                 2k(n − k − 1) flops, since A_02 is k × (n − k − 1)
ψ_11 := a_12^T u_21                                lower order cost, ignore
a_12^T := a_12^T − (ψ_11/τ_1) u_21^T               lower order cost, ignore
y_21 := A_22 u_21                                  2(n − k − 1)² flops, since A_22 is (n − k − 1) × (n − k − 1)
β := u_21^T y_21 / 2                               lower order cost, ignore
z_21 := ( y_21 − β u_21/τ_1 )/τ_1                  lower order cost, ignore
w_21 := ( A_22^T u_21 − β u_21/τ_1 )/τ_1           2(n − k − 1)² flops, since A_22 is (n − k − 1) × (n − k − 1) and
                                                   the significant cost is in the matrix-vector multiply
A_22 := A_22 − ( u_21 w_21^T + z_21 u_21^T )       2 · 2(n − k − 1)² flops, since A_22 is (n − k − 1) × (n − k − 1),
                                                   which is updated by two rank-1 updates

Thus, the total cost in flops is, approximately,

Σ_{k=0}^{n−1} [ 2k(n − k − 1) + 2k(n − k − 1) + 2(n − k − 1)² + 2(n − k − 1)² + 2 · 2(n − k − 1)² ]
  = Σ_{k=0}^{n−1} [ 4k(n − k − 1) + 4(n − k − 1)² + 4(n − k − 1)² ]
  = Σ_{k=0}^{n−1} [ 4k(n − k − 1) + 4(n − k − 1)² ] + Σ_{k=0}^{n−1} 4(n − k − 1)²,

where
4k(n − k − 1) + 4(n − k − 1)² = 4( k + n − k − 1 )( n − k − 1 ) = 4(n − 1)(n − k − 1) ≈ 4n(n − k − 1).
Hence the total is approximately

Σ_{k=0}^{n−1} 4n(n − k − 1) + Σ_{k=0}^{n−1} 4(n − k − 1)² = 4n Σ_{j=0}^{n−1} j + 4 Σ_{j=0}^{n−1} j²
  ≈ 4n · n(n − 1)/2 + 4 ∫_0^n j² dj ≈ 2n³ + (4/3) n³ = (10/3) n³.

Thus, the cost is (approximately)
(10/3) n³ flops.
* BACK TO TEXT
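The simplification of the sum can be double-checked arithmetically. A short Python sketch (illustrative; `hessenberg_flops` is just a name chosen for this example):

```python
# Sum the per-iteration flop counts from the answer above and compare
# against the (10/3) n^3 estimate.
def hessenberg_flops(n):
    total = 0
    for k in range(n):
        m = n - k - 1
        total += 2*k*m + 2*k*m + 2*m*m + 2*m*m + 2*(2*m*m)
    return total

n = 500
exact = hessenberg_flops(n)
estimate = 10 * n**3 / 3
assert abs(exact - estimate) / estimate < 0.02   # within 2% for n = 500
print(exact, estimate)
```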


Algorithm: A := LDLT(A)
Partition A → ( A_TL  ⋆ ; A_BL  A_BR ), where A_TL is 0 × 0
while m(A_TL) < m(A) do
  Repartition
  ( A_TL  ⋆ ; A_BL  A_BR ) → ( A_00  ⋆  ⋆ ; a_10^T  α_11  ⋆ ; A_20  a_21  A_22 ), where α_11 is 1 × 1

  Option 1: l_21 := a_21/α_11; A_22 := A_22 − l_21 a_21^T (updating lower triangle); a_21 := l_21.
  Option 2: a_21 := a_21/α_11; A_22 := A_22 − α_11 a_21 a_21^T (updating lower triangle).
  Option 3: A_22 := A_22 − (1/α_11) a_21 a_21^T (updating lower triangle); a_21 := a_21/α_11.

  Continue with
  ( A_TL  ⋆ ; A_BL  A_BR ) ← ( A_00  ⋆  ⋆ ; a_10^T  α_11  ⋆ ; A_20  a_21  A_22 )
endwhile

Figure 17.11: Unblocked algorithm for computing A → L D L^T, overwriting A.

Chapter 17. Notes on the Method of Relatively Robust Representations (Answers)


Homework 17.1 Modify the algorithm in Figure 17.1 so that it computes the LDLT factorization. (Fill in
Figure 17.3.)
Answer:
Three possible answers are given in Figure 17.11.
* BACK TO TEXT
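Option 1 of Figure 17.11 transcribes directly into NumPy. The sketch below (Python rather than the notes' FLAME@lab/MATLAB, and returning L and D separately rather than overwriting A, for clarity) verifies A = L D L^T:

```python
import numpy as np

# Unblocked LDL^T factorization of a symmetric matrix (Option 1).
def ldlt(A):
    A = A.copy()
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k]
        l21 = A[k+1:, k] / d[k]                       # l21 := a21 / alpha11
        A[k+1:, k+1:] -= np.outer(l21, A[k+1:, k])    # A22 := A22 - l21 a21^T
        L[k+1:, k] = l21                              # a21 := l21
    return L, d

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)       # symmetric, with nonzero pivots

L, d = ldlt(A)
assert np.allclose(L @ np.diag(d) @ L.T, A)
print("A = L D L^T reproduced")
```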

Homework 17.2 Modify the algorithm in Figure 17.2 so that it computes the LDL^T factorization of a
tridiagonal matrix. (Fill in Figure 17.4.) What is the approximate cost, in floating point operations, of
computing the LDL^T factorization of a tridiagonal matrix? Count a divide, multiply, and add/subtract as a
floating point operation each. Show how you came up with the algorithm, similar to how we derived the
algorithm for the tridiagonal Cholesky factorization.
Answer:
Three possible algorithms are given in Figure 17.12.

Algorithm: A := LDLT_TRI(A)
Partition A → ( A_FF  ⋆  ⋆ ; α_MF e_L^T  α_MM  ⋆ ; 0  α_LM e_F  A_LL ), where A_FF is 0 × 0
while m(A_FF) < m(A) do
  Repartition so that the scalars α_11 (the current diagonal element), α_21 (the subdiagonal element
  below it), and α_22 (the next diagonal element) are exposed:
  ( A_00  ⋆  ⋆  ⋆ ; α_10 e_L^T  α_11  ⋆  ⋆ ; 0  α_21  α_22  ⋆ ; 0  0  α_32 e_F  A_33 )

  Option 1: λ_21 := α_21/α_11; α_22 := α_22 − λ_21 α_21 (updating lower triangle); α_21 := λ_21.
  Option 2: α_21 := α_21/α_11; α_22 := α_22 − α_11 α_21² (updating lower triangle).
  Option 3: α_22 := α_22 − (1/α_11) α_21² (updating lower triangle); α_21 := α_21/α_11.

  Continue with ...
endwhile

Figure 17.12: Algorithm for computing the LDL^T factorization of a tridiagonal matrix.

The key insight is to recognize that, relative to the algorithm in Figure 17.11,
a_21 = ( α_21 ; 0 )
so that, for example, a_21 := a_21/α_11 becomes
( α_21 ; 0 ) := ( α_21 ; 0 ) / α_11 = ( α_21/α_11 ; 0 ).
Then, an update like A_22 := A_22 − α_11 a_21 a_21^T becomes
( α_22  ⋆ ; α_32 e_F  A_33 ) := ( α_22  ⋆ ; α_32 e_F  A_33 ) − α_11 ( α_21 ; 0 ) ( α_21 ; 0 )^T
= ( α_22 − α_11 α_21²  ⋆ ; α_32 e_F  A_33 ).
Since each iteration performs only a handful of floating point operations (for Option 1: one divide, one
multiply, and one subtract), the total cost is O(n) flops — approximately 3n for Option 1.
* BACK TO TEXT
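Option 2 of Figure 17.12, storing the tridiagonal matrix as its diagonal and subdiagonal, becomes an O(n) loop. An illustrative NumPy sketch (not the notes' code; helper name chosen for the example):

```python
import numpy as np

# LDL^T of a symmetric tridiagonal matrix stored as diagonal d and
# subdiagonal e, computed in O(n) flops.
def ldlt_tri(d, e):
    d = d.copy(); e = e.copy()
    for k in range(len(e)):
        e[k] = e[k] / d[k]                 # lambda21 := alpha21 / alpha11
        d[k+1] = d[k+1] - d[k] * e[k]**2   # alpha22 := alpha22 - alpha11 lambda21^2
    return d, e                            # D diagonal, L unit lower bidiagonal

d = np.array([4.0, 5.0, 6.0, 7.0])
e = np.array([1.0, 1.0, 1.0])
D, l = ldlt_tri(d, e)

# Rebuild A = L D L^T and compare with the original tridiagonal matrix
n = len(d)
L = np.eye(n) + np.diag(l, -1)
A = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
assert np.allclose(L @ np.diag(D) @ L.T, A)
print("tridiagonal A = L D L^T reproduced")
```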

Homework 17.3 Derive an algorithm that, given an indefinite matrix A, computes A = U D U^T. Overwrite
only the upper triangular part of A. (Fill in Figure 17.5.) Show how you came up with the algorithm,
similar to how we derived the algorithm for LDL^T.
Answer: Three possible algorithms are given in Figure 17.13.
Partition
A → ( A_00  a_01 ; a_01^T  α_11 ),  U → ( U_00  u_01 ; 0  1 ),  and  D → ( D_00  0 ; 0  δ_1 ).
Then
( A_00  a_01 ; a_01^T  α_11 )
= ( U_00  u_01 ; 0  1 ) ( D_00  0 ; 0  δ_1 ) ( U_00^T  0 ; u_01^T  1 )
= ( U_00 D_00  u_01 δ_1 ; 0  δ_1 ) ( U_00^T  0 ; u_01^T  1 )
= ( U_00 D_00 U_00^T + δ_1 u_01 u_01^T   δ_1 u_01 ; ⋆  δ_1 ).
This suggests the following algorithm for overwriting the strictly upper triangular part of A with the strictly
upper triangular part of U and the diagonal of A with D:
• Partition A → ( A_00  a_01 ; a_01^T  α_11 ).
• α_11 := δ_1 = α_11 (no-op).
• Compute u_01 := a_01/α_11.
• Update A_00 := A_00 − u_01 a_01^T (updating only the upper triangular part).
• a_01 := u_01.
• Continue with computing A_00 → U_00 D_00 U_00^T.
This algorithm will complete as long as α_11 ≠ 0.

Algorithm: A := UDUT(A)
Partition A → ( A_TL  A_TR ; ⋆  A_BR ), where A_BR is 0 × 0
while m(A_BR) < m(A) do
  Repartition
  ( A_TL  A_TR ; ⋆  A_BR ) → ( A_00  a_01  A_02 ; ⋆  α_11  a_12^T ; ⋆  ⋆  A_22 ), where α_11 is 1 × 1

  Option 1: u_01 := a_01/α_11; A_00 := A_00 − u_01 a_01^T (updating upper triangle); a_01 := u_01.
  Option 2: a_01 := a_01/α_11; A_00 := A_00 − α_11 a_01 a_01^T (updating upper triangle).
  Option 3: A_00 := A_00 − (1/α_11) a_01 a_01^T (updating upper triangle); a_01 := a_01/α_11.

  Continue with ...
endwhile

Figure 17.13: Algorithm for computing the UDU^T factorization of a matrix.


* BACK TO TEXT
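Option 1 of Figure 17.13 processes the matrix from the bottom-right corner upward. A NumPy sketch (illustrative; U and D are returned separately rather than overwriting A):

```python
import numpy as np

# UDU^T factorization of a symmetric indefinite matrix (Option 1).
def udut(A):
    A = A.copy()
    n = A.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for k in range(n - 1, -1, -1):
        d[k] = A[k, k]
        u01 = A[:k, k] / d[k]                    # u01 := a01 / alpha11
        A[:k, :k] -= np.outer(u01, A[:k, k])     # A00 := A00 - u01 a01^T
        U[:k, k] = u01                           # a01 := u01
    return U, d

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
A = B + B.T + 8 * np.eye(5)       # symmetric, with nonzero pivots

U, d = udut(A)
assert np.allclose(U @ np.diag(d) @ U.T, A)
print("A = U D U^T reproduced")
```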

Homework 17.4 Derive an algorithm that, given an indefinite tridiagonal matrix A, computes A = U D U^T.
Overwrite only the upper triangular part of A. (Fill in Figure 17.6.) Show how you came up with the
algorithm, similar to how we derived the algorithm for LDL^T.
Answer: Three possible algorithms are given in Figure 17.14.
The key insight is to recognize that, relative to the algorithm in Figure 17.13, α_11 = α_22 and
a_01 = ( 0 ; α_12 )
so that, for example, a_01 := a_01/α_11 becomes
( 0 ; α_12 ) := ( 0 ; α_12 ) / α_22 = ( 0 ; α_12/α_22 ).
Then, an update like A_00 := A_00 − α_11 a_01 a_01^T becomes
( A_00  α_01 e_L ; ⋆  α_11 ) := ( A_00  α_01 e_L ; ⋆  α_11 ) − α_22 ( 0 ; α_12 ) ( 0 ; α_12 )^T
= ( A_00  α_01 e_L ; ⋆  α_11 − α_22 α_12² ).
* BACK TO TEXT

Homework 17.5 Show that, provided γ_1 is chosen appropriately,

( L_00        0  0          ;       ( D_00  0    0    ;       ( L_00        0  0          ;
  λ_10 e_L^T  1  υ_21 e_F^T ;   ×     0     γ_1  0    ;   ×     λ_10 e_L^T  1  υ_21 e_F^T ; )^T
  0           0  U_22       )         0     0    E_22 )         0           0  U_22       )

= ( A_00        α_01 e_L  0          ;
    α_01 e_L^T  α_11      α_21 e_F^T ;
    0           α_21 e_F  A_22       ).

Algorithm: A := UDUT_TRI(A)
Partition A → ( A_FF  α_FM e_L  0 ; ⋆  α_MM  α_ML e_F^T ; ⋆  ⋆  A_LL ), where A_LL is 0 × 0
while m(A_LL) < m(A) do
  Repartition so that the scalars α_11 (the current diagonal element), α_12 (the superdiagonal element
  to its right), and α_22 (the next diagonal element) are exposed:
  ( A_00  α_01 e_L  0  0 ; ⋆  α_11  α_12  0 ; ⋆  ⋆  α_22  α_23 e_F^T ; ⋆  ⋆  ⋆  A_33 )

  Option 1: υ_12 := α_12/α_22; α_11 := α_11 − υ_12 α_12 (updating upper triangle); α_12 := υ_12.
  Option 2: α_12 := α_12/α_22; α_11 := α_11 − α_22 α_12² (updating upper triangle).
  Option 3: α_11 := α_11 − (1/α_22) α_12² (updating upper triangle); α_12 := α_12/α_22.

  Continue with ...
endwhile

Figure 17.14: Algorithm for computing the UDU^T factorization of a tridiagonal matrix.


(Hint: multiply out A = LDL^T and A = UEU^T with the partitioned matrices first. Then multiply out the
above. Compare and match...) How should γ_1 be chosen? What is the cost of computing the twisted
factorization given that you have already computed the LDL^T and UEU^T factorizations? A Big O
estimate is sufficient. Be sure to take into account what e_L^T D_00 e_L and e_F^T E_22 e_F equal in your cost estimate.
Answer: Multiplying out A = LDL^T with
L = ( L_00  0  0 ; λ_10 e_L^T  1  0 ; 0  λ_21 e_F  L_22 )  and  D = diag( D_00, δ_1, D_22 )
yields

( A_00        α_01 e_L  0          ;
  α_01 e_L^T  α_11      α_21 e_F^T ;
  0           α_21 e_F  A_22       )
= ( L_00 D_00 L_00^T        λ_10 L_00 D_00 e_L            0              ;
    λ_10 e_L^T D_00 L_00^T  λ_10² e_L^T D_00 e_L + δ_1    λ_21 δ_1 e_F^T ;
    0                       λ_21 δ_1 e_F                  λ_21² δ_1 e_F e_F^T + L_22 D_22 L_22^T ).   (17.2)

Similarly, multiplying out A = UEU^T with
U = ( U_00  υ_01 e_L  0 ; 0  1  υ_21 e_F^T ; 0  0  U_22 )  and  E = diag( E_00, ε_1, E_22 )
yields

( A_00        α_01 e_L  0          ;
  α_01 e_L^T  α_11      α_21 e_F^T ;
  0           α_21 e_F  A_22       )
= ( U_00 E_00 U_00^T + υ_01² ε_1 e_L e_L^T   υ_01 ε_1 e_L                    0                      ;
    υ_01 ε_1 e_L^T                           ε_1 + υ_21² e_F^T E_22 e_F      υ_21 e_F^T E_22 U_22^T ;
    0                                        υ_21 U_22 E_22 e_F              U_22 E_22 U_22^T       ).   (17.3)

Finally, multiplying out the twisted factorization gives

( L_00 D_00 L_00^T        λ_10 L_00 D_00 e_L                                    0                      ;
  λ_10 e_L^T D_00 L_00^T  λ_10² e_L^T D_00 e_L + γ_1 + υ_21² e_F^T E_22 e_F     υ_21 e_F^T E_22 U_22^T ;
  0                       υ_21 U_22 E_22 e_F                                    U_22 E_22 U_22^T       ).   (17.4)

Comparing (17.4) with (17.2) shows that the first block column (and, by symmetry, the first block row) of
(17.4) equals the corresponding part of A; comparing (17.4) with (17.3) shows the same for the last block
column and row. Comparing the center entries of (17.2) and (17.3) with α_11 tells us that
α_11 − δ_1 = λ_10² e_L^T D_00 e_L   and   α_11 − ε_1 = υ_21² e_F^T E_22 e_F.
For (17.4) to equal A, its center entry must equal α_11:
α_11 = λ_10² e_L^T D_00 e_L + γ_1 + υ_21² e_F^T E_22 e_F = ( α_11 − δ_1 ) + γ_1 + ( α_11 − ε_1 ).
Solving this for γ_1 yields
γ_1 = δ_1 + ε_1 − α_11.
Notice that, given the factorizations A = LDL^T and A = UEU^T, the cost of computing the twisted
factorization is O(1).
* BACK TO TEXT
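The formula γ_1 = δ_1 + ε_1 − α_11 can be exercised end to end. The following NumPy sketch (an illustration with hypothetical helper names, not the notes' code) computes LDL^T top-down and UEU^T bottom-up for a symmetric tridiagonal matrix, glues them into a twisted factorization at an interior index k, and verifies that the product reproduces A:

```python
import numpy as np

def ldlt_tri(d, e):                      # top-down: A = L D L^T
    d, e = d.copy(), e.copy()
    for k in range(len(e)):
        e[k] /= d[k]; d[k+1] -= d[k] * e[k]**2
    return d, e

def ueut_tri(d, e):                      # bottom-up: A = U E U^T
    d, e = d.copy(), e.copy()
    for k in range(len(e) - 1, -1, -1):
        e[k] /= d[k+1]; d[k] -= d[k+1] * e[k]**2
    return d, e

diag = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
off = np.array([1.0, 2.0, 1.0, 2.0])
n, k = len(diag), 2

D, l = ldlt_tri(diag, off)               # D = diag(delta_i), L subdiagonal l
E, u = ueut_tri(diag, off)               # E = diag(eps_i),  U superdiagonal u

gamma = D[k] + E[k] - diag[k]            # gamma_1 = delta_1 + eps_1 - alpha_11

# assemble the twisted factor N: L-part above row k, U-part below
N = np.eye(n)
for i in range(k):
    N[i+1, i] = l[i]
for i in range(k, n - 1):
    N[i, i+1] = u[i]
Delta = np.diag(np.concatenate([D[:k], [gamma], E[k+1:]]))

A = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
assert np.allclose(N @ Delta @ N.T, A)
print("twisted factorization reproduces A")
```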

Homework 17.6 Compute x_0, χ_1, and x_2 so that

( L_00        0  0          ;       ( D_00  0  0    ;       ( L_00        0  0          ;
  λ_10 e_L^T  1  υ_21 e_F^T ;   ×     0     0  0    ;   ×     λ_10 e_L^T  1  υ_21 e_F^T ; )^T   ( x_0 ; χ_1 ; x_2 ) = ( 0 ; 0 ; 0 ),
  0           0  U_22       )         0     0  E_22 )         0           0  U_22       )

where x = ( x_0 ; χ_1 ; x_2 ) is not a zero vector. (Hint: choose x so that

( L_00        0  0          ;
  λ_10 e_L^T  1  υ_21 e_F^T ; )^T ( x_0 ; χ_1 ; x_2 ) = ( 0 ; 1 ; 0 ). )
  0           0  U_22       )

What is the cost of this computation, given that L_00 and U_22 have special structure?
Answer: Choose x_0, χ_1, and x_2 so that the transposed twisted factor maps x to ( 0 ; 1 ; 0 ) or, equivalently,

( L_00^T  λ_10 e_L  0      ;
  0       1         0      ;   ( x_0 ; χ_1 ; x_2 ) = ( 0 ; 1 ; 0 ).
  0       υ_21 e_F  U_22^T )

We conclude that χ_1 = 1 and
L_00^T x_0 = −λ_10 e_L
U_22^T x_2 = −υ_21 e_F,
so that x_0 and x_2 can be computed via solves with a bidiagonal upper triangular matrix (L_00^T) and a bidiagonal
lower triangular matrix (U_22^T), respectively.
Then
(twisted factor) diag( D_00, 0, E_22 ) (twisted factor)^T x
= (twisted factor) diag( D_00, 0, E_22 ) ( 0 ; 1 ; 0 )
= (twisted factor) ( 0 ; 0 ; 0 ) = ( 0 ; 0 ; 0 ).

A moment of reflection shows that if L_00 is k × k, then solving L_00^T x_0 = −λ_10 e_L requires O(k) flops.
Similarly, since U_22 is then (n − k − 1) × (n − k − 1), solving U_22^T x_2 = −υ_21 e_F requires O(n − k − 1)
flops. Thus, computing x requires O(n) flops.
Thus,
• Given tridiagonal Â, computing all eigenvalues requires O(n²) computation. (We have not discussed
  this in detail...)
• Then, for each eigenvalue λ:
  – Computing A := Â − λI requires O(n) flops.
  – Factoring A = LDL^T requires O(n) flops.
  – Factoring A = UEU^T requires O(n) flops.
  – Computing all γ_1 so that the smallest can be chosen requires O(n) flops.
  – Computing the eigenvector of Â associated with λ from the twisted factorization requires O(n)
    computation.
Thus, computing all eigenvalues and eigenvectors of a tridiagonal matrix via this method requires
O(n²) computation. This is much better than computing these via the tridiagonal QR algorithm.
* BACK TO TEXT


Chapter 18. Notes on Computing the SVD (Answers)


Homework 18.2 If A = U Σ V^T is the SVD of A, then A^T A = V Σ² V^T is the Spectral Decomposition of
A^T A.
* BACK TO TEXT

Homework 18.3
Homework 18.4 Give the approximate total cost for reducing A ∈ R^{n×n} to bidiagonal form.
* BACK TO TEXT



Appendix

How to Download
Videos associated with these notes can be viewed in one of three ways:
* YouTube links to the video uploaded to YouTube.
* Download from UT Box links to the video uploaded to UT-Austin's UT Box File Sharing
Service.
* View After Local Download links to the video downloaded to your own computer, in directory
Video within the same directory in which you stored this document. You can download videos by
first clicking on * Download from UT Box. Alternatively, visit the UT Box directory in which the
videos are stored and download some or all.



Appendix

LAFF Routines (FLAME@lab)


Figure B summarizes the most important routines that are part of the laff FLAME@lab (MATLAB)
library used in these materials.

Operation (Abbrev.) — Definition — Function — Approx. cost (flops / memops)

Vector-vector operations (vectors of length n):
  Copy (COPY)                y := x             y = laff_copy( x, y )                  0 / 2n
  Vector scaling (SCAL)      x := αx            x = laff_scal( alpha, x )              n / 2n
  Vector scaling (INVSCAL)   x := x/α           x = laff_invscal( alpha, x )           n / 2n
  Scaled addition (AXPY)     y := αx + y        y = laff_axpy( alpha, x, y )           2n / 3n
  Dot product (DOT)          α := x^T y         alpha = laff_dot( x, y )               2n / 2n
  Dot product (DOTS)         α := x^T y + α     alpha = laff_dots( x, y, alpha )       2n / 2n
  Length (NORM2)             α := ‖x‖₂          alpha = laff_norm2( x )                2n / 2n

Matrix-vector operations (A of size m × n, or n × n when triangular):
  General matrix-vector      y := αAx + βy      y = laff_gemv( 'No transpose', alpha, A, x, beta, y )    2mn / mn
    multiplication (GEMV)    y := αA^T x + βy   y = laff_gemv( 'Transpose', alpha, A, x, beta, y )       2mn / mn
  Rank-1 update (GER)        A := αxy^T + A     A = laff_ger( alpha, x, y, A )                           2mn / mn
  Triangular solve (TRSV)    b := L⁻¹b, b := U⁻¹b,          example:                                     n² / n²/2
                             b := L⁻ᵀb, b := U⁻ᵀb           b = laff_trsv( 'Upper triangular', 'No transpose',
                                                                'Nonunit diagonal', U, b )
  Triangular matrix-vector   x := Lx, x := Ux,              example:                                     n² / n²/2
    multiply (TRMV)          x := Lᵀx, x := Uᵀx             x = laff_trmv( 'Upper triangular', 'No transpose',
                                                                'Nonunit diagonal', U, x )

Matrix-matrix operations:
  General matrix-matrix      C := αAB + C, C := αAᵀB + C,   example:                                     2mnk / 2mn + mk + nk
    multiplication (GEMM)    C := αABᵀ + C, C := αAᵀBᵀ + C  C = laff_gemm( 'Transpose', 'No transpose',
                                                                alpha, A, B, beta, C )
  Triangular solve with      B := αL⁻¹B, B := αU⁻ᵀB,        example:                                     m²n / m² + mn
    MRHS (TRSM)              B := αBL⁻¹, B := αBU⁻ᵀ         B = laff_trsm( 'Left', 'Lower triangular',
                                                                'No transpose', 'Nonunit diagonal', alpha, U, B )

Bibliography

[1] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, Jack J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, A. McKenney, and D. Sorensen. LAPACK Users guide (third ed.). SIAM,
Philadelphia, PA, USA, 1999.
[2] E. Anderson, Z. Bai, J. Demmel, J. E. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. E.
McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users Guide. SIAM, Philadelphia, 1992.
[3] Paolo Bientinesi. Mechanical Derivation and Systematic Analysis of Correct Linear Algebra Algorithms. PhD thesis, Department of Computer Sciences, The University of Texas, 2006. Technical
Report TR-06-46. September 2006.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[4] Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Ort, and Robert A.
van de Geijn. The science of deriving dense linear algebra algorithms. ACM Trans. Math. Soft.,
31(1):126, March 2005.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[5] Paolo Bientinesi, Brian Gunter, and Robert A. van de Geijn. Families of algorithms related to the
inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw., 35(1):3:13:22, July
2008.
[6] Paolo Bientinesi, Enrique S. Quintana-Ort, and Robert A. van de Geijn. Representing linear algebra algorithms in code: The FLAME APIs. ACM Trans. Math. Soft., 31(1):2759, March 2005.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[7] Paolo Bientinesi and Robert A. van de Geijn. The science of deriving stability analyses. FLAME
Working Note #33. Technical Report AICES-2008-2, Aachen Institute for Computational Engineering Sciences, RWTH Aachen, November 2008.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[8] Paolo Bientinesi and Robert A. van de Geijn. Goal-oriented and modular stability analysis. SIAM J.
Matrix Anal. & Appl., 32(1):286308, 2011.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[9] Paolo Bientinesi and Robert A. van de Geijn. Goal-oriented and modular stability analysis. SIAM J.
Matrix Anal. Appl., 32(1):286308, March 2011.

Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.


We suggest you read FLAME Working Note #33 for more details.
[10] Christian Bischof and Charles Van Loan. The WY representation for products of Householder matrices. SIAM J. Sci. Stat. Comput., 8(1):s2s13, Jan. 1987.
[11] Basic linear algebra subprograms technical forum standard. International Journal of High Performance Applications and Supercomputing, 16(1), 2002.
[12] James W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997.
[13] I. S. Dhillon. A New O(n2 ) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem. PhD thesis, Computer Science Division, University of California, Berkeley, California, May
1997. Available as UC Berkeley Technical Report No. UCB//CSD-97-971.
[14] I. S. Dhillon. Reliable computation of the condition number of a tridiagonal matrix in O(n) time.
SIAM J. Matrix Anal. Appl., 19(3):776796, July 1998.
[15] I. S. Dhillon and B. N. Parlett. Multiple representations to compute orthogonal eigenvectors of
symmetric tridiagonal matrices. Lin. Alg. Appl., 387:128, August 2004.
[16] Inderjit S. Dhillon, Beresford N. Parlett, and Christof Vomel. The design and implementation of the
MRRR algorithm. ACM Trans. Math. Soft., 32(4):533560, December 2006.
[17] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart. LINPACK Users Guide. SIAM,
Philadelphia, 1979.
[18] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear
algebra subprograms. ACM Trans. Math. Soft., 16(1):117, March 1990.
[19] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of
FORTRAN basic linear algebra subprograms. ACM Trans. Math. Soft., 14(1):117, March 1988.
[20] Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. van der Vorst. Solving Linear
Systems on Vector and Shared Memory Computers. SIAM, Philadelphia, PA, 1991.
[21] Jack J. Dongarra, Sven J. Hammarling, and Danny C. Sorensen. Block reduction of matrices to
condensed forms for eigenvalue computations. Journal of Computational and Applied Mathematics,
27, 1989.
[22] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University
Press, Baltimore, 3rd edition, 1996.
[23] Kazushige Goto and Robert van de Geijn. Anatomy of high-performance matrix multiplication. ACM
Trans. Math. Soft., 34(3):12:112:25, May 2008.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[24] John A. Gunnels. A Systematic Approach to the Design and Analysis of Parallel Dense Linear Algebra Algorithms. PhD thesis, Department of Computer Sciences, The University of Texas, December
2001.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.

[25] John A. Gunnels, Fred G. Gustavson, Greg M. Henry, and Robert A. van de Geijn. FLAME: Formal
Linear Algebra Methods Environment. ACM Trans. Math. Soft., 27(4):422455, December 2001.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[26] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and
Applied Mathematics, Philadelphia, PA, USA, second edition, 2002.

[27] C. G. J. Jacobi. Uber


ein leichtes Verfahren, die in der Theorie der Sakular-storungen vorkommenden
Gleichungen numerisch aufzulosen. Crelles Journal, 30:5194, 1846.
[28] Thierry Joffrain, Tze Meng Low, Enrique S. Quintana-Ort, Robert van de Geijn, and Field G. Van
Zee. Accumulating Householder transformations, revisited. ACM Trans. Math. Softw., 32(2):169
179, June 2006.
[29] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for
Fortran usage. ACM Trans. Math. Soft., 5(3):308323, Sept. 1979.
[30] Margaret E. Myers, Pierce M. van de Geijn, and Robert A. van de Geijn. Linear Algebra: Foundations to Frontiers - Notes to LAFF With. Self published, 2014.
Download from http://www.ulaff.net.
[31] Jack Poulson, Bryan Marker, Robert A. van de Geijn, Jeff R. Hammond, and Nichols A. Romero.
Elemental: A new framework for distributed memory dense matrix computations. ACM Trans. Math.
Softw., 39(2):13:113:24, February 2013.
[32] C. Puglisi. Modification of the Householder method based on the compact WY representation. SIAM
J. Sci. Stat. Comput., 13:723726, 1992.
[33] Enrique S. Quintana, Gregorio Quintana, Xiaobai Sun, and Robert van de Geijn. A note on parallel
matrix inversion. SIAM J. Sci. Comput., 22(5):17621771, 2001.
[34] Gregorio Quintana-Ort and Robert van de Geijn. Improving the performance of reduction to Hessenberg form. ACM Trans. Math. Softw., 32(2):180194, June 2006.
[35] Robert Schreiber and Charles Van Loan. A storage-efficient WY representation for products of
Householder transformations. SIAM J. Sci. Stat. Comput., 10(1):5357, Jan. 1989.
[36] G. W. Stewart. Matrix Algorithms Volume 1: Basic Decompositions. SIAM, Philadelphia, PA, USA,
1998.
[37] G. W. Stewart. Matrix Algorithms Volume II: Eigensystems. SIAM, Philadelphia, PA, USA, 2001.
[38] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra. SIAM, 1997.
[39] Robert van de Geijn and Kazushige Goto. Encyclopedia of Parallel Computing, chapter BLAS (Basic
Linear Algebra Subprograms), pages Part 2, 157164. Springer, 2011.
[40] Robert A. van de Geijn and Enrique S. Quintana-Ort. The Science of Programming Matrix Computations. www.lulu.com/contents/contents/1911788/, 2008.
[41] Field G. Van Zee. libflame: The Complete Reference. www.lulu.com, 2012.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.

[42] Field G. Van Zee, Ernie Chan, Robert van de Geijn, Enrique S. Quintana-Ort, and Gregorio
Quintana-Ort. The libflame library for dense matrix computations. IEEE Computation in Science &
Engineering, 11(6):5662, 2009.
[43] Field G. Van Zee and Robert A. van de Geijn. BLIS: A framework for rapid instantiation of BLAS
functionality. ACM Trans. Math. Soft., 2015. To appear.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[44] Field G. Van Zee, Robert A. van de Geijn, and Gregorio Quintana-Ort. Restructuring the tridiagonal
and bidiagonal QR algorithms for performance. ACM Trans. Math. Soft., 40(3):18:118:34, April
2014.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[45] Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ort, and G. Joseph Elizondo. Families
of algorithms for reducing a matrix to condensed form. ACM Trans. Math. Soft., 39(1), 2012.
Download from http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html.
[46] H. F. Walker. Implementation of the GMRES method using Householder transformations. SIAM J.
Sci. Stat. Comput., 9(1):152163, 1988.
[47] David S. Watkins. Fundamentals of Matrix Computations, 3rd Edition. Wiley, third edition, 2010.
[48] Stephen J. Wright. A collection of problems for which Gaussian elimination with partial pivoting is
unstable. SIAM J. Sci. Comput., 14(1):231238, 1993.

Index

‖ · ‖₁-norm
vector, 40
‖ · ‖₂-norm
vector, 38
‖ · ‖∞-norm
vector, 40
1-norm
vector, 40
2-norm
vector, 38
absolute value, 13, 38
axpy, 18
cost, 18, 20
blocked matrix-matrix multiplication, 32
bordered Cholesky factorization algorithm, 265
CGS, 88
Cholesky factorization
bordered algorithm, 265
other algorithm, 265
Classical Gram-Schmidt, 88
code skeleton, 106
complete pivoting
LU factorization, 228
complex conjugate, 13
complex scalar
absolute value, 13
condition number, 82
estimation, 249253
conjugate, 13
complex, 13
matrix, 13
scalar, 13
vector, 13

dot, 19
dot product, 19
dot product, 20
eigenpair
definition, 278
eigenvalue
(, 277
), 283
definition, 278
eigenvector
(, 277
), 283
definition, 278
Euclidean length
see vector 2-norm, 38
FLAME
API, 103109
notation, 104
FLAME API, 103109
FLAME notation, 21
floating point operations, 17
flop, 17
Gauss transform, 216218
Gaussian elimination, 211247
gemm
algorithm
by columns, 28
by rows, 29
Variant 1, 28
Variant 2, 29
Variant 3, 31
via rank-1 updates, 31
gemm, 26
cost, 28

gemv
cost, 21
gemv, 20
ger
cost, 25
ger, 23
Gram-Schmidt
Classical, 88
cost, 99
Modified, 93
Gram-Schmidt orthogonalization
implementation, 104
Gram-Schmidt QR factorization, 85100
Hermitian transpose
vector, 14
Householder QR factorization, 111
blocked, 127138
Householder transformation, 115118
Housev(), 118
HQR(), 122
infinity-norm
vector, 40
inner product, 19
inverse
Moore-Penrose generalized, 78
pseudo, 78
laff
routines, 441445
LAFF Notes, iv
laff operations, 444
linear system solve, 229
triangular, 229232
low-rank approximation, 78
lower triangular solve, 229231
LU decomposition, 215
LU factorization, 211247
complete pivoting, 228
cost, 218219
existence, 215
existence proof, 226228
partial pivoting, 219226
algorithm, 221
LU factorization:derivation, 215216
MAC, 32

Matlab, 104
matrix
condition number, 82
conjugate, 13
inversion, 247249
low-rank approximation, 78
norms(, 41
norms), 45
orthogonal, 5383
orthonormal, 64
permutation, 219
spectrum, 278
transpose, 1316
unitary, 64
matrix-matrix multiplication, 26
blocked, 32
cost, 28
element-by-element, 26
via matrix-vector multiplications, 27
via rank-1 updates, 30
via row-vector times matrix multiplications, 29
matrix-matrix operation
gemm, 26
matrix-matrix multiplication, 26
matrix-matrix operations, 444
matrix-matrix product, 26
matrix-vector multiplication, 20
algorithm, 22
by columns, 22
by rows, 22
cost, 21
via axpy, 22
via dot, 22
matrix-vector operation, 20
gemv, 20
ger, 23
matrix-vector multiplication, 20
rank-1 update, 23
matrix-vector operations, 444
matrix-vector product, 20
MGS, 93
Modified Gram-Schmidt, 93
Moore-Penrose generalized inverse, 78
multiplication
matrix-matrix, 26
blocked, 32

element-by-element, 26
via matrix-vector multiplications, 27
via rank-1 updates, 30
via row-vector times matrix multiplications,
29
matrix-vector, 20
cost, 21
multiply-accumulate, 32

by rows, 25
cost, 25
via axpy, 25
via dot, 25
reduced Singular Value Decomposition, 73
reduced SVD, 73
Reflector, 115
reflector, 115

norms
matrix(, 41
matrix), 45
vector(, 38
vector), 41
notation
FLAME, 21

scal, 16
scal
cost, 17
scaled vector addition, 18
cost, 18, 20
scaling
vector
cost, 17
singular value, 69
Singular Value Decomposition, 5383
geometric interpretation, 69
reduced, 73
theorem, 69
solve
triangular, 229232
Spark webpage, 104
spectral radius
definition, 278
spectrum
definition, 278
SVD
reduced, 73

Octave, 104
operations
laff, 444
matrix-matrix, 444
matrix-vector, 444
orthonormal basis, 85
outer product, 23
partial pivoting
LU factorization, 219226
permutation
matrix, 219
preface, ii
product
dot, 19
inner, 19
matrix-matrix, 26
matrix-vector, 20
outer, 23
projection
onto column space, 77
pseudo inverse, 78
QR factorization, 90
Gram-Schmidt, 85100
Rank Revealing, 143

Taxi-cab norm
see vector 1-norm, 40
transpose, 1316
matrix, 14
vector, 13
triangular solve, 229
lower, 229231
upper, 232
triangular system solve, 232
upper triangular solve, 232

Rank Revealing QR factorization, 143


rank-1 update, 23
algorithm, 25
by columns, 25

vector
1-norm
see 1-norm, vector, 40

2-norm
see 2-norm, vector, 38
complex conjugate, 13
conjugate, 13
Hermitian transpose, 14
infinity-norm
see infinity-norm, vector, 40
length
see vector 2-norm, 38
norms(, 38
norms), 41
operations, 444
orthogonal, 64
orthonormal, 64
perpendicular, 64
scaling
cost, 17
transpose, 13
vector addition
scaled, 18
vector length
see vector 2-norm, 38
vector-vector operation, 16
axpy, 18
dot, 19
scal, 16
vector-vector operations, 444